Re: how long does it take executing ./sbt/sbt assembly

2014-09-23 Thread Zhan Zhang
Definitely something is wrong. For me, it takes 10 to 30 minutes. Thanks. Zhan Zhang On Sep 23, 2014, at 10:02 PM, christy 760948...@qq.com wrote: This process began yesterday and has already run for more than 20 hours. Is that normal? Does anyone have the same problem? No error has been thrown yet

Re: spark RDD join Error

2014-09-04 Thread Zhan Zhang
Try this: import org.apache.spark.SparkContext._ Thanks. Zhan Zhang On Sep 4, 2014, at 4:36 PM, Veeranagouda Mukkanagoudar veera...@gmail.com wrote: I am planning to use the RDD join operation; to test it out I was trying to compile some test code, but I am getting the following compilation error
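The compilation error in the quoted thread is typically caused by the missing implicit conversions that add `join` (and other pair-RDD operations) to an `RDD[(K, V)]` in Spark 1.x. A minimal sketch of the fix, with illustrative data and names:

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Brings the implicit conversions (e.g. rddToPairRDDFunctions) into scope,
// which is what makes join compile on RDDs of key-value pairs in Spark 1.x.
import org.apache.spark.SparkContext._

object JoinExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JoinExample"))
    val left  = sc.parallelize(Seq((1, "a"), (2, "b")))
    val right = sc.parallelize(Seq((1, "x"), (3, "y")))
    // Compiles only once the pair-RDD implicits are in scope.
    val joined = left.join(right) // RDD[(Int, (String, String))]
    joined.collect().foreach(println)
    sc.stop()
  }
}
```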

Re: Running Wordcount on large file stucks and throws OOM exception

2014-09-03 Thread Zhan Zhang
://sandbox.hortonworks.com:8020/tmp/wordcount) Thanks. Zhan Zhang On Aug 26, 2014, at 12:35 AM, motte1988 wir12...@studserv.uni-leipzig.de wrote: Hello, it's me again. Now I've got an explanation for the behaviour. It seems that the driver memory is not large enough to hold the whole result set

Re: Configuration for big worker nodes

2014-08-22 Thread Zhan Zhang
I think it depends on your job. My personal experience when running TB-scale data: Spark got connection-loss failures when I used a big JVM with large memory, but with more executors with small memory it ran very smoothly. I was running Spark on YARN. Thanks. Zhan Zhang On Aug 21, 2014, at 3:42 PM
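The "more executors with small memory" approach above can be expressed as a configuration sketch. The property names are standard Spark-on-YARN settings, but the concrete values here are illustrative assumptions, not recommendations:

```scala
import org.apache.spark.SparkConf

// Sketch: scale out (many small executors) instead of up (one huge JVM)
// on a big worker node. Tune the numbers to your own job and hardware.
val conf = new SparkConf()
  .setAppName("big-node-config")
  .set("spark.executor.memory", "4g")    // small heap per executor
  .set("spark.executor.cores", "2")      // few cores per executor
  .set("spark.executor.instances", "32") // many executors across the node(s)
```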

Re: Web UI doesn't show some stages

2014-08-20 Thread Zhan Zhang
the reduceByKey because it is not cached. I agree with you, it is very confusing. Thanks. Zhan Zhang On Aug 20, 2014, at 2:28 PM, Patrick Wendell pwend...@gmail.com wrote: The reason is that some operators get pipelined into a single stage. rdd.map(XX).filter(YY) - this executes in a single
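The pipelining behaviour described above can be sketched as follows, assuming an existing `SparkContext` named `sc` and an illustrative input path:

```scala
// Narrow transformations (flatMap, map, filter) are pipelined into one
// stage, so the Web UI shows them as a single stage. A shuffle operation
// such as reduceByKey introduces a stage boundary.
val words = sc.textFile("hdfs:///tmp/input") // path is illustrative
val counts = words
  .flatMap(_.split("\\s+"))
  .map(w => (w, 1))   // pipelined into the same stage as flatMap
  .reduceByKey(_ + _) // shuffle boundary: appears as a new stage in the UI
```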

Re: NullPointerException when connecting from Spark to a Hive table backed by HBase

2014-08-18 Thread Zhan Zhang
String HBASE_TABLE_NAME = "hbase.table.name"; Thanks. Zhan Zhang On Aug 17, 2014, at 11:39 PM, Cesar Arevalo ce...@zephyrhealthinc.com wrote: HadoopRDD

Re: Bug or feature? Overwrite broadcasted variables.

2014-08-18 Thread Zhan Zhang
Thanks. Zhan Zhang On Aug 18, 2014, at 11:26 AM, Peng Cheng pc...@uow.edu.au wrote: I'm curious to see that if you declare the broadcasted wrapper as a var and overwrite it in the driver program, the modification can have a stable impact on all transformations/actions defined BEFORE the overwrite
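The pattern under discussion can be sketched as below, assuming an existing `SparkContext` `sc` and an illustrative `RDD[Int]` named `data`:

```scala
// Sketch of the "overwrite a broadcast var" pattern from the thread.
// Because a Scala closure captures a `var` by reference and RDD
// transformations are lazy, a map defined BEFORE the overwrite may
// still observe the new broadcast value when it finally executes --
// the behaviour the thread debates as bug or feature.
var lookup = sc.broadcast(Map(1 -> "old"))
val mapped = data.map(x => lookup.value.getOrElse(x, "?")) // defined before overwrite
lookup = sc.broadcast(Map(1 -> "new")) // driver overwrites the var
mapped.collect() // evaluated only now; may already see "new"
```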

Re: Support for ORC Table in Shark/Spark

2014-08-14 Thread Zhan Zhang
I tried a simple spark-hive select and insert, and it works. But to directly manipulate ORC files through RDDs, Spark has to be upgraded to support Hive 0.13 first, because some of the ORC API is not exposed until Hive 0.13. Thanks. Zhan Zhang On Aug 11, 2014, at 10:23 PM, vinay.kash

Re: Support for ORC Table in Shark/Spark

2014-08-14 Thread Zhan Zhang
Yes, you are right, but I tried the old hadoopFile with OrcInputFormat. In Hive 0.12, OrcStruct does not expose its API, so Spark cannot access it. With Hive 0.13, an RDD can read from an ORC file. By the way, I didn't see ORCNewOutputFormat in hive-0.13. Direct RDD manipulation (Hive 0.13): val inputRead =
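The truncated "direct RDD manipulation" snippet above might look roughly like the following sketch, assuming Spark is built against Hive 0.13 (so `OrcStruct` is accessible), an existing `SparkContext` `sc`, and an illustrative table path:

```scala
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.hive.ql.io.orc.{OrcInputFormat, OrcStruct}

// Read an ORC file directly as an RDD via the old mapred InputFormat.
// OrcInputFormat yields (NullWritable, OrcStruct) pairs; accessing
// OrcStruct requires Hive 0.13, as discussed above.
val inputRead = sc.hadoopFile[NullWritable, OrcStruct, OrcInputFormat](
  "hdfs:///apps/hive/warehouse/orc_table") // path is illustrative
val rows = inputRead.map { case (_, struct) => struct.toString }
rows.take(5).foreach(println)
```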

Re: Support for ORC Table in Shark/Spark

2014-08-14 Thread Zhan Zhang
I agree. We need support similar to Parquet files for end users. That's the purpose of SPARK-2883. Thanks. Zhan Zhang On Aug 14, 2014, at 11:42 AM, Yin Huai huaiyin@gmail.com wrote: I feel that using hadoopFile and saveAsHadoopFile to read and write ORC files are more towards
