Re: hiveContext.sql NullPointerException

2015-06-11 Thread patcharee
Hi, Does df.write.partitionBy(partitions).format(format).mode(overwrite).saveAsTable(tbl) support ORC files? I tried df.write.partitionBy("zone", "z", "year", "month").format("orc").mode("overwrite").saveAsTable("tbl"), but after the insert my table tbl schema has been changed to something I did not
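[Editor's note: a minimal sketch of the write described above, assuming a Spark 1.4-era DataFrame `df` and the column/table names mentioned in the thread (zone, z, year, month, tbl). The archive preview strips the string quoting; the call chain needs quoted literals:]

```scala
// Partitioned ORC write as described in the message; partitionBy takes
// the partition column names and saveAsTable the Hive table name.
df.write
  .partitionBy("zone", "z", "year", "month")
  .format("orc")
  .mode("overwrite")
  .saveAsTable("tbl")
```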

Re: hiveContext.sql NullPointerException

2015-06-08 Thread patcharee
Hi, Thanks for your guidelines. I will try it out. Btw how do you know HiveContext.sql (and also DataFrame.registerTempTable) is only expected to be invoked on the driver side? Where can I find the documentation? BR, Patcharee On 07. juni 2015 16:40, Cheng Lian wrote: Spark SQL supports Hive dynamic

Re: hiveContext.sql NullPointerException

2015-06-08 Thread Cheng Lian
On 6/8/15 4:02 PM, patcharee wrote: Hi, Thanks for your guidelines. I will try it out. Btw how do you know HiveContext.sql (and also DataFrame.registerTempTable) is only expected to be invoked on the driver side? Where can I find the documentation? I'm afraid we don't state this explicitly on the SQL

Re: hiveContext.sql NullPointerException

2015-06-07 Thread patcharee
Hi, How can I use HiveContext on the executor? If only the driver can see HiveContext, does it mean I have to collect all datasets (very large) to the driver and use HiveContext there? It will overload the driver's memory and fail. BR, Patcharee On 07. juni 2015 11:51,

Re: hiveContext.sql NullPointerException

2015-06-07 Thread Cheng Lian
Spark SQL supports Hive dynamic partitioning, so one possible workaround is to create a Hive table partitioned by zone, z, year, and month dynamically, and then insert the whole dataset into it directly. In 1.4, we also provide dynamic partitioning support for non-Hive environments, and you
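[Editor's note: a hedged sketch of the dynamic-partitioning workaround Cheng describes, using the thread's column names; the table and temp-table names are illustrative. The whole insert runs on the driver, so no executor-side HiveContext is needed:]

```scala
// Enable Hive dynamic partitioning, then insert the full dataset in one
// driver-side statement; Hive routes each row to its partition.
hiveContext.sql("SET hive.exec.dynamic.partition = true")
hiveContext.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

df.registerTempTable("staging")  // df holds zone, z, year, month columns
hiveContext.sql(
  """INSERT OVERWRITE TABLE tbl PARTITION (zone, z, year, month)
    |SELECT * FROM staging""".stripMargin)
```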

Re: hiveContext.sql NullPointerException

2015-06-07 Thread Cheng Lian
Hi, This is expected behavior. HiveContext.sql (and also DataFrame.registerTempTable) is only expected to be invoked on driver side. However, the closure passed to RDD.foreach is executed on executor side, where no viable HiveContext instance exists. Cheng On 6/7/15 10:06 AM, patcharee
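[Editor's note: an illustration of the failure mode Cheng describes, with hypothetical names. The closure passed to RDD.foreach is serialized and run on executors, where no usable HiveContext exists, hence the NullPointerException:]

```scala
// Fails: the closure runs on executors, where hiveContext is not viable.
rdd.foreach { row =>
  hiveContext.sql("SELECT 1")  // executor side -> NullPointerException
}

// Works: HiveContext.sql is invoked on the driver only.
hiveContext.sql("SELECT COUNT(*) FROM tbl")
```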

hiveContext.sql NullPointerException

2015-06-06 Thread patcharee
Hi, I try to insert data into a partitioned hive table. The groupByKey is used to combine the dataset into partitions of the hive table. After the groupByKey, I converted the Iterable[X] to a DataFrame by X.toList.toDF(). But the hiveContext.sql throws NullPointerException, see below. Any suggestions? What
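[Editor's note: a hypothetical reconstruction of the pattern described above, with illustrative names (`pairs`, `part`, `tbl`). Converting each group to a DataFrame and calling hiveContext.sql inside foreach places those calls on the executors, which is what the rest of this thread identifies as the cause of the NullPointerException:]

```scala
import hiveContext.implicits._  // for toDF on the driver; not usable in closures

pairs.groupByKey().foreach { case (key, values) =>
  val partDF = values.toList.toDF()                       // executor side
  partDF.registerTempTable("part")                        // executor side
  hiveContext.sql("INSERT INTO TABLE tbl SELECT * FROM part")  // NPE here
}
```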