Hi,
Does
df.write.partitionBy(partitions).format(format).mode(overwrite).saveAsTable(tbl)
support ORC files?
I tried df.write.partitionBy("zone", "z", "year", "month").format("orc").mode("overwrite").saveAsTable(tbl), but after
the insert the schema of my table tbl had changed to something I did not expect.
Hi,
Thanks for your guidelines. I will try it out.
Btw, how do you know that HiveContext.sql (and also
DataFrame.registerTempTable) is only expected to be invoked on the driver
side? Where can I find this documented?
BR,
Patcharee
On 07. juni 2015 16:40, Cheng Lian wrote:
Spark SQL supports Hive dynamic partitioning, so one possible workaround
is to create a Hive table partitioned by zone, z, year, and month, and
then insert the whole dataset into it directly using dynamic partitioning.
On 6/8/15 4:02 PM, patcharee wrote:
Hi,
Thanks for your guidelines. I will try it out.
Btw, how do you know that HiveContext.sql (and also
DataFrame.registerTempTable) is only expected to be invoked on the driver
side? Where can I find this documented?
I'm afraid we don't state this explicitly in the SQL programming guide.
Hi,
How can I work with HiveContext on the executors? If only the
driver can see the HiveContext, does it mean I have to collect all the
datasets (which are very large) to the driver and use the HiveContext there?
That would overload the driver's memory and fail.
BR,
Patcharee
On 07. juni 2015 11:51, Cheng Lian wrote:
Spark SQL supports Hive dynamic partitioning, so one possible workaround
is to create a Hive table partitioned by zone, z, year, and month, and
then insert the whole dataset into it directly using dynamic partitioning.
In 1.4, we also provide dynamic partitioning support for non-Hive
environments, and you can use it through the DataFrameWriter.partitionBy API.
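For illustration, here is a minimal sketch of the Hive dynamic partitioning workaround, assuming a HiveContext named hiveContext and a DataFrame df whose columns are a data column (here called value, a made-up placeholder) plus zone, z, year and month; only the table name tbl and the partition columns come from this thread:

// Enable Hive dynamic partitioning; nonstrict mode is needed because every
// partition column in the INSERT below is dynamic.
hiveContext.sql("SET hive.exec.dynamic.partition = true")
hiveContext.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

// Create the partitioned table once (value DOUBLE is a placeholder schema).
hiveContext.sql(
  """CREATE TABLE IF NOT EXISTS tbl (value DOUBLE)
    |PARTITIONED BY (zone INT, z INT, year INT, month INT)
    |STORED AS ORC""".stripMargin)

// Register the whole dataset and let Hive route rows to the right partitions;
// the partition columns must be the last columns of the SELECT.
df.registerTempTable("staging")
hiveContext.sql(
  """INSERT OVERWRITE TABLE tbl PARTITION (zone, z, year, month)
    |SELECT value, zone, z, year, month FROM staging""".stripMargin)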
Hi,
This is expected behavior. HiveContext.sql (and also
DataFrame.registerTempTable) is only expected to be invoked on the driver
side. However, the closure passed to RDD.foreach is executed on the executor
side, where no viable HiveContext instance exists.
Cheng
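To make the driver-versus-executor distinction concrete, here is a small self-contained sketch that keeps all SQL on the driver side; the Record case class, the temp table name and the sample values are hypothetical, not taken from this thread:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object DriverSideSqlExample {
  // Hypothetical record type standing in for the X mentioned below.
  case class Record(zone: Int, z: Int, year: Int, month: Int, value: Double)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("driver-side-sql"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._

    val rdd = sc.parallelize(Seq(Record(1, 10, 2015, 6, 42.0)))

    // Anti-pattern: calling hiveContext inside a closure shipped to executors
    // fails, because no usable HiveContext instance exists there.
    // rdd.foreach(r => hiveContext.sql("SELECT 1"))   // do not do this

    // Driver-side alternative: build a DataFrame and issue all SQL from the driver.
    val df = rdd.toDF()
    df.registerTempTable("records")
    hiveContext.sql("SELECT count(*) FROM records").show()

    sc.stop()
  }
}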
On 6/7/15 10:06 AM, patcharee wrote:
Hi,
I am trying to insert data into a partitioned Hive table. The groupByKey is
meant to combine the dataset so that each group goes into one partition of
the Hive table. After the groupByKey, I convert each Iterable[X] to a
DataFrame via .toList.toDF(). But hiveContext.sql throws a
NullPointerException, see below. Any suggestions? What could be wrong?
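For context, a hypothetical reconstruction of the kind of code that produces this error; the RDD, record type, table name and partition columns are assumptions, not the actual code from this thread (hiveContext and its implicits are assumed to have been created on the driver):

// The closure below runs on executors, where hiveContext is not a usable
// instance, so the SQL calls fail (e.g. with a NullPointerException).
rdd.groupByKey().foreach { case (key, records) =>
  val partDf = records.toList.toDF()   // needs import hiveContext.implicits._
  partDf.registerTempTable("part")
  hiveContext.sql(
    "INSERT INTO TABLE tbl PARTITION (zone, z, year, month) SELECT * FROM part")
}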