Hi,
How can I efficiently convert a pair RDD [K, V] into a map [K, Array(V)] in PySpark?
Best,
Patcharee
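A minimal sketch of one common approach (the variable names here are illustrative, not from the original message): group the values per key with groupByKey, turn each group into a list, and collect the result to the driver as a dict.

from pyspark import SparkContext

sc = SparkContext(appName="pairs-to-map")  # assumes a plain SparkContext is available

pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])

# Group values per key, materialize each group as a list, and collect to the
# driver as a dict: {"a": [1, 3], "b": [2]}
key_to_values = pairs.groupByKey().mapValues(list).collectAsMap()

Note that groupByKey shuffles all values; if the per-key result can be built incrementally, reduceByKey or aggregateByKey is usually cheaper.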
Hi,
How can a DataFrame (which API) access Hive complex types (Struct, Array, Map)?
Thanks,
Patcharee
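For reference, a hedged sketch of how complex Hive columns are typically read with the DataFrame API (the table name complex_tbl and the column names struct_col, arr_col, map_col are made up for illustration):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="hive-complex-types")
sqlContext = HiveContext(sc)  # a HiveContext is needed to see Hive tables

df = sqlContext.table("complex_tbl")  # hypothetical Hive table

# Struct fields, array elements, and map values can be addressed with
# dot/bracket syntax in selectExpr, or with getField/getItem on a Column.
df.selectExpr("struct_col.field_a", "arr_col[0]", "map_col['some_key']").show()
df.select(df.arr_col.getItem(0), df.map_col.getItem("some_key")).show()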
Hi,
I am using Spark 1.4. I wanted to serialize with KryoSerializer, but got a
ClassNotFoundException. The configuration and exception are below. When I
submitted the job, I also provided --jars mylib.jar, which contains
WRFVariableZ.
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
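For context, a typical Kryo setup in PySpark looks roughly like the sketch below. mylib.jar and WRFVariableZ come from the message above, but the fully qualified class name and the extraClassPath entries are assumptions; they are a commonly suggested workaround when Kryo cannot resolve classes shipped only via --jars, not a confirmed fix for this case.

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        # Register the custom class so Kryo serializes it without writing full class names.
        .set("spark.kryo.classesToRegister", "no.uni.WRFVariableZ")  # hypothetical package
        # Sometimes needed so the driver and executor classloaders can see the jar:
        .set("spark.driver.extraClassPath", "mylib.jar")
        .set("spark.executor.extraClassPath", "mylib.jar"))

sc = SparkContext(conf=conf)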
Hi,
How can I know how much memory each executor (with one core) needs to
execute a job? If there are many cores per executor, will the required memory
be the product (memory needed for a single-core executor * number of cores)?
Any suggestions/guidelines?
BR,
Patcharee
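For reference, all concurrent tasks in an executor share its heap, so a common first estimate is per-task memory times spark.executor.cores, plus some headroom; a hedged configuration sketch (the 2g and 4 values are only placeholders):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        # Each executor runs spark.executor.cores tasks at once, all sharing
        # spark.executor.memory, so memory is roughly sized as
        # (per-task working set) * (cores per executor) plus overhead.
        .set("spark.executor.memory", "2g")   # placeholder
        .set("spark.executor.cores", "4"))    # placeholder

sc = SparkContext(conf=conf)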
https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
Hope this helps,
Will
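The linked section covers reading and writing Hive tables through a HiveContext; a minimal hedged sketch of the pattern it describes (the table names are illustrative):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="hive-tables")
sqlContext = HiveContext(sc)  # HiveContext exposes Hive tables, including ORC-backed ones

# Read an existing Hive table and write a DataFrame back as an ORC Hive table.
df = sqlContext.sql("SELECT * FROM some_hive_table")  # hypothetical table
df.write.format("orc").mode("append").saveAsTable("some_orc_table")  # hypothetical table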
On June 13, 2015, at 3:36 PM, pth001 patcharee.thong...@uni.no wrote:
Hi,
I am using Spark 1.4. I am trying to insert data into a Hive table (in ORC
format) from a DataFrame.
partitionedTestDF.write.format("org.apache.spark.sql.hive.orc.DefaultSource")
  .mode(org.apache.spark.sql.SaveMode.Append)
  .partitionBy("zone", "z", "year", "month")
  .saveAsTable("testorc")
When this job is submitted by