pyspark split pair rdd to multiple
Hi,

How can I efficiently turn a pair RDD [K, V] into a map [K, Array(V)] in PySpark?

Best,
Patcharee
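A minimal PySpark sketch of one way to do this (assuming the keys and their grouped values fit in driver memory, since collectAsMap() pulls the whole result to the driver):

    from pyspark import SparkContext

    sc = SparkContext(appName="pair-rdd-to-map")
    pairs = sc.parallelize([("a", 1), ("a", 2), ("b", 3)])

    # Group all values per key, materialize each group as a list, and
    # collect the result on the driver as a dict {K: [V, ...]}.
    grouped = pairs.groupByKey().mapValues(list).collectAsMap()
    # grouped == {'a': [1, 2], 'b': [3]}

If the result only needs to stay distributed, dropping collectAsMap() leaves an RDD of (K, list(V)) pairs instead.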
dataframe access hive complex type
Hi,

Which DataFrame API can access Hive complex types (struct, array, map)?

Thanks,
Patcharee
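A minimal PySpark sketch (the table and column names are hypothetical) of the Column operations that reach into Hive complex types: dot notation for struct fields, getItem() for array indices and map keys, and explode() to flatten an array into rows:

    from pyspark.sql.functions import col, explode

    # Assumes sqlContext is a HiveContext so Hive tables are visible.
    df = sqlContext.table("my_hive_table")

    df.select(
        col("struct_col.field_a"),       # struct field via dot notation
        col("array_col").getItem(0),     # array element by index
        col("map_col").getItem("key1"),  # map value by key
    ).show()

    # One output row per array element.
    df.select(explode(col("array_col")).alias("element")).show()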
How to use KryoSerializer : ClassNotFoundException
Hi,

I am using Spark 1.4. I wanted to serialize with KryoSerializer, but got a ClassNotFoundException. The configuration and the exception are below. When I submitted the job, I also provided --jars mylib.jar, which contains WRFVariableZ.

conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.registerKryoClasses(Array(classOf[WRFVariableZ]))

Exception in thread "main" org.apache.spark.SparkException: Failed to register classes with Kryo
at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:114)
Caused by: java.lang.ClassNotFoundException: no.uni.computing.io.WRFVariableZ

How can I configure it?

BR,
Patcharee
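The usual cause is that --jars ships the jar to the executors but does not necessarily put it on the driver's classpath, where the Kryo registration runs; bundling the class into the application jar or adding --driver-class-path mylib.jar typically fixes it. As a sketch of an alternative, classes can also be registered by fully qualified name through configuration (spark.kryo.classesToRegister, available since Spark 1.2), shown here from PySpark:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
            # Registration by name is resolved on the JVM at runtime; the jar
            # must still be visible to both the driver and the executors.
            .set("spark.kryo.classesToRegister", "no.uni.computing.io.WRFVariableZ"))
    sc = SparkContext(conf=conf)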
memory needed for each executor
Hi,

How can I estimate the amount of memory each executor (with one core) needs to execute a job? If there are multiple cores per executor, will the required memory be the product (memory needed per single-core executor * number of cores)? Any suggestions/guidelines?

BR,
Patcharee
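Executor memory (spark.executor.memory) is allocated per executor JVM and shared by all of its task slots, so a rough first estimate is indeed per-task memory multiplied by spark.executor.cores, plus JVM and Spark overhead. A sketch with illustrative numbers only:

    from pyspark import SparkConf, SparkContext

    # Illustrative sizing: one 4 GB executor JVM running 4 concurrent
    # tasks leaves roughly 1 GB per task slot before overhead is counted.
    conf = (SparkConf()
            .set("spark.executor.memory", "4g")
            .set("spark.executor.cores", "4"))
    sc = SparkContext(conf=conf)

In practice the per-task requirement is usually found empirically from the web UI's storage and shuffle metrics rather than computed up front.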
Re: Dataframe Write : Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.
I got it. Thanks!
Patcharee

On 13/06/15 23:00, Will Briggs wrote:
The context that is created by spark-shell is actually an instance of HiveContext. If you want to use it programmatically in your driver, you need to make sure that your context is a HiveContext, and not a SQLContext. https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
Hope this helps,
Will

On June 13, 2015, at 3:36 PM, pth001 patcharee.thong...@uni.no wrote:
Hi,

I am using Spark 0.14. I am trying to insert data into a Hive table (in ORC format) from a DataFrame:

partitionedTestDF.write.format("org.apache.spark.sql.hive.orc.DefaultSource")
  .mode(org.apache.spark.sql.SaveMode.Append)
  .partitionBy("zone", "z", "year", "month")
  .saveAsTable("testorc")

When this job is submitted by spark-submit I get:

Exception in thread "main" java.lang.RuntimeException: Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.

But the job works fine in spark-shell. What can be wrong?

BR,
Patcharee
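A minimal PySpark sketch of the fix Will describes: construct a HiveContext explicitly in the driver, after which the write can target a persistent Hive table (the source table name is hypothetical):

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="orc-write")
    sqlContext = HiveContext(sc)  # a plain SQLContext cannot create persistent tables

    df = sqlContext.table("source_table")
    (df.write.format("orc")
       .mode("append")
       .partitionBy("zone", "z", "year", "month")
       .saveAsTable("testorc"))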
Dataframe Write : Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.
Hi,

I am using Spark 0.14. I am trying to insert data into a Hive table (in ORC format) from a DataFrame:

partitionedTestDF.write.format("org.apache.spark.sql.hive.orc.DefaultSource")
  .mode(org.apache.spark.sql.SaveMode.Append)
  .partitionBy("zone", "z", "year", "month")
  .saveAsTable("testorc")

When this job is submitted by spark-submit I get:

Exception in thread "main" java.lang.RuntimeException: Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.

But the job works fine in spark-shell. What can be wrong?

BR,
Patcharee