pyspark split pair rdd to multiple

2016-04-19 Thread pth001

Hi,

How can I efficiently split a pair RDD [K, V] into a map [K, Array(V)] in PySpark?
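
I am thinking of something along these lines (just a sketch; rdd below stands for my pair RDD), but is there a more efficient way?

grouped = rdd.groupByKey().mapValues(list)   # RDD of (K, [V, V, ...])
result = grouped.collectAsMap()              # plain Python dict {K: [V, V, ...]} on the driver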

Best,
Patcharee




dataframe access hive complex type

2016-01-19 Thread pth001

Hi,

How can a DataFrame (which API?) access Hive complex types (struct, array, map)?
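
For example, something along these lines is what I mean (a sketch; sqlContext is assumed to be a HiveContext, and the table/column names are made up: s is a struct, a is an array, m is a map):

from pyspark.sql.functions import explode

df = sqlContext.table("mytable")            # hypothetical table with complex-type columns
df.select("s.field1")                       # read a field out of a struct column
df.select(explode(df.a).alias("element"))   # flatten an array column into one row per element
df.select(df.m.getItem("key1"))             # look up a key in a map column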

Thanks,
Patcharee




How to use KryoSerializer : ClassNotFoundException

2015-06-24 Thread pth001

Hi,

I am using Spark 1.4. I wanted to serialize with KryoSerializer, but got a 
ClassNotFoundException. The configuration and exception are below. When I 
submitted the job, I also provided --jars mylib.jar, which contains 
WRFVariableZ.


conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.registerKryoClasses(Array(classOf[WRFVariableZ]))

Exception in thread "main" org.apache.spark.SparkException: Failed to register classes with Kryo
    at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:114)
Caused by: java.lang.ClassNotFoundException: no.uni.computing.io.WRFVariableZ


How can I configure it?

BR,
Patcharee




memory needed for each executor

2015-06-21 Thread pth001

Hi,

How can I determine how much memory each executor (with one core) needs to 
execute a job? If there are multiple cores per executor, will the required 
memory simply be the product (memory needed for a one-core executor * no. 
of cores)?
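
For context, these are the settings I am referring to (illustrative values only):

from pyspark import SparkConf

conf = SparkConf()
conf.set("spark.executor.memory", "4g")   # memory per executor (placeholder value)
conf.set("spark.executor.cores", "4")     # cores per executor (placeholder value)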


Any suggestions/guidelines?

BR,
Patcharee




Re: Dataframe Write : Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.

2015-06-13 Thread pth001

I got it. Thanks!
Patcharee

On 13/06/15 23:00, Will Briggs wrote:

The context that is created by spark-shell is actually an instance of 
HiveContext. If you want to use it programmatically in your driver, you need to 
make sure that your context is a HiveContext, and not a SQLContext.

https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
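
For example, something along these lines (a minimal PySpark sketch; the Scala API is analogous):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="orc-write")   # placeholder app name
sqlContext = HiveContext(sc)             # HiveContext can create persistent (non-temporary) tables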

Hope this helps,
Will

On June 13, 2015, at 3:36 PM, pth001 patcharee.thong...@uni.no wrote:

Hi,

I am using Spark 1.4. I am trying to insert data into a Hive table (in ORC
format) from a DataFrame.

partitionedTestDF.write.format("org.apache.spark.sql.hive.orc.DefaultSource")
.mode(org.apache.spark.sql.SaveMode.Append).partitionBy("zone","z","year","month").saveAsTable("testorc")

When this job is submitted via spark-submit I get:
Exception in thread "main" java.lang.RuntimeException: Tables created
with SQLContext must be TEMPORARY. Use a HiveContext instead

But the job works fine in spark-shell. What can be wrong?

BR,
Patcharee








Dataframe Write : Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.

2015-06-13 Thread pth001

Hi,

I am using Spark 1.4. I am trying to insert data into a Hive table (in ORC 
format) from a DataFrame.


partitionedTestDF.write.format("org.apache.spark.sql.hive.orc.DefaultSource")
.mode(org.apache.spark.sql.SaveMode.Append).partitionBy("zone","z","year","month").saveAsTable("testorc")

When this job is submitted via spark-submit I get:
Exception in thread "main" java.lang.RuntimeException: Tables created 
with SQLContext must be TEMPORARY. Use a HiveContext instead


But the job works fine in spark-shell. What can be wrong?

BR,
Patcharee
