Spark SQL and confused about number of partitions/tasks to do a simple join.

2014-10-29 Thread Darin McBeath
I have a SchemaRDD with 100 records in 1 partition, which we'll call baseline, and another SchemaRDD with 11 records in 1 partition, which we'll call daily. After a fairly basic join of these two tables: JavaSchemaRDD results = sqlContext.sql(SELECT id, action, daily.epoch, daily.version FROM

Re: Spark SQL and confused about number of partitions/tasks to do a simple join.

2014-10-29 Thread Darin McBeath
OK. After reading some documentation, it appears the issue is the default number of partitions used for a join (200). After doing something like the following, I was able to change the value.

Re: Spark SQL and confused about number of partitions/tasks to do a simple join.

2014-10-29 Thread Darin McBeath
Sorry, hit the send key a bit too early. Anyway, this is the setting I used: sqlContext.sql("SET spark.sql.shuffle.partitions=10");
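To see why a join over so few records can still produce so many tasks, here is a minimal plain-Java sketch, not Spark code. It assumes a simplified hash partitioner (non-negative modulo of the key's hash code) and hypothetical integer ids 1 through 11 standing in for the daily records; the class and method names are made up for illustration. Since Spark schedules one task per shuffle partition, including empty ones, counting the non-empty partitions shows how little work most of those tasks would have.

```java
import java.util.BitSet;

// Illustrative only: a stand-in for hash partitioning, not Spark's implementation.
public class ShufflePartitionSketch {

    // Assumed partitioning rule: non-negative modulo of the key's hash code.
    static int partitionFor(int key, int numPartitions) {
        return Math.floorMod(Integer.hashCode(key), numPartitions);
    }

    // Counts how many partitions receive at least one key.
    static int nonEmptyPartitions(int[] keys, int numPartitions) {
        BitSet used = new BitSet(numPartitions);
        for (int key : keys) {
            used.set(partitionFor(key, numPartitions));
        }
        return used.cardinality();
    }

    public static void main(String[] args) {
        // Hypothetical ids for the 11 "daily" records.
        int[] dailyIds = new int[11];
        for (int i = 0; i < dailyIds.length; i++) {
            dailyIds[i] = i + 1;
        }

        // Compare the default shuffle partition count (200) with the value set above (10).
        for (int n : new int[] {200, 10}) {
            int busy = nonEmptyPartitions(dailyIds, n);
            System.out.println(n + " partitions: " + busy + " non-empty, " + (n - busy) + " empty");
        }
    }
}
```

Under these assumptions, most of the 200 partitions receive no records at all, while with 10 partitions every one receives at least one record. The right value for real data depends on its size and key distribution.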