I have a SchemaRDD with 100 records in 1 partition. We'll call this baseline.
I have a SchemaRDD with 11 records in 1 partition. We'll call this daily.
After a fairly basic join of these two tables:

JavaSchemaRDD results = sqlContext.sql("SELECT id, action, daily.epoch,
daily.version FROM ...
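(To fill in the statement above, which got cut off: the full query looked roughly like the sketch below. The table names baseline and daily are the two SchemaRDDs registered as tables; the ON clause is a guess on my part, assuming the two tables share an id column, just to make the example complete. The point is that the join triggers a shuffle.)

// Rough sketch of the join, assuming both SchemaRDDs were registered as
// temp tables named "baseline" and "daily" and share an "id" column.
JavaSchemaRDD results = sqlContext.sql(
    "SELECT baseline.id, action, daily.epoch, daily.version " +
    "FROM baseline JOIN daily ON baseline.id = daily.id");

The resulting SchemaRDD then comes back with far more partitions than either input.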
OK. After reading some documentation, it would appear the issue is the default number of shuffle partitions Spark SQL uses for a join (spark.sql.shuffle.partitions, which defaults to 200).
After doing something like the following, I was able to change that value.
From: Darin McBeath ddmcbe...@yahoo.com.INVALID
To: User user@spark.apache.org
Sent: Wednesday,
Sorry, hit the send key a bit too early.
Anyway, this is the code I used to set it:
sqlContext.sql("set spark.sql.shuffle.partitions=10");
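Putting it together, the sequence ends up looking roughly like this. The join details are the same guess as in my first mail (joining baseline and daily on an assumed id column); the check at the end is just how I'd confirm the setting took effect:

// Lower the number of shuffle partitions used for joins/aggregations.
sqlContext.sql("set spark.sql.shuffle.partitions=10");

// Same basic join as before; the shuffle behind it should now produce
// 10 partitions instead of the default 200.
JavaSchemaRDD results = sqlContext.sql(
    "SELECT baseline.id, action, daily.epoch, daily.version " +
    "FROM baseline JOIN daily ON baseline.id = daily.id");

// Quick sanity check on the partition count (expecting 10 now).
System.out.println(results.partitions().size());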
From: Darin McBeath ddmcbe...@yahoo.com
To: Darin McBeath ddmcbe...@yahoo.com; User user@spark.apache.org
Sent: Wednesday, October 29, 2014 2:47 PM
Subject: Re: