Re: Spark SQL and Skewed Joins

2015-06-16 Thread Jon Walton
On Fri, Jun 12, 2015 at 9:43 PM, Michael Armbrust mich...@databricks.com wrote: 2. Does 1.3.2 or 1.4 have any enhancements that can help? I tried to use 1.3.1 but SPARK-6967 prohibits me from doing so. Now that 1.4 is available, would any of the JOIN enhancements help this situation? I

Spark SQL and Skewed Joins

2015-06-12 Thread Jon Walton
Greetings, I am trying to implement a classic star schema ETL pipeline using Spark SQL 1.2.1. I am running into problems with shuffle joins for those dimension tables that have skewed keys and are too large to let Spark broadcast them. I have a few questions: 1. Can I split my queries so a