Unfortunately, this is a query where we just don't have an efficiently implementation yet. You might try switching the table order.
Here is the JIRA for doing something more efficient: https://issues.apache.org/jira/browse/SPARK-2212 On Fri, Jul 18, 2014 at 7:05 AM, Pei-Lun Lee <pl...@appier.com> wrote: > Hi, > > We have a query with left joining and got this error: > > Caused by: org.apache.spark.SparkException: Job aborted due to stage > failure: Task 1.0:0 failed 4 times, most recent failure: Exception failure > in TID 5 on host ip-10-33-132-101.us-west-2.compute.internal: > com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, > required: 1 > > Looks like spark sql tried to do a broadcast join and collecting one of > the table to master but it is too large. > > How do we explicitly control the join behavior like this? > > -- > Pei-Lun Lee > >