Re: spark sql left join gives KryoException: Buffer overflow

2014-08-05 Thread Dima Zhiyanov
I am also experiencing this Kryo buffer problem. My join is a left outer join
with under 40 MB on the right side. I would expect a broadcast join to succeed
in this case (it did in Hive).

Another problem is that the optimizer chose a nested loop join for some reason;
I would expect a broadcast (map-side) hash join.
Am I correct in my expectations?
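
To make the difference between the two strategies named above concrete, here is a minimal plain-Python sketch. It is not Spark code and does not reflect how Spark SQL implements either operator; the function names and sample rows are made up for illustration. It only shows why a hash join with a small build side does far less work than a nested loop join while returning the same left outer result.

```python
from collections import defaultdict


def nested_loop_left_outer(left, right, key):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    out = []
    for l in left:
        matched = False
        for r in right:
            if l[key] == r[key]:
                out.append((l, r))
                matched = True
        if not matched:
            out.append((l, None))  # left outer: keep unmatched left rows
    return out


def broadcast_hash_left_outer(left, right, key):
    """Build a hash table on the small right side once, then probe it per left row."""
    table = defaultdict(list)
    for r in right:
        table[r[key]].append(r)
    out = []
    for l in left:
        matches = table.get(l[key])  # .get() does not create empty entries
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))  # left outer: keep unmatched left rows
    return out


# Hypothetical sample data: "right" plays the role of the small table.
left = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
right = [{"id": 1, "w": "x"}, {"id": 1, "w": "y"}, {"id": 4, "w": "z"}]

# Both strategies produce the same rows; only the amount of work differs.
assert nested_loop_left_outer(left, right, "id") == broadcast_hash_left_outer(left, right, "id")
```

In a distributed setting the "broadcast" part means the hash table built from the small side is shipped to every worker, so the large side never has to be shuffled.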




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark-sql-left-join-gives-KryoException-Buffer-overflow-tp10157p11432.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: spark sql left join gives KryoException: Buffer overflow

2014-08-05 Thread Dima Zhiyanov
Yes


Re: spark sql left join gives KryoException: Buffer overflow

2014-08-05 Thread Michael Armbrust
For outer joins I'd recommend upgrading to master or waiting for a 1.1
release candidate (which should be out this week).


On Tue, Aug 5, 2014 at 7:38 AM, Dima Zhiyanov dimazhiya...@hotmail.com
wrote:

 I am also experiencing this Kryo buffer problem. My join is a left outer join
 with under 40 MB on the right side. I would expect a broadcast join to succeed
 in this case (it did in Hive).

 Another problem is that the optimizer chose a nested loop join for some reason;
 I would expect a broadcast (map-side) hash join.
 Am I correct in my expectations?








Re: spark sql left join gives KryoException: Buffer overflow

2014-07-21 Thread Pei-Lun Lee
Hi Michael,

Thanks for the suggestion. In my query, both tables are too large to use a
broadcast join.

When SPARK-2211 is done, will Spark SQL automatically choose the join
algorithm?
Is there some way to manually hint the optimizer?


2014-07-19 5:23 GMT+08:00 Michael Armbrust mich...@databricks.com:

 Unfortunately, this is a query for which we just don't have an efficient
 implementation yet.  You might try switching the table order.

 Here is the JIRA for doing something more efficient:
 https://issues.apache.org/jira/browse/SPARK-2212


 On Fri, Jul 18, 2014 at 7:05 AM, Pei-Lun Lee pl...@appier.com wrote:

 Hi,

 We have a query with a left join and got this error:

 Caused by: org.apache.spark.SparkException: Job aborted due to stage
 failure: Task 1.0:0 failed 4 times, most recent failure: Exception failure
 in TID 5 on host ip-10-33-132-101.us-west-2.compute.internal:
 com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0,
 required: 1

 It looks like Spark SQL tried to do a broadcast join by collecting one of
 the tables to the master, but that table is too large.

 How do we explicitly control the join behavior in a case like this?

 --
 Pei-Lun Lee





Re: spark sql left join gives KryoException: Buffer overflow

2014-07-21 Thread Michael Armbrust

 When SPARK-2211 is done, will Spark SQL automatically choose the join
 algorithm?
 Is there some way to manually hint the optimizer?


Ideally we will select the best algorithm for you.  We are also considering
ways to allow the user to hint.
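
As a rough illustration of what "selecting the best algorithm" could look like, here is a plain-Python sketch of a size-based rule. Everything in it is an assumption made for illustration: the function name, the strategy labels, and the 64 MB threshold are hypothetical and are not taken from Spark's planner. (As far as I know, later Spark releases expose a size threshold of this kind as `spark.sql.autoBroadcastJoinThreshold`; check the documentation for your version.)

```python
MB = 1024 * 1024
GB = 1024 * MB


def choose_join_strategy(join_type, left_bytes, right_bytes, broadcast_threshold):
    """Hypothetical size-based rule; NOT Spark's actual planner logic."""
    if join_type == "left_outer":
        # In this simple scheme only the right side can be the broadcast (build)
        # side: every left row is streamed so unmatched left rows can be emitted.
        if right_bytes <= broadcast_threshold:
            return "broadcast_hash(build=right)"
    elif join_type == "inner":
        # For an inner join either side may be broadcast; prefer the smaller one.
        smaller = "left" if left_bytes <= right_bytes else "right"
        if min(left_bytes, right_bytes) <= broadcast_threshold:
            return "broadcast_hash(build=%s)" % smaller
    # Fall back to a join that shuffles both sides by key.
    return "shuffled_hash"


# A 40 MB right side under an assumed 64 MB threshold qualifies for broadcast.
print(choose_join_strategy("left_outer", 10 * GB, 40 * MB, 64 * MB))
# With the small table on the left of a left outer join, this rule cannot broadcast it.
print(choose_join_strategy("left_outer", 40 * MB, 10 * GB, 64 * MB))
```

The second call shows why table order can matter for outer joins under a rule like this: the side that must be preserved is the one that gets streamed.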


Re: spark sql left join gives KryoException: Buffer overflow

2014-07-18 Thread Michael Armbrust
Unfortunately, this is a query for which we just don't have an efficient
implementation yet.  You might try switching the table order.

Here is the JIRA for doing something more efficient:
https://issues.apache.org/jira/browse/SPARK-2212


On Fri, Jul 18, 2014 at 7:05 AM, Pei-Lun Lee pl...@appier.com wrote:

 Hi,

 We have a query with a left join and got this error:

 Caused by: org.apache.spark.SparkException: Job aborted due to stage
 failure: Task 1.0:0 failed 4 times, most recent failure: Exception failure
 in TID 5 on host ip-10-33-132-101.us-west-2.compute.internal:
 com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0,
 required: 1

 It looks like Spark SQL tried to do a broadcast join by collecting one of
 the tables to the master, but that table is too large.

 How do we explicitly control the join behavior in a case like this?

 --
 Pei-Lun Lee
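
A workaround often suggested for "KryoException: Buffer overflow" is to raise the Kryo buffer size, although nothing in this thread confirms that it resolves this particular query. The sketch below only renders two settings in the whitespace-separated format of conf/spark-defaults.conf. The property names are the ones used by Spark 1.x at the time of this thread (later releases renamed the buffer setting), the value 64 is an arbitrary illustrative choice, and the helper function is hypothetical.

```python
# Property names as used by Spark 1.x; the buffer size (64 MB) is an arbitrary
# illustrative value, not a recommendation from this thread.
kryo_settings = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryoserializer.buffer.mb": "64",
}


def render_spark_defaults(settings):
    """Render settings as spark-defaults.conf lines: key, whitespace, value."""
    return "\n".join("%s %s" % (k, v) for k, v in sorted(settings.items()))


print(render_spark_defaults(kryo_settings))
```

Raising the buffer only helps when a single serialized object is larger than the buffer; it does not make an oversized broadcast table fit in memory.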