Hi

I have created an issue for this
https://issues.apache.org/jira/browse/SPARK-12358

Regards
Deenar

On 16 December 2015 at 06:21, Deenar Toraskar <deenar.toras...@gmail.com>
wrote:

>
>
> On 16 December 2015 at 06:19, Deenar Toraskar <
> deenar.toras...@thinkreactive.co.uk> wrote:
>
>> Hi
>>
>> I had the same problem. I have a query with a number of small tables (5x),
>> all below the broadcast threshold, and Spark broadcasts all of these
>> tables together without checking whether sufficient memory is available.
>>
>> I got around this issue by reducing the
>> *spark.sql.autoBroadcastJoinThreshold* to stop broadcasting the bigger
>> tables in the query.
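For reference, a minimal sketch of that workaround (the 1 MB value below is purely illustrative, not a recommendation; the default in Spark 1.5 is 10 MB, and the right value depends on your table sizes):

```java
import org.apache.spark.SparkConf;

// Lower the broadcast threshold so the larger "small" tables fall back to
// a shuffle join instead of being broadcast. 1048576 bytes (1 MB) is an
// illustrative value; pick one below the size of the tables that must not
// be broadcast.
SparkConf sparkConf = new SparkConf()
    .set("spark.sql.autoBroadcastJoinThreshold", "1048576");
```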
>>
>> This looks like a bug to me. A fix would be to ensure that, in addition
>> to the per-table threshold, there is a total broadcast size limit per
>> query, so that only data up to that limit is broadcast, preventing
>> executors from running out of memory.
>>
>> Shall I raise a JIRA for this?
>>
>> Regards
>> Deenar
>>
>>
>> On 4 November 2015 at 22:55, Shuai Zheng <szheng.c...@gmail.com> wrote:
>>
>>> An update: this ONLY happens in Spark 1.5. I tried running it under
>>> Spark 1.4 and 1.4.1, and there is no issue (the program was originally
>>> developed under Spark 1.4, and I just re-tested it; it works). So this
>>> proves there is no issue with the logic or the data; it is caused by the
>>> new version of Spark.
>>>
>>>
>>>
>>> So I want to know whether there is any new setting I should configure
>>> in Spark 1.5 to make it work.
>>>
>>>
>>>
>>> Regards,
>>>
>>>
>>>
>>> Shuai
>>>
>>>
>>>
>>> *From:* Shuai Zheng [mailto:szheng.c...@gmail.com]
>>> *Sent:* Wednesday, November 04, 2015 3:22 PM
>>> *To:* user@spark.apache.org
>>> *Subject:* [Spark 1.5]: Exception in thread "broadcast-hash-join-2"
>>> java.lang.OutOfMemoryError: Java heap space
>>>
>>>
>>>
>>> Hi All,
>>>
>>>
>>>
>>> I have a program that runs a fairly complex business join in Spark, and
>>> I get the exception below.
>>>
>>>
>>>
>>> I am running on Spark 1.5 with the following parameters:
>>>
>>>
>>>
>>> spark-submit --deploy-mode client --executor-cores=24 --driver-memory=2G
>>> --executor-memory=45G --class …
>>>
>>>
>>>
>>> Some other settings:
>>>
>>>
>>>
>>> sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>>>     .set("spark.kryoserializer.buffer.max", "2047m");
>>>
>>> sparkConf.set("spark.executor.extraJavaOptions",
>>>     "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
>>>     .set("spark.sql.autoBroadcastJoinThreshold", "104857600");
>>>
>>>
>>>
>>> This is running on an AWS c3.8xlarge instance. I am not sure what
>>> parameters I should set to avoid the OutOfMemoryError exception below.
>>>
>>>
>>>
>>> #
>>>
>>> # java.lang.OutOfMemoryError: Java heap space
>>>
>>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>>
>>> #   Executing /bin/sh -c "kill -9 10181"...
>>>
>>> Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError:
>>> Java heap space
>>>
>>>         at
>>> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>>> Source)
>>>
>>>         at
>>> org.apache.spark.sql.execution.joins.UnsafeHashedRelation$.apply(HashedRelation.scala:380)
>>>
>>>         at
>>> org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:123)
>>>
>>>         at
>>> org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:95)
>>>
>>>         at
>>> org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:85)
>>>
>>>         at
>>> org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:100)
>>>
>>>         at
>>> org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:85)
>>>
>>>         at
>>> org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:85)
>>>
>>>         at
>>> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>>>
>>>         at
>>> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>>>
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>
>>>         at java.lang.Thread.run(Thread.java:745)
>>>
>>>
>>>
>>> Any hints would be very helpful.
>>>
>>>
>>>
>>> Regards,
>>>
>>>
>>>
>>> Shuai
>>>
>>
>>
>
