Hi, I have created an issue for this: https://issues.apache.org/jira/browse/SPARK-12358
Regards
Deenar

On 16 December 2015 at 06:21, Deenar Toraskar <deenar.toras...@gmail.com> wrote:
>
> On 16 December 2015 at 06:19, Deenar Toraskar <deenar.toras...@thinkreactive.co.uk> wrote:
>>
>> Hi
>>
>> I had the same problem: a query joining a lot of small tables (5x), all
>> below the broadcast threshold, where Spark broadcasts all of these tables
>> together without checking whether sufficient memory is available.
>>
>> I got around this issue by reducing *spark.sql.autoBroadcastJoinThreshold*
>> to stop the bigger tables in the query from being broadcast.
>>
>> This looks like a bug to me. A fix would be to
>> a) enforce, in addition to the per-table threshold, a total broadcast size
>> per query, so that only data up to that limit is broadcast, preventing
>> executors from running out of memory.
>>
>> Shall I raise a JIRA for this?
>>
>> Regards
>> Deenar
>>
>> On 4 November 2015 at 22:55, Shuai Zheng <szheng.c...@gmail.com> wrote:
>>>
>>> An update: this ONLY happens in Spark 1.5. I tried running it under
>>> Spark 1.4 and 1.4.1, and there is no issue (the program was developed
>>> under Spark 1.4, and I just re-tested it; it works). So this proves that
>>> there is no issue with the logic or the data; it is caused by the new
>>> version of Spark.
>>>
>>> So I want to know: is there any new setting I should use in Spark 1.5
>>> to make it work?
>>>
>>> Regards,
>>>
>>> Shuai
>>>
>>> *From:* Shuai Zheng [mailto:szheng.c...@gmail.com]
>>> *Sent:* Wednesday, November 04, 2015 3:22 PM
>>> *To:* user@spark.apache.org
>>> *Subject:* [Spark 1.5]: Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError: Java heap space
>>>
>>> Hi All,
>>>
>>> I have a program which runs some fairly complex business logic (joins) in Spark.
>>> And I get the exception below.
>>>
>>> I am running on Spark 1.5, with the parameters:
>>>
>>> spark-submit --deploy-mode client --executor-cores=24 --driver-memory=2G --executor-memory=45G --class …
>>>
>>> Some other setup:
>>>
>>> sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>>>     .set("spark.kryoserializer.buffer.max", "2047m");
>>> sparkConf.set("spark.executor.extraJavaOptions", "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
>>>     .set("spark.sql.autoBroadcastJoinThreshold", "104857600");
>>>
>>> This is running on an AWS c3.8xlarge instance. I am not sure what
>>> parameters I should set given the OutOfMemoryError exception below.
>>>
>>> #
>>> # java.lang.OutOfMemoryError: Java heap space
>>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>> #   Executing /bin/sh -c "kill -9 10181"...
>>> Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError: Java heap space
>>>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>>>     at org.apache.spark.sql.execution.joins.UnsafeHashedRelation$.apply(HashedRelation.scala:380)
>>>     at org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:123)
>>>     at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:95)
>>>     at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:85)
>>>     at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:100)
>>>     at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:85)
>>>     at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:85)
>>>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>>>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:745)
>>>
>>> Any hint would be very helpful.
>>>
>>> Regards,
>>>
>>> Shuai
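For readers landing on this thread: the workaround discussed above amounts to lowering (or disabling) *spark.sql.autoBroadcastJoinThreshold* so that the larger tables fall back to shuffle joins instead of being broadcast. A minimal sketch in Scala (the thread's own snippet is Java-style, but the property is the same); the 10 MB value below is illustrative, not a recommendation:

```scala
import org.apache.spark.SparkConf

val sparkConf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Lower the broadcast threshold (in bytes) from the 100 MB (104857600)
  // used in the thread, so only genuinely small tables are broadcast.
  .set("spark.sql.autoBroadcastJoinThreshold", (10 * 1024 * 1024).toString)

// Or disable automatic broadcast joins entirely, forcing shuffle joins:
// sparkConf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
```

The same setting can be passed on the command line via `--conf spark.sql.autoBroadcastJoinThreshold=10485760` on `spark-submit`, which avoids recompiling the job while experimenting with the limit.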