[ https://issues.apache.org/jira/browse/SPARK-23427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368728#comment-16368728 ]
Kazuaki Ishizaki edited comment on SPARK-23427 at 2/19/18 12:49 AM:
--------------------------------------------------------------------

Thank you. I ran this program several times with a 64GB heap. I saw the following OOM in both cases, `-1` and the default (`10 * 1024 * 1024`). I am still running the program with other heap sizes. Is this the OOM you are seeing? If not, I would appreciate it if you could upload the stack trace from when the OOM occurred.

{code}
[info] org.apache.spark.sql.MyTest *** ABORTED *** (2 hours, 14 minutes, 36 seconds)
[info] java.lang.OutOfMemoryError:
[info] at java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:161)
[info] at java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:155)
[info] at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:125)
[info] at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
[info] at java.lang.StringBuilder.append(StringBuilder.java:136)
[info] at java.lang.StringBuilder.append(StringBuilder.java:131)
[info] at scala.StringContext.standardInterpolator(StringContext.scala:125)
[info] at scala.StringContext.s(StringContext.scala:95)
[info] at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:199)
[info] at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
[info] at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252)
[info] at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
[info] at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
[info] at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:3295)
[info] at org.apache.spark.sql.Dataset.createOrReplaceTempView(Dataset.scala:3033)
[info] at org.apache.spark.sql.MyTest$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(MyTest.scala:87)
[info] at org.apache.spark.sql.catalyst.plans.PlanTestBase$class.withSQLConf(PlanTest.scala:176)
[info] at org.apache.spark.sql.MyTest.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(MyTest.scala:27)
[info] at org.apache.spark.sql.test.SQLTestUtilsBase$class.withSQLConf(SQLTestUtils.scala:167)
[info] at org.apache.spark.sql.MyTest.withSQLConf(MyTest.scala:27)
[info] at org.apache.spark.sql.MyTest$$anonfun$1.apply$mcV$sp(MyTest.scala:65)
[info] at org.apache.spark.sql.MyTest$$anonfun$1.apply(MyTest.scala:65)
[info] at org.apache.spark.sql.MyTest$$anonfun$1.apply(MyTest.scala:65)
...
{code}
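For reference, this is roughly how I toggle the setting between the two runs. It is a minimal sketch: the session setup and app name are illustrative, and only `spark.sql.autoBroadcastJoinThreshold` is the actual configuration key under test here.

{code:scala}
import org.apache.spark.sql.SparkSession

// Illustrative session setup; only the autoBroadcastJoinThreshold key
// below is the actual configuration discussed in this issue.
val spark = SparkSession.builder()
  .appName("AutoBroadcastJoinThresholdRepro")
  .master("local[*]")
  .getOrCreate()

// Case 1: disable automatic broadcast joins entirely.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)

// Case 2: the default threshold, 10 MB.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10 * 1024 * 1024)
{code}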
> spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver
> -------------------------------------------------------------------------
>
>                 Key: SPARK-23427
>                 URL: https://issues.apache.org/jira/browse/SPARK-23427
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: Spark 2.0
>            Reporter: Dhiraj
>            Priority: Critical
>
> We are facing an issue with the value of spark.sql.autoBroadcastJoinThreshold.
> With spark.sql.autoBroadcastJoinThreshold set to -1 (disabled), we see driver memory usage stay flat.
> With any other value (10MB, 5MB, 2MB, 1MB, 10K, 1K), we see driver memory usage grow at a rate that depends on the size of the autoBroadcastThreshold, eventually getting an OOM exception. The problem is that the memory used by autoBroadcast is not being freed in the driver.
> The application imports Oracle tables as master DataFrames, which are persisted. Each job applies a filter to these tables and then registers them as temp views (tempViewTable). SQL queries are then used to process the data further. At the end, all the intermediate DataFrames are unpersisted.
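For context, the workflow described in the report looks roughly like the following minimal sketch; the JDBC connection options, table name, filter column, and query are all hypothetical placeholders, not taken from the reporter's application.

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("BroadcastThresholdWorkflow")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Import an Oracle table as a persisted "master" DataFrame.
// The connection URL and table name are hypothetical.
val master = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/SERVICE")
  .option("dbtable", "MASTER_TABLE")
  .load()
  .persist()

// Each job filters the master DataFrame and registers it as a temp view.
val filtered = master.filter($"status" === "ACTIVE")
filtered.createOrReplaceTempView("tempViewTable")

// Data is processed further with SQL against the temp view; joins at this
// stage are where the broadcast threshold comes into play.
val result = spark.sql("SELECT status, COUNT(*) FROM tempViewTable GROUP BY status")
result.show()

// At the end of the job, intermediate DataFrames are unpersisted.
master.unpersist()
{code}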