[jira] [Commented] (SPARK-23427) spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver

2019-02-15 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769738#comment-16769738
 ] 

t oo commented on SPARK-23427:
--

gentle ping

> spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver 
> -
>
> Key: SPARK-23427
> URL: https://issues.apache.org/jira/browse/SPARK-23427
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: SPARK 2.0 version
>Reporter: Dhiraj
>Priority: Critical
>
> We are facing an issue with the value of spark.sql.autoBroadcastJoinThreshold.
> With spark.sql.autoBroadcastJoinThreshold set to -1 (disabled), driver memory 
> usage stays flat.
> With any other value (10MB, 5MB, 2MB, 1MB, 10K, 1K), driver memory usage grows 
> at a rate that depends on the autoBroadcastJoinThreshold size, until we get an 
> OOM exception. The problem is that the memory used by the auto-broadcast is 
> never freed in the driver.
> The application imports Oracle tables as master DataFrames, which are 
> persisted. Each job applies filters to these tables and registers the results 
> as temp views. SQL queries then process the data further. At the end, all 
> intermediate DataFrames are unpersisted.
>  
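To make the reported pattern concrete, here is a minimal Scala sketch of the workflow described above (Spark 2.x API). All table names, filters, and JDBC settings are illustrative assumptions; only the persist / temp-view / SQL / unpersist pattern comes from the report:

{code:java}
import org.apache.spark.sql.functions.{col, lit}

// Hypothetical reconstruction of the reported workflow; identifiers are
// illustrative, not taken from the report.
val master = spark.read
  .format("jdbc")
  .option("url", oracleJdbcUrl)        // assumed Oracle connection URL
  .option("dbtable", "MASTER_TABLE")   // assumed table name
  .load()
  .persist()                           // master DataFrames are persisted

// Each job filters the master table and registers the result as a temp view.
val filtered = master.filter(col("load_date") === lit(runDate))
filtered.createOrReplaceTempView("tempViewTable")

// Further processing is done via SQL queries against the view.
val result = spark.sql("SELECT * FROM tempViewTable WHERE amount > 0")
result.count()

// At the end of the job, the intermediate DataFrames are unpersisted.
filtered.unpersist()
{code}

Under the reported behavior, each iteration of this loop leaks driver memory whenever the filtered table falls under the broadcast threshold.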



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23427) spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver

2018-06-19 Thread Dean Wampler (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517326#comment-16517326
 ] 

Dean Wampler commented on SPARK-23427:
--

Hi, Kazuaki. Any update on this issue? Any pointers on what you discovered? 
Thanks.




[jira] [Commented] (SPARK-23427) spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver

2018-03-01 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381815#comment-16381815
 ] 

Kazuaki Ishizaki commented on SPARK-23427:
--

I am working on this (currently collecting heap profiles), but I am also working 
on other things. I cannot promise a solution, but I will do my best.
I do not have any workaround for this yet.




[jira] [Commented] (SPARK-23427) spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver

2018-02-28 Thread Pratik Dhumal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381251#comment-16381251
 ] 

Pratik Dhumal commented on SPARK-23427:
---

Hello,
For planning purposes:
 # Can we expect a solution for this in the near future?
 # Do you have any suggestion, patch, or workaround to deal with the issue?

I appreciate your help in this regard; thanks for your time.

 

 




[jira] [Commented] (SPARK-23427) spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver

2018-02-21 Thread Pratik Dhumal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372000#comment-16372000
 ] 

Pratik Dhumal commented on SPARK-23427:
---

Sorry for the very late reply.

I am seeing the issue whenever the auto-broadcast value is not -1.

I could not reproduce it with autoBroadcastJoinThreshold = -1. One thing I have 
noticed is that with -1 the job runs through more than double the iterations, 
but at a certain point it stops iterating and gets stuck (with no errors, only 
the info message *ContextCleaner: Cleaned accumulator*).

Also, this is the *stack trace* I'm getting:
{code:java}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
at java.lang.StringBuilder.append(StringBuilder.java:136)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at scala.StringContext.standardInterpolator(StringContext.scala:125)
at scala.StringContext.s(StringContext.scala:95)
at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:220)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:54)
at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2546)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2192)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2199)
at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2227)
at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2226)
at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2559)
at org.apache.spark.sql.Dataset.count(Dataset.scala:2226)
{code}
 

Hope this helps.

Thank you.

 




[jira] [Commented] (SPARK-23427) spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver

2018-02-18 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368823#comment-16368823
 ] 

Kazuaki Ishizaki commented on SPARK-23427:
--

I got the OOM with the same stack trace for both configurations when I ran this 
program with a 256GB heap.




[jira] [Commented] (SPARK-23427) spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver

2018-02-18 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368728#comment-16368728
 ] 

Kazuaki Ishizaki commented on SPARK-23427:
--

Thank you. I ran this program several times with a 64GB heap. I saw the 
following OOM in both cases, `-1` and the default (`10 * 1024 * 1024`). I am now 
running the program with other heap sizes.
Is this the OOM you are seeing? If not, I would appreciate it if you could 
upload the stack trace from when the OOM occurred.

{code:java}
[info] org.apache.spark.sql.MyTest *** ABORTED *** (2 hours, 14 minutes, 36 seconds)
[info] java.lang.OutOfMemoryError:
[info] at java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:161)
[info] at java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:155)
[info] at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:125)
[info] at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
[info] at java.lang.StringBuilder.append(StringBuilder.java:136)
[info] at java.lang.StringBuilder.append(StringBuilder.java:131)
[info] at scala.StringContext.standardInterpolator(StringContext.scala:125)
[info] at scala.StringContext.s(StringContext.scala:95)
[info] at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:199)
[info] at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
[info] at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252)
[info] at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
[info] at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
[info] at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:3295)
[info] at org.apache.spark.sql.Dataset.createOrReplaceTempView(Dataset.scala:3033)
[info] at org.apache.spark.sql.MyTest$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(MyTest.scala:87)
[info] at org.apache.spark.sql.catalyst.plans.PlanTestBase$class.withSQLConf(PlanTest.scala:176)
[info] at org.apache.spark.sql.MyTest.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(MyTest.scala:27)
[info] at org.apache.spark.sql.test.SQLTestUtilsBase$class.withSQLConf(SQLTestUtils.scala:167)
[info] at org.apache.spark.sql.MyTest.withSQLConf(MyTest.scala:27)
[info] at org.apache.spark.sql.MyTest$$anonfun$1.apply$mcV$sp(MyTest.scala:65)
[info] at org.apache.spark.sql.MyTest$$anonfun$1.apply(MyTest.scala:65)
[info] at org.apache.spark.sql.MyTest$$anonfun$1.apply(MyTest.scala:65)
...
{code}
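For reference, the two configurations being compared in this thread would typically be set like this (a config sketch; the property name and the 10 MB default are the standard Spark SQL ones, the values are those discussed above):

{code}
# spark-defaults.conf
# default: broadcast tables smaller than 10 MB (10 * 1024 * 1024 bytes)
spark.sql.autoBroadcastJoinThreshold  10485760

# disable automatic broadcast joins entirely:
# spark.sql.autoBroadcastJoinThreshold  -1
{code}

The same property can also be set per session, e.g. spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1).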

