[ https://issues.apache.org/jira/browse/SPARK-26103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin reassigned SPARK-26103:
--------------------------------------

    Assignee: Dave DeCaprio

> OutOfMemory error with large query plans
> ----------------------------------------
>
>                 Key: SPARK-26103
>                 URL: https://issues.apache.org/jira/browse/SPARK-26103
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2
>         Environment: Amazon EMR 5.19
> 1 c5.4xlarge master instance
> 1 c5.4xlarge core instance
> 2 c5.4xlarge task instances
>            Reporter: Dave DeCaprio
>            Assignee: Dave DeCaprio
>            Priority: Major
>
> Large query plans can cause OutOfMemory errors in the Spark driver.
> We are creating data frames that are not extremely large but contain many 
> nested joins.  These plans execute efficiently because of caching and 
> partitioning, but the generated text representation of a single query plan 
> can reach hundreds of megabytes.  Running many of these queries in parallel 
> exhausts the driver's heap and causes the driver process to fail.
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOfRange(Arrays.java:2694)
>     at java.lang.String.<init>(String.java:203)
>     at java.lang.StringBuilder.toString(StringBuilder.java:405)
>     at scala.StringContext.standardInterpolator(StringContext.scala:125)
>     at scala.StringContext.s(StringContext.scala:90)
>     at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:70)
>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:52)
>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
>  
> A similar error is reported in 
> [https://stackoverflow.com/questions/38307258/out-of-memory-error-when-writing-out-spark-dataframes-to-parquet-format]
>  
> Code exists to truncate the string if the number of output columns is larger 
> than 25, but not if the rest of the query plan is huge.
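
The last point in the description suggests capping the size of the whole plan string rather than only the column list. A minimal sketch of that idea follows. It is not Spark's code and not necessarily how the issue was resolved: the class name BoundedPlanText and its maxChars parameter are hypothetical, chosen only to illustrate accumulating text under a fixed character budget so the buffer can never grow to hundreds of megabytes.

```java
/**
 * Hypothetical helper (not part of Spark): accumulates text up to a fixed
 * character budget and records how much was discarded past that budget.
 */
final class BoundedPlanText {
    private final StringBuilder buf = new StringBuilder();
    private final int maxChars;   // assumed budget for the whole plan string
    private long dropped = 0;     // characters discarded once the budget is used up

    BoundedPlanText(int maxChars) {
        if (maxChars < 0) {
            throw new IllegalArgumentException("maxChars must be >= 0");
        }
        this.maxChars = maxChars;
    }

    /** Appends as much of s as still fits; the remainder is only counted. */
    BoundedPlanText append(CharSequence s) {
        int room = maxChars - buf.length();       // never negative
        int keep = Math.min(room, s.length());
        buf.append(s, 0, keep);
        dropped += s.length() - keep;
        return this;
    }

    /** Returns the kept text, plus a marker if anything was truncated. */
    @Override
    public String toString() {
        if (dropped == 0) {
            return buf.toString();
        }
        return buf + "... (" + dropped + " more characters truncated)";
    }

    public static void main(String[] args) {
        // Illustrative plan fragments; a real plan would be far larger.
        BoundedPlanText text = new BoundedPlanText(16);
        text.append("Project [a, b]\n")
            .append("+- Join Inner\n")
            .append("   :- Scan t1\n");
        System.out.println(text);
    }
}
```

The point of the design is that memory use is bounded by maxChars no matter how many fragments are appended, whereas the stack trace above shows the full plan being materialized by a single string interpolation.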



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

