[ 
https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Franck Tago updated SPARK-44759:
--------------------------------
    Description: 
This has been an issue since the WSCG implementation of the Generate node.

Because WSCG computes rows in batches, the combination of WSCG and the explode 
operation consumes a large amount of the dedicated executor memory. This is even 
more pronounced when the WSCG node contains multiple explode nodes, which is the 
case when flattening a nested array.

The Generate node used to flatten an array generally produces significantly more 
output rows than it receives as input.

The number of output rows is drastically higher still when flattening a nested 
array.
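To make the multiplication concrete, here is a hypothetical Python sketch (not Spark's actual generated code) that mimics explode semantics; the row names and counts are illustrative assumptions:

```python
# Hypothetical model of explode: emit one output row per element of row[key].
# Chaining two explodes over a nested array multiplies the row count twice.

def explode(rows, key):
    out = []
    for row in rows:
        for element in row[key]:
            # Copy the other columns, replace the array column with one element.
            out.append({**{k: v for k, v in row.items() if k != key},
                        "col": element})
    return out

# One input row whose nested array holds 3 inner arrays of 4 elements each.
input_rows = [{"id": 1,
               "nested": [[i * 10 + j for j in range(4)] for i in range(3)]}]

outer = explode(input_rows, "nested")  # 1 row  -> 3 rows (outer explode)
inner = explode(outer, "col")          # 3 rows -> 12 rows (inner explode)

print(len(input_rows), len(outer), len(inner))  # 1 3 12
```

With realistic array sizes the output row count is the product of the input row count and both array lengths, which is what makes the nested case so much worse than a single explode.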

When we combine more than one Generate node in the same WholeStageCodeGen node, 
we run a high risk of running out of memory, for multiple reasons.

1- As you can see from the snapshots added in the comments, the rows created in 
the nested loop are saved in a writer buffer. In this case, because the rows 
were big, the job failed with an Out Of Memory error.

2- The generated WholeStageCodeGen code results in a nested loop that, for each 
input row, explodes the parent array and then explodes the inner array. The rows 
are accumulated in the writer buffer without accounting for the row size.
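The control flow described above can be sketched as follows. This is a hypothetical Python simplification of the loop shape only (the real generated code is Java, and `writer_buffer`, `BIG_ELEMENT`, and the array sizes are illustrative assumptions):

```python
# Sketch of the nested-loop shape WSCG produces for two chained explodes.

writer_buffer = []        # stands in for the WSCG writer buffer
BIG_ELEMENT = "x" * 1024  # each leaf element is large (1 KiB here)

# A single input row: an outer array of 100 inner arrays of 100 elements.
input_rows = [{"nested": [[BIG_ELEMENT] * 100 for _ in range(100)]}]

for row in input_rows:                 # one pass over the input batch
    for inner_array in row["nested"]:  # first explode: outer array
        for element in inner_array:    # second explode: inner array
            # Every produced row is buffered; nothing checks the accumulated
            # byte size, so memory grows as rows_in * outer_len * inner_len.
            writer_buffer.append(element)

print(len(writer_buffer))                  # 10000 rows from a single input row
print(sum(len(e) for e in writer_buffer))  # ~10 MB buffered with no limit
```

Because the buffer grows with the product of the array lengths rather than the input row count, large array elements can exhaust executor memory before the batch completes.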

Please view the attached Spark GUI and Spark DAG screenshots.

In my case the WholeStageCodeGen node includes two explode nodes.

Because the array elements are large, we end up with an Out Of Memory error.

 

I recommend that we do not merge multiple explode nodes into the same 
WholeStageCodeGen node. Doing so leads to potential memory issues.

In our case, the job execution failed with an OOM error because the WSCG code 
executed as a nested for loop.

  was:
This has been an issue since the WSCG implementation of the Generate node.

The Generate node used to flatten an array generally produces significantly more 
output rows than it receives as input.

The number of output rows is drastically higher still when flattening a nested 
array.

When we combine more than one Generate node in the same WholeStageCodeGen node, 
we run a high risk of running out of memory, for multiple reasons.

1- As you can see from the attachment, the rows created in the nested loop are 
saved in a writer buffer. In my case, because the rows were big, I hit an Out Of 
Memory error.

2- The generated WholeStageCodeGen code results in a nested loop that, for each 
row, explodes the parent array and then explodes the inner array. This is prone 
to OutOfMemory errors.

 

Please view the attached Spark GUI and Spark DAG screenshots.

In my case the WholeStageCodeGen node includes two explode nodes.

Because the array elements are large, we end up with an Out Of Memory error.

 

I recommend that we do not merge multiple explode nodes into the same 
WholeStageCodeGen node. Doing so leads to potential memory issues.

In our case, the job execution failed with an OOM error because the WSCG code 
executed as a nested for loop.



> Do not combine multiple Generate operators in the same WholeStageCodeGen node 
> because it can  easily cause OOM failures if arrays are relatively large
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-44759
>                 URL: https://issues.apache.org/jira/browse/SPARK-44759
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 
> 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.2, 3.4.0, 3.4.1
>            Reporter: Franck Tago
>            Priority: Major
>         Attachments: image-2023-08-10-09-27-24-124.png, 
> image-2023-08-10-09-29-24-804.png, image-2023-08-10-09-32-46-163.png, 
> image-2023-08-10-09-33-47-788.png, 
> wholestagecodegen_wc1_debug_wholecodegen_passed
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
