[ 
https://issues.apache.org/jira/browse/SPARK-42162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Supun Nakandala updated SPARK-42162:
------------------------------------
    Description: 
With the [recent changes|https://github.com/apache/spark/pull/37851]  in the 
expression canonicalization, a complex query with a large number of Add 
operations ends up consuming 10x more memory on the executors.

The reason for this issue is that with the new changes the canonicalization 
process ends up generating lot of intermediate objects, especially for complex 
queries with a large number of commutative operators. In this specific case, a 
heap histogram analysis shows that a large number of Add objects use the extra 
memory.
This issue does not happen before PR 
[#37851.|https://github.com/apache/spark/pull/37851]

The high memory usage causes the executors to lose heartbeat signals and 
results in task failures.

  was:
With the [recent changes|https://github.com/apache/spark/pull/37851]  in the 
expression canonicalization, a complex query with a large number of Add 
operations ends up consuming 10x more memory on the executors.

A heap histogram analysis shows that a large number of Add objects use the 
extra memory.
Before the PR [#37851|https://github.com/apache/spark/pull/37851], this issue 
does not happen.

The high memory usage causes the executors to lose heartbeat signals and cause 
task failures.


> Memory usage on executors increased drastically for a complex query with 
> large number of addition operations
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-42162
>                 URL: https://issues.apache.org/jira/browse/SPARK-42162
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Supun Nakandala
>            Priority: Major
>
> With the [recent changes|https://github.com/apache/spark/pull/37851]  in the 
> expression canonicalization, a complex query with a large number of Add 
> operations ends up consuming 10x more memory on the executors.
> The reason for this issue is that with the new changes the canonicalization 
> process ends up generating lot of intermediate objects, especially for 
> complex queries with a large number of commutative operators. In this 
> specific case, a heap histogram analysis shows that a large number of Add 
> objects use the extra memory.
> This issue does not happen before PR 
> [#37851.|https://github.com/apache/spark/pull/37851]
> The high memory usage causes the executors to lose heartbeat signals and 
> results in task failures.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to