[ https://issues.apache.org/jira/browse/SPARK-42162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Supun Nakandala updated SPARK-42162: ------------------------------------ Description: With the [recent changes|https://github.com/apache/spark/pull/37851] in the expression canonicalization, a complex query with a large number of Add operations ends up consuming 10x more memory on the executors. The reason for this issue is that with the new changes the canonicalization process ends up generating lot of intermediate objects, especially for complex queries with a large number of commutative operators. In this specific case, a heap histogram analysis shows that a large number of Add objects use the extra memory. This issue does not happen before PR [#37851.|https://github.com/apache/spark/pull/37851] The high memory usage causes the executors to lose heartbeat signals and results in task failures. was: With the [recent changes|https://github.com/apache/spark/pull/37851] in the expression canonicalization, a complex query with a large number of Add operations ends up consuming 10x more memory on the executors. A heap histogram analysis shows that a large number of Add objects use the extra memory. Before the PR [#37851|https://github.com/apache/spark/pull/37851], this issue does not happen. The high memory usage causes the executors to lose heartbeat signals and cause task failures. > Memory usage on executors increased drastically for a complex query with > large number of addition operations > ------------------------------------------------------------------------------------------------------------ > > Key: SPARK-42162 > URL: https://issues.apache.org/jira/browse/SPARK-42162 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.3.0 > Reporter: Supun Nakandala > Priority: Major > > With the [recent changes|https://github.com/apache/spark/pull/37851] in the > expression canonicalization, a complex query with a large number of Add > operations ends up consuming 10x more memory on the executors. > The reason for this issue is that with the new changes the canonicalization > process ends up generating lot of intermediate objects, especially for > complex queries with a large number of commutative operators. In this > specific case, a heap histogram analysis shows that a large number of Add > objects use the extra memory. > This issue does not happen before PR > [#37851.|https://github.com/apache/spark/pull/37851] > The high memory usage causes the executors to lose heartbeat signals and > results in task failures. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org