[ https://issues.apache.org/jira/browse/SPARK-40594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

XiDuo You updated SPARK-40594:
------------------------------
    Description: 
ShuffledHashJoin releases the built hashed relation at the end of the task using
a taskCompletionListener. This is not always good enough for complex SQL queries.
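A minimal Python sketch of the difference between releasing only at task end and releasing eagerly. `TaskContext`, `HashedRelation` and `EagerReleaseIterator` are hypothetical stand-ins, not Spark classes; the pattern is similar in spirit to Spark's internal `CompletionIterator`, but this is not how SPARK-40594 is actually implemented. The join output iterator closes the relation as soon as it is exhausted, while the completion listener stays registered as a fallback for a consumer that stops early (for example under a LIMIT), which is why `close()` has to be idempotent.

```python
# Hypothetical stand-ins for illustration only; these are not Spark APIs.


class TaskContext:
    """Collects callbacks that run once when the task finishes."""

    def __init__(self):
        self._listeners = []

    def add_completion_listener(self, callback):
        self._listeners.append(callback)

    def mark_completed(self):
        for callback in self._listeners:
            callback()


class HashedRelation:
    """Build-side hash table; close() frees its memory exactly once."""

    def __init__(self):
        self.closed = False
        self.close_calls = 0

    def close(self):
        if not self.closed:  # idempotent: later calls are no-ops
            self.closed = True
            self.close_calls += 1


class EagerReleaseIterator:
    """Yields join output rows and closes the relation once they run out."""

    def __init__(self, rows, relation, ctx):
        self._rows = iter(rows)
        self._relation = relation
        # Fallback: still release at task end if the consumer stops early.
        ctx.add_completion_listener(relation.close)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._rows)
        except StopIteration:
            # Eager path: free memory before downstream operators continue.
            self._relation.close()
            raise


ctx = TaskContext()
rel = HashedRelation()
out = list(EagerReleaseIterator(["r1", "r2"], rel, ctx))
# rel.closed is already True here, before the task completes.
ctx.mark_completed()  # the listener fires, but close() is now a no-op
```

Registering the listener as well as closing eagerly keeps the task-end release as a safety net, so the eager path only moves the release earlier and never skips it.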

If a SortMergeJoin (SMJ) or Window sits on top of the ShuffledHashJoin (SHJ),
the hashed relation in the SHJ is effectively leaked. All rows have already
been consumed by the sort before the SMJ or Window, yet the downstream buffer
cannot allocate the memory that is still held by the hashed relation. This
causes unnecessary spills.
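A toy model of why the late release leads to spills. It assumes a single fixed-size per-task memory pool with all-or-nothing allocations, an illustrative capacity of 100 units, a 60-unit hashed relation, and a downstream buffer (the SMJ's buffered rows or the Window's buffer) that needs seven 10-unit rows; the sort's own memory is left out for brevity. `TaskMemoryPool`, `buffer_rows` and `run_task` are hypothetical and are not Spark's memory manager API; the sizes are made up.

```python
# Toy model only: TaskMemoryPool, buffer_rows and run_task are hypothetical,
# not Spark's memory manager API, and the sizes below are made-up units.


class TaskMemoryPool:
    """Fixed-size execution memory shared by all operators in one task."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def try_acquire(self, n):
        # All-or-nothing allocation, to keep the model simple.
        if self.used + n > self.capacity:
            return False
        self.used += n
        return True

    def release(self, n):
        self.used -= n


def buffer_rows(pool, row_sizes):
    """Buffer rows as an SMJ/Window would; spill whenever memory runs out."""
    buffered = 0
    spills = 0
    for size in row_sizes:
        if not pool.try_acquire(size):
            # Write the in-memory buffer to disk and hand its memory back.
            pool.release(buffered)
            buffered = 0
            spills += 1
            if not pool.try_acquire(size):
                raise MemoryError("row does not fit even after spilling")
        buffered += size
    pool.release(buffered)
    return spills


def run_task(eager_release, capacity=100, relation_size=60, rows=(10,) * 7):
    """Return how many spills the downstream buffer incurs in one task."""
    pool = TaskMemoryPool(capacity)
    if not pool.try_acquire(relation_size):  # SHJ builds its hashed relation
        raise MemoryError("hashed relation does not fit")
    # ... the sort below the SMJ/Window consumes every SHJ output row here ...
    if eager_release:
        pool.release(relation_size)  # relation is dead once probing is done
    spills = buffer_rows(pool, rows)
    if not eager_release:
        pool.release(relation_size)  # task-completion listener fires last
    return spills


print(run_task(eager_release=False))  # prints 1
print(run_task(eager_release=True))   # prints 0
```

With the late release only 40 units are free, so the fifth row forces a spill; with the eager release the whole 70-unit buffer fits in memory.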

This is a common case in multi-join queries, since AQE supports converting SMJ
to SHJ at runtime.

  was:
ShuffledHashJoin releases the built hashed relation at the end of the task using
a taskCompletionListener. This is not always good enough for complex SQL queries.

If a SortMergeJoin (SMJ) sits on top of the ShuffledHashJoin (SHJ), the hashed
relation in the SHJ is effectively leaked. All rows have already been consumed
by the sort before the SMJ, and then the SMJ's buffered rows cannot allocate
the memory that is still held by the hashed relation. This causes unnecessary
spills.

This is a common case in multi-join queries, since AQE supports converting SMJ
to SHJ at runtime.


> Eagerly release hashed relation in ShuffledHashJoin
> ---------------------------------------------------
>
>                 Key: SPARK-40594
>                 URL: https://issues.apache.org/jira/browse/SPARK-40594
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: XiDuo You
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
