Gengliang Wang created SPARK-57026:
--------------------------------------
Summary: SortMergeJoinExec and ShuffledHashJoinExec: replace
anonymous TaskCompletionListener with shared JoinHelper methods
Key: SPARK-57026
URL: https://issues.apache.org/jira/browse/SPARK-57026
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 5.0.0
Reporter: Gengliang Wang
Two join operators emit anonymous {{TaskCompletionListener}}s whose bodies are
type-independent:
- {{SortMergeJoinExec.doProduce}} registers a per-stage anonymous inner class
that adds {{matches.spillSize()}} to the {{spillSize}} metric.
- {{ShuffledHashJoinExec.buildSideOrFullOuterJoinNonUniqueKey}} registers a
runtime anonymous closure that adds the {{OpenHashSet[Long]}} memory footprint
(bit-set + data array) to {{buildDataSize}}.
Hoist both into shared static helpers in {{JoinHelper}}:
{code:java}
recordSpillSizeOnTaskCompletion(ExternalAppendOnlyUnsafeRowArray, SQLMetric)
recordOpenHashSetMemoryUsageOnTaskCompletion(OpenHashSet<?>, SQLMetric)
{code}
Each call site shrinks to a single static call. The {{SortMergeJoinExec}} change
also removes one anonymous inner class per whole-stage-codegen stage.
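
As a rough illustration of the shape of such a helper, here is a minimal, self-contained sketch. It is not the proposed patch. {{Metric}}, {{FakeTaskContext}}, {{SpillableArray}} and {{JoinHelperSketch}} are hypothetical stand-ins for {{SQLMetric}}, {{TaskContext}}, {{ExternalAppendOnlyUnsafeRowArray}} and {{JoinHelper}}, so that the example compiles without Spark on the classpath. Passing the task context explicitly is also an assumption; the signatures above take only the array and the metric.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SQLMetric: an accumulating long counter.
final class Metric {
    private long value;
    void add(long v) { value += v; }
    long value() { return value; }
}

// Hypothetical stand-in for TaskContext: stores listeners and runs them
// when the task is marked completed.
final class FakeTaskContext {
    private final List<Runnable> listeners = new ArrayList<>();
    void addTaskCompletionListener(Runnable listener) { listeners.add(listener); }
    void markTaskCompleted() { listeners.forEach(Runnable::run); }
}

// Hypothetical stand-in for ExternalAppendOnlyUnsafeRowArray; only
// spillSize() matters for this sketch.
interface SpillableArray {
    long spillSize();
}

final class JoinHelperSketch {
    private JoinHelperSketch() {}

    // One shared static method replaces the listener that each call site
    // would otherwise declare inline. spillSize() is read when the task
    // completes, not when the listener is registered.
    static void recordSpillSizeOnTaskCompletion(
            FakeTaskContext ctx, SpillableArray matches, Metric spillSize) {
        ctx.addTaskCompletionListener(() -> spillSize.add(matches.spillSize()));
    }
}
```

With a helper of this shape, a call site only needs one line such as {{JoinHelperSketch.recordSpillSizeOnTaskCompletion(ctx, matches, spillSize)}}, and the metric is updated once the task completes.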