Gengliang Wang created SPARK-57026:
--------------------------------------

             Summary: SortMergeJoinExec and ShuffledHashJoinExec: replace 
anonymous TaskCompletionListener with shared JoinHelper methods
                 Key: SPARK-57026
                 URL: https://issues.apache.org/jira/browse/SPARK-57026
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 5.0.0
            Reporter: Gengliang Wang


Two join operators emit anonymous {{TaskCompletionListener}}s whose bodies are 
type-independent, so the same listener logic can be shared across call sites:

- {{SortMergeJoinExec.doProduce}} registers a per-stage anonymous inner class 
that adds {{matches.spillSize()}} to the {{spillSize}} metric.
- {{ShuffledHashJoinExec.buildSideOrFullOuterJoinNonUniqueKey}} registers a 
runtime anonymous closure that adds the {{OpenHashSet[Long]}} memory footprint 
(bit-set + data array) to {{buildDataSize}}.

Hoist both into shared static helpers in {{JoinHelper}}:

{code:java}
recordSpillSizeOnTaskCompletion(ExternalAppendOnlyUnsafeRowArray, SQLMetric)
recordOpenHashSetMemoryUsageOnTaskCompletion(OpenHashSet<?>, SQLMetric)
{code}

Each call site shrinks to a single static method call. The SMJ change also 
removes one anonymous inner class per whole-stage-codegen stage.
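
The proposed helpers could look roughly like the sketch below. It is not the 
actual patch. {{Metric}}, {{TaskContextStub}}, {{SpillableBuffer}} and 
{{HashSetFootprint}} are hypothetical stand-ins for {{SQLMetric}}, 
{{TaskContext}}, {{ExternalAppendOnlyUnsafeRowArray}} and {{OpenHashSet}}, 
used so the example compiles without Spark on the classpath. The sketch also 
passes the task context explicitly, whereas the signatures above take only the 
data structure and the metric, so the real helpers presumably obtain the 
context some other way.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: every nested type is a hypothetical stand-in, not a Spark class.
final class JoinHelperSketch {
    private JoinHelperSketch() {}

    /** Stand-in for SQLMetric: an accumulating long counter. */
    static final class Metric {
        private long value;
        void add(long v) { value += v; }
        long value() { return value; }
    }

    /** Stand-in for TaskContext: stores listeners and runs them at task end. */
    static final class TaskContextStub {
        private final List<Runnable> listeners = new ArrayList<>();
        void addTaskCompletionListener(Runnable listener) { listeners.add(listener); }
        void markTaskCompleted() { listeners.forEach(Runnable::run); }
    }

    /** Stand-in for ExternalAppendOnlyUnsafeRowArray: only spillSize() matters here. */
    interface SpillableBuffer {
        long spillSize();
    }

    /** Stand-in for OpenHashSet<?>: sizes of its bit-set and data array, in bytes. */
    interface HashSetFootprint {
        long bitSetBytes();
        long dataArrayBytes();
    }

    /** Adds the buffer's spill size to the metric when the task completes. */
    static void recordSpillSizeOnTaskCompletion(
            TaskContextStub ctx, SpillableBuffer matches, Metric spillSize) {
        ctx.addTaskCompletionListener(() -> spillSize.add(matches.spillSize()));
    }

    /** Adds the set's memory footprint (bit-set + data array) to the metric at task end. */
    static void recordOpenHashSetMemoryUsageOnTaskCompletion(
            TaskContextStub ctx, HashSetFootprint set, Metric buildDataSize) {
        ctx.addTaskCompletionListener(
            () -> buildDataSize.add(set.bitSetBytes() + set.dataArrayBytes()));
    }
}
```

Because neither listener body depends on the row or key types at the call 
site, one shared method per listener is enough, and the generated code only 
needs to reference the helper by name.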


