gengliangwang opened a new pull request, #56074: URL: https://github.com/apache/spark/pull/56074
### What changes were proposed in this pull request?

This is a sub-task of [SPARK-56908](https://issues.apache.org/jira/browse/SPARK-56908).

Two join operators emit anonymous `TaskCompletionListener`s whose bodies are type-independent:

- `SortMergeJoinExec.doProduce` registers a per-stage anonymous inner class that adds `matches.spillSize()` to the `spillSize` metric.
- `ShuffledHashJoinExec.buildSideOrFullOuterJoinNonUniqueKey` registers a runtime anonymous closure that adds the `OpenHashSet[Long]` memory footprint (bit-set + data array) to `buildDataSize`.

Hoist both into shared static helpers in a new file `sql/core/src/main/java/org/apache/spark/sql/execution/joins/JoinHelper.java`:

```java
recordSpillSizeOnTaskCompletion(ExternalAppendOnlyUnsafeRowArray, SQLMetric)
recordOpenHashSetMemoryUsageOnTaskCompletion(OpenHashSet<?>, SQLMetric)
```

Also remove the now-unused `SortMergeJoinExec.getTaskContext()` whose only caller was the inlined listener.

### Why are the changes needed?

- Smaller generated Java per `SortMergeJoinExec` whole-stage-codegen stage: one anonymous inner class is no longer emitted per stage.
- Centralises the metric-recording listener bodies in one place where the JIT can compile them once instead of once per stage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing test suites cover both paths with whole-stage codegen on and off:

- `OuterJoinSuite` (SMJ full-outer codegen path).
- `InnerJoinSuite` (SMJ codegen path with spill).
- ShuffledHashJoin full-outer non-unique-key path tests in `OuterJoinSuite`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
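For readers unfamiliar with the pattern, the hoisting described in this pull request can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `JoinHelper.java`: `Metric`, `CompletionListener`, `TaskContext`, and `recordSizeOnTaskCompletion` are hypothetical stand-ins that only mimic the shape of Spark's classes, and the context is passed explicitly here, which may differ from how the real helpers obtain it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical stand-ins, NOT Spark's real classes: they only mimic the
// shape of a task context, a completion listener, and a metric.
final class ListenerHoistingSketch {

    /** Stand-in for a metric accumulator (hypothetical). */
    static final class Metric {
        private long value;
        void add(long delta) { value += delta; }
        long value() { return value; }
    }

    /** Stand-in for a task-completion callback (hypothetical). */
    interface CompletionListener {
        void onTaskCompletion();
    }

    /** Stand-in for a task context that runs its listeners at task end. */
    static final class TaskContext {
        private final List<CompletionListener> listeners = new ArrayList<>();
        void addCompletionListener(CompletionListener l) { listeners.add(l); }
        void markCompleted() {
            for (CompletionListener l : listeners) {
                l.onTaskCompletion();
            }
        }
    }

    /**
     * Shared static helper (hypothetical name and signature). Every call site
     * reuses this one method instead of declaring its own anonymous listener
     * class, so the listener body exists once. The size is read when the task
     * completes, not when the listener is registered.
     */
    static void recordSizeOnTaskCompletion(
            TaskContext ctx, LongSupplier size, Metric metric) {
        ctx.addCompletionListener(() -> metric.add(size.getAsLong()));
    }

    public static void main(String[] args) {
        TaskContext ctx = new TaskContext();
        Metric spillSize = new Metric();
        long[] spilledBytes = {0L};

        // Call site: a single static call replaces an inline anonymous class.
        recordSizeOnTaskCompletion(ctx, () -> spilledBytes[0], spillSize);

        spilledBytes[0] = 4096L;   // spilling happens while the task runs
        ctx.markCompleted();       // listeners fire at task completion
        System.out.println(spillSize.value()); // prints 4096
    }
}
```

In generated code, a call site written this way reduces to one static method invocation, which is the effect the description attributes to the shared helpers.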
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
