Gengliang Wang created SPARK-57027:
--------------------------------------
Summary: SortMergeJoinExec: skip statically-dead branches in
codegen
Key: SPARK-57027
URL: https://issues.apache.org/jira/browse/SPARK-57027
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 5.0.0
Reporter: Gengliang Wang
Two statically-dead patterns in {{SortMergeJoinExec}} codegen:
1. {{genComparison}} emits {{comp = 0; if (comp == 0) { comp = compare(k1); }
...}}. The first {{if (comp == 0)}} is always true. Emit {{comp =
compare(k1);}} directly and wrap only the subsequent keys. {{genComparison}} is called
five times per SMJ stage (twice in {{genScanner}}, three times in
{{codegenFullOuter}}). For single-key joins (the common case), each call collapses to
one line.
2. {{genScanner}} and {{codegenFullOuter}} emit {{if (k1IsNull || k2IsNull ||
...) { handler }}}. When all key {{ExprValue}}s have {{isNull ==
FalseLiteral}}, the disjunction is statically false and the whole block
(including its {{handleStreamedAnyNull}} / "join with null row" handler) is
dead. Detect this case and omit the block. This benefits fact/dimension joins on numeric keys
where Spark has already proved non-nullability.
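
The two changes above can be illustrated with a small, self-contained sketch. This is not Spark's actual code: the class {{SmjCodegenSketch}}, its method names, and the use of the plain string "false" to stand in for {{FalseLiteral}} are all assumptions made for illustration. It only shows the shape of the emitted Java source before and after each change.

```java
import java.util.List;

/** Hypothetical stand-in for the codegen helpers; NOT Spark's real implementation. */
class SmjCodegenSketch {

    /** Current shape: every key comparison is guarded, including the first. */
    public static String genComparisonOld(List<String> compareExprs) {
        StringBuilder sb = new StringBuilder("comp = 0;\n");
        for (String expr : compareExprs) {
            sb.append("if (comp == 0) { comp = ").append(expr).append("; }\n");
        }
        return sb.toString();
    }

    /** Proposed shape: assign the first key directly, guard only later keys. */
    public static String genComparisonNew(List<String> compareExprs) {
        if (compareExprs.isEmpty()) {
            return "comp = 0;\n"; // no keys: nothing to compare
        }
        StringBuilder sb = new StringBuilder();
        sb.append("comp = ").append(compareExprs.get(0)).append(";\n");
        for (String expr : compareExprs.subList(1, compareExprs.size())) {
            sb.append("if (comp == 0) { comp = ").append(expr).append("; }\n");
        }
        return sb.toString();
    }

    /**
     * Any-null check: when every key's isNull expression is the literal
     * "false" (assumed stand-in for FalseLiteral), the disjunction can never
     * be true, so neither the condition nor the handler is emitted.
     */
    public static String genAnyNullCheck(List<String> isNullExprs, String handler) {
        boolean allNonNull = isNullExprs.stream().allMatch("false"::equals);
        if (allNonNull) {
            return ""; // statically dead: omit the whole block
        }
        return "if (" + String.join(" || ", isNullExprs) + ") { " + handler + " }\n";
    }
}
```

In this sketch, a single-key call such as {{genComparisonNew(List.of("compare(k1)"))}} yields one line of generated source, and {{genAnyNullCheck}} returns an empty string when all keys are known non-nullable. A key list with at least one nullable key still gets the full, unchanged disjunction.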
Behavior is preserved. The JIT already eliminates this dead code at runtime, so the win is
smaller generated source, more headroom under the 64KB method limit, and a slightly faster
Janino compile.