Gengliang Wang created SPARK-57027:
--------------------------------------

             Summary: SortMergeJoinExec: skip statically-dead branches in codegen
                 Key: SPARK-57027
                 URL: https://issues.apache.org/jira/browse/SPARK-57027
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 5.0.0
            Reporter: Gengliang Wang


Two statically-dead patterns in {{SortMergeJoinExec}} codegen:

1. {{genComparison}} emits {{comp = 0; if (comp == 0) { comp = compare(k1); }
...}}. Because {{comp}} has just been assigned 0, the first {{if (comp == 0)}}
is always true. Emit {{comp = compare(k1);}} directly and wrap only the
subsequent keys. {{genComparison}} is called five times per SMJ stage (twice in
{{genScanner}}, three times in {{codegenFullOuter}}). For single-key joins, the
common case, each call collapses to one line.

2. {{genScanner}} and {{codegenFullOuter}} emit {{if (k1IsNull || k2IsNull ||
...) { handler }}}. When every key {{ExprValue}} has {{isNull ==
FalseLiteral}}, the disjunction is statically false and the whole block
(including its {{handleStreamedAnyNull}} / "join with null row" handler) is
dead. Detect this and omit the block. This applies to fact/dimension joins on
numeric keys where Spark has already proved non-nullability.
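
A rough sketch of the first pattern, written as a tiny string-emitting generator
in plain Java. This is not Spark's {{genComparison}}: the class
{{ComparisonGen}}, both method names, and the exact emitted text are
hypothetical and only mirror the shapes quoted in item 1.

```java
import java.util.List;

// Hypothetical illustration only; Spark's real generator lives in SortMergeJoinExec.
final class ComparisonGen {
    // Current shape: set comp to 0, then guard every key, including the first.
    static String genGuarded(List<String> keys) {
        StringBuilder sb = new StringBuilder("comp = 0;\n");
        for (String k : keys) {
            sb.append("if (comp == 0) { comp = compare(").append(k).append("); }\n");
        }
        return sb.toString();
    }

    // Proposed shape: assign the first comparison directly, guard only later keys.
    static String genDirect(List<String> keys) {
        if (keys.isEmpty()) {
            // Assumption: a join always has at least one key; kept for safety.
            return "comp = 0;\n";
        }
        StringBuilder sb = new StringBuilder();
        sb.append("comp = compare(").append(keys.get(0)).append(");\n");
        for (String k : keys.subList(1, keys.size())) {
            sb.append("if (comp == 0) { comp = compare(").append(k).append("); }\n");
        }
        return sb.toString();
    }
}
```

With a single key, {{genDirect}} emits one line where {{genGuarded}} emits two.
With more keys the output is one line shorter and the remaining guards are
unchanged.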

Behavior is preserved. The JIT already eliminates the dead code at runtime, so
the win is smaller generated source, more headroom under the 64KB method-size
limit, and slightly faster Janino compilation.
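
The null-check elision from item 2 might look roughly like the following, again
as a plain-Java sketch rather than Spark's code. {{NullCheckGen}},
{{genAnyNullCheck}}, and the use of the string "false" as a stand-in for
{{FalseLiteral}} are all assumptions made for illustration.

```java
import java.util.List;

// Hypothetical illustration only; names and emitted text are not Spark's.
final class NullCheckGen {
    // Stand-in for FalseLiteral: the isNull term of a key proved non-nullable.
    static final String FALSE_LITERAL = "false";

    // Emits the "any key is null" guard, or nothing when it is statically false.
    static String genAnyNullCheck(List<String> isNullTerms, String handler) {
        boolean allNonNull = isNullTerms.stream().allMatch(FALSE_LITERAL::equals);
        if (allNonNull) {
            // Every term is false, so the disjunction and its handler are dead.
            return "";
        }
        return "if (" + String.join(" || ", isNullTerms) + ") { " + handler + " }\n";
    }
}
```

The sketch omits the block only when every term is statically false, as
described in item 2. If any key may be null, the guard is emitted unchanged.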



