[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

rednaxelafx Wed, 24 Jan 2018 16:33:12 -0800

Github user rednaxelafx commented on the issue:

    https://github.com/apache/spark/pull/20224
  
    Updated the PR:
    
    1. Addressed @cloud-fan's comment to make sure the `codegenStageId` is 
properly copied in transformations after `CollapseCodegenStages`. Added a new 
unit test case for it.
    
    The test case triggers `ReuseExchange`, which is a rule that runs after 
`CollapseCodegenStages`.
    Before this update, the explain output for the test query was:
    ```
    == Physical Plan ==
    *(0) Project [id#7L]
    +- *(0) SortMergeJoin [id#7L], [id#10L], Inner
       :- *(2) Sort [id#7L ASC NULLS FIRST], false, 0
       :  +- Exchange hashpartitioning(id#7L, 200)
       :     +- *(1) Range (0, 100, step=1, splits=8)
       +- *(0) Sort [id#10L ASC NULLS FIRST], false, 0
          +- ReusedExchange [id#10L], Exchange hashpartitioning(id#7L, 200)
    ```
    Note that the `*(0)` prefixes indicate that the `codegenStageId`s were not 
properly copied when the nodes were rebuilt. After this update, the output is:
    ```
    == Physical Plan ==
    *(5) Project [id#0L]
    +- *(5) SortMergeJoin [id#0L], [id#3L], Inner
       :- *(2) Sort [id#0L ASC NULLS FIRST], false, 0
       :  +- Exchange hashpartitioning(id#0L, 200)
       :     +- *(1) Range (0, 100, step=1, splits=8)
       +- *(4) Sort [id#3L ASC NULLS FIRST], false, 0
          +- ReusedExchange [id#3L], Exchange hashpartitioning(id#0L, 200)
    ```
    
    2. Flipped the default value of the new conf option 
"spark.sql.codegen.wholeStage.useIdInClassName" to true.
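
For readers who want the gist without reading the diff, here is a rough stand-alone sketch of both points. It is a toy model in plain Python, not Spark code: `Node`, `transform_up`, `reuse_exchange`, the two copy helpers and `generated_class_name` are all hypothetical names, and the class-name pattern is only an assumption for illustration. It shows how a later rule that replaces a child forces the parent to be rebuilt, and how the stage ID falls back to 0 unless the copy carries it along.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Toy stand-in for a physical plan node (hypothetical, not Spark's API)."""
    name: str
    child: Optional["Node"] = None
    stage_id: int = 0  # 0 plays the role of "no ID assigned"


def copy_dropping_id(node, new_child):
    # Rebuilds the node from its main arguments only, so stage_id resets to 0.
    return Node(node.name, new_child)


def copy_keeping_id(node, new_child):
    # Carries every field, including stage_id, over to the rebuilt node.
    return replace(node, child=new_child)


def transform_up(node, rule, copy):
    """Apply `rule` bottom-up, rebuilding a parent whenever its child changed."""
    if node.child is not None:
        new_child = transform_up(node.child, rule, copy)
        if new_child is not node.child:
            node = copy(node, new_child)
    return rule(node)


def reuse_exchange(node):
    # Toy version of a rule that runs after stage IDs were assigned.
    if node.name == "Exchange":
        return Node("ReusedExchange")
    return node


def generated_class_name(stage_id, use_id_in_class_name=True):
    # Assumed naming pattern, for illustration only.
    if use_id_in_class_name:
        return f"GeneratedIteratorForCodegenStage{stage_id}"
    return "GeneratedIterator"


plan = Node("Sort", Node("Exchange", Node("Range", stage_id=1)), stage_id=4)
before = transform_up(plan, reuse_exchange, copy_dropping_id)
after = transform_up(plan, reuse_exchange, copy_keeping_id)
print(before.stage_id, after.stage_id)  # prints: 0 4
```

In this toy model the `Sort` node ends up with ID 0 when the copy drops the field and keeps ID 4 when the copy preserves it, which mirrors the `*(0) Sort` versus `*(4) Sort` lines in the two explain outputs above.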



