Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20224

Updated the PR:

1. Addressed @cloud-fan's comment to make sure the `codegenStageId` is properly copied in transformations that run after `CollapseCodegenStages`. Added a new unit test case for it. The test case triggers `ReuseExchange`, which is a rule that runs after `CollapseCodegenStages`.

Before this update, the explain output for the test query was:
```
== Physical Plan ==
*(0) Project [id#7L]
+- *(0) SortMergeJoin [id#7L], [id#10L], Inner
   :- *(2) Sort [id#7L ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(id#7L, 200)
   :     +- *(1) Range (0, 100, step=1, splits=8)
   +- *(0) Sort [id#10L ASC NULLS FIRST], false, 0
      +- ReusedExchange [id#10L], Exchange hashpartitioning(id#7L, 200)
```
Note that the `*(0)`s indicate that the `codegenStageId`s were not properly copied.

After this update, it is now:
```
== Physical Plan ==
*(5) Project [id#0L]
+- *(5) SortMergeJoin [id#0L], [id#3L], Inner
   :- *(2) Sort [id#0L ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(id#0L, 200)
   :     +- *(1) Range (0, 100, step=1, splits=8)
   +- *(4) Sort [id#3L ASC NULLS FIRST], false, 0
      +- ReusedExchange [id#3L], Exchange hashpartitioning(id#0L, 200)
```

2. Flipped the default value of the new conf option `spark.sql.codegen.wholeStage.useIdInClassName` to `true`.
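For readers who want to spot the same symptom in their own explain output, here is a minimal sketch that pulls the `*(N)` stage IDs out of plan text and flags any `*(0)`. It is not part of the PR: the helper names `codegen_stage_ids` and `has_uncopied_stage_id` are hypothetical, and it assumes, as the note above implies, that an ID of 0 means the `codegenStageId` was not carried over.

```python
import re

# Matches the "*(N)" prefix shown on operators that belong to codegen stage N.
STAGE_ID = re.compile(r"\*\((\d+)\)")


def codegen_stage_ids(explain_text):
    """Return the codegen stage IDs in the order they appear in the plan text."""
    return [int(n) for n in STAGE_ID.findall(explain_text)]


def has_uncopied_stage_id(explain_text):
    """True if any operator shows *(0) (assumed here to mean 'ID not copied')."""
    return 0 in codegen_stage_ids(explain_text)


# Plan text copied from the two explain outputs quoted in the comment.
before = """*(0) Project [id#7L]
+- *(0) SortMergeJoin [id#7L], [id#10L], Inner
   :- *(2) Sort [id#7L ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(id#7L, 200)
   :     +- *(1) Range (0, 100, step=1, splits=8)
   +- *(0) Sort [id#10L ASC NULLS FIRST], false, 0
      +- ReusedExchange [id#10L], Exchange hashpartitioning(id#7L, 200)"""

after = """*(5) Project [id#0L]
+- *(5) SortMergeJoin [id#0L], [id#3L], Inner
   :- *(2) Sort [id#0L ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(id#0L, 200)
   :     +- *(1) Range (0, 100, step=1, splits=8)
   +- *(4) Sort [id#3L ASC NULLS FIRST], false, 0
      +- ReusedExchange [id#3L], Exchange hashpartitioning(id#0L, 200)"""

print(codegen_stage_ids(before))      # [0, 0, 2, 1, 0]
print(codegen_stage_ids(after))       # [5, 5, 2, 1, 4]
print(has_uncopied_stage_id(before))  # True
print(has_uncopied_stage_id(after))   # False
```

The check is purely textual, so it only applies where the explain output uses the `*(N)` prefix shown above.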