Github user rednaxelafx commented on the issue:

    https://github.com/apache/spark/pull/20224

BTW, inspired by @cloud-fan 's comment, here's an example of the codegen stage IDs when scalar subqueries are involved:

```scala
val sub = "(select sum(id) from range(5))"
val df = spark.sql(s"select $sub as a, $sub as b")
df.explain(true)
```

would give:

```
== Parsed Logical Plan ==
'Project [scalar-subquery#0 [] AS a#1, scalar-subquery#2 [] AS b#3]
:  :- 'Project [unresolvedalias('sum('id), None)]
:  :  +- 'UnresolvedTableValuedFunction range, [5]
:  +- 'Project [unresolvedalias('sum('id), None)]
:     +- 'UnresolvedTableValuedFunction range, [5]
+- OneRowRelation

== Analyzed Logical Plan ==
a: bigint, b: bigint
Project [scalar-subquery#0 [] AS a#1L, scalar-subquery#2 [] AS b#3L]
:  :- Aggregate [sum(id#14L) AS sum(id)#16L]
:  :  +- Range (0, 5, step=1, splits=None)
:  +- Aggregate [sum(id#17L) AS sum(id)#19L]
:     +- Range (0, 5, step=1, splits=None)
+- OneRowRelation

== Optimized Logical Plan ==
Project [scalar-subquery#0 [] AS a#1L, scalar-subquery#2 [] AS b#3L]
:  :- Aggregate [sum(id#14L) AS sum(id)#16L]
:  :  +- Range (0, 5, step=1, splits=None)
:  +- Aggregate [sum(id#17L) AS sum(id)#19L]
:     +- Range (0, 5, step=1, splits=None)
+- OneRowRelation

== Physical Plan ==
*(1) Project [Subquery subquery0 AS a#1L, Subquery subquery2 AS b#3L]
:  :- Subquery subquery0
:  :  +- *(2) HashAggregate(keys=[], functions=[sum(id#14L)], output=[sum(id)#16L])
:  :     +- Exchange SinglePartition
:  :        +- *(1) HashAggregate(keys=[], functions=[partial_sum(id#14L)], output=[sum#21L])
:  :           +- *(1) Range (0, 5, step=1, splits=8)
:  +- Subquery subquery2
:     +- *(2) HashAggregate(keys=[], functions=[sum(id#17L)], output=[sum(id)#19L])
:        +- Exchange SinglePartition
:           +- *(1) HashAggregate(keys=[], functions=[partial_sum(id#17L)], output=[sum#23L])
:              +- *(1) Range (0, 5, step=1, splits=8)
+- Scan OneRowRelation[]
```

The reason the IDs look a bit "odd" (there are three separate codegen stages all with ID 1) is that the main "spine" query and each individual subquery are "planned" separately, so each of them runs `CollapseCodegenStages` on its own and counts up from 1 afresh. I would consider this behavior acceptable, but I wonder what others think in this case. If this behavior for subqueries is not acceptable, I'll have to find alternative places to put the initialization and reset of the thread-local ID counter.
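To make the "counting up from 1 afresh" behavior concrete, here is a minimal standalone sketch of a per-thread stage-ID counter that is reset at the start of each planning pass. The names (`CodegenStageCounter`, `reset`, `next`) are hypothetical illustrations, not Spark's actual API; the point is only that a reset per plan traversal gives each plan (spine query and each subquery) its own 1-based ID sequence.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch: a thread-local counter reset once per planning pass.
// Because the spine query and each subquery are planned separately, each
// pass would call reset() and then hand out IDs 1, 2, 3, ... independently.
object CodegenStageCounter {
  private val counter = new ThreadLocal[AtomicInteger] {
    override def initialValue(): AtomicInteger = new AtomicInteger(0)
  }

  /** Called at the start of planning one query (spine or subquery). */
  def reset(): Unit = counter.get().set(0)

  /** Assign the next codegen stage ID within the current plan. */
  def next(): Int = counter.get().incrementAndGet()
}
```

Resetting between plans is exactly why the physical plan above shows three distinct stages all labeled `*(1)`: each belongs to a separately planned (sub)query.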