Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/12719#discussion_r62420393

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -617,6 +618,77 @@ object NullPropagation extends Rule[LogicalPlan] {
 }

 /**
+ * Propagate foldable expressions:
+ * Replace all attributes with aliases of the original foldable expressions except the followings.
+ * 1) Command and Set(UNION/INTERSECT/EXCEPT): Do not optimize.
--- End diff --

Thank you, @cloud-fan .

* For set queries, the outer query uses **the same AttributeReference** as one of its subqueries. In the plan below, `a#0` is both the output of the first `Project` and the attribute used above the `Union`, while the second branch produces `a#1`. Replacing `a#0` with its literal above the `Union` would therefore, in theory, give incorrect results for `FoldablePropagation`. We must prevent this.

```
scala> sql("select 1 a union select 2 a").explain
== Physical Plan ==
WholeStageCodegen
:  +- TungstenAggregate(key=[a#0], functions=[], output=[a#0])
:     +- INPUT
+- Exchange hashpartitioning(a#0, 200), None
   +- WholeStageCodegen
      :  +- TungstenAggregate(key=[a#0], functions=[], output=[a#0])
      :     +- INPUT
      +- Union
         :- WholeStageCodegen
         :  :  +- Project [1 AS a#0]
         :  :     +- INPUT
         :  +- Scan OneRowRelation[]
         +- WholeStageCodegen
            :  +- Project [2 AS a#1]
            :     +- INPUT
            +- Scan OneRowRelation[]
```

* For command queries, it seems that some of them (CTAS) raise exceptions when they receive non-AttributeReference column outputs (here, aliased literals). I would like to investigate that as a separate issue, since it may need to touch other modules.
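
To make the set-query concern concrete, here is a minimal toy model in plain Scala. It does not use Catalyst: `Lit`, `Attr`, `Project`, `Union`, `Query`, and the `crossUnion` flag are all hypothetical stand-ins invented for illustration, and the union is treated as `UNION ALL` (no de-duplication) to keep it short. It only sketches why propagating an aliased literal past a `Union` can change the result; it is not the rule implemented in this PR.

```scala
object FoldablePropagationSketch {
  // Toy stand-ins for expressions; these are NOT Spark classes.
  sealed trait Expr
  final case class Lit(value: Int) extends Expr
  final case class Attr(name: String, id: Int) extends Expr

  sealed trait Plan { def output: Seq[Attr] }
  case object OneRow extends Plan { val output: Seq[Attr] = Nil }
  // Each (expr, attr) pair models `expr AS attr`.
  final case class Project(aliases: Seq[(Expr, Attr)], child: Plan) extends Plan {
    def output: Seq[Attr] = aliases.map(_._2)
  }
  // As in the plan above, the union exposes the attributes of its first child.
  final case class Union(left: Plan, right: Plan) extends Plan {
    def output: Seq[Attr] = left.output
  }
  // Outer query: a list of expressions selected from a plan.
  final case class Query(select: Seq[Expr], from: Plan)

  type Row = Seq[Int]

  private def evalExpr(e: Expr, schema: Seq[Attr], row: Row): Int = e match {
    case Lit(v)  => v
    case a: Attr => row(schema.indexWhere(_.id == a.id))
  }

  def eval(plan: Plan): Seq[Row] = plan match {
    case OneRow => Seq(Seq.empty)
    case Project(aliases, child) =>
      eval(child).map(r => aliases.map { case (e, _) => evalExpr(e, child.output, r) })
    case Union(l, r) => eval(l) ++ eval(r) // rows are combined positionally
  }

  def run(q: Query): Seq[Row] =
    eval(q.from).map(r => q.select.map(evalExpr(_, q.from.output, r)))

  // Collect `literal AS attr` aliases; optionally refuse to look below a Union.
  def foldableAliases(plan: Plan, crossUnion: Boolean): Map[Int, Expr] = plan match {
    case OneRow => Map.empty
    case Project(aliases, child) =>
      foldableAliases(child, crossUnion) ++ aliases.collect { case (l: Lit, a) => a.id -> l }
    case Union(l, r) =>
      if (crossUnion) foldableAliases(l, crossUnion) ++ foldableAliases(r, crossUnion)
      else Map.empty
  }

  // Replace attributes in the outer query with the literals they alias.
  def propagate(q: Query, crossUnion: Boolean): Query = {
    val literals = foldableAliases(q.from, crossUnion)
    q.copy(select = q.select.map {
      case a: Attr => literals.getOrElse(a.id, a)
      case other   => other
    })
  }

  def main(args: Array[String]): Unit = {
    val a0 = Attr("a", 0)
    val a1 = Attr("a", 1)
    val union = Union(Project(Seq(Lit(1) -> a0), OneRow), Project(Seq(Lit(2) -> a1), OneRow))
    val query = Query(Seq(a0), union) // the outer query refers to a#0, like the plan above

    assert(run(query) == Seq(Seq(1), Seq(2)))
    // Propagating through the Union rewrites a#0 to 1 and loses the row from the second branch.
    assert(run(propagate(query, crossUnion = false)) == Seq(Seq(1), Seq(2)))
    assert(run(propagate(query, crossUnion = true)) == Seq(Seq(1), Seq(1)))
  }
}
```

In this sketch, stopping at the `Union` leaves the query unchanged, which mirrors the "do not optimize" choice for set operations in the rule's doc comment.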