[ https://issues.apache.org/jira/browse/SPARK-38531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562495#comment-17562495 ]
Hyukjin Kwon commented on SPARK-38531:
--------------------------------------

Reverted at https://github.com/apache/spark/commit/161c596cafea9c235b5c918d8999c085401d73a9 and https://github.com/apache/spark/commit/4512e0943036d30587ab19a95efb0e66b47dd746

> "Prune unrequired child index" branch of ColumnPruning has wrong condition
> --------------------------------------------------------------------------
>
>                 Key: SPARK-38531
>                 URL: https://issues.apache.org/jira/browse/SPARK-38531
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 3.2.1
>            Reporter: Min Yang
>            Assignee: Min Yang
>            Priority: Minor
>             Fix For: 3.3.0
>
>
> The "prune unrequired references" branch has the condition:
> {code:java}
> case p @ Project(_, g: Generate) if p.references != g.outputSet => {code}
> This is wrong as generators like Inline will always enter this branch as long
> as it does not use all the generator output.
>
> Example:
>
> input: <col1: array<struct<a: struct<a: int, b: int>, b: int>>>
>
> Project(a.a as x)
> - Generate(Inline(col1), ..., a, b)
>
> p.references is [a]
> g.outputSet is [a, b]
>
> This bug makes us never enter the GeneratorNestedColumnAliasing branch below
> thus miss some optimization opportunities. The condition should be
> {code:java}
> g.requiredChildOutput.contains(!p.references.contains(_)) {code}
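
For readers following the quoted description, a minimal plain-Scala sketch of its example may help. It models attribute sets as Set[String] instead of using Catalyst classes, so the object name and all values are illustrative assumptions, not Spark code. It also reads the proposed contains(...) check as an exists-style test ("some required child output is not referenced by the Project"), which is one interpretation of the report and not necessarily the committed change.

```scala
// Plain-Scala model of the example in the report; no Spark classes are used.
// Attribute sets are modelled as Set[String]; every name here is illustrative.
object PruneConditionSketch {
  // Generate(Inline(col1), ..., a, b): the generator produces attributes a and b.
  val generatorOutput: Seq[String] = Seq("a", "b")

  // Assumption: no child column is passed through Generate, which matches the
  // report's statement that g.outputSet is [a, b].
  val requiredChildOutput: Seq[String] = Seq.empty
  val outputSet: Set[String] = (requiredChildOutput ++ generatorOutput).toSet

  // Project(a.a as x) references only attribute a.
  val projectReferences: Set[String] = Set("a")

  // Condition quoted in the report: any difference between the sets triggers it.
  val currentCondition: Boolean = projectReferences != outputSet

  // Proposed condition, read as "some required child output is unreferenced".
  val proposedCondition: Boolean =
    requiredChildOutput.exists(attr => !projectReferences.contains(attr))

  def main(args: Array[String]): Unit = {
    println(s"current condition fires:  $currentCondition")   // true
    println(s"proposed condition fires: $proposedCondition")  // false
  }
}
```

Under these assumptions the quoted condition fires only because the Project does not use generator output b, although there is no child column left to prune. The exists-style reading does not fire, so a later branch would get a chance to match.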