[ https://issues.apache.org/jira/browse/SPARK-38531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-38531:
------------------------------------

    Assignee: Apache Spark

> "Prune unrequired child index" branch of ColumnPruning has wrong condition
> --------------------------------------------------------------------------
>
>                 Key: SPARK-38531
>                 URL: https://issues.apache.org/jira/browse/SPARK-38531
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 3.2.1, 3.4.0, 3.3.1
>            Reporter: Min Yang
>            Assignee: Apache Spark
>            Priority: Minor
>
> The "prune unrequired references" branch has the condition:
> {code:java}
> case p @ Project(_, g: Generate) if p.references != g.outputSet => {code}
> This is wrong: a generator such as Inline always enters this branch whenever
> the project does not use all of the generator output.
>
> Example:
>
> input: <col1: array<struct<a: struct<a: int, b: int>, b: int>>>
>
> Project(a.a as x)
> - Generate(Inline(col1), ..., a, b)
>
> p.references is [a]
> g.outputSet is [a, b]
>
> Because of this bug we never enter the GeneratorNestedColumnAliasing branch
> below and therefore miss some optimization opportunities. The condition should be
> {code:java}
> g.requiredChildOutput.exists(!p.references.contains(_)) {code}
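
The difference between the two conditions can be checked on the example from the report with a small standalone sketch. It does not use Spark: `ToyGenerate`, `ToyProject`, and the `col2` column are hypothetical stand-ins, attributes are plain strings, and the proposed condition is read as "some required child output is not referenced by the project", written with `exists`. It also assumes that `requiredChildOutput` is empty in the reported example, which is what `g.outputSet` being `[a, b]` suggests.

```scala
object PruneConditionSketch {
  // Hypothetical stand-in for a Generate node: its output is the child columns
  // it keeps (requiredChildOutput) plus the columns the generator produces.
  final case class ToyGenerate(requiredChildOutput: Seq[String], generatorOutput: Seq[String]) {
    def outputSet: Set[String] = (requiredChildOutput ++ generatorOutput).toSet
  }

  // Hypothetical stand-in for a Project node sitting on top of a Generate.
  final case class ToyProject(references: Set[String], child: ToyGenerate)

  // Condition quoted in the report: true whenever the project does not
  // reference exactly the generate node's output.
  def currentCondition(p: ToyProject): Boolean =
    p.references != p.child.outputSet

  // Proposed condition, read as: some kept child column is not referenced.
  def proposedCondition(p: ToyProject): Boolean =
    p.child.requiredChildOutput.exists(attr => !p.references.contains(attr))

  // Example from the report: Project(a.a as x) over Generate(Inline(col1), ..., a, b).
  // Assumption: no child column is kept, so outputSet is just [a, b].
  val inlineCase: ToyProject =
    ToyProject(Set("a"), ToyGenerate(Seq.empty, Seq("a", "b")))

  // Hypothetical contrast case: an unused child column col2 is still kept.
  val prunableCase: ToyProject =
    ToyProject(Set("a"), ToyGenerate(Seq("col2"), Seq("a", "b")))

  def main(args: Array[String]): Unit = {
    println(s"inline:   current=${currentCondition(inlineCase)} proposed=${proposedCondition(inlineCase)}")
    println(s"prunable: current=${currentCondition(prunableCase)} proposed=${proposedCondition(prunableCase)}")
  }
}
```

Under these assumptions the current condition holds in both cases, while the proposed one holds only when there is a kept child column left to prune, so the Inline example would fall through to the later branch.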