[ https://issues.apache.org/jira/browse/SPARK-42704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-42704:
------------------------------------

    Assignee: Apache Spark

> SubqueryAlias should propagate metadata columns its child already selects
> --------------------------------------------------------------------------
>
>                 Key: SPARK-42704
>                 URL: https://issues.apache.org/jira/browse/SPARK-42704
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.3.2, 3.4.0
>            Reporter: Ryan Johnson
>            Assignee: Apache Spark
>            Priority: Major
>
> The `AddMetadataColumns` analyzer rule intends to make available metadata
> columns resolvable even when the plan already contains projections that did
> not explicitly mention the metadata column.
> The `SubqueryAlias` plan node intentionally does not propagate metadata
> columns automatically from a non-leaf/non-subquery child node, because the
> following should _not_ work:
>
> {code:java}
> spark.read.table("t").select("a", "b").as("s").select("_metadata"){code}
> However, today it is too strict and breaks the metadata chain when the
> child node's output already includes the metadata column:
>
> {code:java}
> // expected to work (and does)
> spark.read.table("t")
>   .select("a", "b").select("_metadata")
> // by extension, should also work (but does not)
> spark.read.table("t").select("a", "b", "_metadata").as("s")
>   .select("a", "b").select("_metadata"){code}
> The solution is for `SubqueryAlias` to always propagate metadata columns that
> are already in the child's output, thus preserving the `metadataOutput` chain
> for that column.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
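The proposed rule can be modeled outside Spark with a minimal sketch. The classes below (`Relation`, `Project`, `SubqueryAlias`, `canResolve`) are hypothetical stand-ins, not Spark's actual `LogicalPlan` API; they only illustrate the propagation logic described in the issue: an alias drops the child's hidden metadata chain, except for metadata columns the child's output already contains.

```scala
// Hypothetical, simplified model of metadata-column resolution
// (NOT Spark's real LogicalPlan classes) illustrating the proposed rule.
sealed trait Plan {
  def output: Seq[String]          // columns the node produces
  def metadataOutput: Seq[String]  // metadata columns still resolvable here
}

// Leaf relation: exposes data columns plus a hidden _metadata column.
case class Relation(cols: Seq[String]) extends Plan {
  val output: Seq[String] = cols
  val metadataOutput: Seq[String] = Seq("_metadata")
}

// Projection: metadata stays resolvable through it, which is what the
// AddMetadataColumns rule relies on.
case class Project(selected: Seq[String], child: Plan) extends Plan {
  val output: Seq[String] = selected
  val metadataOutput: Seq[String] = child.metadataOutput
}

// Alias with the proposed behavior: do NOT forward the child's hidden
// metadataOutput wholesale, but DO keep metadata columns that the child's
// output already selects.
case class SubqueryAlias(alias: String, child: Plan) extends Plan {
  val output: Seq[String] = child.output
  val metadataOutput: Seq[String] =
    child.metadataOutput.filter(c => child.output.contains(c))
}

def canResolve(plan: Plan, col: String): Boolean =
  plan.output.contains(col) || plan.metadataOutput.contains(col)
```

Under this model, `select("a", "b").as("s")` hides `_metadata` (the alias's child never selected it), while `select("a", "b", "_metadata").as("s")` keeps it resolvable through later projections, matching the expected behavior in the issue description.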