Mark Ressler created SPARK-34805:
------------------------------------

             Summary: PySpark loses metadata in DataFrame fields when selecting nested columns
                 Key: SPARK-34805
                 URL: https://issues.apache.org/jira/browse/SPARK-34805
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.1.1, 3.0.1
            Reporter: Mark Ressler

For a DataFrame whose schema contains nested StructTypes with metadata set on the nested fields, that metadata is lost when the nested fields are selected.

For example, suppose
{code:java}
df.schema.fields[0].dataType.fields[0].metadata
{code}
returns a non-empty dictionary. Then
{code:java}
df.select('Field0.SubField0').schema.fields[0].metadata
{code}
returns an empty dictionary, where "Field0" is the name of the first field in the DataFrame and "SubField0" is the name of the first nested field under "Field0".
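
The report can be turned into a small reproduction along the following lines. This is only a sketch under stated assumptions: the function names `metadata_at` and `reproduce`, the metadata key `comment`, and the sample row are hypothetical and not part of the original report. The helper reads a nested field's metadata from the dict form of the original schema (as returned by `df.schema.jsonValue()`), so it can be compared with what the schema reports after `select`.

```python
def metadata_at(schema_json, dotted_path):
    """Return the metadata dict of the field named by dotted_path.

    schema_json is a schema in the dict form produced by
    StructType.jsonValue(); dotted_path looks like "Field0.SubField0".
    """
    node = schema_json
    field = None
    for name in dotted_path.split("."):
        # Each struct level has the shape {"type": "struct", "fields": [...]}.
        field = next(f for f in node["fields"] if f["name"] == name)
        node = field["type"]
    return field.get("metadata", {})


def reproduce():
    """Build a nested schema with metadata and select the nested field."""
    # Imported lazily so that metadata_at can be used without a Spark install.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType, StructField, StructType

    spark = SparkSession.builder.master("local[1]").getOrCreate()

    # Hypothetical metadata attached to the nested field.
    sub = StructField("SubField0", StringType(), True, metadata={"comment": "example"})
    schema = StructType([StructField("Field0", StructType([sub]), True)])
    df = spark.createDataFrame([(("a",),)], schema)

    # Metadata as recorded in the original schema.
    before = metadata_at(df.schema.jsonValue(), "Field0.SubField0")
    # Metadata reported by the schema after selecting the nested column.
    after = df.select("Field0.SubField0").schema.fields[0].metadata

    spark.stop()
    return before, after
```

According to the report, calling `reproduce()` on the affected versions (3.0.1, 3.1.1) should give a non-empty dictionary for the first value and an empty one for the second; other versions may behave differently.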