[ https://issues.apache.org/jira/browse/SPARK-34805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Ressler updated SPARK-34805:
---------------------------------
    Attachment: jsonMetadataTest.py

> PySpark loses metadata in DataFrame fields when selecting nested columns
> ------------------------------------------------------------------------
>
>                 Key: SPARK-34805
>                 URL: https://issues.apache.org/jira/browse/SPARK-34805
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.0.1, 3.1.1
>            Reporter: Mark Ressler
>            Priority: Major
>         Attachments: jsonMetadataTest.py
>
>
> For a DataFrame schema with nested StructTypes, where metadata is set for
> fields in the schema, that metadata is lost when a DataFrame selects nested
> fields. For example, suppose
> {code:java}
> df.schema.fields[0].dataType.fields[0].metadata
> {code}
> returns a non-empty dictionary, then
> {code:java}
> df.select('Field0.SubField0').schema.fields[0].metadata
> {code}
> returns an empty dictionary, where "Field0" is the name of the first field in
> the DataFrame and "SubField0" is the name of the first nested field under
> "Field0".
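
The behaviour described above suggests one possible workaround: read the nested field's metadata from the parent schema and reattach it when selecting. The following is a minimal sketch, not taken from the attached jsonMetadataTest.py. The helper names, the example schema, and the "description" metadata key are all hypothetical. The schema dictionary is assumed to have the shape produced by StructType.jsonValue(), and Column.alias is called with its metadata keyword argument. Whether this restores the metadata on the affected versions is not confirmed by the report.

```python
def nested_metadata(schema_json, dotted_path):
    """Return the metadata of a nested field from a schema dictionary.

    `schema_json` is assumed to look like the output of
    StructType.jsonValue(): {"type": "struct", "fields": [...]}.
    """
    node = schema_json
    field = None
    for part in dotted_path.split("."):
        # Find the field with this name at the current struct level.
        matches = [f for f in node["fields"] if f["name"] == part]
        if not matches:
            raise KeyError(f"no field named {part!r} in {dotted_path!r}")
        field = matches[0]
        # Descend into the field's type (a dict for nested structs).
        node = field["type"]
    return field.get("metadata", {})


def select_keeping_metadata(df, dotted_path):
    """Hypothetical helper: select a nested column and reattach its metadata."""
    # Imported lazily so the lookup helper above works without PySpark.
    from pyspark.sql.functions import col

    metadata = nested_metadata(df.schema.jsonValue(), dotted_path)
    leaf_name = dotted_path.split(".")[-1]
    return df.select(col(dotted_path).alias(leaf_name, metadata=metadata))


# Hypothetical schema mirroring the "Field0" / "SubField0" names in the report.
example_schema = {
    "type": "struct",
    "fields": [
        {
            "name": "Field0",
            "nullable": True,
            "metadata": {},
            "type": {
                "type": "struct",
                "fields": [
                    {
                        "name": "SubField0",
                        "type": "string",
                        "nullable": True,
                        "metadata": {"description": "example"},
                    }
                ],
            },
        }
    ],
}

print(nested_metadata(example_schema, "Field0.SubField0"))
```

With a live DataFrame, select_keeping_metadata(df, 'Field0.SubField0').schema.fields[0].metadata could then be compared against df.schema.fields[0].dataType.fields[0].metadata to check whether the metadata survives the select.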