[ https://issues.apache.org/jira/browse/SPARK-34805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479991#comment-17479991 ]
Apache Spark commented on SPARK-34805:
--------------------------------------

User 'kevinwallimann' has created a pull request for this issue:
https://github.com/apache/spark/pull/35270

> PySpark loses metadata in DataFrame fields when selecting nested columns
> ------------------------------------------------------------------------
>
>                 Key: SPARK-34805
>                 URL: https://issues.apache.org/jira/browse/SPARK-34805
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.0.1, 3.1.1
>            Reporter: Mark Ressler
>            Priority: Major
>         Attachments: jsonMetadataTest.py, nested_columns_metadata.scala
>
>
> For a DataFrame schema with nested StructTypes, where metadata is set for
> fields in the schema, that metadata is lost when a DataFrame selects nested
> fields. For example, suppose
> {code:java}
> df.schema.fields[0].dataType.fields[0].metadata
> {code}
> returns a non-empty dictionary, then
> {code:java}
> df.select('Field0.SubField0').schema.fields[0].metadata
> {code}
> returns an empty dictionary, where "Field0" is the name of the first field in
> the DataFrame and "SubField0" is the name of the first nested field under
> "Field0".
>

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
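
For readers affected by the behavior quoted above, one possible workaround is to read the metadata from the original schema rather than from the schema of the selected column. The sketch below is not taken from the issue or the linked pull request. It assumes the schema is available in the dictionary form that `StructType.jsonValue()` produces in PySpark, and the helper name `nested_field_metadata` and the sample metadata are hypothetical. It uses only the standard library, so it runs without a Spark session.

```python
from typing import Any, Dict


def nested_field_metadata(schema_json: Dict[str, Any], path: str) -> Dict[str, Any]:
    """Return the metadata of the field at a dotted path such as 'Field0.SubField0'.

    `schema_json` is assumed to have the shape of a struct schema dictionary:
    {"type": "struct", "fields": [{"name", "type", "nullable", "metadata"}, ...]}.
    """
    current: Any = schema_json
    field: Dict[str, Any] = {}
    for name in path.split("."):
        # Only struct types have named child fields to descend into.
        if not isinstance(current, dict) or current.get("type") != "struct":
            raise KeyError(f"cannot resolve {name!r}: parent is not a struct")
        matches = [f for f in current["fields"] if f["name"] == name]
        if not matches:
            raise KeyError(f"no field named {name!r}")
        field = matches[0]
        current = field["type"]
    return field.get("metadata", {})


# Hypothetical schema mirroring the names used in the issue description.
schema_json = {
    "type": "struct",
    "fields": [
        {
            "name": "Field0",
            "nullable": True,
            "metadata": {},
            "type": {
                "type": "struct",
                "fields": [
                    {
                        "name": "SubField0",
                        "type": "string",
                        "nullable": True,
                        "metadata": {"comment": "example"},
                    }
                ],
            },
        }
    ],
}

print(nested_field_metadata(schema_json, "Field0.SubField0"))
```

With a real DataFrame, the same helper could presumably be called as `nested_field_metadata(df.schema.jsonValue(), 'Field0.SubField0')`, and the result reattached when selecting the column, for example through the `metadata` keyword of `Column.alias`. Whether that is needed depends on the Spark version in use and on the outcome of the pull request above.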