Chen Zhang updated SPARK-32639:
-------------------------------
    Attachment: 000.snappy.parquet

> Support GroupType parquet mapkey field
> --------------------------------------
>
>                 Key: SPARK-32639
>                 URL: https://issues.apache.org/jira/browse/SPARK-32639
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.6, 3.0.0
>            Reporter: Chen Zhang
>            Priority: Major
>         Attachments: 000.snappy.parquet
>
> I have a parquet file, and the MessageType recorded in the file is:
> {code:java}
> message parquet_schema {
>   optional group value (MAP) {
>     repeated group key_value {
>       required group key {
>         optional binary first (UTF8);
>         optional binary middle (UTF8);
>         optional binary last (UTF8);
>       }
>       optional binary value (UTF8);
>     }
>   }
> }{code}
>
> Reading the file with +spark.read.parquet("000.snappy.parquet")+ makes Spark throw an exception while converting the Parquet MessageType to a Spark SQL StructType:
> {code:java}
> AssertionError(Map key type is expected to be a primitive type, but found...)
> {code}
>
> Reading the file with +spark.read.schema("value MAP<STRUCT<first:STRING, middle:STRING, last:STRING>, STRING>").parquet("000.snappy.parquet")+ returns the correct result.
> According to the Parquet format specification (https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps), a map key in the Parquet format is not required to be a primitive type.
>
> Note: this parquet file was not written by Spark. Spark writes an additional sparkSchema string into the files it produces, and on read it uses that embedded schema directly instead of converting the Parquet MessageType to a Spark SQL StructType, which is why Spark-written files do not hit this code path.
> I will submit a PR later.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
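The relaxed conversion the issue asks for (recursing into group-type map keys instead of asserting they are primitive) can be sketched as a toy converter. This is NOT Spark's actual ParquetToSparkSchemaConverter; the dict-based schema representation and the `to_sql_type` helper below are invented purely for illustration.

```python
# Toy sketch of Parquet-to-Spark-SQL type conversion where map keys may be
# group types. Hypothetical code, not Spark's implementation: the dict schema
# format and function names here are invented for this example.

PRIMITIVE = {"binary(UTF8)": "STRING", "int32": "INT", "int64": "BIGINT"}

def to_sql_type(field):
    """Convert a toy Parquet field description to a Spark SQL DDL type string."""
    if field["kind"] == "primitive":
        return PRIMITIVE[field["type"]]
    if field["kind"] == "map":
        # Recurse into BOTH key and value: the Parquet format allows the key
        # to be a group, so no "key must be primitive" assertion here.
        key = to_sql_type(field["key"])
        value = to_sql_type(field["value"])
        return f"MAP<{key}, {value}>"
    if field["kind"] == "group":
        inner = ", ".join(f"{f['name']}:{to_sql_type(f)}" for f in field["fields"])
        return f"STRUCT<{inner}>"
    raise ValueError(f"unknown kind: {field['kind']}")

# The map from the attached file: a group key {first, middle, last} -> string.
name_key = {"kind": "group", "fields": [
    {"name": n, "kind": "primitive", "type": "binary(UTF8)"}
    for n in ("first", "middle", "last")]}
schema = {"kind": "map",
          "key": name_key,
          "value": {"kind": "primitive", "type": "binary(UTF8)"}}

print(to_sql_type(schema))
# -> MAP<STRUCT<first:STRING, middle:STRING, last:STRING>, STRING>
```

The printed DDL string matches the explicit schema that the issue reports as a working workaround for `spark.read.schema(...)`.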