[ https://issues.apache.org/jira/browse/SPARK-9340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680385#comment-14680385 ]
Apache Spark commented on SPARK-9340:
-------------------------------------

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/8070

> ParquetTypeConverter incorrectly handling of repeated types results in schema mismatch
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-9340
>                 URL: https://issues.apache.org/jira/browse/SPARK-9340
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0
>            Reporter: Damian Guy
>         Attachments: ParquetTypesConverterTest.scala
>
>
> The way ParquetTypesConverter handles primitive repeated types results in an
> incompatible schema being used for querying data. For example, given a schema
> like so:
>
>     message root {
>       repeated int32 repeated_field;
>     }
>
> Spark produces a read schema like:
>
>     message root {
>       optional int32 repeated_field;
>     }
>
> These are incompatible and all attempts to read fail.
>
> In ParquetTypesConverter.toDataType:
>
>     if (parquetType.isPrimitive) {
>       toPrimitiveDataType(parquetType.asPrimitiveType, isBinaryAsString,
>         isInt96AsTimestamp)
>     } else {...}
>
> The if condition should also have
> !parquetType.isRepetition(Repetition.REPEATED)
>
> And then this case will need to be handled in the else
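
The fix proposed in the description can be illustrated with a minimal sketch. This is not the change made in the pull request, and it does not use the real Parquet or Spark classes: `PType`, `DType`, `RepeatedFieldSketch` and the simplified `toPrimitiveDataType` below are hypothetical stand-ins. It also assumes that a repeated primitive field should be read as an array of non-null elements.

```scala
// Hypothetical, simplified stand-ins for Parquet schema types (not the parquet-mr API).
sealed trait Repetition
case object Required extends Repetition
case object Optional extends Repetition
case object Repeated extends Repetition

sealed trait PType { def name: String; def repetition: Repetition }
case class PPrimitive(name: String, repetition: Repetition, primitive: String) extends PType
case class PGroup(name: String, repetition: Repetition, fields: Seq[PType]) extends PType

// Hypothetical stand-ins for Spark SQL data types.
sealed trait DType
case object IntType extends DType
case object StringType extends DType
case class ArrayOf(element: DType, containsNull: Boolean) extends DType
// Each field is (name, type, nullable).
case class StructOf(fields: Seq[(String, DType, Boolean)]) extends DType

object RepeatedFieldSketch {
  // Simplified primitive mapping; only two primitives are covered here.
  private def toPrimitiveDataType(p: PPrimitive): DType = p.primitive match {
    case "int32"  => IntType
    case "binary" => StringType
    case other    => sys.error(s"unsupported primitive in this sketch: $other")
  }

  def toDataType(t: PType): DType = t match {
    // The extra guard from the description: only non-repeated primitives
    // are converted directly to a primitive data type.
    case p: PPrimitive if p.repetition != Repeated =>
      toPrimitiveDataType(p)
    // The "else" case: a repeated primitive becomes an array of that primitive
    // (assumption: elements of a repeated field are never null).
    case p: PPrimitive =>
      ArrayOf(toPrimitiveDataType(p), containsNull = false)
    // Groups are treated as plain structs; LIST/MAP annotations and repeated
    // groups are deliberately ignored in this sketch.
    case g: PGroup =>
      StructOf(g.fields.map(f => (f.name, toDataType(f), f.repetition == Optional)))
  }
}

object RepeatedFieldSketchDemo extends App {
  // Mirrors the schema from the description: message root { repeated int32 repeated_field; }
  val root = PGroup("root", Required, Seq(PPrimitive("repeated_field", Repeated, "int32")))
  println(RepeatedFieldSketch.toDataType(root))
}
```

With this shape, `repeated_field` is surfaced as an array column rather than an optional scalar, so the requested read schema no longer replaces `repeated` with `optional`.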