[ https://issues.apache.org/jira/browse/SPARK-9340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheng Lian updated SPARK-9340:
------------------------------
    Summary: CatalystSchemaConverter and CatalystRowConverter don't handle unannotated repeated fields correctly  (was: ParquetTypeConverter incorrectly handling of repeated types results in schema mismatch)

> CatalystSchemaConverter and CatalystRowConverter don't handle unannotated repeated fields correctly
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-9340
>                 URL: https://issues.apache.org/jira/browse/SPARK-9340
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0
>            Reporter: Damian Guy
>         Attachments: ParquetTypesConverterTest.scala
>
>
> The way ParquetTypesConverter handles primitive repeated types results in an
> incompatible schema being used for querying data. For example, given a schema
> like so:
>
>   message root {
>     repeated int32 repeated_field;
>   }
>
> Spark produces a read schema like:
>
>   message root {
>     optional int32 repeated_field;
>   }
>
> These are incompatible and all attempts to read fail.
>
> In ParquetTypesConverter.toDataType:
>
>   if (parquetType.isPrimitive) {
>     toPrimitiveDataType(parquetType.asPrimitiveType, isBinaryAsString, isInt96AsTimestamp)
>   } else {...}
>
> The if condition should also have
> !parquetType.isRepetition(Repetition.REPEATED)
>
> And then this case will need to be handled in the else
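
To make the repeated-field mismatch in this issue concrete, here is a minimal sketch of the intended mapping. It is a toy model, not Spark's or Parquet's code: every type and function name below (ParquetField, SqlField, RepeatedFieldSketch.convert, and so on) is hypothetical and exists only for illustration. It assumes the Parquet format's backward-compatibility rule that a repeated field without a LIST or MAP annotation is read as a required list of required elements. It does not claim to reflect the patch that resolved this issue.

```scala
// Toy schema model. These are hypothetical types, NOT Spark or Parquet classes.
sealed trait Repetition
case object Required extends Repetition
case object Optional extends Repetition
case object Repeated extends Repetition

sealed trait ParquetField { def name: String; def repetition: Repetition }
case class Primitive(name: String, repetition: Repetition, typeName: String) extends ParquetField
case class Group(name: String, repetition: Repetition, fields: Seq[ParquetField]) extends ParquetField

sealed trait SqlType
case object IntType extends SqlType
case object StringType extends SqlType
case class ArrayOf(element: SqlType, containsNull: Boolean) extends SqlType
case class StructOf(fields: Seq[SqlField]) extends SqlType
case class SqlField(name: String, dataType: SqlType, nullable: Boolean)

object RepeatedFieldSketch {
  // Type of a single value of the field, ignoring its repetition.
  private def elementType(f: ParquetField): SqlType = f match {
    case Primitive(_, _, "int32")  => IntType
    case Primitive(_, _, "binary") => StringType
    case Primitive(_, _, other)    => sys.error(s"unsupported primitive in this sketch: $other")
    case Group(_, _, children)     => StructOf(children.map(convert))
  }

  // Check the repetition BEFORE deciding between primitive and group handling,
  // so that an unannotated repeated field becomes an array and not a scalar.
  def convert(f: ParquetField): SqlField = f.repetition match {
    case Repeated => SqlField(f.name, ArrayOf(elementType(f), containsNull = false), nullable = false)
    case Optional => SqlField(f.name, elementType(f), nullable = true)
    case Required => SqlField(f.name, elementType(f), nullable = false)
  }
}
```

With this model, RepeatedFieldSketch.convert(Primitive("repeated_field", Repeated, "int32")) yields a non-nullable array of non-null ints, whereas a check on "is primitive" alone would treat the field as a scalar int32, which is the mismatch reported here.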