Damian Guy created SPARK-9340:
---------------------------------

             Summary: ParquetTypesConverter's incorrect handling of repeated 
types results in schema mismatch
                 Key: SPARK-9340
                 URL: https://issues.apache.org/jira/browse/SPARK-9340
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.4.0, 1.2.0
            Reporter: Damian Guy


The way ParquetTypesConverter handles primitive repeated types results in an 
incompatible schema being used for querying data. For example, given a schema 
like so:
message root {
  repeated int32 repeated_field;
}

Spark produces a read schema like:
message root {
  optional int32 repeated_field;
}

These two schemas are incompatible, so all attempts to read the data fail.
The cause is in ParquetTypesConverter.toDataType:

    if (parquetType.isPrimitive) {
      toPrimitiveDataType(parquetType.asPrimitiveType, isBinaryAsString, isInt96AsTimestamp)
    } else {...}

The if condition should also check 
!parquetType.isRepetition(Repetition.REPEATED), so that repeated primitives 
fall through to the else branch. That case will then need to be handled in 
the else branch.
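
A minimal, self-contained sketch of the proposed change follows. It does not use the real Parquet or Spark classes: the types PrimitiveType, GroupType, ArrayType and the object RepeatedSketch are simplified stand-ins invented for illustration. Mapping a repeated primitive to an array of non-null elements is an assumption about how the else branch could handle this case, not something the report specifies.

```scala
// Stand-in model only: these are NOT the real Parquet or Spark classes.
sealed trait Repetition
case object Required extends Repetition
case object Optional extends Repetition
case object Repeated extends Repetition

sealed trait ParquetType { def name: String; def repetition: Repetition }
case class PrimitiveType(name: String, repetition: Repetition, primitive: String)
  extends ParquetType
case class GroupType(name: String, repetition: Repetition, fields: List[ParquetType])
  extends ParquetType

sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
case class StructType(fields: List[(String, DataType)]) extends DataType

object RepeatedSketch {
  // Hypothetical helper: maps a primitive to a scalar type, ignoring repetition.
  def toPrimitiveDataType(p: PrimitiveType): DataType = p.primitive match {
    case "int32"  => IntegerType
    case "binary" => StringType
    case other    => sys.error(s"unsupported primitive in this sketch: $other")
  }

  def toDataType(t: ParquetType): DataType = t match {
    // Equivalent of: isPrimitive && !isRepetition(Repetition.REPEATED)
    case p: PrimitiveType if p.repetition != Repeated =>
      toPrimitiveDataType(p)
    // Assumed handling: a repeated primitive becomes an array of its element type.
    case p: PrimitiveType =>
      ArrayType(toPrimitiveDataType(p), containsNull = false)
    // Group repetition is ignored here to keep the sketch short.
    case g: GroupType =>
      StructType(g.fields.map(f => f.name -> toDataType(f)))
  }
}
```

With this guard, the repeated_field column from the example schema would no longer be converted to a plain nullable integer, which is what leads to the optional int32 read schema above.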



