Yin Huai created PARQUET-221:
--------------------------------

             Summary: For array type, inconsistent names are passed into convertType.
                 Key: PARQUET-221
                 URL: https://issues.apache.org/jira/browse/PARQUET-221
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
    Affects Versions: 1.6.0
            Reporter: Yin Huai


When converting an array type, Parquet Avro uses "array" as the field name 
([see 
here|https://github.com/apache/incubator-parquet-mr/blob/parquet-1.6.0rc7/parquet-avro/src/main/java/parquet/avro/AvroSchemaConverter.java#L131]),
 but Parquet Hive SerDe uses "array_element" as the field name ([see 
here|https://github.com/apache/incubator-parquet-mr/blob/parquet-1.6.0rc7/parquet-hive/parquet-hive-storage-handler/src/main/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java#L109]).
 In Spark SQL, our native Parquet support follows Parquet Avro's convention, 
so for data generated by Parquet Hive SerDe the array values cannot be read 
correctly and null is returned instead.
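
To make the failure mode concrete, here is a minimal sketch of a name-based 
lookup that goes wrong when the writer and reader disagree on the element 
field name. It is a toy model only: the class ArrayFieldNameMismatch and the 
method readElement are hypothetical, and a plain Map stands in for a written 
record. It does not use the Parquet, Hive, or Spark SQL APIs, and the real 
readers may resolve fields differently. Only the two field names come from 
the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical, not the parquet-mr API): a reader that resolves
// the array element field purely by name.
class ArrayFieldNameMismatch {
    // Field names quoted in this issue.
    static final String AVRO_ELEMENT_NAME = "array";
    static final String HIVE_ELEMENT_NAME = "array_element";

    // Returns the value stored under expectedName, or null if the writer
    // used a different field name.
    static Object readElement(Map<String, Object> writtenFields, String expectedName) {
        return writtenFields.get(expectedName);
    }

    public static void main(String[] args) {
        // Simulate a record written with the Hive SerDe naming convention.
        Map<String, Object> hiveWritten = new HashMap<>();
        hiveWritten.put(HIVE_ELEMENT_NAME, 42);

        // A reader following the Avro convention finds no such field.
        System.out.println(readElement(hiveWritten, AVRO_ELEMENT_NAME)); // prints "null"

        // A reader using the writer's own name finds the value.
        System.out.println(readElement(hiveWritten, HIVE_ELEMENT_NAME)); // prints "42"
    }
}
```

In this model the mismatch shows up as a silent null rather than an error, 
which matches the symptom reported here.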



