Yin Huai created PARQUET-221:
--------------------------------
Summary: For array type, inconsistent names are passed into
convertType.
Key: PARQUET-221
URL: https://issues.apache.org/jira/browse/PARQUET-221
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Affects Versions: 1.6.0
Reporter: Yin Huai
When converting an array type, Parquet Avro passes "array" as the element field
name ([see
here|https://github.com/apache/incubator-parquet-mr/blob/parquet-1.6.0rc7/parquet-avro/src/main/java/parquet/avro/AvroSchemaConverter.java#L131]),
but Parquet Hive SerDe passes "array_element" ([see
here|https://github.com/apache/incubator-parquet-mr/blob/parquet-1.6.0rc7/parquet-hive/parquet-hive-storage-handler/src/main/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java#L109]).
In Spark SQL, our native Parquet support follows Parquet Avro's convention, so
array values in data written by Parquet Hive SerDe cannot be read correctly and
null is returned instead.
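A minimal sketch of why a name mismatch surfaces as null rather than as an error. It assumes a hypothetical, simplified model in which a LIST group is just a map from its element field name to the stored values; this is not the parquet-mr `Type`/`GroupType` API, and the real nesting each writer produces may differ. `readStrict` mimics a reader that, like Parquet Avro's convention, looks the element up by the name "array"; `readLenient` is an assumed alternative (not what Spark SQL or Parquet Avro does) that takes the group's single child regardless of its name. All class and method names here are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of parquet-mr, Hive, or Spark SQL.
class ArrayElementNameDemo {
    // Simplified model of a LIST group: element field name -> stored values.
    static Map<String, List<Integer>> listGroup(String elementName, List<Integer> values) {
        Map<String, List<Integer>> group = new LinkedHashMap<>();
        group.put(elementName, values);
        return group;
    }

    // Strict lookup following Parquet Avro's naming: the element must be "array".
    // Map.get returns null when the writer used a different name.
    static List<Integer> readStrict(Map<String, List<Integer>> group) {
        return group.get("array");
    }

    // Assumed lenient alternative: accept the only child field, whatever its name.
    static List<Integer> readLenient(Map<String, List<Integer>> group) {
        if (group.size() != 1) {
            throw new IllegalArgumentException("expected one element field, got " + group.keySet());
        }
        return group.values().iterator().next();
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> avroWritten = listGroup("array", Arrays.asList(1, 2, 3));
        Map<String, List<Integer>> hiveWritten = listGroup("array_element", Arrays.asList(1, 2, 3));

        System.out.println(readStrict(avroWritten));  // prints [1, 2, 3]
        System.out.println(readStrict(hiveWritten));  // prints null
        System.out.println(readLenient(hiveWritten)); // prints [1, 2, 3]
    }
}
```

The strict reader silently yields null for Hive-written data, matching the symptom described above; the lenient variant only shows that identifying the element by position rather than by name would tolerate both conventions.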
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)