Sergio Peña created HIVE-9502:
---------------------------------

             Summary: Parquet cannot read Map types from files written with Hive <= 0.12
                 Key: HIVE-9502
                 URL: https://issues.apache.org/jira/browse/HIVE-9502
             Project: Hive
          Issue Type: Bug
            Reporter: Sergio Peña
            Assignee: Sergio Peña


When reading a Parquet file written by Hive <= 0.12, the following error is thrown:

{noformat}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
        at org.apache.hadoop.hive.ql.io.parquet.serde.AbstractParquetMapInspector.getMap(AbstractParquetMapInspector.java:73)
        at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:519)
        at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:443)
        at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:427)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:582)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
        at org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
        ... 9 more
{noformat}

This is because old versions of Hive (<= 0.12) write Map types with the MAP_KEY_VALUE annotation on the outer group, using the following schema:

{noformat}
optional group m1 (MAP_KEY_VALUE) {
        repeated group map {
                required binary key;
                optional binary value;
        }
}
{noformat}

PARQUET-113 specifies new annotations for Parquet nested types, including maps:
https://github.com/rdblue/incubator-parquet-format/blob/PARQUET-113-add-list-and-map-spec/LogicalTypes.md#maps

Under that specification, the correct schema is:
{noformat}
optional group m1 (MAP) {
        repeated group map (MAP_KEY_VALUE) {
                required binary key;
                optional binary value;
        }
}
{noformat}
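The two layouts have the same three-level shape; only the placement of MAP_KEY_VALUE differs. The map section of the linked spec gives a backward-compatibility rule for this case: a group annotated MAP_KEY_VALUE that is not contained by a MAP-annotated group should be handled as a MAP-annotated group. The sketch below applies that rule to both schemas above. It is not Hive's actual fix: the `Node` record and the `MapAnnotationRule` class are hypothetical stand-ins, not parquet-mr's `GroupType`/`OriginalType` API.

```java
import java.util.List;

// Sketch only: Node is a hypothetical stand-in for a Parquet schema node,
// not parquet-mr's Type or GroupType.
final class MapAnnotationRule {

    enum Annotation { NONE, MAP, MAP_KEY_VALUE }

    enum Repetition { REQUIRED, OPTIONAL, REPEATED }

    record Node(String name, Repetition repetition, Annotation annotation, List<Node> children) {
        static Node field(String name, Repetition repetition) {
            return new Node(name, repetition, Annotation.NONE, List.of());
        }
    }

    // True if 'group' should be read as a map column; 'parent' is the
    // enclosing group, or null when 'group' is a top-level column.
    static boolean isMapGroup(Node group, Node parent) {
        switch (group.annotation()) {
            case MAP:
                return true; // current layout: MAP on the outer group
            case MAP_KEY_VALUE:
                // Legacy layout (Hive <= 0.12) puts MAP_KEY_VALUE on the outer group.
                // Directly inside a MAP group it only marks the repeated key/value group.
                return parent == null || parent.annotation() != Annotation.MAP;
            default:
                return false;
        }
    }

    static Node keyValueGroup(Annotation annotation) {
        return new Node("map", Repetition.REPEATED, annotation, List.of(
                Node.field("key", Repetition.REQUIRED),
                Node.field("value", Repetition.OPTIONAL)));
    }

    // optional group m1 (MAP_KEY_VALUE) { repeated group map { ... } }
    static Node legacySchema() {
        return new Node("m1", Repetition.OPTIONAL, Annotation.MAP_KEY_VALUE,
                List.of(keyValueGroup(Annotation.NONE)));
    }

    // optional group m1 (MAP) { repeated group map (MAP_KEY_VALUE) { ... } }
    static Node currentSchema() {
        return new Node("m1", Repetition.OPTIONAL, Annotation.MAP,
                List.of(keyValueGroup(Annotation.MAP_KEY_VALUE)));
    }

    public static void main(String[] args) {
        Node legacy = legacySchema();
        Node current = currentSchema();
        System.out.println(isMapGroup(legacy, null));                       // true
        System.out.println(isMapGroup(current, null));                      // true
        System.out.println(isMapGroup(current.children().get(0), current)); // false
    }
}
```

Checking the parent's annotation, rather than only the node's own annotation, is what keeps the inner MAP_KEY_VALUE group of the new layout from being mistaken for a second, nested map.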

Hive should remain backward compatible with the old schema as well, so that Map columns written by Hive <= 0.12 stay readable.
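Backward compatibility also means not depending on field names. The map section of the linked spec says the `key_value`, `key` and `value` names may not be used in existing data and should not be enforced when reading; both schemas in this issue, for example, name the repeated group `map`. The sketch below locates the key and value fields by position inside the single repeated child group. As before, `Node`, `KeyValue` and `MapKeyValueLookup` are hypothetical illustration types, not Hive or parquet-mr classes, and this is not the committed fix.

```java
import java.util.List;

// Sketch only: Node is a hypothetical stand-in for a Parquet schema node.
final class MapKeyValueLookup {

    enum Repetition { REQUIRED, OPTIONAL, REPEATED }

    record Node(String name, Repetition repetition, List<Node> children) {
        static Node field(String name, Repetition repetition) {
            return new Node(name, repetition, List.of());
        }
    }

    // value is null when the key/value group has no value field.
    record KeyValue(Node key, Node value) {}

    // Locate the key and value fields of a map group by position instead of by
    // name, so "map", "key_value" or any other repeated-group name is accepted.
    static KeyValue resolve(Node mapGroup) {
        if (mapGroup.children().size() != 1) {
            throw new IllegalArgumentException(
                    "map group " + mapGroup.name() + " must have exactly one child");
        }
        Node kv = mapGroup.children().get(0);
        int n = kv.children().size();
        if (kv.repetition() != Repetition.REPEATED || n < 1 || n > 2) {
            throw new IllegalArgumentException(
                    "expected a repeated group with 1 or 2 fields, got " + kv.name());
        }
        Node key = kv.children().get(0);
        Node value = n == 2 ? kv.children().get(1) : null;
        return new KeyValue(key, value);
    }

    public static void main(String[] args) {
        // Same shape as the schemas above, but with non-standard field names.
        Node m1 = new Node("m1", Repetition.OPTIONAL, List.of(
                new Node("map", Repetition.REPEATED, List.of(
                        Node.field("k", Repetition.REQUIRED),
                        Node.field("v", Repetition.OPTIONAL)))));
        KeyValue kv = resolve(m1);
        System.out.println(kv.key().name() + " -> " + kv.value().name()); // k -> v
    }
}
```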



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)