[ https://issues.apache.org/jira/browse/HIVE-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergio Peña updated HIVE-9502:
------------------------------
Affects Version/s: 0.14.0
> Parquet cannot read Map types from files written with Hive <= 0.12
> ------------------------------------------------------------------
>
> Key: HIVE-9502
> URL: https://issues.apache.org/jira/browse/HIVE-9502
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.14.0
> Reporter: Sergio Peña
> Assignee: Sergio Peña
>
> When reading a Parquet file written by Hive <= 0.12, the following error is
> thrown:
> {noformat}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>     at org.apache.hadoop.hive.ql.io.parquet.serde.AbstractParquetMapInspector.getMap(AbstractParquetMapInspector.java:73)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:519)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:443)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:427)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:582)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>     at org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
>     ... 9 more
> {noformat}
> This is because old versions of Hive (<= 0.12) write Map types using the
> following schema:
> {noformat}
> optional group m1 (MAP_KEY_VALUE) {
>   repeated group map {
>     required binary key;
>     optional binary value;
>   }
> }
> {noformat}
> PARQUET-113 mentions new annotations for Parquet nested types.
> https://github.com/rdblue/incubator-parquet-format/blob/PARQUET-113-add-list-and-map-spec/LogicalTypes.md#maps
> According to that specification, the correct schema is now:
> {noformat}
> optional group m1f (MAP) {
>   repeated group map (MAP_KEY_VALUE) {
>     required binary key;
>     optional binary value;
>   }
> }
> {noformat}
> We should remain backward compatible with the old schema as well.
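
One way to picture the backward-compatible check is a reader that accepts either annotation on the outer group and ignores the annotation on the inner repeated group. The sketch below is only an illustration under stated assumptions: `SchemaNode`, `LegacyMapSchemaSketch`, and `isMapGroup` are hypothetical names and a simplified schema model, not Hive or Parquet classes, and this is not the fix that was committed for this issue.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified schema model; these are NOT Parquet or Hive classes.
final class SchemaNode {
    final String repetition;          // "required", "optional" or "repeated"
    final String name;
    final String annotation;          // e.g. "MAP", "MAP_KEY_VALUE", or null if absent
    final List<SchemaNode> children;  // empty for primitive fields

    SchemaNode(String repetition, String name, String annotation, SchemaNode... children) {
        this.repetition = repetition;
        this.name = name;
        this.annotation = annotation;
        this.children = Arrays.asList(children);
    }
}

final class LegacyMapSchemaSketch {

    /** Returns true if the group looks like a map in either the old or the new layout. */
    static boolean isMapGroup(SchemaNode outer) {
        // Newer writers annotate the outer group with MAP;
        // Hive <= 0.12 put MAP_KEY_VALUE on the outer group instead.
        boolean annotated = "MAP".equals(outer.annotation)
                || "MAP_KEY_VALUE".equals(outer.annotation);
        if (!annotated || outer.children.size() != 1) {
            return false;
        }
        SchemaNode entries = outer.children.get(0);
        // The inner group's annotation is deliberately ignored: old files leave it
        // unannotated, new files mark it MAP_KEY_VALUE.
        // Assumption for this sketch: exactly two fields, with a required key first.
        return "repeated".equals(entries.repetition)
                && entries.children.size() == 2
                && "key".equals(entries.children.get(0).name)
                && "required".equals(entries.children.get(0).repetition);
    }

    /** Layout written by Hive <= 0.12, as described in this issue. */
    static SchemaNode oldStyleMap() {
        return new SchemaNode("optional", "m1", "MAP_KEY_VALUE",
                new SchemaNode("repeated", "map", null,
                        new SchemaNode("required", "key", null),
                        new SchemaNode("optional", "value", null)));
    }

    /** Layout using the newer MAP annotation on the outer group. */
    static SchemaNode newStyleMap() {
        return new SchemaNode("optional", "m1", "MAP",
                new SchemaNode("repeated", "map", "MAP_KEY_VALUE",
                        new SchemaNode("required", "key", null),
                        new SchemaNode("optional", "value", null)));
    }
}
```

Calling `LegacyMapSchemaSketch.isMapGroup` on either `oldStyleMap()` or `newStyleMap()` returns true, while an unannotated group is rejected. A real reader would apply the same idea to the actual Parquet schema types rather than to this stand-in model.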