Ganesha Shreedhara created HIVE-22670:
-----------------------------------------
Summary: ArrayIndexOutOfBoundsException when vectorized reader is
used for reading a parquet file
Key: HIVE-22670
URL: https://issues.apache.org/jira/browse/HIVE-22670
Project: Hive
Issue Type: Bug
Affects Versions: 2.3.6, 3.1.2
Reporter: Ganesha Shreedhara
Assignee: Ganesha Shreedhara
An ArrayIndexOutOfBoundsException is thrown while decoding the dictionary IDs (dictionaryIds)
of a row group in a Parquet file when vectorization is enabled.
*Exception stack trace:*
{code:java}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
	at org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.decodeToBinary(PlainValuesDictionary.java:122)
	at org.apache.hadoop.hive.ql.io.parquet.vector.ParquetDataColumnReaderFactory$DefaultParquetDataColumnReader.readString(ParquetDataColumnReaderFactory.java:95)
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader.decodeDictionaryIds(VectorizedPrimitiveColumnReader.java:467)
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader.readBatch(VectorizedPrimitiveColumnReader.java:68)
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410)
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
	... 24 more{code}
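The top frame of the trace is a dictionary lookup. As a simplified, hypothetical model (the class and methods `DictionaryDecodeSketch`, `decodeToBytes` and `decodeAll` are illustrative, not parquet-mr or Hive APIs), decoding a dictionary-encoded value amounts to using its integer ID as an array index into the dictionary of the current column chunk. An ID of 0 against an empty dictionary therefore raises the same exception type; on Java 8 the exception message is just the offending index, which matches the `0` in the trace.

{code:java}
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of dictionary decoding (not the parquet-mr source).
public class DictionaryDecodeSketch {

    // A dictionary-encoded value is an integer ID; decoding is an array lookup
    // into the dictionary of the column chunk the ID came from.
    static byte[] decodeToBytes(byte[][] dictionary, int id) {
        // Throws ArrayIndexOutOfBoundsException when id >= dictionary.length,
        // e.g. id 0 against an empty dictionary.
        return dictionary[id];
    }

    // Decodes a batch of IDs into strings.
    static String[] decodeAll(byte[][] dictionary, int[] ids) {
        String[] out = new String[ids.length];
        for (int i = 0; i < ids.length; i++) {
            out[i] = new String(decodeToBytes(dictionary, ids[i]), StandardCharsets.UTF_8);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[][] dict = {
            "apple".getBytes(StandardCharsets.UTF_8),
            "banana".getBytes(StandardCharsets.UTF_8)
        };
        // prints "banana,apple,banana"
        System.out.println(String.join(",", decodeAll(dict, new int[] {1, 0, 1})));

        try {
            decodeAll(new byte[0][], new int[] {0});
        } catch (ArrayIndexOutOfBoundsException e) {
            // Message format depends on the JDK version.
            System.out.println("AIOOBE: " + e.getMessage());
        }
    }
}
{code}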
This issue seems to be caused by reusing the same dictionary column vector
while reading consecutive row groups. It looks like a corner-case bug that
occurs only for certain distributions of dictionary-encoded and plain-encoded
data, when the underlying bit-packed dictionary data is read into a
column-vector-based data structure.
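To illustrate the suspected mechanism, here is a hypothetical model in Java (not Hive's `VectorizedPrimitiveColumnReader`; `StaleDictionaryIdsSketch`, `ColumnReader`, `nextRowGroup`, `readIds` and `decode` are invented names). The reader keeps one dictionary-ID buffer for all batches while the dictionary is swapped per row group. If IDs left over from a dictionary-encoded row group are still treated as valid after switching to a row group whose chunk is plain-encoded (modelled here, as an assumption, by an empty dictionary), decoding them indexes past the end of the new dictionary. The `resetOnNewRowGroup` flag is likewise an assumption, standing in for whatever per-row-group reset an actual fix would need.

{code:java}
import java.util.Arrays;

// Hypothetical model of a dictionary-ID buffer reused across row groups.
public class StaleDictionaryIdsSketch {

    static final class ColumnReader {
        final int[] dictionaryIds;           // reused for every batch
        int idCount = 0;                     // IDs the reader believes are valid
        String[] dictionary = new String[0]; // dictionary of the current row group

        ColumnReader(int capacity) {
            dictionaryIds = new int[capacity];
        }

        // Switches to a new row group; an empty array models a plain-encoded chunk.
        // If resetOnNewRowGroup is false, IDs from the previous row group stay "valid".
        void nextRowGroup(String[] newDictionary, boolean resetOnNewRowGroup) {
            dictionary = newDictionary;
            if (resetOnNewRowGroup) {
                idCount = 0;
            }
        }

        // Stores IDs read from a dictionary-encoded page.
        void readIds(int... ids) {
            System.arraycopy(ids, 0, dictionaryIds, 0, ids.length);
            idCount = ids.length;
        }

        // Decodes every ID considered valid against the current dictionary.
        String[] decode() {
            String[] out = new String[idCount];
            for (int i = 0; i < idCount; i++) {
                out[i] = dictionary[dictionaryIds[i]];
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // Buggy variant: stale IDs survive into a plain-encoded row group.
        ColumnReader buggy = new ColumnReader(4);
        buggy.nextRowGroup(new String[] {"x", "y"}, false);
        buggy.readIds(0, 1, 0);
        System.out.println(Arrays.toString(buggy.decode())); // [x, y, x]
        buggy.nextRowGroup(new String[0], false);
        try {
            buggy.decode();
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("stale IDs hit the new dictionary: " + e.getMessage());
        }

        // Reset variant: per-row-group state is cleared, so nothing stale is decoded.
        ColumnReader reset = new ColumnReader(4);
        reset.nextRowGroup(new String[] {"x", "y"}, true);
        reset.readIds(0, 1, 0);
        reset.nextRowGroup(new String[0], true);
        System.out.println(reset.decode().length); // 0
    }
}
{code}

Clearing per-row-group state when the dictionary changes is one plausible way to avoid the stale lookup; the report itself does not say what form the fix takes.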