[jira] [Commented] (HIVE-19015) Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q gets a ClassCastException

Haifeng Chen (JIRA) Wed, 09 May 2018 19:08:55 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469821#comment-16469821
 ]


Haifeng Chen commented on HIVE-19015:
-------------------------------------

[~vihangk1] This is the same nested complex type problem: nested complex types 
are not yet implemented in the Parquet vectorized reader. I will get this done 
together with HIVE-19016. 

Handling nested complex types will be much more complex than handling 
primitives, but there are fewer cases. My current thought is that for root 
columns that are primitive, or a List, Struct, or Map of primitives, we will 
keep the current implementation as the fast path. When we find a root column 
with nested complex types, we will use a tree reader that can handle the 
definition level and repetition level properly.
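
To make the proposed split concrete, here is a minimal sketch of how the 
choice between the two paths might be made. It is only an illustration of the 
rule described above, not the actual patch: ColumnType, ReaderChooser, and 
ReaderKind are hypothetical names and do not correspond to Hive's TypeInfo 
classes or to the existing vectorized column readers.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified type model (an assumption for illustration only;
// Hive's real type metadata lives in the serde2.typeinfo classes).
final class ColumnType {
    enum Kind { PRIMITIVE, LIST, STRUCT, MAP }

    final Kind kind;
    // Element type for LIST, field types for STRUCT, key and value for MAP.
    final List<ColumnType> children;

    private ColumnType(Kind kind, ColumnType... children) {
        this.kind = kind;
        this.children = Arrays.asList(children);
    }

    static ColumnType primitive() { return new ColumnType(Kind.PRIMITIVE); }
    static ColumnType list(ColumnType element) { return new ColumnType(Kind.LIST, element); }
    static ColumnType struct(ColumnType... fields) { return new ColumnType(Kind.STRUCT, fields); }
    static ColumnType map(ColumnType key, ColumnType value) { return new ColumnType(Kind.MAP, key, value); }
}

// Hypothetical dispatcher implementing the rule from the comment above.
final class ReaderChooser {
    enum ReaderKind { FAST_PATH, TREE_READER }

    static ReaderKind choose(ColumnType root) {
        // A primitive root column always stays on the existing fast path.
        if (root.kind == ColumnType.Kind.PRIMITIVE) {
            return ReaderKind.FAST_PATH;
        }
        // A List, Struct, or Map root stays on the fast path only if every
        // direct child is primitive; any complex child means nesting.
        for (ColumnType child : root.children) {
            if (child.kind != ColumnType.Kind.PRIMITIVE) {
                return ReaderKind.TREE_READER;
            }
        }
        return ReaderKind.FAST_PATH;
    }
}
```

Under this sketch, the map of arrays of ints from parquet_map_of_arrays_of_ints.q 
would be modeled as ColumnType.map(primitive, list(primitive)) and would be 
routed to the tree reader, while a plain list of ints would stay on the fast 
path.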

> Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q 
> gets a ClassCastException
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-19015
>                 URL: https://issues.apache.org/jira/browse/HIVE-19015
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 3.0.0
>            Reporter: Matt McCline
>            Assignee: Vihang Karajgaonkar
>            Priority: Critical
>
> Adding "SET hive.vectorized.execution.enabled=true;"  to 
> parquet_map_of_arrays_of_ints.q triggers this call stack:
> {noformat}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to 
> org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo
>       at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:67)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedMapColumnReader.readBatch(VectorizedMapColumnReader.java:57)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> {noformat}
> FYI: [~vihangk1]
> Adding parquet_map_of_maps.q, too.  Stack trace seems related.
> {noformat}
> Caused by: java.lang.ClassCastException: optional group value (MAP) {
>   repeated group key_value {
>     optional binary key (UTF8);
>     required int32 value;
>   }
> } is not primitive
>       at org.apache.parquet.schema.Type.asPrimitiveType(Type.java:213) 
> ~[parquet-hadoop-bundle-1.9.0.jar:1.9.0]
>       at 
> org.apache.hadoop.hive.ql.io.parquet.vector.BaseVectorizedColumnReader.<init>(BaseVectorizedColumnReader.java:130)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.<init>(VectorizedListColumnReader.java:52)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:568)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>       at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
