[ https://issues.apache.org/jira/browse/HIVE-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276620#comment-14276620 ]
Matt McCline commented on HIVE-9235:
------------------------------------
First issue (vectorization of Parquet):
VectorColumnAssignFactory.java's public static VectorColumnAssign[]
buildAssigners(VectorizedRowBatch outputBatch, Writable[] writables) is
missing cases for HiveCharWritable, HiveVarcharWritable, DateWritable, and
HiveDecimalWritable.
Example of the exception this causes:
{noformat}
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Unimplemented vector assigner for writable type class org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
    at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:136)
    at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:49)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347)
    ... 21 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unimplemented vector assigner for writable type class org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
    at org.apache.hadoop.hive.ql.exec.vector.VectorColumnAssignFactory.buildAssigners(VectorColumnAssignFactory.java:528)
    at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:127)
    ... 23 more
{noformat}
Added code to fix that.
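For reference, here is a minimal sketch of the kind of instanceof dispatch involved. It only illustrates branches for the four previously missing writable types and elides the branches the factory already handled; the ColumnAssign interface stub and the *Assign() helper names are placeholders I made up for illustration, not the actual VectorColumnAssignFactory internals or the contents of HIVE-9235.01.patch.
{noformat}
// Illustrative sketch only: shows the per-column dispatch over the sample
// Writable[] row, with the four previously missing types now handled.
// ColumnAssign and the *Assign() helpers are hypothetical placeholders.
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.serde2.io.DateWritable;
import org.apache.hadoop.hive.serde2.io.HiveCharWritable;
import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;
import org.apache.hadoop.hive.serde2.io.HiveVarcharWritable;
import org.apache.hadoop.io.Writable;

public class AssignerDispatchSketch {

  /** Stand-in for the per-column assigner the real factory builds. */
  interface ColumnAssign {
    void assign(VectorizedRowBatch batch, int batchIndex, Writable value);
  }

  static ColumnAssign[] buildAssigners(VectorizedRowBatch outputBatch,
      Writable[] writables) throws HiveException {
    ColumnAssign[] assigners = new ColumnAssign[writables.length];
    for (int i = 0; i < writables.length; i++) {
      Writable w = writables[i];
      if (w instanceof HiveCharWritable) {
        assigners[i] = charAssign(outputBatch, i);      // previously missing
      } else if (w instanceof HiveVarcharWritable) {
        assigners[i] = varcharAssign(outputBatch, i);   // previously missing
      } else if (w instanceof DateWritable) {
        assigners[i] = dateAssign(outputBatch, i);      // previously missing
      } else if (w instanceof HiveDecimalWritable) {
        assigners[i] = decimalAssign(outputBatch, i);   // previously missing
      } else {
        // The branch the stack trace above fell into before the fix.
        // (Existing branches for string/long/double/etc. are elided here.)
        throw new HiveException("Unimplemented vector assigner for writable type "
            + (w == null ? "null" : w.getClass().getName()));
      }
    }
    return assigners;
  }

  // Hypothetical helpers; the real assigners write into the batch's
  // BytesColumnVector, LongColumnVector and DecimalColumnVector columns.
  private static ColumnAssign charAssign(VectorizedRowBatch b, int col)    { return null; }
  private static ColumnAssign varcharAssign(VectorizedRowBatch b, int col) { return null; }
  private static ColumnAssign dateAssign(VectorizedRowBatch b, int col)    { return null; }
  private static ColumnAssign decimalAssign(VectorizedRowBatch b, int col) { return null; }
}
{noformat}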
Then I copied a half dozen vectorized q tests that use ORC tables and tried
converting them to use PARQUET, but encountered another issue in
*non-vectorized* mode. I was trying to establish base query outputs that I
could use to verify the vectorized query output. This showed that a basic
non-vectorized use of the CHAR data type wasn't working for PARQUET.
{noformat}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
    ... 10 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104)
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305)
    at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
    at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
    at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809)
    ... 16 more
{noformat}
I filed this problem as HIVE-9371: "Execution error for Parquet table and
GROUP BY involving CHAR data type".
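To make the mismatch concrete, here is a minimal sketch of the failure mode: the Parquet side hands back a plain Text for the CHAR column while the table's object inspector expects HiveCharWritable, so the unchecked cast inside copyObject() blows up. The copyChar() method below mirrors that pattern for illustration; it is not the actual WritableHiveCharObjectInspector code.
{noformat}
// Illustrative sketch only: why copyObject() throws for Parquet CHAR columns.
import org.apache.hadoop.hive.common.type.HiveChar;
import org.apache.hadoop.hive.serde2.io.HiveCharWritable;
import org.apache.hadoop.io.Text;

public class CharCastSketch {

  static HiveCharWritable copyChar(Object o) {
    // The unchecked cast that fails when o is really a Text:
    HiveCharWritable in = (HiveCharWritable) o;
    // (the real copyObject() then makes a defensive copy of the value)
    return new HiveCharWritable(in);
  }

  public static void main(String[] args) {
    // What a CHAR-aware reader hands over:
    copyChar(new HiveCharWritable(new HiveChar("abc", 10)));  // fine
    // What the Parquet record reader actually produced for the CHAR column:
    copyChar(new Text("abc"));  // java.lang.ClassCastException, as in HIVE-9371
  }
}
{noformat}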
At that point we concluded we should temporarily disable vectorization of
PARQUET, since there is only one existing test and it doesn't provide
complete coverage of the data types.
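A minimal sketch of what such a temporary switch could look like, assuming it amounts to refusing to vectorize plans whose input format is Parquet's MapredParquetInputFormat; the canVectorize() helper is purely illustrative and is not the Vectorizer code or the attached patch.
{noformat}
// Illustrative sketch only: a temporary "no vectorization for Parquet" guard.
import org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat;

public class ParquetVectorizationGuard {

  /** Falls back to the row-mode reader for Parquet tables. */
  static boolean canVectorize(String inputFormatClassName) {
    if (MapredParquetInputFormat.class.getName().equals(inputFormatClassName)) {
      return false;  // temporarily disabled until all data types are covered
    }
    return true;     // other formats keep whatever support they already had
  }

  public static void main(String[] args) {
    System.out.println(canVectorize(
        "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"));  // false
    System.out.println(canVectorize(
        "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"));                // true
  }
}
{noformat}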
FYI: [~hagleitn]
> Turn off Parquet Vectorization until all data types work: DECIMAL, DATE,
> TIMESTAMP, CHAR, and VARCHAR
> -----------------------------------------------------------------------------------------------------
>
> Key: HIVE-9235
> URL: https://issues.apache.org/jira/browse/HIVE-9235
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-9235.01.patch
>
>
> Title was: Make Parquet Vectorization of these data types work: DECIMAL,
> DATE, TIMESTAMP, CHAR, and VARCHAR
> Support for doing vector column assign is missing for some data types.