zhihai xu created HIVE-14564:
--------------------------------
Summary: Column Pruning generates out of order columns in
SelectOperator which cause ArrayIndexOutOfBoundsException.
Key: HIVE-14564
URL: https://issues.apache.org/jira/browse/HIVE-14564
Project: Hive
Issue Type: Bug
Components: Query Planning
Affects Versions: 2.1.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
Column Pruning generates out of order columns in SelectOperator which cause
ArrayIndexOutOfBoundsException.
{code}
2016-07-26 21:49:24,390 FATAL [main]
org.apache.hadoop.hive.ql.exec.mr.ExecMapper:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
processing row
{"_col0":null,"_col1":0,"_col2":36,"_col3":"499ec44-6dd2-4709-a019-33d6d484ed90�\u0001U5�\u001c��\t\u001b�\u0000","_col4":"5264db53-d650-4678-9261-cdd51efab8bb","_col5":"cb5233dd-214a-4b0b-b43e-0f41befb5c5c","_col6":"","_col8":48,"_col9":null,"_col10":"1befb5c5c�\u00192016-06-09T15:31:15+00:00\u0002\u0005Rider\u0011svc-dash","_col11":64,"_col12":null,"_col13":null,"_col14":"ber.com�\u0001U5ߨP�\u0001U5ᷨider)
-
1000\u0005Rider\[email protected]�\u0001U4�;x�\u0001U5\u0004��\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000","_col15":"","_col16":null}
at
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ArrayIndexOutOfBoundsException
at
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.io.Text.set(Text.java:225)
at
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48)
at
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
at
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
at
org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
at
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
at
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:550)
at
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:377)
... 13 more
{code}
The exception is because the serialization and deserialization doesn't match.
The serialization by LazyBinarySerDe from previous MapReduce job used different
order of columns. When the current MapReduce job deserialized the intermediate
sequence file generated by previous MapReduce job, it will get corrupted data
from the deserialization using wrong order of columns by LazyBinaryStruct. The
unmatched columns between serialization and deserialization caused by
SelectOperator's Column Pruning {{ColumnPrunerSelectProc}}.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)