zhihai xu created HIVE-16368:
--------------------------------
Summary: Unexpected java.lang.ArrayIndexOutOfBoundsException from
query with LateralView Operation for hive on MR.
Key: HIVE-16368
URL: https://issues.apache.org/jira/browse/HIVE-16368
Project: Hive
Issue Type: Bug
Components: Query Planning
Reporter: zhihai xu
Assignee: zhihai xu
A query with a LateralView operation throws an unexpected
java.lang.ArrayIndexOutOfBoundsException when run with hive-on-mr. The cause is
that column pruning changes the column order in the LateralView operation. For
back-to-back ReduceSink operators using the MR engine, a FileSinkOperator and a
TableScanOperator are added before the second ReduceSink operator. The
serialization column order used by the FileSinkOperator in LazyBinarySerDe of
the previous reducer differs from the deserialization column order, taken from
the table desc, used by the MapOperator/TableScanOperator in LazyBinarySerDe of
the current, failing mapper.
The serialization order is decided by the outputObjInspector from
LateralViewJoinOperator:
{code}
ArrayList<String> fieldNames = conf.getOutputInternalColNames();
outputObjInspector = ObjectInspectorFactory
.getStandardStructObjectInspector(fieldNames, ois);
{code}
So the column order for serialization is decided by getOutputInternalColNames
in LateralViewJoinOperator.
The deserialization order is decided by the TableScanOperator, which is created
in GenMapRedUtils.splitTasks:
{code}
TableDesc tt_desc = PlanUtils.getIntermediateFileTableDesc(PlanUtils
.getFieldSchemasFromRowSchema(parent.getSchema(), "temporarycol"));
// Create the temporary file, its corresponding FileSinkOperator, and
// its corresponding TableScanOperator.
TableScanOperator tableScanOp =
createTemporaryFile(parent, op, taskTmpDir, tt_desc, parseCtx);
{code}
So the column order for deserialization is decided by the rowSchema of
LateralViewJoinOperator.
However, ColumnPrunerLateralViewJoinProc changes the order of
outputInternalColNames but keeps the original order of rowSchema, which causes
the mismatch between serialization and deserialization across two back-to-back
MR jobs.
ColumnPrunerLateralViewForwardProc has a similar issue: it changes the column
order of its child select operator's colList but not of the rowSchema.
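The effect of writing in one column order and reading in another can be
illustrated with a small standalone Java sketch. It does not use any Hive
classes: the class ColumnOrderMismatchSketch and its serialize/deserialize
helpers are hypothetical stand-ins for a positional SerDe, with one name list
playing the role of outputInternalColNames (writer) and the other the role of
rowSchema (reader). In this toy version the values are only swapped; with a
binary format such as LazyBinarySerDe, reading bytes under the wrong column
type is presumably what surfaces as the ArrayIndexOutOfBoundsException.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnOrderMismatchSketch {

    // Writer side (hypothetical): emit values positionally, in the order of
    // the writer's column-name list (stands in for outputInternalColNames).
    static Object[] serialize(List<String> writerOrder, Map<String, Object> row) {
        Object[] out = new Object[writerOrder.size()];
        for (int i = 0; i < writerOrder.size(); i++) {
            out[i] = row.get(writerOrder.get(i));
        }
        return out;
    }

    // Reader side (hypothetical): assign names positionally from the reader's
    // column-name list (stands in for the table desc built from rowSchema).
    static Map<String, Object> deserialize(List<String> readerOrder, Object[] values) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < readerOrder.size(); i++) {
            row.put(readerOrder.get(i), values[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("_col0", 42);      // an int column
        row.put("_col1", "abc");   // a string column

        // rowSchema keeps the original order; pruning reordered the writer's list.
        List<String> rowSchemaOrder = Arrays.asList("_col0", "_col1");
        List<String> prunedOrder = Arrays.asList("_col1", "_col0");

        Object[] wire = serialize(prunedOrder, row);
        Map<String, Object> read = deserialize(rowSchemaOrder, wire);

        // The reader now sees the string under the int column and vice versa.
        System.out.println(read); // prints {_col0=abc, _col1=42}
    }
}
```

When both sides use the same name list, the round trip returns the original
row, which is why keeping rowSchema and the output column names in the same
order matters here.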