-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60432/
-----------------------------------------------------------
(Updated July 13, 2017, 3:05 p.m.)
Review request for hive and Sergio Pena.
Bugs: HIVE-16845
https://issues.apache.org/jira/browse/HIVE-16845
Repository: hive-git
Description
-------
The following steps lead to the NPE in the
ConditionalResolverMergeFiles.generateActualTasks method:
In the GenMapRedUtils.createCondTask method, the tasks for the merge, move and
"merge and move" use cases are created and set as task list to the
ConditionalWork. Originally the moveOnlyMoveTask and the mergeAndMoveMoveTask
was created from the same moveWork, which was the dummyWork created like this
in the createMRWorkForMergingFiles method:
MoveWork dummyMv = new MoveWork(null, null, null,
new LoadFileDesc(fsInputDesc.getFinalDirName(), finalName, true, null,
null), false);
Then in the ConditionalResolverMergeFiles.generateActualTasks method we get
these tasks and use them to create result "resTsks" list.
For the "merge and move" use case, the code looks like this:
if (toMove.size() > 0) {
resTsks.add(mrAndMvTask);
MoveWork mvWork = (MoveWork) mvTask.getWork();
LoadFileDesc lfd = mvWork.getLoadFileWork();
...
LoadMultiFilesDesc lmfd = new LoadMultiFilesDesc(toMove,
targetDirs, lfd.getIsDfsDir(), lfd.getColumns(),
lfd.getColumnTypes());
mvWork.setLoadFileWork(null);
mvWork.setLoadTableWork(null);
mvWork.setMultiFilesDesc(lmfd);
}
It adds the mrAndMvTask task to the resTsks list and modifies the move work to
move all necessary files in one-step. The mrAndMvTask contains a move task as
child task, which is the same as the mvWork work.
With the blobstore optimization on, the moveOnlyMoveTask task is created from a
different move work, not from the dummyMoveWork as before:
MoveWork workForMoveOnlyTask;
if (shouldMergeMovePaths) {
workForMoveOnlyTask = mergeMovePaths(condInputPath,
moveTaskToLink.getWork());
} else {
workForMoveOnlyTask = dummyMoveWork;
}
...
Task<? extends Serializable> mergeOnlyMergeTask =
TaskFactory.get(mergeWork, conf);
Task<? extends Serializable> moveOnlyMoveTask =
TaskFactory.get(workForMoveOnlyTask, conf);
Task<? extends Serializable> mergeAndMoveMergeTask =
TaskFactory.get(mergeWork, conf);
Task<? extends Serializable> mergeAndMoveMoveTask =
TaskFactory.get(dummyMoveWork, conf);
Because of this the mvWork in the
ConditionalResolverMergeFiles.generateActualTasks method will also be
different. It has the LoadTableDesc variable set and not the LoadFileDesc, that
causes the NPE.
When the blobstore optimization is on and the move work is changed, we should
use the child move task of the mrAndMvTask in the generateActualTasks method,
instead of the mvTask. Not just to avoid the NPE, but because this is the
correct move task for the "merge and move" use case.
Diffs (updated)
-----
itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions_merge_move.q
PRE-CREATION
itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions_merge_only.q
PRE-CREATION
itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions_move_only.q
PRE-CREATION
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions_merge_move.q.out
PRE-CREATION
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions_merge_only.q.out
PRE-CREATION
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions_move_only.q.out
PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java
4266569
Diff: https://reviews.apache.org/r/60432/diff/4/
Changes: https://reviews.apache.org/r/60432/diff/3-4/
Testing
-------
Created q tests for the move-only, merge-only and move-and-merge use cases. The
tests do an "insert overwiew" with blobstore optimizations on and off.
Thanks,
Marta Kuczora