> On June 26, 2017, 2:53 p.m., Barna Zsombor Klara wrote:
> > Thank you for the review Marta, I just had one comment.
> > Also do you think it makes sense to add tests for HoS as well or is this 
> > problem only MR related?

Thanks a lot for the review.
That's a good question. I don't think this issue is MR related, but at the 
moment we don't have blobstore q tests with Spark. We can create a new driver 
and tests, but it is a bigger work, so I think it would make sense to do it in 
the scope of an other Jira.


> On June 26, 2017, 2:53 p.m., Barna Zsombor Klara wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java
> > Lines 304 (patched)
> > <https://reviews.apache.org/r/60432/diff/1/?file=1762840#file1762840line304>
> >
> >     If this all depends on the *hive.blobstore.optimizations.enabled* could 
> > we use the HiveConf object and check that value instead of introducing a 
> > boolean instance variable?

It doesn't just depend on the hive.blobstore.optimizations.enabled parameter. 
There are other checks in the GenMapRedUtils.shouldMergeMovePaths method to 
decide whether or not the move work should be changed. Maybe I should rephrase 
the comment on my change.


- Marta


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60432/#review178891
-----------------------------------------------------------


On June 26, 2017, 1:56 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60432/
> -----------------------------------------------------------
> 
> (Updated June 26, 2017, 1:56 p.m.)
> 
> 
> Review request for hive and Sergio Pena.
> 
> 
> Bugs: HIVE-16845
>     https://issues.apache.org/jira/browse/HIVE-16845
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The following steps lead to the NPE in the 
> ConditionalResolverMergeFiles.generateActualTasks method:
> 
> In the GenMapRedUtils.createCondTask method, the tasks for the merge, move 
> and "merge and move" use cases are created and set as task list to the 
> ConditionalWork. Originally the moveOnlyMoveTask and the mergeAndMoveMoveTask 
> was created from the same moveWork, which was the dummyWork created like this 
> in the createMRWorkForMergingFiles method:
> 
>     MoveWork dummyMv = new MoveWork(null, null, null,
>          new LoadFileDesc(fsInputDesc.getFinalDirName(), finalName, true, 
> null, null), false);
> 
> 
> Then in the ConditionalResolverMergeFiles.generateActualTasks method we get 
> these tasks and use them to create result "resTsks" list.
> 
> For the "merge and move" use case, the code looks like this:
> 
>     if (toMove.size() > 0) {
>         resTsks.add(mrAndMvTask);
> 
>         MoveWork mvWork = (MoveWork) mvTask.getWork();
>         LoadFileDesc lfd = mvWork.getLoadFileWork();
>         
>         ...
>         
>         LoadMultiFilesDesc lmfd = new LoadMultiFilesDesc(toMove,
>             targetDirs, lfd.getIsDfsDir(), lfd.getColumns(), 
> lfd.getColumnTypes());
>         mvWork.setLoadFileWork(null);
>         mvWork.setLoadTableWork(null);
>         mvWork.setMultiFilesDesc(lmfd);
>       }
> 
> It adds the mrAndMvTask task to the resTsks list and modifies the move work 
> to move all necessary files in one-step. The mrAndMvTask contains a move task 
> as child task, which is the same as the mvWork work. 
> 
> With the blobstore optimization on, the moveOnlyMoveTask task is created from 
> a different move work, not from the dummyMoveWork as before:
> 
>     MoveWork workForMoveOnlyTask;
>     if (shouldMergeMovePaths) {
>       workForMoveOnlyTask = mergeMovePaths(condInputPath, 
> moveTaskToLink.getWork());
>     } else {
>       workForMoveOnlyTask = dummyMoveWork;
>     }
> 
>     ...
> 
>     Task<? extends Serializable> mergeOnlyMergeTask = 
> TaskFactory.get(mergeWork, conf);
>     Task<? extends Serializable> moveOnlyMoveTask = 
> TaskFactory.get(workForMoveOnlyTask, conf);
>     Task<? extends Serializable> mergeAndMoveMergeTask = 
> TaskFactory.get(mergeWork, conf);
>     Task<? extends Serializable> mergeAndMoveMoveTask = 
> TaskFactory.get(dummyMoveWork, conf);
> 
> Because of this the mvWork in the 
> ConditionalResolverMergeFiles.generateActualTasks method will also be 
> different. It has the LoadTableDesc variable set and not the LoadFileDesc, 
> that causes the NPE.
> 
> When the blobstore optimization is on and the move work is changed, we should 
> use the child move task of the mrAndMvTask in the generateActualTasks method, 
> instead of the mvTask. Not just to avoid the NPE, but because this is the 
> correct move task for the "merge and move" use case.
> 
> 
> Diffs
> -----
> 
>   
> itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions_merge_move.q
>  PRE-CREATION 
>   
> itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions_merge_only.q
>  PRE-CREATION 
>   
> itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions_move_only.q
>  PRE-CREATION 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions_merge_move.q.out
>  PRE-CREATION 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions_merge_only.q.out
>  PRE-CREATION 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions_move_only.q.out
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 88bf829 
>   
> ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 
> 4266569 
> 
> 
> Diff: https://reviews.apache.org/r/60432/diff/1/
> 
> 
> Testing
> -------
> 
> Created q tests for the move-only, merge-only and move-and-merge use cases. 
> The tests do an "insert overwiew" with blobstore optimizations on and off.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>

Reply via email to