> On June 5, 2018, 3:21 a.m., Gunther Hagleitner wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CommonMergeJoinOperator.java
> > Lines 172 (patched)
> > <https://reviews.apache.org/r/67296/diff/3/?file=2034850#file2034850line172>
> >
> >     It seems you fixed the same problem twice now. Once by fixing the close 
> > logic, and a second time with this. Did the close logic by itself not 
> > suffice?
> >     
> >     I prefer the fix in the join operator to be honest. For multiple 
> > reasons:
> >     
> >     a) This is a lot of new code.
> >     b) The code assumes a lot about surrounding operators, that can easily 
> > break when you add new code paths.
> >     c) Fixing it in the group by operator seems wrong. What if other 
> > operators flush on close? PTF? other joins? This seems brittle.
> >     
> >     Can you go back to the original exec fix? Was there an issue with it?

The fix in join op is not sufficient. I have multiple scenarios in which it 
broke that too with a lot of additional fixes (See Patch 11). The reason being, 
the join op is written with shuffle join in mind first where it is the top 
operator in the Reducer. If it pulls a row, it gets it immediately, whereas a 
reduce side SMB does not. A parent GBY Op has it and it depends on reading the 
next group for pushing the current group. This behavior differs from regular 
shuffle. The numerous fixes are just patching one case at a time which makes 
this code really ugly, bug prone and inefficient due to too much branching.

Inorder to fix this properly in Join Op, the correct fix would be to go back to 
drawing board and rewrite the entire thing.

a) The new code is very specific to SMB on reduce side and only applies to it. 
This makes sure it mimics the state machine assumed by shuffle join. Infact, I 
think the fix in close() may not be even needed, but I need to verify that.
b) The assumptions are very rigid. A lot of conditions have to be met inorder 
to set reduceSMB true, however, I think, we can do this at compile time when 
SMB Op is created right after DummyOp is created. If a new code path is added, 
it has to make sure existing ones don't break.
c) The rows are forwarded as they are read. Once a row is pushed out of GBY, it 
is pushed out until it reaches JoinOp. Good point about PTF, I think we need to 
do similar thing in PTF as well. There is no other known case for reduce side 
SMB.

As mentioned above, the fix before this is a giant pool of patches which can 
run clean on ptests but can break on very simple and small tests which I plan 
to add once this code is green lit.


> On June 5, 2018, 3:21 a.m., Gunther Hagleitner wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CommonMergeJoinOperator.java
> > Lines 313 (patched)
> > <https://reviews.apache.org/r/67296/diff/3/?file=2034850#file2034850line319>
> >
> >     introducing method calls in the inner loop can have negative perf 
> > implications, are you sure this won't hurt?

Thanks for pointing this out, its a remnant of the earlier patch which needs to 
go away.


> On June 5, 2018, 3:21 a.m., Gunther Hagleitner wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
> > Lines 733 (patched)
> > <https://reviews.apache.org/r/67296/diff/3/?file=2034851#file2034851line733>
> >
> >     this adds another branch in the inner loop also. might have perf 
> > implications.

Sure, will find a way to avoid it.


> On June 5, 2018, 3:21 a.m., Gunther Hagleitner wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
> > Lines 906 (patched)
> > <https://reviews.apache.org/r/67296/diff/3/?file=2034851#file2034851line906>
> >
> >     see other comment. this is a lot of new code - and unnecessary if 
> > you've already fixed it in join.

Explained in 1st comment.


> On June 5, 2018, 3:21 a.m., Gunther Hagleitner wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java
> > Lines 614 (patched)
> > <https://reviews.apache.org/r/67296/diff/3/?file=2034852#file2034852line614>
> >
> >     I still don't think this code should be here. This seems to be doing 
> > the exact same thing as the "checkColEquality" below. If not can you tell 
> > me how this is different and why the calls below don't suffice? Otherwise 
> > let's remove this?

I will see if this code can be removed.


> On June 5, 2018, 3:21 a.m., Gunther Hagleitner wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java
> > Lines 265 (patched)
> > <https://reviews.apache.org/r/67296/diff/3/?file=2034853#file2034853line265>
> >
> >     I don't think this comment should be here. In this class you should be 
> > explaining what the rules are not what certain situations you ran in on the 
> > execution side. Can you please remove?

I will rewrite it.


> On June 5, 2018, 3:21 a.m., Gunther Hagleitner wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java
> > Lines 299 (patched)
> > <https://reviews.apache.org/r/67296/diff/3/?file=2034853#file2034853line299>
> >
> >     This is a nit, feel free to ignore: I don't think it's necessarily a 
> > "parent gby" that creates the bucketing. Just drop after "than".

Sure, thanks.


- Deepak


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67296/#review204301
-----------------------------------------------------------


On June 4, 2018, 5:38 a.m., Deepak Jaiswal wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67296/
> -----------------------------------------------------------
> 
> (Updated June 4, 2018, 5:38 a.m.)
> 
> 
> Review request for hive, Gunther Hagleitner and Jason Dere.
> 
> 
> Bugs: HIVE-18875
>     https://issues.apache.org/jira/browse/HIVE-18875
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Fixed various issues with SMB, mostly on the Reducer side join.
> GBY Op now uses inputObjectInspector[0] all the time as it is the only OI it 
> has. The tag is irrelevant here. Was causing problem with SMB.
> Disabled SMB in spark on hive tests as the same config for Tez was enabling 
> it there.
> Some SMB specific tests were designed to first run without SMB and then with 
> SMB. With SMB enabled by default, it is explicitely turned off to make sure 
> the behavior is maintained.
> 
> Please go through JIRA comments as they may clear out some questions.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 3295d1dbc5 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/CommonMergeJoinOperator.java 
> aefaa0586e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 4b766382ef 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
> 4019f132d3 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java
>  9e5446566b 
>   ql/src/test/queries/clientpositive/auto_sortmerge_join_11.q 7416eb0ec0 
>   ql/src/test/queries/clientpositive/skewjoinopt19.q 02cadda7f5 
>   ql/src/test/queries/clientpositive/skewjoinopt20.q 160e5b82d9 
>   ql/src/test/queries/clientpositive/smb_mapjoin_11.q 6ce49b83c2 
>   ql/src/test/queries/clientpositive/smb_mapjoin_12.q 753e4d3c9a 
>   ql/src/test/queries/clientpositive/smb_mapjoin_17.q d68f5f3139 
>   ql/src/test/queries/clientpositive/subquery_notin.q 64940277bb 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer2.q.out 
> 0f839ead0e 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer6.q.out 
> 499ef4b178 
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out 0c339e5c8f 
>   ql/src/test/results/clientpositive/llap/limit_pushdown.q.out 76fae9a152 
>   ql/src/test/results/clientpositive/llap/mergejoin.q.out 832ed487ec 
>   ql/src/test/results/clientpositive/llap/mrr.q.out 737c73893f 
>   ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out 
> 66460271b4 
>   ql/src/test/results/clientpositive/llap/smb_cache.q.out 7c885d1ffa 
>   ql/src/test/results/clientpositive/llap/smb_mapjoin_14.q.out c334b9386b 
>   ql/src/test/results/clientpositive/llap/smb_mapjoin_15.q.out 21aac455f2 
>   ql/src/test/results/clientpositive/llap/smb_mapjoin_4.q.out 4b8728fbff 
>   ql/src/test/results/clientpositive/llap/smb_mapjoin_5.q.out a1313696f0 
>   ql/src/test/results/clientpositive/llap/smb_mapjoin_6.q.out f44a0dbc70 
>   ql/src/test/results/clientpositive/llap/subquery_in_having.q.out c9956121f8 
>   ql/src/test/results/clientpositive/llap/subquery_notin.q.out d72e8c349c 
>   ql/src/test/results/clientpositive/llap/vectorized_bucketmapjoin1.q.out 
> 61c5051bb9 
>   ql/src/test/results/clientpositive/spark/bucketmapjoin1.q.out a79a8c466a 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_14.q.out 1fd4490ac4 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_15.q.out 6ca577fdbb 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_4.q.out 629a6c428a 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_5.q.out 7d0934010e 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_6.q.out 7445135159 
>   ql/src/test/results/clientpositive/spark/subquery_notin.q.out ea473c3b40 
> 
> 
> Diff: https://reviews.apache.org/r/67296/diff/3/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>

Reply via email to