[ 
https://issues.apache.org/jira/browse/HIVE-18875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488047#comment-16488047
 ] 

Deepak Jaiswal commented on HIVE-18875:
---------------------------------------

[~hagleitn] Thanks for looking into the patch. Your questions are in italics.

 _The join operator should not have to check whether the parent is a group by - 
that seems brittle. Can we always force the logic to close other branches to 
flush out remaining records? If we introduce any other blocking operators the 
same logic has to apply, right?_

 

I agree it is brittle. The check was put there to establish that the join is on 
the reducer side, so that we don't execute this logic otherwise. I think it 
will be simpler to set a flag somewhere, or to use some other existing 
information, to establish the same thing.
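
As a rough illustration of why the type check is brittle, here is a minimal 
Java sketch. It is not Hive code: the class ReducerSideFlagSketch, the nested 
stand-in operator types, JoinContext, and the onReducerSide flag are all 
hypothetical names chosen for this example. It only assumes what is said above, 
namely that the check exists to detect the reducer side.

```java
public class ReducerSideFlagSketch {
    // Hypothetical stand-ins for operator types; not the real Hive classes.
    interface Operator {}
    static class GroupByOperator implements Operator {}
    static class OtherBlockingOperator implements Operator {}

    // Brittle variant: infers "reducer side" from the parent's concrete type.
    static boolean shouldFlushByParentType(Operator parent) {
        return parent instanceof GroupByOperator;
    }

    // Hypothetical context object carrying an explicit reducer-side flag.
    static class JoinContext {
        final boolean onReducerSide;

        JoinContext(boolean onReducerSide) {
            this.onReducerSide = onReducerSide;
        }
    }

    // Flag-based variant: does not depend on which operator is the parent.
    static boolean shouldFlushByFlag(JoinContext ctx) {
        return ctx.onReducerSide;
    }

    public static void main(String[] args) {
        // A new blocking operator on the reducer side is missed by the type check.
        Operator parent = new OtherBlockingOperator();
        JoinContext ctx = new JoinContext(true);
        System.out.println(shouldFlushByParentType(parent)); // prints "false"
        System.out.println(shouldFlushByFlag(ctx));          // prints "true"
    }
}
```

With the flag, adding another blocking operator would not require touching the 
join operator, which is the concern raised in the question.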

 

 _Not using the tag in the group by operator (hard code to 0) seems wrong, why 
is that the correct thing to do?_

 

The tag is irrelevant in GBY. SMB is the only use case for the tag as of now. 
GBY always has exactly one OI (ObjectInspector), while SMB may send tag 1 or 
larger, which causes an ArrayIndexOutOfBoundsException.
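
The failure mode can be shown with a minimal Java sketch. It is not the actual 
GroupByOperator code: the class TagIndexSketch, its string placeholders, and 
its method names are hypothetical. It assumes only what is stated above, that 
there is a single-element array of inspectors and that an incoming tag may be 
1 or larger.

```java
public class TagIndexSketch {
    // Hypothetical stand-in for the group-by's inspectors: exactly one entry.
    static final String[] INPUT_INSPECTORS = {"inspector-for-input-0"};

    // Indexing by the incoming tag is out of bounds for any tag >= 1.
    static String lookupByTag(int tag) {
        return INPUT_INSPECTORS[tag];
    }

    // Ignoring the tag always uses the single inspector at index 0.
    static String lookupIgnoringTag(int tag) {
        return INPUT_INSPECTORS[0];
    }

    // Reports whether indexing by the given tag throws.
    static boolean throwsOutOfBounds(int tag) {
        try {
            lookupByTag(tag);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsOutOfBounds(0));  // prints "false"
        System.out.println(throwsOutOfBounds(1));  // prints "true"
        System.out.println(lookupIgnoringTag(1));  // prints "inspector-for-input-0"
    }
}
```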

 

 _Why are you turning sortmerge join conversion off explicitly in some test 
files? Can you explain + add comment there?_

 

Most of those tests already test SMB. I structured them so that each first runs 
the query without SMB and then with SMB. Since SMB is now on by default, it 
has to be turned off explicitly for the non-SMB sections.

subquery_notin.q had a typo in it, which I fixed after discussing it with 
[~vgarg]. The query that ran and the explain before it were different.

 _If I read this right, then your new check in ConvertJoinMapjoin basically 
makes sure that there is no projection in between gby and join that would alter 
bucketing or sorting. That is exactly what op traits are for - why can't we use 
that in this case?_

That would be the ideal case. It looks like op traits in their current form are 
not sufficient to handle all SMB cases. Maybe I can do it as a follow-up?

> Enable SMB Join by default in Tez
> ---------------------------------
>
>                 Key: HIVE-18875
>                 URL: https://issues.apache.org/jira/browse/HIVE-18875
>             Project: Hive
>          Issue Type: Task
>            Reporter: Deepak Jaiswal
>            Assignee: Deepak Jaiswal
>            Priority: Major
>         Attachments: HIVE-18875.1.patch, HIVE-18875.2.patch, 
> HIVE-18875.3.patch, HIVE-18875.4.patch, HIVE-18875.5.patch, HIVE-18875.6.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
