[ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723407#comment-13723407
 ] 

Yin Huai commented on HIVE-4952:
--------------------------------

To fix this bug, Demux will be modified to be aware that the rows associated 
with a key are ordered by tag. When Demux sees a row with a new tag arrive, it 
will know that all rows with smaller tags have been received and can be processed.

Taking the example in the description: with this fix, the inputs of JOIN2 will 
be ordered by tag. When Demux sees a row with tag 1, it will ask GBY to process 
its buffer, and GBY will in turn ask JOIN1 to process its buffer. As a result, 
all rows with tag 0 will be forwarded to JOIN2 before Demux forwards the first 
row with tag 1 to JOIN2.
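
The flush-on-new-tag behavior described above can be sketched as a small 
simulation. This is only a toy model under stated assumptions, not Hive's 
Java code: the class names BufferingBranch and Demux, the end_group method, 
and the row-count "aggregate" are all hypothetical stand-ins chosen for 
illustration. Tag 0 is routed through a buffering branch (standing in for 
JOIN1 -> GBY) and tag 1 is passed straight through to JOIN2's input.

```python
class BufferingBranch:
    """Hypothetical stand-in for the JOIN1 -> GBY branch: holds rows until flushed."""

    def __init__(self, out_tag, sink):
        self.out_tag = out_tag
        self.sink = sink
        self.buffer = []

    def process(self, row):
        self.buffer.append(row)

    def flush(self):
        if self.buffer:
            # Toy aggregate: one output row carrying the number of buffered rows.
            self.sink.append((self.out_tag, len(self.buffer)))
            self.buffer = []


class Demux:
    """Hypothetical Demux that relies on rows of a key arriving ordered by tag."""

    def __init__(self, branches, sink):
        self.branches = branches  # tag -> BufferingBranch; other tags pass through
        self.sink = sink
        self.last_tag = None

    def process(self, tag, row):
        if self.last_tag is not None and tag > self.last_tag:
            # A larger tag means every smaller tag is complete for this key,
            # so branches handling smaller tags can process their buffers now.
            for t in sorted(self.branches):
                if t < tag:
                    self.branches[t].flush()
        self.last_tag = tag
        if tag in self.branches:
            self.branches[tag].process(row)
        else:
            self.sink.append((tag, row))

    def end_group(self):
        # End of the key group: flush whatever is still buffered.
        for t in sorted(self.branches):
            self.branches[t].flush()
        self.last_tag = None


join2_input = []  # rows as JOIN2 would receive them, in arrival order
demux = Demux({0: BufferingBranch(0, join2_input)}, join2_input)
for tag, row in [(0, "a"), (0, "b"), (1, "c")]:
    demux.process(tag, row)
demux.end_group()
print(join2_input)
```

In this sketch the tag-0 output reaches join2_input before the tag-1 row, 
which is the ordering that JoinOperator expects.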
                
> When hive.join.emit.interval is small, queries optimized by Correlation 
> Optimizer may generate wrong results
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-4952
>                 URL: https://issues.apache.org/jira/browse/HIVE-4952
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>         Attachments: replay.txt
>
>
> If we have a query like this ...
> {code:sql}
> SELECT xx.key, xx.cnt, yy.key
> FROM
> (SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = 
> y.key) group by x.key) xx
> JOIN src yy
> ON xx.key=yy.key;
> {code}
> After the Correlation Optimizer is applied, the operator tree in the reducer will be 
> {code}
>      JOIN2
>        |
>        |
>       MUX
>      /   \
>     /     \
>    GBY     |
>     |      |
>   JOIN1    |
>     \     /
>      \   /
>      DEMUX
> {code}
> For JOIN2, the right table will arrive at this operator first. If 
> hive.join.emit.interval is small, e.g. 1, JOIN2 will output results even 
> though it has not yet received any row from the left table. The logic related 
> to hive.join.emit.interval in JoinOperator assumes that inputs are ordered 
> by tag. However, if a query has been optimized by the Correlation Optimizer, 
> this assumption may not hold for the JoinOperators inside the reducer.
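
The failure mode in the quoted description can be illustrated with a toy 
inner join. This is a simplified sketch, not Hive's JoinOperator: the 
function toy_join and its early-emit rule are assumptions made for 
illustration. Tag 0 is the left input, tag 1 is the right input, and the 
join emits as soon as emit_interval right-side rows have been seen, on the 
assumption that all tag-0 rows for the key are already buffered.

```python
def toy_join(rows, emit_interval):
    """rows: iterable of (tag, value); tag 0 = left input, tag 1 = right input."""
    left, right, out = [], [], []
    for tag, value in rows:
        if tag == 0:
            left.append(value)
            continue
        right.append(value)
        if len(right) >= emit_interval:
            # Early emit: assumes every tag-0 row for this key is already buffered.
            out.extend((l, r) for l in left for r in right)
            right.clear()
    # End of the key group: join whatever is still pending.
    out.extend((l, r) for l in left for r in right)
    return out


ordered = [(0, "xx"), (1, "yy")]    # inputs ordered by tag
unordered = [(1, "yy"), (0, "xx")]  # right table arrives first

print(toy_join(ordered, 1))
print(toy_join(unordered, 1))
print(toy_join(unordered, 1000))
```

With emit_interval set to 1, the unordered input loses the matching pair in 
this toy model, while a large interval hides the problem because nothing is 
emitted until the end of the key group.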
