[ https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yin Huai updated HIVE-4952:
---------------------------
Status: Patch Available (was: Open)
> When hive.join.emit.interval is small, queries optimized by Correlation
> Optimizer may generate wrong results
> ------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-4952
> URL: https://issues.apache.org/jira/browse/HIVE-4952
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.12.0
> Reporter: Yin Huai
> Assignee: Yin Huai
> Attachments: HIVE-4952.D11889.1.patch, replay.txt
>
>
> Consider a query like this:
> {code:sql}
> SELECT xx.key, xx.cnt, yy.key
> FROM
>   (SELECT x.key AS key, count(1) AS cnt
>    FROM src1 x JOIN src1 y ON (x.key = y.key)
>    GROUP BY x.key) xx
> JOIN src yy
> ON xx.key = yy.key;
> {code}
> After the Correlation Optimizer is applied, the operator tree in the reducer is:
> {code}
>       JOIN2
>         |
>         |
>        MUX
>       /   \
>      /     \
>    GBY      |
>     |       |
>   JOIN1     |
>      \     /
>       \   /
>       DEMUX
> {code}
> For JOIN2, rows of the right table (yy) arrive first: they go from DEMUX
> straight to MUX, whereas JOIN2's left input has to be produced by JOIN1 and
> GBY. If hive.join.emit.interval is small, e.g. 1, JOIN2 will emit results even
> though it has not yet received any row from the left table, which can produce
> wrong results. The logic related to hive.join.emit.interval in JoinOperator
> assumes that input rows are ordered by tag. However, if a query has been
> optimized by the Correlation Optimizer, this assumption may not hold for the
> JoinOperators inside the reducer.
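The tag-ordering assumption can be illustrated with a deliberately simplified model. The Java sketch below is hypothetical: `EmitIntervalDemo`, `SimpleJoin`, `process`, `endGroup`, and `run` are illustrative names, not Hive's JoinOperator API. It models only a two-way inner join over a single key group, and it assumes the emit rule works roughly like this: when a row of the last alias (the right table) arrives and that alias has already buffered `emitInterval` rows, the operator joins whatever is buffered and then drops the right-side rows. That is safe only if every left row for the key is already buffered, i.e. if input is ordered by tag.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of early emission controlled by
 * hive.join.emit.interval. This is NOT Hive's JoinOperator code; it models a
 * two-way inner join (tag 0 = left, tag 1 = right) for a single key group.
 */
public class EmitIntervalDemo {

    static final class SimpleJoin {
        private final int emitInterval;
        private final List<String> left = new ArrayList<>();   // rows with tag 0
        private final List<String> right = new ArrayList<>();  // rows with tag 1 (last alias)
        private final List<String> output = new ArrayList<>();

        SimpleJoin(int emitInterval) {
            this.emitInterval = emitInterval;
        }

        void process(String row, int tag) {
            if (tag == 0) {
                left.add(row);
                return;
            }
            // Assumed emit rule: once the last alias has buffered emitInterval
            // rows, join what is buffered and drop the last alias's rows.
            // Only correct if every left row for this key is already buffered.
            if (right.size() == emitInterval) {
                emit();
                right.clear();
            }
            right.add(row);
        }

        void endGroup() {
            emit();
            left.clear();
            right.clear();
        }

        // Inner join of the buffered rows: their cross product.
        private void emit() {
            for (String l : left) {
                for (String r : right) {
                    output.add(l + "|" + r);
                }
            }
        }

        int outputCount() {
            return output.size();
        }
    }

    /** Feeds one key group with the given tag sequence; returns the join row count. */
    static int run(int[] tags, int emitInterval) {
        SimpleJoin join = new SimpleJoin(emitInterval);
        for (int i = 0; i < tags.length; i++) {
            join.process("row" + i, tags[i]);
        }
        join.endGroup();
        return join.outputCount();
    }

    public static void main(String[] args) {
        // One key with 1 left row and 3 right rows: the correct result has 3 rows.
        int[] orderedByTag = {0, 1, 1, 1};
        int[] rightFirst = {1, 1, 1, 0}; // right rows first, as at JOIN2 in the plan above
        System.out.println(run(orderedByTag, 1));  // 3
        System.out.println(run(rightFirst, 1));    // 1: two right rows were dropped
        System.out.println(run(rightFirst, 1000)); // 3: no early emission
    }
}
```

With an interval of 1, the right-first ordering loses two of the three join rows because they are joined against an empty left buffer and then discarded, while the same input with a large interval (1000 in the sketch) never triggers early emission and yields all three. Under this model, correctness depends on tag order only when the interval is small enough to fire before the group ends, which matches the symptom described in the issue.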
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira