[
https://issues.apache.org/jira/browse/HIVE-24199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Krisztian Kasa reassigned HIVE-24199:
-------------------------------------
> Incorrect result when subquey in exists contains limit
> ------------------------------------------------------
>
> Key: HIVE-24199
> URL: https://issues.apache.org/jira/browse/HIVE-24199
> Project: Hive
> Issue Type: Bug
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
>
> {code:java}
> create table web_sales (ws_order_number int, ws_warehouse_sk int) stored as
> orc;
> insert into web_sales values
> (1, 1),
> (1, 2),
> (2, 1),
> (2, 2);
> select * from web_sales ws1
> where exists (select 1 from web_sales ws2 where ws1.ws_order_number =
> ws2.ws_order_number limit 1);
> 1 1
> 1 2
> {code}
> {code:java}
> CBO PLAN:
> HiveSemiJoin(condition=[=($0, $2)], joinType=[semi])
> HiveProject(ws_order_number=[$0], ws_warehouse_sk=[$1])
> HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[default, web_sales]], table:alias=[ws1])
> HiveProject(ws_order_number=[$0])
> HiveSortLimit(fetch=[1]) <-- This shouldn't be added
> HiveProject(ws_order_number=[$0])
> HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[default, web_sales]], table:alias=[ws2])
> {code}
> Limit n on the right side of the join reduces the result set coming from the
> right to only n record hence not all the ws_order_number values are included
> which leads to correctness issue.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)