[ https://issues.apache.org/jira/browse/HIVE-23230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adesh Kumar Rao updated HIVE-23230:
-----------------------------------
Description: 
Issue: Running the query {noformat}select * from <table> limit n{noformat} from Spark via the Hive Warehouse Connector may return more rows than "n". This happens because the "get_splits" UDF creates splits ignoring the limit constraint. These splits, when submitted to multiple LLAP daemons, will return up to "n" rows each.

How to reproduce: Needs spark-shell, hive-warehouse-connector, and Hive on LLAP with more than 1 LLAP daemon running.

Run the commands below via beeline to create and populate the table:
{noformat}
create table test (id int);
insert into table test values (1);
insert into table test values (2);
insert into table test values (3);
insert into table test values (4);
insert into table test values (5);
insert into table test values (6);
insert into table test values (7);
delete from test where id = 7;
{noformat}

Now running the query below via spark-shell
{noformat}
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.executeQuery("select * from test limit 1").show()
{noformat}
will return more than 1 row.

was:
The issue is reproducible when the number of LLAP daemons is greater than 1.

How to reproduce:
Run the commands below via beeline to create and populate the table:
{noformat}
create table test (id int);
insert into table test values (1);
insert into table test values (2);
insert into table test values (3);
insert into table test values (4);
insert into table test values (5);
insert into table test values (6);
insert into table test values (7);
delete from test where id = 7;
{noformat}

Now running the query below via spark-shell
{noformat}
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.executeQuery("select * from test limit 1").show()
{noformat}
will return more than 1 row.
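The failure mode described above can be illustrated with a minimal Python sketch. This is not Hive's actual split-generation code; it only models the reported behavior: the limit is pushed down into each split rather than applied once over the combined result, so with multiple LLAP daemons the union can contain up to n * num_splits rows.

```python
# Minimal sketch of the reported bug (hypothetical model, not Hive code):
# each split (LLAP daemon) applies LIMIT n locally, so the combined
# result can hold up to n * num_splits rows instead of n.

def run_split(rows, n):
    """Each daemon returns at most n rows from its own split."""
    return rows[:n]

# Table rows partitioned into 3 splits, as if served by 3 LLAP daemons.
splits = [[1, 2, 3], [4, 5], [6]]
n = 1  # the query asked for LIMIT 1

combined = [row for split in splits for row in run_split(split, n)]
print(len(combined))  # prints 3: one row per split, not 1 overall
```

A correct plan would either generate a single split for a limited query or apply the limit again after merging the per-split results.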
> "get_splits" udf ignores limit constraint while creating splits
> ---------------------------------------------------------------
>
>                 Key: HIVE-23230
>                 URL: https://issues.apache.org/jira/browse/HIVE-23230
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 3.1.0
>            Reporter: Adesh Kumar Rao
>            Assignee: Adesh Kumar Rao
>            Priority: Major
>         Attachments: HIVE-23230.1.patch, HIVE-23230.patch
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)