[ https://issues.apache.org/jira/browse/HIVE-23230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adesh Kumar Rao updated HIVE-23230:
-----------------------------------
    Status: Patch Available  (was: Open)

> "get_splits" udf ignores limit constraint while creating splits
> ---------------------------------------------------------------
>
>                 Key: HIVE-23230
>                 URL: https://issues.apache.org/jira/browse/HIVE-23230
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 3.1.0
>            Reporter: Adesh Kumar Rao
>            Assignee: Adesh Kumar Rao
>            Priority: Major
>         Attachments: HIVE-23230.1.patch, HIVE-23230.2.patch, HIVE-23230.patch
>
> Issue: Running the query {noformat}select * from <table> limit n{noformat} from Spark via the Hive Warehouse Connector may return more than "n" rows.
> This happens because the "get_splits" UDF creates splits while ignoring the limit constraint. When these splits are submitted to multiple LLAP daemons, each daemon returns up to "n" rows, so the combined result can contain more than "n" rows.
> How to reproduce: requires spark-shell, the Hive Warehouse Connector, and Hive on LLAP with more than one LLAP daemon running.
> Run the commands below via beeline to create and populate the table:
> {noformat}
> create table test (id int);
> insert into table test values (1);
> insert into table test values (2);
> insert into table test values (3);
> insert into table test values (4);
> insert into table test values (5);
> insert into table test values (6);
> insert into table test values (7);
> delete from test where id = 7;
> {noformat}
> Now running the query below via spark-shell
> {noformat}
> import com.hortonworks.hwc.HiveWarehouseSession
> val hive = HiveWarehouseSession.session(spark).build()
> hive.executeQuery("select * from test limit 1").show()
> {noformat}
> will return more than 1 row.
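
A minimal client-side sketch, assuming HiveWarehouseSession.executeQuery returns a standard Spark DataFrame: re-applying the limit on the Spark side trims the combined output of all splits back to the requested row count, which masks the symptom even though split generation itself still ignores the limit.

{noformat}
import com.hortonworks.hwc.HiveWarehouseSession

val hive = HiveWarehouseSession.session(spark).build()

// Re-apply the limit on the Spark side: each LLAP daemon may still return
// up to 1 row for its split, but limit(1) trims the union of those per-split
// results back to a single row before it is shown.
val rows = hive.executeQuery("select * from test limit 1")
rows.limit(1).show()
{noformat}

The attached patches presumably address the underlying issue so that get_splits honors the limit constraint when creating splits; the sketch above only works around the behavior on the client side.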