[ 
https://issues.apache.org/jira/browse/HIVE-23230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adesh Kumar Rao updated HIVE-23230:
-----------------------------------
    Description: 
Issue: Running the query {noformat}select * from <table> limit n{noformat} from 
Spark via the Hive Warehouse Connector may return more than "n" rows.

This happens because the "get_splits" UDF creates splits while ignoring the 
limit constraint. When these splits are submitted to multiple LLAP daemons, 
each daemon returns up to "n" rows.


How to reproduce: Needs spark-shell, the Hive Warehouse Connector, and Hive on 
LLAP with more than one LLAP daemon running.

Run the commands below via beeline to create and populate the table:
 
{noformat}
create table test (id int);
insert into table test values (1);
insert into table test values (2);
insert into table test values (3);
insert into table test values (4);
insert into table test values (5);
insert into table test values (6);
insert into table test values (7);
delete from test where id = 7;{noformat}

Now running the following query via spark-shell

{noformat}
import com.hortonworks.hwc.HiveWarehouseSession 
val hive = HiveWarehouseSession.session(spark).build() 
hive.executeQuery("select * from test limit 1").show()
{noformat}

will return more than 1 row.

  was:
The issue is reproducible when the number of LLAP daemons is greater than 1.

 

How to reproduce:

Run the commands below via beeline to create and populate the table:

 
{noformat}
create table test (id int);
insert into table test values (1);
insert into table test values (2);
insert into table test values (3);
insert into table test values (4);
insert into table test values (5);
insert into table test values (6);
insert into table test values (7);
delete from test where id = 7;{noformat}
now running the following query via spark-shell
{noformat}
import com.hortonworks.hwc.HiveWarehouseSession 
val hive = HiveWarehouseSession.session(spark).build() 
hive.executeQuery("select * from test limit 1").show(){noformat}
will return more than 1 row.


> "get_splits" udf ignores limit constraint while creating splits
> ---------------------------------------------------------------
>
>                 Key: HIVE-23230
>                 URL: https://issues.apache.org/jira/browse/HIVE-23230
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 3.1.0
>            Reporter: Adesh Kumar Rao
>            Assignee: Adesh Kumar Rao
>            Priority: Major
>         Attachments: HIVE-23230.1.patch, HIVE-23230.patch
>
>
> Issue: Running the query {noformat}select * from <table> limit n{noformat} 
> from Spark via the Hive Warehouse Connector may return more than "n" rows.
> This happens because the "get_splits" UDF creates splits while ignoring the 
> limit constraint. When these splits are submitted to multiple LLAP daemons, 
> each daemon returns up to "n" rows.
> How to reproduce: Needs spark-shell, the Hive Warehouse Connector, and Hive 
> on LLAP with more than one LLAP daemon running.
> Run the commands below via beeline to create and populate the table:
>  
> {noformat}
> create table test (id int);
> insert into table test values (1);
> insert into table test values (2);
> insert into table test values (3);
> insert into table test values (4);
> insert into table test values (5);
> insert into table test values (6);
> insert into table test values (7);
> delete from test where id = 7;{noformat}
> Now running the following query via spark-shell
> {noformat}
> import com.hortonworks.hwc.HiveWarehouseSession 
> val hive = HiveWarehouseSession.session(spark).build() 
> hive.executeQuery("select * from test limit 1").show()
> {noformat}
> will return more than 1 row.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
