Chiran Ravani created HIVE-23265:
------------------------------------
Summary: Duplicate rowsets are returned with Limit and Offset ste
Key: HIVE-23265
URL: https://issues.apache.org/jira/browse/HIVE-23265
Project: Hive
Issue Type: Bug
Components: HiveServer2, Vectorization
Affects Versions: 3.1.2, 3.1.0
Reporter: Chiran Ravani
Attachments: 000000_0
We have a query that returns duplicate rows even though there are no
duplicate records in the underlying table.
Sample Query
{code:java}
select * from orderdatatest_ext order by col1 limit 1000,50
{code}
The problem appears when the order by clause is applied to col1, which contains
non-unique values. The duplicates appear to be produced during the reducer
phase of the query.
With hive.vectorized.execution.reduce.enabled=false set, the problem does not occur.
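The comparison can be run in a single session by toggling the setting before each run of the sample query. This is only a sketch of the steps described above, using the table and property names from this report; it assumes the property is changed at session level from Beeline or a similar client.

{code:sql}
-- Vectorized reduce side enabled: duplicate rows are returned (see results below)
set hive.vectorized.execution.reduce.enabled=true;
select * from orderdatatest_ext order by col1 limit 1000,50;

-- Vectorized reduce side disabled: no duplicate rows are returned
set hive.vectorized.execution.reduce.enabled=false;
select * from orderdatatest_ext order by col1 limit 1000,50;
{code}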
The data in the table is as follows.
{code:java}
1,1
1,2
1,3
.
.
1,1500
{code}
Results with hive.vectorized.execution.reduce.enabled=true
{code:java}
+-------------------------+-------------------------+
| orderdatatest_ext.col1 | orderdatatest_ext.col2 |
+-------------------------+-------------------------+
| 1 | 1001 |
| 1 | 1002 |
| 1 | 1003 |
| 1 | 1004 |
| 1 | 1005 |
| 1 | 1006 |
| 1 | 1007 |
| 1 | 1008 |
| 1 | 1009 |
| 1 | 1010 |
| 1 | 1011 |
| 1 | 1012 |
| 1 | 1013 |
| 1 | 1014 |
| 1 | 1015 |
| 1 | 1016 |
| 1 | 1017 |
| 1 | 1018 |
| 1 | 1019 |
| 1 | 1020 |
| 1 | 1021 |
| 1 | 1022 |
| 1 | 1023 |
| 1 | 1024 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
+-------------------------+-------------------------+
{code}
Results with hive.vectorized.execution.reduce.enabled=false
{code:java}
+-------------------------+-------------------------+
| orderdatatest_ext.col1 | orderdatatest_ext.col2 |
+-------------------------+-------------------------+
| 1 | 1001 |
| 1 | 1002 |
| 1 | 1003 |
| 1 | 1004 |
| 1 | 1005 |
| 1 | 1006 |
| 1 | 1007 |
| 1 | 1008 |
| 1 | 1009 |
| 1 | 1010 |
| 1 | 1011 |
| 1 | 1012 |
| 1 | 1013 |
| 1 | 1014 |
| 1 | 1015 |
| 1 | 1016 |
| 1 | 1017 |
| 1 | 1018 |
| 1 | 1019 |
| 1 | 1020 |
| 1 | 1021 |
| 1 | 1022 |
| 1 | 1023 |
| 1 | 1024 |
| 1 | 1025 |
| 1 | 1026 |
| 1 | 1027 |
| 1 | 1028 |
| 1 | 1029 |
| 1 | 1030 |
| 1 | 1031 |
| 1 | 1032 |
| 1 | 1033 |
| 1 | 1034 |
| 1 | 1035 |
| 1 | 1036 |
| 1 | 1037 |
| 1 | 1038 |
| 1 | 1039 |
| 1 | 1040 |
| 1 | 1041 |
| 1 | 1042 |
| 1 | 1043 |
| 1 | 1044 |
| 1 | 1045 |
| 1 | 1046 |
| 1 | 1047 |
| 1 | 1048 |
| 1 | 1049 |
| 1 | 1050 |
+-------------------------+-------------------------+
{code}
Table DDL
{code:java}
CREATE EXTERNAL TABLE orderdatatest_ext (col1 int, col2 int) stored as orc
{code}
A sample ORC file (000000_0) is attached.
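A possible way to check for the duplication without reading the output by eye is to wrap the paged query in a subquery and count repeated rows. This is a hedged sketch that has not been verified against the attached file: the alias t and the column name cnt are illustrative, and wrapping the query may change the execution plan, so the plain sample query remains the reference reproduction.

{code:sql}
-- Lists rows that occur more than once in the 50-row page.
-- An empty result is expected, because the table holds no duplicate rows.
select t.col1, t.col2, count(*) as cnt
from (
  select * from orderdatatest_ext order by col1 limit 1000,50
) t
group by t.col1, t.col2
having count(*) > 1;
{code}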
