[
https://issues.apache.org/jira/browse/HIVE-29174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035294#comment-18035294
]
Krisztian Kasa commented on HIVE-29174:
---------------------------------------
I copied the repro steps to a q file and built Hive from the latest master
branch. I was not able to repro the issue.
I also got the same plan as the one mentioned in the description. It shows that
the RS (ReduceSink) operator already has the necessary sort keys:
{code}
Reduce Output Operator
  key expressions: _col1 (type: string), _col2 (type: string), _col3 (type: string), _col0 (type: string)
  null sort order: zzzz
  sort order: ++++
  Map-reduce partition columns: _col1 (type: string), _col2 (type: string), _col3 (type: string)
{code}
I also ran the repro steps using the Hive 4.1.0 docker image and was not able
to repro the issue there either.
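For illustration only, here is a simplified Python model (not Hive code) of why the fourth sort key matters. It assumes the reduce side counts distinct values by comparing each row with the previous one, and that rows reach the reducer in insertion order when _col0 is not a sort key. Both are assumptions, and the function name is hypothetical. Under those assumptions the unsorted input yields the reported 3, while sorted input yields the expected 2.

```python
from typing import Iterable


def streaming_count_distinct(values: Iterable[str]) -> int:
    """Count distinct values by comparing each value with the previous one.

    This is only correct when equal values arrive next to each other,
    i.e. when the input is sorted on the distinct column.
    """
    count = 0
    previous = None
    for value in values:
        # A new distinct value is assumed whenever the value changes.
        if previous is None or value != previous:
            count += 1
        previous = value
    return count


# shoujihaoma values in the order they were inserted in the repro steps.
rows = ["13920150169", "13920157788", "13920157788", "13920150169"]

# Not sorted on the distinct column: the repeated first value is counted twice.
unsorted_result = streaming_count_distinct(rows)

# Sorted on the distinct column, as the plan above does via _col0.
sorted_result = streaming_count_distinct(sorted(rows))

print(unsorted_result)
print(sorted_result)
```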
> count (distinct) from subquery DISTRIBUTE BY sort return error result
> ---------------------------------------------------------------------
>
> Key: HIVE-29174
> URL: https://issues.apache.org/jira/browse/HIVE-29174
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.0, 4.1.0, 4.0.1
> Reporter: zhaolong
> Priority: Critical
> Labels: correctness
> Attachments: image-2025-09-02-19-55-01-845.png
>
>
> create table zyj0715(shoujihaoma string, msisdn_2 string, user_name string, certificate_code string);
>
> insert into zyj0715 values ('13920150169','10100000',null,null);
> insert into zyj0715 values ('13920157788','10100000',null,null);
> insert into zyj0715 values ('13920157788','10100000',null,null);
> insert into zyj0715 values ('13920150169','10100000',null,null);
>
> select count(distinct shoujihaoma)
> FROM (select * from zyj0715
>       DISTRIBUTE BY msisdn_2, user_name, certificate_code
>       SORT BY shoujihaoma asc) t
> GROUP BY msisdn_2, user_name, certificate_code;
>
> Expected Result:
> 2
>
> Actual Results:
> 3
>
> The ReduceSink operator should sort on _col1, _col2, _col3, and _col0.
> In the actual plan, only _col1, _col2, and _col3 are included. As a result,
> the data is not sorted by _col0 on the reduce side, and count(distinct)
> returns an incorrect result.
>
> explain:
> !image-2025-09-02-19-55-01-845.png!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)