[
https://issues.apache.org/jira/browse/HIVE-29174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18033889#comment-18033889
]
Denys Kuzmenko commented on HIVE-29174:
---------------------------------------
cc [~kkasa], [~zabetak]
> count (distinct) from subquery DISTRIBUTE BY sort return error result
> ---------------------------------------------------------------------
>
> Key: HIVE-29174
> URL: https://issues.apache.org/jira/browse/HIVE-29174
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.0, 4.1.0, 4.0.1
> Reporter: zhaolong
> Priority: Critical
> Attachments: image-2025-09-02-19-55-01-845.png
>
>
> create table zyj0715 (shoujihaoma string, msisdn_2 string, user_name string,
>   certificate_code string);
>
> insert into zyj0715 values ('13920150169','10100000',null,null);
> insert into zyj0715 values ('13920157788','10100000',null,null);
> insert into zyj0715 values ('13920157788','10100000',null,null);
> insert into zyj0715 values ('13920150169','10100000',null,null);
> select count(distinct shoujihaoma)
> from (select * from zyj0715
>       distribute by msisdn_2, user_name, certificate_code
>       sort by shoujihaoma asc) t
> group by msisdn_2, user_name, certificate_code;
>
> Expected Result:
> 2
>
> Actual Result:
> 3
>
> The ReduceSinkOp should sort on the fields _col1, _col2, _col3, _col0.
> In the actual plan, only _col1, _col2, and _col3 are included. As a result,
> the data is not sorted by _col0 on the reduce side, and count(distinct)
> returns an incorrect result.
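
To illustrate why a missing sort key could produce 3 instead of 2, here is a minimal sketch in Python. It is a simplified model, not Hive's actual code: it assumes that the reduce side counts a new distinct value whenever the current value differs from the previous one, which is only correct when equal values arrive next to each other. The function name count_distinct_sorted_stream is hypothetical, and the rows are assumed to reach the reducer in insertion order.

```python
from typing import Iterable, Optional


def count_distinct_sorted_stream(values: Iterable[str]) -> int:
    """Count distinct values, assuming equal values are adjacent (sorted input).

    Hypothetical helper for illustration only; it models a streaming
    count(distinct) that compares each value with the previous one.
    """
    count = 0
    previous: Optional[str] = None
    first = True
    for value in values:
        # A value that differs from its predecessor is treated as new.
        if first or value != previous:
            count += 1
        previous = value
        first = False
    return count


# shoujihaoma values from the four inserted rows, in insertion order (assumed).
rows = ["13920150169", "13920157788", "13920157788", "13920150169"]

unsorted_count = count_distinct_sorted_stream(rows)        # input not sorted
sorted_count = count_distinct_sorted_stream(sorted(rows))  # input sorted

print(unsorted_count, sorted_count)  # prints "3 2"
```

Under these assumptions the unsorted stream yields 3 and the sorted stream yields 2, which matches the actual and expected results reported in the issue.
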
>
> explain:
> !image-2025-09-02-19-55-01-845.png!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)