[ https://issues.apache.org/jira/browse/HIVE-21930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920717#comment-16920717 ]
Igor commented on HIVE-21930: ----------------------------- Any updates on this? > WINDOW COUNT DISTINCT return wrong value with PARTITION BY > ---------------------------------------------------------- > > Key: HIVE-21930 > URL: https://issues.apache.org/jira/browse/HIVE-21930 > Project: Hive > Issue Type: Bug > Components: PTF-Windowing > Affects Versions: 3.1.0 > Environment: Beeline version 3.1.0.3.0.1.0-187 by Apache Hive > Reporter: Igor > Priority: Major > Labels: distinct, window_funcion > > count(distinct a) over (partiton by b) return wring result. For example (T is > CTE here): > {code:java} > select p, day, ts > , row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number > , count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND > UNBOUNDED FOLLOWING) as lines > , count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED > PRECEDING AND UNBOUNDED FOLLOWING) as days > FROM T{code} > WINDOW specification doesn't affect on results: same wrong with and without > window. > count(1) and count(distinct day) return the same result. Count distinct is > wrong. > > I've add size(collect_set(day) OVER (PARTITION BY phone)) as days2 and > count(distinct return correct result. > Following query return non-empty result: > {code:java} > select A.*, B.days, B. from ( > select p, day, ts > , row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number > , count(1) OVER (PARTITION BY p ROWS BETWEEN UNBOUNDED PRECEDING AND > UNBOUNDED FOLLOWING) as lines > , count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED > PRECEDING AND UNBOUNDED FOLLOWING) as days > , size(collect_set(day) OVER (PARTITION BY phone)) as days2 > , dense_rank() over (partition by phone order by day) + dense_rank() over > (partition by phone order by day desc) - 1 as days3 > FROM T ) as A > join ( > select p, day, ts > , row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number > , count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND > UNBOUNDED FOLLOWING) as lines > , count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED > PRECEDING AND UNBOUNDED FOLLOWING) as days > FROM T > ) as B on A.p=B.p and A.line_number=B.line_number > where A.days!=B.days > order by A.p, A.line_number > {code} > -- This message was sent by Atlassian Jira (v8.3.2#803003)