[ 
https://issues.apache.org/jira/browse/HIVE-21930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor updated HIVE-21930:
------------------------
    Description: 
count(distinct a) over (partiton by b) return wring result. For example (T is 
CTE here):
{code:java}
select p, day, ts
, row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
, count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND 
UNBOUNDED FOLLOWING) as lines
, count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING 
AND UNBOUNDED FOLLOWING) as days
FROM T{code}
 WINDOW specification doesn't affect on results: same wrong with and without 
window.

count(1) and count(distinct day) return the same result. Count distinct is 
wrong.

 

I've add size(collect_set(day) OVER (PARTITION BY phone)) as days2 and 
count(distinct return correct result.

Following query return non-empty result:
{code:java}
select A.*, B.days, B. from (
select p, day, ts
, row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
, count(1) OVER (PARTITION BY p ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED 
FOLLOWING) as lines
, count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING 
AND UNBOUNDED FOLLOWING) as days
, size(collect_set(day) OVER (PARTITION BY phone)) as days2
, dense_rank() over (partition by phone order by day) + dense_rank() over 
(partition by phone order by day desc) - 1 as days3
FROM T ) as A 
join (
select p, day, ts
, row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
, count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND 
UNBOUNDED FOLLOWING) as lines
, count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING 
AND UNBOUNDED FOLLOWING) as days
FROM T
) as B on A.p=B.p and A.line_number=B.line_number
where A.days!=B.days
order by A.p, A.line_number
{code}
 

  was:
count(distinct a) over (partiton by b) return wring result. For example:
{code:java}
select p, day, ts
, row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
, count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND 
UNBOUNDED FOLLOWING) as lines
, count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING 
AND UNBOUNDED FOLLOWING) as days
FROM T{code}
 WINDOW specification doesn't affect on results: same wrong with and without 
window.

count(1) and count(distinct day) return the same result. Count distinct is 
wrong.

 

I've add size(collect_set(day) OVER (PARTITION BY phone)) as days2 and 
count(distinct return correct result.

Following query return non-empty result:
{code:java}
select A.*, B.days, B. from (
select p, day, ts
, row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
, count(1) OVER (PARTITION BY p ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED 
FOLLOWING) as lines
, count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING 
AND UNBOUNDED FOLLOWING) as days
, size(collect_set(day) OVER (PARTITION BY phone)) as days2
, dense_rank() over (partition by phone order by day) + dense_rank() over 
(partition by phone order by day desc) - 1 as days3
FROM T ) as A 
join (
select p, day, ts
, row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
, count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND 
UNBOUNDED FOLLOWING) as lines
, count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING 
AND UNBOUNDED FOLLOWING) as days
FROM T
) as B on A.p=B.p and A.line_number=B.line_number
where A.days!=B.days
order by A.p, A.line_number
{code}
 


> WINDOW COUNT DISTINCT return wrong value with PARTITION BY
> ----------------------------------------------------------
>
>                 Key: HIVE-21930
>                 URL: https://issues.apache.org/jira/browse/HIVE-21930
>             Project: Hive
>          Issue Type: Bug
>          Components: PTF-Windowing
>    Affects Versions: 3.1.0
>         Environment: Beeline version 3.1.0.3.0.1.0-187 by Apache Hive
>            Reporter: Igor
>            Priority: Major
>              Labels: distinct, window_funcion
>
> count(distinct a) over (partiton by b) return wring result. For example (T is 
> CTE here):
> {code:java}
> select p, day, ts
> , row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
> , count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND 
> UNBOUNDED FOLLOWING) as lines
> , count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED 
> PRECEDING AND UNBOUNDED FOLLOWING) as days
> FROM T{code}
>  WINDOW specification doesn't affect on results: same wrong with and without 
> window.
> count(1) and count(distinct day) return the same result. Count distinct is 
> wrong.
>  
> I've add size(collect_set(day) OVER (PARTITION BY phone)) as days2 and 
> count(distinct return correct result.
> Following query return non-empty result:
> {code:java}
> select A.*, B.days, B. from (
> select p, day, ts
> , row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
> , count(1) OVER (PARTITION BY p ROWS BETWEEN UNBOUNDED PRECEDING AND 
> UNBOUNDED FOLLOWING) as lines
> , count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED 
> PRECEDING AND UNBOUNDED FOLLOWING) as days
> , size(collect_set(day) OVER (PARTITION BY phone)) as days2
> , dense_rank() over (partition by phone order by day) + dense_rank() over 
> (partition by phone order by day desc) - 1 as days3
> FROM T ) as A 
> join (
> select p, day, ts
> , row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
> , count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND 
> UNBOUNDED FOLLOWING) as lines
> , count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED 
> PRECEDING AND UNBOUNDED FOLLOWING) as days
> FROM T
> ) as B on A.p=B.p and A.line_number=B.line_number
> where A.days!=B.days
> order by A.p, A.line_number
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to