Victoria Markman created DRILL-3182: ---------------------------------------
Summary: Window function with DISTINCT qualifier returns seemingly incorrect result Key: DRILL-3182 URL: https://issues.apache.org/jira/browse/DRILL-3182 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Affects Versions: 1.0.0 Reporter: Victoria Markman Assignee: Daniel Barclay (Drill) Both count(distinct <colname>) and count(all <colname>) return the same result. It does not look correct to me and I'm not sure what the correct behavior is going to be. (1) Latest postgres does not support distinct with Window functions: postgres=# select a2, count(distinct b2) over(partition by a2) from t2; ERROR: DISTINCT is not implemented for window functions LINE 1: select a2, count(distinct b2) over(partition by a2) from t2; ^ postgres=# select a2, avg(distinct a2) over(partition by a2) from t2; ERROR: DISTINCT is not implemented for window functions LINE 1: select a2, avg(distinct a2) over(partition by a2) from t2; (2) Calcite does not support this either: https://github.com/apache/incubator-calcite/blob/master/doc/reference.md Do we support it ? If not, I think we should throw an error ... {code} 0: jdbc:drill:schema=dfs> select * from t2; +-----+--------+-------------+ | a2 | b2 | c2 | +-----+--------+-------------+ | 0 | zzz | 2014-12-31 | | 1 | aaaaa | 2015-01-01 | | 2 | bbbbb | 2015-01-02 | | 2 | bbbbb | 2015-01-02 | | 2 | bbbbb | 2015-01-02 | | 3 | ccccc | 2015-01-03 | | 4 | ddddd | 2015-01-04 | | 5 | eeeee | 2015-01-05 | | 6 | fffff | 2015-01-06 | | 7 | ggggg | 2015-01-07 | | 7 | ggggg | 2015-01-07 | | 8 | hhhhh | 2015-01-08 | | 9 | iiiii | 2015-01-09 | +-----+--------+-------------+ 13 rows selected (0.134 seconds) 0: jdbc:drill:schema=dfs> select a2, count(distinct b2) over(partition by a2) from t2; +-----+---------+ | a2 | EXPR$1 | +-----+---------+ | 0 | 1 | | 1 | 1 | | 2 | 3 | | 2 | 3 | | 2 | 3 | | 3 | 1 | | 4 | 1 | | 5 | 1 | | 6 | 1 | | 7 | 2 | | 7 | 2 | | 8 | 1 | | 9 | 1 | +-----+---------+ 13 rows selected (0.224 seconds) 0: jdbc:drill:schema=dfs> select a2, count(b2) over(partition by a2) from t2; +-----+---------+ | a2 | EXPR$1 | +-----+---------+ | 0 | 1 | | 1 | 1 | | 2 | 3 | | 2 | 3 | | 2 | 3 | | 3 | 1 | | 4 | 1 | | 5 | 1 | | 6 | 1 | | 7 | 2 | | 7 | 2 | | 8 | 1 | | 9 | 1 | +-----+---------+ 13 rows selected (0.219 seconds) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)