[ 
https://issues.apache.org/jira/browse/DRILL-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698885#comment-14698885
 ] 

Sean Hsuan-Yi Chu commented on DRILL-3657:
------------------------------------------

The window function in the top window should not have pointed at $2, which 
represents a the output of the window function below.

> Wrong result with SUM(1) window function when multiple partitions are present
> -----------------------------------------------------------------------------
>
>                 Key: DRILL-3657
>                 URL: https://issues.apache.org/jira/browse/DRILL-3657
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.2.0
>            Reporter: Victoria Markman
>            Assignee: Sean Hsuan-Yi Chu
>            Priority: Critical
>              Labels: window_function
>             Fix For: 1.2.0
>
>
> The common use case for this would be: SUM( CASE WHEN x=y THEN 1 ELSE 0 END)
> Wrong result:
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >         b2,
> . . . . . . . . . . . . >         c2,
> . . . . . . . . . . . . >         sum(1)  over(partition by b2 order by c2),
> . . . . . . . . . . . . >         sum(1)  over(partition by c2)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >         t2
> . . . . . . . . . . . . > order by
> . . . . . . . . . . . . >         1,2;
> +--------+-------------+---------+---------+
> |   b2   |     c2      | EXPR$2  | EXPR$3  |
> +--------+-------------+---------+---------+
> | aaaaa  | 2015-01-01  | 1       | 1       |
> | bbbbb  | 2015-01-02  | 3       | 9       |
> | bbbbb  | 2015-01-02  | 3       | 9       |
> | bbbbb  | 2015-01-02  | 3       | 9       |
> | ccccc  | 2015-01-03  | 1       | 1       |
> | ddddd  | 2015-01-04  | 1       | 1       |
> | eeeee  | 2015-01-05  | 1       | 1       |
> | fffff  | 2015-01-06  | 1       | 1       |
> | ggggg  | 2015-01-07  | 2       | 4       |
> | ggggg  | 2015-01-07  | 2       | 4       |
> | hhhhh  | 2015-01-08  | 1       | 1       |
> | iiiii  | 2015-01-09  | 1       | 1       |
> | zzz    | 2014-12-31  | 1       | 1       |
> +--------+-------------+---------+---------+
> 13 rows selected (0.204 seconds)
> {code}
> Explain plan:
> {code}
> | 00-00    Screen
> 00-01      Project(b2=[$0], c2=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02        SelectionVectorRemover
> 00-03          Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
> 00-04            Window(window#0=[window(partition {1} order by [] range 
> between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($2)])])
> 00-05              SelectionVectorRemover
> 00-06                Sort(sort0=[$1], dir0=[ASC])
> 00-07                  Window(window#0=[window(partition {0} order by [1] 
> range between UNBOUNDED PRECEDING and CURRENT ROW aggs [SUM($2)])])
> 00-08                    SelectionVectorRemover
> 00-09                      Sort(sort0=[$0], sort1=[$1], dir0=[ASC], 
> dir1=[ASC])
> 00-10                        Project(b2=[$1], c2=[$0])
> 00-11                          Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/t2]], 
> selectionRoot=maprfs:/drill/testdata/subqueries/t2, numFiles=1, 
> columns=[`b2`, `c2`]]])
> {code}
> If you have a query with only one of these partitions, result is correct :
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >         b2,
> . . . . . . . . . . . . >         c2,
> . . . . . . . . . . . . >         sum(1)  over(partition by b2 order by c2)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >         t2
> . . . . . . . . . . . . > order by
> . . . . . . . . . . . . >         1,2;
> +--------+-------------+---------+
> |   b2   |     c2      | EXPR$2  |
> +--------+-------------+---------+
> | aaaaa  | 2015-01-01  | 1       |
> | bbbbb  | 2015-01-02  | 3       |
> | bbbbb  | 2015-01-02  | 3       |
> | bbbbb  | 2015-01-02  | 3       |
> | ccccc  | 2015-01-03  | 1       |
> | ddddd  | 2015-01-04  | 1       |
> | eeeee  | 2015-01-05  | 1       |
> | fffff  | 2015-01-06  | 1       |
> | ggggg  | 2015-01-07  | 2       |
> | ggggg  | 2015-01-07  | 2       |
> | hhhhh  | 2015-01-08  | 1       |
> | iiiii  | 2015-01-09  | 1       |
> | zzz    | 2014-12-31  | 1       |
> +--------+-------------+---------+
> 13 rows selected (0.196 seconds)
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >         b2,
> . . . . . . . . . . . . >         c2,
> . . . . . . . . . . . . >         sum(1)  over(partition by c2)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >         t2
> . . . . . . . . . . . . > order by
> . . . . . . . . . . . . >         1,2;
> +--------+-------------+---------+
> |   b2   |     c2      | EXPR$2  |
> +--------+-------------+---------+
> | aaaaa  | 2015-01-01  | 1       |
> | bbbbb  | 2015-01-02  | 3       |
> | bbbbb  | 2015-01-02  | 3       |
> | bbbbb  | 2015-01-02  | 3       |
> | ccccc  | 2015-01-03  | 1       |
> | ddddd  | 2015-01-04  | 1       |
> | eeeee  | 2015-01-05  | 1       |
> | fffff  | 2015-01-06  | 1       |
> | ggggg  | 2015-01-07  | 2       |
> | ggggg  | 2015-01-07  | 2       |
> | hhhhh  | 2015-01-08  | 1       |
> | iiiii  | 2015-01-09  | 1       |
> | zzz    | 2014-12-31  | 1       |
> +--------+-------------+---------+
> 13 rows selected (0.179 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to