[ 
https://issues.apache.org/jira/browse/DRILL-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-3680:
------------------------------------
    Fix Version/s: 1.2.0

> window function query returns Incorrect results 
> ------------------------------------------------
>
>                 Key: DRILL-3680
>                 URL: https://issues.apache.org/jira/browse/DRILL-3680
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>         Environment: private-branch 
> https://github.com/adeneche/incubator-drill/tree/new-window-funcs
>            Reporter: Khurram Faraaz
>            Assignee: Deneche A. Hakim
>            Priority: Critical
>              Labels: window_function
>             Fix For: 1.2.0
>
>         Attachments: tblWnulls.parquet
>
>
> Query plan from Drill for the query that returns wrong results
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select c1 , c2 , lead(c2) OVER 
> ( PARTITION BY c2 ORDER BY c1) lead_c2 FROM (SELECT c1 , c2, ntile(3) 
> over(PARTITION BY c2 ORDER BY c1) FROM `tblWnulls.parquet`);
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(c1=[$0], c2=[$1], lead_c2=[$2])
> 00-02        Project(c1=[$0], c2=[$1], lead_c2=[$2])
> 00-03          Project(c1=[$0], c2=[$1], $2=[$3])
> 00-04            Window(window#0=[window(partition {1} order by [0] range 
> between UNBOUNDED PRECEDING and CURRENT ROW aggs [LEAD($1)])])
> 00-05              Window(window#0=[window(partition {1} order by [0] range 
> between UNBOUNDED PRECEDING and CURRENT ROW aggs [NTILE($2)])])
> 00-06                SelectionVectorRemover
> 00-07                  Sort(sort0=[$1], sort1=[$0], dir0=[ASC], dir1=[ASC])
> 00-08                    Project(c1=[$1], c2=[$0])
> 00-09                      Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:///tmp/tblWnulls.parquet]], 
> selectionRoot=maprfs:/tmp/tblWnulls.parquet, numFiles=1, columns=[`c1`, 
> `c2`]]])
> {code}
> Results returned by Drill.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select c1 , c2 , lead(c2) OVER ( PARTITION BY 
> c2 ORDER BY c1) lead_c2 FROM (SELECT c1 , c2, ntile(3) over(PARTITION BY c2 
> ORDER BY c1) FROM `tblWnulls.parquet`);
> +-------------+-------+----------+
> |     c1      |  c2   | lead_c2  |
> +-------------+-------+----------+
> | 0           | a     | null     |
> | 1           | a     | null     |
> | 5           | a     | null     |
> | 10          | a     | null     |
> | 11          | a     | null     |
> | 14          | a     | null     |
> | 11111       | a     | null     |
> | 2           | b     | null     |
> | 9           | b     | null     |
> | 13          | b     | null     |
> | 17          | b     | null     |
> | 4           | c     | null     |
> | 6           | c     | null     |
> | 8           | c     | null     |
> | 12          | c     | null     |
> | 13          | c     | null     |
> | 13          | c     | null     |
> | null        | c     | null     |
> | 10          | d     | null     |
> | 11          | d     | null     |
> | 2147483647  | d     | null     |
> | 2147483647  | d     | null     |
> | null        | d     | null     |
> | null        | d     | null     |
> | -1          | e     | null     |
> | 15          | e     | null     |
> | 19          | null  | null     |
> | 65536       | null  | null     |
> | 1000000     | null  | null     |
> | null        | null  | null     |
> +-------------+-------+----------+
> 30 rows selected (0.339 seconds)
> {code}
> Results returned by Postgres
> {code}
> postgres=# select c1 , c2 , lead(c2) OVER ( PARTITION BY c2 ORDER BY c1) 
> lead_c2 FROM (SELECT c1 , c2, ntile(3) over(PARTITION BY c2 ORDER BY c1) FROM 
> t222) sub_query;
>      c1     | c2 | lead_c2 
> ------------+----+---------
>           0 | a  | a
>           1 | a  | a
>           5 | a  | a
>          10 | a  | a
>          11 | a  | a
>          14 | a  | a
>       11111 | a  | 
>           2 | b  | b
>           9 | b  | b
>          13 | b  | b
>          17 | b  | 
>           4 | c  | c
>           6 | c  | c
>           8 | c  | c
>          12 | c  | c
>          13 | c  | c
>          13 | c  | c
>             | c  | 
>          10 | d  | d
>          11 | d  | d
>  2147483647 | d  | d
>  2147483647 | d  | d
>             | d  | d
>             | d  | 
>          -1 | e  | e
>          15 | e  | 
>          19 |    | 
>       65536 |    | 
>     1000000 |    | 
>             |    | 
> (30 rows)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to