[ 
https://issues.apache.org/jira/browse/DRILL-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740032#comment-14740032
 ] 

Khurram Faraaz commented on DRILL-3689:
---------------------------------------

I tried with data that was not close to boundary values of INT64 type, I did 
not see the difference in results with such data.

However, it will be good to inform user of the overflow error as soon as it is 
hit, rather than return wrong results, that way the user knows that there was 
an overflow error. Just returning results (in this case incorrect results) is 
not a good practice. Other systems like DB2 report an overflow error as soon as 
it is hit.

> incorrect results : aggregate AVG returns wrong results over results returned 
> by LEAD function.
> -----------------------------------------------------------------------------------------------
>
>                 Key: DRILL-3689
>                 URL: https://issues.apache.org/jira/browse/DRILL-3689
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.2.0
>         Environment:  private-branch 
> https://github.com/adeneche/incubator-drill/tree/new-window-funcs
>            Reporter: Khurram Faraaz
>            Assignee: Khurram Faraaz
>            Priority: Critical
>              Labels: window_function
>             Fix For: 1.2.0
>
>
> Aggregate AVG returns wrong results over results returned by LEAD function.
> results returned by Drill
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT  avg(lead_col1) FROM (SELECT LEAD(col1) 
> OVER(PARTITION BY col7 ORDER BY col1) lead_col1 , col7 FROM FEWRWSPQQ_101) 
> sub_query;
> +-------------------------+
> |         EXPR$0          |
> +-------------------------+
> | 2.35195986941647008E17  |
> +-------------------------+
> 1 row selected (0.264 seconds)
> {code}
> Explain plan for above query from Drill
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for SELECT  avg(lead_col1) FROM 
> (SELECT LEAD(col1) OVER(PARTITION BY col7 ORDER BY col1) lead_col1 , col7 
> FROM FEWRWSPQQ_101) sub_query;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[CAST(/(CastHigh(CASE(=($1, 0), null, $0)), 
> $1)):ANY NOT NULL])
> 00-03          StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT($0)])
> 00-04            Project(w0$o0=[$3])
> 00-05              Window(window#0=[window(partition {2} order by [1] range 
> between UNBOUNDED PRECEDING and CURRENT ROW aggs [LEAD($1)])])
> 00-06                SelectionVectorRemover
> 00-07                  Sort(sort0=[$2], sort1=[$1], dir0=[ASC], dir1=[ASC])
> 00-08                    Project(T36¦¦*=[$0], col1=[$1], col7=[$2])
> 00-09                      Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:///tmp/FEWRWSPQQ_101]], 
> selectionRoot=maprfs:/tmp/FEWRWSPQQ_101, numFiles=1, columns=[`*`]]])
> {code}
> results returned by Postgres
> {code}
> postgres=# SELECT  avg(lead_col1) FROM (SELECT LEAD(col1) OVER(PARTITION BY 
> col7 ORDER BY col1) lead_col1 , col7 FROM FEWRWSPQQ_101) sub_query;
>          avg         
> ---------------------
>  1157533190627124568
> (1 row)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to