[ https://issues.apache.org/jira/browse/DRILL-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740032#comment-14740032 ]
Khurram Faraaz commented on DRILL-3689: --------------------------------------- I tried with data that was not close to boundary values of INT64 type, I did not see the difference in results with such data. However, it will be good to inform user of the overflow error as soon as it is hit, rather than return wrong results, that way the user knows that there was an overflow error. Just returning results (in this case incorrect results) is not a good practice. Other systems like DB2 report an overflow error as soon as it is hit. > incorrect results : aggregate AVG returns wrong results over results returned > by LEAD function. > ----------------------------------------------------------------------------------------------- > > Key: DRILL-3689 > URL: https://issues.apache.org/jira/browse/DRILL-3689 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators > Affects Versions: 1.2.0 > Environment: private-branch > https://github.com/adeneche/incubator-drill/tree/new-window-funcs > Reporter: Khurram Faraaz > Assignee: Khurram Faraaz > Priority: Critical > Labels: window_function > Fix For: 1.2.0 > > > Aggregate AVG returns wrong results over results returned by LEAD function. > results returned by Drill > {code} > 0: jdbc:drill:schema=dfs.tmp> SELECT avg(lead_col1) FROM (SELECT LEAD(col1) > OVER(PARTITION BY col7 ORDER BY col1) lead_col1 , col7 FROM FEWRWSPQQ_101) > sub_query; > +-------------------------+ > | EXPR$0 | > +-------------------------+ > | 2.35195986941647008E17 | > +-------------------------+ > 1 row selected (0.264 seconds) > {code} > Explain plan for above query from Drill > {code} > 0: jdbc:drill:schema=dfs.tmp> explain plan for SELECT avg(lead_col1) FROM > (SELECT LEAD(col1) OVER(PARTITION BY col7 ORDER BY col1) lead_col1 , col7 > FROM FEWRWSPQQ_101) sub_query; > +------+------+ > | text | json | > +------+------+ > | 00-00 Screen > 00-01 Project(EXPR$0=[$0]) > 00-02 Project(EXPR$0=[CAST(/(CastHigh(CASE(=($1, 0), null, $0)), > $1)):ANY NOT NULL]) > 00-03 StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT($0)]) > 00-04 Project(w0$o0=[$3]) > 00-05 Window(window#0=[window(partition {2} order by [1] range > between UNBOUNDED PRECEDING and CURRENT ROW aggs [LEAD($1)])]) > 00-06 SelectionVectorRemover > 00-07 Sort(sort0=[$2], sort1=[$1], dir0=[ASC], dir1=[ASC]) > 00-08 Project(T36¦¦*=[$0], col1=[$1], col7=[$2]) > 00-09 Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=maprfs:///tmp/FEWRWSPQQ_101]], > selectionRoot=maprfs:/tmp/FEWRWSPQQ_101, numFiles=1, columns=[`*`]]]) > {code} > results returned by Postgres > {code} > postgres=# SELECT avg(lead_col1) FROM (SELECT LEAD(col1) OVER(PARTITION BY > col7 ORDER BY col1) lead_col1 , col7 FROM FEWRWSPQQ_101) sub_query; > avg > --------------------- > 1157533190627124568 > (1 row) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)