[jira] [Commented] (DRILL-3298) Wrong result with SUM window function and order by without partition by in the OVER clause
[ https://issues.apache.org/jira/browse/DRILL-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605890#comment-14605890 ] Victoria Markman commented on DRILL-3298: - Verified fixed in: 1.1 Test: Passing/window_functions/bugs/drill-3298.sql Wrong result with SUM window function and order by without partition by in the OVER clause -- Key: DRILL-3298 URL: https://issues.apache.org/jira/browse/DRILL-3298 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.0.0 Reporter: Victoria Markman Assignee: Aman Sinha Priority: Critical Labels: window_function Fix For: 1.1.0 Attachments: DRILL-3298.1.patch.txt, j1.tar, test.res This query returns incorrect result when planner.slice_target = 1 {code} select j1.c_integer, sum(j1.c_integer) over w from j1 window w as (order by c_integer desc) order by 1, 2; {code} Query plan with planner.slice_target = 1 {noformat} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SingleMergeExchange(sort0=[0 ASC], sort1=[1 ASC]) 01-01 SelectionVectorRemover 01-02Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 01-03 Project(c_integer=[$0], EXPR$1=[$1]) 01-04HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) 02-01 UnorderedMuxExchange 03-01Project(c_integer=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1, hash64AsDouble($0)))]) 03-02 Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 03-03Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 03-04 SelectionVectorRemover 03-05Sort(sort0=[$0], dir0=[DESC]) 03-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {noformat} Query plan with planner.slice_target = 10; {noformat} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SelectionVectorRemover 00-03 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-04Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 00-05 Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 00-06SelectionVectorRemover 00-07 Sort(sort0=[$0], dir0=[DESC]) 00-08Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {noformat} Attached: * table j1 * test.res - result generated with postgres -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3298) Wrong result with SUM window function and order by without partition by in the OVER clause
[ https://issues.apache.org/jira/browse/DRILL-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590186#comment-14590186 ] Aman Sinha commented on DRILL-3298: --- Doing the singleton trait for the child of window operator will work functionally; the only problem is it will do the entire Sort on a single node/slice. Ideally, we would want to do a parallel local sort followed by a SingleMergeExchange and provide the output of that into the Window operator. Wrong result with SUM window function and order by without partition by in the OVER clause -- Key: DRILL-3298 URL: https://issues.apache.org/jira/browse/DRILL-3298 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.0.0 Reporter: Victoria Markman Assignee: Deneche A. Hakim Priority: Critical Labels: window_function Fix For: 1.1.0 Attachments: DRILL-3298.1.patch.txt, j1.tar, test.res This query returns incorrect result when planner.slice_target = 1 {code} select j1.c_integer, sum(j1.c_integer) over w from j1 window w as (order by c_integer desc) order by 1, 2; {code} Query plan with planner.slice_target = 1 {noformat} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SingleMergeExchange(sort0=[0 ASC], sort1=[1 ASC]) 01-01 SelectionVectorRemover 01-02Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 01-03 Project(c_integer=[$0], EXPR$1=[$1]) 01-04HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) 02-01 UnorderedMuxExchange 03-01Project(c_integer=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1, hash64AsDouble($0)))]) 03-02 Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 03-03Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 03-04 SelectionVectorRemover 03-05Sort(sort0=[$0], dir0=[DESC]) 03-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {noformat} Query plan with planner.slice_target = 10; {noformat} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SelectionVectorRemover 00-03 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-04Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 00-05 Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 00-06SelectionVectorRemover 00-07 Sort(sort0=[$0], dir0=[DESC]) 00-08Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {noformat} Attached: * table j1 * test.res - result generated with postgres -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3298) Wrong result with SUM window function and order by without partition by in the OVER clause
[ https://issues.apache.org/jira/browse/DRILL-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590887#comment-14590887 ] Aman Sinha commented on DRILL-3298: --- I have a uploaded a patch for this to review board: https://reviews.apache.org/r/35584/ [~jni] could you pls review ? I am still in the process of running all regression tests. Wrong result with SUM window function and order by without partition by in the OVER clause -- Key: DRILL-3298 URL: https://issues.apache.org/jira/browse/DRILL-3298 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.0.0 Reporter: Victoria Markman Assignee: Aman Sinha Priority: Critical Labels: window_function Fix For: 1.1.0 Attachments: DRILL-3298.1.patch.txt, j1.tar, test.res This query returns incorrect result when planner.slice_target = 1 {code} select j1.c_integer, sum(j1.c_integer) over w from j1 window w as (order by c_integer desc) order by 1, 2; {code} Query plan with planner.slice_target = 1 {noformat} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SingleMergeExchange(sort0=[0 ASC], sort1=[1 ASC]) 01-01 SelectionVectorRemover 01-02Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 01-03 Project(c_integer=[$0], EXPR$1=[$1]) 01-04HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) 02-01 UnorderedMuxExchange 03-01Project(c_integer=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1, hash64AsDouble($0)))]) 03-02 Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 03-03Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 03-04 SelectionVectorRemover 03-05Sort(sort0=[$0], dir0=[DESC]) 03-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {noformat} Query plan with planner.slice_target = 10; {noformat} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SelectionVectorRemover 00-03 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-04Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 00-05 Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 00-06SelectionVectorRemover 00-07 Sort(sort0=[$0], dir0=[DESC]) 00-08Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {noformat} Attached: * table j1 * test.res - result generated with postgres -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3298) Wrong result with SUM window function and order by without partition by in the OVER clause
[ https://issues.apache.org/jira/browse/DRILL-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589994#comment-14589994 ] Deneche A. Hakim commented on DRILL-3298: - I meant the sort that is input to the window function. When there is no partition-by clause in a window function, all rows are part of a single partition. Which means if we have an order-by without a partition-by the window function will expect it's input to be sorted on the order-by column(s) (there is no partitioning). Wrong result with SUM window function and order by without partition by in the OVER clause -- Key: DRILL-3298 URL: https://issues.apache.org/jira/browse/DRILL-3298 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.0.0 Reporter: Victoria Markman Assignee: Deneche A. Hakim Priority: Critical Labels: window_function Fix For: 1.1.0 Attachments: j1.tar, test.res This query returns incorrect result when planner.slice_target = 1 {code} select j1.c_integer, sum(j1.c_integer) over w from j1 window w as (order by c_integer desc) order by 1, 2; {code} Query plan with planner.slice_target = 1 {noformat} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SingleMergeExchange(sort0=[0 ASC], sort1=[1 ASC]) 01-01 SelectionVectorRemover 01-02Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 01-03 Project(c_integer=[$0], EXPR$1=[$1]) 01-04HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) 02-01 UnorderedMuxExchange 03-01Project(c_integer=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1, hash64AsDouble($0)))]) 03-02 Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 03-03Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 03-04 SelectionVectorRemover 03-05Sort(sort0=[$0], dir0=[DESC]) 03-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {noformat} Query plan with planner.slice_target = 10; {noformat} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SelectionVectorRemover 00-03 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-04Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 00-05 Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 00-06SelectionVectorRemover 00-07 Sort(sort0=[$0], dir0=[DESC]) 00-08Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {noformat} Attached: * table j1 * test.res - result generated with postgres -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3298) Wrong result with SUM window function and order by without partition by in the OVER clause
[ https://issues.apache.org/jira/browse/DRILL-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589995#comment-14589995 ] Deneche A. Hakim commented on DRILL-3298: - I meant the sort that is input to the window function. When there is no partition-by clause in a window function, all rows are part of a single partition. Which means if we have an order-by without a partition-by the window function will expect it's input to be sorted on the order-by column(s) (there is no partitioning). Wrong result with SUM window function and order by without partition by in the OVER clause -- Key: DRILL-3298 URL: https://issues.apache.org/jira/browse/DRILL-3298 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.0.0 Reporter: Victoria Markman Assignee: Deneche A. Hakim Priority: Critical Labels: window_function Fix For: 1.1.0 Attachments: j1.tar, test.res This query returns incorrect result when planner.slice_target = 1 {code} select j1.c_integer, sum(j1.c_integer) over w from j1 window w as (order by c_integer desc) order by 1, 2; {code} Query plan with planner.slice_target = 1 {noformat} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SingleMergeExchange(sort0=[0 ASC], sort1=[1 ASC]) 01-01 SelectionVectorRemover 01-02Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 01-03 Project(c_integer=[$0], EXPR$1=[$1]) 01-04HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) 02-01 UnorderedMuxExchange 03-01Project(c_integer=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1, hash64AsDouble($0)))]) 03-02 Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 03-03Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 03-04 SelectionVectorRemover 03-05Sort(sort0=[$0], dir0=[DESC]) 03-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {noformat} Query plan with planner.slice_target = 10; {noformat} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SelectionVectorRemover 00-03 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-04Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 00-05 Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 00-06SelectionVectorRemover 00-07 Sort(sort0=[$0], dir0=[DESC]) 00-08Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {noformat} Attached: * table j1 * test.res - result generated with postgres -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3298) Wrong result with SUM window function and order by without partition by in the OVER clause
[ https://issues.apache.org/jira/browse/DRILL-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588248#comment-14588248 ] Deneche A. Hakim commented on DRILL-3298: - {{test.res}} doesn't seem to match the query, [~vicky] did you attach the wrong file ? Wrong result with SUM window function and order by without partition by in the OVER clause -- Key: DRILL-3298 URL: https://issues.apache.org/jira/browse/DRILL-3298 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.0.0 Reporter: Victoria Markman Assignee: Chris Westin Priority: Critical Labels: window_function Attachments: j1.tar, test.res This query returns incorrect result when planner.slice_target = 1 {code} select j1.c_integer, sum(j1.c_integer) over w from j1 window w as (order by c_integer desc) order by 1, 2; {code} Query plan with planner.slice_target = 1 {code} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SingleMergeExchange(sort0=[0 ASC], sort1=[1 ASC]) 01-01 SelectionVectorRemover 01-02Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 01-03 Project(c_integer=[$0], EXPR$1=[$1]) 01-04HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) 02-01 UnorderedMuxExchange 03-01Project(c_integer=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1, hash64AsDouble($0)))]) 03-02 Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 03-03Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 03-04 SelectionVectorRemover 03-05Sort(sort0=[$0], dir0=[DESC]) 03-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {code} Query plan with planner.slice_target = 10; {code} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SelectionVectorRemover 00-03 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-04Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 00-05 Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 00-06SelectionVectorRemover 00-07 Sort(sort0=[$0], dir0=[DESC]) 00-08Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {code} Attached: * table j1 * test.res - result generated with postgres -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3298) Wrong result with SUM window function and order by without partition by in the OVER clause
[ https://issues.apache.org/jira/browse/DRILL-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588432#comment-14588432 ] Deneche A. Hakim commented on DRILL-3298: - I did some tests, looks like the results are correct when we use {{slice_target=1}} but the results are incorrect when {{slice_target=1}} Investigating further Wrong result with SUM window function and order by without partition by in the OVER clause -- Key: DRILL-3298 URL: https://issues.apache.org/jira/browse/DRILL-3298 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.0.0 Reporter: Victoria Markman Assignee: Chris Westin Priority: Critical Labels: window_function Attachments: j1.tar, test.res This query returns incorrect result when planner.slice_target = 1 {code} select j1.c_integer, sum(j1.c_integer) over w from j1 window w as (order by c_integer desc) order by 1, 2; {code} Query plan with planner.slice_target = 1 {code} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SingleMergeExchange(sort0=[0 ASC], sort1=[1 ASC]) 01-01 SelectionVectorRemover 01-02Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 01-03 Project(c_integer=[$0], EXPR$1=[$1]) 01-04HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) 02-01 UnorderedMuxExchange 03-01Project(c_integer=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1, hash64AsDouble($0)))]) 03-02 Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 03-03Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 03-04 SelectionVectorRemover 03-05Sort(sort0=[$0], dir0=[DESC]) 03-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {code} Query plan with planner.slice_target = 10; {code} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SelectionVectorRemover 00-03 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-04Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 00-05 Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 00-06SelectionVectorRemover 00-07 Sort(sort0=[$0], dir0=[DESC]) 00-08Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {code} Attached: * table j1 * test.res - result generated with postgres -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3298) Wrong result with SUM window function and order by without partition by in the OVER clause
[ https://issues.apache.org/jira/browse/DRILL-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588391#comment-14588391 ] Victoria Markman commented on DRILL-3298: - My apologies, wrong file was attached. Wrong result with SUM window function and order by without partition by in the OVER clause -- Key: DRILL-3298 URL: https://issues.apache.org/jira/browse/DRILL-3298 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.0.0 Reporter: Victoria Markman Assignee: Chris Westin Priority: Critical Labels: window_function Attachments: j1.tar This query returns incorrect result when planner.slice_target = 1 {code} select j1.c_integer, sum(j1.c_integer) over w from j1 window w as (order by c_integer desc) order by 1, 2; {code} Query plan with planner.slice_target = 1 {code} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SingleMergeExchange(sort0=[0 ASC], sort1=[1 ASC]) 01-01 SelectionVectorRemover 01-02Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 01-03 Project(c_integer=[$0], EXPR$1=[$1]) 01-04HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) 02-01 UnorderedMuxExchange 03-01Project(c_integer=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1, hash64AsDouble($0)))]) 03-02 Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 03-03Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 03-04 SelectionVectorRemover 03-05Sort(sort0=[$0], dir0=[DESC]) 03-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {code} Query plan with planner.slice_target = 10; {code} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SelectionVectorRemover 00-03 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-04Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 00-05 Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 00-06SelectionVectorRemover 00-07 Sort(sort0=[$0], dir0=[DESC]) 00-08Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {code} Attached: * table j1 * test.res - result generated with postgres -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3298) Wrong result with SUM window function and order by without partition by in the OVER clause
[ https://issues.apache.org/jira/browse/DRILL-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588525#comment-14588525 ] Deneche A. Hakim commented on DRILL-3298: - [~amansinha100] can you take a look at this plan (when slice_target = 1) is it correct ? looks like the sort and window function are applied in parallel on the incoming data, if this is the case then the window function will compute wrong results: {noformat} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SingleMergeExchange(sort0=[0 ASC], sort1=[1 ASC]) 01-01 SelectionVectorRemover 01-02Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 01-03 Project(c_integer=[$0], EXPR$1=[$1]) 01-04HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) 02-01 UnorderedMuxExchange 03-01Project(c_integer=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1, hash64AsDouble($0)))]) 03-02 Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 03-03Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 03-04 SelectionVectorRemover 03-05Sort(sort0=[$0], dir0=[DESC]) 03-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {noformat} Wrong result with SUM window function and order by without partition by in the OVER clause -- Key: DRILL-3298 URL: https://issues.apache.org/jira/browse/DRILL-3298 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.0.0 Reporter: Victoria Markman Assignee: Chris Westin Priority: Critical Labels: window_function Attachments: j1.tar, test.res This query returns incorrect result when planner.slice_target = 1 {code} select j1.c_integer, sum(j1.c_integer) over w from j1 window w as (order by c_integer desc) order by 1, 2; {code} Query plan with planner.slice_target = 1 {noformat} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SingleMergeExchange(sort0=[0 ASC], sort1=[1 ASC]) 01-01 SelectionVectorRemover 01-02Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 01-03 Project(c_integer=[$0], EXPR$1=[$1]) 01-04HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) 02-01 UnorderedMuxExchange 03-01Project(c_integer=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1, hash64AsDouble($0)))]) 03-02 Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 03-03Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 03-04 SelectionVectorRemover 03-05Sort(sort0=[$0], dir0=[DESC]) 03-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {noformat} Query plan with planner.slice_target = 10; {noformat} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SelectionVectorRemover 00-03 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-04Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 00-05 Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 00-06SelectionVectorRemover 00-07 Sort(sort0=[$0], dir0=[DESC]) 00-08Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {noformat} Attached: * table j1 * test.res - result generated with postgres -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3298) Wrong result with SUM window function and order by without partition by in the OVER clause
[ https://issues.apache.org/jira/browse/DRILL-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589218#comment-14589218 ] Aman Sinha commented on DRILL-3298: --- [~adeneche] were you asking about the final sort or the sort that is input to the window function ? Also, what is the semantics of a window aggregate if there is no partition-by clause but there's an order-by ? If the semantics is that the order-by column should be treated as a 'partitioning' column, then we would need to insert a HashToRandomExchange below the Sort that is input to the Window operator. Let me illustrate the issue with a simpler example below. I removed the final order-by, added a filter condition to reduce the result set and am just doing a COUNT instead of SUM. slice_target = 1: // incorrect result {code} select c_integer, count(*) over (order by c_integer) as cnt from j1 where c_integer in (447503383, 452697297, 456808172) ; ++--+ | c_integer | cnt | ++--+ | 456808172 | 1| | 447503383 | 2| | 447503383 | 2| | 452697297 | 4| | 452697297 | 4| | 456808172 | 5| | 447503383 | 2| | 447503383 | 2| | 456808172 | 2| | 456808172 | 2| ++--+ {code} slice_target = 10: // correct result {code} ++--+ | c_integer | cnt | ++--+ | 447503383 | 4| | 447503383 | 4| | 447503383 | 4| | 447503383 | 4| | 452697297 | 6| | 452697297 | 6| | 456808172 | 10 | | 456808172 | 10 | | 456808172 | 10 | | 456808172 | 10 | ++--+ {code} Note that in the incorrect result, not only are the the 'cnt' values incorrect, they are not the same for all rows of 456808172. So it seems to me that each minor fragment of the window is computing its own COUNT and that is not being set for all identical values of c_integer. If you can verify this, I think adding a distribution step below the Sort would resolve it. I can work on adding that. Wrong result with SUM window function and order by without partition by in the OVER clause -- Key: DRILL-3298 URL: https://issues.apache.org/jira/browse/DRILL-3298 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.0.0 Reporter: Victoria Markman Assignee: Deneche A. Hakim Priority: Critical Labels: window_function Fix For: 1.1.0 Attachments: j1.tar, test.res This query returns incorrect result when planner.slice_target = 1 {code} select j1.c_integer, sum(j1.c_integer) over w from j1 window w as (order by c_integer desc) order by 1, 2; {code} Query plan with planner.slice_target = 1 {noformat} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SingleMergeExchange(sort0=[0 ASC], sort1=[1 ASC]) 01-01 SelectionVectorRemover 01-02Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 01-03 Project(c_integer=[$0], EXPR$1=[$1]) 01-04HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) 02-01 UnorderedMuxExchange 03-01Project(c_integer=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1, hash64AsDouble($0)))]) 03-02 Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 03-03Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 03-04 SelectionVectorRemover 03-05Sort(sort0=[$0], dir0=[DESC]) 03-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {noformat} Query plan with planner.slice_target = 10; {noformat} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SelectionVectorRemover 00-03 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-04Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 00-05 Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 00-06SelectionVectorRemover 00-07 Sort(sort0=[$0], dir0=[DESC]) 00-08Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {noformat}
[jira] [Commented] (DRILL-3298) Wrong result with SUM window function and order by without partition by in the OVER clause
[ https://issues.apache.org/jira/browse/DRILL-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587199#comment-14587199 ] Victoria Markman commented on DRILL-3298: - Labeled it as window_function until proven otherwise. Wrong result with SUM window function and order by without partition by in the OVER clause -- Key: DRILL-3298 URL: https://issues.apache.org/jira/browse/DRILL-3298 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.0.0 Reporter: Victoria Markman Assignee: Chris Westin Priority: Critical Labels: window_function Attachments: j1.tar, test.res This query returns incorrect result when planner.slice_target = 1 {code} select j1.c_integer, sum(j1.c_integer) over w from j1 window w as (order by c_integer desc) order by 1, 2; {code} Query plan with planner.slice_target = 1 {code} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SingleMergeExchange(sort0=[0 ASC], sort1=[1 ASC]) 01-01 SelectionVectorRemover 01-02Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 01-03 Project(c_integer=[$0], EXPR$1=[$1]) 01-04HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) 02-01 UnorderedMuxExchange 03-01Project(c_integer=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1, hash64AsDouble($0)))]) 03-02 Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 03-03Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 03-04 SelectionVectorRemover 03-05Sort(sort0=[$0], dir0=[DESC]) 03-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {code} Query plan with planner.slice_target = 10; {code} 00-01 Project(c_integer=[$0], EXPR$1=[$1]) 00-02SelectionVectorRemover 00-03 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-04Project(c_integer=[$0], EXPR$1=[CASE(($1, 0), CAST($2):ANY, null)]) 00-05 Window(window#0=[window(partition {} order by [0 DESC] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])]) 00-06SelectionVectorRemover 00-07 Sort(sort0=[$0], dir0=[DESC]) 00-08Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/j1]], selectionRoot=/drill/testdata/subqueries/j1, numFiles=1, columns=[`c_integer`]]]) {code} Attached: * table j1 * test.res - result generated with postgres -- This message was sent by Atlassian JIRA (v6.3.4#6332)