Paul Rogers created DRILL-5144:
----------------------------------

             Summary: Query with two identical windows produces incorrect 
results
                 Key: DRILL-5144
                 URL: https://issues.apache.org/jira/browse/DRILL-5144
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.8.0
            Reporter: Paul Rogers


Consider the following query (from {{TestWindowFrame.test4657}}):

{code}
select row_number() over(order by position_id) rn, 
        rank() over(order by position_id) rnk
from dfs_test.`%s/window/b3.p2`
{code}

The planner produces the following plan:

{code}
{
  ...
  "graph" : [ {
    "pop" : "fs-scan",
    ...
    "columns" : [ "`position_id`" ],
    ...
  }, {
    "pop" : "external-sort",
    "@id" : 8,
    "child" : 9,
    "orderings" : [ {
      "order" : "ASC",
      "expr" : "`position_id`",
      "nullDirection" : "UNSPECIFIED"
    } ],
    ...
  }, {
    "pop" : "selection-vector-remover",
    ...
  }, {
    "pop" : "window",
    "@id" : 6,
    "child" : 7,
    "aggregations" : [ {
      "ref" : "`w0$o0`",
      "expr" : "row_number(1) "
    } ],
    "orderings" : [ {
      "order" : "ASC",
      "expr" : "`position_id`",
      "nullDirection" : "UNSPECIFIED"
    } ],
    "frameUnitsRows" : true,
    "start" : {
      "unbounded" : true,
      "offset" : -9223372036854775808
    },
    "end" : {
      "unbounded" : false,
      "offset" : 0
    },
    ...
  }, {
    "pop" : "external-sort",
    "@id" : 5,
    "child" : 6,
    "orderings" : [ {
      "order" : "ASC",
      "expr" : "`position_id`",
      "nullDirection" : "UNSPECIFIED"
    } ],
    ...
  }, {
    "pop" : "selection-vector-remover",
    ...
  }, {
    "pop" : "window",
    ...
    "aggregations" : [ {
      "ref" : "`w1$o0`",
      "expr" : "rank(1) "
    } ],
    "orderings" : [ {
      "order" : "ASC",
      "expr" : "`position_id`",
      "nullDirection" : "UNSPECIFIED"
    } ],
    "frameUnitsRows" : false,
    "start" : {
      "unbounded" : true,
      "offset" : -9223372036854775808
    },
    "end" : {
      "unbounded" : false,
      "offset" : 0
    },
    ...
  }, {
    "pop" : "project",
    "@id" : 2,
    "exprs" : [ {
      "ref" : "`$0`",
      "expr" : "`w0$o0`"
    }, {
      "ref" : "`$1`",
      "expr" : "`w1$o0`"
    } ],
    ...
  }, {
    "pop" : "project",
    ...
    "exprs" : [ {
      "ref" : "`rn`",
      "expr" : "`$0`"
    }, {
      "ref" : "`rnk`",
      "expr" : "`$1`"
    } ],
    ...
  }, {
    "pop" : "screen",
    ...
  } ]
}
{code}

Note that the plan sorts the input data on {{position_id}} twice. One would 
work just as well.

Note also that, since Drill's sort is unstable, the order of like keys can 
change between sorts; resulting in non-deterministic results (which show up in 
failures of the test mentioned earlier).

Indeed, earlier versions of the tests "fudged" the external sort to make it 
stable so that the tests would pass; but those tests hid the instability of 
this particular query. Without the fudge, results are incorrect:

{code}
rn, rnk
1,1
3,1
4,1
5,1
2,1 <-- Wrong results
31,6 <-- Wrong results
6,6
...
30,6
32,6
...
60,6
{code}

The expected results is the equivalent of the following (invalid) query:

{code}
select row_number() rn, rank() rnk
         over(order by position_id)
from dfs_test.`%s/window/b3.p2`
{code}

{{TestWindowFrame.test4657}} will be disabled until a fix for this problem is 
available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to