Paul Rogers created DRILL-5143: ---------------------------------- Summary: Planner creates redundant sort for nested query on single fragment Key: DRILL-5143 URL: https://issues.apache.org/jira/browse/DRILL-5143 Project: Apache Drill Issue Type: Bug Affects Versions: 1.8.0 Reporter: Paul Rogers Priority: Minor
The unit test {{TestWindowFrame.testUnboundedFollowing}} identified an optimization opportunity in the planner. It uses the following query: {code} SELECT position_id, employee_id, MAX(employee_id) OVER(PARTITION BY position_id) AS `last_value` FROM ( SELECT * FROM dfs_test.`%s/window/b4.p4` ORDER BY position_id, employee_id ) {code} Which produces the following (heavily elided) plan: {code} "graph" : [ { "pop" : "fs-scan", ... }, { "pop" : "project", ... }, { "pop" : "external-sort", "orderings" : [ { "order" : "ASC", "expr" : "`position_id`", "nullDirection" : "UNSPECIFIED" }, { "order" : "ASC", "expr" : "`employee_id`", "nullDirection" : "UNSPECIFIED" } ], ... }, { "pop" : "selection-vector-remover", ... }, { "pop" : "project", ... "exprs" : [ { "ref" : "`$0`", "expr" : "`T0¦¦position_id`" }... }, { "pop" : "external-sort", ... "orderings" : [ { "order" : "ASC", "expr" : "`$0`", "nullDirection" : "UNSPECIFIED" } ], ... }, { "pop" : "selection-vector-remover", ... }, { "pop" : "window", ... "aggregations" : [ { "ref" : "`w0$o0`", "expr" : "max(`$1`) " } ], ... }, { "pop" : "project", ... }, { "pop" : "screen", ... } ] {code} Note that two sorts are stacked one atop the other. This is a "degenerate" plan; normally a shuffle operation would occur between the two sorts in a distributed query. But, because the query is so small, it runs on a single node and has a redundant sort. Either: * Note that the data is already sorted and omit the downstream sort, or * Note that the inner query sort is ignored by downstream operators and discard the inner sort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)