[ https://issues.apache.org/jira/browse/CALCITE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Haisheng Yuan updated CALCITE-3181: ----------------------------------- Description: We have a lot of queries like the following to retrieve top N tuples per group: {code:java} SELECT x, y FROM (SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) AS rn FROM t1) t2 WHERE rn <= 3; {code} The performance is not good if each group has a lot more tuples than wanted, because we will retrieve and sort all the tuples, instead of just doing a top-N heap sort. In order to do optimization for this kind of query, we need to extend window to support limit, if and only if there is only 1 window function, and it is {{row_number()}}. We also need a substitute rule to push the limit into window. Of course, we also need to modify executor to support this optimization (can be later). {code:java} Filter (rn <= 3) +- Window (window#0={Partition by x order by y ROW_NUMBER()}) {code} to {code:java} Filter (rn <= 3) +- Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()}) {code} Thoughts? Objections? was: We have a lot of queries like the following to retrieve top N tuples per group: {code:java} SELECT x, y FROM (SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) AS rn FROM t1) t2 WHERE rn <= 3; {code} The performance is not good if each group has a lot more tuples than wanted, because we will retrieve and sort all the tuples, instead of just doing a top-N heap sort. In order to do optimization for this kind of query, we need to extend window to support limit, if and only if there is only 1 window function, and it is {{row_number()}}. We also need a substitute rule to push the limit into window. Of course, we also need to modify executor to support this optimization (can be later). {code:java} Filter (rn <= 3) +- Window (window#0={Partition by x order by y ROW_NUMBER()}) {code} to {code:java} Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()}) {code} Thoughts? Objections? > Support limit per group in Window > --------------------------------- > > Key: CALCITE-3181 > URL: https://issues.apache.org/jira/browse/CALCITE-3181 > Project: Calcite > Issue Type: Improvement > Components: core > Reporter: Haisheng Yuan > Priority: Major > > We have a lot of queries like the following to retrieve top N tuples per > group: > {code:java} > SELECT x, y FROM > (SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) > AS rn FROM t1) t2 WHERE rn <= 3; > {code} > The performance is not good if each group has a lot more tuples than wanted, > because we will retrieve and sort all the tuples, instead of just doing a > top-N heap sort. > In order to do optimization for this kind of query, we need to extend window > to support limit, if and only if there is only 1 window function, and it is > {{row_number()}}. We also need a substitute rule to push the limit into > window. Of course, we also need to modify executor to support this > optimization (can be later). > {code:java} > Filter (rn <= 3) > +- Window (window#0={Partition by x order by y ROW_NUMBER()}) > {code} > to > {code:java} > Filter (rn <= 3) > +- Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()}) > {code} > Thoughts? Objections? -- This message was sent by Atlassian JIRA (v7.6.3#76005)