[jira] [Commented] (CALCITE-3181) Support limit per group in Window

2021-07-26 Thread Haisheng Yuan (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387518#comment-17387518
 ] 

Haisheng Yuan commented on CALCITE-3181:


Yeah, with "row()" the transformation should work. Thanks for the example.

> Support limit per group in Window
> -
>
> Key: CALCITE-3181
> URL: https://issues.apache.org/jira/browse/CALCITE-3181
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Haisheng Yuan
>Priority: Major
>
> We have a lot of queries like the following to retrieve top N tuples per 
> group:
> {code:java}
> SELECT x, y FROM
>  (SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) 
>  AS rn FROM t1) t2 WHERE rn <= 3;
> {code}
> The performance is not good if each group has a lot more tuples than wanted, 
> because we will retrieve and sort all the tuples, instead of just doing a 
> top-N heap sort.
> In order to do optimization for this kind of query, we need to extend window 
> to support limit, if and only if there is only 1 window function, and it is 
> {{row_number()}}. We also need a substitute rule to push the limit into 
> window. Of course, we also need to modify executor to support this 
> optimization (can be later).
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y ROW_NUMBER()})
> {code}
> to
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()})
> {code}
> Thoughts? Objections?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-3181) Support limit per group in Window

2021-07-10 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378573#comment-17378573
 ] 

Julian Hyde commented on CALCITE-3181:
--

You can use use {{ROW(a, b, y)}}. For example, if a is {{state}}, b is 
{{year}}, x is {{color}} and {{y}} is {{units}}, you can write
{code:java}
SELECT color,
  ARRAY_AGG(r) WITHIN GROUP (ORDER BY r.units DESC LIMIT 3)
FROM (SELECT color, ROW (state, year, units) AS r
  FROM Sales)
GROUP BY color{code}
instead of
{code:java}
SELECT state, year, color
FROM (SELECT state, year, color,
ROW_NUMBER() OVER (PARTITION BY color ORDER BY units DESC) AS rn
  FROM Sales)
WHERE rn <= 3{code}

> Support limit per group in Window
> -
>
> Key: CALCITE-3181
> URL: https://issues.apache.org/jira/browse/CALCITE-3181
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Haisheng Yuan
>Priority: Major
>
> We have a lot of queries like the following to retrieve top N tuples per 
> group:
> {code:java}
> SELECT x, y FROM
>  (SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) 
>  AS rn FROM t1) t2 WHERE rn <= 3;
> {code}
> The performance is not good if each group has a lot more tuples than wanted, 
> because we will retrieve and sort all the tuples, instead of just doing a 
> top-N heap sort.
> In order to do optimization for this kind of query, we need to extend window 
> to support limit, if and only if there is only 1 window function, and it is 
> {{row_number()}}. We also need a substitute rule to push the limit into 
> window. Of course, we also need to modify executor to support this 
> optimization (can be later).
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y ROW_NUMBER()})
> {code}
> to
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()})
> {code}
> Thoughts? Objections?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-3181) Support limit per group in Window

2021-07-08 Thread Haisheng Yuan (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377857#comment-17377857
 ] 

Haisheng Yuan commented on CALCITE-3181:


Yeah, that makes more sense. 
How does it work for the following case?
SELECT a, b, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y)...

here the value of "a" and "b" may vary row by row.

> Support limit per group in Window
> -
>
> Key: CALCITE-3181
> URL: https://issues.apache.org/jira/browse/CALCITE-3181
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Haisheng Yuan
>Priority: Major
>
> We have a lot of queries like the following to retrieve top N tuples per 
> group:
> {code:java}
> SELECT x, y FROM
>  (SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) 
>  AS rn FROM t1) t2 WHERE rn <= 3;
> {code}
> The performance is not good if each group has a lot more tuples than wanted, 
> because we will retrieve and sort all the tuples, instead of just doing a 
> top-N heap sort.
> In order to do optimization for this kind of query, we need to extend window 
> to support limit, if and only if there is only 1 window function, and it is 
> {{row_number()}}. We also need a substitute rule to push the limit into 
> window. Of course, we also need to modify executor to support this 
> optimization (can be later).
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y ROW_NUMBER()})
> {code}
> to
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()})
> {code}
> Thoughts? Objections?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-3181) Support limit per group in Window

2021-07-08 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377711#comment-17377711
 ] 

Julian Hyde commented on CALCITE-3181:
--

Only one row per group, but that row contains an array of row objects which, 
when unnested, become multiple rows.

> Support limit per group in Window
> -
>
> Key: CALCITE-3181
> URL: https://issues.apache.org/jira/browse/CALCITE-3181
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Haisheng Yuan
>Priority: Major
>
> We have a lot of queries like the following to retrieve top N tuples per 
> group:
> {code:java}
> SELECT x, y FROM
>  (SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) 
>  AS rn FROM t1) t2 WHERE rn <= 3;
> {code}
> The performance is not good if each group has a lot more tuples than wanted, 
> because we will retrieve and sort all the tuples, instead of just doing a 
> top-N heap sort.
> In order to do optimization for this kind of query, we need to extend window 
> to support limit, if and only if there is only 1 window function, and it is 
> {{row_number()}}. We also need a substitute rule to push the limit into 
> window. Of course, we also need to modify executor to support this 
> optimization (can be later).
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y ROW_NUMBER()})
> {code}
> to
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()})
> {code}
> Thoughts? Objections?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-3181) Support limit per group in Window

2021-07-08 Thread Haisheng Yuan (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377683#comment-17377683
 ] 

Haisheng Yuan commented on CALCITE-3181:


Yes, they are quite similar with each other.
There are still some slight difference, the window keep the row count to the 
limit, but the aggregate in CALCITE-4687 only generate 1 row per group, no 
matter with or without the limit. So maybe not exchangable?

> Support limit per group in Window
> -
>
> Key: CALCITE-3181
> URL: https://issues.apache.org/jira/browse/CALCITE-3181
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Haisheng Yuan
>Priority: Major
>
> We have a lot of queries like the following to retrieve top N tuples per 
> group:
> {code:java}
> SELECT x, y FROM
>  (SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) 
>  AS rn FROM t1) t2 WHERE rn <= 3;
> {code}
> The performance is not good if each group has a lot more tuples than wanted, 
> because we will retrieve and sort all the tuples, instead of just doing a 
> top-N heap sort.
> In order to do optimization for this kind of query, we need to extend window 
> to support limit, if and only if there is only 1 window function, and it is 
> {{row_number()}}. We also need a substitute rule to push the limit into 
> window. Of course, we also need to modify executor to support this 
> optimization (can be later).
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y ROW_NUMBER()})
> {code}
> to
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()})
> {code}
> Thoughts? Objections?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-3181) Support limit per group in Window

2021-07-08 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377599#comment-17377599
 ] 

Julian Hyde commented on CALCITE-3181:
--

[~hyuan], I just logged CALCITE-4687, which is another way of posing queries 
that have "per-key limits". Please take a look, and let me know what you think.

The important insight, I think, is that computing the rank is expensive, but 
filtering on the rank (say {{rn <= 3}}) may be much cheaper, as long as you 
don't return the rank column in your result.

Since the two formulations are equivalent, it would be possible to define rules 
that transform one into the other.

> Support limit per group in Window
> -
>
> Key: CALCITE-3181
> URL: https://issues.apache.org/jira/browse/CALCITE-3181
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Haisheng Yuan
>Priority: Major
>
> We have a lot of queries like the following to retrieve top N tuples per 
> group:
> {code:java}
> SELECT x, y FROM
>  (SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) 
>  AS rn FROM t1) t2 WHERE rn <= 3;
> {code}
> The performance is not good if each group has a lot more tuples than wanted, 
> because we will retrieve and sort all the tuples, instead of just doing a 
> top-N heap sort.
> In order to do optimization for this kind of query, we need to extend window 
> to support limit, if and only if there is only 1 window function, and it is 
> {{row_number()}}. We also need a substitute rule to push the limit into 
> window. Of course, we also need to modify executor to support this 
> optimization (can be later).
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y ROW_NUMBER()})
> {code}
> to
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()})
> {code}
> Thoughts? Objections?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-3181) Support limit per group in Window

2019-07-15 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885545#comment-16885545
 ] 

Julian Hyde commented on CALCITE-3181:
--

The {{enum RelFieldCollation}} has values ASCENDING, STRICTLY_ASCENDING, 
DESCENDING, STRICTLY_DESCENDING and CLUSTERED. CLUSTERED exists for precisely 
this purpose.

We have no way of saying in a physical property (trait) "for a given deptno key 
there are no more than 10 rows". My instinct says we probably shouldn't add a 
way; it would make the trait more complicated for everyone else. So we'd do 
this by transformation (forward chaining) not by traits (backward chaining).

> Support limit per group in Window
> -
>
> Key: CALCITE-3181
> URL: https://issues.apache.org/jira/browse/CALCITE-3181
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Haisheng Yuan
>Priority: Major
>
> We have a lot of queries like the following to retrieve top N tuples per 
> group:
> {code:java}
> SELECT x, y FROM
>  (SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) 
>  AS rn FROM t1) t2 WHERE rn <= 3;
> {code}
> The performance is not good if each group has a lot more tuples than wanted, 
> because we will retrieve and sort all the tuples, instead of just doing a 
> top-N heap sort.
> In order to do optimization for this kind of query, we need to extend window 
> to support limit, if and only if there is only 1 window function, and it is 
> {{row_number()}}. We also need a substitute rule to push the limit into 
> window. Of course, we also need to modify executor to support this 
> optimization (can be later).
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y ROW_NUMBER()})
> {code}
> to
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()})
> {code}
> Thoughts? Objections?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (CALCITE-3181) Support limit per group in Window

2019-07-09 Thread Haisheng Yuan (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881722#comment-16881722
 ] 

Haisheng Yuan commented on CALCITE-3181:


{quote}
It would make sense to have an implementation of Window (let's call it 
StreamingWindow) that exploits the fact that the input is sorted. 
{quote}

This is exactly how Greenplum database and Orca optimizer implement Window 
operator. In Orca, the physical window operator requests physical properties of 
[partition keys] hash distribution and [partition keys, sort keys] sort order, 
which gives optimizer more optimization opportunities, like window reordering 
to avoid redundant sort or shuffle.

{quote}
if Sort supported partitioned top-n, then we could create a StreamingWindow on 
top of a top-n Sort, and the window would not even have to worry that the Sort 
is removing rows.
{quote}
Agree. So that we don't need to change Window semantics. What physical order 
property would the top-n Sort deliver?  [partition keys + sort keys]? But if 
the sort is hash table based implementation, the [sort keys] order trait might 
be wrong.



> Support limit per group in Window
> -
>
> Key: CALCITE-3181
> URL: https://issues.apache.org/jira/browse/CALCITE-3181
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Haisheng Yuan
>Priority: Major
>
> We have a lot of queries like the following to retrieve top N tuples per 
> group:
> {code:java}
> SELECT x, y FROM
>  (SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) 
>  AS rn FROM t1) t2 WHERE rn <= 3;
> {code}
> The performance is not good if each group has a lot more tuples than wanted, 
> because we will retrieve and sort all the tuples, instead of just doing a 
> top-N heap sort.
> In order to do optimization for this kind of query, we need to extend window 
> to support limit, if and only if there is only 1 window function, and it is 
> {{row_number()}}. We also need a substitute rule to push the limit into 
> window. Of course, we also need to modify executor to support this 
> optimization (can be later).
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y ROW_NUMBER()})
> {code}
> to
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()})
> {code}
> Thoughts? Objections?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-3181) Support limit per group in Window

2019-07-09 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881530#comment-16881530
 ] 

Julian Hyde commented on CALCITE-3181:
--

Now switching to consider algebra rather than SQL syntax. It would make sense 
to have an implementation of {{Window}} (let's call it {{StreamingWindow}}) 
that exploits the fact that the input is sorted. If the Window only needed the 
top 3 rows in each partition, and if {{Sort}} supported partitioned top-n, then 
we could create a {{StreamingWindow}} on top of a top-n {{Sort}}, and the 
window would not even have to worry that the {{Sort}} is removing rows.

> Support limit per group in Window
> -
>
> Key: CALCITE-3181
> URL: https://issues.apache.org/jira/browse/CALCITE-3181
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Haisheng Yuan
>Priority: Major
>
> We have a lot of queries like the following to retrieve top N tuples per 
> group:
> {code:java}
> SELECT x, y FROM
>  (SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) 
>  AS rn FROM t1) t2 WHERE rn <= 3;
> {code}
> The performance is not good if each group has a lot more tuples than wanted, 
> because we will retrieve and sort all the tuples, instead of just doing a 
> top-N heap sort.
> In order to do optimization for this kind of query, we need to extend window 
> to support limit, if and only if there is only 1 window function, and it is 
> {{row_number()}}. We also need a substitute rule to push the limit into 
> window. Of course, we also need to modify executor to support this 
> optimization (can be later).
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y ROW_NUMBER()})
> {code}
> to
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()})
> {code}
> Thoughts? Objections?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-3181) Support limit per group in Window

2019-07-09 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881528#comment-16881528
 ] 

Julian Hyde commented on CALCITE-3181:
--

I wonder if there is any way to express what we want using the {{rows}} 
sub-clause. Your query is *almost* the same as

{code}
select category, product, pv, 
 sum(pv) over(partition by category order by sale_count desc rows 10 
preceding) from prod;
{code}

except that "rows 10 preceding" is relative to the current row whereas we want 
the top 10 rows, i.e. the top row and its 10 preceding rows.

> Support limit per group in Window
> -
>
> Key: CALCITE-3181
> URL: https://issues.apache.org/jira/browse/CALCITE-3181
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Haisheng Yuan
>Priority: Major
>
> We have a lot of queries like the following to retrieve top N tuples per 
> group:
> {code:java}
> SELECT x, y FROM
>  (SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) 
>  AS rn FROM t1) t2 WHERE rn <= 3;
> {code}
> The performance is not good if each group has a lot more tuples than wanted, 
> because we will retrieve and sort all the tuples, instead of just doing a 
> top-N heap sort.
> In order to do optimization for this kind of query, we need to extend window 
> to support limit, if and only if there is only 1 window function, and it is 
> {{row_number()}}. We also need a substitute rule to push the limit into 
> window. Of course, we also need to modify executor to support this 
> optimization (can be later).
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y ROW_NUMBER()})
> {code}
> to
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()})
> {code}
> Thoughts? Objections?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-3181) Support limit per group in Window

2019-07-09 Thread Haisheng Yuan (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881121#comment-16881121
 ] 

Haisheng Yuan commented on CALCITE-3181:


e.g. compute the total page views for the top 10 hot-sale products in each 
category:

{code:java}
select category, product, pv, 
 sum(pv) over(partition by category order by sale_count desc limit 10) from 
prod;
{code}

Instead of sort-based window implementation, we can use hash table based 
implementation for window with limit. Use the partition key as the hash key, 
each distinct key has an entry that contains a heap if there is order by 
clause, with the heap size be the limit value. In distributed system, we can 
have 2-phase window (global and local), execute local window before shuffling 
tuples by the window partition key, which might reduce the shuffle cost 
dramatically.

But the side effect is also obvious: this will break the assumption that window 
operator keeps the cardinality unchanged.


> Support limit per group in Window
> -
>
> Key: CALCITE-3181
> URL: https://issues.apache.org/jira/browse/CALCITE-3181
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Haisheng Yuan
>Priority: Major
>
> We have a lot of queries like the following to retrieve top N tuples per 
> group:
> {code:java}
> SELECT x, y FROM
>  (SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) 
>  AS rn FROM t1) t2 WHERE rn <= 3;
> {code}
> The performance is not good if each group has a lot more tuples than wanted, 
> because we will retrieve and sort all the tuples, instead of just doing a 
> top-N heap sort.
> In order to do optimization for this kind of query, we need to extend window 
> to support limit, if and only if there is only 1 window function, and it is 
> {{row_number()}}. We also need a substitute rule to push the limit into 
> window. Of course, we also need to modify executor to support this 
> optimization (can be later).
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y ROW_NUMBER()})
> {code}
> to
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()})
> {code}
> Thoughts? Objections?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-3181) Support limit per group in Window

2019-07-08 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880945#comment-16880945
 ] 

Julian Hyde commented on CALCITE-3181:
--

It seems to make sense. Can you devise an example on a "real" data set, and 
involving not just ROW_NUMBER but an aggregate function such as AVG or SUM.

I know it is not standard SQL, but can you sketch out how the query would look 
if we allowed "OVER (PARTITION BY ... ORDER BY ... FETCH ...)"?

A while ago I was thinking about "partitioned sort-limit". We support 
"Sort(Scan(EMP), key=[salary DESC], fetch=10)" (the 10 top-earning employees) 
but we don't support "Sort(Scan(EMP), partition=[deptno], key=[salary DESC], 
fetch=10)" (the top 10 top-earning employees in each department).

This came up in CALCITE-1317, the "WinMagic" optimization. Regular Sort with 
limit can handle non-correlated sub-queries, but for correlated sub-queries we 
needed partitioned Sort with limit. Thus it seems a natural and useful 
extension to Sort. I haven't thought about whether the extension of Window that 
you propose is as natural.

> Support limit per group in Window
> -
>
> Key: CALCITE-3181
> URL: https://issues.apache.org/jira/browse/CALCITE-3181
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Haisheng Yuan
>Priority: Major
>
> We have a lot of queries like the following to retrieve top N tuples per 
> group:
> {code:java}
> SELECT x, y FROM
>  (SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) 
>  AS rn FROM t1) t2 WHERE rn <= 3;
> {code}
> The performance is not good if each group has a lot more tuples than wanted, 
> because we will retrieve and sort all the tuples, instead of just doing a 
> top-N heap sort.
> In order to do optimization for this kind of query, we need to extend window 
> to support limit, if and only if there is only 1 window function, and it is 
> {{row_number()}}. We also need a substitute rule to push the limit into 
> window. Of course, we also need to modify executor to support this 
> optimization (can be later).
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y ROW_NUMBER()})
> {code}
> to
> {code:java}
> Filter (rn <= 3)
>   +- Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()})
> {code}
> Thoughts? Objections?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)