[ 
https://issues.apache.org/jira/browse/FLINK-11943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Young updated FLINK-11943:
-------------------------------
    Component/s:     (was: API / Table SQL)
                 SQL / Planner
                 Runtime / Operators

> Support TopN feature for SQL
> ----------------------------
>
>                 Key: FLINK-11943
>                 URL: https://issues.apache.org/jira/browse/FLINK-11943
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Operators, SQL / Planner
>            Reporter: Jark Wu
>            Priority: Major
>
> TopN is a frequently used feature in data analysis. A global TopN query is 
> easy to express with ORDER BY + LIMIT, e.g. {{SELECT * FROM T ORDER BY amount 
> DESC LIMIT 10}}.
> However, there is also strong demand for per-group TopN, for example the top 
> 10 shops in each category. To avoid introducing new syntax, we would like to 
> express it with traditional syntax: {{ROW_NUMBER}} over a window, plus a 
> {{FILTER}} to limit the number of rows.
> For example:
> SELECT *
> FROM (
>   SELECT category, shopId, sales,
>          [ROW_NUMBER()|RANK()|DENSE_RANK()] OVER 
>           (PARTITION BY category ORDER BY sales ASC) as rownum
>   FROM shop_sales
> )
> WHERE rownum <= 10
> This issue aims to optimize this query into a {{Rank}} node instead of 
> {{Over}} plus {{Calc}}, and to translate the {{Rank}} node into physical 
> operators.
> Several optimizations of the rank operator are possible, depending on the 
> input of the Rank. We would like to start with a basic, one-size-fits-all 
> implementation and make the performance improvements later.
> Here is a brief design doc: 
> https://docs.google.com/document/d/14JCV6X6hcpoA51loprgntZNxQ2NmnDLucxgGY8xVDuI/edit#
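
For readers unfamiliar with the pattern, the semantics of the ROW_NUMBER() variant of the query above can be sketched in a few lines of Python. This is only an illustration of what the query returns, not the proposed Rank operator or any Flink API; the helper name top_n_per_group, its parameters, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from operator import itemgetter


def top_n_per_group(rows, partition_key, order_key, n, descending=False):
    """Emulate ROW_NUMBER() OVER (PARTITION BY p ORDER BY o) ... WHERE rownum <= n.

    Hypothetical helper for illustration only; it is not part of Flink.
    """
    # PARTITION BY: bucket the rows by the partition column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)

    result = []
    for key in sorted(groups):  # sorted only to make the output deterministic
        # ORDER BY: sort each partition on the ordering column.
        ordered = sorted(groups[key], key=itemgetter(order_key), reverse=descending)
        # ROW_NUMBER() starts at 1; keeping the first n rows is "rownum <= n".
        for rownum, row in enumerate(ordered[:n], start=1):
            result.append({**row, "rownum": rownum})
    return result


# Made-up sample data using the column names from the query above.
shop_sales = [
    {"category": "books", "shopId": 1, "sales": 30},
    {"category": "books", "shopId": 2, "sales": 10},
    {"category": "books", "shopId": 3, "sales": 20},
    {"category": "toys", "shopId": 4, "sales": 5},
    {"category": "toys", "shopId": 5, "sales": 50},
]

# ORDER BY sales ASC, as written in the query; pass descending=True for DESC.
top2 = top_n_per_group(shop_sales, "category", "sales", n=2)
for row in top2:
    print(row)
```

The sketch covers ROW_NUMBER() only. RANK() and DENSE_RANK() assign equal numbers to ties, so a filter of rownum <= N can return more than N rows per partition with those functions.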



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
