subject:"\[jira\] \[Commented\] \(FLINK\-6075\) Support Limit\/Top\(Sort\) for Stream SQL"

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-07-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092481#comment-16092481
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user wuchong commented on the issue:

https://github.com/apache/flink/pull/4263
  
Hi @fhueske ,  I left this comment under the JIRA. To make sure you know 
our situation, I post it here:

>I think the most useful feature is PR3 "ORDER BY  OFFSET FETCH" 
(i.e. TopN). It is an important requirement. And there is something more 
complex such as TopN for each group (e.g. 
https://stackoverflow.com/questions/176964/select-top-10-records-for-each-category).
 We have an idea (prototype) to support it, and planning to make a whole design 
for the TopN (any-attribute), and hope that to meet soon.


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigg

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-07-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091337#comment-16091337
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/4263
  
Thanks @rtudoran. 

Regarding PR it is up to you whether you'd like to update the current or 
open a new one. I think the new version will have less changes (LOCs and files) 
because we only need to add a few counters and conditions to 
`ProcTimeSortProcessFunction` and `RowTimeSortProcessFunction`. 

Regarding the JIRA, I would change the title and description of FLINK-6075 
describe the `ORDER BY *time ASC` case (i.e., what is already implemented and 
merged) and open new JIRAs for 
- `ORDER BY *time ASC OFFSET FETCH`
- `ORDER BY *time DESC OFFSET FETCH`
- `ORDER BY * OFFSET FETCH`


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assi

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-07-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091228#comment-16091228
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on the issue:

https://github.com/apache/flink/pull/4263
  
@fhueske @wuchong
Thanks for the feedback. Fine for me to eliminate the retraction for the 
*time ASC cases. I will modify the PR (to support *time ASC cases) over the 
next days and re-submit it. I am not sure if you prefer it to be a new commit 
here or if i should open a new PR?

@fhueske
Regarding the other cases you mentioned - to support time DESC ... do you 
want to modify the JIRA as well? can you modify it or do you want me to add the 
sub-jiras for the cases you mentioned?


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or cou

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-07-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089763#comment-16089763
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/4263
  
Hi @rtudoran, you are right, I said we will need retraction for OFFSET and 
FETCH. However, it is only required for the general case and the special case 
of `ORDER BY *time ASC` without updates should be implemented without. 

IMO, the semantics of a query are fixed and not really up for discussion. 
Just imagine a regular database would execute the same query on a table that 
holds the same data as the stream. This defines the result of the streaming 
query that our operators must produce.

Moving forward, I would modify this PR to support `ORDER BY` with `OFFSET` 
and `FETCH` for append-only input by extending the existing operators with 
counters and open JIRAs for the other two cases (`ORDER BY *time DESC` and 
`ORDER BY *`).

Best, Fabian


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-07-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089482#comment-16089482
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user wuchong commented on the issue:

https://github.com/apache/flink/pull/4263
  
@rtudoran  "SELECT x FROM stream ORDER BY *time FETCH 2" do not need 
retraction. Because it is order by ascending time. The query will only emit the 
first 2 rows and after that drop all rows. We need retraction if the query is 
order by time desc. 


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window boundaries
> * Apply – window function that sorts data and selects the output to trigger 
> (based on LIMIT/TOP parameters). All data will be sorted at once and result 
> outputted when the window is tr

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-07-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089436#comment-16089436
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on the issue:

https://github.com/apache/flink/pull/4263
  
@fhueske You were the one  that argue that retraction support is needed for 
offset and fetch. When we were discussing this i was not into having the 
retraction. I think finally it depends on what behavior we want to give for the 
function. If we have a SELECT x FROM stream ORDER BY *time FETCH 2 when we emit 
the new latest data - do we still want to invalidate what we have emitted 
previously (in this case we would emit the retraction), otherwise if we 
consider that that was the valid input at that point in time - we do not need 
the retraction. I am fine either way. Let me know. 


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on t

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-07-16 Thread Jark Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089243#comment-16089243
 ] 

Jark Wu commented on FLINK-6075:


Hi [~fhueske], I agree with your semantic and retraction mechanism. And your 
task division looks good to me. 

I think the most useful feature is PR3 "ORDER BY  OFFSET FETCH" (i.e. 
TopN). We have a great requirement for this and something more complex such as 
TopN for each group 
(https://stackoverflow.com/questions/176964/select-top-10-records-for-each-category).
  We are planning to draft a design for the TopN (any-attribute), and hope to 
meet soon.

> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window boundaries
> * Apply – window function that sorts data and selects the output to trigger 
> (based on LIMI

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-07-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089102#comment-16089102
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/4263
  
Hi @rtudoran, I had a look at the PR and left a comment on 
[FLINK-6075](https://issues.apache.org/jira/browse/FLINK-6075).

I think we do not need retraction for `OFFSET` and `FETCH` on `ORDER BY 
*time ASC` and can implement this by extending the existing `ProcessFunction` 
for sorting. We only need to buffer and retract rows if we support `OFFSET` and 
`FETCH` for arbitrary sort orders.

Let me know what you think,
Fabian



> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window bou

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-07-16 Thread Fabian Hueske (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089101#comment-16089101
 ] 

Fabian Hueske commented on FLINK-6075:
--

Hi [~rtudoran],

I had a look at [PR #4263|https://github.com/apache/flink/pull/4263] and think 
we need to rework the semantics of the operators a bit.
The result of a query on a stream must be the same as if the query is executed 
on the same data in a regular database. 

For example, given a query {{SELECT * FROM t ORDER BY rowtime ASC OFFSET 10}}, 
we must emit all rows except for the rows with the 10 smallest timestamps 
(which should be the first). {{SELECT * FROM t ORDER BY rowtime ASC FETCH FIRST 
10 ROWS ONLY}} would return the rows with the 10 smallest timestamps. 

With these semantics in mind, {{ORDER BY}} with {{OFFSET}} and {{FETCH}} can be 
implemented as follows:

- {{ORDER BY time ASC}}:
  - ORDER BY: already implemented. No retraction needed.
  - ORDER BY OFFSET x: same operator as ORDER BY but drop the first x rows. 
Only counter state and no retraction needed.
  - ORDER BY FETCH x: same operator as ORDER BY but drop all rows after the 
first x. Only counter state and no retraction needed.
  - ORDER BY OFFSET x FETCH y: same operator as ORDER BY but drop the first x 
rows and stop after y rows have been emitted. Only counter state and no 
retraction needed.

We can also support {{ORDER BY}} with {{OFFSET}} and {{FETCH}} for arbitrary 
sort orders if we do not care about the emission order and only about the 
correct filtering.

- ORDER BY any attribute: Correct order cannot be provided, but we can return 
all valid rows.
  - ORDER BY: correct order cannot be provided, hence not possible.
  - ORDER BY OFFSET x: We can emit all valid rows if we keep x rows as state. 
No retraction needed.
  - ORDER BY FETCH y: We can emit all valid rows if we keep y rows as state. 
Retraction needed.
  - ORDER BY OFFSET x FETCH y: We can emit all valid rows if we keep x + y rows 
as state. Retraction needed.

Given these semantics, I propose to split the development of ORDER BY with 
OFFSET and FETCH into multiple subissues / PRs:

- Subissue / PR 1: ORDER BY OFFSET, ORDER BY FETCH, and ORDER BY OFFSET FETCH 
for ascending time order. This can be implemented by extending the current 
ORDER BY implementation with two counters.
- Subissue / PR 2: ORDER BY OFFSET, ORDER BY FETCH, and ORDER BY OFFSET FETCH 
for descending time order. These can be implemented in two operators, one for 
proc-time and one for row-time.
- Subissue / PR 3: ORDER BY OFFSET, ORDER BY FETCH, and ORDER BY OFFSET FETCH 
for arbitrary sort orders. These can be implemented in two operators, one for 
proc-time and one for row-time.

All implementations only support append-only input tables. We can later add 
support for inputs that are updated.

Best, Fabian

> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TO

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-07-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16076116#comment-16076116
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on the issue:

https://github.com/apache/flink/pull/4263
  
@wuchong @fhueske 
I closed that old PR (i forgot about it).
Thanks for looking into this.


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window boundaries
> * Apply – window function that sorts data and selects the output to trigger 
> (based on LIMIT/TOP parameters). All data will be sorted at once and result 
> outputted when the window is triggered
> An alternative implementation can be to use a fold window function to sort 
> the elements as they arrive, one at a time followed by a flatMa

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-07-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16076113#comment-16076113
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran closed the pull request at:

https://github.com/apache/flink/pull/3714


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window boundaries
> * Apply – window function that sorts data and selects the output to trigger 
> (based on LIMIT/TOP parameters). All data will be sorted at once and result 
> outputted when the window is triggered
> An alternative implementation can be to use a fold window function to sort 
> the elements as they arrive, one at a time followed by a flatMap to filter 
> the number of outputs. 
> !sort.png!
> **General logic of Join**
> ```
> inputDataStrea

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-07-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075773#comment-16075773
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user wuchong commented on the issue:

https://github.com/apache/flink/pull/4263
  
Thank your for the work , I will also look into this in the next days. BTW, 
can you please close #3714 @rtudoran ?


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window boundaries
> * Apply – window function that sorts data and selects the output to trigger 
> (based on LIMIT/TOP parameters). All data will be sorted at once and result 
> outputted when the window is triggered
> An alternative implementation can be to use a fold window function to sort 
> the elements as they arrive, one at a time foll

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-07-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075003#comment-16075003
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on the issue:

https://github.com/apache/flink/pull/4263
  
@fhueske
I have implemented the support for offset and fetch for both rowtime and 
proctime. 
This PR will close the JIRA issue. Please have a look


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window boundaries
> * Apply – window function that sorts data and selects the output to trigger 
> (based on LIMIT/TOP parameters). All data will be sorted at once and result 
> outputted when the window is triggered
> An alternative implementation can be to use a fold window function to sort 
> the el

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-07-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075000#comment-16075000
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on the issue:

https://github.com/apache/flink/pull/4003
  
I have updated these contributions against the master branch and based on 
the latest modifications after the merge of the basis for orderby. I opened a 
new PR for this #4263.
I will close this one as it is not relevant anymore


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window boundaries
> * Apply – window function that sorts data and selects the output to trigger 
> (based on LIMIT/TOP parameters). All data will be sorted at once and result 
> outputted when the window is triggered
> An alte

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-07-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075001#comment-16075001
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran closed the pull request at:

https://github.com/apache/flink/pull/4003


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window boundaries
> * Apply – window function that sorts data and selects the output to trigger 
> (based on LIMIT/TOP parameters). All data will be sorted at once and result 
> outputted when the window is triggered
> An alternative implementation can be to use a fold window function to sort 
> the elements as they arrive, one at a time followed by a flatMap to filter 
> the number of outputs. 
> !sort.png!
> **General logic of Join**
> ```
> inputDataStrea

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-07-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074996#comment-16074996
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

GitHub user rtudoran opened a pull request:

https://github.com/apache/flink/pull/4263

[FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [x ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [x ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [x ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huawei-flink/flink FLINK-6075-OFRe4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4263.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4263






> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-07-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072013#comment-16072013
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on the issue:

https://github.com/apache/flink/pull/3889
  
@fhueske Thanks for the review and for handling the the fixes (sorry for 
still kipping some :( ). I will create this week the PR for the rest of the 
functions.


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window boundaries
> * Apply – window function that sorts data and selects the output to trigger 
> (based on LIMIT/TOP parameters). All data will be sorted at once and result 
> outputted when the window is triggered
> An alternative implementation can be to use a fold window function to sort 
>

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-07-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072010#comment-16072010
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on the issue:

https://github.com/apache/flink/pull/4003
  
@fhueske I will re-factor these contributions (hopefully by the end of the 
week) based on the things that are already merged from order by. I will ping 
you soon 


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window boundaries
> * Apply – window function that sorts data and selects the output to trigger 
> (based on LIMIT/TOP parameters). All data will be sorted at once and result 
> outputted when the window is triggered
> An alternative implementation can be to use a fold window function to sort

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070721#comment-16070721
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/3889


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window boundaries
> * Apply – window function that sorts data and selects the output to trigger 
> (based on LIMIT/TOP parameters). All data will be sorted at once and result 
> outputted when the window is triggered
> An alternative implementation can be to use a fold window function to sort 
> the elements as they arrive, one at a time followed by a flatMap to filter 
> the number of outputs. 
> !sort.png!
> **General logic of Join**
> ```
> inputDataStream.

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070310#comment-16070310
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/3714
  
@rtudoran can you please close this PR? Thank you


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window boundaries
> * Apply – window function that sorts data and selects the output to trigger 
> (based on LIMIT/TOP parameters). All data will be sorted at once and result 
> outputted when the window is triggered
> An alternative implementation can be to use a fold window function to sort 
> the elements as they arrive, one at a time followed by a flatMap to filter 
> the number of outputs. 
> !sort.png

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070252#comment-16070252
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r125006376
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/calcite/RelTimeIndicatorConverter.scala
 ---
@@ -92,8 +92,19 @@ class RelTimeIndicatorConverter(rexBuilder: RexBuilder) 
extends RelShuttle {
   override def visit(minus: LogicalMinus): RelNode =
 throw new TableException("Logical minus in a stream environment is not 
supported yet.")
 
-  override def visit(sort: LogicalSort): RelNode =
-throw new TableException("Logical sort in a stream environment is not 
supported yet.")
+  override def visit(sort: LogicalSort): RelNode = {
+
+val input = sort.getInput.accept(this)
+
+val materializer = new RexTimeIndicatorMaterializer(
+  rexBuilder,
+  input.getRowType.getFieldList.map(_.getType))
+   
+//val offset = if(sort.offset != null) 
sort.offset.accept(materializer) else null
--- End diff --

Should be removed


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070250#comment-16070250
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r125063072
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/stream/sql/SortITCase.scala
 ---
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.api.scala.stream.sql
+
+import org.apache.flink.api.scala._
+import 
org.apache.flink.table.api.scala.stream.sql.SortITCase.StringRowSelectorSink
+import 
org.apache.flink.table.api.scala.stream.sql.TimeTestUtil.EventTimeSourceFunction
+import org.apache.flink.streaming.api.functions.source.SourceFunction
+import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
+import org.apache.flink.table.api.TableEnvironment
+import org.apache.flink.table.api.scala._
+import org.apache.flink.table.api.scala.stream.utils.{StreamITCase, 
StreamTestData, StreamingWithStateTestBase}
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.types.Row
+import org.junit.Assert._
+import org.junit._
+import org.apache.flink.streaming.api.TimeCharacteristic
+import 
org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext
+import org.apache.flink.streaming.api.watermark.Watermark
+import scala.collection.mutable
+import org.apache.flink.streaming.api.functions.sink.RichSinkFunction
+
+class SortITCase extends StreamingWithStateTestBase {
+
+  @Test
+  def testEventTimeOrderBy(): Unit = {
+val data = Seq(
+  Left((1500L, (1L, 15, "Hello"))),
+  Left((1600L, (1L, 16, "Hello"))),
+  Left((1000L, (1L, 1, "Hello"))),
+  Left((2000L, (2L, 2, "Hello"))),
+  Right(1000L),
+  Left((2000L, (2L, 2, "Hello"))),
+  Left((2000L, (2L, 3, "Hello"))),
+  Left((3000L, (3L, 3, "Hello"))),
+  Left((2000L, (3L, 1, "Hello"))),
+  Right(2000L),
+  Left((4000L, (4L, 4, "Hello"))),
+  Right(3000L),
+  Left((5000L, (5L, 5, "Hello"))),
+  Right(5000L),
+  Left((6000L, (6L, 65, "Hello"))),
+  Left((6000L, (6L, 6, "Hello"))),
+  Left((6000L, (6L, 67, "Hello"))),
+  Left((6000L, (6L, -1, "Hello"))),
+  Left((6000L, (6L, 6, "Hello"))),
+  Right(7000L),
+  Left((9000L, (6L, 9, "Hello"))),
+  Left((8500L, (6L, 18, "Hello"))),
+  Left((9000L, (6L, 7, "Hello"))),
+  Right(1L),
+  Left((1L, (7L, 7, "Hello World"))),
+  Left((11000L, (7L, 77, "Hello World"))),
+  Left((11000L, (7L, 17, "Hello World"))),
+  Right(12000L),
+  Left((14000L, (7L, 18, "Hello World"))),
+  Right(14000L),
+  Left((15000L, (8L, 8, "Hello World"))),
+  Right(17000L),
+  Left((2L, (20L, 20, "Hello World"))), 
+  Right(19000L))
+
+val env = StreamExecutionEnvironment.getExecutionEnvironment
+env.setStateBackend(getStateBackend)
+env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
+val tEnv = TableEnvironment.getTableEnvironment(env)
+StreamITCase.clear
+
+val t1 = env.addSource(new EventTimeSourceFunction[(Long, Int, 
String)](data))
+  .toTable(tEnv, 'a, 'b, 'c, 'rowtime.rowtime)
+  
+tEnv.registerTable("T1", t1)
+
+val  sqlQuery = "SELECT b FROM T1 " +
+  "ORDER BY rowtime, b ASC ";
+  
+  
+val result = tEnv.sql(sqlQuery).toDataStream[Row]
--- End diff --

OK, will do that before merging


> Support Limit/Top(Sort) for Stream SQL
> ---

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070248#comment-16070248
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r125058559
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/aggregate/TimeSortProcessFunctionTest.scala
 ---
@@ -122,6 +130,11 @@ class TimeSortProcessFunctionTest{
   Row.of(1: JInt, 12L: JLong, 1: JInt, "aaa", 11L: JLong),true), 4))
 expectedOutput.add(new StreamRecord(new CRow(
   Row.of(1: JInt, 12L: JLong, 0: JInt, "aaa", 11L: JLong),true), 4))
+
+expectedOutput.add(new StreamRecord(new CRow(
+  Row.of(1: JInt, 1L: JLong, 0: JInt, "aaa", 11L: JLong),true), 1006))
+expectedOutput.add(new StreamRecord(new CRow(
+  Row.of(1: JInt, 2L: JLong, 0: JInt, "aaa", 11L: JLong),true), 1006))
   
 TestHarnessUtil.assertOutputEqualsSorted("Output was not correct.",
--- End diff --

The test should check that the ProcessFunction emit the rows in the correct 
order. `assertOutputEqualsSorted` sorts the result and expected data before 
comparing them. We have to use `assertOutputEquals` instead.


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   |

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070253#comment-16070253
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r125006340
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/calcite/RelTimeIndicatorConverter.scala
 ---
@@ -92,8 +92,19 @@ class RelTimeIndicatorConverter(rexBuilder: RexBuilder) 
extends RelShuttle {
   override def visit(minus: LogicalMinus): RelNode =
 throw new TableException("Logical minus in a stream environment is not 
supported yet.")
 
-  override def visit(sort: LogicalSort): RelNode =
-throw new TableException("Logical sort in a stream environment is not 
supported yet.")
+  override def visit(sort: LogicalSort): RelNode = {
+
+val input = sort.getInput.accept(this)
+
+val materializer = new RexTimeIndicatorMaterializer(
--- End diff --

unused. Should be removed


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implement

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070249#comment-16070249
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r125058944
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/aggregate/TimeSortProcessFunctionTest.scala
 ---
@@ -0,0 +1,256 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import java.util.Comparator
+import java.util.concurrent.ConcurrentLinkedQueue
+import java.lang.{Integer => JInt, Long => JLong}
+
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.functions.KeySelector
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.streaming.api.operators.KeyedProcessOperator
+import org.apache.flink.streaming.api.watermark.Watermark
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord
+import 
org.apache.flink.streaming.util.{KeyedOneInputStreamOperatorTestHarness, 
TestHarnessUtil}
+import org.apache.flink.types.Row
+import org.junit.Test
+import org.apache.flink.table.runtime.aggregate.ProcTimeSortProcessFunction
+import org.apache.flink.table.runtime.aggregate.RowTimeSortProcessFunction
+import 
org.apache.flink.table.runtime.aggregate.TimeSortProcessFunctionTest._
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+import org.apache.flink.streaming.api.TimeCharacteristic
+
+class TimeSortProcessFunctionTest{
+
+  
+  @Test
+  def testSortProcTimeHarnessPartitioned(): Unit = {
+
+val rT =  new RowTypeInfo(Array[TypeInformation[_]](
+  INT_TYPE_INFO,
+  LONG_TYPE_INFO,
+  INT_TYPE_INFO,
+  STRING_TYPE_INFO,
+  LONG_TYPE_INFO),
+  Array("a","b","c","d","e"))
+
+val rTA =  new RowTypeInfo(Array[TypeInformation[_]](
+ LONG_TYPE_INFO), Array("count"))
+val indexes = Array(1,2)
+  
+val fieldComps = Array[TypeComparator[AnyRef]](
+  LONG_TYPE_INFO.createComparator(true, 
null).asInstanceOf[TypeComparator[AnyRef]],
+  INT_TYPE_INFO.createComparator(false, 
null).asInstanceOf[TypeComparator[AnyRef]] )
+val booleanOrders = Array(true, false)
+
+
+val rowComp = new RowComparator(
+  rT.getTotalFields,
+  indexes,
+  fieldComps,
+  new Array[TypeSerializer[AnyRef]](0), //used only for serialized 
comparisons
+  booleanOrders)
+
+val collectionRowComparator = new CollectionRowComparator(rowComp)
+
+val inputCRowType = CRowTypeInfo(rT)
+
+val processFunction = new KeyedProcessOperator[Integer,CRow,CRow](
+  new ProcTimeSortProcessFunction(
+inputCRowType,
+collectionRowComparator))
+  
+   val testHarness = new 
KeyedOneInputStreamOperatorTestHarness[Integer,CRow,CRow](
+  processFunction, 
+  new TupleRowSelector(0), 
+  BasicTypeInfo.INT_TYPE_INFO)
+
+   testHarness.open();
+
+   testHarness.setProcessingTime(3)
+
+  // timestamp is ignored in processing time
+testHarness.processElement(new StreamRecord(new CRow(
+  Row.of(1: JInt, 11L: JLong, 1: JInt, "aaa", 11L: JLong), true), 
1001))
+testHarness.processElement(new StreamRecord(new CRow(
+Row.of(1: JInt, 12L: JLong, 1: JInt, "aaa", 11L: JLong), true), 
2002))
+testHarness.processElement(new StreamRecord(new CR

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070251#comment-16070251
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r125040180
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/datastream/DataStreamSort.scala
 ---
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.nodes.datastream
+
+import org.apache.calcite.plan.{ RelOptCluster, RelTraitSet }
--- End diff --

About half of the imports are unused. Other classes have unused imports as 
well.


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066767#comment-16066767
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on the issue:

https://github.com/apache/flink/pull/3889
  
@fhueske  I have addressed the remarks you have done (with 1-2 minor 
exceptions for which you can see my comments). Please have a look. I think it 
could be ready to be merged and than I can start the preparation for the offset 
and fetch


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window boundaries
> * Apply – window function that sorts data and selects the output to trigger 
> (based on LIMIT/TOP parameters). All data will be sorted at once and result 
> outputted when the window is triggered
>

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066716#comment-16066716
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r124576040
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/aggregate/TimeSortProcessFunctionTest.scala
 ---
@@ -0,0 +1,256 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import java.util.Comparator
+import java.util.concurrent.ConcurrentLinkedQueue
+import java.lang.{Integer => JInt, Long => JLong}
+
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.functions.KeySelector
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.streaming.api.operators.KeyedProcessOperator
+import org.apache.flink.streaming.api.watermark.Watermark
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord
+import 
org.apache.flink.streaming.util.{KeyedOneInputStreamOperatorTestHarness, 
TestHarnessUtil}
+import org.apache.flink.types.Row
+import org.junit.Test
+import org.apache.flink.table.runtime.aggregate.ProcTimeSortProcessFunction
+import org.apache.flink.table.runtime.aggregate.RowTimeSortProcessFunction
+import 
org.apache.flink.table.runtime.aggregate.TimeSortProcessFunctionTest._
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+import org.apache.flink.streaming.api.TimeCharacteristic
+
+class TimeSortProcessFunctionTest{
+
+  
+  @Test
+  def testSortProcTimeHarnessPartitioned(): Unit = {
+
+val rT =  new RowTypeInfo(Array[TypeInformation[_]](
+  INT_TYPE_INFO,
+  LONG_TYPE_INFO,
+  INT_TYPE_INFO,
+  STRING_TYPE_INFO,
+  LONG_TYPE_INFO),
+  Array("a","b","c","d","e"))
+
+val rTA =  new RowTypeInfo(Array[TypeInformation[_]](
+ LONG_TYPE_INFO), Array("count"))
+val indexes = Array(1,2)
+  
+val fieldComps = Array[TypeComparator[AnyRef]](
+  LONG_TYPE_INFO.createComparator(true, 
null).asInstanceOf[TypeComparator[AnyRef]],
+  INT_TYPE_INFO.createComparator(false, 
null).asInstanceOf[TypeComparator[AnyRef]] )
+val booleanOrders = Array(true, false)
+
+
+val rowComp = new RowComparator(
+  rT.getTotalFields,
+  indexes,
+  fieldComps,
+  new Array[TypeSerializer[AnyRef]](0), //used only for serialized 
comparisons
+  booleanOrders)
+
+val collectionRowComparator = new CollectionRowComparator(rowComp)
+
+val inputCRowType = CRowTypeInfo(rT)
+
+val processFunction = new KeyedProcessOperator[Integer,CRow,CRow](
+  new ProcTimeSortProcessFunction(
+inputCRowType,
+collectionRowComparator))
+  
+   val testHarness = new 
KeyedOneInputStreamOperatorTestHarness[Integer,CRow,CRow](
+  processFunction, 
+  new TupleRowSelector(0), 
+  BasicTypeInfo.INT_TYPE_INFO)
+
+   testHarness.open();
+
+   testHarness.setProcessingTime(3)
+
+  // timestamp is ignored in processing time
+testHarness.processElement(new StreamRecord(new CRow(
+  Row.of(1: JInt, 11L: JLong, 1: JInt, "aaa", 11L: JLong), true), 
1001))
+testHarness.processElement(new StreamRecord(new CRow(
+Row.of(1: JInt, 12L: JLong, 1: JInt, "aaa", 11L: JLong), true), 
2002))
+testHarness.processElement(new StreamRecord(new C

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066711#comment-16066711
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r124575119
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/aggregate/TimeSortProcessFunctionTest.scala
 ---
@@ -0,0 +1,256 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import java.util.Comparator
+import java.util.concurrent.ConcurrentLinkedQueue
+import java.lang.{Integer => JInt, Long => JLong}
+
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.functions.KeySelector
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.streaming.api.operators.KeyedProcessOperator
+import org.apache.flink.streaming.api.watermark.Watermark
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord
+import 
org.apache.flink.streaming.util.{KeyedOneInputStreamOperatorTestHarness, 
TestHarnessUtil}
+import org.apache.flink.types.Row
+import org.junit.Test
+import org.apache.flink.table.runtime.aggregate.ProcTimeSortProcessFunction
+import org.apache.flink.table.runtime.aggregate.RowTimeSortProcessFunction
+import 
org.apache.flink.table.runtime.aggregate.TimeSortProcessFunctionTest._
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+import org.apache.flink.streaming.api.TimeCharacteristic
+
+class TimeSortProcessFunctionTest{
+
+  
+  @Test
+  def testSortProcTimeHarnessPartitioned(): Unit = {
+
+val rT =  new RowTypeInfo(Array[TypeInformation[_]](
--- End diff --

I think it is ok - we sort on second and third field - on one ascending and 
on the other descending 


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066708#comment-16066708
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r124574652
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/aggregate/TimeSortProcessFunctionTest.scala
 ---
@@ -0,0 +1,256 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import java.util.Comparator
+import java.util.concurrent.ConcurrentLinkedQueue
+import java.lang.{Integer => JInt, Long => JLong}
+
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.functions.KeySelector
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.streaming.api.operators.KeyedProcessOperator
+import org.apache.flink.streaming.api.watermark.Watermark
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord
+import 
org.apache.flink.streaming.util.{KeyedOneInputStreamOperatorTestHarness, 
TestHarnessUtil}
+import org.apache.flink.types.Row
+import org.junit.Test
+import org.apache.flink.table.runtime.aggregate.ProcTimeSortProcessFunction
+import org.apache.flink.table.runtime.aggregate.RowTimeSortProcessFunction
+import 
org.apache.flink.table.runtime.aggregate.TimeSortProcessFunctionTest._
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+import org.apache.flink.streaming.api.TimeCharacteristic
+
+class TimeSortProcessFunctionTest{
--- End diff --

I do not understand - this test was added (see below 
testSortRowTimeHarnessPartitioned) ...perhaps this is an old comment ...


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never e

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066703#comment-16066703
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r124573957
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/stream/sql/SortTest.scala
 ---
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.api.scala.stream.sql
+
+import org.apache.flink.api.scala._
+import org.apache.flink.table.api.TableException
+import org.apache.flink.table.api.scala._
+import org.apache.flink.table.utils.TableTestUtil._
+import org.apache.flink.table.utils.{StreamTableTestUtil, TableTestBase}
+import org.junit.Test
+
+class SortTest extends TableTestBase {
+  private val streamUtil: StreamTableTestUtil = streamTestUtil()
+  streamUtil.addTable[(Int, String, Long)]("MyTable", 'a, 'b, 'c,
+  'proctime.proctime, 'rowtime.rowtime)
+  
+  @Test
+  def testSortProcessingTime() = {
+
+val sqlQuery = "SELECT a FROM MyTable ORDER BY proctime, c"
+  
+val expected =
+  unaryNode(
+"DataStreamCalc",
+unaryNode("DataStreamSort",
+  streamTableNode(0),
+  term("orderBy", "proctime ASC", "c ASC"),
+  term("offset", "null"), 
+  term("fetch","unlimited")),
+term("select", "a", "TIME_MATERIALIZATION(proctime) AS proctime", 
"c"))
--- End diff --

yes and no :)
Perhaps you remember that in the previous PR I pointed this issue out. 
Calcite as it creates the execution plan now will have as the output type of 
the sort a tuple for each of the 3 fields (a, proctime, and c)...because one 
comes from the output and 2 are used internally. The sort does not do any 
projection so all the 3 fields needed in the input are carried in the output - 
hence the type of the output. I have asked you at that point if you believe we 
should drop the fields by ourselves - which you said no and I agree with that. 
This test is only the reflection of that case - basically the whole plan of 
calcite should be modified


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the item

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066622#comment-16066622
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r124563186
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/SortUtil.scala
 ---
@@ -0,0 +1,247 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.types.Row
+import org.apache.calcite.rel.`type`._
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.table.functions.AggregateFunction
+import org.apache.calcite.sql.`type`.SqlTypeName
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.sql.`type`.SqlTypeName._
+import java.util.{ List => JList, ArrayList }
+import org.apache.flink.api.common.typeinfo.{SqlTimeTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import java.sql.Timestamp
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import java.util.Comparator
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import 
java.lang.{Byte=>JByte,Integer=>JInt,Long=>JLong,Double=>JDouble,Short=>JShort,String=>JString,Float=>JFloat}
+import java.math.{BigDecimal=>JBigDecimal}
+import org.apache.flink.api.common.functions.MapFunction
+import org.apache.flink.api.common.operators.Order
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.AtomicType
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+
+/**
+ * Class represents a collection of helper methods to build the sort logic.
+ * It encapsulates as well the implementation for ordering and generic 
interfaces
+ */
+object SortUtil {
+
+  /**
+   * Function creates 
[org.apache.flink.streaming.api.functions.ProcessFunction] for sorting 
+   * elements based on rowtime and potentially other fields
+   * @param collationSort The Sort collation list
+   * @param inputType input row type
+   * @param execCfg table environment execution configuration
+   * @return org.apache.flink.streaming.api.functions.ProcessFunction
+   */
+  private[flink] def createRowTimeSortFunction(
+collationSort: RelCollation,
+inputType: RelDataType,
+inputTypeInfo: TypeInformation[Row],
+execCfg: ExecutionConfig): ProcessFunction[CRow, CRow] = {
+
+val keySortFields = getSortFieldIndexList(collationSort)
+//drop time from comparison as we sort on time in the states and 
result emission
+val keyIndexesNoTime = keySortFields.slice(1, keySortFields.size)
+val booleanOrderings = getSortFieldDirectionBooleanList(collationSort)
+val booleanDirectionsNoTime = booleanOrderings.slice(1, 
booleanOrderings.size)
+
+val fieldComps = createFieldComparators(
+  inputType, 
+  keyIndexesNoTime, 
+  booleanDirectionsNoTime,
+  execCfg)
+val fieldCompsRefs = 
fieldComps.asInstanceOf[Array[TypeComparator[AnyRef]]]
+
+val rowComp = new RowComparator(
+   inputType.getFieldCount,
+   keyIndexesNoTime,
+   fieldCompsRefs,
+   new Array[TypeSerializer[AnyRef]](0), //used only for object 
comparisons
+   booleanDirectionsNoTime)
+  
+val collectionRowC

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066583#comment-16066583
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r124556692
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/SortUtil.scala
 ---
@@ -0,0 +1,247 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.types.Row
+import org.apache.calcite.rel.`type`._
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.table.functions.AggregateFunction
+import org.apache.calcite.sql.`type`.SqlTypeName
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.sql.`type`.SqlTypeName._
+import java.util.{ List => JList, ArrayList }
+import org.apache.flink.api.common.typeinfo.{SqlTimeTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import java.sql.Timestamp
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import java.util.Comparator
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import 
java.lang.{Byte=>JByte,Integer=>JInt,Long=>JLong,Double=>JDouble,Short=>JShort,String=>JString,Float=>JFloat}
--- End diff --

sorry for the unused imports ... there are some issues with the IDE that I 
am using when working with scala 


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066417#comment-16066417
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r124527971
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/datastream/DataStreamSort.scala
 ---
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.nodes.datastream
+
+import org.apache.calcite.plan.{ RelOptCluster, RelTraitSet }
+import org.apache.calcite.rel.`type`.RelDataType
+import org.apache.calcite.rel.{ RelNode, RelWriter, SingleRel }
+import org.apache.flink.api.java.tuple.Tuple
+import org.apache.flink.streaming.api.datastream.{ AllWindowedStream, 
DataStream, KeyedStream, WindowedStream }
+import org.apache.flink.streaming.api.windowing.assigners._
+import org.apache.flink.streaming.api.windowing.time.Time
+import org.apache.flink.streaming.api.windowing.windows.{ Window => 
DataStreamWindow }
+import org.apache.flink.table.calcite.FlinkRelBuilder.NamedWindowProperty
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.table.expressions._
+import org.apache.flink.table.plan.logical._
+import org.apache.flink.table.plan.nodes.CommonAggregate
+import org.apache.flink.table.runtime.aggregate.AggregateUtil._
+import org.apache.flink.table.runtime.aggregate._
+import org.apache.flink.table.typeutils.TypeCheckUtils.isTimeInterval
+import org.apache.flink.table.typeutils.{ RowIntervalTypeInfo, 
TimeIntervalTypeInfo }
+import org.apache.flink.types.Row
+import org.apache.calcite.sql.SqlAggFunction
+import org.apache.flink.table.plan.nodes.datastream.DataStreamRel
+import org.apache.calcite.sql.fun.SqlSingleValueAggFunction
+import org.apache.flink.api.common.functions.RichMapFunction
+import org.apache.flink.api.common.functions.RichFlatMapFunction
+import org.apache.flink.configuration.Configuration
+import org.apache.flink.util.Collector
+import org.apache.flink.api.common.state.ValueState
+import org.apache.flink.api.common.state.ValueStateDescriptor
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.calcite.rel.core.Sort
+import org.apache.flink.api.java.functions.NullByteKeySelector
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.flink.table.runtime.aggregate.SortUtil._
+import org.apache.calcite.rel.RelCollation
+import org.apache.calcite.rex.RexNode
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+import org.apache.flink.table.api.{StreamQueryConfig, 
StreamTableEnvironment, TableException}
+import org.apache.flink.table.plan.schema.RowSchema
+import org.apache.flink.table.plan.nodes.CommonSort
+
+/**
+  * Flink RelNode which matches along with Sort Rule.
+  *
+ */
+class DataStreamSort(
+cluster: RelOptCluster,
+traitSet: RelTraitSet,
+inputNode: RelNode,
+inputSchema: RowSchema,
+schema: RowSchema,
+sortCollation: RelCollation,
+sortOffset: RexNode,
+sortFetch: RexNode,
+description: String)
+  extends SingleRel(cluster, traitSet, inputNode)
+  with CommonSort
+  with DataStreamRel {
+
+  override def deriveRowType(): RelDataType = schema.logicalType
+
+  override def copy(traitSet: RelTraitSet, inputs: 
java.util.List[RelNode]): RelNode = {
+new DataStreamSort(
+  cluster,
+  traitSet,
+  inputs.get(0),
+  inputSchema,
+  schema,
+  sortCollation,
+  sortOffset,
+  sortFetch,
+  description)
+  }
+
+  override def toString: String = {
+s"Sort(by: ($$sortFieldsToS

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066196#comment-16066196
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r124492156
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/CommonSort.scala
 ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.nodes
+
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.calcite.rel.`type`._
+import scala.collection.JavaConverters._
+import org.apache.flink.api.common.operators.Order
+
+
+/**
+ * Trait represents a collection of sort methods to manipulate the 
parameters
+ */
+
+trait CommonSort {
+  
+  private[flink] def offsetToString(offset: RexNode): String = {
+val offsetToString = s"$offset"
+offsetToString
+  }
+  
+  
+  private[flink] def sortFieldsToString(
--- End diff --

I do not see this behavior in the other classes that have a Common trait 
(e.g., DataStreamCalc has the explainsTerms in the class not in the CommonCalc)


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP window

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066193#comment-16066193
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r124491949
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/CommonSort.scala
 ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.nodes
+
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.calcite.rel.`type`._
+import scala.collection.JavaConverters._
+import org.apache.flink.api.common.operators.Order
+
+
+/**
+ * Trait represents a collection of sort methods to manipulate the 
parameters
+ */
+
+trait CommonSort {
+  
+  private[flink] def offsetToString(offset: RexNode): String = {
+val offsetToString = s"$offset"
+offsetToString
+  }
+  
+  
+  private[flink] def sortFieldsToString(
--- End diff --

it cannot be moved because explainTerms is a method from the super class 
SingleRel and only the DataStream/SetSort extends it not the CommonSort. Or is 
it possible in scala to have a trait that overrides a method that is inherited 
by another class?


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066189#comment-16066189
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r124491150
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/CommonSort.scala
 ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.nodes
+
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.calcite.rel.`type`._
+import scala.collection.JavaConverters._
+import org.apache.flink.api.common.operators.Order
+
+
+/**
+ * Trait represents a collection of sort methods to manipulate the 
parameters
+ */
+
+trait CommonSort {
+  
+  private[flink] def offsetToString(offset: RexNode): String = {
--- End diff --

> It is more for consistency - to have a method for "ToString" for each 
feature of the sort (direction, fetch, fields...). As there is not cost for it 
I suggest to keep it


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour in

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066176#comment-16066176
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r124489316
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/stream/sql/SortITCase.scala
 ---
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.api.scala.stream.sql
+
+import org.apache.flink.api.scala._
+import 
org.apache.flink.table.api.scala.stream.sql.SortITCase.StringRowSelectorSink
+import 
org.apache.flink.table.api.scala.stream.sql.TimeTestUtil.EventTimeSourceFunction
+import org.apache.flink.streaming.api.functions.source.SourceFunction
+import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
+import org.apache.flink.table.api.TableEnvironment
+import org.apache.flink.table.api.scala._
+import org.apache.flink.table.api.scala.stream.utils.{StreamITCase, 
StreamTestData, StreamingWithStateTestBase}
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.types.Row
+import org.junit.Assert._
+import org.junit._
+import org.apache.flink.streaming.api.TimeCharacteristic
+import 
org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext
+import org.apache.flink.streaming.api.watermark.Watermark
+import scala.collection.mutable
+import org.apache.flink.streaming.api.functions.sink.RichSinkFunction
+
+class SortITCase extends StreamingWithStateTestBase {
+
+  @Test
+  def testEventTimeOrderBy(): Unit = {
+val data = Seq(
+  Left((1500L, (1L, 15, "Hello"))),
+  Left((1600L, (1L, 16, "Hello"))),
+  Left((1000L, (1L, 1, "Hello"))),
+  Left((2000L, (2L, 2, "Hello"))),
+  Right(1000L),
+  Left((2000L, (2L, 2, "Hello"))),
+  Left((2000L, (2L, 3, "Hello"))),
+  Left((3000L, (3L, 3, "Hello"))),
+  Left((2000L, (3L, 1, "Hello"))),
+  Right(2000L),
+  Left((4000L, (4L, 4, "Hello"))),
+  Right(3000L),
+  Left((5000L, (5L, 5, "Hello"))),
+  Right(5000L),
+  Left((6000L, (6L, 65, "Hello"))),
+  Left((6000L, (6L, 6, "Hello"))),
+  Left((6000L, (6L, 67, "Hello"))),
+  Left((6000L, (6L, -1, "Hello"))),
+  Left((6000L, (6L, 6, "Hello"))),
+  Right(7000L),
+  Left((9000L, (6L, 9, "Hello"))),
+  Left((8500L, (6L, 18, "Hello"))),
+  Left((9000L, (6L, 7, "Hello"))),
+  Right(1L),
+  Left((1L, (7L, 7, "Hello World"))),
+  Left((11000L, (7L, 77, "Hello World"))),
+  Left((11000L, (7L, 17, "Hello World"))),
+  Right(12000L),
+  Left((14000L, (7L, 18, "Hello World"))),
+  Right(14000L),
+  Left((15000L, (8L, 8, "Hello World"))),
+  Right(17000L),
+  Left((2L, (20L, 20, "Hello World"))), 
+  Right(19000L))
+
+val env = StreamExecutionEnvironment.getExecutionEnvironment
+env.setStateBackend(getStateBackend)
+env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
+val tEnv = TableEnvironment.getTableEnvironment(env)
+StreamITCase.clear
+
+val t1 = env.addSource(new EventTimeSourceFunction[(Long, Int, 
String)](data))
+  .toTable(tEnv, 'a, 'b, 'c, 'rowtime.rowtime)
+  
+tEnv.registerTable("T1", t1)
+
+val  sqlQuery = "SELECT b FROM T1 " +
+  "ORDER BY rowtime, b ASC ";
+  
+  
+val result = tEnv.sql(sqlQuery).toDataStream[Row]
--- End diff --

In the flink version from where i started it it is not available. I think 
if i rebase now

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061028#comment-16061028
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123755789
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/RowTimeSortProcessFunction.scala
 ---
@@ -0,0 +1,146 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.typeutils.{RowTypeInfo, ListTypeInfo}
+import org.apache.flink.runtime.state.{ FunctionInitializationContext, 
FunctionSnapshotContext }
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.types.Row
+import org.apache.flink.util.{ Collector, Preconditions }
+import org.apache.flink.api.common.state.ValueState
+import org.apache.flink.api.common.state.ValueStateDescriptor
+import scala.util.control.Breaks._
+import org.apache.flink.api.java.tuple.{ Tuple2 => JTuple2 }
+import org.apache.flink.api.common.state.MapState
+import org.apache.flink.api.common.state.MapStateDescriptor
+import org.apache.flink.configuration.Configuration
+import java.util.Comparator
+import java.util.ArrayList
+import java.util.Collections
+import org.apache.flink.api.common.typeutils.TypeComparator
+import java.util.{List => JList, ArrayList => JArrayList}
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+
+/**
+ * Process Function used for the aggregate in bounded rowtime sort without 
offset/fetch
+ * [[org.apache.flink.streaming.api.datastream.DataStream]]
+ *
+ * @param inputRowType It is used to mark the type of the incoming data
+ * @param rowComparator the [[java.util.Comparator]] is used for this sort 
aggregation
+ */
+class RowTimeSortProcessFunction(
+  private val inputRowType: CRowTypeInfo,
+  private val rowComparator: CollectionRowComparator)
+extends ProcessFunction[CRow, CRow] {
+
+  Preconditions.checkNotNull(rowComparator)
+
+  // the state which keeps all the events that are not expired.
+  // Each timestamp will contain an associated list with the events 
+  // received at that timestamp
+  private var dataState: MapState[Long, JList[Row]] = _
+
+// the state which keeps the last triggering timestamp to filter late 
events
+  private var lastTriggeringTsState: ValueState[Long] = _
+  
+  private var outputC: CRow = _
+  
+  override def open(config: Configuration) {
+ 
+val keyTypeInformation: TypeInformation[Long] =
+  BasicTypeInfo.LONG_TYPE_INFO.asInstanceOf[TypeInformation[Long]]
+val valueTypeInformation: TypeInformation[JList[Row]] = new 
ListTypeInfo[Row](
+inputRowType.asInstanceOf[CRowTypeInfo].rowType)
+
+val mapStateDescriptor: MapStateDescriptor[Long, JList[Row]] =
+  new MapStateDescriptor[Long, JList[Row]](
+"dataState",
+keyTypeInformation,
+valueTypeInformation)
+
+dataState = getRuntimeContext.getMapState(mapStateDescriptor)
+
+val lastTriggeringTsDescriptor: ValueStateDescriptor[Long] =
+  new ValueStateDescriptor[Long]("lastTriggeringTsState", 
classOf[Long])
+lastTriggeringTsState = 
getRuntimeContext.getState(lastTriggeringTsDescriptor)
+
+if (outputC == null) {
+  val arity:Integer = inputRowType.getArity
+  outputC = new CRow(Row.of(arity), true)
+}
+  }
+
+  
+  override def processElement(
+inputC: CRow,
+ctx: ProcessFunction[CRow, CRow]#Context,
+out: Collector[CRow]): Unit = {
+
+val input = inputC.row
+
+// triggerin

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061029#comment-16061029
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123717544
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/rules/datastream/DataStreamSortRule.scala
 ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.plan.rules.datastream
+
+import org.apache.calcite.plan.volcano.RelSubset
+import org.apache.calcite.plan.{ Convention, RelOptRule, RelOptRuleCall, 
RelTraitSet }
+import org.apache.calcite.rel.RelNode
+import org.apache.calcite.rel.convert.ConverterRule
+import org.apache.calcite.rel.logical.{ LogicalFilter, LogicalCorrelate, 
LogicalTableFunctionScan }
+import org.apache.calcite.rex.RexNode
+import org.apache.flink.table.plan.nodes.datastream.DataStreamCorrelate
+import org.apache.calcite.rel.logical.LogicalSort
+import org.apache.flink.table.plan.nodes.datastream.DataStreamSort
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.table.plan.nodes.FlinkConventions
+import org.apache.flink.table.plan.schema.RowSchema
+import org.apache.flink.table.plan.nodes.logical.FlinkLogicalSort
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.flink.table.runtime.aggregate.SortUtil
+
+/**
+ * Rule to convert a LogicalSort into a DataStreamSort.
+ */
+class DataStreamSortRule
+extends ConverterRule(
+  classOf[FlinkLogicalSort],
+  FlinkConventions.LOGICAL,
+  FlinkConventions.DATASTREAM,
+  "DataStreamSortRule") {
+
+  override def matches(call: RelOptRuleCall): Boolean = {
+
+val result = super.matches(call)
+
+//need to identify time between others order fields. Time needs to be 
first sort element
+// we can safely convert the object if the match rule succeeded 
+if(result) {
+  val calcSort: FlinkLogicalSort = 
call.rel(0).asInstanceOf[FlinkLogicalSort]
+  checkTimeOrder(calcSort)
+}
+
+result
+  }
+
+  override def convert(rel: RelNode): RelNode = {
+val calcSort: FlinkLogicalSort = rel.asInstanceOf[FlinkLogicalSort]
+val traitSet: RelTraitSet = 
rel.getTraitSet.replace(FlinkConventions.DATASTREAM)
+val convInput: RelNode = RelOptRule.convert(calcSort.getInput(0), 
FlinkConventions.DATASTREAM)
+
+val inputRowType = 
convInput.asInstanceOf[RelSubset].getOriginal.getRowType
+
+new DataStreamSort(
+  rel.getCluster,
+  traitSet,
+  convInput,
+  new RowSchema(inputRowType),
+  new RowSchema(rel.getRowType),
+  calcSort.collation,
+  calcSort.offset,
+  calcSort.fetch,
+  description)
+
+  }
+  
+   
+  /**
+   * Function is used to check at verification time if the SQL syntax is 
supported
+   */
+  
+  def checkTimeOrder(calcSort: FlinkLogicalSort) = {
+
+val rowType = calcSort.getRowType
+val sortCollation = calcSort.collation 
+ //need to identify time between others order fields. Time needs to be 
first sort element
+val timeType = SortUtil.getTimeType(sortCollation, rowType)
+//time ordering needs to be ascending
+if (SortUtil.getTimeDirection(sortCollation) != Direction.ASCENDING) {
+  throw new TableException("SQL/Table supports only ascending time 
ordering")
--- End diff --

We do not exit the optimizer with exceptions because it prevents the 
optimizer to find an alternative plan.
Rather return `false` in `matches()`


> Support Limit/Top(Sort)

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061033#comment-16061033
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123759708
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/aggregate/TimeSortProcessFunctionTest.scala
 ---
@@ -0,0 +1,256 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import java.util.Comparator
+import java.util.concurrent.ConcurrentLinkedQueue
+import java.lang.{Integer => JInt, Long => JLong}
+
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.functions.KeySelector
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.streaming.api.operators.KeyedProcessOperator
+import org.apache.flink.streaming.api.watermark.Watermark
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord
+import 
org.apache.flink.streaming.util.{KeyedOneInputStreamOperatorTestHarness, 
TestHarnessUtil}
+import org.apache.flink.types.Row
+import org.junit.Test
+import org.apache.flink.table.runtime.aggregate.ProcTimeSortProcessFunction
+import org.apache.flink.table.runtime.aggregate.RowTimeSortProcessFunction
+import 
org.apache.flink.table.runtime.aggregate.TimeSortProcessFunctionTest._
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+import org.apache.flink.streaming.api.TimeCharacteristic
+
+class TimeSortProcessFunctionTest{
+
+  
+  @Test
+  def testSortProcTimeHarnessPartitioned(): Unit = {
+
+val rT =  new RowTypeInfo(Array[TypeInformation[_]](
+  INT_TYPE_INFO,
+  LONG_TYPE_INFO,
+  INT_TYPE_INFO,
+  STRING_TYPE_INFO,
+  LONG_TYPE_INFO),
+  Array("a","b","c","d","e"))
+
+val rTA =  new RowTypeInfo(Array[TypeInformation[_]](
+ LONG_TYPE_INFO), Array("count"))
+val indexes = Array(1,2)
--- End diff --

space


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 ope

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061015#comment-16061015
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123757465
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/stream/sql/SortTest.scala
 ---
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.api.scala.stream.sql
+
+import org.apache.flink.api.scala._
+import org.apache.flink.table.api.TableException
+import org.apache.flink.table.api.scala._
+import org.apache.flink.table.utils.TableTestUtil._
+import org.apache.flink.table.utils.{StreamTableTestUtil, TableTestBase}
+import org.junit.Test
+
+class SortTest extends TableTestBase {
+  private val streamUtil: StreamTableTestUtil = streamTestUtil()
+  streamUtil.addTable[(Int, String, Long)]("MyTable", 'a, 'b, 'c,
+  'proctime.proctime, 'rowtime.rowtime)
+  
+  @Test
+  def testSortProcessingTime() = {
+
+val sqlQuery = "SELECT a FROM MyTable ORDER BY proctime, c"
+  
+val expected =
+  unaryNode(
+"DataStreamCalc",
+unaryNode("DataStreamSort",
+  streamTableNode(0),
+  term("orderBy", "proctime ASC", "c ASC"),
+  term("offset", "null"), 
+  term("fetch","unlimited")),
+term("select", "a", "TIME_MATERIALIZATION(proctime) AS proctime", 
"c"))
--- End diff --

shouldn't this be just `a` since we do `SELECT a FROM ...` and not `SELECT 
a, proctime, c FROM ...`?


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/u

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061012#comment-16061012
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123745157
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/SortUtil.scala
 ---
@@ -0,0 +1,247 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.types.Row
+import org.apache.calcite.rel.`type`._
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.table.functions.AggregateFunction
+import org.apache.calcite.sql.`type`.SqlTypeName
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.sql.`type`.SqlTypeName._
+import java.util.{ List => JList, ArrayList }
+import org.apache.flink.api.common.typeinfo.{SqlTimeTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import java.sql.Timestamp
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import java.util.Comparator
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import 
java.lang.{Byte=>JByte,Integer=>JInt,Long=>JLong,Double=>JDouble,Short=>JShort,String=>JString,Float=>JFloat}
+import java.math.{BigDecimal=>JBigDecimal}
+import org.apache.flink.api.common.functions.MapFunction
+import org.apache.flink.api.common.operators.Order
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.AtomicType
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+
+/**
+ * Class represents a collection of helper methods to build the sort logic.
+ * It encapsulates as well the implementation for ordering and generic 
interfaces
+ */
+object SortUtil {
+
+  /**
+   * Function creates 
[org.apache.flink.streaming.api.functions.ProcessFunction] for sorting 
+   * elements based on rowtime and potentially other fields
+   * @param collationSort The Sort collation list
+   * @param inputType input row type
+   * @param execCfg table environment execution configuration
+   * @return org.apache.flink.streaming.api.functions.ProcessFunction
+   */
+  private[flink] def createRowTimeSortFunction(
+collationSort: RelCollation,
+inputType: RelDataType,
+inputTypeInfo: TypeInformation[Row],
+execCfg: ExecutionConfig): ProcessFunction[CRow, CRow] = {
+
+val keySortFields = getSortFieldIndexList(collationSort)
+//drop time from comparison as we sort on time in the states and 
result emission
+val keyIndexesNoTime = keySortFields.slice(1, keySortFields.size)
+val booleanOrderings = getSortFieldDirectionBooleanList(collationSort)
+val booleanDirectionsNoTime = booleanOrderings.slice(1, 
booleanOrderings.size)
+
+val fieldComps = createFieldComparators(
+  inputType, 
+  keyIndexesNoTime, 
+  booleanDirectionsNoTime,
+  execCfg)
+val fieldCompsRefs = 
fieldComps.asInstanceOf[Array[TypeComparator[AnyRef]]]
+
+val rowComp = new RowComparator(
+   inputType.getFieldCount,
--- End diff --

Indention is off 3 instead of 2


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.ap

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061024#comment-16061024
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123753005
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/ProcTimeSortProcessFunction.scala
 ---
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.api.common.state.{ ListState, ListStateDescriptor }
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.configuration.Configuration
+import org.apache.flink.runtime.state.{ FunctionInitializationContext, 
FunctionSnapshotContext }
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.types.Row
+import org.apache.flink.util.{ Collector, Preconditions }
+import org.apache.flink.api.common.state.ValueState
+import org.apache.flink.api.common.state.ValueStateDescriptor
+import scala.util.control.Breaks._
+import org.apache.flink.api.java.tuple.{ Tuple2 => JTuple2 }
+import org.apache.flink.api.common.state.MapState
+import org.apache.flink.api.common.state.MapStateDescriptor
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.java.typeutils.ListTypeInfo
+import java.util.Comparator
+import java.util.ArrayList
+import java.util.Collections
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+import java.lang.{Integer=>JInt}
+
+/**
+ * Process Function used for the aggregate in bounded proctime sort 
without offset/fetch
+ * [[org.apache.flink.streaming.api.datastream.DataStream]]
+ *
+ * @param inputRowType It is used to mark the type of the incoming data
+ * @param rowComparator the [[java.util.Comparator]] is used for this sort 
aggregation
+ */
+class ProcTimeSortProcessFunction(
+  private val inputRowType: CRowTypeInfo,
+  private val rowComparator: CollectionRowComparator)
+extends ProcessFunction[CRow, CRow] {
+
+  Preconditions.checkNotNull(rowComparator)
+
+  private var stateEventsBuffer: ListState[Row] = _
+  private val sortArray: ArrayList[Row] = new ArrayList[Row]
+  
+  private var outputC: CRow = _
+  
+  override def open(config: Configuration) {
+val sortDescriptor = new ListStateDescriptor[Row]("sortState",
+inputRowType.asInstanceOf[CRowTypeInfo].rowType)
+stateEventsBuffer = getRuntimeContext.getListState(sortDescriptor)
+
+if (outputC == null) {
+  val arity:Integer = inputRowType.getArity
+  outputC = new CRow(Row.of(arity), true)
+}
+
+  }
+
+  override def processElement(
+inputC: CRow,
+ctx: ProcessFunction[CRow, CRow]#Context,
+out: Collector[CRow]): Unit = {
+
+val input = inputC.row
+
+val currentTime = ctx.timerService.currentProcessingTime
+//buffer the event incoming event
+  
+stateEventsBuffer.add(input)
+
+//deduplication of multiple registered timers is done automatically
+ctx.timerService.registerProcessingTimeTimer(currentTime + 1)  
+
+  }
+  
+  override def onTimer(
+timestamp: Long,
+ctx: ProcessFunction[CRow, CRow]#OnTimerContext,
+out: Collector[CRow]): Unit = {
+
+val iter =  stateEventsBuffer.get.iterator()
+
+sortArray.clear()
+while(iter.hasNext()) {
+  sortArray.add(iter.next())
+}
+
+//if we do not rely on java collections to do the sort we could 
implement 
--- End diff --

I think the comment can be removed. Collect

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061017#comment-16061017
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123712745
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/CommonSort.scala
 ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.nodes
+
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.calcite.rel.`type`._
+import scala.collection.JavaConverters._
+import org.apache.flink.api.common.operators.Order
+
+
+/**
+ * Trait represents a collection of sort methods to manipulate the 
parameters
+ */
+
+trait CommonSort {
+  
+  private[flink] def offsetToString(offset: RexNode): String = {
+val offsetToString = s"$offset"
+offsetToString
+  }
+  
+  
+  private[flink] def sortFieldsToString(
--- End diff --

can be completely private if we move `explainTerms` to `CommonSort`


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplica

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061026#comment-16061026
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123754095
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/ProcTimeSortProcessFunction.scala
 ---
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.api.common.state.{ ListState, ListStateDescriptor }
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.configuration.Configuration
+import org.apache.flink.runtime.state.{ FunctionInitializationContext, 
FunctionSnapshotContext }
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.types.Row
+import org.apache.flink.util.{ Collector, Preconditions }
+import org.apache.flink.api.common.state.ValueState
+import org.apache.flink.api.common.state.ValueStateDescriptor
+import scala.util.control.Breaks._
+import org.apache.flink.api.java.tuple.{ Tuple2 => JTuple2 }
+import org.apache.flink.api.common.state.MapState
+import org.apache.flink.api.common.state.MapStateDescriptor
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.java.typeutils.ListTypeInfo
+import java.util.Comparator
+import java.util.ArrayList
+import java.util.Collections
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+import java.lang.{Integer=>JInt}
+
+/**
+ * Process Function used for the aggregate in bounded proctime sort 
without offset/fetch
+ * [[org.apache.flink.streaming.api.datastream.DataStream]]
+ *
+ * @param inputRowType It is used to mark the type of the incoming data
+ * @param rowComparator the [[java.util.Comparator]] is used for this sort 
aggregation
+ */
+class ProcTimeSortProcessFunction(
+  private val inputRowType: CRowTypeInfo,
+  private val rowComparator: CollectionRowComparator)
+extends ProcessFunction[CRow, CRow] {
+
+  Preconditions.checkNotNull(rowComparator)
+
+  private var stateEventsBuffer: ListState[Row] = _
+  private val sortArray: ArrayList[Row] = new ArrayList[Row]
+  
+  private var outputC: CRow = _
+  
+  override def open(config: Configuration) {
+val sortDescriptor = new ListStateDescriptor[Row]("sortState",
+inputRowType.asInstanceOf[CRowTypeInfo].rowType)
+stateEventsBuffer = getRuntimeContext.getListState(sortDescriptor)
+
+if (outputC == null) {
+  val arity:Integer = inputRowType.getArity
+  outputC = new CRow(Row.of(arity), true)
+}
+
+  }
+
+  override def processElement(
+inputC: CRow,
+ctx: ProcessFunction[CRow, CRow]#Context,
+out: Collector[CRow]): Unit = {
+
+val input = inputC.row
+
+val currentTime = ctx.timerService.currentProcessingTime
+//buffer the event incoming event
+  
+stateEventsBuffer.add(input)
+
+//deduplication of multiple registered timers is done automatically
+ctx.timerService.registerProcessingTimeTimer(currentTime + 1)  
+
+  }
+  
+  override def onTimer(
+timestamp: Long,
+ctx: ProcessFunction[CRow, CRow]#OnTimerContext,
+out: Collector[CRow]): Unit = {
+
+val iter =  stateEventsBuffer.get.iterator()
+
+sortArray.clear()
+while(iter.hasNext()) {
+  sortArray.add(iter.next())
+}
+
+//if we do not rely on java collections to do the sort we could 
implement 
+//an insertion sort as we get the elements  from the state

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061020#comment-16061020
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123743300
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/SortUtil.scala
 ---
@@ -0,0 +1,345 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.table.calcite.FlinkTypeFactory
--- End diff --

Most IDEs support highlighting of unused imports and other common code 
style issues (invalid parameter documentation in Java/ScalaDocs, etc).


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-1

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061031#comment-16061031
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123760084
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/aggregate/TimeSortProcessFunctionTest.scala
 ---
@@ -0,0 +1,256 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import java.util.Comparator
+import java.util.concurrent.ConcurrentLinkedQueue
+import java.lang.{Integer => JInt, Long => JLong}
+
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.functions.KeySelector
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.streaming.api.operators.KeyedProcessOperator
+import org.apache.flink.streaming.api.watermark.Watermark
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord
+import 
org.apache.flink.streaming.util.{KeyedOneInputStreamOperatorTestHarness, 
TestHarnessUtil}
+import org.apache.flink.types.Row
+import org.junit.Test
+import org.apache.flink.table.runtime.aggregate.ProcTimeSortProcessFunction
+import org.apache.flink.table.runtime.aggregate.RowTimeSortProcessFunction
+import 
org.apache.flink.table.runtime.aggregate.TimeSortProcessFunctionTest._
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+import org.apache.flink.streaming.api.TimeCharacteristic
+
+class TimeSortProcessFunctionTest{
+
+  
+  @Test
+  def testSortProcTimeHarnessPartitioned(): Unit = {
+
+val rT =  new RowTypeInfo(Array[TypeInformation[_]](
+  INT_TYPE_INFO,
+  LONG_TYPE_INFO,
+  INT_TYPE_INFO,
+  STRING_TYPE_INFO,
+  LONG_TYPE_INFO),
+  Array("a","b","c","d","e"))
+
+val rTA =  new RowTypeInfo(Array[TypeInformation[_]](
+ LONG_TYPE_INFO), Array("count"))
+val indexes = Array(1,2)
+  
+val fieldComps = Array[TypeComparator[AnyRef]](
+  LONG_TYPE_INFO.createComparator(true, 
null).asInstanceOf[TypeComparator[AnyRef]],
+  INT_TYPE_INFO.createComparator(false, 
null).asInstanceOf[TypeComparator[AnyRef]] )
+val booleanOrders = Array(true, false)
+
+
+val rowComp = new RowComparator(
+  rT.getTotalFields,
+  indexes,
+  fieldComps,
+  new Array[TypeSerializer[AnyRef]](0), //used only for serialized 
comparisons
+  booleanOrders)
+
+val collectionRowComparator = new CollectionRowComparator(rowComp)
+
+val inputCRowType = CRowTypeInfo(rT)
+
+val processFunction = new KeyedProcessOperator[Integer,CRow,CRow](
+  new ProcTimeSortProcessFunction(
+inputCRowType,
+collectionRowComparator))
+  
+   val testHarness = new 
KeyedOneInputStreamOperatorTestHarness[Integer,CRow,CRow](
+  processFunction, 
+  new TupleRowSelector(0), 
+  BasicTypeInfo.INT_TYPE_INFO)
+
+   testHarness.open();
+
+   testHarness.setProcessingTime(3)
+
+  // timestamp is ignored in processing time
+testHarness.processElement(new StreamRecord(new CRow(
+  Row.of(1: JInt, 11L: JLong, 1: JInt, "aaa", 11L: JLong), true), 
1001))
+testHarness.processElement(new StreamRecord(new CRow(
+Row.of(1: JInt, 12L: JLong, 1: JInt, "aaa", 11L: JLong), true), 
2002))
+testHarness.processElement(new StreamRecord(new CR

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061034#comment-16061034
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123745861
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/SortUtil.scala
 ---
@@ -0,0 +1,247 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.types.Row
+import org.apache.calcite.rel.`type`._
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.table.functions.AggregateFunction
+import org.apache.calcite.sql.`type`.SqlTypeName
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.sql.`type`.SqlTypeName._
+import java.util.{ List => JList, ArrayList }
+import org.apache.flink.api.common.typeinfo.{SqlTimeTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import java.sql.Timestamp
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import java.util.Comparator
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import 
java.lang.{Byte=>JByte,Integer=>JInt,Long=>JLong,Double=>JDouble,Short=>JShort,String=>JString,Float=>JFloat}
+import java.math.{BigDecimal=>JBigDecimal}
+import org.apache.flink.api.common.functions.MapFunction
+import org.apache.flink.api.common.operators.Order
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.AtomicType
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+
+/**
+ * Class represents a collection of helper methods to build the sort logic.
+ * It encapsulates as well the implementation for ordering and generic 
interfaces
+ */
+object SortUtil {
+
+  /**
+   * Function creates 
[org.apache.flink.streaming.api.functions.ProcessFunction] for sorting 
+   * elements based on rowtime and potentially other fields
+   * @param collationSort The Sort collation list
+   * @param inputType input row type
+   * @param execCfg table environment execution configuration
+   * @return org.apache.flink.streaming.api.functions.ProcessFunction
+   */
+  private[flink] def createRowTimeSortFunction(
+collationSort: RelCollation,
+inputType: RelDataType,
+inputTypeInfo: TypeInformation[Row],
+execCfg: ExecutionConfig): ProcessFunction[CRow, CRow] = {
+
+val keySortFields = getSortFieldIndexList(collationSort)
+//drop time from comparison as we sort on time in the states and 
result emission
+val keyIndexesNoTime = keySortFields.slice(1, keySortFields.size)
+val booleanOrderings = getSortFieldDirectionBooleanList(collationSort)
+val booleanDirectionsNoTime = booleanOrderings.slice(1, 
booleanOrderings.size)
+
+val fieldComps = createFieldComparators(
+  inputType, 
+  keyIndexesNoTime, 
+  booleanDirectionsNoTime,
+  execCfg)
+val fieldCompsRefs = 
fieldComps.asInstanceOf[Array[TypeComparator[AnyRef]]]
+
+val rowComp = new RowComparator(
+   inputType.getFieldCount,
+   keyIndexesNoTime,
+   fieldCompsRefs,
+   new Array[TypeSerializer[AnyRef]](0), //used only for object 
comparisons
+   booleanDirectionsNoTime)
+  
+val collectionRowCo

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060992#comment-16060992
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123711129
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/CommonSort.scala
 ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.nodes
+
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.calcite.rel.`type`._
+import scala.collection.JavaConverters._
+import org.apache.flink.api.common.operators.Order
+
+
+/**
+ * Trait represents a collection of sort methods to manipulate the 
parameters
+ */
+
+trait CommonSort {
+  
+  private[flink] def offsetToString(offset: RexNode): String = {
--- End diff --

IMO, this method can be removed. It simply calls `offset.toString()`


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are o

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061003#comment-16061003
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123718029
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/rules/datastream/DataStreamSortRule.scala
 ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.plan.rules.datastream
+
+import org.apache.calcite.plan.volcano.RelSubset
+import org.apache.calcite.plan.{ Convention, RelOptRule, RelOptRuleCall, 
RelTraitSet }
+import org.apache.calcite.rel.RelNode
+import org.apache.calcite.rel.convert.ConverterRule
+import org.apache.calcite.rel.logical.{ LogicalFilter, LogicalCorrelate, 
LogicalTableFunctionScan }
+import org.apache.calcite.rex.RexNode
+import org.apache.flink.table.plan.nodes.datastream.DataStreamCorrelate
+import org.apache.calcite.rel.logical.LogicalSort
+import org.apache.flink.table.plan.nodes.datastream.DataStreamSort
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.table.plan.nodes.FlinkConventions
+import org.apache.flink.table.plan.schema.RowSchema
+import org.apache.flink.table.plan.nodes.logical.FlinkLogicalSort
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.flink.table.runtime.aggregate.SortUtil
+
+/**
+ * Rule to convert a LogicalSort into a DataStreamSort.
+ */
+class DataStreamSortRule
+extends ConverterRule(
+  classOf[FlinkLogicalSort],
+  FlinkConventions.LOGICAL,
+  FlinkConventions.DATASTREAM,
+  "DataStreamSortRule") {
+
+  override def matches(call: RelOptRuleCall): Boolean = {
+
+val result = super.matches(call)
+
+//need to identify time between others order fields. Time needs to be 
first sort element
+// we can safely convert the object if the match rule succeeded 
+if(result) {
+  val calcSort: FlinkLogicalSort = 
call.rel(0).asInstanceOf[FlinkLogicalSort]
+  checkTimeOrder(calcSort)
+}
+
+result
+  }
+
+  override def convert(rel: RelNode): RelNode = {
+val calcSort: FlinkLogicalSort = rel.asInstanceOf[FlinkLogicalSort]
+val traitSet: RelTraitSet = 
rel.getTraitSet.replace(FlinkConventions.DATASTREAM)
+val convInput: RelNode = RelOptRule.convert(calcSort.getInput(0), 
FlinkConventions.DATASTREAM)
+
+val inputRowType = 
convInput.asInstanceOf[RelSubset].getOriginal.getRowType
+
+new DataStreamSort(
+  rel.getCluster,
+  traitSet,
+  convInput,
+  new RowSchema(inputRowType),
+  new RowSchema(rel.getRowType),
+  calcSort.collation,
+  calcSort.offset,
+  calcSort.fetch,
+  description)
+
+  }
+  
+   
+  /**
+   * Function is used to check at verification time if the SQL syntax is 
supported
+   */
+  
+  def checkTimeOrder(calcSort: FlinkLogicalSort) = {
+
+val rowType = calcSort.getRowType
+val sortCollation = calcSort.collation 
+ //need to identify time between others order fields. Time needs to be 
first sort element
--- End diff --

indention


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, th

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061001#comment-16061001
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123742995
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/SortUtil.scala
 ---
@@ -0,0 +1,247 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.types.Row
+import org.apache.calcite.rel.`type`._
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.table.functions.AggregateFunction
+import org.apache.calcite.sql.`type`.SqlTypeName
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.sql.`type`.SqlTypeName._
+import java.util.{ List => JList, ArrayList }
+import org.apache.flink.api.common.typeinfo.{SqlTimeTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import java.sql.Timestamp
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import java.util.Comparator
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import 
java.lang.{Byte=>JByte,Integer=>JInt,Long=>JLong,Double=>JDouble,Short=>JShort,String=>JString,Float=>JFloat}
+import java.math.{BigDecimal=>JBigDecimal}
+import org.apache.flink.api.common.functions.MapFunction
+import org.apache.flink.api.common.operators.Order
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.AtomicType
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+
+/**
+ * Class represents a collection of helper methods to build the sort logic.
+ * It encapsulates as well the implementation for ordering and generic 
interfaces
+ */
+object SortUtil {
+
+  /**
+   * Function creates 
[org.apache.flink.streaming.api.functions.ProcessFunction] for sorting 
+   * elements based on rowtime and potentially other fields
+   * @param collationSort The Sort collation list
+   * @param inputType input row type
+   * @param execCfg table environment execution configuration
+   * @return org.apache.flink.streaming.api.functions.ProcessFunction
+   */
+  private[flink] def createRowTimeSortFunction(
+collationSort: RelCollation,
+inputType: RelDataType,
+inputTypeInfo: TypeInformation[Row],
+execCfg: ExecutionConfig): ProcessFunction[CRow, CRow] = {
+
+val keySortFields = getSortFieldIndexList(collationSort)
+//drop time from comparison as we sort on time in the states and 
result emission
+val keyIndexesNoTime = keySortFields.slice(1, keySortFields.size)
--- End diff --

for Strings and Arrays, `length` should be used instead of `size`.


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query exa

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061007#comment-16061007
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123742277
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/SortUtil.scala
 ---
@@ -0,0 +1,345 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.table.calcite.FlinkTypeFactory
--- End diff --

There quite a few: `AggregateFunction`, `SqlTypeName`, `ArrayList`, 
`JList`, and 10 more or so


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aa

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061032#comment-16061032
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123736450
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/datastream/DataStreamSort.scala
 ---
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.nodes.datastream
+
+import org.apache.calcite.plan.{ RelOptCluster, RelTraitSet }
+import org.apache.calcite.rel.`type`.RelDataType
+import org.apache.calcite.rel.{ RelNode, RelWriter, SingleRel }
+import org.apache.flink.api.java.tuple.Tuple
+import org.apache.flink.streaming.api.datastream.{ AllWindowedStream, 
DataStream, KeyedStream, WindowedStream }
+import org.apache.flink.streaming.api.windowing.assigners._
+import org.apache.flink.streaming.api.windowing.time.Time
+import org.apache.flink.streaming.api.windowing.windows.{ Window => 
DataStreamWindow }
+import org.apache.flink.table.calcite.FlinkRelBuilder.NamedWindowProperty
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.table.expressions._
+import org.apache.flink.table.plan.logical._
+import org.apache.flink.table.plan.nodes.CommonAggregate
+import org.apache.flink.table.runtime.aggregate.AggregateUtil._
+import org.apache.flink.table.runtime.aggregate._
+import org.apache.flink.table.typeutils.TypeCheckUtils.isTimeInterval
+import org.apache.flink.table.typeutils.{ RowIntervalTypeInfo, 
TimeIntervalTypeInfo }
+import org.apache.flink.types.Row
+import org.apache.calcite.sql.SqlAggFunction
+import org.apache.flink.table.plan.nodes.datastream.DataStreamRel
+import org.apache.calcite.sql.fun.SqlSingleValueAggFunction
+import org.apache.flink.api.common.functions.RichMapFunction
+import org.apache.flink.api.common.functions.RichFlatMapFunction
+import org.apache.flink.configuration.Configuration
+import org.apache.flink.util.Collector
+import org.apache.flink.api.common.state.ValueState
+import org.apache.flink.api.common.state.ValueStateDescriptor
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.calcite.rel.core.Sort
+import org.apache.flink.api.java.functions.NullByteKeySelector
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.flink.table.runtime.aggregate.SortUtil._
+import org.apache.calcite.rel.RelCollation
+import org.apache.calcite.rex.RexNode
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+import org.apache.flink.table.api.{StreamQueryConfig, 
StreamTableEnvironment, TableException}
+import org.apache.flink.table.plan.schema.RowSchema
+import org.apache.flink.table.plan.nodes.CommonSort
+
+/**
+  * Flink RelNode which matches along with Sort Rule.
+  *
+ */
+class DataStreamSort(
+cluster: RelOptCluster,
+traitSet: RelTraitSet,
+inputNode: RelNode,
+inputSchema: RowSchema,
+schema: RowSchema,
+sortCollation: RelCollation,
+sortOffset: RexNode,
+sortFetch: RexNode,
+description: String)
+  extends SingleRel(cluster, traitSet, inputNode)
+  with CommonSort
+  with DataStreamRel {
+
+  override def deriveRowType(): RelDataType = schema.logicalType
+
+  override def copy(traitSet: RelTraitSet, inputs: 
java.util.List[RelNode]): RelNode = {
+new DataStreamSort(
+  cluster,
+  traitSet,
+  inputs.get(0),
+  inputSchema,
+  schema,
+  sortCollation,
+  sortOffset,
+  sortFetch,
+  description)
+  }
+
+  override def toString: String = {
+s"Sort(by: ($$sortFieldsToSt

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061030#comment-16061030
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123760629
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/aggregate/TimeSortProcessFunctionTest.scala
 ---
@@ -0,0 +1,256 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import java.util.Comparator
+import java.util.concurrent.ConcurrentLinkedQueue
+import java.lang.{Integer => JInt, Long => JLong}
+
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.functions.KeySelector
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.streaming.api.operators.KeyedProcessOperator
+import org.apache.flink.streaming.api.watermark.Watermark
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord
+import 
org.apache.flink.streaming.util.{KeyedOneInputStreamOperatorTestHarness, 
TestHarnessUtil}
+import org.apache.flink.types.Row
+import org.junit.Test
+import org.apache.flink.table.runtime.aggregate.ProcTimeSortProcessFunction
+import org.apache.flink.table.runtime.aggregate.RowTimeSortProcessFunction
+import 
org.apache.flink.table.runtime.aggregate.TimeSortProcessFunctionTest._
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+import org.apache.flink.streaming.api.TimeCharacteristic
+
+class TimeSortProcessFunctionTest{
--- End diff --

Please add a similar test for the event-time sorter which sorts for 
multiple timestamps and validates that all rows preserve their timestamp.


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the function

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060997#comment-16060997
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123716582
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/rules/datastream/DataStreamSortRule.scala
 ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.plan.rules.datastream
+
+import org.apache.calcite.plan.volcano.RelSubset
+import org.apache.calcite.plan.{ Convention, RelOptRule, RelOptRuleCall, 
RelTraitSet }
+import org.apache.calcite.rel.RelNode
+import org.apache.calcite.rel.convert.ConverterRule
+import org.apache.calcite.rel.logical.{ LogicalFilter, LogicalCorrelate, 
LogicalTableFunctionScan }
+import org.apache.calcite.rex.RexNode
+import org.apache.flink.table.plan.nodes.datastream.DataStreamCorrelate
+import org.apache.calcite.rel.logical.LogicalSort
+import org.apache.flink.table.plan.nodes.datastream.DataStreamSort
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.table.plan.nodes.FlinkConventions
+import org.apache.flink.table.plan.schema.RowSchema
+import org.apache.flink.table.plan.nodes.logical.FlinkLogicalSort
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.flink.table.runtime.aggregate.SortUtil
+
+/**
+ * Rule to convert a LogicalSort into a DataStreamSort.
+ */
+class DataStreamSortRule
+extends ConverterRule(
+  classOf[FlinkLogicalSort],
+  FlinkConventions.LOGICAL,
+  FlinkConventions.DATASTREAM,
+  "DataStreamSortRule") {
+
+  override def matches(call: RelOptRuleCall): Boolean = {
+
+val result = super.matches(call)
--- End diff --

This call always returns `true`. So we can skip it.


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit a

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060998#comment-16060998
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123742438
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/SortUtil.scala
 ---
@@ -0,0 +1,247 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.types.Row
+import org.apache.calcite.rel.`type`._
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.table.functions.AggregateFunction
+import org.apache.calcite.sql.`type`.SqlTypeName
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.sql.`type`.SqlTypeName._
+import java.util.{ List => JList, ArrayList }
+import org.apache.flink.api.common.typeinfo.{SqlTimeTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import java.sql.Timestamp
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import java.util.Comparator
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import 
java.lang.{Byte=>JByte,Integer=>JInt,Long=>JLong,Double=>JDouble,Short=>JShort,String=>JString,Float=>JFloat}
+import java.math.{BigDecimal=>JBigDecimal}
+import org.apache.flink.api.common.functions.MapFunction
+import org.apache.flink.api.common.operators.Order
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.AtomicType
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+
+/**
+ * Class represents a collection of helper methods to build the sort logic.
+ * It encapsulates as well the implementation for ordering and generic 
interfaces
+ */
+object SortUtil {
+
+  /**
+   * Function creates 
[org.apache.flink.streaming.api.functions.ProcessFunction] for sorting 
+   * elements based on rowtime and potentially other fields
+   * @param collationSort The Sort collation list
+   * @param inputType input row type
+   * @param execCfg table environment execution configuration
+   * @return org.apache.flink.streaming.api.functions.ProcessFunction
+   */
+  private[flink] def createRowTimeSortFunction(
+collationSort: RelCollation,
+inputType: RelDataType,
+inputTypeInfo: TypeInformation[Row],
+execCfg: ExecutionConfig): ProcessFunction[CRow, CRow] = {
+
+val keySortFields = getSortFieldIndexList(collationSort)
+//drop time from comparison as we sort on time in the states and 
result emission
+val keyIndexesNoTime = keySortFields.slice(1, keySortFields.size)
+val booleanOrderings = getSortFieldDirectionBooleanList(collationSort)
+val booleanDirectionsNoTime = booleanOrderings.slice(1, 
booleanOrderings.size)
+
+val fieldComps = createFieldComparators(
+  inputType, 
+  keyIndexesNoTime, 
+  booleanDirectionsNoTime,
+  execCfg)
+val fieldCompsRefs = 
fieldComps.asInstanceOf[Array[TypeComparator[AnyRef]]]
+
+val rowComp = new RowComparator(
+   inputType.getFieldCount,
+   keyIndexesNoTime,
+   fieldCompsRefs,
+   new Array[TypeSerializer[AnyRef]](0), //used only for object 
comparisons
+   booleanDirectionsNoTime)
+  
+val collectionRowCo

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060989#comment-16060989
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123709906
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/stream/sql/SortITCase.scala
 ---
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.api.scala.stream.sql
+
+import org.apache.flink.api.scala._
+import 
org.apache.flink.table.api.scala.stream.sql.SortITCase.StringRowSelectorSink
+import 
org.apache.flink.table.api.scala.stream.sql.TimeTestUtil.EventTimeSourceFunction
+import org.apache.flink.streaming.api.functions.source.SourceFunction
+import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
+import org.apache.flink.table.api.TableEnvironment
+import org.apache.flink.table.api.scala._
+import org.apache.flink.table.api.scala.stream.utils.{StreamITCase, 
StreamTestData, StreamingWithStateTestBase}
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.types.Row
+import org.junit.Assert._
+import org.junit._
+import org.apache.flink.streaming.api.TimeCharacteristic
+import 
org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext
+import org.apache.flink.streaming.api.watermark.Watermark
+import scala.collection.mutable
+import org.apache.flink.streaming.api.functions.sink.RichSinkFunction
+
+class SortITCase extends StreamingWithStateTestBase {
+
+  @Test
+  def testEventTimeOrderBy(): Unit = {
+val data = Seq(
+  Left((1500L, (1L, 15, "Hello"))),
+  Left((1600L, (1L, 16, "Hello"))),
+  Left((1000L, (1L, 1, "Hello"))),
+  Left((2000L, (2L, 2, "Hello"))),
+  Right(1000L),
+  Left((2000L, (2L, 2, "Hello"))),
+  Left((2000L, (2L, 3, "Hello"))),
+  Left((3000L, (3L, 3, "Hello"))),
+  Left((2000L, (3L, 1, "Hello"))),
+  Right(2000L),
+  Left((4000L, (4L, 4, "Hello"))),
+  Right(3000L),
+  Left((5000L, (5L, 5, "Hello"))),
+  Right(5000L),
+  Left((6000L, (6L, 65, "Hello"))),
+  Left((6000L, (6L, 6, "Hello"))),
+  Left((6000L, (6L, 67, "Hello"))),
+  Left((6000L, (6L, -1, "Hello"))),
+  Left((6000L, (6L, 6, "Hello"))),
+  Right(7000L),
+  Left((9000L, (6L, 9, "Hello"))),
+  Left((8500L, (6L, 18, "Hello"))),
+  Left((9000L, (6L, 7, "Hello"))),
+  Right(1L),
+  Left((1L, (7L, 7, "Hello World"))),
+  Left((11000L, (7L, 77, "Hello World"))),
+  Left((11000L, (7L, 17, "Hello World"))),
+  Right(12000L),
+  Left((14000L, (7L, 18, "Hello World"))),
+  Right(14000L),
+  Left((15000L, (8L, 8, "Hello World"))),
+  Right(17000L),
+  Left((2L, (20L, 20, "Hello World"))), 
+  Right(19000L))
+
+val env = StreamExecutionEnvironment.getExecutionEnvironment
+env.setStateBackend(getStateBackend)
+env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
+val tEnv = TableEnvironment.getTableEnvironment(env)
+StreamITCase.clear
+
+val t1 = env.addSource(new EventTimeSourceFunction[(Long, Int, 
String)](data))
+  .toTable(tEnv, 'a, 'b, 'c, 'rowtime.rowtime)
+  
+tEnv.registerTable("T1", t1)
+
+val  sqlQuery = "SELECT b FROM T1 " +
+  "ORDER BY rowtime, b ASC ";
+  
+  
+val result = tEnv.sql(sqlQuery).toDataStream[Row]
--- End diff --

`toDataStream` is deprecated. Please use `toAppendStream` instead.


> Support Limit/Top(Sor

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061002#comment-16061002
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123744145
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/SortUtil.scala
 ---
@@ -0,0 +1,247 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.types.Row
+import org.apache.calcite.rel.`type`._
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.table.functions.AggregateFunction
+import org.apache.calcite.sql.`type`.SqlTypeName
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.sql.`type`.SqlTypeName._
+import java.util.{ List => JList, ArrayList }
+import org.apache.flink.api.common.typeinfo.{SqlTimeTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import java.sql.Timestamp
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import java.util.Comparator
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import 
java.lang.{Byte=>JByte,Integer=>JInt,Long=>JLong,Double=>JDouble,Short=>JShort,String=>JString,Float=>JFloat}
+import java.math.{BigDecimal=>JBigDecimal}
+import org.apache.flink.api.common.functions.MapFunction
+import org.apache.flink.api.common.operators.Order
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.AtomicType
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+
+/**
+ * Class represents a collection of helper methods to build the sort logic.
+ * It encapsulates as well the implementation for ordering and generic 
interfaces
+ */
+object SortUtil {
+
+  /**
+   * Function creates 
[org.apache.flink.streaming.api.functions.ProcessFunction] for sorting 
+   * elements based on rowtime and potentially other fields
+   * @param collationSort The Sort collation list
+   * @param inputType input row type
+   * @param execCfg table environment execution configuration
+   * @return org.apache.flink.streaming.api.functions.ProcessFunction
+   */
+  private[flink] def createRowTimeSortFunction(
+collationSort: RelCollation,
+inputType: RelDataType,
+inputTypeInfo: TypeInformation[Row],
+execCfg: ExecutionConfig): ProcessFunction[CRow, CRow] = {
+
+val keySortFields = getSortFieldIndexList(collationSort)
+//drop time from comparison as we sort on time in the states and 
result emission
+val keyIndexesNoTime = keySortFields.slice(1, keySortFields.size)
+val booleanOrderings = getSortFieldDirectionBooleanList(collationSort)
+val booleanDirectionsNoTime = booleanOrderings.slice(1, 
booleanOrderings.size)
+
+val fieldComps = createFieldComparators(
+  inputType, 
+  keyIndexesNoTime, 
+  booleanDirectionsNoTime,
+  execCfg)
+val fieldCompsRefs = 
fieldComps.asInstanceOf[Array[TypeComparator[AnyRef]]]
--- End diff --

You can change `createFieldComparators` to return an 
`Array[TypeComparator[AnyRef]]` and avoid the additional cast.


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061027#comment-16061027
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123752704
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/ProcTimeSortProcessFunction.scala
 ---
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.api.common.state.{ ListState, ListStateDescriptor }
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.configuration.Configuration
+import org.apache.flink.runtime.state.{ FunctionInitializationContext, 
FunctionSnapshotContext }
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.types.Row
+import org.apache.flink.util.{ Collector, Preconditions }
+import org.apache.flink.api.common.state.ValueState
+import org.apache.flink.api.common.state.ValueStateDescriptor
+import scala.util.control.Breaks._
+import org.apache.flink.api.java.tuple.{ Tuple2 => JTuple2 }
+import org.apache.flink.api.common.state.MapState
+import org.apache.flink.api.common.state.MapStateDescriptor
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.java.typeutils.ListTypeInfo
+import java.util.Comparator
+import java.util.ArrayList
+import java.util.Collections
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+import java.lang.{Integer=>JInt}
+
+/**
+ * Process Function used for the aggregate in bounded proctime sort 
without offset/fetch
+ * [[org.apache.flink.streaming.api.datastream.DataStream]]
+ *
+ * @param inputRowType It is used to mark the type of the incoming data
+ * @param rowComparator the [[java.util.Comparator]] is used for this sort 
aggregation
+ */
+class ProcTimeSortProcessFunction(
+  private val inputRowType: CRowTypeInfo,
+  private val rowComparator: CollectionRowComparator)
+extends ProcessFunction[CRow, CRow] {
+
+  Preconditions.checkNotNull(rowComparator)
+
+  private var stateEventsBuffer: ListState[Row] = _
--- End diff --

rename to `bufferedEvents`?


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description*

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060999#comment-16060999
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123717968
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/rules/datastream/DataStreamSortRule.scala
 ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.plan.rules.datastream
+
+import org.apache.calcite.plan.volcano.RelSubset
+import org.apache.calcite.plan.{ Convention, RelOptRule, RelOptRuleCall, 
RelTraitSet }
+import org.apache.calcite.rel.RelNode
+import org.apache.calcite.rel.convert.ConverterRule
+import org.apache.calcite.rel.logical.{ LogicalFilter, LogicalCorrelate, 
LogicalTableFunctionScan }
+import org.apache.calcite.rex.RexNode
+import org.apache.flink.table.plan.nodes.datastream.DataStreamCorrelate
+import org.apache.calcite.rel.logical.LogicalSort
+import org.apache.flink.table.plan.nodes.datastream.DataStreamSort
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.table.plan.nodes.FlinkConventions
+import org.apache.flink.table.plan.schema.RowSchema
+import org.apache.flink.table.plan.nodes.logical.FlinkLogicalSort
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.flink.table.runtime.aggregate.SortUtil
+
+/**
+ * Rule to convert a LogicalSort into a DataStreamSort.
+ */
+class DataStreamSortRule
+extends ConverterRule(
+  classOf[FlinkLogicalSort],
+  FlinkConventions.LOGICAL,
+  FlinkConventions.DATASTREAM,
+  "DataStreamSortRule") {
+
+  override def matches(call: RelOptRuleCall): Boolean = {
+
+val result = super.matches(call)
+
+//need to identify time between others order fields. Time needs to be 
first sort element
+// we can safely convert the object if the match rule succeeded 
+if(result) {
+  val calcSort: FlinkLogicalSort = 
call.rel(0).asInstanceOf[FlinkLogicalSort]
+  checkTimeOrder(calcSort)
+}
+
+result
+  }
+
+  override def convert(rel: RelNode): RelNode = {
+val calcSort: FlinkLogicalSort = rel.asInstanceOf[FlinkLogicalSort]
+val traitSet: RelTraitSet = 
rel.getTraitSet.replace(FlinkConventions.DATASTREAM)
+val convInput: RelNode = RelOptRule.convert(calcSort.getInput(0), 
FlinkConventions.DATASTREAM)
+
+val inputRowType = 
convInput.asInstanceOf[RelSubset].getOriginal.getRowType
+
+new DataStreamSort(
+  rel.getCluster,
+  traitSet,
+  convInput,
+  new RowSchema(inputRowType),
+  new RowSchema(rel.getRowType),
+  calcSort.collation,
+  calcSort.offset,
+  calcSort.fetch,
+  description)
+
+  }
+  
+   
+  /**
+   * Function is used to check at verification time if the SQL syntax is 
supported
+   */
+  
+  def checkTimeOrder(calcSort: FlinkLogicalSort) = {
--- End diff --

`calcSort` -> `sort`


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061018#comment-16061018
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123710581
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/CommonSort.scala
 ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.nodes
+
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.calcite.rel.`type`._
+import scala.collection.JavaConverters._
+import org.apache.flink.api.common.operators.Order
+
+
+/**
+ * Trait represents a collection of sort methods to manipulate the 
parameters
+ */
+
+trait CommonSort {
+  
+  private[flink] def offsetToString(offset: RexNode): String = {
+val offsetToString = s"$offset"
+offsetToString
+  }
+  
--- End diff --

rm new line


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in th

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060994#comment-16060994
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123712493
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/datastream/DataStreamSort.scala
 ---
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.nodes.datastream
+
+import org.apache.calcite.plan.{ RelOptCluster, RelTraitSet }
+import org.apache.calcite.rel.`type`.RelDataType
+import org.apache.calcite.rel.{ RelNode, RelWriter, SingleRel }
+import org.apache.flink.api.java.tuple.Tuple
+import org.apache.flink.streaming.api.datastream.{ AllWindowedStream, 
DataStream, KeyedStream, WindowedStream }
+import org.apache.flink.streaming.api.windowing.assigners._
+import org.apache.flink.streaming.api.windowing.time.Time
+import org.apache.flink.streaming.api.windowing.windows.{ Window => 
DataStreamWindow }
+import org.apache.flink.table.calcite.FlinkRelBuilder.NamedWindowProperty
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.table.expressions._
+import org.apache.flink.table.plan.logical._
+import org.apache.flink.table.plan.nodes.CommonAggregate
+import org.apache.flink.table.runtime.aggregate.AggregateUtil._
+import org.apache.flink.table.runtime.aggregate._
+import org.apache.flink.table.typeutils.TypeCheckUtils.isTimeInterval
+import org.apache.flink.table.typeutils.{ RowIntervalTypeInfo, 
TimeIntervalTypeInfo }
+import org.apache.flink.types.Row
+import org.apache.calcite.sql.SqlAggFunction
+import org.apache.flink.table.plan.nodes.datastream.DataStreamRel
+import org.apache.calcite.sql.fun.SqlSingleValueAggFunction
+import org.apache.flink.api.common.functions.RichMapFunction
+import org.apache.flink.api.common.functions.RichFlatMapFunction
+import org.apache.flink.configuration.Configuration
+import org.apache.flink.util.Collector
+import org.apache.flink.api.common.state.ValueState
+import org.apache.flink.api.common.state.ValueStateDescriptor
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.calcite.rel.core.Sort
+import org.apache.flink.api.java.functions.NullByteKeySelector
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.flink.table.runtime.aggregate.SortUtil._
+import org.apache.calcite.rel.RelCollation
+import org.apache.calcite.rex.RexNode
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+import org.apache.flink.table.api.{StreamQueryConfig, 
StreamTableEnvironment, TableException}
+import org.apache.flink.table.plan.schema.RowSchema
+import org.apache.flink.table.plan.nodes.CommonSort
+
+/**
+  * Flink RelNode which matches along with Sort Rule.
+  *
+ */
+class DataStreamSort(
+cluster: RelOptCluster,
+traitSet: RelTraitSet,
+inputNode: RelNode,
+inputSchema: RowSchema,
+schema: RowSchema,
+sortCollation: RelCollation,
+sortOffset: RexNode,
+sortFetch: RexNode,
+description: String)
+  extends SingleRel(cluster, traitSet, inputNode)
+  with CommonSort
+  with DataStreamRel {
+
+  override def deriveRowType(): RelDataType = schema.logicalType
+
+  override def copy(traitSet: RelTraitSet, inputs: 
java.util.List[RelNode]): RelNode = {
+new DataStreamSort(
+  cluster,
+  traitSet,
+  inputs.get(0),
+  inputSchema,
+  schema,
+  sortCollation,
+  sortOffset,
+  sortFetch,
+  description)
+  }
+
+  override def toString: String = {
+s"Sort(by: ($$sortFieldsToSt

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061013#comment-16061013
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123739308
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/SortUtil.scala
 ---
@@ -0,0 +1,247 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.types.Row
+import org.apache.calcite.rel.`type`._
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.table.functions.AggregateFunction
+import org.apache.calcite.sql.`type`.SqlTypeName
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.sql.`type`.SqlTypeName._
+import java.util.{ List => JList, ArrayList }
+import org.apache.flink.api.common.typeinfo.{SqlTimeTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import java.sql.Timestamp
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import java.util.Comparator
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import 
java.lang.{Byte=>JByte,Integer=>JInt,Long=>JLong,Double=>JDouble,Short=>JShort,String=>JString,Float=>JFloat}
--- End diff --

please add a space after a comma


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Fun

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061014#comment-16061014
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123749098
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/SortUtil.scala
 ---
@@ -0,0 +1,247 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.types.Row
+import org.apache.calcite.rel.`type`._
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.table.functions.AggregateFunction
+import org.apache.calcite.sql.`type`.SqlTypeName
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.sql.`type`.SqlTypeName._
+import java.util.{ List => JList, ArrayList }
+import org.apache.flink.api.common.typeinfo.{SqlTimeTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import java.sql.Timestamp
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import java.util.Comparator
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import 
java.lang.{Byte=>JByte,Integer=>JInt,Long=>JLong,Double=>JDouble,Short=>JShort,String=>JString,Float=>JFloat}
+import java.math.{BigDecimal=>JBigDecimal}
+import org.apache.flink.api.common.functions.MapFunction
+import org.apache.flink.api.common.operators.Order
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.AtomicType
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+
+/**
+ * Class represents a collection of helper methods to build the sort logic.
+ * It encapsulates as well the implementation for ordering and generic 
interfaces
+ */
+object SortUtil {
+
+  /**
+   * Function creates 
[org.apache.flink.streaming.api.functions.ProcessFunction] for sorting 
+   * elements based on rowtime and potentially other fields
+   * @param collationSort The Sort collation list
+   * @param inputType input row type
+   * @param execCfg table environment execution configuration
+   * @return org.apache.flink.streaming.api.functions.ProcessFunction
+   */
+  private[flink] def createRowTimeSortFunction(
+collationSort: RelCollation,
+inputType: RelDataType,
+inputTypeInfo: TypeInformation[Row],
+execCfg: ExecutionConfig): ProcessFunction[CRow, CRow] = {
+
+val keySortFields = getSortFieldIndexList(collationSort)
+//drop time from comparison as we sort on time in the states and 
result emission
+val keyIndexesNoTime = keySortFields.slice(1, keySortFields.size)
+val booleanOrderings = getSortFieldDirectionBooleanList(collationSort)
+val booleanDirectionsNoTime = booleanOrderings.slice(1, 
booleanOrderings.size)
+
+val fieldComps = createFieldComparators(
+  inputType, 
+  keyIndexesNoTime, 
+  booleanDirectionsNoTime,
+  execCfg)
+val fieldCompsRefs = 
fieldComps.asInstanceOf[Array[TypeComparator[AnyRef]]]
+
+val rowComp = new RowComparator(
+   inputType.getFieldCount,
+   keyIndexesNoTime,
+   fieldCompsRefs,
+   new Array[TypeSerializer[AnyRef]](0), //used only for object 
comparisons
+   booleanDirectionsNoTime)
+  
+val collectionRowCo

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060995#comment-16060995
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123713964
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/datastream/DataStreamSort.scala
 ---
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.nodes.datastream
+
+import org.apache.calcite.plan.{ RelOptCluster, RelTraitSet }
+import org.apache.calcite.rel.`type`.RelDataType
+import org.apache.calcite.rel.{ RelNode, RelWriter, SingleRel }
+import org.apache.flink.api.java.tuple.Tuple
+import org.apache.flink.streaming.api.datastream.{ AllWindowedStream, 
DataStream, KeyedStream, WindowedStream }
+import org.apache.flink.streaming.api.windowing.assigners._
+import org.apache.flink.streaming.api.windowing.time.Time
+import org.apache.flink.streaming.api.windowing.windows.{ Window => 
DataStreamWindow }
+import org.apache.flink.table.calcite.FlinkRelBuilder.NamedWindowProperty
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.table.expressions._
+import org.apache.flink.table.plan.logical._
+import org.apache.flink.table.plan.nodes.CommonAggregate
+import org.apache.flink.table.runtime.aggregate.AggregateUtil._
+import org.apache.flink.table.runtime.aggregate._
+import org.apache.flink.table.typeutils.TypeCheckUtils.isTimeInterval
+import org.apache.flink.table.typeutils.{ RowIntervalTypeInfo, 
TimeIntervalTypeInfo }
+import org.apache.flink.types.Row
+import org.apache.calcite.sql.SqlAggFunction
+import org.apache.flink.table.plan.nodes.datastream.DataStreamRel
+import org.apache.calcite.sql.fun.SqlSingleValueAggFunction
+import org.apache.flink.api.common.functions.RichMapFunction
+import org.apache.flink.api.common.functions.RichFlatMapFunction
+import org.apache.flink.configuration.Configuration
+import org.apache.flink.util.Collector
+import org.apache.flink.api.common.state.ValueState
+import org.apache.flink.api.common.state.ValueStateDescriptor
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.calcite.rel.core.Sort
+import org.apache.flink.api.java.functions.NullByteKeySelector
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.flink.table.runtime.aggregate.SortUtil._
+import org.apache.calcite.rel.RelCollation
+import org.apache.calcite.rex.RexNode
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+import org.apache.flink.table.api.{StreamQueryConfig, 
StreamTableEnvironment, TableException}
+import org.apache.flink.table.plan.schema.RowSchema
+import org.apache.flink.table.plan.nodes.CommonSort
+
+/**
+  * Flink RelNode which matches along with Sort Rule.
+  *
+ */
+class DataStreamSort(
+cluster: RelOptCluster,
+traitSet: RelTraitSet,
+inputNode: RelNode,
+inputSchema: RowSchema,
+schema: RowSchema,
+sortCollation: RelCollation,
+sortOffset: RexNode,
+sortFetch: RexNode,
+description: String)
+  extends SingleRel(cluster, traitSet, inputNode)
+  with CommonSort
+  with DataStreamRel {
+
+  override def deriveRowType(): RelDataType = schema.logicalType
+
+  override def copy(traitSet: RelTraitSet, inputs: 
java.util.List[RelNode]): RelNode = {
+new DataStreamSort(
+  cluster,
+  traitSet,
+  inputs.get(0),
+  inputSchema,
+  schema,
+  sortCollation,
+  sortOffset,
+  sortFetch,
+  description)
+  }
+
+  override def toString: String = {
+s"Sort(by: ($$sortFieldsToSt

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061023#comment-16061023
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123758098
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/datastream/DataStreamSort.scala
 ---
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.nodes.datastream
+
+import org.apache.calcite.plan.{ RelOptCluster, RelTraitSet }
+import org.apache.calcite.rel.`type`.RelDataType
+import org.apache.calcite.rel.{ RelNode, RelWriter, SingleRel }
+import org.apache.flink.api.java.tuple.Tuple
+import org.apache.flink.streaming.api.datastream.{ AllWindowedStream, 
DataStream, KeyedStream, WindowedStream }
+import org.apache.flink.streaming.api.windowing.assigners._
+import org.apache.flink.streaming.api.windowing.time.Time
+import org.apache.flink.streaming.api.windowing.windows.{ Window => 
DataStreamWindow }
+import org.apache.flink.table.calcite.FlinkRelBuilder.NamedWindowProperty
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.table.expressions._
+import org.apache.flink.table.plan.logical._
+import org.apache.flink.table.plan.nodes.CommonAggregate
+import org.apache.flink.table.runtime.aggregate.AggregateUtil._
+import org.apache.flink.table.runtime.aggregate._
+import org.apache.flink.table.typeutils.TypeCheckUtils.isTimeInterval
+import org.apache.flink.table.typeutils.{ RowIntervalTypeInfo, 
TimeIntervalTypeInfo }
+import org.apache.flink.types.Row
+import org.apache.calcite.sql.SqlAggFunction
+import org.apache.flink.table.plan.nodes.datastream.DataStreamRel
+import org.apache.calcite.sql.fun.SqlSingleValueAggFunction
+import org.apache.flink.api.common.functions.RichMapFunction
+import org.apache.flink.api.common.functions.RichFlatMapFunction
+import org.apache.flink.configuration.Configuration
+import org.apache.flink.util.Collector
+import org.apache.flink.api.common.state.ValueState
+import org.apache.flink.api.common.state.ValueStateDescriptor
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.calcite.rel.core.Sort
+import org.apache.flink.api.java.functions.NullByteKeySelector
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.flink.table.runtime.aggregate.SortUtil._
+import org.apache.calcite.rel.RelCollation
+import org.apache.calcite.rex.RexNode
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+import org.apache.flink.table.api.{StreamQueryConfig, 
StreamTableEnvironment, TableException}
+import org.apache.flink.table.plan.schema.RowSchema
+import org.apache.flink.table.plan.nodes.CommonSort
+
+/**
+  * Flink RelNode which matches along with Sort Rule.
+  *
+ */
+class DataStreamSort(
+cluster: RelOptCluster,
+traitSet: RelTraitSet,
+inputNode: RelNode,
+inputSchema: RowSchema,
+schema: RowSchema,
+sortCollation: RelCollation,
+sortOffset: RexNode,
+sortFetch: RexNode,
+description: String)
+  extends SingleRel(cluster, traitSet, inputNode)
+  with CommonSort
+  with DataStreamRel {
+
+  override def deriveRowType(): RelDataType = schema.logicalType
+
+  override def copy(traitSet: RelTraitSet, inputs: 
java.util.List[RelNode]): RelNode = {
+new DataStreamSort(
+  cluster,
+  traitSet,
+  inputs.get(0),
+  inputSchema,
+  schema,
+  sortCollation,
+  sortOffset,
+  sortFetch,
+  description)
+  }
+
+  override def toString: String = {
+s"Sort(by: ($$sortFieldsToSt

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061021#comment-16061021
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123759690
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/aggregate/TimeSortProcessFunctionTest.scala
 ---
@@ -0,0 +1,256 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import java.util.Comparator
+import java.util.concurrent.ConcurrentLinkedQueue
+import java.lang.{Integer => JInt, Long => JLong}
+
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.functions.KeySelector
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.streaming.api.operators.KeyedProcessOperator
+import org.apache.flink.streaming.api.watermark.Watermark
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord
+import 
org.apache.flink.streaming.util.{KeyedOneInputStreamOperatorTestHarness, 
TestHarnessUtil}
+import org.apache.flink.types.Row
+import org.junit.Test
+import org.apache.flink.table.runtime.aggregate.ProcTimeSortProcessFunction
+import org.apache.flink.table.runtime.aggregate.RowTimeSortProcessFunction
+import 
org.apache.flink.table.runtime.aggregate.TimeSortProcessFunctionTest._
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+import org.apache.flink.streaming.api.TimeCharacteristic
+
+class TimeSortProcessFunctionTest{
+
+  
+  @Test
+  def testSortProcTimeHarnessPartitioned(): Unit = {
+
+val rT =  new RowTypeInfo(Array[TypeInformation[_]](
+  INT_TYPE_INFO,
+  LONG_TYPE_INFO,
+  INT_TYPE_INFO,
+  STRING_TYPE_INFO,
+  LONG_TYPE_INFO),
+  Array("a","b","c","d","e"))
--- End diff --

space after comma


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be ap

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061011#comment-16061011
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123713426
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/CommonSort.scala
 ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.nodes
+
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.calcite.rel.`type`._
+import scala.collection.JavaConverters._
+import org.apache.flink.api.common.operators.Order
+
+
+/**
+ * Trait represents a collection of sort methods to manipulate the 
parameters
+ */
+
+trait CommonSort {
+  
+  private[flink] def offsetToString(offset: RexNode): String = {
+val offsetToString = s"$offset"
+offsetToString
+  }
+  
+  
+  private[flink] def sortFieldsToString(
+  collationSort: RelCollation, 
+  rowRelDataType: RelDataType): String = {
+val fieldCollations = collationSort.getFieldCollations.asScala  
+.map(c => (c.getFieldIndex, directionToOrder(c.getDirection)))
+
+val sortFieldsToString = fieldCollations
+  .map(col => s"${
+rowRelDataType.getFieldNames.get(col._1)} ${col._2.getShortName}" 
).mkString(", ")
+
+sortFieldsToString
+  }
+  
+  private[flink] def directionToOrder(direction: Direction) = {
+direction match {
+  case Direction.ASCENDING | Direction.STRICTLY_ASCENDING => 
Order.ASCENDING
+  case Direction.DESCENDING | Direction.STRICTLY_DESCENDING => 
Order.DESCENDING
+  case _ => throw new IllegalArgumentException("Unsupported 
direction.")
+}
+  }
+  
+  private[flink] def fetchToString(fetch: RexNode, offset: RexNode): 
String = {
+val limitEnd = getFetchLimitEnd(fetch, offset)
+val fetchToString = if (limitEnd == Long.MaxValue) {
--- End diff --

No need to define `val fetchToString` if it is only returned.


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorte

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061025#comment-16061025
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123757132
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/stream/sql/SortTest.scala
 ---
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.api.scala.stream.sql
+
+import org.apache.flink.api.scala._
+import org.apache.flink.table.api.TableException
+import org.apache.flink.table.api.scala._
+import org.apache.flink.table.utils.TableTestUtil._
+import org.apache.flink.table.utils.{StreamTableTestUtil, TableTestBase}
+import org.junit.Test
+
+class SortTest extends TableTestBase {
+  private val streamUtil: StreamTableTestUtil = streamTestUtil()
+  streamUtil.addTable[(Int, String, Long)]("MyTable", 'a, 'b, 'c,
+  'proctime.proctime, 'rowtime.rowtime)
+  
+  @Test
+  def testSortProcessingTime() = {
+
+val sqlQuery = "SELECT a FROM MyTable ORDER BY proctime, c"
+  
+val expected =
+  unaryNode(
+"DataStreamCalc",
+unaryNode("DataStreamSort",
+  streamTableNode(0),
+  term("orderBy", "proctime ASC", "c ASC"),
+  term("offset", "null"), 
--- End diff --

I think we should hide `offset` and `fetch` if they are not defined.


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indica

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060993#comment-16060993
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123717194
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/rules/datastream/DataStreamSortRule.scala
 ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.plan.rules.datastream
+
+import org.apache.calcite.plan.volcano.RelSubset
+import org.apache.calcite.plan.{ Convention, RelOptRule, RelOptRuleCall, 
RelTraitSet }
+import org.apache.calcite.rel.RelNode
+import org.apache.calcite.rel.convert.ConverterRule
+import org.apache.calcite.rel.logical.{ LogicalFilter, LogicalCorrelate, 
LogicalTableFunctionScan }
+import org.apache.calcite.rex.RexNode
+import org.apache.flink.table.plan.nodes.datastream.DataStreamCorrelate
+import org.apache.calcite.rel.logical.LogicalSort
+import org.apache.flink.table.plan.nodes.datastream.DataStreamSort
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.table.plan.nodes.FlinkConventions
+import org.apache.flink.table.plan.schema.RowSchema
+import org.apache.flink.table.plan.nodes.logical.FlinkLogicalSort
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.flink.table.runtime.aggregate.SortUtil
+
+/**
+ * Rule to convert a LogicalSort into a DataStreamSort.
+ */
+class DataStreamSortRule
+extends ConverterRule(
+  classOf[FlinkLogicalSort],
+  FlinkConventions.LOGICAL,
+  FlinkConventions.DATASTREAM,
+  "DataStreamSortRule") {
+
+  override def matches(call: RelOptRuleCall): Boolean = {
+
+val result = super.matches(call)
+
+//need to identify time between others order fields. Time needs to be 
first sort element
+// we can safely convert the object if the match rule succeeded 
+if(result) {
+  val calcSort: FlinkLogicalSort = 
call.rel(0).asInstanceOf[FlinkLogicalSort]
--- End diff --

rename to `sort`. It is not a `calc`


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted c

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061000#comment-16061000
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123742381
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/SortUtil.scala
 ---
@@ -0,0 +1,247 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.types.Row
+import org.apache.calcite.rel.`type`._
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.table.functions.AggregateFunction
+import org.apache.calcite.sql.`type`.SqlTypeName
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.sql.`type`.SqlTypeName._
+import java.util.{ List => JList, ArrayList }
+import org.apache.flink.api.common.typeinfo.{SqlTimeTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import java.sql.Timestamp
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import java.util.Comparator
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import 
java.lang.{Byte=>JByte,Integer=>JInt,Long=>JLong,Double=>JDouble,Short=>JShort,String=>JString,Float=>JFloat}
+import java.math.{BigDecimal=>JBigDecimal}
+import org.apache.flink.api.common.functions.MapFunction
+import org.apache.flink.api.common.operators.Order
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.AtomicType
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+
+/**
+ * Class represents a collection of helper methods to build the sort logic.
+ * It encapsulates as well the implementation for ordering and generic 
interfaces
+ */
+object SortUtil {
+
+  /**
+   * Function creates 
[org.apache.flink.streaming.api.functions.ProcessFunction] for sorting 
+   * elements based on rowtime and potentially other fields
+   * @param collationSort The Sort collation list
+   * @param inputType input row type
+   * @param execCfg table environment execution configuration
+   * @return org.apache.flink.streaming.api.functions.ProcessFunction
+   */
+  private[flink] def createRowTimeSortFunction(
+collationSort: RelCollation,
+inputType: RelDataType,
+inputTypeInfo: TypeInformation[Row],
+execCfg: ExecutionConfig): ProcessFunction[CRow, CRow] = {
+
+val keySortFields = getSortFieldIndexList(collationSort)
+//drop time from comparison as we sort on time in the states and 
result emission
+val keyIndexesNoTime = keySortFields.slice(1, keySortFields.size)
+val booleanOrderings = getSortFieldDirectionBooleanList(collationSort)
+val booleanDirectionsNoTime = booleanOrderings.slice(1, 
booleanOrderings.size)
+
+val fieldComps = createFieldComparators(
+  inputType, 
+  keyIndexesNoTime, 
+  booleanDirectionsNoTime,
+  execCfg)
+val fieldCompsRefs = 
fieldComps.asInstanceOf[Array[TypeComparator[AnyRef]]]
+
+val rowComp = new RowComparator(
+   inputType.getFieldCount,
+   keyIndexesNoTime,
+   fieldCompsRefs,
+   new Array[TypeSerializer[AnyRef]](0), //used only for object 
comparisons
+   booleanDirectionsNoTime)
+  
+val collectionRowCo

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061022#comment-16061022
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123755279
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/RowTimeSortProcessFunction.scala
 ---
@@ -0,0 +1,146 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.typeutils.{RowTypeInfo, ListTypeInfo}
+import org.apache.flink.runtime.state.{ FunctionInitializationContext, 
FunctionSnapshotContext }
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.types.Row
+import org.apache.flink.util.{ Collector, Preconditions }
+import org.apache.flink.api.common.state.ValueState
+import org.apache.flink.api.common.state.ValueStateDescriptor
+import scala.util.control.Breaks._
+import org.apache.flink.api.java.tuple.{ Tuple2 => JTuple2 }
+import org.apache.flink.api.common.state.MapState
+import org.apache.flink.api.common.state.MapStateDescriptor
+import org.apache.flink.configuration.Configuration
+import java.util.Comparator
+import java.util.ArrayList
+import java.util.Collections
+import org.apache.flink.api.common.typeutils.TypeComparator
+import java.util.{List => JList, ArrayList => JArrayList}
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+
+/**
+ * Process Function used for the aggregate in bounded rowtime sort without 
offset/fetch
+ * [[org.apache.flink.streaming.api.datastream.DataStream]]
+ *
+ * @param inputRowType It is used to mark the type of the incoming data
+ * @param rowComparator the [[java.util.Comparator]] is used for this sort 
aggregation
+ */
+class RowTimeSortProcessFunction(
+  private val inputRowType: CRowTypeInfo,
+  private val rowComparator: CollectionRowComparator)
+extends ProcessFunction[CRow, CRow] {
+
+  Preconditions.checkNotNull(rowComparator)
+
+  // the state which keeps all the events that are not expired.
+  // Each timestamp will contain an associated list with the events 
+  // received at that timestamp
+  private var dataState: MapState[Long, JList[Row]] = _
+
+// the state which keeps the last triggering timestamp to filter late 
events
+  private var lastTriggeringTsState: ValueState[Long] = _
+  
+  private var outputC: CRow = _
+  
+  override def open(config: Configuration) {
+ 
+val keyTypeInformation: TypeInformation[Long] =
+  BasicTypeInfo.LONG_TYPE_INFO.asInstanceOf[TypeInformation[Long]]
+val valueTypeInformation: TypeInformation[JList[Row]] = new 
ListTypeInfo[Row](
+inputRowType.asInstanceOf[CRowTypeInfo].rowType)
+
+val mapStateDescriptor: MapStateDescriptor[Long, JList[Row]] =
+  new MapStateDescriptor[Long, JList[Row]](
+"dataState",
+keyTypeInformation,
+valueTypeInformation)
+
+dataState = getRuntimeContext.getMapState(mapStateDescriptor)
+
+val lastTriggeringTsDescriptor: ValueStateDescriptor[Long] =
+  new ValueStateDescriptor[Long]("lastTriggeringTsState", 
classOf[Long])
+lastTriggeringTsState = 
getRuntimeContext.getState(lastTriggeringTsDescriptor)
+
+if (outputC == null) {
--- End diff --

Can be done as `outputC = new CRow()`


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API &

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061019#comment-16061019
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123748629
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/SortUtil.scala
 ---
@@ -0,0 +1,247 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.types.Row
+import org.apache.calcite.rel.`type`._
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.table.functions.AggregateFunction
+import org.apache.calcite.sql.`type`.SqlTypeName
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.sql.`type`.SqlTypeName._
+import java.util.{ List => JList, ArrayList }
+import org.apache.flink.api.common.typeinfo.{SqlTimeTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import java.sql.Timestamp
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import java.util.Comparator
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import 
java.lang.{Byte=>JByte,Integer=>JInt,Long=>JLong,Double=>JDouble,Short=>JShort,String=>JString,Float=>JFloat}
+import java.math.{BigDecimal=>JBigDecimal}
+import org.apache.flink.api.common.functions.MapFunction
+import org.apache.flink.api.common.operators.Order
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.AtomicType
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+
+/**
+ * Class represents a collection of helper methods to build the sort logic.
+ * It encapsulates as well the implementation for ordering and generic 
interfaces
+ */
+object SortUtil {
+
+  /**
+   * Function creates 
[org.apache.flink.streaming.api.functions.ProcessFunction] for sorting 
+   * elements based on rowtime and potentially other fields
+   * @param collationSort The Sort collation list
+   * @param inputType input row type
+   * @param execCfg table environment execution configuration
+   * @return org.apache.flink.streaming.api.functions.ProcessFunction
+   */
+  private[flink] def createRowTimeSortFunction(
+collationSort: RelCollation,
+inputType: RelDataType,
+inputTypeInfo: TypeInformation[Row],
+execCfg: ExecutionConfig): ProcessFunction[CRow, CRow] = {
+
+val keySortFields = getSortFieldIndexList(collationSort)
+//drop time from comparison as we sort on time in the states and 
result emission
+val keyIndexesNoTime = keySortFields.slice(1, keySortFields.size)
+val booleanOrderings = getSortFieldDirectionBooleanList(collationSort)
+val booleanDirectionsNoTime = booleanOrderings.slice(1, 
booleanOrderings.size)
+
+val fieldComps = createFieldComparators(
+  inputType, 
+  keyIndexesNoTime, 
+  booleanDirectionsNoTime,
+  execCfg)
+val fieldCompsRefs = 
fieldComps.asInstanceOf[Array[TypeComparator[AnyRef]]]
+
+val rowComp = new RowComparator(
+   inputType.getFieldCount,
+   keyIndexesNoTime,
+   fieldCompsRefs,
+   new Array[TypeSerializer[AnyRef]](0), //used only for object 
comparisons
+   booleanDirectionsNoTime)
+  
+val collectionRowCo

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060991#comment-16060991
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123712536
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetSort.scala
 ---
@@ -136,36 +130,21 @@ class DataSetSort(
 }
   }
 
-  private def directionToOrder(direction: Direction) = {
-direction match {
-  case Direction.ASCENDING | Direction.STRICTLY_ASCENDING => 
Order.ASCENDING
-  case Direction.DESCENDING | Direction.STRICTLY_DESCENDING => 
Order.DESCENDING
-  case _ => throw new IllegalArgumentException("Unsupported 
direction.")
-}
-
-  }
-
   private val fieldCollations = collations.getFieldCollations.asScala
 .map(c => (c.getFieldIndex, directionToOrder(c.getDirection)))
 
-  private val sortFieldsToString = fieldCollations
-.map(col => s"${getRowType.getFieldNames.get(col._1)} 
${col._2.getShortName}" ).mkString(", ")
-
-  private val offsetToString = s"$offset"
-
-  private val fetchToString = if (limitEnd == Long.MaxValue) {
-"unlimited"
-  } else {
-s"$limitEnd"
+
+  override def toString: String = {
+s"Sort(by: ($$sortFieldsToString(collations, getRowType))," +
+  " offset: $offsetToString(offset)," +
+  " fetch: $fetchToString(fetch, offset))"
   }
-
-  override def toString: String =
-s"Sort(by: ($sortFieldsToString), offset: $offsetToString, fetch: 
$fetchToString)"
-
+
+
   override def explainTerms(pw: RelWriter) : RelWriter = {
 super.explainTerms(pw)
-  .item("orderBy", sortFieldsToString)
-  .item("offset", offsetToString)
-  .item("fetch", fetchToString)
+  .item("orderBy", sortFieldsToString(collations, getRowType))
--- End diff --

Move the implementation to `CommonSort`


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the pro

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061009#comment-16061009
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123709798
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/stream/sql/SortITCase.scala
 ---
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.api.scala.stream.sql
+
+import org.apache.flink.api.scala._
+import 
org.apache.flink.table.api.scala.stream.sql.SortITCase.StringRowSelectorSink
+import 
org.apache.flink.table.api.scala.stream.sql.TimeTestUtil.EventTimeSourceFunction
+import org.apache.flink.streaming.api.functions.source.SourceFunction
+import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
+import org.apache.flink.table.api.TableEnvironment
+import org.apache.flink.table.api.scala._
+import org.apache.flink.table.api.scala.stream.utils.{StreamITCase, 
StreamTestData, StreamingWithStateTestBase}
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.types.Row
+import org.junit.Assert._
+import org.junit._
+import org.apache.flink.streaming.api.TimeCharacteristic
+import 
org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext
+import org.apache.flink.streaming.api.watermark.Watermark
+import scala.collection.mutable
+import org.apache.flink.streaming.api.functions.sink.RichSinkFunction
+
+class SortITCase extends StreamingWithStateTestBase {
+
+  @Test
+  def testEventTimeOrderBy(): Unit = {
+val data = Seq(
+  Left((1500L, (1L, 15, "Hello"))),
+  Left((1600L, (1L, 16, "Hello"))),
+  Left((1000L, (1L, 1, "Hello"))),
+  Left((2000L, (2L, 2, "Hello"))),
+  Right(1000L),
+  Left((2000L, (2L, 2, "Hello"))),
+  Left((2000L, (2L, 3, "Hello"))),
+  Left((3000L, (3L, 3, "Hello"))),
+  Left((2000L, (3L, 1, "Hello"))),
+  Right(2000L),
+  Left((4000L, (4L, 4, "Hello"))),
+  Right(3000L),
+  Left((5000L, (5L, 5, "Hello"))),
+  Right(5000L),
+  Left((6000L, (6L, 65, "Hello"))),
+  Left((6000L, (6L, 6, "Hello"))),
+  Left((6000L, (6L, 67, "Hello"))),
+  Left((6000L, (6L, -1, "Hello"))),
+  Left((6000L, (6L, 6, "Hello"))),
+  Right(7000L),
+  Left((9000L, (6L, 9, "Hello"))),
+  Left((8500L, (6L, 18, "Hello"))),
+  Left((9000L, (6L, 7, "Hello"))),
+  Right(1L),
+  Left((1L, (7L, 7, "Hello World"))),
+  Left((11000L, (7L, 77, "Hello World"))),
+  Left((11000L, (7L, 17, "Hello World"))),
+  Right(12000L),
+  Left((14000L, (7L, 18, "Hello World"))),
+  Right(14000L),
+  Left((15000L, (8L, 8, "Hello World"))),
+  Right(17000L),
+  Left((2L, (20L, 20, "Hello World"))), 
+  Right(19000L))
+
+val env = StreamExecutionEnvironment.getExecutionEnvironment
+env.setStateBackend(getStateBackend)
+env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
+val tEnv = TableEnvironment.getTableEnvironment(env)
+StreamITCase.clear
+
+val t1 = env.addSource(new EventTimeSourceFunction[(Long, Int, 
String)](data))
+  .toTable(tEnv, 'a, 'b, 'c, 'rowtime.rowtime)
+  
+tEnv.registerTable("T1", t1)
+
+val  sqlQuery = "SELECT b FROM T1 " +
+  "ORDER BY rowtime, b ASC ";
--- End diff --

rm `;`


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061006#comment-16061006
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123745122
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/SortUtil.scala
 ---
@@ -0,0 +1,247 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.types.Row
+import org.apache.calcite.rel.`type`._
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.table.functions.AggregateFunction
+import org.apache.calcite.sql.`type`.SqlTypeName
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.sql.`type`.SqlTypeName._
+import java.util.{ List => JList, ArrayList }
+import org.apache.flink.api.common.typeinfo.{SqlTimeTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import java.sql.Timestamp
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import java.util.Comparator
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import 
java.lang.{Byte=>JByte,Integer=>JInt,Long=>JLong,Double=>JDouble,Short=>JShort,String=>JString,Float=>JFloat}
+import java.math.{BigDecimal=>JBigDecimal}
+import org.apache.flink.api.common.functions.MapFunction
+import org.apache.flink.api.common.operators.Order
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.AtomicType
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+
+/**
+ * Class represents a collection of helper methods to build the sort logic.
+ * It encapsulates as well the implementation for ordering and generic 
interfaces
+ */
+object SortUtil {
+
+  /**
+   * Function creates 
[org.apache.flink.streaming.api.functions.ProcessFunction] for sorting 
+   * elements based on rowtime and potentially other fields
+   * @param collationSort The Sort collation list
+   * @param inputType input row type
+   * @param execCfg table environment execution configuration
+   * @return org.apache.flink.streaming.api.functions.ProcessFunction
+   */
+  private[flink] def createRowTimeSortFunction(
+collationSort: RelCollation,
+inputType: RelDataType,
+inputTypeInfo: TypeInformation[Row],
+execCfg: ExecutionConfig): ProcessFunction[CRow, CRow] = {
+
+val keySortFields = getSortFieldIndexList(collationSort)
+//drop time from comparison as we sort on time in the states and 
result emission
+val keyIndexesNoTime = keySortFields.slice(1, keySortFields.size)
+val booleanOrderings = getSortFieldDirectionBooleanList(collationSort)
+val booleanDirectionsNoTime = booleanOrderings.slice(1, 
booleanOrderings.size)
+
+val fieldComps = createFieldComparators(
+  inputType, 
+  keyIndexesNoTime, 
+  booleanDirectionsNoTime,
+  execCfg)
+val fieldCompsRefs = 
fieldComps.asInstanceOf[Array[TypeComparator[AnyRef]]]
+
+val rowComp = new RowComparator(
+   inputType.getFieldCount,
--- End diff --

This gives the field count of the logical row type, but we need the number 
of physical fields (`new RowSchema(inputType).physicalArity`)


> Support Limit/Top(Sort) for Stream SQL
> -

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061008#comment-16061008
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123718523
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/rules/datastream/DataStreamSortRule.scala
 ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.plan.rules.datastream
+
+import org.apache.calcite.plan.volcano.RelSubset
+import org.apache.calcite.plan.{ Convention, RelOptRule, RelOptRuleCall, 
RelTraitSet }
+import org.apache.calcite.rel.RelNode
+import org.apache.calcite.rel.convert.ConverterRule
+import org.apache.calcite.rel.logical.{ LogicalFilter, LogicalCorrelate, 
LogicalTableFunctionScan }
+import org.apache.calcite.rex.RexNode
+import org.apache.flink.table.plan.nodes.datastream.DataStreamCorrelate
+import org.apache.calcite.rel.logical.LogicalSort
+import org.apache.flink.table.plan.nodes.datastream.DataStreamSort
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.table.plan.nodes.FlinkConventions
+import org.apache.flink.table.plan.schema.RowSchema
+import org.apache.flink.table.plan.nodes.logical.FlinkLogicalSort
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.flink.table.runtime.aggregate.SortUtil
+
+/**
+ * Rule to convert a LogicalSort into a DataStreamSort.
+ */
+class DataStreamSortRule
+extends ConverterRule(
+  classOf[FlinkLogicalSort],
+  FlinkConventions.LOGICAL,
+  FlinkConventions.DATASTREAM,
+  "DataStreamSortRule") {
+
+  override def matches(call: RelOptRuleCall): Boolean = {
+
+val result = super.matches(call)
+
+//need to identify time between others order fields. Time needs to be 
first sort element
+// we can safely convert the object if the match rule succeeded 
+if(result) {
+  val calcSort: FlinkLogicalSort = 
call.rel(0).asInstanceOf[FlinkLogicalSort]
+  checkTimeOrder(calcSort)
+}
+
+result
+  }
+
+  override def convert(rel: RelNode): RelNode = {
+val calcSort: FlinkLogicalSort = rel.asInstanceOf[FlinkLogicalSort]
+val traitSet: RelTraitSet = 
rel.getTraitSet.replace(FlinkConventions.DATASTREAM)
+val convInput: RelNode = RelOptRule.convert(calcSort.getInput(0), 
FlinkConventions.DATASTREAM)
+
+val inputRowType = 
convInput.asInstanceOf[RelSubset].getOriginal.getRowType
+
+new DataStreamSort(
+  rel.getCluster,
+  traitSet,
+  convInput,
+  new RowSchema(inputRowType),
+  new RowSchema(rel.getRowType),
+  calcSort.collation,
+  calcSort.offset,
+  calcSort.fetch,
+  description)
+
+  }
+  
+   
+  /**
+   * Function is used to check at verification time if the SQL syntax is 
supported
+   */
+  
+  def checkTimeOrder(calcSort: FlinkLogicalSort) = {
+
+val rowType = calcSort.getRowType
+val sortCollation = calcSort.collation 
+ //need to identify time between others order fields. Time needs to be 
first sort element
+val timeType = SortUtil.getTimeType(sortCollation, rowType)
+//time ordering needs to be ascending
+if (SortUtil.getTimeDirection(sortCollation) != Direction.ASCENDING) {
+  throw new TableException("SQL/Table supports only ascending time 
ordering")
+}
+//enable to extend for other types of aggregates that will not be 
implemented in a window
+timeType match {
+case _ if FlinkTypeFactory.isProctimeIndicatorType(timeTy

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061016#comment-16061016
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123743615
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/SortUtil.scala
 ---
@@ -0,0 +1,247 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.types.Row
+import org.apache.calcite.rel.`type`._
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.table.functions.AggregateFunction
+import org.apache.calcite.sql.`type`.SqlTypeName
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.sql.`type`.SqlTypeName._
+import java.util.{ List => JList, ArrayList }
+import org.apache.flink.api.common.typeinfo.{SqlTimeTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import java.sql.Timestamp
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import java.util.Comparator
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import 
java.lang.{Byte=>JByte,Integer=>JInt,Long=>JLong,Double=>JDouble,Short=>JShort,String=>JString,Float=>JFloat}
+import java.math.{BigDecimal=>JBigDecimal}
+import org.apache.flink.api.common.functions.MapFunction
+import org.apache.flink.api.common.operators.Order
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.AtomicType
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+
+/**
+ * Class represents a collection of helper methods to build the sort logic.
+ * It encapsulates as well the implementation for ordering and generic 
interfaces
+ */
+object SortUtil {
+
+  /**
+   * Function creates 
[org.apache.flink.streaming.api.functions.ProcessFunction] for sorting 
+   * elements based on rowtime and potentially other fields
+   * @param collationSort The Sort collation list
+   * @param inputType input row type
+   * @param execCfg table environment execution configuration
+   * @return org.apache.flink.streaming.api.functions.ProcessFunction
+   */
+  private[flink] def createRowTimeSortFunction(
+collationSort: RelCollation,
+inputType: RelDataType,
+inputTypeInfo: TypeInformation[Row],
+execCfg: ExecutionConfig): ProcessFunction[CRow, CRow] = {
+
+val keySortFields = getSortFieldIndexList(collationSort)
+//drop time from comparison as we sort on time in the states and 
result emission
+val keyIndexesNoTime = keySortFields.slice(1, keySortFields.size)
+val booleanOrderings = getSortFieldDirectionBooleanList(collationSort)
+val booleanDirectionsNoTime = booleanOrderings.slice(1, 
booleanOrderings.size)
+
+val fieldComps = createFieldComparators(
+  inputType, 
+  keyIndexesNoTime, 
+  booleanDirectionsNoTime,
+  execCfg)
+val fieldCompsRefs = 
fieldComps.asInstanceOf[Array[TypeComparator[AnyRef]]]
+
+val rowComp = new RowComparator(
+   inputType.getFieldCount,
+   keyIndexesNoTime,
+   fieldCompsRefs,
+   new Array[TypeSerializer[AnyRef]](0), //used only for object 
comparisons
+   booleanDirectionsNoTime)
+  
+val collectionRowCo

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061010#comment-16061010
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123754391
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/RowTimeSortProcessFunction.scala
 ---
@@ -0,0 +1,146 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.typeutils.{RowTypeInfo, ListTypeInfo}
+import org.apache.flink.runtime.state.{ FunctionInitializationContext, 
FunctionSnapshotContext }
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.types.Row
+import org.apache.flink.util.{ Collector, Preconditions }
+import org.apache.flink.api.common.state.ValueState
+import org.apache.flink.api.common.state.ValueStateDescriptor
+import scala.util.control.Breaks._
+import org.apache.flink.api.java.tuple.{ Tuple2 => JTuple2 }
+import org.apache.flink.api.common.state.MapState
+import org.apache.flink.api.common.state.MapStateDescriptor
+import org.apache.flink.configuration.Configuration
+import java.util.Comparator
+import java.util.ArrayList
+import java.util.Collections
+import org.apache.flink.api.common.typeutils.TypeComparator
+import java.util.{List => JList, ArrayList => JArrayList}
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+
+/**
+ * Process Function used for the aggregate in bounded rowtime sort without 
offset/fetch
+ * [[org.apache.flink.streaming.api.datastream.DataStream]]
+ *
+ * @param inputRowType It is used to mark the type of the incoming data
+ * @param rowComparator the [[java.util.Comparator]] is used for this sort 
aggregation
+ */
+class RowTimeSortProcessFunction(
+  private val inputRowType: CRowTypeInfo,
+  private val rowComparator: CollectionRowComparator)
+extends ProcessFunction[CRow, CRow] {
+
+  Preconditions.checkNotNull(rowComparator)
+
+  // the state which keeps all the events that are not expired.
+  // Each timestamp will contain an associated list with the events 
+  // received at that timestamp
+  private var dataState: MapState[Long, JList[Row]] = _
+
+// the state which keeps the last triggering timestamp to filter late 
events
--- End diff --

indention


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are suppo

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060990#comment-16060990
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123716234
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/datastream/DataStreamSort.scala
 ---
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.nodes.datastream
+
+import org.apache.calcite.plan.{ RelOptCluster, RelTraitSet }
+import org.apache.calcite.rel.`type`.RelDataType
+import org.apache.calcite.rel.{ RelNode, RelWriter, SingleRel }
+import org.apache.flink.api.java.tuple.Tuple
+import org.apache.flink.streaming.api.datastream.{ AllWindowedStream, 
DataStream, KeyedStream, WindowedStream }
+import org.apache.flink.streaming.api.windowing.assigners._
+import org.apache.flink.streaming.api.windowing.time.Time
+import org.apache.flink.streaming.api.windowing.windows.{ Window => 
DataStreamWindow }
+import org.apache.flink.table.calcite.FlinkRelBuilder.NamedWindowProperty
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.table.expressions._
+import org.apache.flink.table.plan.logical._
+import org.apache.flink.table.plan.nodes.CommonAggregate
+import org.apache.flink.table.runtime.aggregate.AggregateUtil._
+import org.apache.flink.table.runtime.aggregate._
+import org.apache.flink.table.typeutils.TypeCheckUtils.isTimeInterval
+import org.apache.flink.table.typeutils.{ RowIntervalTypeInfo, 
TimeIntervalTypeInfo }
+import org.apache.flink.types.Row
+import org.apache.calcite.sql.SqlAggFunction
+import org.apache.flink.table.plan.nodes.datastream.DataStreamRel
+import org.apache.calcite.sql.fun.SqlSingleValueAggFunction
+import org.apache.flink.api.common.functions.RichMapFunction
+import org.apache.flink.api.common.functions.RichFlatMapFunction
+import org.apache.flink.configuration.Configuration
+import org.apache.flink.util.Collector
+import org.apache.flink.api.common.state.ValueState
+import org.apache.flink.api.common.state.ValueStateDescriptor
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.calcite.rel.core.Sort
+import org.apache.flink.api.java.functions.NullByteKeySelector
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.flink.table.runtime.aggregate.SortUtil._
+import org.apache.calcite.rel.RelCollation
+import org.apache.calcite.rex.RexNode
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+import org.apache.flink.table.api.{StreamQueryConfig, 
StreamTableEnvironment, TableException}
+import org.apache.flink.table.plan.schema.RowSchema
+import org.apache.flink.table.plan.nodes.CommonSort
+
+/**
+  * Flink RelNode which matches along with Sort Rule.
+  *
+ */
+class DataStreamSort(
+cluster: RelOptCluster,
+traitSet: RelTraitSet,
+inputNode: RelNode,
+inputSchema: RowSchema,
+schema: RowSchema,
+sortCollation: RelCollation,
+sortOffset: RexNode,
+sortFetch: RexNode,
+description: String)
+  extends SingleRel(cluster, traitSet, inputNode)
--- End diff --

Could also extend `Sort` instead of `SingleRel`


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same on

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060988#comment-16060988
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123713677
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/CommonSort.scala
 ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.nodes
+
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.calcite.rel.`type`._
+import scala.collection.JavaConverters._
+import org.apache.flink.api.common.operators.Order
+
+
+/**
+ * Trait represents a collection of sort methods to manipulate the 
parameters
+ */
+
+trait CommonSort {
+  
+  private[flink] def offsetToString(offset: RexNode): String = {
+val offsetToString = s"$offset"
+offsetToString
+  }
+  
+  
+  private[flink] def sortFieldsToString(
+  collationSort: RelCollation, 
+  rowRelDataType: RelDataType): String = {
+val fieldCollations = collationSort.getFieldCollations.asScala  
+.map(c => (c.getFieldIndex, directionToOrder(c.getDirection)))
+
+val sortFieldsToString = fieldCollations
+  .map(col => s"${
+rowRelDataType.getFieldNames.get(col._1)} ${col._2.getShortName}" 
).mkString(", ")
+
+sortFieldsToString
+  }
+  
+  private[flink] def directionToOrder(direction: Direction) = {
+direction match {
+  case Direction.ASCENDING | Direction.STRICTLY_ASCENDING => 
Order.ASCENDING
+  case Direction.DESCENDING | Direction.STRICTLY_DESCENDING => 
Order.DESCENDING
+  case _ => throw new IllegalArgumentException("Unsupported 
direction.")
+}
+  }
+  
+  private[flink] def fetchToString(fetch: RexNode, offset: RexNode): 
String = {
+val limitEnd = getFetchLimitEnd(fetch, offset)
+val fetchToString = if (limitEnd == Long.MaxValue) {
--- End diff --

same for the return values of `getFetchLimitEnd` and `getFetchLimitStart`


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> req

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061005#comment-16061005
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123713179
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/CommonSort.scala
 ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.nodes
+
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.calcite.rel.`type`._
+import scala.collection.JavaConverters._
+import org.apache.flink.api.common.operators.Order
+
+
+/**
+ * Trait represents a collection of sort methods to manipulate the 
parameters
+ */
+
+trait CommonSort {
+  
+  private[flink] def offsetToString(offset: RexNode): String = {
+val offsetToString = s"$offset"
+offsetToString
+  }
+  
+  
+  private[flink] def sortFieldsToString(
+  collationSort: RelCollation, 
+  rowRelDataType: RelDataType): String = {
+val fieldCollations = collationSort.getFieldCollations.asScala  
+.map(c => (c.getFieldIndex, directionToOrder(c.getDirection)))
+
+val sortFieldsToString = fieldCollations
--- End diff --

`sortFieldsToString` is only used as return value. In that case we can 
simply return the result of the expression without defining a `val`:
```
fieldCollations
  .map(col => s"${rowRelDataType.getFieldNames.get(col._1)} 
${col._2.getShortName}" )
  .mkString(", ")
```


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060996#comment-16060996
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123713771
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/CommonSort.scala
 ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.nodes
+
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.calcite.rel.`type`._
+import scala.collection.JavaConverters._
+import org.apache.flink.api.common.operators.Order
+
+
+/**
+ * Trait represents a collection of sort methods to manipulate the 
parameters
+ */
+
+trait CommonSort {
+  
+  private[flink] def offsetToString(offset: RexNode): String = {
+val offsetToString = s"$offset"
+offsetToString
+  }
+  
+  
+  private[flink] def sortFieldsToString(
--- End diff --

This is probably true for most other methods in this class.


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-06-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061004#comment-16061004
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r123717004
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/rules/datastream/DataStreamSortRule.scala
 ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.plan.rules.datastream
+
+import org.apache.calcite.plan.volcano.RelSubset
+import org.apache.calcite.plan.{ Convention, RelOptRule, RelOptRuleCall, 
RelTraitSet }
+import org.apache.calcite.rel.RelNode
+import org.apache.calcite.rel.convert.ConverterRule
+import org.apache.calcite.rel.logical.{ LogicalFilter, LogicalCorrelate, 
LogicalTableFunctionScan }
+import org.apache.calcite.rex.RexNode
+import org.apache.flink.table.plan.nodes.datastream.DataStreamCorrelate
+import org.apache.calcite.rel.logical.LogicalSort
+import org.apache.flink.table.plan.nodes.datastream.DataStreamSort
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.table.plan.nodes.FlinkConventions
+import org.apache.flink.table.plan.schema.RowSchema
+import org.apache.flink.table.plan.nodes.logical.FlinkLogicalSort
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import org.apache.flink.table.runtime.aggregate.SortUtil
+
+/**
+ * Rule to convert a LogicalSort into a DataStreamSort.
+ */
+class DataStreamSortRule
+extends ConverterRule(
+  classOf[FlinkLogicalSort],
+  FlinkConventions.LOGICAL,
+  FlinkConventions.DATASTREAM,
+  "DataStreamSortRule") {
+
+  override def matches(call: RelOptRuleCall): Boolean = {
+
+val result = super.matches(call)
+
+//need to identify time between others order fields. Time needs to be 
first sort element
+// we can safely convert the object if the match rule succeeded 
+if(result) {
--- End diff --

please add a space after `if`, `for`, `while`, etc.


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-05-29 Thread radu (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028458#comment-16028458
 ] 

radu commented on FLINK-6075:
-

TOP syntax is not supported by Calcite. Basically top is obtained by using limit

> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window boundaries
> * Apply – window function that sorts data and selects the output to trigger 
> (based on LIMIT/TOP parameters). All data will be sorted at once and result 
> outputted when the window is triggered
> An alternative implementation can be to use a fold window function to sort 
> the elements as they arrive, one at a time followed by a flatMap to filter 
> the number of outputs. 
> !sort.png!
> **General logic of Join**
> ```
> inputDataStream.window(new [Slide/Tumble][Time/Cou

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-05-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028132#comment-16028132
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on the issue:

https://github.com/apache/flink/pull/4003
  
@fhueske - Fine for me! I just wanted to finish this as well :)




> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window boundaries
> * Apply – window function that sorts data and selects the output to trigger 
> (based on LIMIT/TOP parameters). All data will be sorted at once and result 
> outputted when the window is triggered
> An alternative implementation can be to use a fold window function to sort 
> the elements as they arrive, one at a time followed by a flatMap to filter 
> the number of

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-05-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027819#comment-16027819
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/4003
  
Thanks for the update @rtudoran. 
I'm traveling for another week before I return in the office. I hope I'll 
find time to have a look at #3889 in the next days. I think we should try to 
get that in first and later rebase this PR once the first part has been merged.

Thank you, Fabian


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window boundaries
> * Apply – window function that sorts data and selects the output to trigger 
> (based on LIMIT/TOP parameters). All data will be sorted at onc

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-05-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027801#comment-16027801
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on the issue:

https://github.com/apache/flink/pull/4003
  
@shijinkui @hongyuhong @stefanobortoli 
I add you to the issue to track the progress


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window boundaries
> * Apply – window function that sorts data and selects the output to trigger 
> (based on LIMIT/TOP parameters). All data will be sorted at once and result 
> outputted when the window is triggered
> An alternative implementation can be to use a fold window function to sort 
> the elements as they arrive, one at a time followed by a flatMap to filte

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-05-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027800#comment-16027800
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on the issue:

https://github.com/apache/flink/pull/4003
  
@fhueske 
I have implemented meanwhile also the support for offset and fetch for both 
rowtime and proctime. This PR includes all the modifications and reviews you 
made for #3889.
You can either go ahead first with that one and chekc if the basis is ok 
and merge that - and then we move to this one, or we can consider this 
directly. This PR will close the JIRA 


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window boundaries
> * Apply – window function that sorts data and selects the output

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-05-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027799#comment-16027799
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

GitHub user rtudoran opened a pull request:

https://github.com/apache/flink/pull/4003

[FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [x ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [x ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [x ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huawei-flink/flink FLINK-6075-OFRe3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4003.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4003


commit 52acb914b8fbdb2f36d6b2f732cc161b97fa6f2c
Author: didi 
Date:   2017-05-28T12:13:41Z

Add full support for Order By with Fetch and Offset




> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-05-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024677#comment-16024677
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on the issue:

https://github.com/apache/flink/pull/3889
  
@fhueske @stefanobortoli @shijinkui @hongyuhong 
I have updated the PR. In principle all remarks were addressed. This should 
be ready to merge (i hope)


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab,aaa 
>|
> | |11:03:00  |(aac,21)  |   | ||  
> 
> |11-12|12:00:00  |  | aab,aaa |aab,aaa  | aab,aaa,aac|
> | |12:10:00  |(abb,12)  |   | ||  
> 
> | |12:15:00  |(abb,12)  |   | ||  
> 
> |12-13  |13:00:00  |  |   abb,abb | abb,abb | 
> abb,abb,aac|
> |...|
> **Implementation option**
> Considering that the SQL operators will be associated with window boundaries, 
> the functionality will be implemented within the logic of the window as 
> follows.
> * Window assigner – selected based on the type of window used in SQL 
> (TUMBLING, SLIDING…)
> * Evictor/ Trigger – time or count evictor based on the definition of the 
> window boundaries
> * Apply – window function that sorts data and selects the output to trigger 
> (based on LIMIT/TOP parameters). All data will be sorted at once and result 
> outputted when the window is triggered
> An alternative implementation can be to use a fold window function to sort 
> the e

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-05-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024668#comment-16024668
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r118475890
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/stream/sql/SortTest.scala
 ---
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.api.scala.stream.sql
+
+import org.apache.flink.api.scala._
+import org.apache.flink.table.api.TableException
+import org.apache.flink.table.api.scala._
+import org.apache.flink.table.utils.TableTestUtil._
+import org.apache.flink.table.utils.{StreamTableTestUtil, TableTestBase}
+import org.junit.Test
+
+class SortTest extends TableTestBase {
+  private val streamUtil: StreamTableTestUtil = streamTestUtil()
+  streamUtil.addTable[(Int, String, Long)]("MyTable", 'a, 'b, 'c,
+  'proctime.proctime, 'rowtime.rowtime)
+  
+  @Test
+  def testSortProcessingTime() = {
+
+val sqlQuery = "SELECT a FROM MyTable ORDER BY proctime, c"
+  
+val expected =
+  unaryNode(
+"DataStreamCalc",
+unaryNode("DataStreamSort",
+  streamTableNode(0),
+  term("orderBy", "proctime ASC, c ASC], offset=[null], 
fetch=[unlimited")),
--- End diff --

yeah - i was trying to make it match with respect to the number of 
parenthesis with the incoming value. The problem was that i was messing up with 
the string and no matter how many i pair i was adding it was replacing and 
adding some additional ones at the begin and end. The actual problem was that 
order by / offset and fetch needed to be standalone terms


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-05-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024597#comment-16024597
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r118468562
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/stream/sql/SortITCase.scala
 ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.api.scala.stream.sql
+
+import org.apache.flink.api.scala._
+import 
org.apache.flink.table.api.scala.stream.sql.SortITCase.{EventTimeSourceFunction,StringRowSelectorSink}
+import org.apache.flink.streaming.api.functions.source.SourceFunction
+import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
+import org.apache.flink.table.api.TableEnvironment
+import org.apache.flink.table.api.scala._
+import org.apache.flink.table.api.scala.stream.utils.{StreamITCase, 
StreamTestData, StreamingWithStateTestBase}
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.types.Row
+import org.junit.Assert._
+import org.junit._
+import org.apache.flink.streaming.api.TimeCharacteristic
+import 
org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext
+import org.apache.flink.streaming.api.watermark.Watermark
+import scala.collection.mutable
+import org.apache.flink.streaming.api.functions.sink.RichSinkFunction
+
+class SortITCase extends StreamingWithStateTestBase {
+
+  @Test
+  def testEventTimeOrderBy(): Unit = {
+val data = Seq(
+  Left((1500L, (1L, 15, "Hello"))),
+  Left((1600L, (1L, 16, "Hello"))),
+  Left((1000L, (1L, 1, "Hello"))),
+  Left((2000L, (2L, 2, "Hello"))),
+  Right(1000L),
+  Left((2000L, (2L, 2, "Hello"))),
+  Left((2000L, (2L, 3, "Hello"))),
+  Left((3000L, (3L, 3, "Hello"))),
+  Right(2000L),
+  Left((4000L, (4L, 4, "Hello"))),
+  Right(3000L),
+  Left((5000L, (5L, 5, "Hello"))),
+  Right(5000L),
+  Left((6000L, (6L, 65, "Hello"))),
+  Left((6000L, (6L, 6, "Hello"))),
+  Right(7000L),
+  Left((9000L, (6L, 9, "Hello"))),
+  Left((8500L, (6L, 18, "Hello"))),
+  Left((9000L, (6L, 7, "Hello"))),
+  Right(1L),
+  Left((1L, (7L, 7, "Hello World"))),
+  Left((11000L, (7L, 77, "Hello World"))),
+  Left((11000L, (7L, 17, "Hello World"))),
+  Right(12000L),
+  Left((14000L, (7L, 18, "Hello World"))),
+  Right(14000L),
+  Left((15000L, (8L, 8, "Hello World"))),
+  Right(17000L),
+  Left((2L, (20L, 20, "Hello World"))), 
+  Right(19000L))
+
+val env = StreamExecutionEnvironment.getExecutionEnvironment
+env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
+val tEnv = TableEnvironment.getTableEnvironment(env)
+StreamITCase.clear
+
+val t1 = env.addSource(new EventTimeSourceFunction[(Long, Int, 
String)](data))
+  .toTable(tEnv, 'a, 'b, 'c, 'rowtime.rowtime)
+  
+tEnv.registerTable("T1", t1)
+
+val  sqlQuery = "SELECT b FROM T1 " +
+  "ORDER BY rowtime, b ASC ";
+  
+  
+val result = tEnv.sql(sqlQuery).toDataStream[Row]
+result.addSink(new StringRowSelectorSink(0)).setParallelism(1)
+env.execute()
+
+val expected = mutable.MutableList(
+  "1", "15", "16",
+  "2", "2", "3",
+  "3",
+  "4",
+  "5",
+  "6", "65",
+  "18", "7", "9",
+  "7", "17", "77", 
+  "18",
+  "8",
+  "20")
+assertEquals(expected,

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-05-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024575#comment-16024575
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r118466310
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/stream/sql/SortITCase.scala
 ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.api.scala.stream.sql
+
+import org.apache.flink.api.scala._
+import 
org.apache.flink.table.api.scala.stream.sql.SortITCase.{EventTimeSourceFunction,StringRowSelectorSink}
+import org.apache.flink.streaming.api.functions.source.SourceFunction
+import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
+import org.apache.flink.table.api.TableEnvironment
+import org.apache.flink.table.api.scala._
+import org.apache.flink.table.api.scala.stream.utils.{StreamITCase, 
StreamTestData, StreamingWithStateTestBase}
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.types.Row
+import org.junit.Assert._
+import org.junit._
+import org.apache.flink.streaming.api.TimeCharacteristic
+import 
org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext
+import org.apache.flink.streaming.api.watermark.Watermark
+import scala.collection.mutable
+import org.apache.flink.streaming.api.functions.sink.RichSinkFunction
+
+class SortITCase extends StreamingWithStateTestBase {
+
+  @Test
+  def testEventTimeOrderBy(): Unit = {
+val data = Seq(
--- End diff --

I modified the data set to include up to 5 elements on the same timestamp - 
including having duplicate values, positive and negative, as well as having 
timestamps with elements that are unordered


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL que

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-05-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024565#comment-16024565
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r118465274
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/aggregate/ProcTimeSortProcessFunctionTest.scala
 ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import java.util.Comparator
+import java.util.concurrent.ConcurrentLinkedQueue
+import java.lang.{Integer => JInt, Long => JLong}
+
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.functions.KeySelector
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import org.apache.flink.streaming.api.operators.KeyedProcessOperator
+import org.apache.flink.streaming.api.watermark.Watermark
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord
+import 
org.apache.flink.streaming.util.{KeyedOneInputStreamOperatorTestHarness, 
TestHarnessUtil}
+import org.apache.flink.types.Row
+import org.junit.Test
+import org.apache.flink.table.runtime.aggregate.ProcTimeSortProcessFunction
+import 
org.apache.flink.table.runtime.aggregate.ProcTimeSortProcessFunctionTest._
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+
+class ProcTimeSortProcessFunctionTest{
+
+  
+  @Test
+  def testSortProcTimeHarnessPartitioned(): Unit = {
--- End diff --

ok


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR).

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-05-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024497#comment-16024497
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r118459358
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/RowTimeSortProcessFunction.scala
 ---
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.api.common.state.{ ListState, ListStateDescriptor }
+import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, 
TypeInformation}
+import org.apache.flink.api.java.typeutils.{RowTypeInfo, ListTypeInfo}
+import org.apache.flink.runtime.state.{ FunctionInitializationContext, 
FunctionSnapshotContext }
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.types.Row
+import org.apache.flink.util.{ Collector, Preconditions }
+import org.apache.flink.api.common.state.ValueState
+import org.apache.flink.api.common.state.ValueStateDescriptor
+import scala.util.control.Breaks._
+import org.apache.flink.api.java.tuple.{ Tuple2 => JTuple2 }
+import org.apache.flink.api.common.state.MapState
+import org.apache.flink.api.common.state.MapStateDescriptor
+import org.apache.flink.configuration.Configuration
+import java.util.Comparator
+import java.util.ArrayList
+import java.util.Collections
+import org.apache.flink.api.common.typeutils.TypeComparator
+import java.util.{List => JList, ArrayList => JArrayList}
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+
+/**
+ * Process Function used for the aggregate in bounded rowtime sort without 
offset/fetch
+ * [[org.apache.flink.streaming.api.datastream.DataStream]]
+ *
+ * @param fieldCount Is used to indicate fields in the current element to 
forward
+ * @param inputType It is used to mark the type of the incoming data
+ * @param rowComparator the [[java.util.Comparator]] is used for this sort 
aggregation
+ */
+class RowTimeSortProcessFunction(
+  private val fieldCount: Int,
+  private val inputRowType: CRowTypeInfo,
+  private val rowComparator: CollectionRowComparator)
+extends ProcessFunction[CRow, CRow] {
+
+  Preconditions.checkNotNull(rowComparator)
+
+  private val sortArray: ArrayList[Row] = new ArrayList[Row]
+  
+  // the state which keeps all the events that are not expired.
+  // Each timestamp will contain an associated list with the events 
+  // received at that timestamp
+  private var dataState: MapState[Long, JList[Row]] = _
+
+// the state which keeps the last triggering timestamp to filter late 
events
+  private var lastTriggeringTsState: ValueState[Long] = _
+  
+  private var outputC: CRow = _
+  
+  
+  override def open(config: Configuration) {
+ 
+val keyTypeInformation: TypeInformation[Long] =
+  BasicTypeInfo.LONG_TYPE_INFO.asInstanceOf[TypeInformation[Long]]
+val valueTypeInformation: TypeInformation[JList[Row]] = new 
ListTypeInfo[Row](
+inputRowType.asInstanceOf[CRowTypeInfo].rowType)
+
+val mapStateDescriptor: MapStateDescriptor[Long, JList[Row]] =
+  new MapStateDescriptor[Long, JList[Row]](
+"dataState",
+keyTypeInformation,
+valueTypeInformation)
+
+dataState = getRuntimeContext.getMapState(mapStateDescriptor)
+
+val lastTriggeringTsDescriptor: ValueStateDescriptor[Long] =
+  new ValueStateDescriptor[Long]("lastTriggeringTsState", 
classOf[Long])
+lastTriggeringTsState = 
getRuntimeContext.getState(lastTriggeringTsDescriptor)
+  }
+
+  
+  override def processElement(
+inputC: CRow,
+ctx: ProcessFunction[CRo

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-05-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024239#comment-16024239
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r118423723
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/SortUtil.scala
 ---
@@ -0,0 +1,345 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.types.Row
+import org.apache.calcite.rel.`type`._
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.table.functions.AggregateFunction
+import org.apache.calcite.sql.`type`.SqlTypeName
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.sql.`type`.SqlTypeName
+import org.apache.calcite.sql.`type`.SqlTypeName._
+import java.util.{ List => JList, ArrayList }
+import org.apache.flink.api.common.typeinfo.{ SqlTimeTypeInfo, 
TypeInformation }
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import java.sql.Timestamp
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import java.util.Comparator
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import 
java.lang.{Byte=>JByte,Integer=>JInt,Long=>JLong,Double=>JDouble,Short=>JShort,String=>JString,Float=>JFloat}
+import java.math.{BigDecimal=>JBigDecimal}
+import org.apache.flink.api.common.functions.MapFunction
+import org.apache.flink.api.common.operators.Order
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.AtomicType
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+
+import scala.collection.JavaConverters._
+
+/**
+ * Class represents a collection of helper methods to build the sort logic.
+ * It encapsulates as well the implementation for ordering and generic 
interfaces
+ */
+
+object SortUtil {
+
+  
+  /**
+   * Function creates 
[org.apache.flink.streaming.api.functions.ProcessFunction] for sorting 
+   * elements based on rowtime and potentially other fields
+   * @param collationSort The Sort collation list
+   * @param inputType input row type
+   * @param execCfg table environment execution configuration
+   * @return org.apache.flink.streaming.api.functions.ProcessFunction
+   */
+  private[flink] def createRowTimeSortFunction(
+collationSort: RelCollation,
+inputType: RelDataType,
+inputTypeInfo: TypeInformation[Row],
+execCfg: ExecutionConfig): ProcessFunction[CRow, CRow] = {
+
+val keySortFields = getSortFieldIndexList(collationSort)
+val keySortDirections = getSortFieldDirectionList(collationSort)
+
+   //drop time from comparison as we sort on time in the states and 
result emission
+val keyIndexesNoTime = keySortFields.slice(1, keySortFields.size)
+val keyDirectionsNoTime = keySortDirections.slice(1, 
keySortDirections.size)
+val booleanOrderings = getSortFieldDirectionBooleanList(collationSort)
+val booleanDirectionsNoTime = booleanOrderings.slice(1, 
booleanOrderings.size)
+
+val fieldComps = createFieldComparators(inputType, 
+keyIndexesNoTime, keyDirectionsNoTime, execCfg)
+val fieldCompsRefs = 
fieldComps.asInstanceOf[Array[TypeComparator[AnyRef]]]
+
+val rowComp = createRowComparator(i

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-05-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022831#comment-16022831
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r118241119
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/SortUtil.scala
 ---
@@ -0,0 +1,345 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.table.calcite.FlinkTypeFactory
+import org.apache.flink.types.Row
+import org.apache.calcite.rel.`type`._
+import org.apache.calcite.rel.RelCollation
+import org.apache.flink.streaming.api.functions.ProcessFunction
+import org.apache.flink.table.functions.AggregateFunction
+import org.apache.calcite.sql.`type`.SqlTypeName
+import org.apache.flink.table.api.TableException
+import org.apache.calcite.sql.`type`.SqlTypeName
+import org.apache.calcite.sql.`type`.SqlTypeName._
+import java.util.{ List => JList, ArrayList }
+import org.apache.flink.api.common.typeinfo.{ SqlTimeTypeInfo, 
TypeInformation }
+import org.apache.flink.api.java.typeutils.RowTypeInfo
+import java.sql.Timestamp
+import org.apache.calcite.rel.RelFieldCollation
+import org.apache.calcite.rel.RelFieldCollation.Direction
+import java.util.Comparator
+import org.apache.flink.api.common.typeutils.TypeComparator
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo._
+import 
java.lang.{Byte=>JByte,Integer=>JInt,Long=>JLong,Double=>JDouble,Short=>JShort,String=>JString,Float=>JFloat}
+import java.math.{BigDecimal=>JBigDecimal}
+import org.apache.flink.api.common.functions.MapFunction
+import org.apache.flink.api.common.operators.Order
+import org.apache.calcite.rex.{RexLiteral, RexNode}
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.AtomicType
+import org.apache.flink.api.java.typeutils.runtime.RowComparator
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.table.runtime.types.{CRow, CRowTypeInfo}
+
+import scala.collection.JavaConverters._
+
+/**
+ * Class represents a collection of helper methods to build the sort logic.
+ * It encapsulates as well the implementation for ordering and generic 
interfaces
+ */
+
+object SortUtil {
+
+  
+  /**
+   * Function creates 
[org.apache.flink.streaming.api.functions.ProcessFunction] for sorting 
+   * elements based on rowtime and potentially other fields
+   * @param collationSort The Sort collation list
+   * @param inputType input row type
+   * @param execCfg table environment execution configuration
+   * @return org.apache.flink.streaming.api.functions.ProcessFunction
+   */
+  private[flink] def createRowTimeSortFunction(
+collationSort: RelCollation,
+inputType: RelDataType,
+inputTypeInfo: TypeInformation[Row],
+execCfg: ExecutionConfig): ProcessFunction[CRow, CRow] = {
+
+val keySortFields = getSortFieldIndexList(collationSort)
+val keySortDirections = getSortFieldDirectionList(collationSort)
--- End diff --

I am only using it to extract the names


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query

[jira] [Commented] (FLINK-6075) Support Limit/Top(Sort) for Stream SQL

2017-05-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022818#comment-16022818
 ] 

ASF GitHub Bot commented on FLINK-6075:
---

Github user rtudoran commented on a diff in the pull request:

https://github.com/apache/flink/pull/3889#discussion_r118239983
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/SortUtil.scala
 ---
@@ -0,0 +1,345 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.runtime.aggregate
+
+import org.apache.flink.table.calcite.FlinkTypeFactory
--- End diff --

FlinkTypeFactory is used... am i missing others unused imports to remove?


> Support Limit/Top(Sort) for Stream SQL
> --
>
> Key: FLINK-6075
> URL: https://issues.apache.org/jira/browse/FLINK-6075
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>  Labels: features
> Attachments: sort.png
>
>
> These will be split in 3 separated JIRA issues. However, the design is the 
> same only the processing function differs in terms of the output. Hence, the 
> design is the same for all of them.
> Time target: Proc Time
> **SQL targeted query examples:**
> *Sort example*
> Q1)` SELECT a FROM stream1 GROUP BY HOP(proctime, INTERVAL '1' HOUR, INTERVAL 
> '3' HOUR) ORDER BY b` 
> Comment: window is defined using GROUP BY
> Comment: ASC or DESC keywords can be placed to mark the ordering type
> *Limit example*
> Q2) `SELECT a FROM stream1 WHERE rowtime BETWEEN current_timestamp - INTERVAL 
> '1' HOUR AND current_timestamp ORDER BY b LIMIT 10`
> Comment: window is defined using time ranges in the WHERE clause
> Comment: window is row triggered
> *Top example*
> Q3) `SELECT sum(a) OVER (ORDER BY proctime RANGE INTERVAL '1' HOUR PRECEDING 
> LIMIT 10) FROM stream1`  
> Comment: limit over the contents of the sliding window
> General Comments:
> -All these SQL clauses are supported only over windows (bounded collections 
> of data). 
> -Each of the 3 operators will be supported with each of the types of 
> expressing the windows. 
> **Description**
> The 3 operations (limit, top and sort) are similar in behavior as they all 
> require a sorted collection of the data on which the logic will be applied 
> (i.e., select a subset of the items or the entire sorted set). These 
> functions would make sense in the streaming context only in the context of a 
> window. Without defining a window the functions could never emit as the sort 
> operation would never trigger. If an SQL query will be provided without 
> limits an error will be thrown (`SELECT a FROM stream1 TOP 10` -> ERROR). 
> Although not targeted by this JIRA, in the case of working based on event 
> time order, the retraction mechanisms of windows and the lateness mechanisms 
> can be used to deal with out of order events and retraction/updates of 
> results.
> **Functionality example**
> We exemplify with the query below for all the 3 types of operators (sorting, 
> limit and top). Rowtime indicates when the HOP window will trigger – which 
> can be observed in the fact that outputs are generated only at those moments. 
> The HOP windows will trigger at every hour (fixed hour) and each event will 
> contribute/ be duplicated for 2 consecutive hour intervals. Proctime 
> indicates the processing time when a new event arrives in the system. Events 
> are of the type (a,b) with the ordering being applied on the b field.
> `SELECT a FROM stream1 HOP(proctime, INTERVAL '1' HOUR, INTERVAL '2' HOUR) 
> ORDER BY b (LIMIT 2/ TOP 2 / [ASC/DESC] `)
> ||Rowtime||   Proctime||  Stream1||   Limit 2||   Top 2|| Sort 
> [ASC]||
> | |10:00:00  |(aaa, 11)   |   | | 
>|
> | |10:05:00|(aab, 7)  |   | ||
> |10-11  |11:00:00  |  |   aab,aaa |aab,aaa  | aab

1 2 3 >

1 - 100 of 213 matches

Mail list logo