[ 
https://issues.apache.org/jira/browse/SAMZA-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499116#comment-14499116
 ] 

Mohamed Mahmoud (El-Geish) commented on SAMZA-551:
--------------------------------------------------

Even though window operators are not going to be supported in early versions, I 
suggest that the keywords that could be used in this context should be reserved 
for future use. 

> SQL grammar support for window operator
> ---------------------------------------
>
>                 Key: SAMZA-551
>                 URL: https://issues.apache.org/jira/browse/SAMZA-551
>             Project: Samza
>          Issue Type: Sub-task
>          Components: sql
>    Affects Versions: 0.9.0
>            Reporter: Yi Pan (Data Infrastructure)
>
> Consider that we want to have a count of stock trades (as a infinite stream) 
> happened in the last hour, but only every 11min. It is easy to write the 
> first part in sqlstream as:
> {code}
>    SELECT STREAM rowtime, count(*) OVER (ORDER BY rowtime RANGE INTERVAL '1' 
> HOUR PROCEDING)
>        FROM Trades
> {code}
> The above will create a stream of counts that happened every hour 
> continuously as rows are scanned. 
> Now here is the question:
> # how do we have the count every 11min instead of as the row comes in? As we 
> discussed before, there are examples that we can create by doing truncating / 
> grouping on the rowtime to "sample" the continuous moving counting window to 
> get a count every 11min. But that has two issues:
> ** From implementation point of view, there is no efficiency improvement 
> since the system still computes the count for each and every row comes in
> ** If Samza implement a more efficient tumbling window operator, there is no 
> easy way to identify the section of SQL statement that can map to the more 
> efficient tumbling window operator, as the sampling is done via math / 
> group-by aggregation instead of window spec
> # if there is no row in Trades between 12:00pm to 2:00pm, how do we tell the 
> system to still generate 0 counts for the time moments: 12:11pm, 12:22pm, 
> 12:33pm, etc.? Or, those rows are delayed in the delivery in the system and 
> user wants to ignore late-arrival of messages after 5min timeout to close the 
> counting window? How can we support that use case w/o breaking SQL grammar?
> Both the above issues seem to require some extension to the window spec in 
> SQL grammar. Julian, what do you think? Is it creating too many 
> language/parser/planner problems in SQL?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to