[
https://issues.apache.org/jira/browse/SAMZA-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499116#comment-14499116
]
Mohamed Mahmoud (El-Geish) commented on SAMZA-551:
--------------------------------------------------
Even though window operators are not going to be supported in early versions, I
suggest that the keywords that could be used in this context should be reserved
for future use.
> SQL grammar support for window operator
> ---------------------------------------
>
> Key: SAMZA-551
> URL: https://issues.apache.org/jira/browse/SAMZA-551
> Project: Samza
> Issue Type: Sub-task
> Components: sql
> Affects Versions: 0.9.0
> Reporter: Yi Pan (Data Infrastructure)
>
> Consider that we want a count of the stock trades (an infinite stream) that
> happened in the last hour, but emitted only every 11min. It is easy to write
> the first part in sqlstream as:
> {code}
> SELECT STREAM rowtime, count(*) OVER (ORDER BY rowtime RANGE INTERVAL '1'
> HOUR PRECEDING)
> FROM Trades
> {code}
> The above will create a stream of counts of the trades that happened in the
> preceding hour, emitting a new count continuously as each row is scanned.
> Now here are the questions:
> # How do we get the count every 11min instead of as each row comes in? As we
> discussed before, there are examples that do this by truncating / grouping
> on the rowtime to "sample" the continuously moving counting window and
> get a count every 11min. But that has two issues:
> ** From an implementation point of view, there is no efficiency improvement,
> since the system still computes the count for each and every row that comes in
> ** If Samza implements a more efficient tumbling window operator, there is no
> easy way to identify the section of the SQL statement that can map to the more
> efficient tumbling window operator, as the sampling is done via math /
> group-by aggregation instead of a window spec
> # If there is no row in Trades between 12:00pm and 2:00pm, how do we tell the
> system to still generate 0 counts for the time moments 12:11pm, 12:22pm,
> 12:33pm, etc.? Or, if those rows are delayed in delivery and the user wants
> to ignore messages that arrive after a 5min timeout so that the counting
> window can be closed, how can we support that use case w/o breaking SQL grammar?
> Both the above issues seem to require some extension to the window spec in
> SQL grammar. Julian, what do you think? Is it creating too many
> language/parser/planner problems in SQL?
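
To make the two questions in the description concrete, here is a minimal Python sketch of the semantics being asked for: a count over the last hour, emitted on a fixed 11min schedule, with a 0 emitted for empty windows and a timeout for late rows. This is not Samza code or a proposed grammar. The function name `periodic_window_counts`, its parameters, the `(event_time, arrival_time)` row shape, and the half-open window `(tick - size, tick]` are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def periodic_window_counts(events, first_tick, last_tick, slide, size,
                           allowed_lateness):
    """Count events in the window (tick - size, tick] at every tick.

    events: list of (event_time, arrival_time) pairs (hypothetical row shape).
    A row is ignored for a given tick if it arrived more than
    allowed_lateness after that tick, i.e. after the window was closed.
    Ticks with no matching rows still produce a (tick, 0) result.
    """
    results = []
    tick = first_tick
    while tick <= last_tick:
        count = sum(
            1
            for event_time, arrival_time in events
            if tick - size < event_time <= tick
            and arrival_time <= tick + allowed_lateness
        )
        results.append((tick, count))
        tick += slide
    return results
```

Output is driven by the tick schedule rather than by arriving rows, which is why empty periods still yield 0 counts. In this sketch a row that misses the timeout for one tick can still be counted at a later tick whose window covers it; whether that is desirable is a design choice the issue leaves open.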