[ https://issues.apache.org/jira/browse/CALCITE-1237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288687#comment-15288687 ]
Fabian Hueske commented on CALCITE-1237:
----------------------------------------

Hi [~julianhyde],

thanks for the proposal! I like it.

Answering your questions first:

Q1: In my opinion, "at most" is more intuitive than "less than".
Q2: I'm in favor of allowing order-dependent aggregates.
Q3: I think supporting {{session}} in the {{GROUP BY}} clause is a good starting point.

I have a few questions / suggestions as well:

- Is it possible to move the additional grouping columns out of the {{session}} function and into the {{GROUP BY}} clause? This would be more similar to the definition of {{TUMBLE}} and {{HOP}} in http://calcite.apache.org/docs/stream.html.
- Can we add functions similar to {{TUMBLE_START}} and {{TUMBLE_END}}? This would be consistent with the other window functions and a shortcut compared to accessing the corresponding values with {{first_value}} and {{last_value}} (given that we allow order-dependent aggregates).
- You said "Unlike the tumble function, each row belongs to precisely one window." Tumbling windows are non-overlapping, so {{tumble}} should be {{hop}}, right?

> Session windows for streaming SQL
> ---------------------------------
>
> Key: CALCITE-1237
> URL: https://issues.apache.org/jira/browse/CALCITE-1237
> Project: Calcite
> Issue Type: Bug
> Components: stream
> Reporter: Julian Hyde
> Assignee: Julian Hyde
>
> A session window is a collection of rows whose key values, when sorted, have
> a gap of at most N.
> Q1. Should "at most" be "less than"?
> The key type can be any type that has a minus operator, that is, numeric and
> date-time.
> I propose the following syntax: {{session(key [, ...]*, interval)}}. For
> example:
> {code}
> select stream session(rowtime, productId, interval '5' second),
>   productId, count(*) as c
> from Orders
> group by session(rowtime, productId, interval '5' second),
>   productId
> {code}
> to find bursts of orders for the same product where consecutive orders are no
> more than 5 seconds apart.
> The first key column {{rowtime}} defines the session and must be of
> numeric/date-time type, and must have monotonicity or similar in order for
> the query to make progress; the other key columns (in this case
> {{productId}}) can be of any type; the last column is the interval, and must
> be constant.
> The {{session}} function returns the key value at the start of the window.
> Unlike the {{tumble}} function, each row belongs to precisely one window. But
> {{session}} is not a true function, because its value depends on the records
> flowing in the stream.
> Q2. If {{session}} is used, should we allow order-dependent aggregate
> functions such as {{first_value}}?
> Q3. Should we allow {{session}} as a windowed aggregate function?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
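
For readers who want to see the proposed semantics concretely, here is a minimal batch sketch in Python of what the quoted {{session(rowtime, productId, interval '5' second)}} example might compute. It is only an illustration of the definition in the description (rows whose sorted key values have a gap of at most N), not Calcite code and not a streaming implementation. The function name {{sessionize}} and the {{inclusive}} flag are hypothetical; the flag exists only to show both readings of Q1.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessionize(rows, gap, inclusive=True):
    """Group (rowtime, product_id) rows into session windows.

    Hypothetical helper, not part of Calcite. A new session starts whenever
    the distance to the previous row of the same product exceeds `gap`
    (inclusive=True, the "at most" reading of Q1) or reaches it
    (inclusive=False, the "less than" reading).

    Returns a sorted list of (session_start, product_id, count) tuples,
    mirroring the columns of the example query.
    """
    by_product = defaultdict(list)
    for rowtime, product_id in rows:
        by_product[product_id].append(rowtime)

    sessions = []
    for product_id, times in by_product.items():
        times.sort()  # batch stand-in for the monotonic rowtime of a stream
        start, prev, count = times[0], times[0], 1
        for t in times[1:]:
            delta = t - prev
            same_session = delta <= gap if inclusive else delta < gap
            if same_session:
                count += 1
            else:
                # Gap too large: close the current session, open a new one.
                sessions.append((start, product_id, count))
                start, count = t, 1
            prev = t
        sessions.append((start, product_id, count))
    return sorted(sessions)


if __name__ == "__main__":
    t0 = datetime(2016, 5, 18, 12, 0, 0)
    sec = timedelta(seconds=1)
    # Made-up sample orders: (rowtime, productId)
    orders = [
        (t0, 10), (t0 + 3 * sec, 10), (t0 + 8 * sec, 10), (t0 + 20 * sec, 10),
        (t0 + 1 * sec, 20), (t0 + 30 * sec, 20),
    ]
    for start, product_id, c in sessionize(orders, 5 * sec):
        print(start.time(), product_id, c)
```

With the sample data above, product 10 yields one session of three orders (gaps of 3 and 5 seconds) followed by a single-order session, which also shows that each row lands in exactly one window. Passing {{inclusive=False}} would split the first session at the 5-second gap, which is the difference Q1 asks about.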