RE: New type of window semantics

Radu Tudoran Tue, 27 Sep 2016 02:00:11 -0700

Hi,

Thanks for these points. 
I am not sure I really understood the implications of using this option in 
stream mode. I got the point that if we have 20 rows then we have 20 
outputs. However, I wonder what happens when a new record comes in and we have 
the query you proposed:

SELECT STREAM orderId, price, AVG(price) OVER (ORDER BY orderTime ROWS 5 
PRECEDING)
  FROM Orders

Assuming that up to moment T we have the following 5 orders:

ordN1, ordN2, ordN3, ordN4, ordN5

and we get ordN6 at moment T+1, will the query provide only one result, 
corresponding to ordN6 and thus the average over ordN2, ordN3, ordN4, ordN5, 
ordN6? Or, because ordN2 to ordN5 are still in the system, will the query 
return 5 results?

If the query produces a single output in this case, corresponding to element 
ordN6, then it can indeed do the job for this scenario. 
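
To make the question concrete, here is a minimal Python sketch of the behaviour 
I am hoping for. It is only a simulation and not Calcite code: the class 
SlidingAverage, the method on_record, and the prices are hypothetical names and 
values chosen for illustration. It assumes that each arriving record produces 
exactly one output row. Note that under standard SQL semantics ROWS 5 PRECEDING 
covers the current row plus the five rows before it, so in this sketch the 
average for ordN6 spans ordN1 through ordN6.

```python
from collections import deque


class SlidingAverage:
    """Hypothetical simulation of AVG(price) OVER (... ROWS 5 PRECEDING)."""

    def __init__(self, preceding=5):
        # The frame holds the current row plus `preceding` earlier rows;
        # older rows are dropped automatically once the deque is full.
        self.window = deque(maxlen=preceding + 1)

    def on_record(self, order_id, price):
        # Assumption: one output row is produced per incoming record.
        self.window.append(price)
        return (order_id, price, sum(self.window) / len(self.window))


agg = SlidingAverage()

# Orders ordN1..ordN5 arrive up to moment T (made-up prices 10, 20, ..., 50).
outputs = [agg.on_record(f"ordN{n}", float(n * 10)) for n in range(1, 6)]

# ordN6 arrives at moment T+1 and yields a single new output row.
new_outputs = [agg.on_record("ordN6", 60.0)]
```

In this sketch the arrival of ordN6 adds exactly one row to the output; the 
rows already emitted for ordN1 to ordN5 are not produced again.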

-----Original Message-----
From: Julian Hyde [mailto:jh...@apache.org] 
Sent: Tuesday, September 27, 2016 2:41 AM
To: dev@calcite.apache.org
Subject: Re: New type of window semantics

Have you considered the sliding window, which is already part of standard SQL?  
We propose to support it in streaming SQL also. Here is an example:

  SELECT orderId, price, AVG(price) OVER (ORDER BY orderTime ROWS 5 PRECEDING)
  FROM Orders

(This is a non-streaming query, but you can add the STREAM keyword to get a 
streaming query.)

Given orders 1 .. 20, then order 10 would show the average for orders 5 .. 10 
inclusive, order 11 would show the average for orders 6 .. 11, and so forth.

In streaming queries, windows are often used in the GROUP BY clause, but we do 
not use a GROUP BY here. The OVER clause with sliding windows does not 
aggregate rows. If 20 rows come in, then 20 rows go out. It makes sense, 
because each row cannot have its own window if multiple rows are squashed into 
one.
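
As a rough illustration of the "20 rows in, 20 rows out" behaviour, the 
following Python sketch mimics the window frame on a plain list. It is not 
Calcite code; the function name sliding_avg and the prices (order n has price 
n) are assumptions made up for the example.

```python
def sliding_avg(prices, preceding=5):
    """Return one average per input row, taken over the current row
    and up to `preceding` rows before it (fewer at the start)."""
    out = []
    for i in range(len(prices)):
        # Frame: rows i-preceding .. i, clipped at the first row.
        window = prices[max(0, i - preceding): i + 1]
        out.append(sum(window) / len(window))
    return out


# Hypothetical data: orders 1..20, where order n has price n.
prices = list(range(1, 21))
averages = sliding_avg(prices)

# averages[9] belongs to order 10 and covers orders 5..10;
# averages[10] belongs to order 11 and covers orders 6..11.
```

The result has one entry per input row, and the first few rows simply average 
over however many rows are available so far.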

Julian

> On Sep 26, 2016, at 12:53 AM, Radu Tudoran <radu.tudo...@huawei.com> wrote:
> 
> Hi,
> 
> First of all let me introduce myself - My name is Radu Tudoran and I am 
> working in the field of Big Data processing with a high focus on streaming 
> and more recently in the area of SQL. 
> 
> I wanted to raise a question/proposal for discussion in the community:
> 
> Based on our requirements I realized that I would need to create a window 
> (e.g. a hop window) that would move on every incoming element. The syntax 
> that I have in mind for it is
> 
> HOP(column_name, # EVENT , INTERVAL # )   (or should it rather be # ELEMENT 
> instead of EVENT?)
> 
> I wanted to check what you think about adding such a grammar directly to 
> Calcite. I think it is relevant for streaming scenarios where you do not 
> necessarily have events coming at regular time intervals but would still 
> like to react to every event. 
> As an example you can consider a stock market application where you would 
> always compute for every new offer the average over the last hour.
> 
> Best regards,
