> On Jan 29, 2015, at 3:04 PM, Yi Pan <nickpa...@gmail.com> wrote:
> 
> Hi, Julian,
> 
> Thanks for sharing your idea! It is interesting and well organized. Let me
> try to summarize; the main differences between yours and the current
> proposal are:
> - removing the '[]' used to define the window specification, using OVER on
> the stream/table instead
> - join/select output can be either a stream or a table, since both become
> legitimate data sources in a query.

Yes.

> There are still two points that I want to comment on:
> a. in the current proposal, the window operator can specify a step size for
> window advancement. With your examples, it seems that all windows will be
> advanced by a step size of 1. Hence, the output of the query will be a
> "continuous" stream, which reports a moving average, instead of a fixed-size
> window average across all rows. Is it easy to extend your model for that
> use case?

I presume we are talking about windowed aggregation -- one row out for each row 
in.

My previous example was a "sliding" window over a time interval.

You can also have a sliding window over a number of rows:

  select symbol, sum(quantity) over (order by rowtime rows 10 preceding)
  from StockTrades;
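
For contrast, a time-based sliding window (as in my previous example) would
look something like the following; the one-hour interval is just an
illustrative choice:

  select symbol, sum(quantity) over (order by rowtime range interval '1' hour preceding)
  from StockTrades;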

The SQL syntax supports windows of various sizes, e.g.

  select name,
    sum(sal) over (order by hiredate
                           range between shoesize preceding and age following)
  from emp;

and thus each record defines its own window. You can also define a "paged" 
window, for example the cumulative total trades since the top of the hour:

  select stream rowtime, ticker, amount,
    sum(amount) over (partition by ticker, trunc(rowtime to hour) order by rowtime)
  from StockTrades;

For example, you might get:

  {10:01, ORCL, 10, 10}
  {10:02, MSFT, 30, 30}
  {10:03, ORCL, 100, 110}
  {10:17, MSFT, 45, 75}
  {10:59, ORCL, 20, 130}
  {11:02, ORCL, 50, 50}

There are more elaborate ways of defining windows, but these cover ~99% of
real-world cases, IMO. The remainder could be handled by a "user-defined window" SPI
just as complex aggregate functions could be handled by a "user-defined 
aggregate".

> b. for output streams, there is no definition of "partitions". I remember
> that we talked about it briefly in our early discussion and you commented
> that it should not be there in SQL. I would argue that a specification of
> how to partition the output stream is needed in the current system, since
> we are lacking two things: i) automatically identifying the key that can be
> used for optimal partitioning of a query's output; ii) auto-scaling of the
> number of partitions of a stream if the initial partition count is not
> enough. Hence, we have to give the user a way to tell the system how to
> partition. I am curious about exactly what the reasons are that you believe
> partitioning should not be added to the SQL syntax?

For the same reason that in an RDBMS you do not say

INSERT INTO Emp:disk1 VALUES (10, 'Fred', 'Sales');
INSERT INTO Emp:disk2 VALUES (20, 'Bill', 'Marketing');

Data independence. In an RDBMS you say

CREATE TABLE Emp(empno INTEGER, name VARCHAR(20), department VARCHAR(20))
  PARTITION BY HASHCODE (department);

and let the system take care of placing the records. You could do the same for
a stream, partitioning based on a column, and could even partition based on its
actual value, not its hash code:

CREATE STREAM Orders(..., p INTEGER) PARTITION BY VALUE (p);

If Orders has 10 partitions, then

INSERT INTO Orders VALUES (..., 7);

will go to partition #7 and

INSERT INTO Orders VALUES (..., 11);

will fail. By putting the decisions in the hands of the optimizer guided by 
metadata, the system can adapt without you having to change application code. 
Moving the partitioning decision into a data value is another database "trick" 
-- the context is all in the data.
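
As a sketch of that trick (column and stream names are invented for
illustration), the value of p can be derived from business data when the
record is written, e.g. for the 10-partition Orders stream above:

  INSERT INTO Orders
  SELECT STREAM o.*, MOD(o.custId, 10) AS p
  FROM RawOrders AS o;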

> BTW, it seems even with your proposed model, the current physical operators
> won't be affected, which sounds good.

Yes, that was my goal. Samza operators are fine as they are. You guys know what 
you are doing (and what your users want).

Julian
