Hi Julian,

I am working on tumbling windows and hope to look at other types of window
aggregates next. I tried to extract the window spec from the aggregate
operator (for tumbling windows) and figured out that it's impossible to
infer the tumbling window size from date-time expressions, or from an
expression over any other type of monotonic field (such as a row number
for tuple-based windows). So we are thinking of implementing aggregates
the way a stream aggregate is normally implemented in standard SQL (assuming
the group-by fields are sorted), but with support for out-of-order arrivals.
One difference between this method and the stream aggregate from SQL is that
an input row can contribute to multiple outputs because of late arrivals. My
plan is to emit the first result for a tumbling window aggregate when we see
a tuple from the next window, and to emit the result again if we get a tuple
for an old window. We'll have a window closing policy under which we will not
handle tuples that arrive after the window timeout. Yi's window operator
design document contains most of the details required. What do you think
about this approach to implementing tumbling windows? We would highly
appreciate your feedback on this.
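
To make the plan concrete, here is a rough Python sketch of the emit and
re-emit behaviour. It is not Samza or Calcite code: the class name, the
`size` and `timeout` parameters, the integer timestamps and the choice of
COUNT as the aggregate are all assumptions made for illustration, and the
window size is passed in explicitly because it cannot be inferred from the
expression.

```python
from dataclasses import dataclass, field


@dataclass
class TumblingCount:
    """Hypothetical tumbling-window COUNT that tolerates late tuples."""
    size: int       # window length in timestamp units (supplied, not inferred)
    timeout: int    # how long after a window ends late tuples are still accepted
    counts: dict = field(default_factory=dict)   # window start -> running count
    emitted: set = field(default_factory=set)    # windows emitted at least once
    max_ts: int = -1                             # highest timestamp seen so far

    def process(self, ts):
        """Consume one tuple timestamp; return the (window_start, count) results emitted."""
        self.max_ts = max(self.max_ts, ts)
        start = ts - ts % self.size

        # Closing policy: ignore tuples whose window timed out.
        if start + self.size + self.timeout <= self.max_ts:
            return []

        was_emitted = start in self.emitted
        self.counts[start] = self.counts.get(start, 0) + 1

        out = []
        # Late tuple for a window that already produced a result: emit again.
        if was_emitted:
            out.append((start, self.counts[start]))

        # First emission: every window older than the newest window seen so far.
        current = self.max_ts - self.max_ts % self.size
        for w in sorted(self.counts):
            if w < current and w not in self.emitted:
                self.emitted.add(w)
                out.append((w, self.counts[w]))
        return out


agg = TumblingCount(size=10, timeout=15)
results = [agg.process(ts) for ts in (1, 5, 12, 7, 31, 3, 18)]
```

In this run the tuple at 12 triggers the first result for window 0, the
late tuple at 7 triggers a second result for window 0, and the tuple at 3 is
ignored because window 0 has timed out by then. The sketch never evicts
state for closed windows, which a real operator would need to do.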

Thanks
Milinda

On Mon, Apr 27, 2015 at 6:15 PM, Julian Hyde <jul...@hydromatic.net> wrote:

> Milinda,
>
> I have seen your work adding initial streaming SQL to Samza. Good stuff.
>
> Which types of query are you thinking of doing next?
>
> As of calcite-1.2, the streaming extensions are in Calcite’s master
> branch. (See
> https://github.com/apache/incubator-calcite/blob/master/doc/STREAM.md.)
> We are a couple of weeks away from the next Calcite release. If you need
> some work done in Calcite, now would be a good time.
>
> Julian
>
>


-- 
Milinda Pathirage

PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org
