> On Feb 13, 2016, at 11:38 PM, Wanglan (Lan) <[email protected]> wrote:
>
> Great discussions!
>
> It seems kind of agreement has been reached. In my opinion, the window
> definitions are the basic concepts we should clearly describe first. How is
> the progress? Do we need to create a jira or something?

I think we have a good start in http://calcite.apache.org/docs/stream.html, and some extra ideas in my HOP/TUMBLE email[1]. I promised to write them up, but I haven’t got around to it yet, and a JIRA would help me remember.

I don’t know whether we can ever say we are “done” with a specification. Of course I can write down my opinion, but it would just be my opinion. :) If we have a conversation thread (or a JIRA case) about each requirement, and representatives of the streaming projects (Fabian for Flink, Milinda for Samza, ? for Storm) chime in, maybe we can reach consensus.

After we reach consensus on a particular feature, and write it up, I’d also like to create a set of sample queries & responses that illustrate that feature. Calcite could contain a TCK that any compliant SQL engine could run. How do people feel about having tests as a deliverable? Would you use them in your project?

> Btw, happy Chinese new year ;) !

Thank you! And you too!

Julian

[1] http://mail-archives.apache.org/mod_mbox/calcite-dev/201506.mbox/%3CCAPSgeETbowxM2TRX0RFxQ_tEAPk2uM=he0arywinbtovgwb...@mail.gmail.com%3E

>
> Lan
>
> -----Original Message-----
> From: Fabian Hueske [mailto:[email protected]]
> Sent: February 6, 2016 17:29
> To: [email protected]
> Subject: Re: About Stream SQL
>
> Excellent! I missed the punctuations in the todo list.
>
> What kind of strategies do you have in mind to handle events that arrive too
> late? I see
> 1. dropping of late events
> 2. computing an updated window result for each late arriving element
>    (implies that the window state is stored for a certain period before it
>    is discarded)
> 3. computing a delta to the previous window result for each late arriving
>    element (requires window state as well, not applicable to all aggregation
>    types)
>
> It would be nice if strategies to handle late-arrivers could be defined in
> the query.
>
> I think the plans of the Flink community are quite well aligned with your
> ideas for SQL on Streams.
>
> Should we start by updating / extending the Stream document on the Calcite
> website to include the new window definitions (TUMBLE, HOP) and a discussion
> of punctuations/watermarks/time bounds?
>
> Fabian
>
> 2016-02-06 2:35 GMT+01:00 Julian Hyde <[email protected]>:
>
>> Let me rephrase: The *majority* of the literature, of which I cited
>> just one example, calls them punctuation, and a couple of recent
>> papers out of Mountain View don't change that.
>>
>> There are some fine distinctions between punctuation, heartbeats,
>> watermarks and rowtime bounds, mostly in terms of how they are
>> generated and propagated, that matter little when planning the query.
>>
>> On Fri, Feb 5, 2016 at 5:18 PM, Ted Dunning <[email protected]> wrote:
>>> On Fri, Feb 5, 2016 at 5:10 PM, Julian Hyde <[email protected]> wrote:
>>>
>>>> Yes, watermarks, absolutely. The "to do" list has "punctuation",
>>>> which is the same thing. (Actually, I prefer to call it "rowtime bound"
>>>> because it feels more like a dynamic constraint than a piece of
>>>> data, but the literature[1] calls them punctuation.)
>>>>
>>>
>>> Some of the literature calls them punctuation, other literature [1]
>>> calls them watermarks.
>>>
>>> [1] http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
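
For illustration only: the three late-event strategies listed in Fabian's message (drop, recompute an updated window result, emit a delta to the previous result) can be sketched roughly as below. This is a toy Python model, not Calcite or Flink code. The class TumblingSum, the constants WINDOW and RETENTION, and the method names are all hypothetical, and SUM is chosen because its delta is simply the late value, which would not hold for every aggregate.

```python
WINDOW = 10      # tumbling window length in rowtime units (assumed)
RETENTION = 20   # how long state is kept after a window closes (assumed)


class TumblingSum:
    """Hypothetical SUM over tumbling windows with a pluggable late-event strategy."""

    def __init__(self, strategy):
        assert strategy in ("drop", "update", "delta")
        self.strategy = strategy
        self.sums = {}        # window start -> running sum (retained state)
        self.watermark = 0    # latest rowtime bound seen so far
        self.output = []      # emitted (kind, window_start, value) tuples

    def on_event(self, rowtime, value):
        start = rowtime - rowtime % WINDOW
        if start + WINDOW > self.watermark:
            # On time: the window is still open, so just accumulate.
            self.sums[start] = self.sums.get(start, 0) + value
            return
        # Late: the window has already been closed by the rowtime bound.
        if self.strategy == "drop" or start not in self.sums:
            return  # strategy 1, or the window state was already discarded
        self.sums[start] += value
        if self.strategy == "update":
            # Strategy 2: re-emit the full, corrected window result.
            self.output.append(("update", start, self.sums[start]))
        else:
            # Strategy 3: emit only the change relative to the previous result.
            self.output.append(("delta", start, value))

    def on_watermark(self, bound):
        # Emit every window whose end is newly covered by the bound.
        for start in sorted(self.sums):
            if self.watermark < start + WINDOW <= bound:
                self.output.append(("result", start, self.sums[start]))
        self.watermark = max(self.watermark, bound)
        # Discard state once the retention period has passed.
        self.sums = {s: v for s, v in self.sums.items()
                     if s + WINDOW + RETENTION > self.watermark}


def run(strategy):
    agg = TumblingSum(strategy)
    agg.on_event(1, 5)
    agg.on_event(3, 2)
    agg.on_watermark(10)   # closes window [0, 10)
    agg.on_event(4, 1)     # late, state still retained
    agg.on_watermark(40)   # retention expired, state discarded
    agg.on_event(2, 1)     # late, no state left: ignored by every strategy
    return agg.output
```

Calling run("drop"), run("update") and run("delta") feeds the same events through each strategy, so the outputs differ only in how the first late event is handled. Strategies 2 and 3 both depend on the retained state, which matches the note in the message that window state has to be stored for a certain period before it is discarded.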
