Thanks, everyone, for your interest and discussion. A quick update:
1. The feature branch is ready [1].
2. A new component has been created in JIRA [2].
3. A list of the tasks I currently see is available [3].

Any contributions and feedback are very welcome!

[1] https://github.com/apache/beam/tree/DSL_SQL
[2] https://issues.apache.org/jira/browse/BEAM/component/12332480
[3] https://docs.google.com/document/d/16OeBw2-mK8CFRb_4CkbMCQg2KQ1oRp6KEthxKcjxT0A/edit?usp=sharing

On Fri, Apr 14, 2017 at 6:55 PM, Mingmin Xu wrote:
> It's more about how the State API can be introduced in SQL; a snapshot of
> state converts a stream to a table, which is very helpful. The SQL keyword
> INSERT INTO may be an option for that, but I'm not confident about it so far.
>
> On Fri, Apr 14, 2017 at 3:03 PM, Tyler Akidau wrote:
>
>> Tarush: I don't think it depends upon the time frame (although you may be
>> interested in only a specific timeframe materialized within the table).
>> Stream to table conversion is purely a byproduct of grouping a stream. I
>> have a doc I'm getting some initial reviews on currently that I hope to
>> send out next week to give some more background here. And windowing is
>> really just an additional dimension in grouping. An important one, to be
>> sure, but still just grouping.
>>
>> Mingmin: can you expand upon those statements? I'm not sure I fully
>> understand what you're saying.
>>
>> -Tyler
>>
>> On Wed, Apr 12, 2017 at 9:38 PM Mingmin Xu wrote:
>>
>>> Exposing a streaming snapshot via STATE is attractive in the Beam model,
>>> but I doubt it's the right way in SQL. IMO, there's 'INSERT INTO' to
>>> persist streaming output.
>>>
>>> On Wed, Apr 12, 2017 at 8:37 PM, tarush grover wrote:
>>>
>>>> Hi Tyler,
>>>>
>>>> Transforming a stream into a table will also depend on the time frame in
>>>> the stream, or on which windows we choose for the stream.
>>>>
>>>> Regards,
>>>> Tarush
>>>>
>>>> On Tue, 11 Apr 2017 at 11:29 PM, Tyler Akidau wrote:
>>>>
>>>>> Hi 陈竞,
>>>>>
>>>>> I'm doubtful there will be an explicit equivalent of the State API in
>>>>> SQL, at least not in the SQL portion of the DSL itself (it might make
>>>>> sense to expose one within UDFs). The State API is an imperative
>>>>> interface for accessing an underlying persistent state table, whereas
>>>>> SQL operates more functionally. There's no good way I'm aware of to
>>>>> cleanly expose the characteristics provided by the State API
>>>>> (logic-driven, fine- and coarse-grained reads/writes of potentially
>>>>> multiple fields of state, utilizing potentially multiple data types) in
>>>>> raw SQL.
>>>>>
>>>>> On the upside, SQL makes it very easy to materialize new state tables
>>>>> naturally. In the proposal I'll be sharing for how I think we should
>>>>> integrate streaming into SQL robustly, any time you perform some
>>>>> grouping operation (GROUP BY, JOIN, CUBE, etc.) you're transforming your
>>>>> stream into a table. That table is effectively a persistent state
>>>>> table. So standard SQL already offers a large suite of powerful tools
>>>>> for creating state.
>>>>>
>>>>> It may also be possible for the different access patterns of more
>>>>> complicated data structures (e.g., bags or lists) to be captured by
>>>>> different data types supported by the underlying systems. But I don't
>>>>> expect there to be an imperative State access API built into SQL
>>>>> itself.
>>>>>
>>>>> All that said, I'm curious to hear ideas otherwise if anyone has them.
>>>>> :-)
>>>>>
>>>>> -Tyler
>>>>>
>>>>> On Mon, Apr 10, 2017 at 10:19 PM 陈竞 wrote:
>>>>>
>>>>>> I just want to know what the State API equivalent is for SQL, since
>>>>>> Beam already supports stateful processing using a stateful DoFn.
>>>>>>
>>>>>> 2017-04-11 2:12 GMT+08:00 Tyler Akidau:
>>>>>>
>>>>>>> 陈竞, what are you specifically curious about regarding state? Are you
>>>>>>> wanting to know what the State API equivalent is for SQL? Or are you
>>>>>>> asking an operational question about where the state for a given SQL
>>>>>>> pipeline will live?
>>>>>>>
>>>>>>> -Tyler
>>>>>>>
>>>>>>> On Sun, Apr 9, 2017 at 12:39 PM Mingmin Xu wrote:
>>>>>>>
>>>>>>>> Thanks @JB, I'll put out the initial PR soon.
>>>>>>>>
>>>>>>>> On Sun, Apr 9, 2017 at 12:28 PM, Jean-Baptiste Onofré wrote:
>>>>>>>>
>>>>>>>>> As discussed, I created the DSL_SQL branch with the skeleton.
>>>>>>>>> Mingmin is rebasing on this branch to submit the PR.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>>
>>>>>>>>> On 04/09/2017 08:02 PM, Mingmin Xu wrote:
>>>>>>>>>
>>>>>>>>>> State hasn't been touched yet; you're welcome to add it.
>>>>>>>>>>
>>>>>>>>>> On Sun, Apr 9, 2017 at 2:40 AM, 陈竞 wrote:
>>>>>>>>>>
>>>>>>>>>>> How will this SQL support state in both streaming and batch mode?
>>>>>>>>>>>
>>>>>>>>>>> 2017-04-07 4:54 GMT+08:00 Mingmin Xu:
>>>>>>>>>>>
>>>>>>>>>>>> @Tyler, there's no big change to the previous design doc. I added
>>>>>>>>>>>> some details in chapter 'Part 2. DML( [INSERT] SELECT )',
>>>>>>>>>>>> describing the steps to process a query; feel free to leave a
>>>>>>>>>>>> comment.
>>>>>>>>>>>>
>>>>>>>>>>>> I went through your doc on 'EMIT', and it's awesome from my
>>>>>>>>>>>> perspective. I have some tests on GroupBy with default
>>>>>>>>>>>> triggers/allowed_lateness now. EMIT syntax can be added to fill
>>>>>>>>>>>> the gap.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Apr 6, 2017 at 1:04 PM, Tyler Akidau wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I'm very excited by this development as well; thanks for
>>>>>>>>>>>>> continuing to push this forward, Mingmin. :-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I noticed you'd made some changes to your design doc
>>>>>>>>>>>>> <https://docs.google.com/document/d/1Uc5xYTpO9qsLXtT38OfuoqSLimH_0a1Bz5BsCROMzCU/edit>.
>>>>>>>>>>>>> Is it ready for another review? How reflective is it currently
>>>>>>>>>>>>> of the work that's going into the feature branch?
>>>>>>>>>>>>>
>>>>>>>>>>>>> In parallel, I'd also like to continue helping push forward the
>>>>>>>>>>>>> definition of unified model semantics for SQL so we can get
>>>>>>>>>>>>> Calcite to a point where it supports the full Beam model. I
>>>>>>>>>>>>> added a comment
>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/BEAM-301?focusedCommentId=15959621&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15959621>
>>>>>>>>>>>>> on the JIRA suggesting I create a doc with a specification
>>>>>>>>>>>>> proposal for EMIT (and any other necessary semantic changes)
>>>>>>>>>>>>> that we can then iterate on in public with the Calcite folks. I
>>>>>>>>>>>>> already have most of the content written (and there's a
>>>>>>>>>>>>> significant amount of background needed to justify some aspects
>>>>>>>>>>>>> of the proposal), so it'll mostly be a matter of pulling it all
>>>>>>>>>>>>> together into something coherent. Does that sound reasonable to
>>>>>>>>>>>>> everyone?
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Tyler
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Apr 6, 2017 at 10:26 AM Kenneth Knowles wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Very cool! I'm really excited about this integration.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Apr 6, 2017 at 9:39 AM, Jean-Baptiste Onofré wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Mingmin and I prepared a new branch to host the SQL DSL in the
>>>>>>>>>>>>>>> dsls/sql location.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any help is welcome!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 04/06/2017 06:36 PM, Mingmin Xu wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> @Tarush, you're very welcome to join the effort.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Apr 6, 2017 at 7:22 AM, tarush grover wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can I also be part of this feature development?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Tarush Grover
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Apr 6, 2017 at 3:17 AM, Ted Yu wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I compiled the BEAM-301 branch with Calcite 1.12; it passed.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Julian tries not to break existing things, but he will if
>>>>>>>>>>>>>>>>>> there's a reason to do so :-)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Apr 5, 2017 at 2:36 PM, Mingmin Xu wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> @Ted, thanks for the note. I intend to stick with one
>>>>>>>>>>>>>>>>>>> version (Beam 0.6.0 and Calcite 1.11) for now, unless
>>>>>>>>>>>>>>>>>>> impacted by an API change. Before it's merged back to
>>>>>>>>>>>>>>>>>>> master, I will upgrade to the latest versions.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Apr 5, 2017 at 2:14 PM, Ted Yu wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Working in a feature branch is good; you may want to
>>>>>>>>>>>>>>>>>>>> periodically sync up with master.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I noticed that you are using Calcite 1.11.0.
>>>>>>>>>>>>>>>>>>>> 1.12 is out, FYI.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Apr 5, 2017 at 2:05 PM, Mingmin Xu wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'm working on https://issues.apache.org/jira/browse/BEAM-301
>>>>>>>>>>>>>>>>>>>>> (Add a Beam SQL DSL). The skeleton is already in
>>>>>>>>>>>>>>>>>>>>> https://github.com/XuMingmin/beam/tree/BEAM-301, using the
>>>>>>>>>>>>>>>>>>>>> Java SDK in the back end. The goal is to provide a SQL
>>>>>>>>>>>>>>>>>>>>> interface over Beam, based on Calcite, including:
>>>>>>>>>>>>>>>>>>>>> 1) a translator to create a Beam pipeline from SQL
>>>>>>>>>>>>>>>>>>>>>    (SELECT/INSERT/FILTER/GROUP-BY/JOIN/...);
>>>>>>>>>>>>>>>>>>>>> 2) an interactive client to submit queries (All-SQL mode);
>>>>>>>>>>>>>>>>>>>>> 3) a SQL API which reduces the work to create a Pipeline
>>>>>>>>>>>>>>>>>>>>>    (Semi-SQL mode).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> As we see many folks are interested in this feature, I would
>>>>>>>>>>>>>>>>>>>>> like to create a feature branch to get more involvement.
>>>>>>>>>>>>>>>>>>>>> Looking for comments and feedback.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>> Mingmin
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>>>>>>>> Talend - http://www.talend.com
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> 陈竞 (Jing Chen), Institute of Computing Technology, Chinese Academy of Sciences, High Performance Computer Center
>>>>>>>>>>> Jing Chen HPCC.ICT.AC China

--
Mingmin
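
Tyler's point in the thread, that stream-to-table conversion falls out of grouping and that windowing is just one more grouping dimension, can be illustrated with a minimal Python sketch. It is not Beam or Beam SQL code: the `(key, event_time, value)` event shape, the fixed window size, the SUM aggregation, and the helper names `window_start` and `group_stream` are all hypothetical, and a plain dict stands in for the persistent state table a runner would maintain.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

# Hypothetical event shape for illustration: (key, event_time_seconds, value).
Event = Tuple[str, int, int]
# Grouping key: (key, window_start). The window is just one more grouping column.
GroupKey = Tuple[str, int]


def window_start(event_time: int, size: int) -> int:
    """Assign an event time to the start of its fixed (tumbling) window."""
    return event_time - (event_time % size)


def group_stream(events: Iterable[Event], size: int) -> List[Dict[GroupKey, int]]:
    """Group a stream by (key, window) and SUM the values.

    The running dict is the table produced by grouping; in a real runner it
    would live in persistent state. A snapshot of the table is recorded after
    each element so the table's evolution over the stream is visible.
    """
    table: DefaultDict[GroupKey, int] = defaultdict(int)
    snapshots: List[Dict[GroupKey, int]] = []
    for key, event_time, value in events:
        table[(key, window_start(event_time, size))] += value
        snapshots.append(dict(table))  # copy, so later updates don't alter it
    return snapshots


if __name__ == "__main__":
    stream = [("a", 1, 5), ("b", 3, 2), ("a", 7, 1), ("a", 12, 4)]
    for snapshot in group_stream(stream, size=10):
        print(snapshot)
```

Run as a script, it prints one snapshot of the table per input element, loosely matching the "snapshot of state" idea Mingmin raises; the last line printed is `{('a', 0): 6, ('b', 0): 2, ('a', 10): 4}`. Passing a `size` larger than every event time puts all events into window 0, which reduces to plain per-key grouping: the window really is just an extra column in the grouping key.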
