Hi Mingmin,

Thanks for adding "INSERT INTO" (which I missed in the example).

I am not sure I understand the question:

1. Multiple GBKs with retractions are solved by [1].
2. In terms of SQL and its views, the output is defined by the last GBK.

[1]: https://docs.google.com/document/d/14WRfxwk_iLUHGPty3C6ZenddPsp_d6jhmx0vuafXqmE/edit?usp=sharing

-Rui

On Mon, Aug 19, 2019 at 4:02 PM Mingmin Xu <mingm...@gmail.com> wrote:

> +1 to support EMIT on the Beam side first if we cannot include it in Calcite
> in a short time (see #1, #2). I'm open to use any format, the one above or
> something as below. The tricky question is, what's the expected behavior
> for a complex query with more than one GBK operator?
>
> EMIT <INTERVAL '1' MINUTE> | <INTERVAL '100' ROW> [ACCUMULATE|DISCARD]
> [INSERT INTO ...]
> SELECT ...
>
> #1.
> https://sematext.com/opensee/m/Calcite/FR3K9JVAl32VULr6?subj=Towards+a+spec+for+robust+streaming+SQL+Part+1
> #2.
> https://sematext.com/opensee/m/Beam/gfKHFFDd4i1I3nZc2?subj=Towards+a+spec+for+robust+streaming+SQL+Part+2
>
> On Mon, Aug 19, 2019 at 12:02 PM Rui Wang <ruw...@google.com> wrote:
>
>> To update this idea, I think we can go a step further to support the EMIT
>> syntax from the one-sql-to-rule-them-all paper [1].
>>
>> EMIT will allow periodic delayed stream materialization. For stream view,
>> it means we will add support to sinks to keep generating a changelog table.
>> For view only, it means we will add support to sinks to generate a
>> compacted table from a changelog table periodically.
>>
>> Regarding SQL, a typical query like the following should run:
>>
>>
>> *WITH joined_table AS (SELECT * FROM S1 JOIN S2)*
>> *SELECT XX FROM HOP(joined_table)*
>> *EMIT [STREAM] AFTER DELAY INTERVAL '1' HOUR*
>>
>>
>> By doing so, retractions will be much more useful for SQL from a product
>> scenario, in which we can have a meaningful end-to-end SQL pipeline.
>>
>> [1]: https://arxiv.org/pdf/1905.12133.pdf
>>
>> -Rui
>>
>> On Mon, Aug 12, 2019 at 11:30 PM Rui Wang <ruw...@google.com> wrote:
>>
>>> Hi Community,
>>>
>>> BeamSQL currently does not support unbounded-unbounded join with a
>>> non-default trigger. It is because:
>>>
>>> - Discarding mode does not work for outer joins because it lacks the
>>>   ability to retract pre-emitted values. You can think about an example in
>>>   which a tuple of (left_row, null) needs to be retracted if the matched
>>>   right_row appears since the last trigger fired.
>>> - Accumulating mode *theoretically* can support unbounded-unbounded
>>>   join because it's supposed to always "overwrite" the previous result.
>>>   However, in practice, for join use cases such overwriting is too
>>>   expensive. It would be much more efficient if small changes in the inputs
>>>   of a join only caused small changes for downstream to compute.
>>> - Both discarding mode and accumulating mode are not sufficient to
>>>   refine materialized data.
>>>
>>> Meanwhile, [1] has kicked off a discussion on retractions in the Beam model.
>>> I have been collecting people's feedback, and generally speaking people
>>> agree that retractions are useful for some use cases.
>>>
>>> Thus I propose to combine SQL join with retractions to
>>> support multiple-triggering SQL Join.
>>>
>>> I think SQL join is a good start for supporting retraction in Beam with
>>> the following caveats:
>>> 1. Multiple-triggering SQL Join is a useful feature.
>>> 2. SQL join is an opportunity for us to figure out implementation
>>> details of retraction by building it for a well-defined use case.
>>> 3. Supporting retraction should not cause performance regression on
>>> existing pipelines, or require changes on existing pipelines.
>>>
>>>
>>> What do you think?
>>>
>>> [1]:
>>> https://lists.apache.org/thread.html/bb2d40b1bea8b21fbbb7caf599fabba823da357768ceca8ea2363789@%3Cdev.beam.apache.org%3E
>>>
>>>
>>> -Rui
>>>
>>
>
> --
> ----
> Mingmin
>
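
As a rough illustration of the outer-join case described in the original proposal (a (left_row, null) result that has to be retracted once a matching right_row arrives), here is a minimal plain-Python sketch. It does not use any Beam or Calcite API, and it is not how the proposal would be implemented; the class name, the on_left/on_right methods, and the "+"/"-" changelog markers are all hypothetical names chosen for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical changelog record: ("+", row) inserts a join result,
# ("-", row) retracts a previously emitted one.
Change = Tuple[str, Tuple[str, Optional[str]]]


class LeftOuterJoinWithRetractions:
    """Toy in-memory left outer join that emits a changelog per input row."""

    def __init__(self) -> None:
        self.left: Dict[str, List[str]] = {}
        self.right: Dict[str, List[str]] = {}

    def on_left(self, key: str, row: str) -> List[Change]:
        self.left.setdefault(key, []).append(row)
        matches = self.right.get(key, [])
        if not matches:
            # No match yet: emit the null-padded result eagerly.
            return [("+", (row, None))]
        return [("+", (row, r)) for r in matches]

    def on_right(self, key: str, row: str) -> List[Change]:
        # Only the first right row for a key invalidates (left_row, None).
        first_match = key not in self.right
        self.right.setdefault(key, []).append(row)
        out: List[Change] = []
        for left_row in self.left.get(key, []):
            if first_match:
                out.append(("-", (left_row, None)))
            out.append(("+", (left_row, row)))
        return out
```

Feeding a left row for key "k" and then a right row for the same key yields an insert of (left_row, None) followed by a retraction of that tuple plus an insert of the matched pair. The point of the sketch is that each new input produces only a small delta downstream, rather than an overwrite of the whole join result.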