For the histogram-based watermark strategy, one possible solution is to still
use a stateless scalar function and keep the stateful objects directly in the
function. By doing that we will lose some information after the job gets
restarted, but I think that might be acceptable because the histogram-based
approach is an approximate algorithm after all.
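Under this suggestion the DDL itself would not change. The histogram-based strategy is just a scalar function call inside the watermark expression. The sketch below assumes a hypothetical user-defined function `HISTOGRAM_WATERMARK` that has already been registered and keeps its histogram in ordinary, non-checkpointed member fields, so it starts empty again after a restart. The table and column names are also illustrative, and connector properties are elided as in the proposal.

```sql
-- Sketch only: HISTOGRAM_WATERMARK is a hypothetical, pre-registered
-- user-defined scalar function. Its histogram of recent event times lives
-- in plain member fields, not in Flink state, so it is lost on restart.
CREATE TABLE orders (
  order_id BIGINT,
  order_time TIMESTAMP,
  WATERMARK FOR order_time AS HISTOGRAM_WATERMARK(order_time)
) WITH (
  ...
);
```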

But I agree we will run into trouble if we want accurate watermark
computation logic. In that case, I would suggest creating a dedicated
upstream job that does the watermark calculation and saves the value into a
field. Then, in the current job, we can simply reference the calculated field
and specify it as this job's watermark.
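Here is a sketch of the downstream side of this approach. It assumes the upstream job writes each record together with its precomputed watermark into a hypothetical column `computed_wm`. In the proposed syntax the watermark expression may be any expression returning a nullable BIGINT or TIMESTAMP, so it can simply reference that column. All names are illustrative.

```sql
-- Downstream job: the upstream job (not shown) is assumed to have written
-- its accurately computed watermark into computed_wm for every record.
CREATE TABLE enriched_events (
  event_id BIGINT,
  event_time TIMESTAMP,
  computed_wm TIMESTAMP,
  -- Reuse the precomputed value as this job's watermark instead of
  -- deriving it from event_time here.
  WATERMARK FOR event_time AS computed_wm
) WITH (
  ...
);
```

The design choice here is that all stateful, accurate logic stays in the upstream job. The DDL in the current job remains a plain column reference.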

Best,
Kurt


On Mon, Sep 23, 2019 at 8:49 PM Jark Wu <imj...@gmail.com> wrote:

> Hi,
>
> Thanks Fabian for your reply. I agree with your point that the
> histogram-based case needs the function to be stateful, which is not
> supported currently or in this design.
> Maybe we can support stateful scalar functions, similar to
> TableAggregateFunction.
> We can further discuss how to support this in the future.
> I added this limitation in the "Complex Watermark Strategies" section.
>
> Btw, I also updated the end of the "Implementation" section [1] to describe
> how the planner automatically applies the watermark assigner.
> This avoids requiring every TableSource to extend DefinedProctimeAttribute
> to carry time attribute information.
>
> If there is no objection, I would like to update the cwiki FLIP page and
> start a new voting process in the next few days.
>
> Best,
> Jark
>
> [1]:
> https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit#heading=h.qx7j56dotywd
>
>
> On Fri, 20 Sep 2019 at 22:18, Fabian Hueske <fhue...@gmail.com> wrote:
>
> > Hi Jark,
> >
> > Thanks for the summary!
> > I like the proposal!
> >
> > It makes it very clear that an event time attribute is an existing column
> > on which watermark metadata is defined, whereas a processing time attribute
> > is a computed field.
> >
> > I have one comment regarding the section on "Complex Watermark Strategies".
> > The proposal says that you can also use a scalar function.
> > I don't think that a "textbook" scalar function would be sufficient for
> > more advanced strategies.
> > For example, a histogram-based approach would need to remember the values
> > of the last x records.
> > The interface of a scalar function would still work for that, but it would
> > be a stateful function (which would not be OK for a scalar function).
> > I don't think it's a problem, but I wanted to mention it here.
> >
> > Best, Fabian
> >
> > On Thu, 19 Sep 2019 at 18:05, Jark Wu <imj...@gmail.com> wrote:
> >
> > > Hi everyone,
> > >
> > > Thanks all for the valuable suggestions and feedback so far.
> > > Before starting the vote, I would like to summarize the proposed DDL syntax
> > > on the mailing list.
> > >
> > > ## Rowtime Attribute (Watermark Syntax)
> > >
> > > CREATE TABLE table_name (
> > >   WATERMARK FOR <columnName> AS <watermark_strategy_expression>
> > > ) WITH (
> > >   ...
> > > )
> > >
> > > It marks an existing field <columnName> as the rowtime attribute, and the
> > > watermark is generated by the expression <watermark_strategy_expression>.
> > > <watermark_strategy_expression> can be an arbitrary expression that returns
> > > a nullable BIGINT or TIMESTAMP as the watermark value.
> > >
> > > For common cases, users can use the following expressions to define a
> > > strategy:
> > > 1. Bounded Out-of-Orderness: the strategy can be
> > >    "rowtimeField - INTERVAL 'string' timeUnit".
> > > 2. Preserve Watermark From Source: the strategy can be
> > >    "SYSTEM_WATERMARK()".
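The two common strategies might look like this in full DDL. `SYSTEM_WATERMARK()` is the name proposed in this thread and could differ in the final implementation. The table names, column names, and the 5-second bound are illustrative, and connector properties are elided as in the proposal.

```sql
-- Bounded out-of-orderness: each record yields a candidate watermark
-- 5 seconds behind its event_time (the bound is an illustrative choice).
CREATE TABLE clicks (
  user_id BIGINT,
  event_time TIMESTAMP,
  WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
  ...
);

-- Preserve the watermark already emitted by the source connector.
CREATE TABLE clicks_from_source (
  user_id BIGINT,
  event_time TIMESTAMP,
  WATERMARK FOR event_time AS SYSTEM_WATERMARK()
) WITH (
  ...
);
```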
> > >
> > > ## Proctime Attribute
> > >
> > > CREATE TABLE table_name (
> > >   ...
> > >   proc AS SYSTEM_PROCTIME()
> > > ) WITH (
> > >   ...
> > > )
> > >
> > > It uses the computed column syntax to add an additional column with the
> > > proctime attribute. Here, SYSTEM_PROCTIME() is a built-in function.
> > >
> > > For more details and the implementation, please refer to the design doc:
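The following sketch shows a complete table with a proctime computed column, plus a window query that the time attribute makes possible. `SYSTEM_PROCTIME()` is the built-in name proposed here and may change. The table and column names are illustrative. The query uses Flink SQL's existing `TUMBLE` group window.

```sql
-- proc is a computed column evaluated by SYSTEM_PROCTIME(); it is not read
-- from the source data but carries the proctime attribute.
CREATE TABLE page_views (
  user_id BIGINT,
  page_url VARCHAR,
  proc AS SYSTEM_PROCTIME()
) WITH (
  ...
);

-- Because proc is a time attribute, it can drive a group window,
-- e.g. counting page views per user in 1-minute processing-time windows.
SELECT user_id, COUNT(*) AS views
FROM page_views
GROUP BY TUMBLE(proc, INTERVAL '1' MINUTE), user_id;
```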
> > > https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit?ts=5d822dba
> > >
> > > Feel free to leave further feedback!
> > >
> > > Thanks,
> > > Jark
> > >
> > > On Thu, 19 Sep 2019 at 11:23, Kurt Young <ykt...@gmail.com> wrote:
> > >
> > > > +1 to start vote process.
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Thu, Sep 19, 2019 at 10:54 AM Jark Wu <imj...@gmail.com> wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > Thanks all for joining the discussion in the doc [1].
> > > > > It seems that the discussion has converged and there is a consensus on
> > > > > the current FLIP document.
> > > > > If there is no objection, I would like to convert it into a cwiki FLIP
> > > > > page and start the voting process.
> > > > >
> > > > > For more details, please refer to the design doc (it has changed
> > > > > slightly since the initial proposal).
> > > > >
> > > > > Thanks,
> > > > > Jark
> > > > >
> > > > > [1]:
> > > > > https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit?ts=5d8258cd
> > > > >
> > > > > On Mon, 16 Sep 2019 at 16:12, Kurt Young <ykt...@gmail.com> wrote:
> > > > >
> > > > > > After some review and discussion in the Google document, I think it's
> > > > > > time to convert this design to a cwiki FLIP page and start the voting
> > > > > > process.
> > > > > >
> > > > > > Best,
> > > > > > Kurt
> > > > > >
> > > > > >
> > > > > > On Mon, Sep 9, 2019 at 7:46 PM Jark Wu <imj...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > Thanks all for the extensive feedback received in the doc so far.
> > > > > > > I see general agreement on using computed columns to support the
> > > > > > > proctime attribute and to extract timestamps.
> > > > > > > So we will prepare a computed column FLIP and share it on the dev ML
> > > > > > > soon.
> > > > > > >
> > > > > > > Feel free to leave more comments!
> > > > > > >
> > > > > > > Best,
> > > > > > > Jark
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, 6 Sep 2019 at 13:50, Dian Fu <dian0511...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Jark,
> > > > > > > >
> > > > > > > > Thanks for bringing up this discussion and the detailed design doc.
> > > > > > > > This is definitely a critical feature for streaming SQL jobs. I have
> > > > > > > > left a few comments in the design doc.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Dian
> > > > > > > >
> > > > > > > > > On 6 Sep 2019, at 11:48 AM, Forward Xu <forwardxu...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Thanks Jark for this topic. This will be very useful.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > >
> > > > > > > > > ForwardXu
> > > > > > > > >
> > > > > > > > > On Fri, 6 Sep 2019 at 11:26 AM, Danny Chan <yuzhao....@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > >> Thanks Jark for bringing up this topic; this is definitely an
> > > > > > > > > >> important feature for SQL, especially for DDL users.
> > > > > > > > > >>
> > > > > > > > > >> I will spend some time reviewing this design doc. Many thanks.
> > > > > > > > >>
> > > > > > > > >> Best,
> > > > > > > > >> Danny Chan
> > > > > > > > > >> On 6 Sep 2019, 11:19 AM +0800, Jark Wu <imj...@gmail.com> wrote:
> > > > > > > > >>> Hi everyone,
> > > > > > > > >>>
> > > > > > > > > >>> I would like to start a discussion about how to support time
> > > > > > > > > >>> attributes in SQL DDL.
> > > > > > > > > >>> In Flink 1.9, we already introduced a basic SQL DDL to create a
> > > > > > > > > >>> table. However, it doesn't support defining time attributes. This
> > > > > > > > > >>> means users can't apply window operations on tables created by
> > > > > > > > > >>> DDL, which is a bad experience.
> > > > > > > > >>>
> > > > > > > > > >>> In FLIP-66, we propose a watermark syntax to define the rowtime
> > > > > > > > > >>> attribute and propose using the computed column syntax to define
> > > > > > > > > >>> the proctime attribute.
> > > > > > > > > >>> But computed columns are another big topic and deserve a separate
> > > > > > > > > >>> FLIP.
> > > > > > > > > >>> If we reach consensus on the computed column approach, we will
> > > > > > > > > >>> start the computed column FLIP soon.
> > > > > > > > >>>
> > > > > > > > > >>> FLIP-66:
> > > > > > > > > >>> https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit#
> > > > > > > > >>>
> > > > > > > > >>> Thanks for any feedback!
> > > > > > > > >>>
> > > > > > > > >>> Best,
> > > > > > > > >>> Jark
> > > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
