@Eron Wright <eronwri...@gmail.com>  The per-split watermarks are the
default in the new source interface (FLIP-27) and come for free if you use
the SplitReader.

Based on that, it is also possible to unsubscribe individual splits to
solve the alignment in the case where operators have multiple splits
assigned.
Piotr and I already discussed that, but concluded that the implementation
of that is largely orthogonal.

I am a bit worried, though, that if we release and advertise the alignment
without handling this case, we create a surprise for quite a few users.
While this is admittedly valuable for some users, I think we need to
position this accordingly. I would not fully advertise this before we have
the second part implemented as well.



On Mon, Jul 12, 2021 at 7:18 PM Eron Wright <ewri...@streamnative.io.invalid>
wrote:

> The notion of per-split watermarks seems quite interesting.  I think the
> idleness feature could benefit from a per-split approach too, because
> idleness is typically related to whether any splits are assigned to a given
> operator instance.
>
>
> On Mon, Jul 12, 2021 at 3:06 AM 刘建刚 <liujiangangp...@gmail.com> wrote:
>
> > +1 for the source watermark alignment.
> > In the previous flink version, the source connectors are different in
> > implementation and it is hard to make this feature. When the consumed
> data
> > is not aligned or consuming history data, it is very easy to cause the
> > unalignment. Source alignment can resolve many unstable problems.
> >
> > Seth Wiesman <sjwies...@gmail.com> 于2021年7月9日周五 下午11:25写道:
> >
> > > +1
> > >
> > > In my opinion, this limitation is perfectly fine for the MVP. Watermark
> > > alignment is a long-standing issue and this already moves the ball so
> far
> > > forward.
> > >
> > > I don't expect this will cause many issues in practice, as I understand
> > it
> > > the FileSource always processes one split at a time, and in my
> > experience,
> > > 90% of Kafka users have a small number of partitions scale their
> > pipelines
> > > to have one reader per partition. Obviously, there are larger-scale
> Kafka
> > > topics and more sources that will be ported over in the future but I
> > think
> > > there is an implicit understanding that aligning sources adds latency
> to
> > > pipelines, and we can frame the follow-up "per-split" alignment as an
> > > optimization.
> > >
> > > On Fri, Jul 9, 2021 at 6:40 AM Piotr Nowojski <
> piotr.nowoj...@gmail.com>
> > > wrote:
> > >
> > > > Hey!
> > > >
> > > > A couple of weeks ago me and Arvid Heise played around with an idea
> to
> > > > address a long standing issue of Flink: lack of watermark/event time
> > > > alignment between different parallel instances of sources, that can
> > lead
> > > to
> > > > ever growing state size for downstream operators like WindowOperator.
> > > >
> > > > We had an impression that this is relatively low hanging fruit that
> can
> > > be
> > > > quite easily implemented - at least partially (the first part
> mentioned
> > > in
> > > > the FLIP document). I have written down our proposal [1] and you can
> > also
> > > > check out our PoC that we have implemented [2].
> > > >
> > > > We think that this is a quite easy proposal, that has been in large
> > part
> > > > already implemented. There is one obvious limitation of our PoC.
> Namely
> > > we
> > > > can only easily block individual SourceOperators. This works
> perfectly
> > > fine
> > > > as long as there is at most one split per SourceOperator. However it
> > > > doesn't work with multiple splits. In that case, if a single
> > > > `SourceOperator` is responsible for processing both the least and the
> > > most
> > > > advanced splits, we won't be able to block this most advanced split
> for
> > > > generating new records. I'm proposing to solve this problem in the
> > future
> > > > in another follow up FLIP, as a solution that works with a single
> split
> > > per
> > > > operator is easier and already valuable for some of the users.
> > > >
> > > > What do you think about this proposal?
> > > > Best, Piotrek
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
> > > > [2] https://github.com/pnowojski/flink/commits/aligned-sources
> > > >
> > >
> >
>

Reply via email to