Hi,

>  I would not fully advertise this before we have the second part
implemented as well.

I'm not sure, maybe we could advertise with a big warning about this
limitation. I mean it's not as if this change would break something. At
worst it just wouldn't fully solve the problem with multiple splits per
single operator, but only limit the scope of that problem. At the same time
I don't have strong feelings about this. If the consensus would be to not
advertise it, I'm also fine with it. Only in that case we should probably
quickly follow up with the per split solution.

Anyway, thanks for voicing your support and the discussions. I'm going to
start a voting thread for this feature soon.

Best,
Piotrek

wt., 13 lip 2021 o 19:09 Stephan Ewen <se...@apache.org> napisał(a):

> @Eron Wright <eronwri...@gmail.com>  The per-split watermarks are the
> default in the new source interface (FLIP-27) and come for free if you use
> the SplitReader.
>
> Based on that, it is also possible to unsubscribe individual splits to
> solve the alignment in the case where operators have multiple splits
> assigned.
> Piotr and I already discussed that, but concluded that the implementation
> of that is largely orthogonal.
>
> I am a bit worried, though, that if we release and advertise the alignment
> without handling this case, we create a surprise for quite a few users.
> While this is admittedly valuable for some users, I think we need to
> position this accordingly. I would not fully advertise this before we have
> the second part implemented as well.
>
>
>
> On Mon, Jul 12, 2021 at 7:18 PM Eron Wright <ewri...@streamnative.io
> .invalid>
> wrote:
>
> > The notion of per-split watermarks seems quite interesting.  I think the
> > idleness feature could benefit from a per-split approach too, because
> > idleness is typically related to whether any splits are assigned to a
> given
> > operator instance.
> >
> >
> > On Mon, Jul 12, 2021 at 3:06 AM 刘建刚 <liujiangangp...@gmail.com> wrote:
> >
> > > +1 for the source watermark alignment.
> > > In the previous flink version, the source connectors are different in
> > > implementation and it is hard to make this feature. When the consumed
> > data
> > > is not aligned or consuming history data, it is very easy to cause the
> > > unalignment. Source alignment can resolve many unstable problems.
> > >
> > > Seth Wiesman <sjwies...@gmail.com> 于2021年7月9日周五 下午11:25写道:
> > >
> > > > +1
> > > >
> > > > In my opinion, this limitation is perfectly fine for the MVP.
> Watermark
> > > > alignment is a long-standing issue and this already moves the ball so
> > far
> > > > forward.
> > > >
> > > > I don't expect this will cause many issues in practice, as I
> understand
> > > it
> > > > the FileSource always processes one split at a time, and in my
> > > experience,
> > > > 90% of Kafka users have a small number of partitions scale their
> > > pipelines
> > > > to have one reader per partition. Obviously, there are larger-scale
> > Kafka
> > > > topics and more sources that will be ported over in the future but I
> > > think
> > > > there is an implicit understanding that aligning sources adds latency
> > to
> > > > pipelines, and we can frame the follow-up "per-split" alignment as an
> > > > optimization.
> > > >
> > > > On Fri, Jul 9, 2021 at 6:40 AM Piotr Nowojski <
> > piotr.nowoj...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hey!
> > > > >
> > > > > A couple of weeks ago me and Arvid Heise played around with an idea
> > to
> > > > > address a long standing issue of Flink: lack of watermark/event
> time
> > > > > alignment between different parallel instances of sources, that can
> > > lead
> > > > to
> > > > > ever growing state size for downstream operators like
> WindowOperator.
> > > > >
> > > > > We had an impression that this is relatively low hanging fruit that
> > can
> > > > be
> > > > > quite easily implemented - at least partially (the first part
> > mentioned
> > > > in
> > > > > the FLIP document). I have written down our proposal [1] and you
> can
> > > also
> > > > > check out our PoC that we have implemented [2].
> > > > >
> > > > > We think that this is a quite easy proposal, that has been in large
> > > part
> > > > > already implemented. There is one obvious limitation of our PoC.
> > Namely
> > > > we
> > > > > can only easily block individual SourceOperators. This works
> > perfectly
> > > > fine
> > > > > as long as there is at most one split per SourceOperator. However
> it
> > > > > doesn't work with multiple splits. In that case, if a single
> > > > > `SourceOperator` is responsible for processing both the least and
> the
> > > > most
> > > > > advanced splits, we won't be able to block this most advanced split
> > for
> > > > > generating new records. I'm proposing to solve this problem in the
> > > future
> > > > > in another follow up FLIP, as a solution that works with a single
> > split
> > > > per
> > > > > operator is easier and already valuable for some of the users.
> > > > >
> > > > > What do you think about this proposal?
> > > > > Best, Piotrek
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
> > > > > [2] https://github.com/pnowojski/flink/commits/aligned-sources
> > > > >
> > > >
> > >
> >
>

Reply via email to