Hey Austin,

Since we are getting deeper into the implementation details of the
DataGeneratorSource, which is not the main topic of this thread, I propose
that we move our discussion to where it belongs: [DISCUSS] FLIP-238 [1].
Could you please briefly summarize your requirements to make it easier for
others to follow? I am happy to continue this conversation there.

[1] https://lists.apache.org/thread/7gjxto1rmkpff4kl54j8nlg5db2rqhkt

Best,
Alexander Fedulov

On Tue, Jun 7, 2022 at 6:14 PM Austin Cawley-Edwards <
austin.caw...@gmail.com> wrote:

> > @Austin, in the FLIP I mentioned above [1], the user is expected to
> > pass a MapFunction<Long, OUT> to the generator. I wonder if you could
> > have your external client and polling logic wrapped in a custom
> > MapFunction implementation class? Would that answer your needs or do
> > you have some more sophisticated scenario in mind?
>
> At first glance, the FLIP looks good for this case as far as the map
> function goes, but it leaves out 1) the ability to control polling
> intervals and 2) the ability to produce an unknown number of records, both
> per poll and in terms of overall boundedness. Do you think something like
> this could be built from the same pieces?
> I'm also wondering what handles threading: is that on the user, or is it
> part of the DataGeneratorSource?
>
> Best,
> Austin
>
> On Tue, Jun 7, 2022 at 9:34 AM Alexander Fedulov <alexan...@ververica.com>
> wrote:
>
> > Hi everyone,
> >
> > Thanks for all the input and a lively discussion. It seems that there is
> > a consensus that due to the inherent complexity of FLIP-27 sources we
> > should provide more user-facing utilities to bridge the gap between the
> > existing SourceFunction-based functionality and the new APIs.
> >
> > To start addressing this I picked the issue that David raised and many
> > upvoted. Here is a proposal for the new DataGeneratorSource: FLIP-238
> > [1]. Please take a look, I am going to open a separate discussion thread
> > on it shortly.
> >
> > Jing also raised some great points regarding the interfaces and
> > subclasses. It seems to me that what might actually help is some sort of
> > a "soft deprecation" concept and annotation. It could be used in places
> > where we do not have an alternative implementation yet, but we clearly
> > want to indicate that continuing to build on top of these interfaces is
> > discouraged. The area of impact of deprecating all SourceFunction
> > subclasses is rather big, and we can expect it to take a while. The hope
> > would be that if in the meantime someone finds themselves using one of
> > these old APIs, the "soft deprecation" annotation will be a clear
> > indication and encouragement to work on introducing an alternative
> > FLIP-27-based implementation instead.
> >
> > @Austin, in the FLIP I mentioned above [1], the user is expected to
> > pass a MapFunction<Long, OUT> to the generator. I wonder if you could
> > have your external client and polling logic wrapped in a custom
> > MapFunction implementation class? Would that answer your needs or do
> > you have some more sophisticated scenario in mind?
> >
> > [1] https://cwiki.apache.org/confluence/x/9Av1D
> > Best,
> > Alexander Fedulov
> >
> > On Mon, Jun 6, 2022 at 7:08 PM Austin Cawley-Edwards <
> > austin.caw...@gmail.com> wrote:
> >
> > > Thanks for the nice discussion all.
> > >
> > > I was recently trying to implement a very simple polling source and
> > > would've loved a higher-level base to work from. I'm wondering if in
> > > addition to the data generator use cases, it would be good to support a
> > > simple non-parallel polling abstraction to make it easier to, for
> > > instance, start prototyping with data in existing APIs without adding a
> > > Kafka or such in the middle.
> > >
> > > Best,
> > > Austin
> > >
> > > On Mon, Jun 6, 2022 at 10:02 AM tison <wander4...@gmail.com> wrote:
> > >
> > > > Well. It's a bit off-topic. As for deprecating SourceFunction as the
> > > > FLIP-27 work moves ahead, +1 from my side. It's a significant step
> > > > towards the batch and streaming unification effort :)
> > > >
> > > > Best,
> > > > tison.
> > > >
> > > >
> > > > On Mon, Jun 6, 2022 at 9:54 PM tison <wander4...@gmail.com> wrote:
> > > >
> > > > > The starting point of the version bump and removal question is that
> > > > > downstream projects may have a tough time adapting to new interfaces
> > > > > while Flink stays on 1.x versions, where users may expect adaptation
> > > > > to be an easy task. From my experience, it's really challenging to
> > > > > maintain compatibility between multiple versions of Flink when
> > > > > significant changes are made within the same 1.x version series -
> > > > > users may not be aware that it's almost a major version bump.
> > > > >
> > > > > Best,
> > > > > tison.
> > > > >
> > > > >
> > > > > On Mon, Jun 6, 2022 at 9:51 PM tison <wander4...@gmail.com> wrote:
> > > > >
> > > > >> One question from my side:
> > > > >>
> > > > > >> As SourceFunction is a @Public interface, we cannot remove it before
> > > > > >> doing a major version bump (Flink 2.0).
> > > > > >>
> > > > > >> Of course it's not a blocker to make such a deprecation and let the
> > > > > >> new interface step in. My question is whether we have a plan to
> > > > > >> finally remove the deprecated interfaces, or postpone it until there
> > > > > >> is a clear plan for Flink 2.0?
> > > > >>
> > > > >> Best,
> > > > >> tison.
> > > > >>
> > > > >>
> > > > > >> On Mon, Jun 6, 2022 at 9:35 PM David Anderson <dander...@apache.org> wrote:
> > > > >>
> > > > >>> >
> > > > > >>> > David, can you elaborate why you need watermark generation in the
> > > > > >>> > source for your data generators?
> > > > >>>
> > > > >>>
> > > > > >>> The training exercises should strive to provide examples of best
> > > > > >>> practices. If the exercises and their solutions use
> > > > > >>>
> > > > > >>> env.fromSource(source, WatermarkStrategy.noWatermarks(), "name-of-source")
> > > > > >>>   .map(...)
> > > > > >>>   .assignTimestampsAndWatermarks(...)
> > > > > >>>
> > > > > >>> this will help establish this anti-pattern as the normal way of doing
> > > > > >>> things.
> > > > > >>>
> > > > > >>> Most new Flink users are using a KafkaSource with a noWatermarks
> > > > > >>> strategy and a SimpleStringSchema, followed by a map that does the
> > > > > >>> real deserialization, followed by the real watermarking -- because
> > > > > >>> they aren't seeing examples that teach how these interfaces are meant
> > > > > >>> to be used.
> > > > > >>>
> > > > > >>> When we redo the sources used in training exercises, I want to avoid
> > > > > >>> these pitfalls.
> > > > >>>
> > > > >>> David
> > > > >>>
> > > > > >>> On Mon, Jun 6, 2022 at 9:12 AM Konstantin Knauf <kna...@apache.org> wrote:
> > > > >>>
> > > > >>> > Hi everyone,
> > > > >>> >
> > > > > >>> > very interesting thread. The proposal for deprecation seems to have
> > > > > >>> > sparked a very important discussion. Do we know what users struggle
> > > > > >>> > with specifically?
> > > > > >>> >
> > > > > >>> > Speaking for myself, when I upgraded flink-faker to the new Source
> > > > > >>> > API, an unbounded version of the NumberSequenceSource would have
> > > > > >>> > been all I needed, but that's just the data generator use case. I
> > > > > >>> > think that one could be solved quite easily. David, can you
> > > > > >>> > elaborate why you need watermark generation in the source for your
> > > > > >>> > data generators?
> > > > >>> >
> > > > >>> > Cheers,
> > > > >>> >
> > > > >>> > Konstantin
> > > > >>> >
> > > > >>> >
> > > > >>> >
> > > > >>> >
> > > > >>> >
> > > > > >>> > On Sun, Jun 5, 2022 at 5:48 PM Piotr Nowojski <pnowoj...@apache.org> wrote:
> > > > >>> >
> > > > > >>> > > Also +1 to what David has written. But it doesn't mean we should be
> > > > > >>> > > waiting indefinitely to deprecate SourceFunction.
> > > > >>> > >
> > > > >>> > > Best,
> > > > >>> > > Piotrek
> > > > >>> > >
> > > > > >>> > > On Sun, Jun 5, 2022 at 4:46 PM Jark Wu <imj...@gmail.com> wrote:
> > > > >>> > >
> > > > >>> > > > +1 to David's point.
> > > > >>> > > >
> > > > > >>> > > > Usually, when we deprecate some interfaces, we should point users
> > > > > >>> > > > to the recommended alternatives.
> > > > > >>> > > > However, implementing the new Source interface for some simple
> > > > > >>> > > > scenarios is too challenging and complex.
> > > > > >>> > > > We also found it isn't easy to push the internal connector to
> > > > > >>> > > > upgrade to the new Source because
> > > > > >>> > > > "FLIP-27 are hard to understand, while SourceFunction is easy".
> > > > > >>> > > >
> > > > > >>> > > > +1 to make implementing a simple Source easier before deprecating
> > > > > >>> > > > SourceFunction.
> > > > >>> > > >
> > > > >>> > > > Best,
> > > > >>> > > > Jark
> > > > >>> > > >
> > > > >>> > > >
> > > > > >>> > > > On Sun, 5 Jun 2022 at 07:29, Jingsong Lee <lzljs3620...@apache.org> wrote:
> > > > >>> > > >
> > > > >>> > > > > +1 to David and Ingo.
> > > > >>> > > > >
> > > > > >>> > > > > Before deprecating and removing SourceFunction, we should have
> > > > > >>> > > > > some easier APIs to wrap the new Source; the cost of writing a
> > > > > >>> > > > > new Source is too high now.
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > >
> > > > > >>> > > > > On Sun, Jun 5, 2022 at 5:32 AM Ingo Bürk <airbla...@apache.org> wrote:
> > > > >>> > > > >
> > > > > >>> > > > > > I +1 everything David said. The new Source API raised
> > > > > >>> > > > > > the complexity significantly. It's great to have such a
> > > > > >>> > > > > > rich, powerful API that can do everything, but in the
> > > > > >>> > > > > > process we lost the ability to onboard people to the
> > > > > >>> > > > > > APIs.
> > > > >>> > > > > >
> > > > >>> > > > > >
> > > > >>> > > > > > Best
> > > > >>> > > > > > Ingo
> > > > >>> > > > > >
> > > > >>> > > > > > On 04.06.22 21:21, David Anderson wrote:
> > > > > >>> > > > > > > I'm in favor of this, but I think we need to make it
> > > > > >>> > > > > > > easier to implement data generators and test sources.
> > > > > >>> > > > > > > As things stand in 1.15, unless you can be satisfied
> > > > > >>> > > > > > > with using a NumberSequenceSource followed by a map,
> > > > > >>> > > > > > > things get quite complicated. I looked into reworking
> > > > > >>> > > > > > > the data generators used in the training exercises,
> > > > > >>> > > > > > > and got discouraged by the amount of work involved.
> > > > > >>> > > > > > > (The sources used in the training want to be
> > > > > >>> > > > > > > unbounded, and need watermarking in the sources, which
> > > > > >>> > > > > > > means that using NumberSequenceSource isn't an
> > > > > >>> > > > > > > option.)
> > > > > >>> > > > > > >
> > > > > >>> > > > > > > I think the proposed deprecation will be better
> > > > > >>> > > > > > > received if it can be accompanied by something that
> > > > > >>> > > > > > > makes implementing a simple Source easier than it is
> > > > > >>> > > > > > > now. People are continuing to implement new
> > > > > >>> > > > > > > SourceFunctions because the interfaces defined by
> > > > > >>> > > > > > > FLIP-27 are hard to understand, while SourceFunction
> > > > > >>> > > > > > > is easy. Alex, I believe you were looking into
> > > > > >>> > > > > > > implementing an easier-to-use building block that
> > > > > >>> > > > > > > could be used in situations like this. Can we get
> > > > > >>> > > > > > > something like that in place first?
> > > > >>> > > > > > >
> > > > >>> > > > > > > David
> > > > >>> > > > > > >
> > > > > >>> > > > > > > On Fri, Jun 3, 2022 at 4:52 PM Jing Ge <j...@ververica.com> wrote:
> > > > >>> > > > > > >
> > > > >>> > > > > > >> Hi,
> > > > >>> > > > > > >>
> > > > >>> > > > > > >> Thanks Alex for driving this!
> > > > >>> > > > > > >>
> > > > > >>> > > > > > >> +1. To give Flink developers, especially connector
> > > > > >>> > > > > > >> developers, a clear signal that the new Source API is
> > > > > >>> > > > > > >> the recommended one according to FLIP-27, we should
> > > > > >>> > > > > > >> mark the old interfaces as deprecated.
> > > > >>> > > > > > >>
> > > > >>> > > > > > >> There are some open questions to discuss:
> > > > >>> > > > > > >>
> > > > > >>> > > > > > >> 1. Do we need to mark all subinterfaces/subclasses as
> > > > > >>> > > > > > >> deprecated, e.g. FromElementsFunction, etc.? There are
> > > > > >>> > > > > > >> many. What are the replacements?
> > > > > >>> > > > > > >> 2. Do we need to mark all subclasses that have a
> > > > > >>> > > > > > >> replacement as deprecated? E.g.
> > > > > >>> > > > > > >> ExternallyInducedSource, whose replacement class, if I
> > > > > >>> > > > > > >> am not mistaken, ExternallyInducedSourceReader, is
> > > > > >>> > > > > > >> @Experimental.
> > > > > >>> > > > > > >> 3. Do we need to mark all related test utility classes
> > > > > >>> > > > > > >> as deprecated?
> > > > >>> > > > > > >>
> > > > > >>> > > > > > >> I think it might make sense to create an umbrella
> > > > > >>> > > > > > >> ticket to cover all of these with the following
> > > > > >>> > > > > > >> process:
> > > > >>> > > > > > >>
> > > > > >>> > > > > > >> 1. Mark SourceFunction as deprecated asap.
> > > > > >>> > > > > > >> 2. Mark subinterfaces and subclasses as deprecated if
> > > > > >>> > > > > > >> there are graduated replacements. A good example is
> > > > > >>> > > > > > >> KafkaSource, which replaced KafkaConsumer, which has
> > > > > >>> > > > > > >> been marked as deprecated.
> > > > > >>> > > > > > >> 3. Do not mark subinterfaces and subclasses as
> > > > > >>> > > > > > >> deprecated if the replacement classes are still
> > > > > >>> > > > > > >> experimental; check if it is time to graduate them.
> > > > > >>> > > > > > >> After graduation, go to step 2. It might take a while
> > > > > >>> > > > > > >> for graduation.
> > > > > >>> > > > > > >> 4. Do not mark subinterfaces and subclasses as
> > > > > >>> > > > > > >> deprecated if the replacement classes are experimental
> > > > > >>> > > > > > >> and are too young to graduate. We have to wait. But in
> > > > > >>> > > > > > >> this case we could create new tickets under the
> > > > > >>> > > > > > >> umbrella ticket.
> > > > > >>> > > > > > >> 5. Do not mark subinterfaces and subclasses as
> > > > > >>> > > > > > >> deprecated if there is no replacement at all. We have
> > > > > >>> > > > > > >> to create new tickets and wait until the new
> > > > > >>> > > > > > >> implementation has been done and graduated. It will
> > > > > >>> > > > > > >> take a longer time, roughly 1.5 years.
> > > > > >>> > > > > > >> 6. For test classes, we could follow the same rule.
> > > > > >>> > > > > > >> But I think for some cases, we could consider doing
> > > > > >>> > > > > > >> the replacement directly without going through the
> > > > > >>> > > > > > >> deprecation phase.
> > > > >>> > > > > > >>
> > > > > >>> > > > > > >> Looking at all of these together, we can see that it
> > > > > >>> > > > > > >> is a big epic (even bigger than an epic). It needs
> > > > > >>> > > > > > >> someone to drive it and stay focused on it
> > > > > >>> > > > > > >> continuously, with support from the community, and to
> > > > > >>> > > > > > >> push the development towards the new Source API of
> > > > > >>> > > > > > >> FLIP-27.
> > > > >>> > > > > > >>
> > > > > >>> > > > > > >> If we can reach consensus on this, Alex and I could
> > > > > >>> > > > > > >> create the umbrella ticket to kick it off.
> > > > >>> > > > > > >>
> > > > >>> > > > > > >> Best regards,
> > > > >>> > > > > > >> Jing
> > > > >>> > > > > > >>
> > > > >>> > > > > > >>
> > > > > >>> > > > > > >> On Fri, Jun 3, 2022 at 3:54 PM Alexander Fedulov <alexan...@ververica.com> wrote:
> > > > >>> > > > > > >>
> > > > >>> > > > > > >>> Hi everyone,
> > > > >>> > > > > > >>>
> > > > > >>> > > > > > >>> I would like to start the discussion about marking
> > > > > >>> > > > > > >>> SourceFunction-based interfaces as deprecated. With
> > > > > >>> > > > > > >>> the FLIP-27 APIs becoming the new standard, the old
> > > > > >>> > > > > > >>> ones have to be eventually phased out. Although this
> > > > > >>> > > > > > >>> state is well known within the community and no new
> > > > > >>> > > > > > >>> connectors based on the old interfaces can be
> > > > > >>> > > > > > >>> accepted into the project, the footprint of
> > > > > >>> > > > > > >>> SourceFunction in the user code still keeps growing
> > > > > >>> > > > > > >>> (primarily for data generators and test utilities).
> > > > > >>> > > > > > >>> I believe it is best to mark SourceFunction as
> > > > > >>> > > > > > >>> deprecated as soon as possible. What do you think?
> > > > >>> > > > > > >>>
> > > > >>> > > > > > >>> Best,
> > > > >>> > > > > > >>> Alexander Fedulov
> > > > >>> > > > > > >>>
> > > > >>> > > > > > >>
> > > > >>> > > > > > >
> > > > >>> > > > > >
> > > > >>> > > > >
> > > > >>> > > >
> > > > >>> > >
> > > > >>> >
> > > > >>> >
> > > > >>> > --
> > > > >>> > https://twitter.com/snntrable
> > > > >>> > https://github.com/knaufk
> > > > >>> >
> > > > >>>
> > > > >>
> > > >
> > >
> >
>
