> @Austin, in the FLIP I mentioned above [1], the user is expected to
pass a MapFunction<Long,
OUT>
to the generator. I wonder if you could have your external client and
polling logic wrapped in a custom
MapFunction implementation class? Would that answer your needs or do you
have some
more sophisticated scenario in mind?

At first glance, the FLIP looks good but for this case in regards to the
map function, but leaves out 1) ability to control polling intervals and 2)
ability to produce an unknown number of records, both per-poll and overall
boundedness. Do you think something like this could be built from the same
pieces?
I'm also wondering what handles threading, is that on the user or is that
part of the DataGeneratorSource?

Best,
Austin

On Tue, Jun 7, 2022 at 9:34 AM Alexander Fedulov <alexan...@ververica.com>
wrote:

> Hi everyone,
>
> Thanks for all the input and a lively discussion. It seems that there is a
> consensus that due to
> the inherent complexity of FLIP-27 sources we should provide more
> user-facing utilities to bridge
> the gap between the existing SourceFunction-based functionality and the new
> APIs.
>
> To start addressing this I picked the issue that David raised and many
> upvoted. Here is a proposal
> for  the new DataGeneratorSource: FLIP-238 [1]. Please take a look, I am
> going to open a separate
> discussion thread on it shortly.
>
> Jing also raised some great points regarding the interfaces and subclasses.
> It seems to me that
> what might actually help is some sort of a "soft deprecation" concept and
> annotation. It could be
> used in places where we do not have an alternative implementation yet, but
> we clearly want
> to indicate that continuing to build on top of these interfaces is
> discouraged. The area of
> impact of deprecating all SourceFunction subclasses is rather big, and we
> can expect it to
> take a while. The hope would be that if in the meantime someone finds
> themselves using one of
> such old APIs, the "soft deprecation" annotation will be a clear indication
> and encouragement to
> work on introducing an alternative FLIP-27-based implementation instead.
>
> @Austin, in the FLIP I mentioned above [1], the user is expected to
> pass a MapFunction<Long,
> OUT>
> to the generator. I wonder if you could have your external client and
> polling logic wrapped in a custom
> MapFunction implementation class? Would that answer your needs or do you
> have some
> more sophisticated scenario in mind?
>
> [1] https://cwiki.apache.org/confluence/x/9Av1D
> Best,
> Alexander Fedulov
>
> On Mon, Jun 6, 2022 at 7:08 PM Austin Cawley-Edwards <
> austin.caw...@gmail.com> wrote:
>
> > Thanks for the nice discussion all.
> >
> > I was recently trying to implement a very simple polling source and
> > would've loved a higher-level base to work from. I'm wondering if in
> > addition to the data generator use cases, it would be good to support a
> > simple non-parallel polling abstraction to make it easier to, for
> instance,
> > start prototyping with data in existing APIs without adding a Kafka or
> such
> > in the middle.
> >
> > Best,
> > Austin
> >
> > On Mon, Jun 6, 2022 at 10:02 AM tison <wander4...@gmail.com> wrote:
> >
> > > Well. It's a bit off-topic. For deprecating SourceFunction as FLIP-27
> > > series works go ahead, +1 from my side. It's a significant work towards
> > the
> > > unification of batch and streaming effort :)
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > tison <wander4...@gmail.com> 于2022年6月6日周一 21:54写道:
> > >
> > > > The starting point of the version bump and removal question is that
> > > > downstream projects may experience a tough time to adapt new
> interfaces
> > > > while Flink keeps in 1.x versions so that users may expect it as an
> > easy
> > > > task. From my experience, it's really challenge to maintain
> > > > compatibility between multiple versions of Flink while significant
> > > changes
> > > > made but sharing 1.x version series - users may not be aware that
> it's
> > > > almost a major version bump.
> > > >
> > > > Best,
> > > > tison.
> > > >
> > > >
> > > > tison <wander4...@gmail.com> 于2022年6月6日周一 21:51写道:
> > > >
> > > >> One question from my side:
> > > >>
> > > >> As SourceFunction a @Public interface, we cannot remove it before
> > doing
> > > >> a major version bump (Flink 2.0).
> > > >>
> > > >> Of course it's not a blocker to make such deprecation and let the
> new
> > > >> interface step in. My question is whether we have a plan to finally
> > > remove
> > > >> the deprecated interfaces, or postpone it until a clear plan of
> Flink
> > > 2.0?
> > > >>
> > > >> Best,
> > > >> tison.
> > > >>
> > > >>
> > > >> David Anderson <dander...@apache.org> 于2022年6月6日周一 21:35写道:
> > > >>
> > > >>> >
> > > >>> > David, can you elaborate why you need watermark generation in the
> > > >>> source
> > > >>> > for your data generators?
> > > >>>
> > > >>>
> > > >>> The training exercises should strive to provide examples of best
> > > >>> practices.
> > > >>> If the exercises and their solutions use
> > > >>>
> > > >>> env.fromSource(source, WatermarkStrategy.noWatermarks(),
> > > >>> "name-of-source")
> > > >>>   .map(...)
> > > >>>   .assignTimestampsAndWatermarks(...)
> > > >>>
> > > >>> this will help establish this anti-pattern as the normal way of
> doing
> > > >>> things.
> > > >>>
> > > >>> Most new Flink users are using a KafkaSource with a noWatermarks
> > > strategy
> > > >>> and a SimpleStringSchema, followed by a map that does the real
> > > >>> deserialization, followed by the real watermarking -- because they
> > > aren't
> > > >>> seeing examples that teach how these interfaces are meant to be
> used.
> > > >>>
> > > >>> When we redo the sources used in training exercises, I want to
> avoid
> > > >>> these
> > > >>> pitfalls.
> > > >>>
> > > >>> David
> > > >>>
> > > >>> On Mon, Jun 6, 2022 at 9:12 AM Konstantin Knauf <kna...@apache.org
> >
> > > >>> wrote:
> > > >>>
> > > >>> > Hi everyone,
> > > >>> >
> > > >>> > very interesting thread. The proposal for deprecation seems to
> have
> > > >>> sparked
> > > >>> > a very important discussion. Do we what users struggle with
> > > >>> specifically?
> > > >>> >
> > > >>> > Speaking for myself, when I upgrade flink-faker to the new Source
> > API
> > > >>> an
> > > >>> > unbounded version of the NumberSequenceSource would have been
> all I
> > > >>> needed,
> > > >>> > but that's just the data generator use case. I think, that one
> > could
> > > be
> > > >>> > solved quite easily. David, can you elaborate why you need
> > watermark
> > > >>> > generation in the source for your data generators?
> > > >>> >
> > > >>> > Cheers,
> > > >>> >
> > > >>> > Konstantin
> > > >>> >
> > > >>> >
> > > >>> >
> > > >>> >
> > > >>> >
> > > >>> > Am So., 5. Juni 2022 um 17:48 Uhr schrieb Piotr Nowojski <
> > > >>> > pnowoj...@apache.org>:
> > > >>> >
> > > >>> > > Also +1 to what David has written. But it doesn't mean we
> should
> > be
> > > >>> > waiting
> > > >>> > > indefinitely to deprecate SourceFunction.
> > > >>> > >
> > > >>> > > Best,
> > > >>> > > Piotrek
> > > >>> > >
> > > >>> > > niedz., 5 cze 2022 o 16:46 Jark Wu <imj...@gmail.com>
> > napisał(a):
> > > >>> > >
> > > >>> > > > +1 to David's point.
> > > >>> > > >
> > > >>> > > > Usually, when we deprecate some interfaces, we should point
> > users
> > > >>> to
> > > >>> > use
> > > >>> > > > the recommended alternatives.
> > > >>> > > > However, implementing the new Source interface for some
> simple
> > > >>> > scenarios
> > > >>> > > is
> > > >>> > > > too challenging and complex.
> > > >>> > > > We also found it isn't easy to push the internal connector to
> > > >>> upgrade
> > > >>> > to
> > > >>> > > > the new Source because
> > > >>> > > > "FLIP-27 are hard to understand, while SourceFunction is
> easy".
> > > >>> > > >
> > > >>> > > > +1 to make implementing a simple Source easier before
> > deprecating
> > > >>> > > > SourceFunction.
> > > >>> > > >
> > > >>> > > > Best,
> > > >>> > > > Jark
> > > >>> > > >
> > > >>> > > >
> > > >>> > > > On Sun, 5 Jun 2022 at 07:29, Jingsong Lee <
> > > lzljs3620...@apache.org
> > > >>> >
> > > >>> > > wrote:
> > > >>> > > >
> > > >>> > > > > +1 to David and Ingo.
> > > >>> > > > >
> > > >>> > > > > Before deprecate and remove SourceFunction, we should have
> > some
> > > >>> > easier
> > > >>> > > > APIs
> > > >>> > > > > to wrap new Source, the cost to write a new Source is too
> > high
> > > >>> now.
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > > Ingo Bürk <airbla...@apache.org>于2022年6月5日 周日05:32写道:
> > > >>> > > > >
> > > >>> > > > > > I +1 everything David said. The new Source API raised the
> > > >>> > complexity
> > > >>> > > > > > significantly. It's great to have such a rich, powerful
> API
> > > >>> that
> > > >>> > can
> > > >>> > > do
> > > >>> > > > > > everything, but in the process we lost the ability to
> > onboard
> > > >>> > people
> > > >>> > > to
> > > >>> > > > > > the APIs.
> > > >>> > > > > >
> > > >>> > > > > >
> > > >>> > > > > > Best
> > > >>> > > > > > Ingo
> > > >>> > > > > >
> > > >>> > > > > > On 04.06.22 21:21, David Anderson wrote:
> > > >>> > > > > > > I'm in favor of this, but I think we need to make it
> > easier
> > > >>> to
> > > >>> > > > > implement
> > > >>> > > > > > > data generators and test sources. As things stand in
> > 1.15,
> > > >>> unless
> > > >>> > > you
> > > >>> > > > > can
> > > >>> > > > > > > be satisfied with using a NumberSequenceSource followed
> > by
> > > a
> > > >>> map,
> > > >>> > > > > things
> > > >>> > > > > > > get quite complicated. I looked into reworking the data
> > > >>> > generators
> > > >>> > > > used
> > > >>> > > > > > in
> > > >>> > > > > > > the training exercises, and got discouraged by the
> amount
> > > of
> > > >>> work
> > > >>> > > > > > involved.
> > > >>> > > > > > > (The sources used in the training want to be unbounded,
> > and
> > > >>> need
> > > >>> > > > > > > watermarking in the sources, which means that using
> > > >>> > > > > NumberSequenceSource
> > > >>> > > > > > > isn't an option.)
> > > >>> > > > > > >
> > > >>> > > > > > > I think the proposed deprecation will be better
> received
> > if
> > > >>> it
> > > >>> > can
> > > >>> > > be
> > > >>> > > > > > > accompanied by something that makes implementing a
> simple
> > > >>> Source
> > > >>> > > > easier
> > > >>> > > > > > > than it is now. People are continuing to implement new
> > > >>> > > > SourceFunctions
> > > >>> > > > > > > because the interfaces defined by FLIP-27 are hard to
> > > >>> understand,
> > > >>> > > > while
> > > >>> > > > > > > SourceFunction is easy. Alex, I believe you were
> looking
> > > into
> > > >>> > > > > > implementing
> > > >>> > > > > > > an easier-to-use building block that could be used in
> > > >>> situations
> > > >>> > > like
> > > >>> > > > > > this.
> > > >>> > > > > > > Can we get something like that in place first?
> > > >>> > > > > > >
> > > >>> > > > > > > David
> > > >>> > > > > > >
> > > >>> > > > > > > On Fri, Jun 3, 2022 at 4:52 PM Jing Ge <
> > j...@ververica.com
> > > >
> > > >>> > wrote:
> > > >>> > > > > > >
> > > >>> > > > > > >> Hi,
> > > >>> > > > > > >>
> > > >>> > > > > > >> Thanks Alex for driving this!
> > > >>> > > > > > >>
> > > >>> > > > > > >> +1 To give the Flink developers, especially Connector
> > > >>> developers
> > > >>> > > the
> > > >>> > > > > > clear
> > > >>> > > > > > >> signal that the new Source API is recommended
> according
> > to
> > > >>> > > FLIP-27,
> > > >>> > > > we
> > > >>> > > > > > >> should mark them as deprecated.
> > > >>> > > > > > >>
> > > >>> > > > > > >> There are some open questions to discuss:
> > > >>> > > > > > >>
> > > >>> > > > > > >> 1. Do we need to mark all subinterfaces/subclasses as
> > > >>> > deprecated?
> > > >>> > > > e.g.
> > > >>> > > > > > >> FromElementsFunction, etc. there are many. What are
> the
> > > >>> > > > replacements?
> > > >>> > > > > > >> 2. Do we need to mark all subclasses that have
> > replacement
> > > >>> as
> > > >>> > > > > > deprecated?
> > > >>> > > > > > >> e.g. ExternallyInducedSource whose replacement class,
> > if I
> > > >>> am
> > > >>> > not
> > > >>> > > > > > mistaken,
> > > >>> > > > > > >> ExternallyInducedSourceReader is @Experimental
> > > >>> > > > > > >> 3. Do we need to mark all related test utility classes
> > as
> > > >>> > > > deprecated?
> > > >>> > > > > > >>
> > > >>> > > > > > >> I think it might make sense to create an umbrella
> ticket
> > > to
> > > >>> > cover
> > > >>> > > > all
> > > >>> > > > > of
> > > >>> > > > > > >> these with the following process:
> > > >>> > > > > > >>
> > > >>> > > > > > >> 1. Mark SourceFunction as deprecated asap.
> > > >>> > > > > > >> 2. Mark subinterfaces and subclasses as deprecated, if
> > > >>> there are
> > > >>> > > > > > graduated
> > > >>> > > > > > >> replacements. Good example is that KafkaSource
> replaced
> > > >>> > > > KafkaConsumer
> > > >>> > > > > > which
> > > >>> > > > > > >> has been marked as deprecated.
> > > >>> > > > > > >> 3. Do not mark subinterfaces and subclasses as
> > deprecated,
> > > >>> if
> > > >>> > > > > > replacement
> > > >>> > > > > > >> classes are still experimental, check if it is time to
> > > >>> graduate
> > > >>> > > > them.
> > > >>> > > > > > After
> > > >>> > > > > > >> graduation, go to step 2. It might take a while for
> > > >>> graduation.
> > > >>> > > > > > >> 4. Do not mark subinterfaces and subclasses as
> > deprecated,
> > > >>> if
> > > >>> > the
> > > >>> > > > > > >> replacement classes are experimental and are too young
> > to
> > > >>> > > graduate.
> > > >>> > > > We
> > > >>> > > > > > have
> > > >>> > > > > > >> to wait. But in this case we could create new tickets
> > > under
> > > >>> the
> > > >>> > > > > umbrella
> > > >>> > > > > > >> ticket.
> > > >>> > > > > > >> 5. Do not mark subinterfaces and subclasses as
> > deprecated,
> > > >>> if
> > > >>> > > there
> > > >>> > > > is
> > > >>> > > > > > no
> > > >>> > > > > > >> replacement at all. We have to create new tickets and
> > wait
> > > >>> until
> > > >>> > > the
> > > >>> > > > > new
> > > >>> > > > > > >> implementation has been done and graduated. It will
> > take a
> > > >>> > longer
> > > >>> > > > > time,
> > > >>> > > > > > >> roughly 1,5 years.
> > > >>> > > > > > >> 6. For test classes, we could follow the same rule.
> But
> > I
> > > >>> think
> > > >>> > > for
> > > >>> > > > > some
> > > >>> > > > > > >> cases, we could consider doing the replacement
> directly
> > > >>> without
> > > >>> > > > going
> > > >>> > > > > > >> through the deprecation phase.
> > > >>> > > > > > >>
> > > >>> > > > > > >> When we look back on all of these, we can realize it
> is
> > a
> > > >>> big
> > > >>> > epic
> > > >>> > > > > (even
> > > >>> > > > > > >> bigger than an epic). It needs someone to drive it and
> > > keep
> > > >>> > focus
> > > >>> > > on
> > > >>> > > > > it
> > > >>> > > > > > >> continuously with support from the community and push
> > the
> > > >>> > > > development
> > > >>> > > > > > >> towards the new Source API of FLIP-27.
> > > >>> > > > > > >>
> > > >>> > > > > > >> If we could have consensus for this,  Alex and I could
> > > >>> create
> > > >>> > the
> > > >>> > > > > > umbrella
> > > >>> > > > > > >> ticket to kick it off.
> > > >>> > > > > > >>
> > > >>> > > > > > >> Best regards,
> > > >>> > > > > > >> Jing
> > > >>> > > > > > >>
> > > >>> > > > > > >>
> > > >>> > > > > > >> On Fri, Jun 3, 2022 at 3:54 PM Alexander Fedulov <
> > > >>> > > > > > alexan...@ververica.com>
> > > >>> > > > > > >> wrote:
> > > >>> > > > > > >>
> > > >>> > > > > > >>> Hi everyone,
> > > >>> > > > > > >>>
> > > >>> > > > > > >>> I would like to start the discussion about marking
> > > >>> > > > > SourceFunction-based
> > > >>> > > > > > >>> interfaces as deprecated. With the FLIP-27 APIs
> > becoming
> > > >>> the
> > > >>> > new
> > > >>> > > > > > >> standard,
> > > >>> > > > > > >>> the old ones have to be eventually phased out.
> Although
> > > >>> this
> > > >>> > > state
> > > >>> > > > is
> > > >>> > > > > > >> well
> > > >>> > > > > > >>> known within the community and no new connectors
> based
> > on
> > > >>> the
> > > >>> > old
> > > >>> > > > > > >>> interfaces can be accepted into the project, the
> > > footprint
> > > >>> of
> > > >>> > > > > > >>> SourceFunction in the user code still keeps growing
> > > >>> (primarily
> > > >>> > > for
> > > >>> > > > > data
> > > >>> > > > > > >>> generators and test utilities). I believe it is best
> to
> > > >>> mark
> > > >>> > > > > > >> SourceFunction
> > > >>> > > > > > >>> as deprecated as soon as possible. What do you think?
> > > >>> > > > > > >>>
> > > >>> > > > > > >>> Best,
> > > >>> > > > > > >>> Alexander Fedulov
> > > >>> > > > > > >>>
> > > >>> > > > > > >>
> > > >>> > > > > > >
> > > >>> > > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>> >
> > > >>> > --
> > > >>> > https://twitter.com/snntrable
> > > >>> > https://github.com/knaufk
> > > >>> >
> > > >>>
> > > >>
> > >
> >
>

Reply via email to