Hi all,

My name is Kyle and I’m an open source developer primarily focused on
Apache Iceberg.

I’m happy to help clarify or elaborate on any aspect of our experience
maintaining a popular downstream connector that is relatively decoupled
from Flink.

I’d also love to be able to contribute or assist in any way I can.

I don’t mean to thread-jack, but are there any meetings or community
sync-ups, specifically around the connector APIs, that I might join or be
invited to?

I did want to add that even though I’ve experienced some of the pain points
of integrating with an evolving system / API (catalog support is still
fairly new across this space), I also personally agree that you shouldn’t
slow down development velocity too much for the sake of external
connectors. Getting to a performant and stable place should be the primary
goal, and slowing that down to support stragglers will (in my personal
opinion) always be a losing game. Some folks will simply stay behind on
versions regardless until they have to upgrade.

I am working on ensuring that the Iceberg community stays within 1-2
versions of Flink, so that we can provide more feedback and contribute
things that improve our ability to support multiple Flink runtimes /
versions from one project / codebase with minimal to no reflection (our
desired goal).
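To illustrate what I mean by reflection-based shims (the thing we’d like to
avoid needing), here is a minimal sketch of the kind of runtime probing
that multi-version connector codebases can end up with. The class below is
illustrative only and deliberately probes a JDK class rather than any real
Flink API:

```java
import java.lang.reflect.Method;

// Illustrative shim only: the idea is that a single connector artifact can
// probe at runtime for an API method that exists in one Flink version but
// not another, and branch accordingly. The probe below runs against a JDK
// class purely to demonstrate the mechanism.
public class VersionShim {

    /** Returns true if {@code clazz} exposes a public method with the given name. */
    public static boolean hasMethod(Class<?> clazz, String name) {
        for (Method m : clazz.getMethods()) {
            if (m.getName().equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // prints true: every String has a public length() method
        System.out.println(hasMethod(String.class, "length"));
        // prints false: no such method exists
        System.out.println(hasMethod(String.class, "definitelyNotAMethod"));
    }
}
```

In practice the probed class / method would be a Flink API that changed
between versions, and every such probe is one more thing that silently
breaks when APIs move, which is exactly why we’d rather stay current and
need no reflection at all.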

If there’s anything I can do or any way I can be of assistance, please
don’t hesitate to reach out. Or find me on ASF slack 😀

I greatly appreciate your general concern for the needs of downstream
connector integrators!

Cheers
Kyle Bendickson (GitHub: kbendick)
Open Source Developer
kyle [at] tabular [dot] io

On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise <t...@apache.org> wrote:

> Hi,
>
> I see the stable core Flink API as a prerequisite for modularity. And
> for connectors it is not just the source and sink API (source being
> stable as of 1.14), but everything that is required to build and
> maintain a connector downstream, such as the test utilities and
> infrastructure.
>
> Without the stable surface of core Flink, changes will leak into
> downstream dependencies and force lock step updates. Refactoring
> across N repos is more painful than a single repo. Those with
> experience developing downstream of Flink will know the pain, and that
> isn't limited to connectors. I don't remember a Flink "minor version"
> update that was just a dependency version change and did not force
> other downstream changes.
>
> Imagine a project with a complex set of dependencies. Let's say Flink
> version A plus Flink reliant dependencies released by other projects
> (Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want a
> situation where we bump the core Flink version to B and things fall
> apart (interface changes, utilities that were useful but not public,
> transitive dependencies etc.).
>
> The discussion here also highlights the benefits of keeping certain
> connectors outside Flink, whether that is due to differences in
> developer community, maturity of the connectors, their
> specialized/limited usage, etc. I would like to see that as a sign of a
> growing ecosystem, and most of the ideas that Arvid has put forward
> would benefit further growth of the connector ecosystem.
>
> As for keeping connectors within Apache Flink: I prefer that as the
> path forward for "essential" connectors like FileSource, KafkaSource,
> ... And we can still achieve a more flexible and faster release cycle.
>
> Thanks,
> Thomas
>
>
>
>
>
> On Wed, Oct 20, 2021 at 3:32 AM Jark Wu <imj...@gmail.com> wrote:
> >
> > Hi Konstantin,
> >
> > > the connectors need to be adapted and require at least one release
> > > per Flink minor release.
> > However, this will make connector releases slower, e.g. maintaining
> > features for multiple branches and releasing from multiple branches.
> > I thought the main purpose of having an external connector repository
> > was to have "faster releases of connectors"?
> >
> >
> > From the perspective of CDC connector maintainers, the biggest
> > advantage of maintaining it outside of the Flink project is that:
> > 1) we can have a more flexible and faster release cycle
> > 2) we can be more liberal with committership for connector maintainers,
> > which can also attract more committers to help with releases.
> >
> > Personally, I think maintaining one connector repository under the ASF
> > may not have the above benefits.
> >
> > Best,
> > Jark
> >
> > On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf <kna...@apache.org> wrote:
> >
> > > Hi everyone,
> > >
> > > regarding the stability of the APIs. I think everyone agrees that
> > > connector APIs which are stable across minor versions (1.13 -> 1.14)
> > > are the mid-term goal. But:
> > >
> > > a) These APIs are still quite young, and we shouldn't make them @Public
> > > prematurely either.
> > >
> > > b) Isn't this *mostly* orthogonal to where the connector code lives?
> > > Yes, as long as there are breaking changes, the connectors need to be
> > > adapted and require at least one release per Flink minor release.
> > > Documentation-wise this can be addressed via a compatibility matrix
> > > for each connector as Arvid suggested. IMO we shouldn't block this
> > > effort on the stability of the APIs.
> > >
> > > Cheers,
> > >
> > > Konstantin
> > >
> > >
> > >
> > > On Wed, Oct 20, 2021 at 8:56 AM Jark Wu <imj...@gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> I think Thomas raised very good questions, and I would like to hear
> > >> your opinions on them if we want to move connectors out of Flink in
> > >> this version.
> > >>
> > >> (1) Is the connector API already stable?
> > >> > Separate releases would only make sense if the core Flink surface is
> > >> > fairly stable though. As evident from Iceberg (and also Beam), that's
> > >> > not the case currently. We should probably focus on addressing the
> > >> > stability first, before splitting code. A success criteria could be
> > >> > that we are able to build Iceberg and Beam against multiple Flink
> > >> > versions w/o the need to change code. The goal would be that no
> > >> > connector breaks when we make changes to Flink core. Until that's the
> > >> > case, code separation creates a setup where 1+1 or N+1 repositories
> > >> > need to move lock step.
> > >>
> > >> From another discussion thread [1], the connector API is far from
> > >> stable. Currently, it's hard to build connectors against multiple
> > >> Flink versions. There are breaking API changes both in 1.12 -> 1.13
> > >> and in 1.13 -> 1.14, and maybe also in future versions, because
> > >> Table-related APIs are still @PublicEvolving and the new Sink API is
> > >> still @Experimental.
> > >>
> > >>
> > >> (2) Flink testability without connectors.
> > >> > Flink w/o Kafka connector (and few others) isn't
> > >> > viable. Testability of Flink was already brought up, can we really
> > >> > certify a Flink core release without Kafka connector? Maybe those
> > >> > connectors that are used in Flink e2e tests to validate functionality
> > >> > of core Flink should not be broken out?
> > >>
> > >> This is a very good question. How can we guarantee the new Source and
> > >> Sink APIs are stable with only test implementations?
> > >>
> > >>
> > >> Best,
> > >> Jark
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler <ches...@apache.org>
> > >> wrote:
> > >>
> > >> > Could you clarify what release cadence you're thinking of? There's
> > >> > quite a big range that fits "more frequent than Flink" (per-commit,
> > >> > daily, weekly, bi-weekly, monthly, even bi-monthly).
> > >> >
> > >> > On 19/10/2021 14:15, Martijn Visser wrote:
> > >> > > Hi all,
> > >> > >
> > >> > > I think it would be a huge benefit if we can achieve more frequent
> > >> > > releases of connectors, which are not bound to the release cycle of
> > >> > > Flink itself. I agree that in order to get there, we need to have
> > >> > > stable interfaces which are trustworthy and reliable, so they can
> > >> > > be safely used by those connectors. I do think that work still
> > >> > > needs to be done on those interfaces, but I am confident that we
> > >> > > can get there from a Flink perspective.
> > >> > >
> > >> > > I am worried that we would not be able to achieve those frequent
> > >> > > releases of connectors if we are putting these connectors under the
> > >> > > Apache umbrella, because that means that for each connector release
> > >> > > we have to follow the Apache release creation process. This
> > >> > > requires a lot of manual steps and prohibits automation, and I
> > >> > > think it would be hard to scale out frequent releases of
> > >> > > connectors. I'm curious how others think this challenge could be
> > >> > > solved.
> > >> > >
> > >> > > Best regards,
> > >> > >
> > >> > > Martijn
> > >> > >
> > >> > > On Mon, 18 Oct 2021 at 22:22, Thomas Weise <t...@apache.org> wrote:
> > >> > >
> > >> > >> Thanks for initiating this discussion.
> > >> > >>
> > >> > >> There are definitely a few things that are not optimal with our
> > >> > >> current management of connectors. I would not necessarily
> > >> > >> characterize it as a "mess" though. As the points raised so far
> > >> > >> show, it isn't easy to find a solution that balances competing
> > >> > >> requirements and leads to a net improvement.
> > >> > >>
> > >> > >> It would be great if we can find a setup that allows for
> > >> > >> connectors to be released independently of core Flink and that
> > >> > >> each connector can be released separately. Flink already has
> > >> > >> separate releases (flink-shaded), so that by itself isn't a new
> > >> > >> thing. Per-connector releases would need to allow for more
> > >> > >> frequent releases (without the baggage that a full Flink release
> > >> > >> comes with).
> > >> > >>
> > >> > >> Separate releases would only make sense if the core Flink surface
> > >> > >> is fairly stable though. As evident from Iceberg (and also Beam),
> > >> > >> that's not the case currently. We should probably focus on
> > >> > >> addressing the stability first, before splitting code. A success
> > >> > >> criteria could be that we are able to build Iceberg and Beam
> > >> > >> against multiple Flink versions w/o the need to change code. The
> > >> > >> goal would be that no connector breaks when we make changes to
> > >> > >> Flink core. Until that's the case, code separation creates a
> > >> > >> setup where 1+1 or N+1 repositories need to move lock step.
> > >> > >>
> > >> > >> Regarding some connectors being more important for Flink than
> > >> > >> others: That's a fact. Flink w/o Kafka connector (and few others)
> > >> > >> isn't viable. Testability of Flink was already brought up, can we
> > >> > >> really certify a Flink core release without Kafka connector?
> > >> > >> Maybe those connectors that are used in Flink e2e tests to
> > >> > >> validate functionality of core Flink should not be broken out?
> > >> > >>
> > >> > >> Finally, I think that the connectors that move into separate
> > >> > >> repos should remain part of the Apache Flink project. Larger
> > >> > >> organizations tend to approve the use of and contribution to open
> > >> > >> source at the project level. Sometimes it is everything ASF. More
> > >> > >> often it is "Apache Foo". It would be fatal to end up with a
> > >> > >> patchwork of projects with potentially different licenses and
> > >> > >> governance to arrive at a working Flink setup. This may mean we
> > >> > >> prioritize usability over developer convenience, if that's in the
> > >> > >> best interest of Flink as a whole.
> > >> > >>
> > >> > >> Thanks,
> > >> > >> Thomas
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >> On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler
> > >> > >> <ches...@apache.org> wrote:
> > >> > >>> Generally, the issues are reproducibility and control.
> > >> > >>>
> > >> > >>> Stuff's completely broken on the Flink side for a week? Well
> > >> > >>> then so are the connector repos.
> > >> > >>> (As-is) You can't go back to a previous version of the snapshot.
> > >> > >>> Which also means that checking out older commits can be
> > >> > >>> problematic, because you'd still work against the latest
> > >> > >>> snapshots, and they may not be compatible with each other.
> > >> > >>>
> > >> > >>>
> > >> > >>> On 18/10/2021 15:22, Arvid Heise wrote:
> > >> > >>>> I was actually betting on snapshot versions. What are the
> > >> > >>>> limits? Obviously, we can only do a release of a 1.15 connector
> > >> > >>>> after 1.15 is released.
> > >> > >>>
> > >> >
> > >> >
> > >>
> > >
> > >
> > > --
> > >
> > > Konstantin Knauf
> > >
> > > https://twitter.com/snntrable
> > >
> > > https://github.com/knaufk
> > >
>
