Hi all,

My name is Kyle, and I'm an open source developer primarily focused on Apache Iceberg.
I'm happy to help clarify or elaborate on any aspect of our experience working on a relatively decoupled connector that is downstream and pretty popular. I'd also love to be able to contribute or assist in any way I can. I don't mean to thread-jack, but are there any meetings or community sync-ups, specifically around the connector APIs, that I might join or be invited to?

I did want to add that even though I've experienced some of the pain points of integrating with an evolving system / API (catalog support is, generally speaking, pretty new everywhere in this space), I personally agree that you shouldn't slow down development velocity too much for the sake of external connectors. Getting to a performant and stable place should be the primary goal, and slowing that down to support stragglers will (in my opinion) always be a losing game. Some folks will simply stay behind on versions regardless, until they have to upgrade.

I am working on ensuring that the Iceberg community stays within 1-2 versions of Flink, so that we can help provide more feedback or contribute things that might improve our ability to support multiple Flink runtimes / versions from one project / codebase with minimal to no reflection (our desired goal).

If there's anything I can do or any way I can be of assistance, please don't hesitate to reach out. Or find me on ASF Slack 😀 I greatly appreciate your general concern for the needs of downstream connector integrators!

Cheers,
Kyle Bendickson (GitHub: kbendick)
Open Source Developer
kyle [at] tabular [dot] io

On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise <t...@apache.org> wrote:
> Hi,
>
> I see the stable core Flink API as a prerequisite for modularity. And for connectors it is not just the source and sink API (the source being stable as of 1.14), but everything that is required to build and maintain a connector downstream, such as the test utilities and infrastructure.
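Kyle's goal of supporting multiple Flink runtimes / versions from one codebase with minimal reflection can be sketched as a small version shim that routes to per-version adapters. Everything here is hypothetical illustration, not real Iceberg or Flink API: the class and adapter names are invented, and a real connector would return adapter instances compiled in version-specific source sets rather than name strings.

```java
import java.util.Map;

// Hypothetical sketch: a connector supporting several Flink minor versions
// from one codebase picks a version-specific adapter once, up front, so no
// reflection is needed on the hot path. Names are illustrative only.
public final class FlinkVersionShim {

    /** Extracts "major.minor" from a full version like "1.14.3" or "1.15-SNAPSHOT". */
    static String majorMinor(String fullVersion) {
        String[] parts = fullVersion.split("[.-]");
        return parts[0] + "." + parts[1];
    }

    // One adapter per supported Flink minor version; in a real project these
    // would live in separate source sets compiled against that Flink version.
    private static final Map<String, String> ADAPTERS = Map.of(
            "1.13", "SinkAdapterV113",
            "1.14", "SinkAdapterV114");

    static String adapterFor(String flinkVersion) {
        String adapter = ADAPTERS.get(majorMinor(flinkVersion));
        if (adapter == null) {
            throw new IllegalStateException("Unsupported Flink version: " + flinkVersion);
        }
        return adapter;
    }

    public static void main(String[] args) {
        System.out.println(adapterFor("1.14.3"));   // SinkAdapterV114
        System.out.println(adapterFor("1.13.6"));   // SinkAdapterV113
    }
}
```

The design point is that version dispatch happens exactly once, at startup, and unsupported combinations fail fast instead of breaking at some later call site.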
> Without the stable surface of core Flink, changes will leak into downstream dependencies and force lock-step updates. Refactoring across N repos is more painful than in a single repo. Those with experience developing downstream of Flink will know the pain, and that isn't limited to connectors. I don't remember a Flink "minor version" update that was just a dependency version change and did not force other downstream changes.
>
> Imagine a project with a complex set of dependencies. Let's say Flink version A plus Flink-reliant dependencies released by other projects (Flink-external connectors, Beam, Iceberg, Hudi, ...). We don't want a situation where we bump the core Flink version to B and things fall apart (interface changes, utilities that were useful but not public, transitive dependencies, etc.).
>
> The discussion here also highlights the benefits of keeping certain connectors outside Flink, whether that is due to differences in developer community, maturity of the connectors, their specialized/limited usage, etc. I would like to see that as a sign of a growing ecosystem, and most of the ideas that Arvid has put forward would benefit further growth of the connector ecosystem.
>
> As for keeping connectors within Apache Flink: I prefer that as the path forward for "essential" connectors like FileSource, KafkaSource, ... And we can still achieve a more flexible and faster release cycle.
>
> Thanks,
> Thomas
>
> On Wed, Oct 20, 2021 at 3:32 AM Jark Wu <imj...@gmail.com> wrote:
> >
> > Hi Konstantin,
> >
> > > the connectors need to be adapted and require at least one release per Flink minor release.
> >
> > However, this will make the releases of connectors slower, e.g. maintaining features across multiple branches and releasing from multiple branches. I think the main purpose of having an external connector repository is to have "faster releases of connectors"?
> >
> > From the perspective of the CDC connector maintainers, the biggest advantage of maintaining it outside of the Flink project is that:
> > 1) we can have a more flexible and faster release cycle
> > 2) we can be more liberal with committership for connector maintainers, which can also attract more committers to help with releases.
> >
> > Personally, I think maintaining one connector repository under the ASF may not have the above benefits.
> >
> > Best,
> > Jark
> >
> > On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf <kna...@apache.org> wrote:
> > >
> > > Hi everyone,
> > >
> > > Regarding the stability of the APIs: I think everyone agrees that connector APIs which are stable across minor versions (1.13 -> 1.14) are the mid-term goal. But:
> > >
> > > a) These APIs are still quite young, and we shouldn't make them @Public prematurely either.
> > >
> > > b) Isn't this *mostly* orthogonal to where the connector code lives? Yes, as long as there are breaking changes, the connectors need to be adapted and require at least one release per Flink minor release. Documentation-wise this can be addressed via a compatibility matrix for each connector, as Arvid suggested. IMO we shouldn't block this effort on the stability of the APIs.
> > >
> > > Cheers,
> > >
> > > Konstantin
> > >
> > > On Wed, Oct 20, 2021 at 8:56 AM Jark Wu <imj...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I think Thomas raised very good questions and would like to know your opinions if we want to move connectors out of Flink in this version.
> > > >
> > > > (1) Is the connector API already stable?
> > > >
> > > > > Separate releases would only make sense if the core Flink surface is fairly stable though. As evident from Iceberg (and also Beam), that's not the case currently. We should probably focus on addressing the stability first, before splitting code.
> > > > > A success criterion could be that we are able to build Iceberg and Beam against multiple Flink versions w/o the need to change code. The goal would be that no connector breaks when we make changes to Flink core. Until that's the case, code separation creates a setup where 1+1 or N+1 repositories need to move in lock step.
> > > >
> > > > From another discussion thread [1], the connector API is far from stable. Currently, it's hard to build connectors against multiple Flink versions. There are breaking API changes both in 1.12 -> 1.13 and in 1.13 -> 1.14, and maybe also in future versions, because the Table-related APIs are still @PublicEvolving and the new Sink API is still @Experimental.
> > > >
> > > > (2) Flink testability without connectors.
> > > >
> > > > > Flink w/o the Kafka connector (and a few others) isn't viable. Testability of Flink was already brought up: can we really certify a Flink core release without the Kafka connector? Maybe those connectors that are used in Flink e2e tests to validate functionality of core Flink should not be broken out?
> > > >
> > > > This is a very good question. How can we guarantee the new Source and Sink APIs are stable with only test implementations?
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler <ches...@apache.org> wrote:
> > > >
> > > > > Could you clarify what release cadence you're thinking of? There's quite a big range that fits "more frequent than Flink" (per-commit, daily, weekly, bi-weekly, monthly, even bi-monthly).
> > > > >
> > > > > On 19/10/2021 14:15, Martijn Visser wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > I think it would be a huge benefit if we can achieve more frequent releases of connectors, which are not bound to the release cycle of Flink itself.
> > > > > > I agree that in order to get there, we need to have stable interfaces which are trustworthy and reliable, so that they can be safely used by those connectors. I do think that work still needs to be done on those interfaces, but I am confident that we can get there from a Flink perspective.
> > > > > >
> > > > > > I am worried that we would not be able to achieve those frequent releases of connectors if we put these connectors under the Apache umbrella, because that means that for each connector release we have to follow the Apache release creation process. This requires a lot of manual steps, prohibits automation, and I think it would be hard to scale out frequent releases of connectors. I'm curious how others think this challenge could be solved.
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Martijn
> > > > > >
> > > > > > On Mon, 18 Oct 2021 at 22:22, Thomas Weise <t...@apache.org> wrote:
> > > > > >
> > > > > > > Thanks for initiating this discussion.
> > > > > > >
> > > > > > > There are definitely a few things that are not optimal with our current management of connectors. I would not necessarily characterize it as a "mess" though. As the points raised so far show, it isn't easy to find a solution that balances competing requirements and leads to a net improvement.
> > > > > > >
> > > > > > > It would be great if we can find a setup that allows connectors to be released independently of core Flink, and each connector to be released separately. Flink already has separate releases (flink-shaded), so that by itself isn't a new thing.
> > > > > > > Per-connector releases would need to allow for more frequent releases (without the baggage that a full Flink release comes with).
> > > > > > >
> > > > > > > Separate releases would only make sense if the core Flink surface is fairly stable though. As evident from Iceberg (and also Beam), that's not the case currently. We should probably focus on addressing the stability first, before splitting code. A success criterion could be that we are able to build Iceberg and Beam against multiple Flink versions w/o the need to change code. The goal would be that no connector breaks when we make changes to Flink core. Until that's the case, code separation creates a setup where 1+1 or N+1 repositories need to move in lock step.
> > > > > > >
> > > > > > > Regarding some connectors being more important for Flink than others: that's a fact. Flink w/o the Kafka connector (and a few others) isn't viable. Testability of Flink was already brought up: can we really certify a Flink core release without the Kafka connector? Maybe those connectors that are used in Flink e2e tests to validate functionality of core Flink should not be broken out?
> > > > > > >
> > > > > > > Finally, I think that the connectors that move into separate repos should remain part of the Apache Flink project. Larger organizations tend to approve the use of and contribution to open source at the project level. Sometimes it is everything ASF; more often it is "Apache Foo". It would be fatal to end up with a patchwork of projects with potentially different licenses and governance to arrive at a working Flink setup. This may mean we prioritize usability over developer convenience, if that's in the best interest of Flink as a whole.
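The "fairly stable surface" criterion above ties back to Jark's point about @PublicEvolving and @Experimental: Flink encodes its compatibility promises as annotations on API classes. The real annotations live in Flink's org.apache.flink.annotation package; the sketch below defines a stand-in annotation of the same name purely so the snippet compiles without Flink on the classpath, and the annotated class is hypothetical.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Stand-in demonstration of how stability annotations gate compatibility
// guarantees. Not Flink's actual annotation; a real connector would check
// the docs/annotation of each API it touches before relying on it.
public class StabilityDemo {

    @Retention(RetentionPolicy.RUNTIME)
    @interface PublicEvolving {}  // stand-in: API may break across minor releases (1.13 -> 1.14)

    @PublicEvolving
    static class TableSinkApi {}  // hypothetical API class a connector depends on

    public static void main(String[] args) {
        boolean evolving = TableSinkApi.class.isAnnotationPresent(PublicEvolving.class);
        System.out.println(evolving ? "may break across minor releases" : "stable");
    }
}
```

A downstream connector that only depends on @Public API could in principle compile unchanged against several Flink minors; any @PublicEvolving or @Experimental dependency is where the lock-step pain described in this thread comes from.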
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Thomas
> > > > > > >
> > > > > > > On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler <ches...@apache.org> wrote:
> > > > > > >
> > > > > > > > Generally, the issues are reproducibility and control.
> > > > > > > >
> > > > > > > > Stuff's completely broken on the Flink side for a week? Well, then so are the connector repos.
> > > > > > > >
> > > > > > > > (As-is) You can't go back to a previous version of the snapshot. Which also means that checking out older commits can be problematic, because you'd still work against the latest snapshots, and they may not be compatible with each other.
> > > > > > > >
> > > > > > > > On 18/10/2021 15:22, Arvid Heise wrote:
> > > > > > > > > I was actually betting on snapshot versions. What are the limits? Obviously, we can only do a release of a 1.15 connector after 1.15 is released.
>
> > >
> > > --
> > > Konstantin Knauf
> > > https://twitter.com/snntrable
> > > https://github.com/knaufk
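Chesnay's reproducibility concern reduces to a simple invariant: a build of a given commit is only reproducible if none of its dependency coordinates are `-SNAPSHOT` versions, because a Maven snapshot artifact is republished over time and two resolutions of the same coordinate can yield different bits. A minimal sketch of that invariant (the helper is illustrative, not part of any real build tool):

```java
import java.util.List;

// Illustrative check: flags a dependency set as non-reproducible if any
// version is a Maven -SNAPSHOT (a moving target re-resolved on each build).
public final class SnapshotCheck {

    static boolean isReproducible(List<String> dependencyVersions) {
        return dependencyVersions.stream().noneMatch(v -> v.endsWith("-SNAPSHOT"));
    }

    public static void main(String[] args) {
        // A connector repo pinned to a released Flink: same inputs every build.
        System.out.println(isReproducible(List.of("1.14.0")));          // true
        // Tracking Flink master: yesterday's checkout may not build today.
        System.out.println(isReproducible(List.of("1.15-SNAPSHOT")));   // false
    }
}
```

This is why checking out an older connector commit against snapshot dependencies is problematic, as Chesnay notes: the commit is fixed, but the snapshot it resolves is not.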