Hi Brian,

Thank you for sharing. I think your approach is sound and in line with what
I had in mind.

> Basically, the Pravega community aligns the connector releases with the
> Pravega mainline release
>
This would certainly mean that there is little value in coupling connector
versions, so it makes a good case for having separate connector repos.


> and maintains the connector with the latest 3 Flink versions (CI will
> publish snapshots for all 3 branches)
>
I'd like to give connector devs a simple way to express which Flink
versions the current branch is compatible with. From there, we can generate
the compatibility matrix automatically and optionally also create separate
releases per supported Flink version. I'm not sure whether the latter is
indeed better than having just one artifact that happens to run with
multiple Flink versions. I guess it depends on which dependencies we are
exposing. If the connector uses flink-connector-base, then we probably need
separate artifacts with separate POMs anyway.
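
To make this concrete, here is a minimal sketch of how such a declaration
could look. The annotation and its name are purely hypothetical; they just
illustrate the idea of machine-readable compatibility metadata per branch:

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    /**
     * Hypothetical marker that release tooling could scan to generate the
     * compatibility matrix and, optionally, to cut one release per
     * supported Flink version.
     */
    @Target(ElementType.TYPE)
    @Retention(RetentionPolicy.RUNTIME)
    public @interface SupportedFlinkVersions {
        String[] value(); // e.g. {"1.12", "1.13", "1.14"}
    }

The same information could just as well live in a properties file; the
point is only that it is declared once per branch and consumed by tooling.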

Best,

Arvid

On Fri, Nov 19, 2021 at 10:55 AM Zhou, Brian <b.z...@dell.com> wrote:

> Hi Arvid,
>
> For the branching model, I would like to share some experience from the
> Pravega Flink connector. Here [1][2] are the compatibility matrix and the
> wiki explaining the branching model and releases. Basically, the Pravega
> community aligns the connector releases with the Pravega mainline release
> and maintains the connector with the latest 3 Flink versions (CI will
> publish snapshots for all 3 branches).
> For example, we recently had the 0.10.1 release [3], for which we needed
> to upload three artifacts (for Flink 1.13, 1.12, and 1.11) to Maven
> Central [4].
>
> There are some alternatives. Another solution that we once discussed but
> finally abandoned is to have an independent version, just like the
> current CDC connector, and then give users a big compatibility matrix.
> We thought this would become too confusing as the connector evolves.
> Conversely, we could also go the opposite way: align with the Flink
> version and maintain several branches for the different system versions.
>
> I would say this is only a fairly-OK solution, because it is a bit
> painful for maintainers: cherry-picks are very common and releases
> require much work. However, if neither system has nice backward
> compatibility, there seems to be no comfortable solution for their
> connector.
>
> [1] https://github.com/pravega/flink-connectors#compatibility-matrix
> [2]
> https://github.com/pravega/flink-connectors/wiki/Versioning-strategy-for-Flink-connector
> [3] https://github.com/pravega/flink-connectors/releases/tag/v0.10.1
> [4] https://search.maven.org/search?q=pravega-connectors-flink
>
> Best Regards,
> Brian
>
> -----Original Message-----
> From: Arvid Heise <ar...@apache.org>
> Sent: Friday, November 19, 2021 4:12 PM
> To: dev
> Subject: Re: [DISCUSS] Creating an external connector repository
>
> Hi everyone,
>
> we are currently in the process of setting up the flink-connectors repo
> [1] for new connectors, but we hit a wall that we currently cannot get
> past: the branching model.
> To reiterate the original motivation of the external connector repo: we
> want to decouple the release cycle of the connectors from Flink. However,
> if we want to support semantic versioning in the connectors, with the
> ability to introduce breaking changes through major version bumps and to
> support bugfixes on old versions, then we need release branches similar
> to how Flink core operates.
> Consider two connectors; let's call them kafka and hbase. We have kafka
> in versions 1.0.X, 1.1.Y (small improvement), and 2.0.Z (a config option
> change), and hbase only on 1.0.A.
>
> Now our current assumption was that we can work with a mono-repo under
> the ASF (flink-connectors). Then, for release branches, we found 3
> options:
> 1. We could create some ugly mess with the cross product of connector
> and version: kafka-release-1.0, kafka-release-1.1, kafka-release-2.0,
> hbase-release-1.0. The main issue is not the number of branches (that's
> something that git can handle) but that the state of kafka is undefined
> in hbase-release-1.0. That's a recipe for disaster and makes releasing
> connectors very cumbersome (CI would only execute and publish hbase
> SNAPSHOT artifacts on hbase-release-1.0).
> 2. We could avoid the undefined state by having an empty master, where
> each release branch really only holds the code of one connector. But
> that's also not great: any user that looks at the repo and sees no
> connector would assume that it's dead.
> 3. We could have synced releases similar to the CDC connectors [2]. That
> means that if any connector introduces a breaking change, all connectors
> get a new major version. I find it quite confusing for a user if hbase
> gets a new release without any change just because kafka introduced a
> breaking change.
>
> To fully decouple the release cycles and CI of the connectors, we could
> add individual repositories under the ASF (flink-connector-kafka,
> flink-connector-hbase). Then we can apply the same branching model as
> before. I quickly checked whether there are precedents for that approach
> in the Apache community, and just by scanning alphabetically I found
> cordova and couchdb with 70 and 77 Apache repos, respectively. So other
> projects have certainly approached our problem in that way, and the
> Apache organization is okay with it. I currently expect at most 20
> additional repos for connectors and, in the future, at most 10 each for
> formats and filesystems if we also move them out at some point in time.
> So we would be at a total of roughly 50 repos.
>
> Note that for all options, we need to provide a compatibility matrix,
> which we aim to autogenerate.
>
> Now for the potential downsides that we discussed internally:
> - How can we ensure common infrastructure code, utilities, and quality?
> I propose to add a flink-connector-common that contains all these things
> and is added as a git submodule/subtree to the repos.
> - Do we implicitly discourage connector developers from maintaining more
> than one connector by fragmenting the code base?
> That is certainly a risk. However, I currently see few devs working on
> more than one connector anyway, and it may actually help to keep the
> devs that maintain a specific connector on the hook. We could use GitHub
> issues to track bugs and feature requests, and a dev can focus their
> limited time on getting that one connector right.
>
> So WDYT? Compared to some intermediate suggestions with split repos, the
> big difference is that everything remains under the Apache umbrella and
> within the Flink community.
>
> [1] https://github.com/apache/flink-connectors
> [2] https://github.com/ververica/flink-cdc-connectors/
>
> On Fri, Nov 12, 2021 at 3:39 PM Arvid Heise <ar...@apache.org> wrote:
>
> > Hi everyone,
> >
> > I created the flink-connectors repo [1] to advance the topic. We would
> > create a proof-of-concept in the next few weeks as a special branch
> > that I'd then use for discussions. If the community agrees with the
> > approach, that special branch will become the master. If not, we can
> > iterate on it or create competing POCs.
> >
> > If someone wants to try things out in parallel, just make sure that
> > you are not accidentally pushing POCs to the master.
> >
> > As a reminder: We will not move out any current connector from Flink
> > at this point in time, so everything in Flink will remain as is and be
> > maintained there.
> >
> > Best,
> >
> > Arvid
> >
> > [1] https://github.com/apache/flink-connectors
> >
> > On Fri, Oct 29, 2021 at 6:57 PM Till Rohrmann <trohrm...@apache.org>
> > wrote:
> >
> >> Hi everyone,
> >>
> >> From the discussion, it seems to me that we have different opinions on
> >> whether to have an ASF umbrella repository or to host the connectors
> >> outside of the ASF. It also seems that this is not really the problem
> >> to solve.
> >> Since there are many good arguments for either approach, we could
> >> simply start with an ASF umbrella repository and see how people adopt
> >> it. If the individual connectors cannot move fast enough or if people
> >> prefer to not buy into the more heavy-weight ASF processes, then they
> >> can host the code also somewhere else. We simply need to make sure
> >> that these connectors are discoverable (e.g. via flink-packages).
> >>
> >> The more important problem seems to be to provide common tooling
> >> (testing, infrastructure, documentation) that can easily be reused.
> >> Similarly, it has become clear that the Flink community needs to
> >> improve on providing stable APIs. I think it is not realistic to
> >> first complete these tasks before starting to move connectors to
> >> dedicated repositories. As Stephan said, creating a connector
> >> repository will force us to pay more attention to API stability and
> >> also to think about which testing tools are required. Hence, I
> >> believe that starting to add connectors to a different repository
> >> than apache/flink will help improve our connector tooling (declaring
> >> testing classes as public, creating a common test utility repo,
> >> creating a repo
> >> template) and vice versa. That is why I like Arvid's proposed process:
> >> it will start kicking things off w/o letting this effort fizzle out.
> >>
> >> Cheers,
> >> Till
> >>
> >> On Thu, Oct 28, 2021 at 11:44 AM Stephan Ewen <se...@apache.org> wrote:
> >>
> >> > Thank you all, for the nice discussion!
> >> >
> >> > From my point of view, I very much like the idea of putting
> >> > connectors in a separate repository. But I would argue it should be
> >> > part of Apache Flink, similar to flink-statefun, flink-ml, etc.
> >> >
> >> > I share many of the reasons for that:
> >> >   - As argued many times, it reduces the complexity of the Flink
> >> > repo, improves CI response times, etc.
> >> >   - Much lower barrier of contribution, because an unstable
> >> > connector would not de-stabilize the whole build. Of course, we
> >> > would need to make sure we set this up the right way, with
> >> > connectors having individual CI runs, build status, etc. But it
> >> > certainly seems possible.
> >> >
> >> >
> >> > I would argue some points a bit differently than the cases made
> >> > before:
> >> >
> >> > (a) I believe the separation would increase connector stability,
> >> > because it really forces us to work with the connectors against the
> >> > APIs like any external developer. A mono-repo is somehow the wrong
> >> > setup if you actually want to guarantee stable internal APIs at some
> >> > layer, because it makes it easy to just change something on both
> >> > sides of the API (provider and consumer) seamlessly.
> >> >
> >> > Major refactorings in Flink would need to keep all connector API
> >> > contracts intact, or we would need a new version of the connector
> >> > API.
> >> >
> >> > (b) We may even be able to move towards more lightweight and
> >> > automated releases over time, even if we stay in Apache Flink with
> >> > that repo. This isn't fully aligned with the Apache release policies
> >> > yet, but there are board discussions about whether there can be
> >> > bot-triggered releases (by dependabot) and how that could fit into
> >> > the Apache process.
> >> >
> >> > This doesn't seem to be quite there just yet, but seeing those
> >> > discussions start is a good sign, and there is a good chance we can
> >> > do some things there. I am not sure whether we should let bots
> >> > trigger releases, though, because a final human look at things isn't
> >> > a bad thing, especially given the popularity of software supply
> >> > chain attacks recently.
> >> >
> >> >
> >> > I do share Chesnay's concerns about complexity in tooling, though,
> >> > both release tooling and test tooling. It is not incompatible with
> >> > that approach, but it is a task we need to tackle during this
> >> > change, which will add additional work.
> >> >
> >> >
> >> >
> >> > On Tue, Oct 26, 2021 at 10:31 AM Arvid Heise <ar...@apache.org>
> wrote:
> >> >
> >> > > Hi folks,
> >> > >
> >> > > I think some questions came up, and I'd like to address the
> >> > > question of timing.
> >> > >
> >> > > > Could you clarify what release cadence you're thinking of?
> >> > > > There's quite a big range that fits "more frequent than Flink"
> >> > > > (per-commit, daily, weekly, bi-weekly, monthly, even bi-monthly).
> >> > >
> >> > > The short answer is: as often as needed.
> >> > > - If there is a CVE in a dependency and we need to bump it,
> >> > > release immediately.
> >> > > - If there is a new feature merged, release soonish. We may
> >> > > collect a few successive features before a release.
> >> > > - If there is a bugfix, release immediately or soonish, depending
> >> > > on the severity and whether there are workarounds available.
> >> > >
> >> > > We should not limit ourselves; the whole idea of independent
> >> > > releases is exactly that you release as needed. There is no
> >> > > release planning or anything needed; you just go with a release as
> >> > > if it were an external artifact.
> >> > >
> >> > > > (1) Is the connector API already stable?
> >> > > > From another discussion thread [1], the connector API is far
> >> > > > from stable. Currently, it's hard to build connectors against
> >> > > > multiple Flink versions. There are breaking API changes both in
> >> > > > 1.12 -> 1.13 and 1.13 -> 1.14, and maybe also in future
> >> > > > versions, because Table-related APIs are still @PublicEvolving
> >> > > > and the new Sink API is still @Experimental.
> >> > > >
> >> > >
> >> > > The question is: what is stable in an evolving system? We
> >> > > recently discovered that the old SourceFunction needed to be
> >> > > refined such that cancellation works correctly [1]. That interface
> >> > > has been in Flink for 7 years and is heavily used outside as well,
> >> > > and we still had to change the contract in a way that I'd expect
> >> > > any implementer to recheck their implementation. It might not be
> >> > > necessary to change anything, and you can probably change the code
> >> > > once for all Flink versions, but still, the interface was not
> >> > > stable in the strictest sense.
> >> > >
> >> > > If we focus just on API changes to the unified interfaces, then
> >> > > we expect one more change to the Sink API to support compaction.
> >> > > For the Table API, there will most likely also be some changes in
> >> > > 1.15. So we could wait for 1.15.
> >> > >
> >> > > But I'm questioning whether that's really necessary, because we
> >> > > will add more functionality beyond 1.15 without breaking the API.
> >> > > For example, we may add more unified connector metrics. If you
> >> > > want to use them in your connector, you have to support multiple
> >> > > Flink versions anyhow. So rather than focusing the discussion on
> >> > > "when is stuff stable", I'd rather focus on "how can we support
> >> > > building connectors against multiple Flink versions" and make it
> >> > > as painless as possible.
> >> > >
> >> > > Chesnay suggested using different branches for different Flink
> >> > > versions, which sounds like a good idea. With a mono-repo, we
> >> > > can't use branches any other way anyway (there is no way to have
> >> > > release branches per connector without chaos). In these branches,
> >> > > we could provide shims that simulate future features in older
> >> > > Flink versions, such that the source code of a specific connector
> >> > > does not diverge (much) across branches. For example, to register
> >> > > unified connector metrics, we could simulate the current approach
> >> > > in some utility package of the mono-repo.
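> >> > >
> >> > > To sketch what such a shim could look like (all class and method
> >> > > names here are hypothetical; only MetricGroup and Counter are
> >> > > actual Flink API):
> >> > >
> >> > >     import org.apache.flink.metrics.Counter;
> >> > >     import org.apache.flink.metrics.MetricGroup;
> >> > >
> >> > >     // Hypothetical per-branch utility: connector code calls this
> >> > >     // one method on every branch; only the implementation differs
> >> > >     // per release branch.
> >> > >     public final class MetricShims {
> >> > >         private MetricShims() {}
> >> > >
> >> > >         // On a branch targeting a newer Flink, this could delegate
> >> > >         // to the unified connector metrics; on an older branch it
> >> > >         // falls back to a plain counter, as sketched here.
> >> > >         public static Counter numRecordsIn(MetricGroup group) {
> >> > >             return group.counter("numRecordsIn");
> >> > >         }
> >> > >     }
> >> > >
> >> > > Connector code would then compile unchanged on every branch.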
> >> > >
> >> > > > I see the stable core Flink API as a prerequisite for
> >> > > > modularity. And for connectors it is not just the source and
> >> > > > sink API (source being stable as of 1.14), but everything that
> >> > > > is required to build and maintain a connector downstream, such
> >> > > > as the test utilities and infrastructure.
> >> > > >
> >> > >
> >> > > That is a very fair point. I'm actually surprised to see that
> >> > > MiniClusterWithClientResource is not public. I see it being used
> >> > > in all connectors, especially outside of Flink. I fear that as
> >> > > long as we do not have connectors outside, we will not properly
> >> > > annotate and maintain these utilities, in a classic
> >> > > chicken-and-egg problem. I will outline an idea at the end.
> >> > >
> >> > > > the connectors need to be adopted and require at least one
> >> > > > release per Flink minor release.
> >> > > > However, this will make the releases of connectors slower, e.g.
> >> > > > maintaining features for multiple branches and releasing
> >> > > > multiple branches. I think the main purpose of having an
> >> > > > external connector repository is to have "faster releases of
> >> > > > connectors"?
> >> > > >
> >> > >
> >> > > > Imagine a project with a complex set of dependencies. Let's say
> >> > > > Flink version A plus Flink-reliant dependencies released by
> >> > > > other projects (Flink-external connectors, Beam, Iceberg, Hudi,
> >> > > > ...). We don't want a situation where we bump the core Flink
> >> > > > version to B and things fall apart (interface changes, utilities
> >> > > > that were useful but not public, transitive dependencies, etc.).
> >> > > >
> >> > >
> >> > > Yes, that's why I wanted to automate the processes more, which is
> >> > > not that easy under the ASF. Maybe we automate the source
> >> > > provision across supported versions and have one vote thread for
> >> > > all versions of a connector?
> >> > >
> >> > > > From the perspective of the CDC connector maintainers, the
> >> > > > biggest advantage of maintaining it outside of the Flink project
> >> > > > is that:
> >> > > > 1) we can have a more flexible and faster release cycle
> >> > > > 2) we can be more liberal with committership for connector
> >> > > > maintainers, which can also attract more committers to help with
> >> > > > the releases.
> >> > > >
> >> > > > Personally, I think maintaining one connector repository under
> >> > > > the ASF may not have the above benefits.
> >> > > >
> >> > >
> >> > > Yes, I also feel that the ASF is too restrictive for our needs.
> >> > > But it feels like there are too many that see it differently, and
> >> > > I think we need to respect that.
> >> > >
> >> > > > (2) Flink testability without connectors.
> >> > > > This is a very good question. How can we guarantee the new
> >> > > > Source and Sink APIs are stable with only test implementations?
> >> > > >
> >> > >
> >> > > We can't and shouldn't. Since the connector repo is managed by
> >> > > Flink, a Flink release manager needs to check whether the Flink
> >> > > connectors actually work prior to creating an RC. That's similar
> >> > > to how flink-shaded and flink core are related.
> >> > >
> >> > >
> >> > > So here is one idea that I had to get things rolling. We would
> >> > > address the external repo iteratively without compromising what
> >> > > we already have:
> >> > > Phase 1: add new contributions to the external repo. We use that
> >> > > time to set up the infra accordingly and optimize the release
> >> > > processes. We will identify test utilities that are not yet
> >> > > public/stable and fix that.
> >> > > Phase 2: add ports of existing connectors to the new unified
> >> > > interfaces. That requires a previous Flink release to make the
> >> > > utilities stable. Keep the old interfaces in flink-core.
> >> > > Phase 3: remove the old interfaces of some connectors from
> >> > > flink-core (tbd at a later point).
> >> > > Phase 4: optionally move all remaining connectors (tbd at a later
> >> > > point).
> >> > >
> >> > > I'd envision ~3 months between starting the different phases.
> >> > > WDYT?
> >> > >
> >> > >
> >> > > [1] https://issues.apache.org/jira/browse/FLINK-23527
> >> > >
> >> > > On Thu, Oct 21, 2021 at 7:12 AM Kyle Bendickson <k...@tabular.io>
> >> wrote:
> >> > >
> >> > > > Hi all,
> >> > > >
> >> > > > My name is Kyle, and I'm an open source developer primarily
> >> > > > focused on Apache Iceberg.
> >> > > >
> >> > > > I'm happy to help clarify or elaborate on any aspect of our
> >> > > > experience working on a relatively decoupled connector that is
> >> > > > downstream and pretty popular.
> >> > > >
> >> > > > I’d also love to be able to contribute or assist in any way I can.
> >> > > >
> >> > > > I don't mean to thread-jack, but are there any meetings or
> >> > > > community sync-ups, specifically around the connector APIs, that
> >> > > > I might join / be invited to?
> >> > > >
> >> > > > I did want to add that even though I've experienced some of the
> >> > > > pain points of integrating with an evolving system / API
> >> > > > (catalog support is, generally speaking, pretty new everywhere
> >> > > > in this space), I also agree personally that you shouldn't slow
> >> > > > down development velocity too much for the sake of external
> >> > > > connectors. Getting to a performant and stable place should be
> >> > > > the primary goal, and slowing that down to support stragglers
> >> > > > will (in my personal opinion) always be a losing game. Some
> >> > > > folks will simply stay behind on versions regardless, until they
> >> > > > have to upgrade.
> >> > > >
> >> > > > I am working on ensuring that the Iceberg community stays
> >> > > > within 1-2 versions of Flink, so that we can help provide more
> >> > > > feedback or contribute things that might improve our ability to
> >> > > > support multiple Flink runtimes / versions with one project /
> >> > > > codebase and minimal to no reflection (our desired goal).
> >> > > >
> >> > > > If there's anything I can do or any way I can be of assistance,
> >> > > > please don't hesitate to reach out. Or find me on ASF Slack 😀
> >> > > >
> >> > > > I greatly appreciate your general concern for the needs of
> >> > > > downstream connector integrators!
> >> > > >
> >> > > > Cheers
> >> > > > Kyle Bendickson (GitHub: kbendick)
> >> > > > Open Source Developer
> >> > > > kyle [at] tabular [dot] io
> >> > > >
> >> > > > On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise <t...@apache.org>
> >> wrote:
> >> > > >
> >> > > > > Hi,
> >> > > > >
> >> > > > > I see the stable core Flink API as a prerequisite for
> >> > > > > modularity. And for connectors it is not just the source and
> >> > > > > sink API (source being stable as of 1.14), but everything that
> >> > > > > is required to build and maintain a connector downstream, such
> >> > > > > as the test utilities and infrastructure.
> >> > > > >
> >> > > > > Without the stable surface of core Flink, changes will leak
> >> > > > > into downstream dependencies and force lock-step updates.
> >> > > > > Refactoring across N repos is more painful than in a single
> >> > > > > repo. Those with experience developing downstream of Flink
> >> > > > > will know the pain, and that isn't limited to connectors. I
> >> > > > > don't remember a Flink "minor version" update that was just a
> >> > > > > dependency version change and did not force other downstream
> >> > > > > changes.
> >> > > > >
> >> > > > > Imagine a project with a complex set of dependencies. Let's
> >> > > > > say Flink version A plus Flink-reliant dependencies released
> >> > > > > by other projects (Flink-external connectors, Beam, Iceberg,
> >> > > > > Hudi, ...). We don't want a situation where we bump the core
> >> > > > > Flink version to B and things fall apart (interface changes,
> >> > > > > utilities that were useful but not public, transitive
> >> > > > > dependencies, etc.).
> >> > > > >
> >> > > > > The discussion here also highlights the benefits of keeping
> >> > > > > certain connectors outside Flink, whether that is due to
> >> > > > > differences in the developer community, the maturity of the
> >> > > > > connectors, their specialized/limited usage, etc. I would like
> >> > > > > to see that as a sign of a growing ecosystem, and most of the
> >> > > > > ideas that Arvid has put forward would benefit further growth
> >> > > > > of the connector ecosystem.
> >> > > > >
> >> > > > > As for keeping connectors within Apache Flink: I prefer that
> >> > > > > as the path forward for "essential" connectors like
> >> > > > > FileSource, KafkaSource, ... And we can still achieve a more
> >> > > > > flexible and faster release cycle.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Thomas
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Wed, Oct 20, 2021 at 3:32 AM Jark Wu <imj...@gmail.com>
> wrote:
> >> > > > > >
> >> > > > > > Hi Konstantin,
> >> > > > > >
> >> > > > > > > the connectors need to be adopted and require at least
> >> > > > > > > one release per Flink minor release.
> >> > > > > > However, this will make the releases of connectors slower,
> >> > > > > > e.g. maintaining features for multiple branches and
> >> > > > > > releasing multiple branches. I think the main purpose of
> >> > > > > > having an external connector repository is to have "faster
> >> > > > > > releases of connectors"?
> >> > > > > >
> >> > > > > >
> >> > > > > > From the perspective of the CDC connector maintainers, the
> >> > > > > > biggest advantage of maintaining it outside of the Flink
> >> > > > > > project is that:
> >> > > > > > 1) we can have a more flexible and faster release cycle
> >> > > > > > 2) we can be more liberal with committership for connector
> >> > > > > > maintainers, which can also attract more committers to help
> >> > > > > > with the releases.
> >> > > > > >
> >> > > > > > Personally, I think maintaining one connector repository
> >> > > > > > under the ASF may not have the above benefits.
> >> > > > > >
> >> > > > > > Best,
> >> > > > > > Jark
> >> > > > > >
> >> > > > > > On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf <
> >> kna...@apache.org>
> >> > > > > wrote:
> >> > > > > >
> >> > > > > > > Hi everyone,
> >> > > > > > >
> >> > > > > > > regarding the stability of the APIs: I think everyone
> >> > > > > > > agrees that connector APIs which are stable across minor
> >> > > > > > > versions (1.13->1.14) are the mid-term goal. But:
> >> > > > > > >
> >> > > > > > > a) These APIs are still quite young, and we shouldn't make
> >> > > > > > > them @Public prematurely either.
> >> > > > > > >
> >> > > > > > > b) Isn't this *mostly* orthogonal to where the connector
> >> > > > > > > code lives? Yes, as long as there are breaking changes,
> >> > > > > > > the connectors need to be adopted and require at least one
> >> > > > > > > release per Flink minor release. Documentation-wise this
> >> > > > > > > can be addressed via a compatibility matrix for each
> >> > > > > > > connector, as Arvid suggested. IMO we shouldn't block this
> >> > > > > > > effort on the stability of the APIs.
> >> > > > > > >
> >> > > > > > > Cheers,
> >> > > > > > >
> >> > > > > > > Konstantin
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > On Wed, Oct 20, 2021 at 8:56 AM Jark Wu
> >> > > > > > > <imj...@gmail.com>
> >> > wrote:
> >> > > > > > >
> >> > > > > > >> Hi,
> >> > > > > > >>
> >> > > > > > >> I think Thomas raised very good questions, and I would
> >> > > > > > >> like to know your opinions on whether we want to move the
> >> > > > > > >> connectors out of Flink in this version.
> >> > > > > > >>
> >> > > > > > >> (1) Is the connector API already stable?
> >> > > > > > >> > Separate releases would only make sense if the core
> >> > > > > > >> > Flink surface is fairly stable though. As evident from
> >> > > > > > >> > Iceberg (and also Beam), that's not the case currently.
> >> > > > > > >> > We should probably focus on addressing the stability
> >> > > > > > >> > first, before splitting code. A success criterion could
> >> > > > > > >> > be that we are able to build Iceberg and Beam against
> >> > > > > > >> > multiple Flink versions w/o the need to change code. The
> >> > > > > > >> > goal would be that no connector breaks when we make
> >> > > > > > >> > changes to Flink core. Until that's the case, code
> >> > > > > > >> > separation creates a setup where 1+1 or N+1 repositories
> >> > > > > > >> > need to move in lock step.
> >> > > > > > >>
> >> > > > > > >> From another discussion thread [1], the connector API is
> >> > > > > > >> far from stable. Currently, it's hard to build connectors
> >> > > > > > >> against multiple Flink versions. There are breaking API
> >> > > > > > >> changes both in 1.12 -> 1.13 and 1.13 -> 1.14, and maybe
> >> > > > > > >> also in future versions, because Table-related APIs are
> >> > > > > > >> still @PublicEvolving and the new Sink API is still
> >> > > > > > >> @Experimental.
> >> > > > > > >>
> >> > > > > > >>
> >> > > > > > >> (2) Flink testability without connectors.
> >> > > > > > >> > Flink w/o the Kafka connector (and a few others) isn't
> >> > > > > > >> > viable. Testability of Flink was already brought up; can
> >> > > > > > >> > we really certify a Flink core release without the Kafka
> >> > > > > > >> > connector? Maybe those connectors that are used in Flink
> >> > > > > > >> > e2e tests to validate functionality of core Flink should
> >> > > > > > >> > not be broken out?
> >> > > > > > >>
> >> > > > > > >> This is a very good question. How can we guarantee the
> >> > > > > > >> new Source and Sink APIs are stable with only test
> >> > > > > > >> implementations?
> >> > > > > > >>
> >> > > > > > >>
> >> > > > > > >> Best,
> >> > > > > > >> Jark
> >> > > > > > >>
> >> > > > > > >>
> >> > > > > > >>
> >> > > > > > >>
> >> > > > > > >>
> >> > > > > > >> On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler <
> >> > > ches...@apache.org>
> >> > > > > > >> wrote:
> >> > > > > > >>
> >> > > > > > >> > Could you clarify what release cadence you're thinking
> >> > > > > > >> > of? There's quite a big range that fits "more frequent
> >> > > > > > >> > than Flink" (per-commit, daily, weekly, bi-weekly,
> >> > > > > > >> > monthly, even bi-monthly).
> >> > > > > > >> >
> >> > > > > > >> > On 19/10/2021 14:15, Martijn Visser wrote:
> >> > > > > > >> > > Hi all,
> >> > > > > > >> > >
> >> > > > > > >> > > I think it would be a huge benefit if we can achieve
> >> > > > > > >> > > more frequent releases of connectors, which are not
> >> > > > > > >> > > bound to the release cycle of Flink itself. I agree
> >> > > > > > >> > > that in order to get there, we need to have stable
> >> > > > > > >> > > interfaces which are trustworthy and reliable, so they
> >> > > > > > >> > > can be safely used by those connectors. I do think
> >> > > > > > >> > > that work still needs to be done on those interfaces,
> >> > > > > > >> > > but I am confident that we can get there from a Flink
> >> > > > > > >> > > perspective.
> >> > > > > > >> > >
> >> > > > > > >> > > I am worried that we would not be able to achieve
> >> > > > > > >> > > those frequent releases of connectors if we put these
> >> > > > > > >> > > connectors under the Apache umbrella, because that
> >> > > > > > >> > > means that for each connector release we have to
> >> > > > > > >> > > follow the Apache release creation process. This
> >> > > > > > >> > > requires a lot of manual steps and prohibits
> >> > > > > > >> > > automation, and I think it would be hard to scale out
> >> > > > > > >> > > frequent releases of connectors. I'm curious how
> >> > > > > > >> > > others think this challenge could be solved.
> >> > > > > > >> > >
> >> > > > > > >> > > Best regards,
> >> > > > > > >> > >
> >> > > > > > >> > > Martijn
> >> > > > > > >> > >
> >> > > > > > >> > > On Mon, 18 Oct 2021 at 22:22, Thomas Weise <
> >> t...@apache.org>
> >> > > > > wrote:
> >> > > > > > >> > >
> >> > > > > > >> > >> Thanks for initiating this discussion.
> >> > > > > > >> > >>
> >> > > > > > >> > >> There are definitely a few things that are not
> >> > > > > > >> > >> optimal with our current management of connectors. I
> >> > > > > > >> > >> would not necessarily characterize it as a "mess"
> >> > > > > > >> > >> though. As the points raised so far show, it isn't
> >> > > > > > >> > >> easy to find a solution that balances competing
> >> > > > > > >> > >> requirements and leads to a net improvement.
> >> > > > > > >> > >>
> >> > > > > > >> > >> It would be great if we can find a setup that allows
> >> > > > > > >> > >> connectors to be released independently of core Flink
> >> > > > > > >> > >> and each connector to be released separately. Flink
> >> > > > > > >> > >> already has separate releases (flink-shaded), so that
> >> > > > > > >> > >> by itself isn't a new thing. Per-connector releases
> >> > > > > > >> > >> would need to allow for more frequent releases
> >> > > > > > >> > >> (without the baggage that a full Flink release comes
> >> > > > > > >> > >> with).
> >> > > > > > >> > >>
> >> > > > > > >> > >> Separate releases would only make sense if the core
> >> > > > > > >> > >> Flink surface is fairly stable though. As evident
> >> > > > > > >> > >> from Iceberg (and also Beam), that's not the case
> >> > > > > > >> > >> currently. We should probably focus on addressing the
> >> > > > > > >> > >> stability first, before splitting code. A success
> >> > > > > > >> > >> criterion could be that we are able to build Iceberg
> >> > > > > > >> > >> and Beam against multiple Flink versions w/o the need
> >> > > > > > >> > >> to change code. The goal would be that no connector
> >> > > > > > >> > >> breaks when we make changes to Flink core. Until
> >> > > > > > >> > >> that's the case, code separation creates a setup
> >> > > > > > >> > >> where 1+1 or N+1 repositories need to move in lock
> >> > > > > > >> > >> step.
> >> > > > > > >> > >>
> >> > > > > > >> > >> Regarding some connectors being more important for
> >> > > > > > >> > >> Flink than others: that's a fact. Flink w/o the Kafka
> >> > > > > > >> > >> connector (and a few others) isn't viable.
> >> > > > > > >> > >> Testability of Flink was already brought up; can we
> >> > > > > > >> > >> really certify a Flink core release without the Kafka
> >> > > > > > >> > >> connector? Maybe those connectors that are used in
> >> > > > > > >> > >> Flink e2e tests to validate functionality of core
> >> > > > > > >> > >> Flink should not be broken out?
> >> > > > > > >> > >>
> >> > > > > > >> > >> Finally, I think that the connectors that move into
> >> > > > > > >> > >> separate repos should remain part of the Apache Flink
> >> > > > > > >> > >> project. Larger organizations tend to approve the use
> >> > > > > > >> > >> of and contribution to open source at the project
> >> > > > > > >> > >> level. Sometimes it is everything ASF; more often it
> >> > > > > > >> > >> is "Apache Foo". It would be fatal to end up with a
> >> > > > > > >> > >> patchwork of projects with potentially different
> >> > > > > > >> > >> licenses and governance to arrive at a working Flink
> >> > > > > > >> > >> setup. This may mean we prioritize usability over
> >> > > > > > >> > >> developer convenience, if that's in the best interest
> >> > > > > > >> > >> of Flink as a whole.
> >> > > > > > >> > >>
> >> > > > > > >> > >> Thanks,
> >> > > > > > >> > >> Thomas
> >> > > > > > >> > >>
> >> > > > > > >> > >>
> >> > > > > > >> > >>
> >> > > > > > >> > >> On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler <
> >> > > > > ches...@apache.org
> >> > > > > > >> >
> >> > > > > > >> > >> wrote:
> >> > > > > > >> > >>> Generally, the issues are reproducibility and
> >> > > > > > >> > >>> control.
> >> > > > > > >> > >>>
> >> > > > > > >> > >>> Stuff's completely broken on the Flink side for a
> >> > > > > > >> > >>> week? Well, then so are the connector repos.
> >> > > > > > >> > >>> (As-is) You can't go back to a previous version of
> >> > > > > > >> > >>> the snapshot. Which also means that checking out
> >> > > > > > >> > >>> older commits can be problematic, because you'd
> >> > > > > > >> > >>> still work against the latest snapshots, and they
> >> > > > > > >> > >>> may not be compatible with each other.
> >> > > > > > >> > >>>
> >> > > > > > >> > >>>
> >> > > > > > >> > >>> On 18/10/2021 15:22, Arvid Heise wrote:
> >> > > > > > >> > >>>> I was actually betting on snapshot versions. What
> >> > > > > > >> > >>>> are the limits? Obviously, we can only do a release
> >> > > > > > >> > >>>> of a 1.15 connector after 1.15 is released.
> >> > > > > > >> > >>>
> >> > > > > > >> >
> >> > > > > > >> >
> >> > > > > > >>
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > --
> >> > > > > > >
> >> > > > > > > Konstantin Knauf
> >> > > > > > >
> >> > > > > > > https://twitter.com/snntrable
> >> > > > > > >
> >> > > > > > > https://github.com/knaufk
> >> > > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>
