+1 for the single repo approach.

Cheers,
Till

On Thu, Dec 9, 2021 at 3:54 PM Martijn Visser <mart...@ververica.com> wrote:

> I also agree that it feels more natural to go with a repo for each
> individual connector. Each repository can be made available at
> flink-packages.org so users can find them, next to referring to them in
> documentation. +1 from my side.
>
> On Thu, 9 Dec 2021 at 15:38, Arvid Heise <ar...@apache.org> wrote:
>
> > Hi all,
> >
> > We tried out Chesnay's proposal and went with Option 2. Unfortunately, we
> > ran into some tough nuts to crack and feel like we hit a dead end:
> > - The main pain point with the outlined Frankensteinian connector repo is
> > how to handle shared code / infra code. If we have it in some <common>
> > branch, then we need to merge the common branch in the connector branch
> on
> > update. However, it's unclear to me how improvements in the common branch
> > that naturally appear while working on a specific connector go back into
> > the common branch. You can't use a pull request from your branch or else
> > your connector code would poison the connector-less common branch. So you
> > would probably manually copy the files over to a common branch and
> create a
> > PR branch for that.
> > - A weird solution could be to have the common branch as a submodule in
> the
> > repo itself (if that's even possible). I'm sure that this setup would
> blow
> > up the minds of all newcomers.
> > - Similarly, it's mandatory to have safeguards against code from
> connector
> > A poisoning connector B, common, or main. I had some similar setup in the
> > past, and code from two "distinct" branch types constantly bled into each other.
> > - We could also say that we simply release <common> independently and
> just
> > have a maven (SNAPSHOT) dependency on it. But that would create a weird
> > flow if you need to change in common where you need to constantly switch
> > branches back and forth.
> > - In general, the Frankensteinian approach is very switch-intensive. If you
> > maintain 3 connectors and need to fix one build instability in each at the same
> > time (quite common nowadays for some reason) and you have 2 review
> rounds,
> > you need to switch branches 9 times ignoring changes to common.
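The switching overhead can be made concrete with a quick back-of-the-envelope model (a sketch only; the exact count depends on each maintainer's workflow):

```python
# Rough model of branch-switching cost in the single-repo ("Frankensteinian")
# model: every connector lives on its own branch, so each fix costs one
# initial checkout plus one checkout per review round.
def branch_switches(connectors: int, review_rounds: int) -> int:
    return connectors * (1 + review_rounds)

# 3 connectors, 2 review rounds each -> 9 switches, matching the count above.
print(branch_switches(3, 2))
```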
> >
> > Additionally, we still have the rather user/dev unfriendly main that is
> > mostly empty. I'm also not sure we can generate an overview README.md to
> > make it more friendly here because in theory every connector branch
> should
> > be based on main and we would get merge conflicts.
> >
> > I'd like to propose once again to go with individual repositories.
> > - The only downside that we discussed so far is that we have more initial
> > setup to do. Since we grow the number of connectors/repositories organically,
> > that load is quite distributed. We can offer templates after finding a
> good
> > approach that can even be used by outside organizations.
> > - Regarding secrets, I think it's actually an advantage that the Kafka
> > connector has no access to the AWS secrets. If there are secrets to be
> > shared across connectors, we can and should use Azure's Variable Groups
> (I
> > have used it in the past to share Nexus creds across repos). That would
> > also make rotation easy.
> > - Working on different connectors would be rather easy as all modern IDEs
> > support multiple repo setups in the same project. You still need to do
> > multiple releases in case you update common code (either accessed through
> > Nexus or git submodule) and you want to release your connector.
> > - There is no difference between the two approaches with respect to how many
> > CI runs there are.
> > - Individual repositories also have the advantage of allowing external
> > incubation. Let's assume someone builds connector A and hosts it in their
> > organization (very common setup). If they want to contribute the code to
> > Flink, we could simply transfer the repository into ASF after ensuring
> > Flink coding standards. Then we retain git history and Github issues.
> >
> > Is there any point that I'm missing?
> >
> > On Fri, Nov 26, 2021 at 1:32 PM Chesnay Schepler <ches...@apache.org>
> > wrote:
> >
> > > For sharing workflows we should be able to use composite actions. We'd
> > > have the main definition files in the flink-connectors repo, that we
> > > also need to tag/release, which other branches/repos can then import.
> > > These are also versioned, so we don't have to worry about accidentally
> > > breaking stuff.
> > > These could also be used to enforce certain standards / interfaces such
> > > that we can automate more things (e.g., integration into the Flink
> > > documentation).
> > >
> > > It is true that Option 2) and dedicated repositories share a lot of
> > > properties. While I did say in an offline conversation that we in that
> > > case might just as well use separate repositories, I'm not so sure
> > > anymore. One repo would make administration a bit easier, for example
> > > secrets wouldn't have to be applied to each repo (we wouldn't want
> > > certain secrets to be set up organization-wide).
> > > I overall also like that one repo would present a single access point;
> > > you can't "miss" a connector repo, and I would hope that having it as
> > > one repo would nurture more collaboration between the connectors, which
> > > after all need to solve similar problems.
> > >
> > > It is a fair point that the branching model would be quite weird, but I
> > > think that would subside pretty quickly.
> > >
> > > Personally I'd go with Option 2, and if that doesn't work out we can
> > > still split the repo later on. (Which should then be a trivial matter
> of
> > > copying all <connector>/* branches and renaming them).
> > >
> > > On 26/11/2021 12:47, Till Rohrmann wrote:
> > > > Hi Arvid,
> > > >
> > > > Thanks for updating this thread with the latest findings. The
> described
> > > > limitations for a single connector repo sound suboptimal to me.
> > > >
> > > > * Option 2. sounds as if we try to simulate multi connector repos
> > inside
> > > of
> > > > a single repo. I also don't know how we would share code between the
> > > > different branches (sharing infrastructure would probably be easier
> > > > though). This seems to have the same limitations as dedicated repos
> > with
> > > > the downside of having a not very intuitive branching model.
> > > > * Isn't option 1. kind of a degenerated version of option 2. where we
> > > have
> > > > some unrelated code from other connectors in the individual connector
> > > > branches?
> > > > * Option 3. has the downside that someone creating a release has to
> > > release
> > > > all connectors. This means that she either has to sync with the
> > different
> > > > connector maintainers or has to be able to release all connectors on
> > her
> > > > own. We are already seeing in the Flink community that releases
> require
> > > > quite good communication/coordination between the different people
> > > working
> > > > on different Flink components. Given our goals to make connector
> > releases
> > > > easier and more frequent, I think that coupling different connector
> > > > releases might be counter-productive.
> > > >
> > > > To me it sounds not very practical to mainly use a mono repository
> w/o
> > > > having some more advanced build infrastructure that, for example,
> > allows
> > > to
> > > > have different git roots in different connector directories. Maybe
> the
> > > mono
> > > > repo can be a catch all repository for connectors that want to be
> > > released
> > > > in lock-step (Option 3.) with all other connectors the repo contains.
> > But
> > > > for connectors that get changed frequently, having a dedicated
> > repository
> > > > that allows independent releases sounds preferable to me.
> > > >
> > > > What utilities and infrastructure code do you intend to share? Using
> > git
> > > > submodules can definitely be one option to share code. However, it
> > might
> > > > also be ok to depend on flink-connector-common artifacts which could
> > make
> > > > things easier. Where I am unsure is whether git submodules can be
> used
> > to
> > > > share infrastructure code (e.g. the .github/workflows) because you
> need
> > > > these files in the repo to trigger the CI infrastructure.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Thu, Nov 25, 2021 at 1:59 PM Arvid Heise <ar...@apache.org>
> wrote:
> > > >
> > > >> Hi Brian,
> > > >>
> > > >> Thank you for sharing. I think your approach is very valid and is in
> > > line
> > > >> with what I had in mind.
> > > >>
> > > >> Basically Pravega community aligns the connector releases with the
> > > Pravega
> > > >>> mainline release
> > > >>>
> > > >> This certainly would mean that there is little value in coupling
> > > connector
> > > >> versions. So it's making a good case for having separate connector
> > > repos.
> > > >>
> > > >>
> > > >>> and maintains the connector with the latest 3 Flink versions(CI
> will
> > > >>> publish snapshots for all these 3 branches)
> > > >>>
> > > >> I'd like to give connector devs a simple way to express to which
> Flink
> > > >> versions the current branch is compatible. From there we can
> generate
> > > the
> > > >> compatibility matrix automatically and optionally also create
> > different
> > > >> releases per supported Flink version. Not sure if the latter is
> indeed
> > > >> better than having just one artifact that happens to run with
> multiple
> > > >> Flink versions. I guess it depends on what dependencies we are
> > > exposing. If
> > > >> the connector uses flink-connector-base, then we probably need
> > separate
> > > >> artifacts with poms anyways.
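As a sketch of what such autogeneration could look like (the metadata layout below is a hypothetical assumption, not an agreed format): each release branch could declare which Flink versions it supports, and a small script could render the compatibility matrix from that:

```python
# Hypothetical per-branch metadata: (connector, connector version) mapped to
# the Flink versions that branch supports. In practice this could live in a
# small properties/YAML file checked into each release branch.
support = {
    ("kafka", "2.0"): ["1.13", "1.14"],
    ("kafka", "1.1"): ["1.12", "1.13"],
    ("hbase", "1.0"): ["1.12", "1.13", "1.14"],
}

def render_matrix(support):
    """Render a markdown compatibility matrix from the collected metadata."""
    flink_versions = sorted({fv for fvs in support.values() for fv in fvs})
    lines = ["| Connector | Version | " + " | ".join(flink_versions) + " |"]
    lines.append("|---" * (2 + len(flink_versions)) + "|")
    for (name, version), fvs in sorted(support.items()):
        cells = ["yes" if fv in fvs else "no" for fv in flink_versions]
        lines.append(f"| {name} | {version} | " + " | ".join(cells) + " |")
    return "\n".join(lines)

print(render_matrix(support))
```

The same metadata could also drive which per-Flink-version artifacts CI publishes, if separate artifacts turn out to be necessary.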
> > > >>
> > > >> Best,
> > > >>
> > > >> Arvid
> > > >>
> > > >> On Fri, Nov 19, 2021 at 10:55 AM Zhou, Brian <b.z...@dell.com>
> wrote:
> > > >>
> > > >>> Hi Arvid,
> > > >>>
> > > >>> Regarding the branching model, the Pravega Flink connector has some
> > > >>> experience that
> > > >>> I would like to share. Here[1][2] is the compatibility matrix and
> > wiki
> > > >>> explaining the branching model and releases. Basically Pravega
> > > community
> > > >>> aligns the connector releases with the Pravega mainline release,
> and
> > > >>> maintains the connector with the latest 3 Flink versions(CI will
> > > publish
> > > >>> snapshots for all these 3 branches).
> > > >>> For example, recently we have 0.10.1 release[3], and in maven
> central
> > > we
> > > >>> need to upload three artifacts(For Flink 1.13, 1.12, 1.11) for
> 0.10.1
> > > >>> version[4].
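A small sketch of how the artifact list falls out of that versioning scheme (the artifact-id pattern below is illustrative, loosely modeled on the Pravega naming; the Scala suffix is an assumption):

```python
# One Maven coordinate per supported Flink minor version for a single
# connector release. The naming pattern here is illustrative only.
def artifact_ids(connector_version, flink_versions, scala="2.12"):
    return [f"pravega-connectors-flink-{fv}_{scala}:{connector_version}"
            for fv in flink_versions]

# Release 0.10.1 supporting the latest 3 Flink versions -> 3 artifacts.
for coordinate in artifact_ids("0.10.1", ["1.11", "1.12", "1.13"]):
    print(coordinate)
```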
> > > >>>
> > > >>> There are some alternatives. Another solution that we once
> discussed
> > > but
> > > >>> finally abandoned is to have an independent version just like
> the
> > > >>> current CDC connector, and then give a big compatibility matrix to
> > > users.
> > > >>> We think it would be too confusing as the connector evolves. On
> > the
> > > >>> contrary, we can also do the opposite way to align with Flink
> version
> > > and
> > > >>> maintain several branches for different system versions.
> > > >>>
> > > >>> I would say this is only a fairly-OK solution because it is a bit
> > > painful
> > > >>> for maintainers as cherry-picks are very common and releases would
> > > >> require
> > > >>> much work. However, if neither system has nice backward
> > > >>> compatibility, there seems to be no comfortable solution for their
> > > >>> connector.
> > > >>>
> > > >>> [1]
> https://github.com/pravega/flink-connectors#compatibility-matrix
> > > >>> [2]
> > > >>>
> > > >>
> > >
> >
> https://github.com/pravega/flink-connectors/wiki/Versioning-strategy-for-Flink-connector
> > > >>> [3]
> https://github.com/pravega/flink-connectors/releases/tag/v0.10.1
> > > >>> [4] https://search.maven.org/search?q=pravega-connectors-flink
> > > >>>
> > > >>> Best Regards,
> > > >>> Brian
> > > >>>
> > > >>>
> > > >>>
> > > >>> -----Original Message-----
> > > >>> From: Arvid Heise <ar...@apache.org>
> > > >>> Sent: Friday, November 19, 2021 4:12 PM
> > > >>> To: dev
> > > >>> Subject: Re: [DISCUSS] Creating an external connector repository
> > > >>>
> > > >>>
> > > >>>
> > > >>> Hi everyone,
> > > >>>
> > > >>> we are currently in the process of setting up the flink-connectors
> > repo
> > > >>> [1] for new connectors, but we hit a wall that we currently cannot get
> > > >>> past: the branching model.
> > > >>> To reiterate the original motivation of the external connector
> repo:
> > We
> > > >>> want to decouple the release cycle of a connector from Flink.
> > However,
> > > if
> > > >>> we want to support semantic versioning in the connectors with the
> > > ability
> > > >>> to introduce breaking changes through major version bumps and
> support
> > > >>> bugfixes on old versions, then we need release branches similar to
> > how
> > > >>> Flink core operates.
> > > >>> Consider two connectors, let's call them kafka and hbase. We have
> > kafka
> > > >> in
> > > >>> version 1.0.X, 1.1.Y (small improvement), 2.0.Z (config option change),
> > > >> and
> > > >>> hbase only on 1.0.A.
> > > >>>
> > > >>> Now our current assumption was that we can work with a mono-repo
> > under
> > > >> ASF
> > > >>> (flink-connectors). Then, for release-branches, we found 3 options:
> > > >>> 1. We would need to create some ugly mess with the cross product of
> > > >>> connector and version: so you have kafka-release-1.0,
> > > kafka-release-1.1,
> > > >>> kafka-release-2.0, hbase-release-1.0. The main issue is not the
> > amount
> > > of
> > > >>> branches (that's something that git can handle) but that the state of
> > > >>> kafka is undefined in hbase-release-1.0. That's a recipe for disaster and
> > > >>> makes releasing connectors very cumbersome (CI would only execute
> and
> > > >>> publish hbase SNAPSHOTS on hbase-release-1.0).
> > > >>> 2. We could avoid the undefined state by having an empty master and
> > > each
> > > >>> release branch really only holds the code of the connector. But
> > that's
> > > >> also
> > > >>> not great: any user that looks at the repo and sees no connector
> > would
> > > >>> assume that it's dead.
> > > >>> 3. We could have synced releases similar to the CDC connectors [2].
> > > That
> > > >>> means that if any connector introduces a breaking change, all
> > > connectors
> > > >>> get a new major. I find that quite confusing to a user if hbase
> gets
> > a
> > > >> new
> > > >>> release without any change because kafka introduced a breaking
> > change.
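The branch explosion of option 1 can be sketched with the example connectors above (version sets taken from the kafka/hbase example):

```python
# Option 1: one release branch per (connector, major.minor) pair.
versions = {"kafka": ["1.0", "1.1", "2.0"], "hbase": ["1.0"]}

branches = [f"{connector}-release-{v}"
            for connector, vs in sorted(versions.items())
            for v in vs]
print(branches)
# Every branch still carries ALL connectors' code, so e.g. the state of
# kafka inside hbase-release-1.0 is undefined -- the core problem of option 1.
```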
> > > >>>
> > > >>> To fully decouple release cycles and CI of connectors, we could add
> > > >>> individual repositories under ASF (flink-connector-kafka,
> > > >>> flink-connector-hbase). Then we can apply the same branching model
> as
> > > >>> before. I quickly checked if there are precedents in the Apache
> > > >> community
> > > >>> for that approach and just by scanning alphabetically I found
> cordova
> > > >> with
> > > >>> 70 and couchdb with 77 apache repos respectively. So it certainly
> > seems
> > > >>> like other projects approached our problem in that way and the
> apache
> > > >>> organization is okay with that. I currently expect max 20
> additional
> > > >> repos
> > > >>> for connectors and in the future 10 max each for formats and
> > > filesystems
> > > >> if
> > > >>> we would also move them out at some point in time. So we would be
> at
> > a
> > > >>> total of 50 repos.
> > > >>>
> > > >>> Note that for all options, we need to provide a compatibility matrix that
> we
> > > aim
> > > >>> to autogenerate.
> > > >>>
> > > >>> Now for the potential downsides that we internally discussed:
> > > >>> - How can we ensure common infrastructure code, utilities, and
> > quality?
> > > >>> I propose to add a flink-connector-common that contains all these
> > > things
> > > >>> and is added as a git submodule/subtree to the repos.
> > > >>> - Do we implicitly discourage connector developers from maintaining more
> > > than
> > > >>> one connector with a fragmented code base?
> > > >>> That is certainly a risk. However, I currently also see few devs
> > > working
> > > >>> on more than one connector. On the other hand, it may help keep
> the
> > > >> devs
> > > >>> that maintain a specific connector on the hook. We could use github
> > > >> issues
> > > >>> to track bugs and feature requests and a dev can focus his limited
> > time
> > > >> on
> > > >>> getting that one connector right.
> > > >>>
> > > >>> So WDYT? Compared to some intermediate suggestions with split
> repos,
> > > the
> > > >>> big difference is that everything remains under Apache umbrella and
> > the
> > > >>> Flink community.
> > > >>>
> > > >>> [1] https://github.com/apache/flink-connectors
> > > >>> [2] https://github.com/ververica/flink-cdc-connectors/
> > > >>>
> > > >>> On Fri, Nov 12, 2021 at 3:39 PM Arvid Heise <ar...@apache.org>
> > wrote:
> > > >>>
> > > >>>> Hi everyone,
> > > >>>>
> > > >>>> I created the flink-connectors repo [1] to advance the topic. We
> > would
> > > >>>> create a proof-of-concept in the next few weeks as a special
> branch
> > > >>>> that I'd then use for discussions. If the community agrees with
> the
> > > >>>> approach, that special branch will become the master. If not, we
> can
> > > >>>> reiterate over it or create competing POCs.
> > > >>>>
> > > >>>> If someone wants to try things out in parallel, just make sure
> that
> > > >>>> you are not accidentally pushing POCs to the master.
> > > >>>>
> > > >>>> As a reminder: We will not move out any current connector from
> Flink
> > > >>>> at this point in time, so everything in Flink will remain as is
> and
> > be
> > > >>>> maintained there.
> > > >>>>
> > > >>>> Best,
> > > >>>>
> > > >>>> Arvid
> > > >>>>
> > > >>>> [1] https://github.com/apache/flink-connectors
> > > >>>>
> > > >>>> On Fri, Oct 29, 2021 at 6:57 PM Till Rohrmann <
> trohrm...@apache.org
> > >
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Hi everyone,
> > > >>>>>
> > > >>>>> From the discussion, it seems to me that we have different
> > opinions
> > > >>>>> whether to have an ASF umbrella repository or to host them
> outside
> > of
> > > >>>>> the ASF. It also seems that this is not really the problem to
> > solve.
> > > >>>>> Since there are many good arguments for either approach, we could
> > > >>>>> simply start with an ASF umbrella repository and see how people
> > adopt
> > > >>>>> it. If the individual connectors cannot move fast enough or if
> > people
> > > >>>>> prefer to not buy into the more heavy-weight ASF processes, then
> > they
> > > >>>>> can host the code also somewhere else. We simply need to make
> sure
> > > >>>>> that these connectors are discoverable (e.g. via flink-packages).
> > > >>>>>
> > > >>>>> The more important problem seems to be to provide common tooling
> > > >>>>> (testing, infrastructure, documentation) that can easily be
> reused.
> > > >>>>> Similarly, it has become clear that the Flink community needs to
> > > >>>>> improve on providing stable APIs. I think it is not realistic to
> > > >>>>> first complete these tasks before starting to move connectors to
> > > >>>>> dedicated repositories. As Stephan said, creating a connector
> > > >>>>> repository will force us to pay more attention to API stability
> and
> > > >>>>> also to think about which testing tools are required. Hence, I
> > > >>>>> believe that starting to add connectors to a different repository
> > > >>>>> than apache/flink will help improve our connector tooling
> > (declaring
> > > >>>>> testing classes as public, creating a common test utility repo,
> > > >>>>> creating a repo
> > > >>>>> template) and vice versa. Hence, I like Arvid's proposed process
> as
> > > >>>>> it will start kicking things off w/o letting this effort fizzle
> > out.
> > > >>>>>
> > > >>>>> Cheers,
> > > >>>>> Till
> > > >>>>>
> > > >>>>> On Thu, Oct 28, 2021 at 11:44 AM Stephan Ewen <se...@apache.org>
> > > >> wrote:
> > > >>>>>> Thank you all, for the nice discussion!
> > > >>>>>>
> > > >>>>>> From my point of view, I very much like the idea of putting
> > > >>>>>> connectors
> > > >>>>> in a
> > > >>>>>> separate repository. But I would argue it should be part of
> Apache
> > > >>>>> Flink,
> > > >>>>>> similar to flink-statefun, flink-ml, etc.
> > > >>>>>>
> > > >>>>>> I share many of the reasons for that:
> > > >>>>>>    - As argued many times, reduces complexity of the Flink repo,
> > > >>>>> improves
> > > >>>>>> CI response times, etc.
> > > >>>>>>    - Much lower barrier of contribution, because an unstable
> > > >>>>>> connector
> > > >>>>> would
> > > >>>>>> not destabilize the whole build. Of course, we would need to
> make
> > > >>>>>> sure
> > > >>>>> we
> > > >>>>>> set this up the right way, with connectors having individual CI
> > > >>>>>> runs,
> > > >>>>> build
> > > >>>>>> status, etc. But it certainly seems possible.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> I would argue some points a bit differently than some cases made
> > > >> before:
> > > >>>>>> (a) I believe the separation would increase connector stability.
> > > >>>>> Because it
> > > >>>>>> really forces us to work with the connectors against the APIs
> like
> > > >>>>>> any external developer. A mono repo is somehow the wrong thing
> if
> > > >>>>>> you in practice want to actually guarantee stable internal APIs
> at
> > > >>> some layer.
> > > >>>>>> Because the mono repo makes it easy to just change something on
> > > >>>>>> both
> > > >>>>> sides
> > > >>>>>> of the API (provider and consumer) seamlessly.
> > > >>>>>>
> > > >>>>>> Major refactorings in Flink need to keep all connector API
> > > >>>>>> contracts intact, or we need to have a new version of the
> > connector
> > > >>> API.
> > > >>>>>> (b) We may even be able to go towards more lightweight and
> > > >>>>>> automated releases over time, even if we stay in Apache Flink
> with
> > > >>> that repo.
> > > >>>>>> This isn't fully aligned with the Apache release policies yet,
> > > >>>>>> but there are board discussions about whether there can be
> > > >>>>>> bot-triggered releases (by dependabot) and how that could fit
> into
> > > >>> the Apache process.
> > > >>>>>> This doesn't seem to be quite there just yet, but seeing that
> > those
> > > >>>>> start
> > > >>>>>> is a good sign, and there is a good chance we can do some things
> > > >>> there.
> > > >>>>>> I am not sure whether we should let bots trigger releases,
> because
> > > >>>>>> a
> > > >>>>> final
> > > >>>>>> human look at things isn't a bad thing, especially given the
> > > >>>>>> popularity
> > > >>>>> of
> > > >>>>>> software supply chain attacks recently.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> I do share Chesnay's concerns about complexity in tooling,
> though.
> > > >>>>>> Both release tooling and test tooling. They are not incompatible
> > > >>>>>> with that approach, but they are a task we need to tackle during
> > > >>>>>> this change which will add additional work.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Tue, Oct 26, 2021 at 10:31 AM Arvid Heise <ar...@apache.org>
> > > >>> wrote:
> > > >>>>>>> Hi folks,
> > > >>>>>>>
> > > >>>>>>> I think some questions came up and I'd like to address the
> > > >>>>>>> question of
> > > >>>>>> the
> > > >>>>>>> timing.
> > > >>>>>>>
> > > >>>>>>> Could you clarify what release cadence you're thinking of?
> > > >>>>>>> There's
> > > >>>>> quite
> > > >>>>>>>> a big range that fits "more frequent than Flink" (per-commit,
> > > >>>>>>>> daily, weekly, bi-weekly, monthly, even bi-monthly).
> > > >>>>>>> The short answer is: as often as needed:
> > > >>>>>>> - If there is a CVE in a dependency and we need to bump it -
> > > >>>>>>> release immediately.
> > > >>>>>>> - If there is a new feature merged, release soonish. We may
> > > >>>>>>> collect a
> > > >>>>> few
> > > >>>>>>> successive features before a release.
> > > >>>>>>> - If there is a bugfix, release immediately or soonish
> depending
> > > >>>>>>> on
> > > >>>>> the
> > > >>>>>>> severity and if there are workarounds available.
> > > >>>>>>>
> > > >>>>>>> We should not limit ourselves; the whole idea of independent
> > > >>>>>>> releases
> > > >>>>> is
> > > >>>>>>> exactly that you release as needed. There is no release
> planning
> > > >>>>>>> or anything needed, you just go with a release as if it was an
> > > >>>>>>> external artifact.
> > > >>>>>>>
> > > >>>>>>> (1) is the connector API already stable?
> > > >>>>>>>>  From another discussion thread [1], connector API is far from
> > > >>>>> stable.
> > > >>>>>>>> Currently, it's hard to build connectors against multiple
> Flink
> > > >>>>>> versions.
> > > >>>>>>>> There are breaking API changes both in 1.12 -> 1.13 and 1.13
> ->
> > > >>>>>>>> 1.14
> > > >>>>>> and
> > > >>>>>>>>   maybe also in the future versions,  because Table related
> APIs
> > > >>>>>>>> are
> > > >>>>>> still
> > > >>>>>>>> @PublicEvolving and new Sink API is still @Experimental.
> > > >>>>>>>>
> > > >>>>>>> The question is: what is stable in an evolving system? We
> > > >>>>>>> recently discovered that the old SourceFunction needed to be
> > > >>>>>>> refined such that cancellation works correctly [1]. That
> > > >>>>>>> interface has been in Flink for 7 years, heavily used also
> > > >>>>>>> outside, and we still had to change
> the
> > > >>>>> contract
> > > >>>>>>> in a way that I'd expect any implementer to recheck their
> > > >>>>> implementation.
> > > >>>>>>> It might not be necessary to change anything and you can
> probably
> > > >>>>> change
> > > >>>>>>> the code for all Flink versions but still, the interface
> was
> > > >>>>>>> not
> > > >>>>>> stable
> > > >>>>>>> in the closest sense.
> > > >>>>>>>
> > > >>>>>>> If we focus just on API changes on the unified interfaces, then
> > > >>>>>>> we
> > > >>>>> expect
> > > >>>>>>> one more change to Sink API to support compaction. For Table
> API,
> > > >>>>> there
> > > >>>>>>> will most likely also be some changes in 1.15. So we could wait
> > > >>>>>>> for
> > > >>>>> 1.15.
> > > >>>>>>> But I'm questioning if that's really necessary because we will
> > > >>>>>>> add
> > > >>>>> more
> > > >>>>>>> functionality beyond 1.15 without breaking API. For example, we
> > > >>>>>>> may
> > > >>>>> add
> > > >>>>>>> more unified connector metrics. If you want to use it in your
> > > >>>>> connector,
> > > >>>>>>> you have to support multiple Flink versions anyhow. So rather
> > > >>>>>>> than
> > > >>>>>> focusing
> > > >>>>>>> the discussion on "when is stuff stable", I'd rather focus on
> > > >>>>>>> "how
> > > >>>>> can we
> > > >>>>>>> support building connectors against multiple Flink versions"
> and
> > > >>>>>>> make
> > > >>>>> it
> > > >>>>>> as
> > > >>>>>>> painless as possible.
> > > >>>>>>>
> > > >>>>>>> Chesnay pointed out to use different branches for different
> Flink
> > > >>>>>> versions
> > > >>>>>>> which sounds like a good suggestion. With a mono-repo, we can't
> > > >>>>>>> use branches differently anyways (there is no way to have
> release
> > > >>>>>>> branches
> > > >>>>>> per
> > > >>>>>>> connector without chaos). In these branches, we could provide
> > > >>>>>>> shims to simulate future features in older Flink versions such
> > > >>>>>>> that code-wise,
> > > >>>>> the
> > > >>>>>>> source code of a specific connector may not diverge (much). For
> > > >>>>> example,
> > > >>>>>> to
> > > >>>>>>> register unified connector metrics, we could simulate the
> current
> > > >>>>>> approach
> > > >>>>>>> also in some utility package of the mono-repo.
> > > >>>>>>>
> > > >>>>>>> I see the stable core Flink API as a prerequisite for
> modularity.
> > > >>>>>>> And
> > > >>>>>>>> for connectors it is not just the source and sink API (source
> > > >>>>>>>> being stable as of 1.14), but everything that is required to
> > > >>>>>>>> build and maintain a connector downstream, such as the test
> > > >>>>>>>> utilities and infrastructure.
> > > >>>>>>>>
> > > >>>>>>> That is a very fair point. I'm actually surprised to see that
> > > >>>>>>> MiniClusterWithClientResource is not public. I see it being
> used
> > > >>>>>>> in
> > > >>>>> all
> > > >>>>>>> connectors, especially outside of Flink. I fear that as long as
> > > >>>>>>> we do
> > > >>>>> not
> > > >>>>>>> have connectors outside, we will not properly annotate and
> > > >>>>>>> maintain
> > > >>>>> these
> > > >>>>>>> utilities in a classic chicken-and-egg problem. I will outline an idea
> idea
> > > >>>>>>> at
> > > >>>>> the
> > > >>>>>>> end.
> > > >>>>>>>
> > > >>>>>>>> the connectors need to be adopted and require at least one
> > > >>>>>>>> release
> > > >>>>> per
> > > >>>>>>>> Flink minor release.
> > > >>>>>>>> However, this will make the releases of connectors slower,
> e.g.
> > > >>>>>> maintain
> > > >>>>>>>> features for multiple branches and release multiple branches.
> > > >>>>>>>> I think the main purpose of having an external connector
> > > >>>>>>>> repository
> > > >>>>> is
> > > >>>>>> in
> > > >>>>>>>> order to have "faster releases of connectors"?
> > > >>>>>>>>
> > > >>>>>>>> Imagine a project with a complex set of dependencies. Let's
> say
> > > >>>>> Flink
> > > >>>>>>>> version A plus Flink reliant dependencies released by other
> > > >>>>>>>> projects (Flink-external connectors, Beam, Iceberg, Hudi, ..).
> > > >>>>>>>> We don't want
> > > >>>>> a
> > > >>>>>>>> situation where we bump the core Flink version to B and things
> > > >>>>>>>> fall apart (interface changes, utilities that were useful but
> > > >>>>>>>> not public, transitive dependencies etc.).
> > > >>>>>>>>
> > > >>>>>>> Yes, that's why I wanted to automate the processes more which
> is
> > > >>>>>>> not
> > > >>>>> that
> > > >>>>>>> easy under ASF. Maybe we automate the source provision across
> > > >>>>> supported
> > > >>>>>>> versions and have 1 vote thread for all versions of a
> connector?
> > > >>>>>>>
> > > >>>>>>> From the perspective of CDC connector maintainers, the biggest
> > > >>>>> advantage
> > > >>>>>> of
> > > >>>>>>>> maintaining it outside of the Flink project is that:
> > > >>>>>>>> 1) we can have a more flexible and faster release cycle
> > > >>>>>>>> 2) we can be more liberal with committership for connector
> > > >>>>> maintainers
> > > >>>>>>>> which can also attract more committers to help the release.
> > > >>>>>>>>
> > > >>>>>>>> Personally, I think maintaining one connector repository under
> > > >>>>>>>> the
> > > >>>>> ASF
> > > >>>>>>> may
> > > >>>>>>>> not have the above benefits.
> > > >>>>>>>>
> > > >>>>>>> Yes, I also feel that ASF is too restrictive for our needs. But
> > > >>>>>>> it
> > > >>>>> feels
> > > >>>>>>> like there are too many that see it differently and I think we
> > > >>>>>>> need
> > > >>>>>>>
> > > >>>>>>> (2) Flink testability without connectors.
> > > >>>>>>>> This is a very good question. How can we guarantee the new
> > > >>>>>>>> Source
> > > >>>>> and
> > > >>>>>>> Sink
> > > >>>>>>>> API are stable with only test implementation?
> > > >>>>>>>>
> > > >>>>>>> We can't and shouldn't. Since the connector repo is managed by
> > > >>>>>>> Flink,
> > > >>>>> a
> > > >>>>>>> Flink release manager needs to check if the Flink connectors
> are
> > > >>>>> actually
> > > >>>>>>> working prior to creating an RC. That's similar to how
> > > >>>>>>> flink-shaded
> > > >>>>> and
> > > >>>>>>> flink core are related.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> So here is one idea that I had to get things rolling. We are
> > > >>>>>>> going to address the external repo iteratively without
> > > >>>>>>> compromising what we already have:
> > > >>>>>>> Phase 1: add new contributions to the external repo. We use that
> > > >>>>>>> time to set up the infra accordingly and optimize release
> > > >>>>>>> processes. We will identify test utilities that are not yet
> > > >>>>>>> public/stable and fix that.
> > > >>>>>>> Phase 2: add ports of existing connectors to the new unified
> > > >>>>>>> interfaces. That requires a previous Flink release to make the
> > > >>>>>>> utilities stable. Keep the old interfaces in flink-core.
> > > >>>>>>> Phase 3: remove the old interfaces of some connectors from
> > > >>>>>>> flink-core (tbd at a later point).
> > > >>>>>>> Phase 4: optionally move all remaining connectors (tbd at a
> > > >>>>>>> later point).
> > > >>>>>>> I'd envision having ~3 months between starting the different
> > > >>>>>>> phases. WDYT?
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-23527
> > > >>>>>>>
> > > >>>>>>> On Thu, Oct 21, 2021 at 7:12 AM Kyle Bendickson <k...@tabular.io>
> > > >>>>>>> wrote:
> > > >>>>>>>> Hi all,
> > > >>>>>>>>
> > > >>>>>>>> My name is Kyle and I’m an open source developer primarily
> > > >>>>>>>> focused on Apache Iceberg.
> > > >>>>>>>>
> > > >>>>>>>> I’m happy to help clarify or elaborate on any aspect of our
> > > >>>>>>>> experience working on a relatively decoupled connector that is
> > > >>>>>>>> downstream and pretty popular.
> > > >>>>>>>>
> > > >>>>>>>> I’d also love to be able to contribute or assist in any way I
> > > >>>>>>>> can.
> > > >>>>>>>> I don’t mean to thread jack, but are there any meetings or
> > > >>>>>>>> community syncs, specifically around the connector APIs, that I
> > > >>>>>>>> might join / be invited to?
> > > >>>>>>>>
> > > >>>>>>>> I did want to add that even though I’ve experienced some of the
> > > >>>>>>>> pain points of integrating with an evolving system / API
> > > >>>>>>>> (catalog support is, generally speaking, pretty new everywhere
> > > >>>>>>>> in this space), I also agree personally that you shouldn’t slow
> > > >>>>>>>> down development velocity too much for the sake of external
> > > >>>>>>>> connectors. Getting to a performant and stable place should be
> > > >>>>>>>> the primary goal, and slowing that down to support stragglers
> > > >>>>>>>> will (in my personal opinion) always be a losing game. Some
> > > >>>>>>>> folks will simply stay behind on versions regardless, until
> > > >>>>>>>> they have to upgrade.
> > > >>>>>>>>
> > > >>>>>>>> I am working on ensuring that the Iceberg community stays
> > > >>>>>>>> within 1-2 versions of Flink, so that we can help provide more
> > > >>>>>>>> feedback or contribute things that might improve our ability to
> > > >>>>>>>> support multiple Flink runtimes / versions with one project /
> > > >>>>>>>> codebase and minimal to no reflection (our desired goal).
> > > >>>>>>>>
> > > >>>>>>>> If there’s anything I can do or any way I can be of assistance,
> > > >>>>>>>> please don’t hesitate to reach out. Or find me on ASF slack 😀
> > > >>>>>>>>
> > > >>>>>>>> I greatly appreciate your general concern for the needs of
> > > >>>>>>>> downstream connector integrators!
> > > >>>>>>>>
> > > >>>>>>>> Cheers,
> > > >>>>>>>> Kyle Bendickson (GitHub: kbendick)
> > > >>>>>>>> Open Source Developer
> > > >>>>>>>> kyle [at] tabular [dot] io
> > > >>>>>>>>
> > > >>>>>>>> On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise <t...@apache.org>
> > > >>>>>>>> wrote:
> > > >>>>>>>>> Hi,
> > > >>>>>>>>>
> > > >>>>>>>>> I see the stable core Flink API as a prerequisite for
> > > >>>>>>>>> modularity. And for connectors it is not just the source and
> > > >>>>>>>>> sink API (source being stable as of 1.14), but everything that
> > > >>>>>>>>> is required to build and maintain a connector downstream, such
> > > >>>>>>>>> as the test utilities and infrastructure.
> > > >>>>>>>>>
> > > >>>>>>>>> Without the stable surface of core Flink, changes will leak
> > > >>>>>>>>> into downstream dependencies and force lock-step updates.
> > > >>>>>>>>> Refactoring across N repos is more painful than in a single
> > > >>>>>>>>> repo. Those with experience developing downstream of Flink
> > > >>>>>>>>> will know the pain, and that isn't limited to connectors. I
> > > >>>>>>>>> don't remember a Flink "minor version" update that was just a
> > > >>>>>>>>> dependency version change and did not force other downstream
> > > >>>>>>>>> changes.
> > > >>>>>>>>>
> > > >>>>>>>>> Imagine a project with a complex set of dependencies. Let's
> > > >>>>>>>>> say Flink version A plus Flink-reliant dependencies released
> > > >>>>>>>>> by other projects (Flink-external connectors, Beam, Iceberg,
> > > >>>>>>>>> Hudi, ...). We don't want a situation where we bump the core
> > > >>>>>>>>> Flink version to B and things fall apart (interface changes,
> > > >>>>>>>>> utilities that were useful but not public, transitive
> > > >>>>>>>>> dependencies, etc.).
> > > >>>>>>>>>
> > > >>>>>>>>> The discussion here also highlights the benefits of keeping
> > > >>>>>>>>> certain connectors outside Flink, whether that is due to
> > > >>>>>>>>> differences in the developer community, the maturity of the
> > > >>>>>>>>> connectors, their specialized/limited usage, etc. I would like
> > > >>>>>>>>> to see that as a sign of a growing ecosystem, and most of the
> > > >>>>>>>>> ideas that Arvid has put forward would benefit further growth
> > > >>>>>>>>> of the connector ecosystem.
> > > >>>>>>>>> As for keeping connectors within Apache Flink: I prefer that
> > > >>>>>>>>> as the path forward for "essential" connectors like
> > > >>>>>>>>> FileSource, KafkaSource, ... And we can still achieve a more
> > > >>>>>>>>> flexible and faster release cycle.
> > > >>>>>>>>>
> > > >>>>>>>>> Thanks,
> > > >>>>>>>>> Thomas
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> On Wed, Oct 20, 2021 at 3:32 AM Jark Wu <imj...@gmail.com>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>> Hi Konstantin,
> > > >>>>>>>>>>
> > > >>>>>>>>>>> the connectors need to be adopted and require at least one
> > > >>>>>>>>>>> release per Flink minor release.
> > > >>>>>>>>>> However, this will make connector releases slower, e.g.
> > > >>>>>>>>>> maintaining features for multiple branches and releasing
> > > >>>>>>>>>> multiple branches. I think the main purpose of having an
> > > >>>>>>>>>> external connector repository is to have "faster releases of
> > > >>>>>>>>>> connectors"?
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> From the perspective of CDC connector maintainers, the
> > > >>>>>>>>>> biggest advantage of maintaining it outside of the Flink
> > > >>>>>>>>>> project is that:
> > > >>>>>>>>>> 1) we can have a more flexible and faster release cycle
> > > >>>>>>>>>> 2) we can be more liberal with committership for connector
> > > >>>>>>>>>> maintainers, which can also attract more committers to help
> > > >>>>>>>>>> with releases.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Personally, I think maintaining one connector repository
> > > >>>>>>>>>> under the ASF may not have the above benefits.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Best,
> > > >>>>>>>>>> Jark
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf
> > > >>>>>>>>>> <kna...@apache.org> wrote:
> > > >>>>>>>>>>> Hi everyone,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> regarding the stability of the APIs: I think everyone agrees
> > > >>>>>>>>>>> that connector APIs which are stable across minor versions
> > > >>>>>>>>>>> (1.13 -> 1.14) are the mid-term goal. But:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> a) These APIs are still quite young, and we shouldn't make
> > > >>>>>>>>>>> them @Public prematurely either.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> b) Isn't this *mostly* orthogonal to where the connector
> > > >>>>>>>>>>> code lives? Yes, as long as there are breaking changes, the
> > > >>>>>>>>>>> connectors need to be adopted and require at least one
> > > >>>>>>>>>>> release per Flink minor release. Documentation-wise this can
> > > >>>>>>>>>>> be addressed via a compatibility matrix for each connector,
> > > >>>>>>>>>>> as Arvid suggested. IMO we shouldn't block this effort on
> > > >>>>>>>>>>> the stability of the APIs.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Konstantin
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Wed, Oct 20, 2021 at 8:56 AM Jark Wu <imj...@gmail.com>
> > > >>>>>>>>>>> wrote:
> > > >>>>>>>>>>>> Hi,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I think Thomas raised very good questions, and I would like
> > > >>>>>>>>>>>> to know your opinions if we want to move connectors out of
> > > >>>>>>>>>>>> Flink in this version.
> > > >>>>>>>>>>>> (1) Is the connector API already stable?
> > > >>>>>>>>>>>>> Separate releases would only make sense if the core Flink
> > > >>>>>>>>>>>>> surface is fairly stable though. As evident from Iceberg
> > > >>>>>>>>>>>>> (and also Beam), that's not the case currently. We should
> > > >>>>>>>>>>>>> probably focus on addressing the stability first, before
> > > >>>>>>>>>>>>> splitting code. A success criterion could be that we are
> > > >>>>>>>>>>>>> able to build Iceberg and Beam against multiple Flink
> > > >>>>>>>>>>>>> versions w/o the need to change code. The goal would be
> > > >>>>>>>>>>>>> that no connector breaks when we make changes to Flink
> > > >>>>>>>>>>>>> core. Until that's the case, code separation creates a
> > > >>>>>>>>>>>>> setup where 1+1 or N+1 repositories need to move in lock
> > > >>>>>>>>>>>>> step.
> > > >>>>>>>>>>>> From another discussion thread [1], the connector API is
> > > >>>>>>>>>>>> far from stable. Currently, it's hard to build connectors
> > > >>>>>>>>>>>> against multiple Flink versions. There are breaking API
> > > >>>>>>>>>>>> changes both in 1.12 -> 1.13 and 1.13 -> 1.14, and maybe
> > > >>>>>>>>>>>> also in future versions, because Table-related APIs are
> > > >>>>>>>>>>>> still @PublicEvolving and the new Sink API is still
> > > >>>>>>>>>>>> @Experimental.
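The "build against multiple Flink versions w/o the need to change code" criterion quoted above is usually approached by parameterizing the Flink version in the downstream build. A minimal sketch, assuming a Maven-based connector project (coordinates and version numbers are illustrative, not taken from this thread):

```xml
<!-- Sketch: Flink is a "provided" dependency whose version is a build
     property, so CI can compile and test the same connector code against
     several Flink releases, e.g.
       mvn verify -Dflink.version=1.13.3
       mvn verify -Dflink.version=1.14.0
     This only works once every API the connector touches is stable. -->
<properties>
  <flink.version>1.14.0</flink.version>
</properties>
<dependencies>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.12</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

If any of those builds needs version-specific source code or reflection, the API surface is, by this criterion, not yet stable enough to split the repositories.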
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> (2) Flink testability without connectors.
> > > >>>>>>>>>>>>> Flink w/o the Kafka connector (and a few others) isn't
> > > >>>>>>>>>>>>> viable. Testability of Flink was already brought up: can
> > > >>>>>>>>>>>>> we really certify a Flink core release without the Kafka
> > > >>>>>>>>>>>>> connector? Maybe those connectors that are used in Flink
> > > >>>>>>>>>>>>> e2e tests to validate the functionality of core Flink
> > > >>>>>>>>>>>>> should not be broken out?
> > > >>>>>>>>>>>> This is a very good question. How can we guarantee the new
> > > >>>>>>>>>>>> Source and Sink API are stable with only test
> > > >>>>>>>>>>>> implementations?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Best,
> > > >>>>>>>>>>>> Jark
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler
> > > >>>>>>>>>>>> <ches...@apache.org> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> Could you clarify what release cadence you're thinking of?
> > > >>>>>>>>>>>>> There's quite a big range that fits "more frequent than
> > > >>>>>>>>>>>>> Flink" (per-commit, daily, weekly, bi-weekly, monthly,
> > > >>>>>>>>>>>>> even bi-monthly).
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> On 19/10/2021 14:15, Martijn Visser wrote:
> > > >>>>>>>>>>>>>> Hi all,
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I think it would be a huge benefit if we can achieve more
> > > >>>>>>>>>>>>>> frequent releases of connectors, which are not bound to
> > > >>>>>>>>>>>>>> the release cycle of Flink itself. I agree that in order
> > > >>>>>>>>>>>>>> to get there, we need to have stable interfaces which are
> > > >>>>>>>>>>>>>> trustworthy and reliable, so they can be safely used by
> > > >>>>>>>>>>>>>> those connectors. I do think that work still needs to be
> > > >>>>>>>>>>>>>> done on those interfaces, but I am confident that we can
> > > >>>>>>>>>>>>>> get there from a Flink perspective.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I am worried that we would not be able to achieve those
> > > >>>>>>>>>>>>>> frequent releases of connectors if we are putting these
> > > >>>>>>>>>>>>>> connectors under the Apache umbrella, because that means
> > > >>>>>>>>>>>>>> that for each connector release we have to follow the
> > > >>>>>>>>>>>>>> Apache release creation process. This requires a lot of
> > > >>>>>>>>>>>>>> manual steps and prohibits automation, and I think it
> > > >>>>>>>>>>>>>> would be hard to scale out frequent releases of
> > > >>>>>>>>>>>>>> connectors. I'm curious how others think this challenge
> > > >>>>>>>>>>>>>> could be solved.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Best regards,
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Martijn
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Mon, 18 Oct 2021 at 22:22, Thomas Weise
> > > >>>>>>>>>>>>>> <t...@apache.org> wrote:
> > > >>>>>>>>>>>>>>> Thanks for initiating this discussion.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> There are definitely a few things that are not optimal
> > > >>>>>>>>>>>>>>> with our current management of connectors. I would not
> > > >>>>>>>>>>>>>>> necessarily characterize it as a "mess" though. As the
> > > >>>>>>>>>>>>>>> points raised so far show, it isn't easy to find a
> > > >>>>>>>>>>>>>>> solution that balances competing requirements and leads
> > > >>>>>>>>>>>>>>> to a net improvement.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> It would be great if we can find a setup that allows
> > > >>>>>>>>>>>>>>> connectors to be released independently of core Flink,
> > > >>>>>>>>>>>>>>> and that each connector can be released separately.
> > > >>>>>>>>>>>>>>> Flink already has separate releases (flink-shaded), so
> > > >>>>>>>>>>>>>>> that by itself isn't a new thing. Per-connector releases
> > > >>>>>>>>>>>>>>> would need to allow for more frequent releases (without
> > > >>>>>>>>>>>>>>> the baggage that a full Flink release comes with).
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Separate releases would only make sense if the core
> > > >>>>>>>>>>>>>>> Flink surface is fairly stable though. As evident from
> > > >>>>>>>>>>>>>>> Iceberg (and also Beam), that's not the case currently.
> > > >>>>>>>>>>>>>>> We should probably focus on addressing the stability
> > > >>>>>>>>>>>>>>> first, before splitting code. A success criterion could
> > > >>>>>>>>>>>>>>> be that we are able to build Iceberg and Beam against
> > > >>>>>>>>>>>>>>> multiple Flink versions w/o the need to change code. The
> > > >>>>>>>>>>>>>>> goal would be that no connector breaks when we make
> > > >>>>>>>>>>>>>>> changes to Flink core. Until that's the case, code
> > > >>>>>>>>>>>>>>> separation creates a setup where 1+1 or N+1 repositories
> > > >>>>>>>>>>>>>>> need to move in lock step.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Regarding some connectors being more important for Flink
> > > >>>>>>>>>>>>>>> than others: that's a fact. Flink w/o the Kafka
> > > >>>>>>>>>>>>>>> connector (and a few others) isn't viable. Testability
> > > >>>>>>>>>>>>>>> of Flink was already brought up: can we really certify a
> > > >>>>>>>>>>>>>>> Flink core release without the Kafka connector? Maybe
> > > >>>>>>>>>>>>>>> those connectors that are used in Flink e2e tests to
> > > >>>>>>>>>>>>>>> validate the functionality of core Flink should not be
> > > >>>>>>>>>>>>>>> broken out?
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Finally, I think that the connectors that move into
> > > >>>>>>>>>>>>>>> separate repos should remain part of the Apache Flink
> > > >>>>>>>>>>>>>>> project. Larger organizations tend to approve the use of
> > > >>>>>>>>>>>>>>> and contribution to open source at the project level.
> > > >>>>>>>>>>>>>>> Sometimes it is everything ASF; more often it is "Apache
> > > >>>>>>>>>>>>>>> Foo". It would be fatal to end up with a patchwork of
> > > >>>>>>>>>>>>>>> projects with potentially different licenses and
> > > >>>>>>>>>>>>>>> governance to arrive at a working Flink setup. This may
> > > >>>>>>>>>>>>>>> mean we prioritize usability over developer convenience,
> > > >>>>>>>>>>>>>>> if that's in the best interest of Flink as a whole.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>> Thomas
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler
> > > >>>>>>>>>>>>>>> <ches...@apache.org> wrote:
> > > >>>>>>>>>>>>>>>> Generally, the issues are reproducibility and control.
> > > >>>>>>>>>>>>>>>> Stuff's completely broken on the Flink side for a week?
> > > >>>>>>>>>>>>>>>> Well, then so are the connector repos.
> > > >>>>>>>>>>>>>>>> (As-is) You can't go back to a previous version of the
> > > >>>>>>>>>>>>>>>> snapshot. Which also means that checking out older
> > > >>>>>>>>>>>>>>>> commits can be problematic, because you'd still work
> > > >>>>>>>>>>>>>>>> against the latest snapshots, and they may not be
> > > >>>>>>>>>>>>>>>> compatible with each other.
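The reproducibility problem described above comes from depending on the floating `-SNAPSHOT` version, which always resolves to the newest build. One hedged workaround sketch, assuming the snapshot repository keeps timestamped (unique) snapshot artifacts: pin the exact timestamped build a commit was developed against. The coordinates and timestamp below are made up for illustration:

```xml
<!-- Sketch: instead of the floating 1.15-SNAPSHOT, pin the concrete
     timestamped snapshot build, so that checking out an older connector
     commit still resolves the same Flink build it was tested against.
     Only works while the repository retains old snapshot builds. -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.12</artifactId>
  <version>1.15-20211018.142301-7</version>
</dependency>
```

This trades the convenience of automatically tracking Flink master for reproducible checkouts, which is exactly the tension discussed here.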
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> On 18/10/2021 15:22, Arvid Heise wrote:
> > > >>>>>>>>>>>>>>>>> I was actually betting on snapshot versions. What are
> > > >>>>>>>>>>>>>>>>> the limits? Obviously, we can only do a release of a
> > > >>>>>>>>>>>>>>>>> 1.15 connector after 1.15 is released.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> --
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Konstantin Knauf
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> https://twitter.com/snntrable
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> https://github.com/knaufk
> > > >>>>>>>>>>>
