I also agree that it feels more natural to go with a repo for each
individual connector. Each repository can be made available on
flink-packages.org so users can find them, in addition to referring to them
in the documentation. +1 from my side.

On Thu, 9 Dec 2021 at 15:38, Arvid Heise <ar...@apache.org> wrote:

> Hi all,
>
> We tried out Chesnay's proposal and went with Option 2. Unfortunately, we
> ran into some tough nuts to crack and feel like we have hit a dead end:
> - The main pain point with the outlined Frankensteinian connector repo is
> how to handle shared code / infra code. If we have it in some <common>
> branch, then we need to merge the common branch in the connector branch on
> update. However, it's unclear to me how improvements to the common code
> that naturally appear while working on a specific connector make it back into
> the common branch. You can't use a pull request from your branch, or else
> your connector code would poison the connector-less common branch. So you
> would probably manually copy the files over to a common branch and create a
> PR branch for that.
> - A weird solution could be to have the common branch as a submodule in the
> repo itself (if that's even possible). I'm sure that this setup would
> blow the minds of all newcomers.
> - Similarly, it's mandatory to have safeguards against code from connector
> A poisoning connector B, common, or main. I had a similar setup in the
> past, and code from two "distinct" branch types constantly bled into each other.
> - We could also say that we simply release <common> independently and just
> have a maven (SNAPSHOT) dependency on it. But that would create a weird
> flow if you need to change something in common, where you would constantly
> switch branches back and forth.
> - In general, the Frankensteinian approach is very switch-intensive. If you
> maintain 3 connectors and need to fix one build instability in each at the
> same time (quite common nowadays for some reason) and you have 2 review
> rounds, you need to switch branches 9 times, ignoring changes to common.
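To make the switching overhead concrete, here is a small back-of-the-envelope sketch. The numbers are the hypothetical ones from the example above (3 connectors, one fix plus 2 review rounds each); the helper function is purely illustrative.

```python
# Back-of-the-envelope estimate of branch switches in the mono-repo model.
# Each connector lives on its own branch, so every touch (the initial fix
# plus each review round) forces a checkout of that connector's branch.

def branch_switches(connectors: int, review_rounds: int) -> int:
    """One switch for the initial fix plus one per review round, per connector."""
    return connectors * (1 + review_rounds)

if __name__ == "__main__":
    # The example from the thread: 3 connectors, 2 review rounds each.
    print(branch_switches(connectors=3, review_rounds=2))  # -> 9
```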
>
> Additionally, we still have the rather user-/dev-unfriendly main branch that is
> mostly empty. I'm also not sure we can generate an overview README.md to
> make it more friendly here because in theory every connector branch should
> be based on main and we would get merge conflicts.
>
> I'd like to propose once again to go with individual repositories.
> - The only downside that we discussed so far is that we have more initial
> setup to do. Since the number of connector repositories grows organically,
> that load is quite distributed. We can offer templates after finding a good
> approach that can even be used by outside organizations.
> - Regarding secrets, I think it's actually an advantage that the Kafka
> connector has no access to the AWS secrets. If there are secrets to be
> shared across connectors, we can and should use Azure's Variable Groups (I
> have used it in the past to share Nexus creds across repos). That would
> also make rotation easy.
> - Working on different connectors would be rather easy as all modern IDEs
> support multi-repo setups in the same project. You still need to do
> multiple releases in case you update common code (either accessed through
> Nexus or git submodule) and you want to release your connector.
> - There is no difference with respect to how many CI runs there are in
> either approach.
> - Individual repositories also have the advantage of allowing external
> incubation. Let's assume someone builds connector A and hosts it in their
> organization (very common setup). If they want to contribute the code to
> Flink, we could simply transfer the repository into ASF after ensuring
> Flink coding standards. Then we retain the git history and GitHub issues.
>
> Is there any point that I'm missing?
>
> On Fri, Nov 26, 2021 at 1:32 PM Chesnay Schepler <ches...@apache.org>
> wrote:
>
> > For sharing workflows we should be able to use composite actions. We'd
> > have the main definition files in the flink-connectors repo, that we
> > also need to tag/release, which other branches/repos can then import.
> > These are also versioned, so we don't have to worry about accidentally
> > breaking stuff.
> > These could also be used to enforce certain standards / interfaces such
> > that we can automate more things (e.g., integration into the Flink
> > documentation).
> >
> > It is true that Option 2) and dedicated repositories share a lot of
> > properties. While I did say in an offline conversation that we in that
> > case might just as well use separate repositories, I'm not so sure
> > anymore. One repo would make administration a bit easier, for example
> > secrets wouldn't have to be applied to each repo (we wouldn't want
> > certain secrets to be set up organization-wide).
> > I overall also like that one repo would present a single access point;
> > you can't "miss" a connector repo, and I would hope that having it as
> > one repo would nurture more collaboration between the connectors, which
> > after all need to solve similar problems.
> >
> > It is a fair point that the branching model would be quite weird, but I
> > think that would subside pretty quickly.
> >
> > Personally I'd go with Option 2, and if that doesn't work out we can
> > still split the repo later on. (Which should then be a trivial matter of
> > copying all <connector>/* branches and renaming them).
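Chesnay's "copy and rename the branches" step could, as a rough sketch, look like the following. The `<connector>/release-x.y` branch naming is an assumption extrapolated from the `<connector>/*` pattern above; the helper is hypothetical.

```python
# Sketch of splitting the mono-repo later: select one connector's branches
# and strip the connector prefix, yielding the branch names for the new
# dedicated repository.

def split_branches(all_branches: list[str], connector: str) -> dict[str, str]:
    """Map mono-repo branch names to their names in the dedicated repo."""
    prefix = connector + "/"
    return {
        branch: branch[len(prefix):]
        for branch in all_branches
        if branch.startswith(prefix)
    }

if __name__ == "__main__":
    branches = ["kafka/release-1.0", "kafka/release-1.1", "hbase/release-1.0"]
    # Only the kafka branches are carried over, with the prefix stripped.
    print(split_branches(branches, "kafka"))
```

The actual migration would then push each renamed branch to the new repository, which preserves the full history of those branches.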
> >
> > On 26/11/2021 12:47, Till Rohrmann wrote:
> > > Hi Arvid,
> > >
> > > Thanks for updating this thread with the latest findings. The described
> > > limitations for a single connector repo sound suboptimal to me.
> > >
> > > * Option 2. sounds as if we try to simulate multi connector repos
> inside
> > of
> > > a single repo. I also don't know how we would share code between the
> > > different branches (sharing infrastructure would probably be easier
> > > though). This seems to have the same limitations as dedicated repos
> with
> > > the downside of having a not very intuitive branching model.
> > > * Isn't option 1. kind of a degenerate version of option 2. where we
> > have
> > > some unrelated code from other connectors in the individual connector
> > > branches?
> > > * Option 3. has the downside that someone creating a release has to
> > release
> > > all connectors. This means that she either has to sync with the
> different
> > > connector maintainers or has to be able to release all connectors on
> her
> > > own. We are already seeing in the Flink community that releases require
> > > quite good communication/coordination between the different people
> > working
> > > on different Flink components. Given our goals to make connector
> releases
> > > easier and more frequent, I think that coupling different connector
> > > releases might be counter-productive.
> > >
> > > To me it does not sound very practical to mainly use a mono repository w/o
> > > having some more advanced build infrastructure that, for example,
> allows
> > to
> > > have different git roots in different connector directories. Maybe the
> > mono
> > > repo can be a catch all repository for connectors that want to be
> > released
> > > in lock-step (Option 3.) with all other connectors the repo contains.
> But
> > > for connectors that get changed frequently, having a dedicated
> repository
> > > that allows independent releases sounds preferable to me.
> > >
> > > What utilities and infrastructure code do you intend to share? Using
> git
> > > submodules can definitely be one option to share code. However, it
> might
> > > also be ok to depend on flink-connector-common artifacts which could
> make
> > > things easier. Where I am unsure is whether git submodules can be used
> to
> > > share infrastructure code (e.g. the .github/workflows) because you need
> > > these files in the repo to trigger the CI infrastructure.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Thu, Nov 25, 2021 at 1:59 PM Arvid Heise <ar...@apache.org> wrote:
> > >
> > >> Hi Brian,
> > >>
> > >> Thank you for sharing. I think your approach is very valid and is in
> > line
> > >> with what I had in mind.
> > >>
> > >> Basically Pravega community aligns the connector releases with the
> > Pravega
> > >>> mainline release
> > >>>
> > >> This certainly would mean that there is little value in coupling
> > connector
> > >> versions. So it's making a good case for having separate connector
> > repos.
> > >>
> > >>
> > >>> and maintains the connector with the latest 3 Flink versions(CI will
> > >>> publish snapshots for all these 3 branches)
> > >>>
> > >> I'd like to give connector devs a simple way to express to which Flink
> > >> versions the current branch is compatible. From there we can generate
> > the
> > >> compatibility matrix automatically and optionally also create
> different
> > >> releases per supported Flink version. Not sure if the latter is indeed
> > >> better than having just one artifact that happens to run with multiple
> > >> Flink versions. I guess it depends on what dependencies we are
> > exposing. If
> > >> the connector uses flink-connector-base, then we probably need
> separate
> > >> artifacts with poms anyways.
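The auto-generated compatibility matrix mentioned above could, as a sketch, be rendered from support declarations that each connector branch provides. The connector names and Flink versions below are made up for illustration.

```python
# Sketch: render a Markdown compatibility matrix from declared support data.
# Each connector branch would declare which Flink versions it is compatible
# with; the matrix is then generated from those declarations.

def compatibility_matrix(support: dict[str, list[str]]) -> str:
    """Build a Markdown table: one row per connector, one column per Flink version."""
    flink_versions = sorted({v for versions in support.values() for v in versions})
    lines = ["| Connector | " + " | ".join(flink_versions) + " |"]
    lines.append("|" + "---|" * (len(flink_versions) + 1))
    for connector, versions in sorted(support.items()):
        cells = ["yes" if v in versions else "no" for v in flink_versions]
        lines.append("| " + connector + " | " + " | ".join(cells) + " |")
    return "\n".join(lines)

if __name__ == "__main__":
    # Hypothetical data, mirroring a Pravega-style support window.
    print(compatibility_matrix({
        "kafka-2.0": ["1.13", "1.14"],
        "hbase-1.0": ["1.12", "1.13"],
    }))
```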
> > >>
> > >> Best,
> > >>
> > >> Arvid
> > >>
> > >> On Fri, Nov 19, 2021 at 10:55 AM Zhou, Brian <b.z...@dell.com> wrote:
> > >>
> > >>> Hi Arvid,
> > >>>
> > >>> Regarding the branching model, the Pravega Flink connector has some
> > >>> experience that I would like to share. Here[1][2] are the compatibility
> > >>> matrix and the wiki explaining the branching model and releases. Basically Pravega
> > community
> > >>> aligns the connector releases with the Pravega mainline release, and
> > >>> maintains the connector with the latest 3 Flink versions(CI will
> > publish
> > >>> snapshots for all these 3 branches).
> > >>> For example, recently we have 0.10.1 release[3], and in maven central
> > we
> > >>> need to upload three artifacts(For Flink 1.13, 1.12, 1.11) for 0.10.1
> > >>> version[4].
> > >>>
> > >>> There are some alternatives. Another solution that we once discussed
> > but
> > >>> finally abandoned is to have an independent version just like the
> > >>> current CDC connector, and then give a big compatibility matrix to
> > users.
> > >>> We think it would be too confusing as the connector evolves. On
> the
> > >>> contrary, we could also go the opposite way and align with the Flink version
> > and
> > >>> maintain several branches for different system versions.
> > >>>
> > >>> I would say this is only a fairly-OK solution because it is a bit
> > painful
> > >>> for maintainers as cherry-picks are very common and releases would
> > >> require
> > >>> much work. However, if neither system has nice backward
> > >>> compatibility, there seems to be no comfortable solution for their
> > >>> connector.
> > >>>
> > >>> [1] https://github.com/pravega/flink-connectors#compatibility-matrix
> > >>> [2] https://github.com/pravega/flink-connectors/wiki/Versioning-strategy-for-Flink-connector
> > >>> [3] https://github.com/pravega/flink-connectors/releases/tag/v0.10.1
> > >>> [4] https://search.maven.org/search?q=pravega-connectors-flink
> > >>>
> > >>> Best Regards,
> > >>> Brian
> > >>>
> > >>>
> > >>> -----Original Message-----
> > >>> From: Arvid Heise <ar...@apache.org>
> > >>> Sent: Friday, November 19, 2021 4:12 PM
> > >>> To: dev
> > >>> Subject: Re: [DISCUSS] Creating an external connector repository
> > >>>
> > >>>
> > >>> Hi everyone,
> > >>>
> > >>> we are currently in the process of setting up the flink-connectors
> repo
> > >>> [1] for new connectors but we hit a wall that we currently cannot
> take:
> > >>> branching model.
> > >>> To reiterate the original motivation of the external connector repo:
> We
> > >>> want to decouple the release cycle of a connector with Flink.
> However,
> > if
> > >>> we want to support semantic versioning in the connectors with the
> > ability
> > >>> to introduce breaking changes through major version bumps and support
> > >>> bugfixes on old versions, then we need release branches similar to
> how
> > >>> Flink core operates.
> > >>> Consider two connectors, let's call them kafka and hbase. We have
> kafka
> > >> in
> > >>> version 1.0.X, 1.1.Y (small improvement), 2.0.Z (config option)
> change
> > >> and
> > >>> hbase only on 1.0.A.
> > >>>
> > >>> Now our current assumption was that we can work with a mono-repo
> under
> > >> ASF
> > >>> (flink-connectors). Then, for release-branches, we found 3 options:
> > >>> 1. We would need to create some ugly mess with the cross product of
> > >>> connector and version: so you have kafka-release-1.0,
> > kafka-release-1.1,
> > >>> kafka-release-2.0, hbase-release-1.0. The main issue is not the
> amount
> > of
> > >>> branches (that's something that git can handle) but there the state
> of
> > >>> kafka is undefined in hbase-release-1.0. That's a call for desaster
> and
> > >>> makes releasing connectors very cumbersome (CI would only execute and
> > >>> publish hbase SNAPSHOTS on hbase-release-1.0).
> > >>> 2. We could avoid the undefined state by having an empty master and
> > each
> > >>> release branch really only holds the code of the connector. But
> that's
> > >> also
> > >>> not great: any user that looks at the repo and sees no connector
> would
> > >>> assume that it's dead.
> > >>> 3. We could have synced releases similar to the CDC connectors [2].
> > That
> > >>> means that if any connector introduces a breaking change, all
> > connectors
> > >>> get a new major. I find that quite confusing to a user if hbase gets
> a
> > >> new
> > >>> release without any change because kafka introduced a breaking
> change.
> > >>>
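Option 1's cross product of connector and version can be sketched as follows; the connector and version lists are the examples from the mail, and the enumeration helper is hypothetical.

```python
# Sketch of Option 1: the mono-repo gets one release branch per
# (connector, version) pair. The branch count itself is something git can
# handle; the real problem is that each branch also carries an undefined
# state of every *other* connector.

def release_branches(versions_by_connector: dict[str, list[str]]) -> list[str]:
    """Enumerate the cross product of connectors and their major.minor versions."""
    return [
        f"{connector}-release-{version}"
        for connector, versions in versions_by_connector.items()
        for version in versions
    ]

if __name__ == "__main__":
    # The example from the mail: kafka at 1.0/1.1/2.0, hbase at 1.0.
    print(release_branches({"kafka": ["1.0", "1.1", "2.0"], "hbase": ["1.0"]}))
```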
> > >>> To fully decouple release cycles and CI of connectors, we could add
> > >>> individual repositories under ASF (flink-connector-kafka,
> > >>> flink-connector-hbase). Then we can apply the same branching model as
> > >>> before. I quickly checked if there are precedents in the Apache
> > >> community
> > >>> for that approach and just by scanning alphabetically I found cordova
> > >> with
> > >>> 70 and couchdb with 77 apache repos respectively. So it certainly
> seems
> > >>> like other projects approached our problem in that way and the apache
> > >>> organization is okay with that. I currently expect max 20 additional
> > >> repos
> > >>> for connectors and in the future 10 max each for formats and
> > filesystems
> > >> if
> > >>> we would also move them out at some point in time. So we would be at
> a
> > >>> total of 50 repos.
> > >>>
> > >>> Note for all options, we need to provide a compatibility matrix that we
> > aim
> > >>> to autogenerate.
> > >>>
> > >>> Now for the potential downsides that we internally discussed:
> > >>> - How can we ensure common infrastructure code, utilities, and
> quality?
> > >>> I propose to add a flink-connector-common that contains all these
> > things
> > >>> and is added as a git submodule/subtree to the repos.
> > >>> - Do we implicitly discourage connector developers from maintaining more
> > than
> > >>> one connector with a fragmented code base?
> > >>> That is certainly a risk. However, I currently also see few devs
> > working
> > >>> on more than one connector. On the other hand, it may actually help keep the
> > >> devs
> > >>> that maintain a specific connector on the hook. We could use github
> > >> issues
> > >>> to track bugs and feature requests and a dev can focus his limited
> time
> > >> on
> > >>> getting that one connector right.
> > >>>
> > >>> So WDYT? Compared to some intermediate suggestions with split repos,
> > the
> > >>> big difference is that everything remains under Apache umbrella and
> the
> > >>> Flink community.
> > >>>
> > >>> [1] https://github.com/apache/flink-connectors
> > >>> [2] https://github.com/ververica/flink-cdc-connectors/
> > >>>
> > >>> On Fri, Nov 12, 2021 at 3:39 PM Arvid Heise <ar...@apache.org>
> wrote:
> > >>>
> > >>>> Hi everyone,
> > >>>>
> > >>>> I created the flink-connectors repo [1] to advance the topic. We
> would
> > >>>> create a proof-of-concept in the next few weeks as a special branch
> > >>>> that I'd then use for discussions. If the community agrees with the
> > >>>> approach, that special branch will become the master. If not, we can
> > >>>> reiterate over it or create competing POCs.
> > >>>>
> > >>>> If someone wants to try things out in parallel, just make sure that
> > >>>> you are not accidentally pushing POCs to the master.
> > >>>>
> > >>>> As a reminder: We will not move out any current connector from Flink
> > >>>> at this point in time, so everything in Flink will remain as is and
> be
> > >>>> maintained there.
> > >>>>
> > >>>> Best,
> > >>>>
> > >>>> Arvid
> > >>>>
> > >>>> [1] https://github.com/apache/flink-connectors
> > >>>>
> > >>>> On Fri, Oct 29, 2021 at 6:57 PM Till Rohrmann <trohrm...@apache.org
> >
> > >>>> wrote:
> > >>>>
> > >>>>> Hi everyone,
> > >>>>>
> > >>>>>  From the discussion, it seems to me that we have different
> opinions
> > >>>>> whether to have an ASF umbrella repository or to host them outside
> of
> > >>>>> the ASF. It also seems that this is not really the problem to
> solve.
> > >>>>> Since there are many good arguments for either approach, we could
> > >>>>> simply start with an ASF umbrella repository and see how people
> adopt
> > >>>>> it. If the individual connectors cannot move fast enough or if
> people
> > >>>>> prefer to not buy into the more heavy-weight ASF processes, then
> they
> > >>>>> can host the code also somewhere else. We simply need to make sure
> > >>>>> that these connectors are discoverable (e.g. via flink-packages).
> > >>>>>
> > >>>>> The more important problem seems to be to provide common tooling
> > >>>>> (testing, infrastructure, documentation) that can easily be reused.
> > >>>>> Similarly, it has become clear that the Flink community needs to
> > >>>>> improve on providing stable APIs. I think it is not realistic to
> > >>>>> first complete these tasks before starting to move connectors to
> > >>>>> dedicated repositories. As Stephan said, creating a connector
> > >>>>> repository will force us to pay more attention to API stability and
> > >>>>> also to think about which testing tools are required. Hence, I
> > >>>>> believe that starting to add connectors to a different repository
> > >>>>> than apache/flink will help improve our connector tooling
> (declaring
> > >>>>> testing classes as public, creating a common test utility repo,
> > >>>>> creating a repo
> > >>>>> template) and vice versa. That is why I like Arvid's proposed process as
> > >>>>> it will start kicking things off w/o letting this effort fizzle
> out.
> > >>>>>
> > >>>>> Cheers,
> > >>>>> Till
> > >>>>>
> > >>>>> On Thu, Oct 28, 2021 at 11:44 AM Stephan Ewen <se...@apache.org>
> > >> wrote:
> > >>>>>> Thank you all, for the nice discussion!
> > >>>>>>
> > >>>>>>  From my point of view, I very much like the idea of putting
> > >>>>>> connectors
> > >>>>> in a
> > >>>>>> separate repository. But I would argue it should be part of Apache
> > >>>>> Flink,
> > >>>>>> similar to flink-statefun, flink-ml, etc.
> > >>>>>>
> > >>>>>> I share many of the reasons for that:
> > >>>>>>    - As argued many times, reduces complexity of the Flink repo,
> > >>>>> increases
> > >>>>>> response times of CI, etc.
> > >>>>>>    - Much lower barrier of contribution, because an unstable
> > >>>>>> connector
> > >>>>> would
> > >>>>>> not de-stabilize the whole build. Of course, we would need to make
> > >>>>>> sure
> > >>>>> we
> > >>>>>> set this up the right way, with connectors having individual CI
> > >>>>>> runs,
> > >>>>> build
> > >>>>>> status, etc. But it certainly seems possible.
> > >>>>>>
> > >>>>>>
> > >>>>>> I would argue some points a bit different than some cases made
> > >> before:
> > >>>>>> (a) I believe the separation would increase connector stability.
> > >>>>> Because it
> > >>>>>> really forces us to work with the connectors against the APIs like
> > >>>>>> any external developer. A mono repo is somehow the wrong thing if
> > >>>>>> you in practice want to actually guarantee stable internal APIs at
> > >>> some layer.
> > >>>>>> Because the mono repo makes it easy to just change something on
> > >>>>>> both
> > >>>>> sides
> > >>>>>> of the API (provider and consumer) seamlessly.
> > >>>>>>
> > >>>>>> Major refactorings in Flink need to keep all connector API
> > >>>>>> contracts intact, or we need to have a new version of the
> connector
> > >>> API.
> > >>>>>> (b) We may even be able to go towards more lightweight and
> > >>>>>> automated releases over time, even if we stay in Apache Flink with
> > >>> that repo.
> > >>>>>> This isn't fully aligned with the Apache release policies yet,
> > >>>>>> but there are board discussions about whether there can be
> > >>>>>> bot-triggered releases (by dependabot) and how that could fit into
> > >>> the Apache process.
> > >>>>>> This doesn't seem to be quite there just yet, but seeing that
> those
> > >>>>> start
> > >>>>>> is a good sign, and there is a good chance we can do some things
> > >>> there.
> > >>>>>> I am not sure whether we should let bots trigger releases, because
> > >>>>>> a
> > >>>>> final
> > >>>>>> human look at things isn't a bad thing, especially given the
> > >>>>>> popularity
> > >>>>> of
> > >>>>>> software supply chain attacks recently.
> > >>>>>>
> > >>>>>>
> > >>>>>> I do share Chesnay's concerns about complexity in tooling, though.
> > >>>>>> Both release tooling and test tooling. They are not incompatible
> > >>>>>> with that approach, but they are a task we need to tackle during
> > >>>>>> this change which will add additional work.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On Tue, Oct 26, 2021 at 10:31 AM Arvid Heise <ar...@apache.org>
> > >>> wrote:
> > >>>>>>> Hi folks,
> > >>>>>>>
> > >>>>>>> I think some questions came up and I'd like to address the
> > >>>>>>> question of
> > >>>>>> the
> > >>>>>>> timing.
> > >>>>>>>
> > >>>>>>> Could you clarify what release cadence you're thinking of?
> > >>>>>>> There's
> > >>>>> quite
> > >>>>>>>> a big range that fits "more frequent than Flink" (per-commit,
> > >>>>>>>> daily, weekly, bi-weekly, monthly, even bi-monthly).
> > >>>>>>> The short answer is: as often as needed:
> > >>>>>>> - If there is a CVE in a dependency and we need to bump it -
> > >>>>>>> release immediately.
> > >>>>>>> - If there is a new feature merged, release soonish. We may
> > >>>>>>> collect a
> > >>>>> few
> > >>>>>>> successive features before a release.
> > >>>>>>> - If there is a bugfix, release immediately or soonish depending
> > >>>>>>> on
> > >>>>> the
> > >>>>>>> severity and if there are workarounds available.
> > >>>>>>>
> > >>>>>>> We should not limit ourselves; the whole idea of independent
> > >>>>>>> releases
> > >>>>> is
> > >>>>>>> exactly that you release as needed. There is no release planning
> > >>>>>>> or anything needed, you just go with a release as if it was an
> > >>>>>>> external artifact.
> > >>>>>>>
> > >>>>>>> (1) is the connector API already stable?
> > >>>>>>>>  From another discussion thread [1], connector API is far from
> > >>>>> stable.
> > >>>>>>>> Currently, it's hard to build connectors against multiple Flink
> > >>>>>> versions.
> > >>>>>>>> There are breaking API changes both in 1.12 -> 1.13 and 1.13 ->
> > >>>>>>>> 1.14
> > >>>>>> and
> > >>>>>>>>   maybe also in the future versions,  because Table related APIs
> > >>>>>>>> are
> > >>>>>> still
> > >>>>>>>> @PublicEvolving and new Sink API is still @Experimental.
> > >>>>>>>>
> > >>>>>>> The question is: what is stable in an evolving system? We
> > >>>>>>> recently discovered that the old SourceFunction needed to be
> > >>>>>>> refined such that cancellation works correctly [1]. So that
> > >>>>>>> interface has been in Flink for 7 years, is heavily used outside
> > >>>>>>> as well, and we still had to change the
> > >>>>> contract
> > >>>>>>> in a way that I'd expect any implementer to recheck their
> > >>>>> implementation.
> > >>>>>>> It might not be necessary to change anything and you can probably
> > >>>>> change
> > >>>>>>> the code for all Flink versions, but still, the interface was
> > >>>>>>> not
> > >>>>>> stable
> > >>>>>>> in the closest sense.
> > >>>>>>>
> > >>>>>>> If we focus just on API changes on the unified interfaces, then
> > >>>>>>> we
> > >>>>> expect
> > >>>>>>> one more change to Sink API to support compaction. For Table API,
> > >>>>> there
> > >>>>>>> will most likely also be some changes in 1.15. So we could wait
> > >>>>>>> for
> > >>>>> 1.15.
> > >>>>>>> But I'm questioning if that's really necessary because we will
> > >>>>>>> add
> > >>>>> more
> > >>>>>>> functionality beyond 1.15 without breaking API. For example, we
> > >>>>>>> may
> > >>>>> add
> > >>>>>>> more unified connector metrics. If you want to use it in your
> > >>>>> connector,
> > >>>>>>> you have to support multiple Flink versions anyhow. So rather than
> > >>>>>>> focusing
> > >>>>>>> the discussion on "when is stuff stable", I'd rather focus on
> > >>>>>>> "how
> > >>>>> can we
> > >>>>>>> support building connectors against multiple Flink versions" and
> > >>>>>>> make
> > >>>>> it
> > >>>>>> as
> > >>>>>>> painless as possible.
> > >>>>>>>
> > >>>>>>> Chesnay pointed out to use different branches for different Flink
> > >>>>>> versions
> > >>>>>>> which sounds like a good suggestion. With a mono-repo, we can't
> > >>>>>>> use branches differently anyways (there is no way to have release
> > >>>>>>> branches
> > >>>>>> per
> > >>>>>>> connector without chaos). In these branches, we could provide
> > >>>>>>> shims to simulate future features in older Flink versions such
> > >>>>>>> that code-wise,
> > >>>>> the
> > >>>>>>> source code of a specific connector may not diverge (much). For
> > >>>>> example,
> > >>>>>> to
> > >>>>>>> register unified connector metrics, we could simulate the current
> > >>>>>> approach
> > >>>>>>> also in some utility package of the mono-repo.
> > >>>>>>>
> > >>>>>>> I see the stable core Flink API as a prerequisite for modularity.
> > >>>>>>> And
> > >>>>>>>> for connectors it is not just the source and sink API (source
> > >>>>>>>> being stable as of 1.14), but everything that is required to
> > >>>>>>>> build and maintain a connector downstream, such as the test
> > >>>>>>>> utilities and infrastructure.
> > >>>>>>>>
> > >>>>>>> That is a very fair point. I'm actually surprised to see that
> > >>>>>>> MiniClusterWithClientResource is not public. I see it being used
> > >>>>>>> in
> > >>>>> all
> > >>>>>>> connectors, especially outside of Flink. I fear that as long as
> > >>>>>>> we do
> > >>>>> not
> > >>>>>>> have connectors outside, we will not properly annotate and
> > >>>>>>> maintain
> > >>>>> these
> > >>>>>>> utilties in a classic hen-and-egg-problem. I will outline an idea
> > >>>>>>> at
> > >>>>> the
> > >>>>>>> end.
> > >>>>>>>
> > >>>>>>>> the connectors need to be adopted and require at least one
> > >>>>>>>> release
> > >>>>> per
> > >>>>>>>> Flink minor release.
> > >>>>>>>> However, this will make the releases of connectors slower, e.g.
> > >>>>>> maintain
> > >>>>>>>> features for multiple branches and release multiple branches.
> > >>>>>>>> I think the main purpose of having an external connector
> > >>>>>>>> repository
> > >>>>> is
> > >>>>>> in
> > >>>>>>>> order to have "faster releases of connectors"?
> > >>>>>>>>
> > >>>>>>>> Imagine a project with a complex set of dependencies. Let's say
> > >>>>> Flink
> > >>>>>>>> version A plus Flink reliant dependencies released by other
> > >>>>>>>> projects (Flink-external connectors, Beam, Iceberg, Hudi, ..).
> > >>>>>>>> We don't want
> > >>>>> a
> > >>>>>>>> situation where we bump the core Flink version to B and things
> > >>>>>>>> fall apart (interface changes, utilities that were useful but
> > >>>>>>>> not public, transitive dependencies etc.).
> > >>>>>>>>
> > >>>>>>> Yes, that's why I wanted to automate the processes more, which is
> > >>>>>>> not
> > >>>>> that
> > >>>>>>> easy under ASF. Maybe we automate the source provision across
> > >>>>> supported
> > >>>>>>> versions and have 1 vote thread for all versions of a connector?
> > >>>>>>>
> > >>>>>>>  From the perspective of CDC connector maintainers, the biggest
> > >>>>> advantage
> > >>>>>> of
> > >>>>>>>> maintaining it outside of the Flink project is that:
> > >>>>>>>> 1) we can have a more flexible and faster release cycle
> > >>>>>>>> 2) we can be more liberal with committership for connector
> > >>>>> maintainers
> > >>>>>>>> which can also attract more committers to help the release.
> > >>>>>>>>
> > >>>>>>>> Personally, I think maintaining one connector repository under
> > >>>>>>>> the
> > >>>>> ASF
> > >>>>>>> may
> > >>>>>>>> not have the above benefits.
> > >>>>>>>>
> > >>>>>>> Yes, I also feel that ASF is too restrictive for our needs. But
> > >>>>>>> it
> > >>>>> feels
> > >>>>>>> like there are too many that see it differently and I think we
> > >>>>>>> need
> > >>>>>>>
> > >>>>>>> (2) Flink testability without connectors.
> > >>>>>>>> This is a very good question. How can we guarantee the new
> > >>>>>>>> Source
> > >>>>> and
> > >>>>>>> Sink
> > >>>>>>>> API are stable with only test implementation?
> > >>>>>>>>
> > >>>>>>> We can't and shouldn't. Since the connector repo is managed by
> > >>>>>>> Flink,
> > >>>>> a
> > >>>>>>> Flink release manager needs to check if the Flink connectors are
> > >>>>> actually
> > >>>>>>> working prior to creating an RC. That's similar to how
> > >>>>>>> flink-shaded
> > >>>>> and
> > >>>>>>> flink core are related.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> So here is one idea that I had to get things rolling. We are
> > >>>>>>> going to address the external repo iteratively without
> > >>>>>>> compromising what we
> > >>>>>> already
> > >>>>>>> have:
> > >>>>>>> Phase 1: add new contributions to the external repo. We use that time
> > >>>>>>> to
> > >>>>>> setup
> > >>>>>>> infra accordingly and optimize release processes. We will
> > >>>>>>> identify
> > >>>>> test
> > >>>>>>> utilities that are not yet public/stable and fix that.
> > >>>>>>> Phase 2: add ports to the new unified interfaces of existing
> > >>>>> connectors.
> > >>>>>>> That requires a previous Flink release to make utilities stable.
> > >>>>>>> Keep
> > >>>>> old
> > >>>>>>> interfaces in flink-core.
> > >>>>>>> Phase 3: remove old interfaces in flink-core for some connectors
> > >>>>>>> (tbd
> > >>>>> at a
> > >>>>>>> later point).
> > >>>>>>> Phase 4: optionally move all remaining connectors (tbd at a later
> > >>>>> point).
> > >>>>>>> I'd envision having ~3 months between starting the different
> > >>>>> phases.
> > >>>>>>> WDYT?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-23527
> > >>>>>>>
> > >>>>>>> On Thu, Oct 21, 2021 at 7:12 AM Kyle Bendickson <k...@tabular.io
> >
> > >>>>> wrote:
> > >>>>>>>> Hi all,
> > >>>>>>>>
> > >>>>>>>> My name is Kyle and I’m an open source developer primarily
> > >>>>>>>> focused on Apache Iceberg.
> > >>>>>>>>
> > >>>>>>>> I’m happy to help clarify or elaborate on any aspect of our
> > >>>>>>>> experience working on a relatively decoupled connector that is
> > >>>>>>>> downstream and pretty popular.
> > >>>>>>>>
> > >>>>>>>> I’d also love to be able to contribute or assist in any way I
> > >>>>>>>> can. I don’t mean to thread jack, but are there any meetings or
> > >>>>>>>> community sync ups, specifically around the connector APIs,
> > >>>>>>>> that I might join / be invited to?
> > >>>>>>>>
> > >>>>>>>> I did want to add that even though I’ve experienced some of the
> > >>>>>>>> pain points of integrating with an evolving system / API
> > >>>>>>>> (catalog support is, generally speaking, pretty new everywhere
> > >>>>>>>> in this space), I also personally agree that you shouldn’t slow
> > >>>>>>>> down development velocity too much for the sake of external
> > >>>>>>>> connectors. Getting to a performant and stable place should be
> > >>>>>>>> the primary goal, and slowing that down to support stragglers
> > >>>>>>>> will (in my personal opinion) always be a losing game. Some
> > >>>>>>>> folks will simply stay behind on versions regardless until they
> > >>>>>>>> have to upgrade.
> > >>>>>>>> I am working on ensuring that the Iceberg community stays
> > >>>>>>>> within 1-2 versions of Flink, so that we can help provide more
> > >>>>>>>> feedback or contribute things that might help us support
> > >>>>>>>> multiple Flink runtimes / versions with one project / codebase
> > >>>>>>>> and minimal to no reflection (our desired goal).
> > >>>>>>>>
> > >>>>>>>> If there’s anything I can do or any way I can be of assistance,
> > >>>>>>>> please don’t hesitate to reach out. Or find me on ASF slack 😀
> > >>>>>>>>
> > >>>>>>>> I greatly appreciate your general concern for the needs of
> > >>>>>>>> downstream connector integrators!
> > >>>>>>>>
> > >>>>>>>> Cheers,
> > >>>>>>>> Kyle Bendickson (GitHub: kbendick)
> > >>>>>>>> Open Source Developer
> > >>>>>>>> kyle [at] tabular [dot] io
> > >>>>>>>>
> > >>>>>>>> On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise <t...@apache.org> wrote:
> > >>>>>>>>> Hi,
> > >>>>>>>>>
> > >>>>>>>>> I see the stable core Flink API as a prerequisite for
> > >>>>>>>>> modularity. And for connectors it is not just the source and
> > >>>>>>>>> sink API (source being stable as of 1.14), but everything that
> > >>>>>>>>> is required to build and maintain a connector downstream, such
> > >>>>>>>>> as the test utilities and infrastructure.
> > >>>>>>>>>
> > >>>>>>>>> Without the stable surface of core Flink, changes will leak
> > >>>>>>>>> into downstream dependencies and force lock-step updates.
> > >>>>>>>>> Refactoring across N repos is more painful than in a single
> > >>>>>>>>> repo. Those with experience developing downstream of Flink
> > >>>>>>>>> will know the pain, and that isn't limited to connectors. I
> > >>>>>>>>> don't remember a Flink "minor version" update that was just a
> > >>>>>>>>> dependency version change and did not force other downstream
> > >>>>>>>>> changes.
> > >>>>>>>>>
> > >>>>>>>>> Imagine a project with a complex set of dependencies. Let's
> > >>>>>>>>> say Flink version A plus Flink-reliant dependencies released
> > >>>>>>>>> by other projects (Flink-external connectors, Beam, Iceberg,
> > >>>>>>>>> Hudi, ...). We don't want a situation where we bump the core
> > >>>>>>>>> Flink version to B and things fall apart (interface changes,
> > >>>>>>>>> utilities that were useful but not public, transitive
> > >>>>>>>>> dependencies, etc.).
> > >>>>>>>>>
> > >>>>>>>>> The discussion here also highlights the benefits of keeping
> > >>>>>>>>> certain connectors outside Flink, whether that is due to
> > >>>>>>>>> differences in developer community, maturity of the
> > >>>>>>>>> connectors, their specialized/limited usage, etc. I would like
> > >>>>>>>>> to see that as a sign of a growing ecosystem, and most of the
> > >>>>>>>>> ideas that Arvid has put forward would benefit further growth
> > >>>>>>>>> of the connector ecosystem.
> > >>>>>>>>> As for keeping connectors within Apache Flink: I prefer that
> > >>>>>>>>> as the path forward for "essential" connectors like
> > >>>>>>>>> FileSource, KafkaSource, ... And we can still achieve a more
> > >>>>>>>>> flexible and faster release cycle.
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Thomas
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Oct 20, 2021 at 3:32 AM Jark Wu <imj...@gmail.com> wrote:
> > >>>>>>>>>> Hi Konstantin,
> > >>>>>>>>>>
> > >>>>>>>>>>> the connectors need to be adopted and require at least one
> > >>>>>>>>>>> release per Flink minor release.
> > >>>>>>>>>> However, this will make the releases of connectors slower,
> > >>>>>>>>>> e.g. maintaining features for multiple branches and releasing
> > >>>>>>>>>> multiple branches. I think the main purpose of having an
> > >>>>>>>>>> external connector repository is to have "faster releases of
> > >>>>>>>>>> connectors"?
> > >>>>>>>>>>
> > >>>>>>>>>> From the perspective of CDC connector maintainers, the
> > >>>>>>>>>> biggest advantage of maintaining it outside of the Flink
> > >>>>>>>>>> project is that:
> > >>>>>>>>>> 1) we can have a more flexible and faster release cycle
> > >>>>>>>>>> 2) we can be more liberal with committership for connector
> > >>>>>>>>>> maintainers, which can also attract more committers to help
> > >>>>>>>>>> with releases.
> > >>>>>>>>>>
> > >>>>>>>>>> Personally, I think maintaining one connector repository
> > >>>>>>>>>> under the ASF may not have the above benefits.
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Jark
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf <kna...@apache.org> wrote:
> > >>>>>>>>>>> Hi everyone,
> > >>>>>>>>>>>
> > >>>>>>>>>>> regarding the stability of the APIs: I think everyone agrees
> > >>>>>>>>>>> that connector APIs which are stable across minor versions
> > >>>>>>>>>>> (1.13 -> 1.14) are the mid-term goal. But:
> > >>>>>>>>>>>
> > >>>>>>>>>>> a) These APIs are still quite young, and we shouldn't make
> > >>>>>>>>>>> them @Public prematurely either.
> > >>>>>>>>>>>
> > >>>>>>>>>>> b) Isn't this *mostly* orthogonal to where the connector
> > >>>>>>>>>>> code lives? Yes, as long as there are breaking changes, the
> > >>>>>>>>>>> connectors need to be adopted and require at least one
> > >>>>>>>>>>> release per Flink minor release. Documentation-wise this can
> > >>>>>>>>>>> be addressed via a compatibility matrix for each connector,
> > >>>>>>>>>>> as Arvid suggested. IMO we shouldn't block this effort on
> > >>>>>>>>>>> the stability of the APIs.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Cheers,
> > >>>>>>>>>>>
> > >>>>>>>>>>> Konstantin
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, Oct 20, 2021 at 8:56 AM Jark Wu <imj...@gmail.com> wrote:
> > >>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I think Thomas raised very good questions and would like to
> > >>>>>>>>>>>> know your opinions if we want to move connectors out of
> > >>>>>>>>>>>> Flink in this version.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> (1) Is the connector API already stable?
> > >>>>>>>>>>>>> Separate releases would only make sense if the core Flink
> > >>>>>>>>>>>>> surface is fairly stable though. As evident from Iceberg
> > >>>>>>>>>>>>> (and also Beam), that's not the case currently. We should
> > >>>>>>>>>>>>> probably focus on addressing the stability first, before
> > >>>>>>>>>>>>> splitting code. A success criterion could be that we are
> > >>>>>>>>>>>>> able to build Iceberg and Beam against multiple Flink
> > >>>>>>>>>>>>> versions w/o the need to change code. The goal would be
> > >>>>>>>>>>>>> that no connector breaks when we make changes to Flink
> > >>>>>>>>>>>>> core. Until that's the case, code separation creates a
> > >>>>>>>>>>>>> setup where 1+1 or N+1 repositories need to move in lock
> > >>>>>>>>>>>>> step.
> > >>>>>>>>>>>> From another discussion thread [1], the connector API is
> > >>>>>>>>>>>> far from stable. Currently, it's hard to build connectors
> > >>>>>>>>>>>> against multiple Flink versions. There are breaking API
> > >>>>>>>>>>>> changes both in 1.12 -> 1.13 and 1.13 -> 1.14, and maybe
> > >>>>>>>>>>>> also in future versions, because Table-related APIs are
> > >>>>>>>>>>>> still @PublicEvolving and the new Sink API is still
> > >>>>>>>>>>>> @Experimental.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> (2) Flink testability without connectors.
> > >>>>>>>>>>>>> Flink w/o Kafka connector (and few others) isn't viable.
> > >>>>>>>>>>>>> Testability of Flink was already brought up, can we really
> > >>>>>>>>>>>>> certify a Flink core release without Kafka connector?
> > >>>>>>>>>>>>> Maybe those connectors that are used in Flink e2e tests to
> > >>>>>>>>>>>>> validate functionality of core Flink should not be broken
> > >>>>>>>>>>>>> out?
> > >>>>>>>>>>>> This is a very good question. How can we guarantee that the
> > >>>>>>>>>>>> new Source and Sink APIs are stable with only test
> > >>>>>>>>>>>> implementations?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Best,
> > >>>>>>>>>>>> Jark
> > >>>>>>>>>>>>
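To make the stability tiers discussed above concrete, here is a minimal, self-contained sketch of how Flink's annotations partition the API surface. The annotation names mirror `org.apache.flink.annotation.{Public, PublicEvolving, Experimental}`, but the stub classes and the checker are hypothetical illustrations for this thread, not Flink code: only `@Public` carries a cross-minor-version compatibility guarantee.

```java
// Sketch (assumption): which API tiers an externally released connector
// can safely depend on. Annotation names mirror
// org.apache.flink.annotation; stubs and checker are illustrative only.
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class StabilityCheck {
    @Retention(RetentionPolicy.RUNTIME) @interface Public {}
    @Retention(RetentionPolicy.RUNTIME) @interface PublicEvolving {}
    @Retention(RetentionPolicy.RUNTIME) @interface Experimental {}

    // Hypothetical stand-ins for the API surfaces named in the thread.
    @Public static class SourceApiStub {}          // stable across minors
    @PublicEvolving static class TableApiStub {}   // may change 1.13 -> 1.14
    @Experimental static class NewSinkApiStub {}   // may change any time

    // A connector released independently of Flink can only rely on
    // @Public types; everything else may force a lock-step re-release.
    static boolean safeAcrossMinorVersions(Class<?> api) {
        return api.isAnnotationPresent(Public.class);
    }

    public static void main(String[] args) {
        System.out.println(safeAcrossMinorVersions(SourceApiStub.class));  // true
        System.out.println(safeAcrossMinorVersions(TableApiStub.class));   // false
        System.out.println(safeAcrossMinorVersions(NewSinkApiStub.class)); // false
    }
}
```

Under this reading, as long as the Table APIs stay `@PublicEvolving` and the new Sink API stays `@Experimental`, externalized connectors need at least one release per Flink minor release, which is exactly the coupling the thread is debating.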
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler <ches...@apache.org> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Could you clarify what release cadence you're thinking of?
> > >>>>>>>>>>>>> There's quite a big range that fits "more frequent than
> > >>>>>>>>>>>>> Flink" (per-commit, daily, weekly, bi-weekly, monthly,
> > >>>>>>>>>>>>> even bi-monthly).
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On 19/10/2021 14:15, Martijn Visser wrote:
> > >>>>>>>>>>>>>> Hi all,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I think it would be a huge benefit if we can achieve more
> > >>>>>>>>>>>>>> frequent releases of connectors, which are not bound to
> > >>>>>>>>>>>>>> the release cycle of Flink itself. I agree that in order
> > >>>>>>>>>>>>>> to get there, we need to have stable interfaces which are
> > >>>>>>>>>>>>>> trustworthy and reliable, so they can be safely used by
> > >>>>>>>>>>>>>> those connectors. I do think that work still needs to be
> > >>>>>>>>>>>>>> done on those interfaces, but I am confident that we can
> > >>>>>>>>>>>>>> get there from a Flink perspective.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I am worried that we would not be able to achieve those
> > >>>>>>>>>>>>>> frequent releases of connectors if we put these
> > >>>>>>>>>>>>>> connectors under the Apache umbrella, because that means
> > >>>>>>>>>>>>>> that for each connector release we have to follow the
> > >>>>>>>>>>>>>> Apache release creation process. This requires a lot of
> > >>>>>>>>>>>>>> manual steps and prohibits automation, and I think it
> > >>>>>>>>>>>>>> would be hard to scale out frequent releases of
> > >>>>>>>>>>>>>> connectors. I'm curious how others think this challenge
> > >>>>>>>>>>>>>> could be solved.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Best regards,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Martijn
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Mon, 18 Oct 2021 at 22:22, Thomas Weise <t...@apache.org> wrote:
> > >>>>>>>>>>>>>>> Thanks for initiating this discussion.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> There are definitely a few things that are not optimal
> > >>>>>>>>>>>>>>> with our current management of connectors. I would not
> > >>>>>>>>>>>>>>> necessarily characterize it as a "mess" though. As the
> > >>>>>>>>>>>>>>> points raised so far show, it isn't easy to find a
> > >>>>>>>>>>>>>>> solution that balances competing requirements and leads
> > >>>>>>>>>>>>>>> to a net improvement.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> It would be great if we can find a setup that allows
> > >>>>>>>>>>>>>>> connectors to be released independently of core Flink,
> > >>>>>>>>>>>>>>> with each connector released separately. Flink already
> > >>>>>>>>>>>>>>> has separate releases (flink-shaded), so that by itself
> > >>>>>>>>>>>>>>> isn't a new thing. Per-connector releases would need to
> > >>>>>>>>>>>>>>> allow for more frequent releases (without the baggage
> > >>>>>>>>>>>>>>> that a full Flink release comes with).
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Separate releases would only make sense if the core
> > >>>>>>>>>>>>>>> Flink surface is fairly stable though. As evident from
> > >>>>>>>>>>>>>>> Iceberg (and also Beam), that's not the case currently.
> > >>>>>>>>>>>>>>> We should probably focus on addressing the stability
> > >>>>>>>>>>>>>>> first, before splitting code. A success criterion could
> > >>>>>>>>>>>>>>> be that we are able to build Iceberg and Beam against
> > >>>>>>>>>>>>>>> multiple Flink versions w/o the need to change code. The
> > >>>>>>>>>>>>>>> goal would be that no connector breaks when we make
> > >>>>>>>>>>>>>>> changes to Flink core. Until that's the case, code
> > >>>>>>>>>>>>>>> separation creates a setup where 1+1 or N+1 repositories
> > >>>>>>>>>>>>>>> need to move in lock step.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Regarding some connectors being more important for Flink
> > >>>>>>>>>>>>>>> than others: that's a fact. Flink w/o the Kafka
> > >>>>>>>>>>>>>>> connector (and a few others) isn't viable. Testability
> > >>>>>>>>>>>>>>> of Flink was already brought up; can we really certify a
> > >>>>>>>>>>>>>>> Flink core release without the Kafka connector? Maybe
> > >>>>>>>>>>>>>>> those connectors that are used in Flink e2e tests to
> > >>>>>>>>>>>>>>> validate functionality of core Flink should not be
> > >>>>>>>>>>>>>>> broken out?
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Finally, I think that the connectors that move into
> > >>>>>>>>>>>>>>> separate repos should remain part of the Apache Flink
> > >>>>>>>>>>>>>>> project. Larger organizations tend to approve the use of
> > >>>>>>>>>>>>>>> and contribution to open source at the project level.
> > >>>>>>>>>>>>>>> Sometimes it is everything ASF; more often it is "Apache
> > >>>>>>>>>>>>>>> Foo". It would be fatal to end up with a patchwork of
> > >>>>>>>>>>>>>>> projects with potentially different licenses and
> > >>>>>>>>>>>>>>> governance to arrive at a working Flink setup. This may
> > >>>>>>>>>>>>>>> mean we prioritize usability over developer convenience,
> > >>>>>>>>>>>>>>> if that's in the best interest of Flink as a whole.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>> Thomas
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler <ches...@apache.org> wrote:
> > >>>>>>>>>>>>>>>> Generally, the issues are reproducibility and control.
> > >>>>>>>>>>>>>>>> Stuff's completely broken on the Flink side for a week?
> > >>>>>>>>>>>>>>>> Well then so are the connector repos.
> > >>>>>>>>>>>>>>>> (As-is) You can't go back to a previous version of the
> > >>>>>>>>>>>>>>>> snapshot. Which also means that checking out older
> > >>>>>>>>>>>>>>>> commits can be problematic, because you'd still work
> > >>>>>>>>>>>>>>>> against the latest snapshots, and they may not be
> > >>>>>>>>>>>>>>>> compatible with each other.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
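The reproducibility point can be sketched abstractly: a `-SNAPSHOT` version is a mutable pointer into the artifact repository, so re-resolving it later can return different bytes than the build a commit was originally developed against. The `SnapshotRepo` class below is a toy model of that behavior, an assumption for illustration only, not Maven's actual resolution logic.

```java
// Toy model (assumption, not Maven code): snapshot versions are mutable
// pointers, so an old checkout re-resolves to whatever was published last.
import java.util.HashMap;
import java.util.Map;

public class SnapshotRepo {
    private final Map<String, String> latest = new HashMap<>();

    // Deploying a snapshot overwrites what the version string points to.
    void deploy(String version, String buildId) {
        latest.put(version, buildId);
    }

    String resolve(String version) {
        return latest.get(version);
    }

    public static void main(String[] args) {
        SnapshotRepo repo = new SnapshotRepo();

        repo.deploy("1.15-SNAPSHOT", "build-2021-10-18");
        String atCommitTime = repo.resolve("1.15-SNAPSHOT");

        // Flink CI later publishes a newer snapshot...
        repo.deploy("1.15-SNAPSHOT", "build-2021-10-25");

        // ...so checking out the old connector commit now resolves to a
        // different artifact than the one it was built and tested against.
        String atCheckoutTime = repo.resolve("1.15-SNAPSHOT");
        System.out.println(atCommitTime.equals(atCheckoutTime)); // false
    }
}
```

This is why only pinned, released versions (or timestamped snapshot builds) let an older connector commit be rebuilt faithfully, which is the control and reproducibility concern raised above.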
> > >>>>>>>>>>>>>>>> On 18/10/2021 15:22, Arvid Heise wrote:
> > >>>>>>>>>>>>>>>>> I was actually betting on snapshot versions. What are
> > >>>>>>>>>>>>>>>>> the limits? Obviously, we can only do a release of a
> > >>>>>>>>>>>>>>>>> 1.15 connector after 1.15 is released.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> --
> > >>>>>>>>>>> Konstantin Knauf
> > >>>>>>>>>>>
> > >>>>>>>>>>> https://twitter.com/snntrable
> > >>>>>>>>>>> https://github.com/knaufk