+1 for repo per connector from my side also. Thanks for trying out the different approaches.
Where would the common/infra pieces live? In a separate repository with its own release?

Thomas

On Thu, Dec 9, 2021 at 12:42 PM Till Rohrmann <trohrm...@apache.org> wrote:
> Sorry if I was a bit unclear. +1 for the single repo per connector approach.
>
> Cheers,
> Till
>
> On Thu, Dec 9, 2021 at 5:41 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > +1 for the single repo approach.
> >
> > Cheers,
> > Till
> >
> > On Thu, Dec 9, 2021 at 3:54 PM Martijn Visser <mart...@ververica.com> wrote:
> > > I also agree that it feels more natural to go with a repo for each individual connector. Each repository can be made available at flink-packages.org so users can find them, next to referring to them in documentation. +1 from my side.
> > >
> > > On Thu, 9 Dec 2021 at 15:38, Arvid Heise <ar...@apache.org> wrote:
> > > > Hi all,
> > > >
> > > > We tried out Chesnay's proposal and went with Option 2. Unfortunately, we ran into some tough nuts to crack and feel like we hit a dead end:
> > > > - The main pain point with the outlined Frankensteinian connector repo is how to handle shared code / infra code. If we have it in some <common> branch, then we need to merge the common branch into the connector branch on update. However, it's unclear to me how improvements in the common branch that naturally appear while working on a specific connector go back into the common branch. You can't use a pull request from your branch or else your connector code would poison the connector-less common branch. So you would probably manually copy the files over to a common branch and create a PR branch for that.
> > > > - A weird solution could be to have the common branch as a submodule in the repo itself (if that's even possible). I'm sure that this setup would blow the minds of all newcomers.
> > > > - Similarly, it's mandatory to have safeguards against code from connector A poisoning connector B, common, or main. I had a similar setup in the past and code from two "distinct" branch types constantly swept over.
> > > > - We could also say that we simply release <common> independently and just have a maven (SNAPSHOT) dependency on it. But that would create a weird flow if you need to change something in common, where you need to constantly switch branches back and forth.
> > > > - In general, the Frankensteinian approach is very switch-intensive. If you maintain 3 connectors and need to fix 1 build stability issue each at the same time (quite common nowadays for some reason) and you have 2 review rounds, you need to switch branches 9 times, ignoring changes to common.
> > > >
> > > > Additionally, we still have the rather user/dev-unfriendly main that is mostly empty. I'm also not sure we can generate an overview README.md to make it more friendly here because in theory every connector branch should be based on main and we would get merge conflicts.
> > > >
> > > > I'd like to propose once again to go with individual repositories.
> > > > - The only downside that we discussed so far is that we have more initial setup to do. Since we organically grow the number of connectors/repositories, that load is quite distributed. We can offer templates after finding a good approach that can even be used by outside organizations.
> > > > - Regarding secrets, I think it's actually an advantage that the Kafka connector has no access to the AWS secrets. If there are secrets to be shared across connectors, we can and should use Azure's Variable Groups (I have used them in the past to share Nexus creds across repos). That would also make rotation easy.
> > > > - Working on different connectors would be rather easy as all modern IDEs support multiple-repo setups in the same project. You still need to do multiple releases in case you update common code (either accessed through Nexus or a git submodule) and you want to release your connector.
> > > > - There is no difference with respect to how many CI runs there are in both approaches.
> > > > - Individual repositories also have the advantage of allowing external incubation. Let's assume someone builds connector A and hosts it in their organization (very common setup). If they want to contribute the code to Flink, we could simply transfer the repository into ASF after ensuring Flink coding standards. Then we retain git history and Github issues.
> > > >
> > > > Is there any point that I'm missing?
> > > >
> > > > On Fri, Nov 26, 2021 at 1:32 PM Chesnay Schepler <ches...@apache.org> wrote:
> > > > > For sharing workflows we should be able to use composite actions. We'd have the main definition files in the flink-connectors repo, that we also need to tag/release, which other branches/repos can then import. These are also versioned, so we don't have to worry about accidentally breaking stuff. These could also be used to enforce certain standards / interfaces such that we can automate more things (e.g., integration into the Flink documentation).
> > > > >
> > > > > It is true that Option 2) and dedicated repositories share a lot of properties. While I did say in an offline conversation that we in that case might just as well use separate repositories, I'm not so sure anymore. One repo would make administration a bit easier; for example, secrets wouldn't have to be applied to each repo (we wouldn't want certain secrets to be set up organization-wide).
> > > > > I overall also like that one repo would present a single access point; you can't "miss" a connector repo, and I would hope that having it as one repo would nurture more collaboration between the connectors, which after all need to solve similar problems.
> > > > >
> > > > > It is a fair point that the branching model would be quite weird, but I think that would subside pretty quickly.
> > > > >
> > > > > Personally I'd go with Option 2, and if that doesn't work out we can still split the repo later on. (Which should then be a trivial matter of copying all <connector>/* branches and renaming them.)
> > > > >
> > > > > On 26/11/2021 12:47, Till Rohrmann wrote:
> > > > > > Hi Arvid,
> > > > > >
> > > > > > Thanks for updating this thread with the latest findings. The described limitations for a single connector repo sound suboptimal to me.
> > > > > >
> > > > > > * Option 2. sounds as if we try to simulate multi connector repos inside of a single repo. I also don't know how we would share code between the different branches (sharing infrastructure would probably be easier though). This seems to have the same limitations as dedicated repos with the downside of having a not very intuitive branching model.
> > > > > > * Isn't option 1. kind of a degenerated version of option 2. where we have some unrelated code from other connectors in the individual connector branches?
> > > > > > * Option 3. has the downside that someone creating a release has to release all connectors. This means that she either has to sync with the different connector maintainers or has to be able to release all connectors on her own.
> > > > > > We are already seeing in the Flink community that releases require quite good communication/coordination between the different people working on different Flink components. Given our goals to make connector releases easier and more frequent, I think that coupling different connector releases might be counter-productive.
> > > > > >
> > > > > > To me it sounds not very practical to mainly use a mono repository w/o having some more advanced build infrastructure that, for example, allows having different git roots in different connector directories. Maybe the mono repo can be a catch-all repository for connectors that want to be released in lock-step (Option 3.) with all other connectors the repo contains. But for connectors that get changed frequently, having a dedicated repository that allows independent releases sounds preferable to me.
> > > > > >
> > > > > > What utilities and infrastructure code do you intend to share? Using git submodules can definitely be one option to share code. However, it might also be ok to depend on flink-connector-common artifacts, which could make things easier. Where I am unsure is whether git submodules can be used to share infrastructure code (e.g. the .github/workflows) because you need these files in the repo to trigger the CI infrastructure.
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Thu, Nov 25, 2021 at 1:59 PM Arvid Heise <ar...@apache.org> wrote:
> > > > > > > Hi Brian,
> > > > > > >
> > > > > > > Thank you for sharing. I think your approach is very valid and is in line with what I had in mind.
> > > > > > > > Basically Pravega community aligns the connector releases with the Pravega mainline release
> > > > > > >
> > > > > > > This certainly would mean that there is little value in coupling connector versions. So it's making a good case for having separate connector repos.
> > > > > > >
> > > > > > > > and maintains the connector with the latest 3 Flink versions (CI will publish snapshots for all these 3 branches)
> > > > > > >
> > > > > > > I'd like to give connector devs a simple way to express to which Flink versions the current branch is compatible. From there we can generate the compatibility matrix automatically and optionally also create different releases per supported Flink version. Not sure if the latter is indeed better than having just one artifact that happens to run with multiple Flink versions. I guess it depends on what dependencies we are exposing. If the connector uses flink-connector-base, then we probably need separate artifacts with poms anyways.
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Arvid
> > > > > > >
> > > > > > > On Fri, Nov 19, 2021 at 10:55 AM Zhou, Brian <b.z...@dell.com> wrote:
> > > > > > > > Hi Arvid,
> > > > > > > >
> > > > > > > > For the branching model, the Pravega Flink connector has some experience that I would like to share. Here[1][2] is the compatibility matrix and wiki explaining the branching model and releases. Basically the Pravega community aligns the connector releases with the Pravega mainline release, and maintains the connector with the latest 3 Flink versions (CI will publish snapshots for all these 3 branches).
> > > > > > > > For example, recently we had the 0.10.1 release[3], and in maven central we need to upload three artifacts (for Flink 1.13, 1.12, 1.11) for the 0.10.1 version[4].
> > > > > > > >
> > > > > > > > There are some alternatives. Another solution that we once discussed but finally abandoned is to have an independent version just like the current CDC connector, and then give a big compatibility matrix to users. We think it would be too confusing as the connector develops. On the contrary, we can also do the opposite way: align with the Flink version and maintain several branches for different system versions.
> > > > > > > >
> > > > > > > > I would say this is only a fairly-OK solution because it is a bit painful for maintainers, as cherry-picks are very common and releases would require much work. However, if neither system has nice backward compatibility, there seems to be no comfortable solution for their connector.
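The "one connector version, one artifact per supported Flink version" scheme described above can be sketched as follows; the group/artifact naming is a simplified assumption for illustration, not Pravega's exact coordinates.

```python
# Sketch of a per-Flink-version artifact scheme: a single connector release
# is published once per supported Flink version. The coordinate format below
# is hypothetical and only illustrates the idea.

def artifact_coordinates(connector_version, flink_versions):
    """Return one Maven-style coordinate per supported Flink version."""
    return [
        f"io.pravega:pravega-connectors-flink-{fv}_2.12:{connector_version}"
        for fv in flink_versions
    ]

for coord in artifact_coordinates("0.10.1", ["1.13", "1.12", "1.11"]):
    print(coord)
```

The point is that one source release fans out into several published artifacts, which is exactly why the release effort grows with the number of supported Flink versions.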
> > > > > > > > [1] https://github.com/pravega/flink-connectors#compatibility-matrix
> > > > > > > > [2] https://github.com/pravega/flink-connectors/wiki/Versioning-strategy-for-Flink-connector
> > > > > > > > [3] https://github.com/pravega/flink-connectors/releases/tag/v0.10.1
> > > > > > > > [4] https://search.maven.org/search?q=pravega-connectors-flink
> > > > > > > >
> > > > > > > > Best Regards,
> > > > > > > > Brian
> > > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Arvid Heise <ar...@apache.org>
> > > > > > > > Sent: Friday, November 19, 2021 4:12 PM
> > > > > > > > To: dev
> > > > > > > > Subject: Re: [DISCUSS] Creating an external connector repository
> > > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > we are currently in the process of setting up the flink-connectors repo [1] for new connectors, but we hit a wall that we currently cannot get past: the branching model.
> > > > > > > > To reiterate the original motivation of the external connector repo: We want to decouple the release cycle of a connector from Flink. However, if we want to support semantic versioning in the connectors with the ability to introduce breaking changes through major version bumps and support bugfixes on old versions, then we need release branches similar to how Flink core operates.
> > > > > > > > Consider two connectors, let's call them kafka and hbase. We have kafka in versions 1.0.X, 1.1.Y (small improvement), 2.0.Z (config option change) and hbase only on 1.0.A.
> > > > > > > > Now our current assumption was that we can work with a mono-repo under ASF (flink-connectors). Then, for release branches, we found 3 options:
> > > > > > > > 1. We would need to create some ugly mess with the cross product of connector and version: so you have kafka-release-1.0, kafka-release-1.1, kafka-release-2.0, hbase-release-1.0. The main issue is not the amount of branches (that's something that git can handle) but that the state of kafka is undefined in hbase-release-1.0. That's a recipe for disaster and makes releasing connectors very cumbersome (CI would only execute and publish hbase SNAPSHOTs on hbase-release-1.0).
> > > > > > > > 2. We could avoid the undefined state by having an empty master where each release branch really only holds the code of the connector. But that's also not great: any user that looks at the repo and sees no connector would assume that it's dead.
> > > > > > > > 3. We could have synced releases similar to the CDC connectors [2]. That means that if any connector introduces a breaking change, all connectors get a new major version. I find it quite confusing to a user if hbase gets a new release without any change because kafka introduced a breaking change.
> > > > > > > >
> > > > > > > > To fully decouple release cycles and CI of connectors, we could add individual repositories under ASF (flink-connector-kafka, flink-connector-hbase). Then we can apply the same branching model as before.
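With one repository per connector, branch names no longer need to encode the connector at all; each repo can mirror Flink core's own release-branch naming. A minimal sketch (the branch-name scheme is an assumption):

```python
# Hypothetical helper: map a connector's semantic version to the release
# branch it would live on in its own dedicated repository, mirroring how
# Flink core names branches (e.g. release-1.14).

def release_branch(version):
    major, minor, _patch = version.split(".")
    return f"release-{major}.{minor}"

# In a dedicated flink-connector-kafka repo, the three kafka lines from the
# example each get their own branch, while hbase lives in a separate repo:
for v in ["1.0.7", "1.1.2", "2.0.0"]:
    print(release_branch(v))  # release-1.0, release-1.1, release-2.0
```

Because the branch namespace is per repository, the "kafka is undefined in hbase-release-1.0" problem from option 1 cannot arise.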
> > > > > > > > I quickly checked if there are precedents in the apache community for that approach and just by scanning alphabetically I found cordova with 70 and couchdb with 77 apache repos respectively. So it certainly seems like other projects approached our problem in that way and the apache organization is okay with that. I currently expect max 20 additional repos for connectors and in the future max 10 each for formats and filesystems if we would also move them out at some point in time. So we would be at a total of 50 repos.
> > > > > > > >
> > > > > > > > Note that for all options, we need to provide a compatibility matrix that we aim to autogenerate.
> > > > > > > >
> > > > > > > > Now for the potential downsides that we internally discussed:
> > > > > > > > - How can we ensure common infrastructure code, utilities, and quality? I propose to add a flink-connector-common that contains all these things and is added as a git submodule/subtree to the repos.
> > > > > > > > - Do we implicitly discourage connector developers from maintaining more than one connector with a fragmented code base? That is certainly a risk. However, I currently also see few devs working on more than one connector. It may actually help keep the devs that maintain a specific connector on the hook. We could use github issues to track bugs and feature requests, and a dev can focus his limited time on getting that one connector right.
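The autogenerated compatibility matrix mentioned above could start from per-repo metadata declaring which Flink versions each connector branch supports. A minimal sketch, with invented example data:

```python
# Sketch: render a Markdown compatibility matrix from per-connector
# metadata. The connector names and supported versions are illustrative.

def render_matrix(supported):
    """supported: dict mapping connector name -> list of Flink versions."""
    flink_versions = sorted({v for vs in supported.values() for v in vs})
    header = "| Connector | " + " | ".join(flink_versions) + " |"
    separator = "|---" * (len(flink_versions) + 1) + "|"
    rows = [
        "| " + name + " | "
        + " | ".join("x" if fv in versions else " " for fv in flink_versions)
        + " |"
        for name, versions in sorted(supported.items())
    ]
    return "\n".join([header, separator] + rows)

print(render_matrix({
    "flink-connector-kafka": ["1.13", "1.14"],
    "flink-connector-hbase": ["1.13"],
}))
```

Keeping the metadata next to each branch would let CI regenerate the matrix on every release instead of maintaining it by hand.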
Compared to some intermediate suggestions with split > >> repos, > >> > > the > >> > > >>> big difference is that everything remains under Apache umbrella > >> and > >> > the > >> > > >>> Flink community. > >> > > >>> > >> > > >>> [1] > >> > > >>> > >> > > >> > >> > > > >> > > >> https://urldefense.com/v3/__https://github.com/apache/flink-connectors__;!!LpKI!2a1uSGfMmwc8HNwqBUIGtFPzLHP5m9yS0sC3n3IpLgdke_-XjpYgXzxxweh4$ > >> > > >>> [github[.]com] [2] > >> > > >>> > >> > > >> > >> > > > >> > > >> https://urldefense.com/v3/__https://github.com/ververica/flink-cdc-connectors/__;!!LpKI!2a1uSGfMmwc8HNwqBUIGtFPzLHP5m9yS0sC3n3IpLgdke_-XjpYgXzgoPGA8$ > >> > > >>> [github[.]com] > >> > > >>> > >> > > >>> On Fri, Nov 12, 2021 at 3:39 PM Arvid Heise <ar...@apache.org> > >> > wrote: > >> > > >>> > >> > > >>>> Hi everyone, > >> > > >>>> > >> > > >>>> I created the flink-connectors repo [1] to advance the topic. We > >> > would > >> > > >>>> create a proof-of-concept in the next few weeks as a special > >> branch > >> > > >>>> that I'd then use for discussions. If the community agrees with > >> the > >> > > >>>> approach, that special branch will become the master. If not, we > >> can > >> > > >>>> reiterate over it or create competing POCs. > >> > > >>>> > >> > > >>>> If someone wants to try things out in parallel, just make sure > >> that > >> > > >>>> you are not accidentally pushing POCs to the master. > >> > > >>>> > >> > > >>>> As a reminder: We will not move out any current connector from > >> Flink > >> > > >>>> at this point in time, so everything in Flink will remain as is > >> and > >> > be > >> > > >>>> maintained there. 
> > > > > > > > > Best,
> > > > > > > > >
> > > > > > > > > Arvid
> > > > > > > > >
> > > > > > > > > [1] https://github.com/apache/flink-connectors
> > > > > > > > >
> > > > > > > > > On Fri, Oct 29, 2021 at 6:57 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > > > > > > > > > Hi everyone,
> > > > > > > > > >
> > > > > > > > > > From the discussion, it seems to me that we have different opinions on whether to have an ASF umbrella repository or to host them outside of the ASF. It also seems that this is not really the problem to solve. Since there are many good arguments for either approach, we could simply start with an ASF umbrella repository and see how people adopt it. If the individual connectors cannot move fast enough or if people prefer to not buy into the more heavy-weight ASF processes, then they can host the code also somewhere else. We simply need to make sure that these connectors are discoverable (e.g. via flink-packages).
> > > > > > > > > >
> > > > > > > > > > The more important problem seems to be to provide common tooling (testing, infrastructure, documentation) that can easily be reused. Similarly, it has become clear that the Flink community needs to improve on providing stable APIs. I think it is not realistic to first complete these tasks before starting to move connectors to dedicated repositories. As Stephan said, creating a connector repository will force us to pay more attention to API stability and also to think about which testing tools are required.
> > > > > > > > > > Hence, I believe that starting to add connectors to a different repository than apache/flink will help improve our connector tooling (declaring testing classes as public, creating a common test utility repo, creating a repo template) and vice versa. I therefore like Arvid's proposed process, as it will start kicking things off w/o letting this effort fizzle out.
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Till
> > > > > > > > > >
> > > > > > > > > > On Thu, Oct 28, 2021 at 11:44 AM Stephan Ewen <se...@apache.org> wrote:
> > > > > > > > > > > Thank you all for the nice discussion!
> > > > > > > > > > >
> > > > > > > > > > > From my point of view, I very much like the idea of putting connectors in a separate repository. But I would argue it should be part of Apache Flink, similar to flink-statefun, flink-ml, etc.
> > > > > > > > > > >
> > > > > > > > > > > I share many of the reasons for that:
> > > > > > > > > > > - As argued many times, it reduces the complexity of the Flink repo, improves response times of CI, etc.
> > > > > > > > > > > - Much lower barrier of contribution, because an unstable connector would not de-stabilize the whole build. Of course, we would need to make sure we set this up the right way, with connectors having individual CI runs, build status, etc. But it certainly seems possible.
> > > > > > > > > > >
> > > > > > > > > > > I would argue some points a bit differently than some cases made before:
> > > > > > > > > > >
> > > > > > > > > > > (a) I believe the separation would increase connector stability.
> > > > > > > > > > > Because it really forces us to work with the connectors against the APIs like any external developer. A mono repo is somehow the wrong thing if you in practice want to actually guarantee stable internal APIs at some layer, because the mono repo makes it easy to just change something on both sides of the API (provider and consumer) seamlessly.
> > > > > > > > > > >
> > > > > > > > > > > Major refactorings in Flink need to keep all connector API contracts intact, or we need to have a new version of the connector API.
> > > > > > > > > > >
> > > > > > > > > > > (b) We may even be able to go towards more lightweight and automated releases over time, even if we stay in Apache Flink with that repo. This isn't fully aligned with the Apache release policies yet, but there are board discussions about whether there can be bot-triggered releases (by dependabot) and how that could fit into the Apache process. This doesn't seem to be quite there just yet, but seeing that those discussions start is a good sign, and there is a good chance we can do some things there.
> > > > > > > > > > > I am not sure whether we should let bots trigger releases, because a final human look at things isn't a bad thing, especially given the popularity of software supply chain attacks recently.
> > > > > > > > > > >
> > > > > > > > > > > I do share Chesnay's concerns about complexity in tooling, though. Both release tooling and test tooling.
> > > > > > > > > > > They are not incompatible with that approach, but they are a task we need to tackle during this change, which will add additional work.
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Oct 26, 2021 at 10:31 AM Arvid Heise <ar...@apache.org> wrote:
> > > > > > > > > > > > Hi folks,
> > > > > > > > > > > >
> > > > > > > > > > > > I think some questions came up and I'd like to address the question of the timing.
> > > > > > > > > > > >
> > > > > > > > > > > > > Could you clarify what release cadence you're thinking of? There's quite a big range that fits "more frequent than Flink" (per-commit, daily, weekly, bi-weekly, monthly, even bi-monthly).
> > > > > > > > > > > >
> > > > > > > > > > > > The short answer is: as often as needed:
> > > > > > > > > > > > - If there is a CVE in a dependency and we need to bump it - release immediately.
> > > > > > > > > > > > - If there is a new feature merged, release soonish. We may collect a few successive features before a release.
> > > > > > > > > > > > - If there is a bugfix, release immediately or soonish depending on the severity and whether there are workarounds available.
> > > > > > > > > > > >
> > > > > > > > > > > > We should not limit ourselves; the whole idea of independent releases is exactly that you release as needed. There is no release planning or anything needed, you just go with a release as if it was an external artifact.
> > > > > > > > > > > >
> > > > > > > > > > > > > (1) Is the connector API already stable? From another discussion thread [1], the connector API is far from stable. Currently, it's hard to build connectors against multiple Flink versions.
> > > > > > > > > > > > > There are breaking API changes both in 1.12 -> 1.13 and 1.13 -> 1.14 and maybe also in the future versions, because Table related APIs are still @PublicEvolving and the new Sink API is still @Experimental.
> > > > > > > > > > > >
> > > > > > > > > > > > The question is: what is stable in an evolving system? We recently discovered that the old SourceFunction needed to be refined such that cancellation works correctly [1]. So that interface has been in Flink for 7 years, heavily used also outside, and we still had to change the contract in a way that I'd expect any implementer to recheck their implementation. It might not be necessary to change anything and you can probably change the code for all Flink versions, but still, the interface was not stable in the strictest sense.
> > > > > > > > > > > >
> > > > > > > > > > > > If we focus just on API changes on the unified interfaces, then we expect one more change to the Sink API to support compaction. For the Table API, there will most likely also be some changes in 1.15. So we could wait for 1.15. But I'm questioning if that's really necessary, because we will add more functionality beyond 1.15 without breaking the API. For example, we may add more unified connector metrics. If you want to use them in your connector, you have to support multiple Flink versions anyhow.
> > > > > > > > > > > > So rather than focusing the discussion on "when is stuff stable", I'd focus on "how can we support building connectors against multiple Flink versions" and make it as painless as possible.
> > > > > > > > > > > >
> > > > > > > > > > > > Chesnay pointed out that we could use different branches for different Flink versions, which sounds like a good suggestion. With a mono-repo, we can't use branches differently anyways (there is no way to have release branches per connector without chaos). In these branches, we could provide shims to simulate future features in older Flink versions such that, code-wise, the source code of a specific connector may not diverge (much). For example, to register unified connector metrics, we could simulate the current approach also in some utility package of the mono-repo.
> > > > > > > > > > > >
> > > > > > > > > > > > > I see the stable core Flink API as a prerequisite for modularity. And for connectors it is not just the source and sink API (source being stable as of 1.14), but everything that is required to build and maintain a connector downstream, such as the test utilities and infrastructure.
> > > > > > > > > > > >
> > > > > > > > > > > > That is a very fair point. I'm actually surprised to see that MiniClusterWithClientResource is not public. I see it being used in all connectors, especially outside of Flink.
> > > > > > > > > > > > I fear that as long as we do not have connectors outside, we will not properly annotate and maintain these utilities, in a classic chicken-and-egg problem. I will outline an idea at the end.
> > > > > > > > > > > >
> > > > > > > > > > > > > The connectors need to be adopted and require at least one release per Flink minor release. However, this will make the releases of connectors slower, e.g. maintaining features for multiple branches and releasing multiple branches. I think the main purpose of having an external connector repository is in order to have "faster releases of connectors"?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Imagine a project with a complex set of dependencies. Let's say Flink version A plus Flink-reliant dependencies released by other projects (Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want a situation where we bump the core Flink version to B and things fall apart (interface changes, utilities that were useful but not public, transitive dependencies etc.).
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, that's why I wanted to automate the processes more, which is not that easy under ASF. Maybe we automate the source provision across supported versions and have 1 vote thread for all versions of a connector?
> From the perspective of CDC connector maintainers, the biggest
> advantage of maintaining it outside of the Flink project is that:
> 1) we can have a more flexible and faster release cycle
> 2) we can be more liberal with committership for connector maintainers
> which can also attract more committers to help the release.
>
> Personally, I think maintaining one connector repository under the ASF
> may not have the above benefits.

Yes, I also feel that ASF is too restrictive for our needs. But it feels like there are too many that see it differently and I think we need

> (2) Flink testability without connectors.
> This is a very good question. How can we guarantee the new Source and
> Sink API are stable with only test implementation?

We can't and shouldn't. Since the connector repo is managed by Flink, a Flink release manager needs to check if the Flink connectors are actually working prior to creating an RC. That's similar to how flink-shaded and flink core are related.

So here is one idea that I had to get things rolling. We are going to address the external repo iteratively without compromising what we already have:

Phase 1: add new contributions to the external repo.
We use that time to set up infra accordingly and optimize release processes. We will identify test utilities that are not yet public/stable and fix that.
Phase 2: add ports to the new unified interfaces of existing connectors. That requires a previous Flink release to make utilities stable. Keep old interfaces in flink-core.
Phase 3: remove old interfaces in flink-core of some connectors (tbd at a later point).
Phase 4: optionally move all remaining connectors (tbd at a later point).
I'd envision having ~3 months between starting the different phases. WDYT?

[1] https://issues.apache.org/jira/browse/FLINK-23527

On Thu, Oct 21, 2021 at 7:12 AM Kyle Bendickson <k...@tabular.io> wrote:

Hi all,

My name is Kyle and I’m an open source developer primarily focused on Apache Iceberg.

I’m happy to help clarify or elaborate on any aspect of our experience working on a relatively decoupled connector that is downstream and pretty popular.

I’d also love to be able to contribute or assist in any way I can.
I don’t mean to thread jack, but are there any meetings or community sync-ups, specifically around the connector APIs, that I might join / be invited to?

I did want to add that even though I’ve experienced some of the pain points of integrating with an evolving system / API (catalog support is generally speaking pretty new everywhere really in this space), I also agree personally that you shouldn’t slow down development velocity too much for the sake of external connectors. Getting to a performant and stable place should be the primary goal, and slowing that down to support stragglers will (in my personal opinion) always be a losing game. Some folks will simply stay behind on versions regardless until they have to upgrade.

I am working on ensuring that the Iceberg community stays within 1-2 versions of Flink, so that we can help provide more feedback or contribute things that might improve our ability to support multiple Flink runtimes / versions with one project / codebase and minimal to no reflection (our desired goal).

If there’s anything I can do or any way I can be of assistance, please don’t hesitate to reach out.
Or find me on ASF Slack 😀

I greatly appreciate your general concern for the needs of downstream connector integrators!

Cheers,
Kyle Bendickson (GitHub: kbendick)
Open Source Developer
kyle [at] tabular [dot] io

On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise <t...@apache.org> wrote:

Hi,

I see the stable core Flink API as a prerequisite for modularity. And for connectors it is not just the source and sink API (source being stable as of 1.14), but everything that is required to build and maintain a connector downstream, such as the test utilities and infrastructure.

Without the stable surface of core Flink, changes will leak into downstream dependencies and force lock-step updates. Refactoring across N repos is more painful than in a single repo. Those with experience developing downstream of Flink will know the pain, and that isn't limited to connectors. I don't remember a Flink "minor version" update that was just a dependency version change and did not force other downstream changes.

Imagine a project with a complex set of dependencies. Let's say Flink version A plus Flink-reliant dependencies released by other projects (Flink-external connectors, Beam, Iceberg, Hudi, ..).
We don't want a situation where we bump the core Flink version to B and things fall apart (interface changes, utilities that were useful but not public, transitive dependencies etc.).

The discussion here also highlights the benefits of keeping certain connectors outside Flink, whether that is due to differences in developer community, maturity of the connectors, their specialized/limited usage etc. I would like to see that as a sign of a growing ecosystem, and most of the ideas that Arvid has put forward would benefit further growth of the connector ecosystem.

As for keeping connectors within Apache Flink: I prefer that as the path forward for "essential" connectors like FileSource, KafkaSource, ... And we can still achieve a more flexible and faster release cycle.

Thanks,
Thomas

On Wed, Oct 20, 2021 at 3:32 AM Jark Wu <imj...@gmail.com> wrote:

Hi Konstantin,

> the connectors need to be adopted and require at least one release per
> Flink minor release.

However, this will make the releases of connectors slower, e.g. maintaining features for multiple branches and releasing multiple branches.
I think the main purpose of having an external connector repository is in order to have "faster releases of connectors"?

From the perspective of CDC connector maintainers, the biggest advantage of maintaining it outside of the Flink project is that:
1) we can have a more flexible and faster release cycle
2) we can be more liberal with committership for connector maintainers, which can also attract more committers to help the release.

Personally, I think maintaining one connector repository under the ASF may not have the above benefits.

Best,
Jark

On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf <kna...@apache.org> wrote:

Hi everyone,

regarding the stability of the APIs: I think everyone agrees that connector APIs which are stable across minor versions (1.13->1.14) are the mid-term goal. But:

a) These APIs are still quite young, and we shouldn't make them @Public prematurely either.

b) Isn't this *mostly* orthogonal to where the connector code lives?
Yes, as long as there are breaking changes, the connectors need to be adopted and require at least one release per Flink minor release. Documentation-wise this can be addressed via a compatibility matrix for each connector, as Arvid suggested. IMO we shouldn't block this effort on the stability of the APIs.

Cheers,

Konstantin

On Wed, Oct 20, 2021 at 8:56 AM Jark Wu <imj...@gmail.com> wrote:

Hi,

I think Thomas raised very good questions and I would like to know your opinions on whether we want to move connectors out of Flink in this version.

(1) Is the connector API already stable?

> Separate releases would only make sense if the core Flink surface is
> fairly stable though. As evident from Iceberg (and also Beam), that's
> not the case currently. We should probably focus on addressing the
> stability first, before splitting code.
> A success criterion could be that we are able to build Iceberg and
> Beam against multiple Flink versions w/o the need to change code. The
> goal would be that no connector breaks when we make changes to Flink
> core. Until that's the case, code separation creates a setup where 1+1
> or N+1 repositories need to move in lock step.

From another discussion thread [1], the connector API is far from stable. Currently, it's hard to build connectors against multiple Flink versions. There are breaking API changes both in 1.12 -> 1.13 and in 1.13 -> 1.14, and maybe also in future versions, because Table-related APIs are still @PublicEvolving and the new Sink API is still @Experimental.

(2) Flink testability without connectors.

> Flink w/o Kafka connector (and few others) isn't viable. Testability
> of Flink was already brought up, can we really certify a Flink core
> release without Kafka connector? Maybe those connectors that are used
> in Flink e2e tests to validate functionality of core Flink should not
> be broken out?
This is a very good question. How can we guarantee the new Source and Sink API are stable with only test implementations?

Best,
Jark

On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler <ches...@apache.org> wrote:

Could you clarify what release cadence you're thinking of? There's quite a big range that fits "more frequent than Flink" (per-commit, daily, weekly, bi-weekly, monthly, even bi-monthly).

On 19/10/2021 14:15, Martijn Visser wrote:

Hi all,

I think it would be a huge benefit if we can achieve more frequent releases of connectors, which are not bound to the release cycle of Flink itself. I agree that in order to get there, we need to have stable interfaces which are trustworthy and reliable, so they can be safely used by those connectors.
I do think that work still needs to be done on those interfaces, but I am confident that we can get there from a Flink perspective.

I am worried that we would not be able to achieve those frequent releases of connectors if we are putting these connectors under the Apache umbrella, because that means that for each connector release we have to follow the Apache release creation process. This requires a lot of manual steps and prohibits automation, and I think it would be hard to scale out frequent releases of connectors. I'm curious how others think this challenge could be solved.

Best regards,

Martijn

On Mon, 18 Oct 2021 at 22:22, Thomas Weise <t...@apache.org> wrote:

Thanks for initiating this discussion.

There are definitely a few things that are not optimal with our current management of connectors.
I would not necessarily characterize it as a "mess" though. As the points raised so far show, it isn't easy to find a solution that balances competing requirements and leads to a net improvement.

It would be great if we can find a setup that allows for connectors to be released independently of core Flink and that each connector can be released separately. Flink already has separate releases (flink-shaded), so that by itself isn't a new thing. Per-connector releases would need to allow for more frequent releases (without the baggage that a full Flink release comes with).

Separate releases would only make sense if the core Flink surface is fairly stable though. As evident from Iceberg (and also Beam), that's not the case currently. We should probably focus on addressing the stability first, before splitting code.
A success criterion could be that we are able to build Iceberg and Beam against multiple Flink versions w/o the need to change code. The goal would be that no connector breaks when we make changes to Flink core. Until that's the case, code separation creates a setup where 1+1 or N+1 repositories need to move in lock step.

Regarding some connectors being more important for Flink than others: that's a fact. Flink w/o the Kafka connector (and a few others) isn't viable. Testability of Flink was already brought up; can we really certify a Flink core release without the Kafka connector? Maybe those connectors that are used in Flink e2e tests to validate functionality of core Flink should not be broken out?

Finally, I think that the connectors that move into separate repos should remain part of the Apache Flink project. Larger organizations tend to approve the use of and contribution to open source at the project level. Sometimes it is everything ASF.
More often it is "Apache Foo". It would be fatal to end up with a patchwork of projects with potentially different licenses and governance to arrive at a working Flink setup. This may mean we prioritize usability over developer convenience, if that's in the best interest of Flink as a whole.

Thanks,
Thomas

On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler <ches...@apache.org> wrote:

Generally, the issues are reproducibility and control.

Stuff's completely broken on the Flink side for a week? Well then so are the connector repos.

(As-is) You can't go back to a previous version of the snapshot. Which also means that checking out older commits can be problematic, because you'd still work against the latest snapshots, and they may not be compatible with each other.

On 18/10/2021 15:22, Arvid Heise wrote:

I was actually betting on snapshot versions.
What are the limits? Obviously, we can only do a release of a 1.15 connector after 1.15 is released.

--

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk