Sorry if I was a bit unclear. +1 for the single repo per connector approach.
Cheers, Till On Thu, Dec 9, 2021 at 5:41 PM Till Rohrmann <trohrm...@apache.org> wrote: > +1 for the single repo approach. > > Cheers, > Till > > On Thu, Dec 9, 2021 at 3:54 PM Martijn Visser <mart...@ververica.com> > wrote: > >> I also agree that it feels more natural to go with a repo for each >> individual connector. Each repository can be made available at >> flink-packages.org so users can find them, next to referring to them in >> documentation. +1 from my side. >> >> On Thu, 9 Dec 2021 at 15:38, Arvid Heise <ar...@apache.org> wrote: >> >> > Hi all, >> > >> > We tried out Chesnay's proposal and went with Option 2. Unfortunately, >> we >> > experienced tough nuts to crack and feel like we hit a dead end: >> > - The main pain point with the outlined Frankensteinian connector repo >> is >> > how to handle shared code / infra code. If we have it in some <common> >> > branch, then we need to merge the common branch in the connector branch >> on >> > update. However, it's unclear to me how improvements in the common >> branch >> > that naturally appear while working on a specific connector go back into >> > the common branch. You can't use a pull request from your branch or else >> > your connector code would poison the connector-less common branch. So >> you >> > would probably manually copy the files over to a common branch and >> create a >> > PR branch for that. >> > - A weird solution could be to have the common branch as a submodule in >> the >> > repo itself (if that's even possible). I'm sure that this setup would >> blow >> > up the minds of all newcomers. >> > - Similarly, it's mandatory to have safeguards against code from >> connector >> > A poisoning connector B, common, or main. I had some similar setup in >> the >> > past and code from two "distinct" branch types constantly swept over. >> > - We could also say that we simply release <common> independently and >> just >> > have a maven (SNAPSHOT) dependency on it. 
But that would create a weird >> > flow if you need to make changes in common, where you need to constantly switch branches back and forth.
>> > - In general, the Frankensteinian approach is very switch-intensive. If you maintain 3 connectors and need to fix 1 build instability in each at the same time (quite common nowadays for some reason) and you have 2 review rounds, you need to switch branches 9 times, ignoring changes to common.
>> >
>> > Additionally, we still have the rather user/dev-unfriendly main that is mostly empty. I'm also not sure we can generate an overview README.md to make it more friendly here, because in theory every connector branch should be based on main and we would get merge conflicts.
>> >
>> > I'd like to propose once again to go with individual repositories.
>> > - The only downside that we discussed so far is that we have more initial setup to do. Since we organically grow the number of connector repositories, that load is quite distributed. We can offer templates after finding a good approach that can even be used by outside organizations.
>> > - Regarding secrets, I think it's actually an advantage that the Kafka connector has no access to the AWS secrets. If there are secrets to be shared across connectors, we can and should use Azure's Variable Groups (I have used them in the past to share Nexus creds across repos). That would also make rotation easy.
>> > - Working on different connectors would be rather easy, as all modern IDEs support multiple-repo setups in the same project. You still need to do multiple releases in case you update common code (either accessed through Nexus or a git submodule) and want to release your connector.
>> > - There is no difference with respect to how many CI runs there are in both approaches.
>> > - Individual repositories also have the advantage of allowing external incubation.
Let's assume someone builds connector A and hosts it in >> their >> > organization (very common setup). If they want to contribute the code to >> > Flink, we could simply transfer the repository into ASF after ensuring >> > Flink coding standards. Then we retain git history and Github issues. >> > >> > Is there any point that I'm missing? >> > >> > On Fri, Nov 26, 2021 at 1:32 PM Chesnay Schepler <ches...@apache.org> >> > wrote: >> > >> > > For sharing workflows we should be able to use composite actions. We'd >> > > have the main definition files in the flink-connectors repo, that we >> > > also need to tag/release, which other branches/repos can then import. >> > > These are also versioned, so we don't have to worry about accidentally >> > > breaking stuff. >> > > These could also be used to enforce certain standards / interfaces >> such >> > > that we can automate more things (e.g., integration into the Flink >> > > documentation). >> > > >> > > It is true that Option 2) and dedicated repositories share a lot of >> > > properties. While I did say in an offline conversation that we in that >> > > case might just as well use separate repositories, I'm not so sure >> > > anymore. One repo would make administration a bit easier, for example >> > > secrets wouldn't have to be applied to each repo (we wouldn't want >> > > certain secrets to be set up organization-wide). >> > > I overall also like that one repo would present a single access point; >> > > you can't "miss" a connector repo, and I would hope that having it as >> > > one repo would nurture more collaboration between the connectors, >> which >> > > after all need to solve similar problems. >> > > >> > > It is a fair point that the branching model would be quite weird, but >> I >> > > think that would subside pretty quickly. >> > > >> > > Personally I'd go with Option 2, and if that doesn't work out we can >> > > still split the repo later on. 
(Which should then be a trivial matter >> of >> > > copying all <connector>/* branches and renaming them). >> > > >> > > On 26/11/2021 12:47, Till Rohrmann wrote: >> > > > Hi Arvid, >> > > > >> > > > Thanks for updating this thread with the latest findings. The >> described >> > > > limitations for a single connector repo sound suboptimal to me. >> > > > >> > > > * Option 2. sounds as if we try to simulate multi connector repos >> > inside >> > > of >> > > > a single repo. I also don't know how we would share code between the >> > > > different branches (sharing infrastructure would probably be easier >> > > > though). This seems to have the same limitations as dedicated repos >> > with >> > > > the downside of having a not very intuitive branching model. >> > > > * Isn't option 1. kind of a degenerated version of option 2. where >> we >> > > have >> > > > some unrelated code from other connectors in the individual >> connector >> > > > branches? >> > > > * Option 3. has the downside that someone creating a release has to >> > > release >> > > > all connectors. This means that she either has to sync with the >> > different >> > > > connector maintainers or has to be able to release all connectors on >> > her >> > > > own. We are already seeing in the Flink community that releases >> require >> > > > quite good communication/coordination between the different people >> > > working >> > > > on different Flink components. Given our goals to make connector >> > releases >> > > > easier and more frequent, I think that coupling different connector >> > > > releases might be counter-productive. >> > > > >> > > > To me it sounds not very practical to mainly use a mono repository >> w/o >> > > > having some more advanced build infrastructure that, for example, >> > allows >> > > to >> > > > have different git roots in different connector directories. 
Maybe >> the >> > > mono >> > > > repo can be a catch all repository for connectors that want to be >> > > released >> > > > in lock-step (Option 3.) with all other connectors the repo >> contains. >> > But >> > > > for connectors that get changed frequently, having a dedicated >> > repository >> > > > that allows independent releases sounds preferable to me. >> > > > >> > > > What utilities and infrastructure code do you intend to share? Using >> > git >> > > > submodules can definitely be one option to share code. However, it >> > might >> > > > also be ok to depend on flink-connector-common artifacts which could >> > make >> > > > things easier. Where I am unsure is whether git submodules can be >> used >> > to >> > > > share infrastructure code (e.g. the .github/workflows) because you >> need >> > > > these files in the repo to trigger the CI infrastructure. >> > > > >> > > > Cheers, >> > > > Till >> > > > >> > > > On Thu, Nov 25, 2021 at 1:59 PM Arvid Heise <ar...@apache.org> >> wrote: >> > > > >> > > >> Hi Brian, >> > > >> >> > > >> Thank you for sharing. I think your approach is very valid and is >> in >> > > line >> > > >> with what I had in mind. >> > > >> >> > > >> Basically Pravega community aligns the connector releases with the >> > > Pravega >> > > >>> mainline release >> > > >>> >> > > >> This certainly would mean that there is little value in coupling >> > > connector >> > > >> versions. So it's making a good case for having separate connector >> > > repos. >> > > >> >> > > >> >> > > >>> and maintains the connector with the latest 3 Flink versions(CI >> will >> > > >>> publish snapshots for all these 3 branches) >> > > >>> >> > > >> I'd like to give connector devs a simple way to express to which >> Flink >> > > >> versions the current branch is compatible. From there we can >> generate >> > > the >> > > >> compatibility matrix automatically and optionally also create >> > different >> > > >> releases per supported Flink version. 
Not sure if the latter is >> > > >> indeed better than having just one artifact that happens to run with multiple Flink versions. I guess it depends on what dependencies we are exposing. If the connector uses flink-connector-base, then we probably need separate artifacts with poms anyways.
>> > > >>
>> > > >> Best,
>> > > >>
>> > > >> Arvid
>> > > >>
>> > > >> On Fri, Nov 19, 2021 at 10:55 AM Zhou, Brian <b.z...@dell.com> wrote:
>> > > >>
>> > > >>> Hi Arvid,
>> > > >>>
>> > > >>> For the branching model, the Pravega Flink connector has some experience that I would like to share. Here[1][2] is the compatibility matrix and a wiki explaining the branching model and releases. Basically, the Pravega community aligns the connector releases with the Pravega mainline release, and maintains the connector for the latest 3 Flink versions (CI will publish snapshots for all these 3 branches).
>> > > >>> For example, recently we had the 0.10.1 release[3], and in Maven Central we need to upload three artifacts (for Flink 1.13, 1.12, 1.11) for the 0.10.1 version[4].
>> > > >>>
>> > > >>> There are some alternatives. Another solution that we once discussed but finally abandoned is to have an independent version, just like the current CDC connector, and then give users a big compatibility matrix. We think it would become too confusing as the connector evolves. On the contrary, we could also go the opposite way and align with the Flink version, maintaining several branches for different system versions.
>> > > >>>
>> > > >>> I would say this is only a fairly-OK solution, because it is a bit painful for maintainers as cherry-picks are very common and releases would require much work.
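Brian's scheme above (one connector release fanned out into one artifact per supported Flink version) can be sketched as a small release helper. This is a hedged illustration: the `release_artifacts` helper, the group ID, and the exact artifact-naming pattern are assumptions for the sketch, not Pravega's actual build code.

```python
# Sketch of the "one connector version x N supported Flink versions"
# release scheme described above. The artifact naming pattern here is
# an illustrative assumption, not Pravega's actual build configuration.

def release_artifacts(connector_version, flink_versions):
    """Return one Maven coordinate per supported Flink version."""
    return [
        f"io.pravega:pravega-connectors-flink-{fv}_2.12:{connector_version}"
        for fv in flink_versions
    ]

# One 0.10.1 connector release fans out into three uploads:
for gav in release_artifacts("0.10.1", ["1.13", "1.12", "1.11"]):
    print(gav)
```

The point of the sketch is the fan-out: growing the supported-Flink list changes the number of uploads without touching the connector version itself.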
However, if neither system has nice backward >> > > >>> compatibility, there seems to be no comfortable solution for their connector.
>> > > >>>
>> > > >>> [1] https://github.com/pravega/flink-connectors#compatibility-matrix
>> > > >>> [2] https://github.com/pravega/flink-connectors/wiki/Versioning-strategy-for-Flink-connector
>> > > >>> [3] https://github.com/pravega/flink-connectors/releases/tag/v0.10.1
>> > > >>> [4] https://search.maven.org/search?q=pravega-connectors-flink
>> > > >>>
>> > > >>> Best Regards,
>> > > >>> Brian
>> > > >>>
>> > > >>> -----Original Message-----
>> > > >>> From: Arvid Heise <ar...@apache.org>
>> > > >>> Sent: Friday, November 19, 2021 4:12 PM
>> > > >>> To: dev
>> > > >>> Subject: Re: [DISCUSS] Creating an external connector repository
>> > > >>>
>> > > >>> Hi everyone,
>> > > >>>
>> > > >>> we are currently in the process of setting up the flink-connectors repo [1] for new connectors, but we hit a wall that we currently cannot get past: the branching model.
>> > > >>> To reiterate the original motivation of the external connector repo: We want to decouple the release cycle of a connector from Flink. However, if we want to support semantic versioning in the connectors, with the ability to introduce breaking changes through major version bumps and to support bugfixes on old versions, then we need release branches similar to how Flink core operates.
>> > > >>> Consider two connectors, let's call them kafka and hbase. We have kafka in versions 1.0.X, 1.1.Y (small improvement), 2.0.Z (config option change) and hbase only on 1.0.A.
>> > > >>> Now our current assumption was that we can work with a mono-repo under ASF (flink-connectors). Then, for release branches, we found 3 options:
>> > > >>> 1. We would need to create some ugly mess with the cross product of connector and version: so you have kafka-release-1.0, kafka-release-1.1, kafka-release-2.0, hbase-release-1.0. The main issue is not the amount of branches (that's something that git can handle) but that the state of kafka is undefined in hbase-release-1.0. That's a recipe for disaster and makes releasing connectors very cumbersome (CI would only execute and publish hbase SNAPSHOTS on hbase-release-1.0).
>> > > >>> 2. We could avoid the undefined state by having an empty master, where each release branch really only holds the code of its connector. But that's also not great: any user that looks at the repo and sees no connector would assume that it's dead.
>> > > >>> 3. We could have synced releases similar to the CDC connectors [2]. That means that if any connector introduces a breaking change, all connectors get a new major version. I find it quite confusing to a user if hbase gets a new release without any change because kafka introduced a breaking change.
>> > > >>>
>> > > >>> To fully decouple the release cycles and CI of connectors, we could add individual repositories under ASF (flink-connector-kafka, flink-connector-hbase). Then we can apply the same branching model as before. I quickly checked if there are precedents in the Apache community for that approach and, just by scanning alphabetically, I found cordova with 70 and couchdb with 77 Apache repos respectively.
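The cross product of option 1 above, and the CI guard it would require, can be sketched as follows. The `<connector>-release-<major.minor>` branch-name pattern and both helper functions are illustrative assumptions, not existing Flink tooling.

```python
# Sketch of option 1's branch cross-product and the CI guard it needs:
# a snapshot job on branch "hbase-release-1.0" must only build and
# publish hbase, because every other connector's state on that branch
# is undefined. The "<connector>-release-<major.minor>" pattern is an
# illustrative assumption.

def release_branches(versions_by_connector):
    """Cross product of connector and released major.minor versions."""
    return [
        f"{connector}-release-{version}"
        for connector, versions in sorted(versions_by_connector.items())
        for version in versions
    ]

def connector_to_publish(branch):
    """Which connector a CI run on this branch may publish snapshots for."""
    name, sep, _ = branch.partition("-release-")
    return name if sep else None  # e.g. on main: publish nothing

branches = release_branches({"kafka": ["1.0", "1.1", "2.0"], "hbase": ["1.0"]})
# -> ['hbase-release-1.0', 'kafka-release-1.0', 'kafka-release-1.1', 'kafka-release-2.0']
```

Even this toy version shows why the thread calls the setup a mess: the branch list grows with every connector major/minor, and every CI job needs the publish guard to avoid building undefined code.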
So it certainly >> > > >>> seems like other projects approached our problem in that way and the Apache organization is okay with that. I currently expect max 20 additional repos for connectors, and in the future 10 max each for formats and filesystems if we would also move them out at some point in time. So we would be at a total of 50 repos.
>> > > >>>
>> > > >>> Note that for all options, we need to provide a compatibility matrix that we aim to autogenerate.
>> > > >>>
>> > > >>> Now for the potential downsides that we internally discussed:
>> > > >>> - How can we ensure common infrastructure code, utilities, and quality?
>> > > >>> I propose to add a flink-connector-common that contains all these things and is added as a git submodule/subtree to the repos.
>> > > >>> - Do we implicitly discourage connector developers from maintaining more than one connector with a fragmented code base?
>> > > >>> That is certainly a risk. However, I currently also see few devs working on more than one connector, and it may actually help keeping the devs that maintain a specific connector on the hook. We could use GitHub issues to track bugs and feature requests, and a dev can focus their limited time on getting that one connector right.
>> > > >>>
>> > > >>> So WDYT? Compared to some intermediate suggestions with split repos, the big difference is that everything remains under the Apache umbrella and the Flink community.
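The compatibility matrix that the thread wants to autogenerate could be derived from per-branch metadata along these lines. The metadata shape (each connector release declaring the Flink versions it supports) and the Markdown rendering are assumptions for illustration, not an existing Flink script.

```python
# Sketch of autogenerating the compatibility matrix mentioned above.
# Assumed input: a mapping from connector release to the Flink versions
# its branch declares support for (the metadata shape is illustrative).

def compatibility_matrix(support):
    """Render a Markdown matrix from {connector_release: [flink_versions]}."""
    flink_versions = sorted({fv for fvs in support.values() for fv in fvs})
    header = "| Connector | " + " | ".join(flink_versions) + " |"
    divider = "|---" * (len(flink_versions) + 1) + "|"
    rows = [
        "| " + name + " | "
        + " | ".join("x" if fv in fvs else "" for fv in flink_versions)
        + " |"
        for name, fvs in sorted(support.items())
    ]
    return "\n".join([header, divider] + rows)

print(compatibility_matrix({
    "kafka-2.0": ["1.13", "1.14"],
    "kafka-1.1": ["1.12", "1.13"],
    "hbase-1.0": ["1.12"],
}))
```

With a declaration like this in every release branch, CI could rebuild the matrix on each connector release instead of maintaining it by hand.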
>> > > >>> [1] https://github.com/apache/flink-connectors
>> > > >>> [2] https://github.com/ververica/flink-cdc-connectors/
>> > > >>>
>> > > >>> On Fri, Nov 12, 2021 at 3:39 PM Arvid Heise <ar...@apache.org> wrote:
>> > > >>>
>> > > >>>> Hi everyone,
>> > > >>>>
>> > > >>>> I created the flink-connectors repo [1] to advance the topic. We would create a proof-of-concept in the next few weeks as a special branch that I'd then use for discussions. If the community agrees with the approach, that special branch will become the master. If not, we can iterate on it or create competing POCs.
>> > > >>>>
>> > > >>>> If someone wants to try things out in parallel, just make sure that you are not accidentally pushing POCs to the master.
>> > > >>>>
>> > > >>>> As a reminder: We will not move out any current connector from Flink at this point in time, so everything in Flink will remain as is and be maintained there.
>> > > >>>> Best,
>> > > >>>>
>> > > >>>> Arvid
>> > > >>>>
>> > > >>>> [1] https://github.com/apache/flink-connectors
>> > > >>>>
>> > > >>>> On Fri, Oct 29, 2021 at 6:57 PM Till Rohrmann <trohrm...@apache.org> wrote:
>> > > >>>>
>> > > >>>>> Hi everyone,
>> > > >>>>>
>> > > >>>>> From the discussion, it seems to me that we have different opinions whether to have an ASF umbrella repository or to host them outside of the ASF. It also seems that this is not really the problem to solve. Since there are many good arguments for either approach, we could simply start with an ASF umbrella repository and see how people adopt it. If the individual connectors cannot move fast enough or if people prefer to not buy into the more heavy-weight ASF processes, then they can host the code also somewhere else. We simply need to make sure that these connectors are discoverable (e.g. via flink-packages).
>> > > >>>>>
>> > > >>>>> The more important problem seems to be to provide common tooling (testing, infrastructure, documentation) that can easily be reused. Similarly, it has become clear that the Flink community needs to improve on providing stable APIs. I think it is not realistic to first complete these tasks before starting to move connectors to dedicated repositories. As Stephan said, creating a connector repository will force us to pay more attention to API stability and also to think about which testing tools are required.
Hence, I >> > > >>>>> believe that starting to add connectors to a different >> repository >> > > >>>>> than apache/flink will help improve our connector tooling >> > (declaring >> > > >>>>> testing classes as public, creating a common test utility repo, >> > > >>>>> creating a repo >> > > >>>>> template) and vice versa. Hence, I like Arvid's proposed >> process as >> > > >>>>> it will start kicking things off w/o letting this effort fizzle >> > out. >> > > >>>>> >> > > >>>>> Cheers, >> > > >>>>> Till >> > > >>>>> >> > > >>>>> On Thu, Oct 28, 2021 at 11:44 AM Stephan Ewen <se...@apache.org >> > >> > > >> wrote: >> > > >>>>>> Thank you all, for the nice discussion! >> > > >>>>>> >> > > >>>>>> From my point of view, I very much like the idea of putting >> > > >>>>>> connectors >> > > >>>>> in a >> > > >>>>>> separate repository. But I would argue it should be part of >> Apache >> > > >>>>> Flink, >> > > >>>>>> similar to flink-statefun, flink-ml, etc. >> > > >>>>>> >> > > >>>>>> I share many of the reasons for that: >> > > >>>>>> - As argued many times, reduces complexity of the Flink >> repo, >> > > >>>>> increases >> > > >>>>>> response times of CI, etc. >> > > >>>>>> - Much lower barrier of contribution, because an unstable >> > > >>>>>> connector >> > > >>>>> would >> > > >>>>>> not de-stabilize the whole build. Of course, we would need to >> make >> > > >>>>>> sure >> > > >>>>> we >> > > >>>>>> set this up the right way, with connectors having individual CI >> > > >>>>>> runs, >> > > >>>>> build >> > > >>>>>> status, etc. But it certainly seems possible. >> > > >>>>>> >> > > >>>>>> >> > > >>>>>> I would argue some points a bit different than some cases made >> > > >> before: >> > > >>>>>> (a) I believe the separation would increase connector >> stability. >> > > >>>>> Because it >> > > >>>>>> really forces us to work with the connectors against the APIs >> like >> > > >>>>>> any external developer. 
A mono repo is somehow the wrong thing >> if >> > > >>>>>> you in practice want to actually guarantee stable internal >> APIs at >> > > >>> some layer. >> > > >>>>>> Because the mono repo makes it easy to just change something on >> > > >>>>>> both >> > > >>>>> sides >> > > >>>>>> of the API (provider and consumer) seamlessly. >> > > >>>>>> >> > > >>>>>> Major refactorings in Flink need to keep all connector API >> > > >>>>>> contracts intact, or we need to have a new version of the >> > connector >> > > >>> API. >> > > >>>>>> (b) We may even be able to go towards more lightweight and >> > > >>>>>> automated releases over time, even if we stay in Apache Flink >> with >> > > >>> that repo. >> > > >>>>>> This isn't yet fully aligned with the Apache release policies, >> > yet, >> > > >>>>>> but there are board discussions about whether there can be >> > > >>>>>> bot-triggered releases (by dependabot) and how that could fit >> into >> > > >>> the Apache process. >> > > >>>>>> This doesn't seem to be quite there just yet, but seeing that >> > those >> > > >>>>> start >> > > >>>>>> is a good sign, and there is a good chance we can do some >> things >> > > >>> there. >> > > >>>>>> I am not sure whether we should let bots trigger releases, >> because >> > > >>>>>> a >> > > >>>>> final >> > > >>>>>> human look at things isn't a bad thing, especially given the >> > > >>>>>> popularity >> > > >>>>> of >> > > >>>>>> software supply chain attacks recently. >> > > >>>>>> >> > > >>>>>> >> > > >>>>>> I do share Chesnay's concerns about complexity in tooling, >> though. >> > > >>>>>> Both release tooling and test tooling. They are not >> incompatible >> > > >>>>>> with that approach, but they are a task we need to tackle >> during >> > > >>>>>> this change which will add additional work. 
>> > > >>>>>> >> > > >>>>>> >> > > >>>>>> >> > > >>>>>> On Tue, Oct 26, 2021 at 10:31 AM Arvid Heise <ar...@apache.org >> > >> > > >>> wrote: >> > > >>>>>>> Hi folks, >> > > >>>>>>> >> > > >>>>>>> I think some questions came up and I'd like to address the >> > > >>>>>>> question of >> > > >>>>>> the >> > > >>>>>>> timing. >> > > >>>>>>> >> > > >>>>>>> Could you clarify what release cadence you're thinking of? >> > > >>>>>>> There's >> > > >>>>> quite >> > > >>>>>>>> a big range that fits "more frequent than Flink" (per-commit, >> > > >>>>>>>> daily, weekly, bi-weekly, monthly, even bi-monthly). >> > > >>>>>>> The short answer is: as often as needed: >> > > >>>>>>> - If there is a CVE in a dependency and we need to bump it - >> > > >>>>>>> release immediately. >> > > >>>>>>> - If there is a new feature merged, release soonish. We may >> > > >>>>>>> collect a >> > > >>>>> few >> > > >>>>>>> successive features before a release. >> > > >>>>>>> - If there is a bugfix, release immediately or soonish >> depending >> > > >>>>>>> on >> > > >>>>> the >> > > >>>>>>> severity and if there are workarounds available. >> > > >>>>>>> >> > > >>>>>>> We should not limit ourselves; the whole idea of independent >> > > >>>>>>> releases >> > > >>>>> is >> > > >>>>>>> exactly that you release as needed. There is no release >> planning >> > > >>>>>>> or anything needed, you just go with a release as if it was an >> > > >>>>>>> external artifact. >> > > >>>>>>> >> > > >>>>>>> (1) is the connector API already stable? >> > > >>>>>>>> From another discussion thread [1], connector API is far >> from >> > > >>>>> stable. >> > > >>>>>>>> Currently, it's hard to build connectors against multiple >> Flink >> > > >>>>>> versions. 
There are breaking API changes both in 1.12 -> 1.13 and in 1.13 -> 1.14, and maybe also in future versions, because Table related APIs are still @PublicEvolving and the new Sink API is still @Experimental.
>> >>>>>>>>
>> >>>>>>> The question is: what is stable in an evolving system? We recently discovered that the old SourceFunction needed to be refined such that cancellation works correctly [1]. So that interface has been in Flink for 7 years, heavily used also outside, and we still had to change the contract in a way that I'd expect any implementer to recheck their implementation. It might not be necessary to change anything, and you can probably change the code for all Flink versions, but still, the interface was not stable in the strictest sense.
>> >>>>>>>
>> >>>>>>> If we focus just on API changes on the unified interfaces, then we expect one more change to the Sink API to support compaction. For the Table API, there will most likely also be some changes in 1.15. So we could wait for 1.15. But I'm questioning if that's really necessary, because we will add more functionality beyond 1.15 without breaking the API. For example, we may add more unified connector metrics. If you want to use them in your connector, you have to support multiple Flink versions anyhow.
Rather than focusing the discussion on "when is stuff stable", I'd rather focus on "how can we support building connectors against multiple Flink versions" and make it as painless as possible.
>> >>>>>>>
>> >>>>>>> Chesnay pointed out that we could use different branches for different Flink versions, which sounds like a good suggestion. With a mono-repo, we can't use branches differently anyways (there is no way to have release branches per connector without chaos). In these branches, we could provide shims to simulate future features in older Flink versions such that, code-wise, the source code of a specific connector may not diverge (much). For example, to register unified connector metrics, we could simulate the current approach also in some utility package of the mono-repo.
>> >>>>>>>
>> >>>>>>>> I see the stable core Flink API as a prerequisite for modularity. And for connectors it is not just the source and sink API (source being stable as of 1.14), but everything that is required to build and maintain a connector downstream, such as the test utilities and infrastructure.
>> >>>>>>>>
>> >>>>>>> That is a very fair point. I'm actually surprised to see that MiniClusterWithClientResource is not public. I see it being used in all connectors, especially outside of Flink.
I fear that as long >> as >> > > >>>>>>> we do >> > > >>>>> not >> > > >>>>>>> have connectors outside, we will not properly annotate and >> > > >>>>>>> maintain >> > > >>>>> these >> > > >>>>>>> utilties in a classic hen-and-egg-problem. I will outline an >> idea >> > > >>>>>>> at >> > > >>>>> the >> > > >>>>>>> end. >> > > >>>>>>> >> > > >>>>>>>> the connectors need to be adopted and require at least one >> > > >>>>>>>> release >> > > >>>>> per >> > > >>>>>>>> Flink minor release. >> > > >>>>>>>> However, this will make the releases of connectors slower, >> e.g. >> > > >>>>>> maintain >> > > >>>>>>>> features for multiple branches and release multiple branches. >> > > >>>>>>>> I think the main purpose of having an external connector >> > > >>>>>>>> repository >> > > >>>>> is >> > > >>>>>> in >> > > >>>>>>>> order to have "faster releases of connectors"? >> > > >>>>>>>> >> > > >>>>>>>> Imagine a project with a complex set of dependencies. Let's >> say >> > > >>>>> Flink >> > > >>>>>>>> version A plus Flink reliant dependencies released by other >> > > >>>>>>>> projects (Flink-external connectors, Beam, Iceberg, Hudi, >> ..). >> > > >>>>>>>> We don't want >> > > >>>>> a >> > > >>>>>>>> situation where we bump the core Flink version to B and >> things >> > > >>>>>>>> fall apart (interface changes, utilities that were useful but >> > > >>>>>>>> not public, transitive dependencies etc.). >> > > >>>>>>>> >> > > >>>>>>> Yes, that's why I wanted to automate the processes more which >> is >> > > >>>>>>> not >> > > >>>>> that >> > > >>>>>>> easy under ASF. Maybe we automate the source provision across >> > > >>>>> supported >> > > >>>>>>> versions and have 1 vote thread for all versions of a >> connector? 
>> > > >>>>>>> >> > > >>>>>>> From the perspective of CDC connector maintainers, the >> biggest >> > > >>>>> advantage >> > > >>>>>> of >> > > >>>>>>>> maintaining it outside of the Flink project is that: >> > > >>>>>>>> 1) we can have a more flexible and faster release cycle >> > > >>>>>>>> 2) we can be more liberal with committership for connector >> > > >>>>> maintainers >> > > >>>>>>>> which can also attract more committers to help the release. >> > > >>>>>>>> >> > > >>>>>>>> Personally, I think maintaining one connector repository >> under >> > > >>>>>>>> the >> > > >>>>> ASF >> > > >>>>>>> may >> > > >>>>>>>> not have the above benefits. >> > > >>>>>>>> >> > > >>>>>>> Yes, I also feel that ASF is too restrictive for our needs. >> But >> > > >>>>>>> it >> > > >>>>> feels >> > > >>>>>>> like there are too many that see it differently and I think we >> > > >>>>>>> need >> > > >>>>>>> >> > > >>>>>>> (2) Flink testability without connectors. >> > > >>>>>>>> This is a very good question. How can we guarantee the new >> > > >>>>>>>> Source >> > > >>>>> and >> > > >>>>>>> Sink >> > > >>>>>>>> API are stable with only test implementation? >> > > >>>>>>>> >> > > >>>>>>> We can't and shouldn't. Since the connector repo is managed by >> > > >>>>>>> Flink, >> > > >>>>> a >> > > >>>>>>> Flink release manager needs to check if the Flink connectors >> are >> > > >>>>> actually >> > > >>>>>>> working prior to creating an RC. That's similar to how >> > > >>>>>>> flink-shaded >> > > >>>>> and >> > > >>>>>>> flink core are related. >> > > >>>>>>> >> > > >>>>>>> >> > > >>>>>>> So here is one idea that I had to get things rolling. We are >> > > >>>>>>> going to address the external repo iteratively without >> > > >>>>>>> compromising what we >> > > >>>>>> already >> > > >>>>>>> have: >> > > >>>>>>> 1.Phase, add new contributions to external repo. We use that >> time >> > > >>>>>>> to >> > > >>>>>> setup >> > > >>>>>>> infra accordingly and optimize release processes. 
We will identify test utilities that are not yet public/stable and fix that.
Phase 2: Add ports to the new unified interfaces of existing connectors. That requires a previous Flink release to make utilities stable. Keep old interfaces in flink-core.
Phase 3: Remove old interfaces in flink-core for some connectors (tbd at a later point).
Phase 4: Optionally move all remaining connectors (tbd at a later point).
I'd envision having ~3 months between starting the different phases. WDYT?

[1] https://issues.apache.org/jira/browse/FLINK-23527

On Thu, Oct 21, 2021 at 7:12 AM Kyle Bendickson <k...@tabular.io> wrote:

Hi all,

My name is Kyle and I'm an open source developer primarily focused on Apache Iceberg.

I'm happy to help clarify or elaborate on any aspect of our experience working on a relatively decoupled connector that is downstream and pretty popular.

I'd also love to be able to contribute or assist in any way I can.

I don't mean to thread jack, but are there any meetings or community sync-ups, specifically around the connector APIs, that I might join / be invited to?
I did want to add that even though I've experienced some of the pain points of integrating with an evolving system / API (catalog support is, generally speaking, pretty new everywhere in this space), I also personally agree that you shouldn't slow down development velocity too much for the sake of external connectors. Getting to a performant and stable place should be the primary goal, and slowing that down to support stragglers will (in my personal opinion) always be a losing game. Some folks will simply stay behind on versions regardless, until they have to upgrade.

I am working on ensuring that the Iceberg community stays within 1-2 versions of Flink, so that we can provide more feedback or contribute things that might improve our ability to support multiple Flink runtimes / versions with one project / codebase and minimal to no reflection (our desired goal).

If there's anything I can do or any way I can be of assistance, please don't hesitate to reach out. Or find me on ASF Slack 😀

I greatly appreciate your general concern for the needs of downstream connector integrators!
Cheers,
Kyle Bendickson (GitHub: kbendick)
Open Source Developer
kyle [at] tabular [dot] io

On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise <t...@apache.org> wrote:

Hi,

I see the stable core Flink API as a prerequisite for modularity. And for connectors it is not just the source and sink API (source being stable as of 1.14), but everything that is required to build and maintain a connector downstream, such as the test utilities and infrastructure.

Without the stable surface of core Flink, changes will leak into downstream dependencies and force lock-step updates. Refactoring across N repos is more painful than in a single repo. Those with experience developing downstream of Flink will know the pain, and that isn't limited to connectors. I don't remember a Flink "minor version" update that was just a dependency version change and did not force other downstream changes.

Imagine a project with a complex set of dependencies. Let's say Flink version A plus Flink-reliant dependencies released by other projects (Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want a situation where we bump the core Flink version to B and things fall apart (interface changes, utilities that were useful but not public, transitive dependencies etc.).
The discussion here also highlights the benefits of keeping certain connectors outside Flink, whether that is due to differences in the developer community, the maturity of the connectors, their specialized/limited usage, etc. I would like to see that as a sign of a growing ecosystem, and most of the ideas that Arvid has put forward would benefit further growth of the connector ecosystem.

As for keeping connectors within Apache Flink: I prefer that as the path forward for "essential" connectors like FileSource, KafkaSource, ... And we can still achieve a more flexible and faster release cycle.

Thanks,
Thomas

On Wed, Oct 20, 2021 at 3:32 AM Jark Wu <imj...@gmail.com> wrote:

Hi Konstantin,

> the connectors need to be adopted and require at least one release per Flink minor release.

However, this will make the releases of connectors slower, e.g. maintaining features for multiple branches and releasing multiple branches. I think the main purpose of having an external connector repository is to have "faster releases of connectors"?
From the perspective of CDC connector maintainers, the biggest advantage of maintaining it outside of the Flink project is that:
1) we can have a more flexible and faster release cycle
2) we can be more liberal with committership for connector maintainers, which can also attract more committers to help with releases.

Personally, I think maintaining one connector repository under the ASF may not have the above benefits.

Best,
Jark

On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf <kna...@apache.org> wrote:

Hi everyone,

regarding the stability of the APIs: I think everyone agrees that connector APIs which are stable across minor versions (1.13 -> 1.14) are the mid-term goal. But:

a) These APIs are still quite young, and we shouldn't make them @Public prematurely either.

b) Isn't this *mostly* orthogonal to where the connector code lives? Yes, as long as there are breaking changes, the connectors need to be adopted and require at least one release per Flink minor release.
Documentation-wise this can be addressed via a compatibility matrix for each connector, as Arvid suggested. IMO we shouldn't block this effort on the stability of the APIs.

Cheers,

Konstantin

On Wed, Oct 20, 2021 at 8:56 AM Jark Wu <imj...@gmail.com> wrote:

Hi,

I think Thomas raised very good questions and would like to know your opinions if we want to move connectors out of Flink in this version.

(1) Is the connector API already stable?
> Separate releases would only make sense if the core Flink surface is fairly stable though. As evident from Iceberg (and also Beam), that's not the case currently. We should probably focus on addressing the stability first, before splitting code. A success criterion could be that we are able to build Iceberg and Beam against multiple Flink versions w/o the need to change code. The goal would be that no connector breaks when we make changes to Flink core.
> Until that's the case, code separation creates a setup where 1+1 or N+1 repositories need to move in lock step.

From another discussion thread [1], the connector API is far from stable. Currently, it's hard to build connectors against multiple Flink versions. There are breaking API changes both in 1.12 -> 1.13 and in 1.13 -> 1.14, and maybe also in future versions, because the Table-related APIs are still @PublicEvolving and the new Sink API is still @Experimental.

(2) Flink testability without connectors.
> Flink w/o Kafka connector (and few others) isn't viable. Testability of Flink was already brought up; can we really certify a Flink core release without the Kafka connector? Maybe those connectors that are used in Flink e2e tests to validate functionality of core Flink should not be broken out?

This is a very good question. How can we guarantee the new Source and Sink API are stable with only a test implementation?
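For readers following along outside the dev list: Flink marks these API maturity tiers with marker annotations (`@Public`, `@PublicEvolving`, `@Experimental` in the `org.apache.flink.annotation` package). Below is a minimal, self-contained sketch of how such tier annotations can be declared and inspected; the annotation and interface declarations are illustrative stand-ins, not Flink's actual classes:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class StabilityTiers {
    // Stand-in declarations mirroring the idea of Flink's annotation tiers.
    @Retention(RetentionPolicy.RUNTIME) @interface Public {}
    @Retention(RetentionPolicy.RUNTIME) @interface PublicEvolving {}
    @Retention(RetentionPolicy.RUNTIME) @interface Experimental {}

    // Hypothetical connector-facing interfaces at different maturity levels.
    @Public interface SourceApi {}            // stable across minor releases
    @PublicEvolving interface TableApi {}     // may still change between minors
    @Experimental interface UnifiedSinkApi {} // no stability guarantee yet

    public static void main(String[] args) {
        // A downstream build could scan for these markers to flag
        // dependencies on surface area that may break between minors.
        System.out.println(SourceApi.class.isAnnotationPresent(Public.class));      // true
        System.out.println(UnifiedSinkApi.class.isAnnotationPresent(Public.class)); // false
    }
}
```

Because the markers are retained at runtime, a connector build could fail fast when it links against interfaces below a chosen tier, which is exactly the guarantee the thread is asking for.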
Best,
Jark

On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler <ches...@apache.org> wrote:

Could you clarify what release cadence you're thinking of? There's quite a big range that fits "more frequent than Flink" (per-commit, daily, weekly, bi-weekly, monthly, even bi-monthly).

On 19/10/2021 14:15, Martijn Visser wrote:

Hi all,

I think it would be a huge benefit if we can achieve more frequent releases of connectors, which are not bound to the release cycle of Flink itself. I agree that in order to get there, we need to have stable interfaces which are trustworthy and reliable, so they can be safely used by those connectors. I do think that work still needs to be done on those interfaces, but I am confident that we can get there from a Flink perspective.
I am worried that we would not be able to achieve those frequent releases of connectors if we put these connectors under the Apache umbrella, because that means that for each connector release we have to follow the Apache release creation process. This requires a lot of manual steps and prohibits automation, and I think it would be hard to scale out to frequent releases of connectors. I'm curious how others think this challenge could be solved.

Best regards,

Martijn

On Mon, 18 Oct 2021 at 22:22, Thomas Weise <t...@apache.org> wrote:

Thanks for initiating this discussion.

There are definitely a few things that are not optimal with our current management of connectors. I would not necessarily characterize it as a "mess" though. As the points raised so far show, it isn't easy to find a solution that balances competing requirements and leads to a net improvement.
It would be great if we can find a setup that allows connectors to be released independently of core Flink, and each connector to be released separately. Flink already has separate releases (flink-shaded), so that by itself isn't a new thing. Per-connector releases would need to allow for more frequent releases (without the baggage that a full Flink release comes with).

Separate releases would only make sense if the core Flink surface is fairly stable though. As evident from Iceberg (and also Beam), that's not the case currently. We should probably focus on addressing the stability first, before splitting code. A success criterion could be that we are able to build Iceberg and Beam against multiple Flink versions w/o the need to change code. The goal would be that no connector breaks when we make changes to Flink core. Until that's the case, code separation creates a setup where 1+1 or N+1 repositories need to move in lock step.
Regarding some connectors being more important for Flink than others: that's a fact. Flink w/o Kafka connector (and few others) isn't viable. Testability of Flink was already brought up; can we really certify a Flink core release without the Kafka connector? Maybe those connectors that are used in Flink e2e tests to validate functionality of core Flink should not be broken out?

Finally, I think that the connectors that move into separate repos should remain part of the Apache Flink project. Larger organizations tend to approve the use of and contribution to open source at the project level. Sometimes it is everything ASF; more often it is "Apache Foo". It would be fatal to end up with a patchwork of projects with potentially different licenses and governance to arrive at a working Flink setup. This may mean we prioritize usability over developer convenience, if that's in the best interest of Flink as a whole.
Thanks,
Thomas

On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler <ches...@apache.org> wrote:

Generally, the issues are reproducibility and control.

Stuff's completely broken on the Flink side for a week? Well, then so are the connector repos.

(As-is) You can't go back to a previous version of the snapshot. Which also means that checking out older commits can be problematic, because you'd still work against the latest snapshots, and they may not be compatible with each other.

On 18/10/2021 15:22, Arvid Heise wrote:

I was actually betting on snapshot versions. What are the limits? Obviously, we can only do a release of a 1.15 connector after 1.15 is released.

--

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk
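The per-connector compatibility matrix that Arvid and Konstantin mention could, in its simplest form, be a published lookup table from a connector release to the Flink minor versions it supports. A hypothetical sketch follows, with all connector names and version numbers invented for illustration:

```java
import java.util.List;
import java.util.Map;

public class CompatibilityMatrix {
    // Hypothetical matrix: connector release -> supported Flink minor versions.
    static final Map<String, List<String>> MATRIX = Map.of(
            "flink-connector-foo-1.0.0", List.of("1.13", "1.14"),
            "flink-connector-foo-2.0.0", List.of("1.14", "1.15"));

    static boolean isCompatible(String connectorRelease, String flinkMinor) {
        return MATRIX.getOrDefault(connectorRelease, List.of()).contains(flinkMinor);
    }

    public static void main(String[] args) {
        System.out.println(isCompatible("flink-connector-foo-1.0.0", "1.13")); // true
        System.out.println(isCompatible("flink-connector-foo-2.0.0", "1.13")); // false
    }
}
```

Such a table can double as CI input: a matrix build rebuilds each connector against every Flink version it claims to support, which would operationalize Thomas's success criterion of building downstream projects against multiple Flink versions without code changes.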