Good question: we want to use the same setup as we currently have for
Flink, so using the existing CI infrastructure.

On Mon, 10 Jan 2022 at 11:19, Chesnay Schepler <ches...@apache.org> wrote:

> What CI resources do you actually intend to use? Asking since the ASF GHA
> resources are afaik quite overloaded.
>
> On 05/01/2022 11:48, Martijn Visser wrote:
> > Hi everyone,
> >
> > I wanted to summarise this email thread and see if there are any open
> > items that still need to be discussed before we can finalise the
> > discussion:
> >
> > 1. About having multiple connectors in one repo or each connector in its own
> > repository
> >
> > As explained by @Arvid Heise <ar...@apache.org> we ultimately propose to
> > have a single repository per connector, which seems to be favoured in the
> > community.
> >
> > 2. About having the connector repositories under ASF or not.
> >
> > The consensus is that all connectors would remain under the ASF.
> >
> > I think we can categorise the questions and concerns that were brought
> > forward as the following:
> >
> > 3. How would we set up the testing?
> >
> > We need to make sure that we provide a proper testing framework, which
> > means that we provide a public Source- and Sink testing framework. As
> > mentioned extensively in the thread, we need to make sure that the
> > necessary interfaces are properly annotated and at least @PublicEvolving.
> > This also includes the test infrastructure, like MiniCluster. For the
> > latter, we don't know exactly yet how to balance having publicly available
> > test infrastructure vs being able to iterate inside of Flink, but we can
> > all agree this has to be solved.
> >
> > For testing infrastructure, we would like to use GitHub Actions. In the
> > current state, it probably makes sense for a connector repo to follow the
> > branching strategy of Flink. That will ensure a match between the released
> > connector and Flink version. This should change when all the Flink
> > interfaces have stabilised so you can use a connector with multiple Flink
> > versions. That means that we should have a nightly build test for:
> >
> > - The `main` branch of the connector (which would be the unreleased
> > version) against the `master` branch of Flink (the unreleased version of
> > Flink).
> > - Any supported `release-X.YY` branch of the connector against the
> > `release-X.YY` branch of Flink.
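A nightly workflow along those lines could be sketched roughly as follows. This is a sketch only: the branch names, Flink versions, and the `flink.version` Maven property are assumptions for illustration, not an agreed-upon setup.

```yaml
# .github/workflows/nightly.yml -- hypothetical sketch, not an agreed setup
name: Nightly connector tests
on:
  schedule:
    - cron: "0 2 * * *"  # once per night
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        include:
          # unreleased connector against unreleased Flink
          - connector-ref: main
            flink-version: 1.15-SNAPSHOT
          # supported release branch against the matching Flink release
          - connector-ref: release-1.14
            flink-version: 1.14.2
    steps:
      - uses: actions/checkout@v2
        with:
          ref: ${{ matrix.connector-ref }}
      - uses: actions/setup-java@v2
        with:
          distribution: temurin
          java-version: '8'
      # build and test the connector against the selected Flink version
      - run: mvn -B verify -Dflink.version=${{ matrix.flink-version }}
```

Adding a supported Flink version would then only mean adding one entry to the matrix.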
> >
> > We should also have smoke E2E tests in Flink (one for DataStream, one for
> > Table, one for SQL, one for Python) which load all the connectors and run
> > an arbitrary test (post data to the source, load it into Flink, sink the
> > output and verify that the output is as expected).
> >
> > 4. How would we integrate documentation?
> >
> > Documentation for a connector should probably end up in the connector
> > repository. The Flink website should contain a single entry point to all
> > connectors (so not the current approach where we have connectors per
> > DataStream API, Table API etc). Each connector's documentation should end
> > up as one menu item under connectors, containing all necessary information
> > for the DataStream, Table, SQL and Python implementations.
> >
> > 5. Which connectors should end up in the external connector repo?
> >
> > I'll open up a separate thread on this topic to have a parallel discussion
> > on that. We should reach consensus on both threads before we can move
> > forward on this topic as a whole.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Fri, 10 Dec 2021 at 04:47, Thomas Weise <t...@apache.org> wrote:
> >
> >> +1 for repo per connector from my side also
> >>
> >> Thanks for trying out the different approaches.
> >>
> >> Where would the common/infra pieces live? In a separate repository
> >> with its own release?
> >>
> >> Thomas
> >>
> >> On Thu, Dec 9, 2021 at 12:42 PM Till Rohrmann <trohrm...@apache.org>
> >> wrote:
> >>> Sorry if I was a bit unclear. +1 for the single repo per connector
> >> approach.
> >>> Cheers,
> >>> Till
> >>>
> >>> On Thu, Dec 9, 2021 at 5:41 PM Till Rohrmann <trohrm...@apache.org>
> >> wrote:
> >>>> +1 for the single repo approach.
> >>>>
> >>>> Cheers,
> >>>> Till
> >>>>
> >>>> On Thu, Dec 9, 2021 at 3:54 PM Martijn Visser <mart...@ververica.com>
> >>>> wrote:
> >>>>
> >>>>> I also agree that it feels more natural to go with a repo for each
> >>>>> individual connector. Each repository can be made available at
> >>>>> flink-packages.org so users can find them, next to referring to them
> >> in
> >>>>> documentation. +1 from my side.
> >>>>>
> >>>>> On Thu, 9 Dec 2021 at 15:38, Arvid Heise <ar...@apache.org> wrote:
> >>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> We tried out Chesnay's proposal and went with Option 2.
> >>>>>> Unfortunately, we ran into some tough nuts to crack and feel like we
> >>>>>> hit a dead end:
> >>>>>> - The main pain point with the outlined Frankensteinian connector
> >>>>>> repo is how to handle shared code / infra code. If we have it in some
> >>>>>> <common> branch, then we need to merge the common branch into the
> >>>>>> connector branch on update. However, it's unclear to me how
> >>>>>> improvements in the common branch that naturally appear while working
> >>>>>> on a specific connector go back into the common branch. You can't use
> >>>>>> a pull request from your branch or else your connector code would
> >>>>>> poison the connector-less common branch. So you would probably
> >>>>>> manually copy the files over to a common branch and create a PR
> >>>>>> branch for that.
> >>>>>> - A weird solution could be to have the common branch as a
> >> submodule in
> >>>>> the
> >>>>>> repo itself (if that's even possible). I'm sure that this setup
> >> would
> >>>>> blow
> >>>>>> up the minds of all newcomers.
> >>>>>> - Similarly, it's mandatory to have safeguards against code from
> >>>>> connector
> >>>>>> A poisoning connector B, common, or main. I had some similar setup
> >> in
> >>>>> the
> >>>>>> past and code from two "distinct" branch types constantly swept
> >> over.
> >>>>>> - We could also say that we simply release <common> independently
> >> and
> >>>>> just
> >>>>>> have a maven (SNAPSHOT) dependency on it. But that would create a
> >> weird
> >>>>>> flow if you need to change in common where you need to constantly
> >> switch
> >>>>>> branches back and forth.
> >>>>>> - In general, the Frankensteinian approach is very switch-intensive.
> >>>>>> If you maintain 3 connectors and need to fix one build instability in
> >>>>>> each at the same time (quite common nowadays for some reason) and you
> >>>>>> have 2 review rounds, you need to switch branches 9 times, ignoring
> >>>>>> changes to common.
> >>>>>>
> >>>>>> Additionally, we still have the rather user/dev unfriendly main
> >> that is
> >>>>>> mostly empty. I'm also not sure we can generate an overview
> >> README.md to
> >>>>>> make it more friendly here because in theory every connector branch
> >>>>> should
> >>>>>> be based on main and we would get merge conflicts.
> >>>>>>
> >>>>>> I'd like to propose once again to go with individual repositories.
> >>>>>> - The only downside that we discussed so far is that we have more
> >>>>> initial
> >>>>>> setup to do. Since we organically grow the number of
> >>>>> connector/repositories
> >>>>>> that load is quite distributed. We can offer templates after
> >> finding a
> >>>>> good
> >>>>>> approach that can even be used by outside organizations.
> >>>>>> - Regarding secrets, I think it's actually an advantage that the
> >> Kafka
> >>>>>> connector has no access to the AWS secrets. If there are secrets to
> >> be
> >>>>>> shared across connectors, we can and should use Azure's Variable
> >> Groups
> >>>>> (I
> >>>>>> have used it in the past to share Nexus creds across repos). That
> >> would
> >>>>>> also make rotation easy.
> >>>>>> - Working on different connectors would be rather easy as all
> >> modern IDE
> >>>>>> support multiple repo setups in the same project. You still need to
> >> do
> >>>>>> multiple releases in case you update common code (either accessed
> >>>>> through
> >>>>>> Nexus or git submodule) and you want to release your connector.
> >>>>>> - There is no difference with respect to how many CI runs there are
> >>>>>> in the two approaches.
> >>>>>> - Individual repositories also have the advantage of allowing
> >> external
> >>>>>> incubation. Let's assume someone builds connector A and hosts it in
> >>>>> their
> >>>>>> organization (very common setup). If they want to contribute the
> >> code to
> >>>>>> Flink, we could simply transfer the repository into ASF after
> >> ensuring
> >>>>>> Flink coding standards. Then we retain git history and Github
> >> issues.
> >>>>>> Is there any point that I'm missing?
> >>>>>>
> >>>>>> On Fri, Nov 26, 2021 at 1:32 PM Chesnay Schepler <
> >> ches...@apache.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> For sharing workflows we should be able to use composite actions.
> >> We'd
> >>>>>>> have the main definition files in the flink-connectors repo, that
> >> we
> >>>>>>> also need to tag/release, which other branches/repos can then
> >> import.
> >>>>>>> These are also versioned, so we don't have to worry about
> >> accidentally
> >>>>>>> breaking stuff.
> >>>>>>> These could also be used to enforce certain standards / interfaces
> >>>>> such
> >>>>>>> that we can automate more things (e.g., integration into the Flink
> >>>>>>> documentation).
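As an illustration of the composite-action idea: the shared repo could publish a tagged action that connector repos import. The file layout, action name, and inputs below are invented for this sketch.

```yaml
# In the flink-connectors repo: .github/actions/connector-ci/action.yml
# (hypothetical shared composite action)
name: connector-ci
description: Shared build-and-test steps for connector repos
inputs:
  flink-version:
    description: Flink version to build against
    required: true
runs:
  using: composite
  steps:
    - uses: actions/setup-java@v2
      with:
        distribution: temurin
        java-version: '8'
    - run: mvn -B verify -Dflink.version=${{ inputs.flink-version }}
      shell: bash

# A connector repo would then pin a released tag of the shared action:
#   - uses: apache/flink-connectors/.github/actions/connector-ci@v1.0
#     with:
#       flink-version: 1.14.2
```

Because the consumer pins a tag, a later change to the shared steps cannot silently break connector builds.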
> >>>>>>>
> >>>>>>> It is true that Option 2) and dedicated repositories share a lot
> >> of
> >>>>>>> properties. While I did say in an offline conversation that we in
> >> that
> >>>>>>> case might just as well use separate repositories, I'm not so sure
> >>>>>>> anymore. One repo would make administration a bit easier, for
> >> example
> >>>>>>> secrets wouldn't have to be applied to each repo (we wouldn't want
> >>>>>>> certain secrets to be set up organization-wide).
> >>>>>>> I overall also like that one repo would present a single access
> >> point;
> >>>>>>> you can't "miss" a connector repo, and I would hope that having
> >> it as
> >>>>>>> one repo would nurture more collaboration between the connectors,
> >>>>> which
> >>>>>>> after all need to solve similar problems.
> >>>>>>>
> >>>>>>> It is a fair point that the branching model would be quite weird,
> >> but
> >>>>> I
> >>>>>>> think that would subside pretty quickly.
> >>>>>>>
> >>>>>>> Personally I'd go with Option 2, and if that doesn't work out we
> >> can
> >>>>>>> still split the repo later on. (Which should then be a trivial
> >> matter
> >>>>> of
> >>>>>>> copying all <connector>/* branches and renaming them).
> >>>>>>>
> >>>>>>> On 26/11/2021 12:47, Till Rohrmann wrote:
> >>>>>>>> Hi Arvid,
> >>>>>>>>
> >>>>>>>> Thanks for updating this thread with the latest findings. The
> >>>>> described
> >>>>>>>> limitations for a single connector repo sound suboptimal to me.
> >>>>>>>>
> >>>>>>>> * Option 2. sounds as if we try to simulate multi connector
> >> repos
> >>>>>> inside
> >>>>>>> of
> >>>>>>>> a single repo. I also don't know how we would share code
> >> between the
> >>>>>>>> different branches (sharing infrastructure would probably be
> >> easier
> >>>>>>>> though). This seems to have the same limitations as dedicated
> >> repos
> >>>>>> with
> >>>>>>>> the downside of having a not very intuitive branching model.
> >>>>>>>> * Isn't option 1. kind of a degenerated version of option 2.
> >> where
> >>>>> we
> >>>>>>> have
> >>>>>>>> some unrelated code from other connectors in the individual
> >>>>> connector
> >>>>>>>> branches?
> >>>>>>>> * Option 3. has the downside that someone creating a release
> >> has to
> >>>>>>> release
> >>>>>>>> all connectors. This means that she either has to sync with the
> >>>>>> different
> >>>>>>>> connector maintainers or has to be able to release all
> >> connectors on
> >>>>>> her
> >>>>>>>> own. We are already seeing in the Flink community that releases
> >>>>> require
> >>>>>>>> quite good communication/coordination between the different
> >> people
> >>>>>>> working
> >>>>>>>> on different Flink components. Given our goals to make connector
> >>>>>> releases
> >>>>>>>> easier and more frequent, I think that coupling different
> >> connector
> >>>>>>>> releases might be counter-productive.
> >>>>>>>>
> >>>>>>>> To me it sounds not very practical to mainly use a mono
> >> repository
> >>>>> w/o
> >>>>>>>> having some more advanced build infrastructure that, for
> >> example,
> >>>>>> allows
> >>>>>>> to
> >>>>>>>> have different git roots in different connector directories.
> >> Maybe
> >>>>> the
> >>>>>>> mono
> >>>>>>>> repo can be a catch all repository for connectors that want to
> >> be
> >>>>>>> released
> >>>>>>>> in lock-step (Option 3.) with all other connectors the repo
> >>>>> contains.
> >>>>>> But
> >>>>>>>> for connectors that get changed frequently, having a dedicated
> >>>>>> repository
> >>>>>>>> that allows independent releases sounds preferable to me.
> >>>>>>>>
> >>>>>>>> What utilities and infrastructure code do you intend to share?
> >> Using
> >>>>>> git
> >>>>>>>> submodules can definitely be one option to share code. However,
> >> it
> >>>>>> might
> >>>>>>>> also be ok to depend on flink-connector-common artifacts which
> >> could
> >>>>>> make
> >>>>>>>> things easier. Where I am unsure is whether git submodules can
> >> be
> >>>>> used
> >>>>>> to
> >>>>>>>> share infrastructure code (e.g. the .github/workflows) because
> >> you
> >>>>> need
> >>>>>>>> these files in the repo to trigger the CI infrastructure.
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Till
> >>>>>>>>
> >>>>>>>> On Thu, Nov 25, 2021 at 1:59 PM Arvid Heise <ar...@apache.org>
> >>>>> wrote:
> >>>>>>>>> Hi Brian,
> >>>>>>>>>
> >>>>>>>>> Thank you for sharing. I think your approach is very valid and
> >> is
> >>>>> in
> >>>>>>> line
> >>>>>>>>> with what I had in mind.
> >>>>>>>>>
> >>>>>>>>> Basically Pravega community aligns the connector releases with
> >> the
> >>>>>>> Pravega
> >>>>>>>>>> mainline release
> >>>>>>>>>>
> >>>>>>>>> This certainly would mean that there is little value in
> >> coupling
> >>>>>>> connector
> >>>>>>>>> versions. So it's making a good case for having separate
> >> connector
> >>>>>>> repos.
> >>>>>>>>>
> >>>>>>>>>> and maintains the connector with the latest 3 Flink
> >> versions(CI
> >>>>> will
> >>>>>>>>>> publish snapshots for all these 3 branches)
> >>>>>>>>>>
> >>>>>>>>> I'd like to give connector devs a simple way to express to
> >> which
> >>>>> Flink
> >>>>>>>>> versions the current branch is compatible. From there we can
> >>>>> generate
> >>>>>>> the
> >>>>>>>>> compatibility matrix automatically and optionally also create
> >>>>>> different
> >>>>>>>>> releases per supported Flink version. Not sure if the latter is
> >>>>> indeed
> >>>>>>>>> better than having just one artifact that happens to run with
> >>>>> multiple
> >>>>>>>>> Flink versions. I guess it depends on what dependencies we are
> >>>>>>> exposing. If
> >>>>>>>>> the connector uses flink-connector-base, then we probably need
> >>>>>> separate
> >>>>>>>>> artifacts with poms anyways.
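To make the "generate the compatibility matrix automatically" idea concrete, here is a minimal sketch: each connector version declares the Flink versions it supports, and a script renders a markdown matrix from those declarations. The data format and function name are made up for illustration; the real input would presumably come from per-branch metadata files.

```python
def render_matrix(declarations):
    """Render a markdown compatibility matrix.

    declarations: mapping of connector version -> list of supported
    Flink versions (hypothetical per-branch metadata).
    """
    # collect all Flink versions mentioned by any connector version
    flink_versions = sorted({v for supported in declarations.values() for v in supported})
    header = "| Connector | " + " | ".join(flink_versions) + " |"
    separator = "|---" * (len(flink_versions) + 1) + "|"
    rows = []
    for connector, supported in sorted(declarations.items()):
        cells = ["yes" if v in supported else "no" for v in flink_versions]
        rows.append("| " + connector + " | " + " | ".join(cells) + " |")
    return "\n".join([header, separator] + rows)

if __name__ == "__main__":
    # Example declarations, loosely mirroring the Pravega-style setup.
    print(render_matrix({
        "1.0": ["1.11", "1.12"],
        "1.1": ["1.12", "1.13"],
    }))
```

The same declarations could also drive which CI matrix entries and release artifacts get produced.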
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>>
> >>>>>>>>> Arvid
> >>>>>>>>>
> >>>>>>>>> On Fri, Nov 19, 2021 at 10:55 AM Zhou, Brian <b.z...@dell.com>
> >>>>> wrote:
> >>>>>>>>>> Hi Arvid,
> >>>>>>>>>>
> >>>>>>>>>> For branching model, the Pravega Flink connector has some
> >>>>> experience
> >>>>>>> what
> >>>>>>>>>> I would like to share. Here[1][2] is the compatibility matrix
> >> and
> >>>>>> wiki
> >>>>>>>>>> explaining the branching model and releases. Basically Pravega
> >>>>>>> community
> >>>>>>>>>> aligns the connector releases with the Pravega mainline
> >> release,
> >>>>> and
> >>>>>>>>>> maintains the connector with the latest 3 Flink versions(CI
> >> will
> >>>>>>> publish
> >>>>>>>>>> snapshots for all these 3 branches).
> >>>>>>>>>> For example, recently we have 0.10.1 release[3], and in maven
> >>>>> central
> >>>>>>> we
> >>>>>>>>>> need to upload three artifacts(For Flink 1.13, 1.12, 1.11) for
> >>>>> 0.10.1
> >>>>>>>>>> version[4].
> >>>>>>>>>>
> >>>>>>>>>> There are some alternatives. Another solution that we once
> >>>>>>>>>> discussed but finally got abandoned is to have an independent
> >>>>>>>>>> version just like the current CDC connector, and then give a big
> >>>>>>>>>> compatibility matrix to users. We think it would be too confusing
> >>>>>>>>>> as the connector evolves. On the contrary, we can also do the
> >>>>>>>>>> opposite and align with the Flink version, maintaining several
> >>>>>>>>>> branches for different system versions.
> >>>>>>>>>>
> >>>>>>>>>> I would say this is only a fairly-OK solution because it is a bit
> >>>>>>>>>> painful for maintainers, as cherry-picks are very common and
> >>>>>>>>>> releases would require much work. However, if neither system has
> >>>>>>>>>> nice backward compatibility, there seems to be no comfortable
> >>>>>>>>>> solution for their connector.
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>> https://github.com/pravega/flink-connectors#compatibility-matrix
> >>>>>>>>>> [2]
> >>>>>>>>>>
> >>
> https://github.com/pravega/flink-connectors/wiki/Versioning-strategy-for-Flink-connector
> >>>>>>>>>> [3]
> >>>>> https://github.com/pravega/flink-connectors/releases/tag/v0.10.1
> >>>>>>>>>> [4]
> >> https://search.maven.org/search?q=pravega-connectors-flink
> >>>>>>>>>> Best Regards,
> >>>>>>>>>> Brian
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: Arvid Heise <ar...@apache.org>
> >>>>>>>>>> Sent: Friday, November 19, 2021 4:12 PM
> >>>>>>>>>> To: dev
> >>>>>>>>>> Subject: Re: [DISCUSS] Creating an external connector
> >> repository
> >>>>>>>>>>
> >>>>>>>>>> Hi everyone,
> >>>>>>>>>>
> >>>>>>>>>> we are currently in the process of setting up the
> >> flink-connectors
> >>>>>> repo
> >>>>>>>>>> [1] for new connectors but we hit a wall that we currently
> >> cannot
> >>>>>> take:
> >>>>>>>>>> branching model.
> >>>>>>>>>> To reiterate the original motivation of the external connector
> >>>>> repo:
> >>>>>> We
> >>>>>>>>>> want to decouple the release cycle of a connector with Flink.
> >>>>>> However,
> >>>>>>> if
> >>>>>>>>>> we want to support semantic versioning in the connectors with
> >> the
> >>>>>>> ability
> >>>>>>>>>> to introduce breaking changes through major version bumps and
> >>>>> support
> >>>>>>>>>> bugfixes on old versions, then we need release branches
> >> similar to
> >>>>>> how
> >>>>>>>>>> Flink core operates.
> >>>>>>>>>> Consider two connectors, let's call them kafka and hbase. We
> >> have
> >>>>>> kafka
> >>>>>>>>> in
> >>>>>>>>>> version 1.0.X, 1.1.Y (small improvement), 2.0.Z (config
> >> option)
> >>>>>> change
> >>>>>>>>> and
> >>>>>>>>>> hbase only on 1.0.A.
> >>>>>>>>>>
> >>>>>>>>>> Now our current assumption was that we can work with a
> >> mono-repo
> >>>>>> under
> >>>>>>>>> ASF
> >>>>>>>>>> (flink-connectors). Then, for release-branches, we found 3
> >>>>> options:
> >>>>>>>>>> 1. We would need to create some ugly mess with the cross product
> >>>>>>>>>> of connector and version: so you have kafka-release-1.0,
> >>>>>>>>>> kafka-release-1.1, kafka-release-2.0, hbase-release-1.0. The main
> >>>>>>>>>> issue is not the number of branches (that's something that git can
> >>>>>>>>>> handle) but that the state of kafka is undefined in
> >>>>>>>>>> hbase-release-1.0. That's a call for disaster and makes releasing
> >>>>>>>>>> connectors very cumbersome (CI would only execute and publish
> >>>>>>>>>> hbase SNAPSHOTS on hbase-release-1.0).
> >>>>>>>>>> 2. We could avoid the undefined state by having an empty master
> >>>>>>>>>> and each release branch really only holding the code of its
> >>>>>>>>>> connector. But that's also not great: any user that looks at the
> >>>>>>>>>> repo and sees no connector would assume that the project is dead.
> >>>>>>>>>> 3. We could have synced releases similar to the CDC connectors
> >>>>> [2].
> >>>>>>> That
> >>>>>>>>>> means that if any connector introduces a breaking change, all
> >>>>>>> connectors
> >>>>>>>>>> get a new major. I find that quite confusing to a user if
> >> hbase
> >>>>> gets
> >>>>>> a
> >>>>>>>>> new
> >>>>>>>>>> release without any change because kafka introduced a breaking
> >>>>>> change.
> >>>>>>>>>> To fully decouple release cycles and CI of connectors, we
> >> could
> >>>>> add
> >>>>>>>>>> individual repositories under ASF (flink-connector-kafka,
> >>>>>>>>>> flink-connector-hbase). Then we can apply the same branching
> >>>>> model as
> >>>>>>>>>> before. I quickly checked if there are precedences in the
> >> apache
> >>>>>>>>> community
> >>>>>>>>>> for that approach and just by scanning alphabetically I found
> >>>>> cordova
> >>>>>>>>> with
> >>>>>>>>>> 70 and couchdb with 77 apache repos respectively. So it
> >> certainly
> >>>>>> seems
> >>>>>>>>>> like other projects approached our problem in that way and the
> >>>>> apache
> >>>>>>>>>> organization is okay with that. I currently expect max 20
> >>>>> additional
> >>>>>>>>> repos
> >>>>>>>>>> for connectors and in the future 10 max each for formats and
> >>>>>>> filesystems
> >>>>>>>>> if
> >>>>>>>>>> we would also move them out at some point in time. So we
> >> would be
> >>>>> at
> >>>>>> a
> >>>>>>>>>> total of 50 repos.
> >>>>>>>>>>
> >>>>>>>>>> Note that for all options, we need to provide a compatibility
> >>>>>>>>>> matrix that we aim to autogenerate.
> >>>>>>>>>>
> >>>>>>>>>> Now for the potential downsides that we internally discussed:
> >>>>>>>>>> - How can we ensure common infrastructure code, utilities, and
> >>>>>>>>>> quality? I propose to add a flink-connector-common that contains
> >>>>>>>>>> all these things and is added as a git submodule/subtree to the
> >>>>>>>>>> repos.
> >>>>>>>>>> - Do we implicitly discourage connector developers to maintain
> >>>>> more
> >>>>>>> than
> >>>>>>>>>> one connector with a fragmented code base?
> >>>>>>>>>> That is certainly a risk. However, I currently also see few
> >> devs
> >>>>>>> working
> >>>>>>>>>> on more than one connector. However, it may actually help
> >> keeping
> >>>>> the
> >>>>>>>>> devs
> >>>>>>>>>> that maintain a specific connector on the hook. We could use
> >>>>> github
> >>>>>>>>> issues
> >>>>>>>>>> to track bugs and feature requests and a dev can focus his
> >> limited
> >>>>>> time
> >>>>>>>>> on
> >>>>>>>>>> getting that one connector right.
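The flink-connector-common submodule wiring proposed above could look roughly like the following. This sketch uses local throwaway repositories so it is self-contained; the repo and path names (flink-connector-kafka, tools/common) are invented for illustration, and a real setup would reference the shared repository by its URL instead of `../common`.

```shell
# Sketch: pulling shared infra code into a connector repo as a git submodule.
set -e
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com
workdir=$(mktemp -d)
cd "$workdir"

# stand-in for the shared flink-connector-common repo
git init -q common
git -C common commit -q --allow-empty -m "init common infra"

# a connector repo that pulls the shared code in as a submodule
git init -q flink-connector-kafka
cd flink-connector-kafka
git commit -q --allow-empty -m "init connector"
# protocol.file.allow is only needed because we use a local path here
git -c protocol.file.allow=always submodule add ../common tools/common
git commit -q -m "Add shared infra code as a submodule"
git submodule status tools/common  # shows the commit the shared code is pinned to
```

The submodule pins an exact commit of the shared code, so a connector release stays reproducible even while the common repo moves on.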
> >>>>>>>>>>
> >>>>>>>>>> So WDYT? Compared to some intermediate suggestions with split
> >>>>> repos,
> >>>>>>> the
> >>>>>>>>>> big difference is that everything remains under Apache
> >> umbrella
> >>>>> and
> >>>>>> the
> >>>>>>>>>> Flink community.
> >>>>>>>>>>
> >>>>>>>>>> [1] https://github.com/apache/flink-connectors
> >>>>>>>>>> [2] https://github.com/ververica/flink-cdc-connectors/
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Nov 12, 2021 at 3:39 PM Arvid Heise <ar...@apache.org
> >>>>>> wrote:
> >>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>
> >>>>>>>>>>> I created the flink-connectors repo [1] to advance the
> >> topic. We
> >>>>>> would
> >>>>>>>>>>> create a proof-of-concept in the next few weeks as a special
> >>>>> branch
> >>>>>>>>>>> that I'd then use for discussions. If the community agrees
> >> with
> >>>>> the
> >>>>>>>>>>> approach, that special branch will become the master. If
> >> not, we
> >>>>> can
> >>>>>>>>>>> reiterate over it or create competing POCs.
> >>>>>>>>>>>
> >>>>>>>>>>> If someone wants to try things out in parallel, just make
> >> sure
> >>>>> that
> >>>>>>>>>>> you are not accidentally pushing POCs to the master.
> >>>>>>>>>>>
> >>>>>>>>>>> As a reminder: We will not move out any current connector
> >> from
> >>>>> Flink
> >>>>>>>>>>> at this point in time, so everything in Flink will remain as
> >> is
> >>>>> and
> >>>>>> be
> >>>>>>>>>>> maintained there.
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>>
> >>>>>>>>>>> Arvid
> >>>>>>>>>>>
> >>>>>>>>>>> [1] https://github.com/apache/flink-connectors
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Oct 29, 2021 at 6:57 PM Till Rohrmann <
> >>>>> trohrm...@apache.org
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>
> >>>>>>>>>>>>   From the discussion, it seems to me that we have different
> >>>>>> opinions
> >>>>>>>>>>>> whether to have an ASF umbrella repository or to host them
> >>>>> outside
> >>>>>> of
> >>>>>>>>>>>> the ASF. It also seems that this is not really the problem
> >> to
> >>>>>> solve.
> >>>>>>>>>>>> Since there are many good arguments for either approach, we
> >>>>> could
> >>>>>>>>>>>> simply start with an ASF umbrella repository and see how
> >> people
> >>>>>> adopt
> >>>>>>>>>>>> it. If the individual connectors cannot move fast enough or
> >> if
> >>>>>> people
> >>>>>>>>>>>> prefer to not buy into the more heavy-weight ASF processes,
> >> then
> >>>>>> they
> >>>>>>>>>>>> can host the code also somewhere else. We simply need to
> >> make
> >>>>> sure
> >>>>>>>>>>>> that these connectors are discoverable (e.g. via
> >>>>> flink-packages).
> >>>>>>>>>>>> The more important problem seems to be to provide common
> >> tooling
> >>>>>>>>>>>> (testing, infrastructure, documentation) that can easily be
> >>>>> reused.
> >>>>>>>>>>>> Similarly, it has become clear that the Flink community
> >> needs to
> >>>>>>>>>>>> improve on providing stable APIs. I think it is not
> >> realistic to
> >>>>>>>>>>>> first complete these tasks before starting to move
> >> connectors to
> >>>>>>>>>>>> dedicated repositories. As Stephan said, creating a
> >> connector
> >>>>>>>>>>>> repository will force us to pay more attention to API
> >> stability
> >>>>> and
> >>>>>>>>>>>> also to think about which testing tools are required.
> >> Hence, I
> >>>>>>>>>>>> believe that starting to add connectors to a different
> >>>>> repository
> >>>>>>>>>>>> than apache/flink will help improve our connector tooling
> >>>>>> (declaring
> >>>>>>>>>>>> testing classes as public, creating a common test utility
> >> repo,
> >>>>>>>>>>>> creating a repo
> >>>>>>>>>>>> template) and vice versa. Hence, I like Arvid's proposed
> >>>>> process as
> >>>>>>>>>>>> it will start kicking things off w/o letting this effort
> >> fizzle
> >>>>>> out.
> >>>>>>>>>>>> Cheers,
> >>>>>>>>>>>> Till
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, Oct 28, 2021 at 11:44 AM Stephan Ewen <
> >> se...@apache.org
> >>>>>>>>> wrote:
> >>>>>>>>>>>>> Thank you all, for the nice discussion!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>   From my point of view, I very much like the idea of
> >> putting
> >>>>>>>>>>>>> connectors
> >>>>>>>>>>>> in a
> >>>>>>>>>>>>> separate repository. But I would argue it should be part of
> >>>>> Apache
> >>>>>>>>>>>> Flink,
> >>>>>>>>>>>>> similar to flink-statefun, flink-ml, etc.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I share many of the reasons for that:
> >>>>>>>>>>>>>     - As argued many times, reduces complexity of the Flink
> >>>>> repo,
> >>>>>>>>>>>> increases
> >>>>>>>>>>>>> response times of CI, etc.
> >>>>>>>>>>>>>     - Much lower barrier of contribution, because an
> >> unstable
> >>>>>>>>>>>>> connector
> >>>>>>>>>>>> would
> >>>>>>>>>>>>> not de-stabilize the whole build. Of course, we would need
> >> to
> >>>>> make
> >>>>>>>>>>>>> sure
> >>>>>>>>>>>> we
> >>>>>>>>>>>>> set this up the right way, with connectors having
> >> individual CI
> >>>>>>>>>>>>> runs,
> >>>>>>>>>>>> build
> >>>>>>>>>>>>> status, etc. But it certainly seems possible.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I would argue some points a bit different than some cases
> >> made
> >>>>>>>>> before:
> >>>>>>>>>>>>> (a) I believe the separation would increase connector
> >>>>> stability.
> >>>>>>>>>>>> Because it
> >>>>>>>>>>>>> really forces us to work with the connectors against the
> >> APIs
> >>>>> like
> >>>>>>>>>>>>> any external developer. A mono repo is somehow the wrong
> >> thing
> >>>>> if
> >>>>>>>>>>>>> you in practice want to actually guarantee stable internal
> >>>>> APIs at
> >>>>>>>>>> some layer.
> >>>>>>>>>>>>> Because the mono repo makes it easy to just change
> >> something on
> >>>>>>>>>>>>> both
> >>>>>>>>>>>> sides
> >>>>>>>>>>>>> of the API (provider and consumer) seamlessly.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Major refactorings in Flink need to keep all connector API
> >>>>>>>>>>>>> contracts intact, or we need to have a new version of the
> >>>>>> connector
> >>>>>>>>>> API.
> >>>>>>>>>>>>> (b) We may even be able to go towards more lightweight and
> >>>>>>>>>>>>> automated releases over time, even if we stay in Apache
> >> Flink
> >>>>> with
> >>>>>>>>>> that repo.
> >>>>>>>>>>>>> This isn't yet fully aligned with the Apache release
> >> policies,
> >>>>>> yet,
> >>>>>>>>>>>>> but there are board discussions about whether there can be
> >>>>>>>>>>>>> bot-triggered releases (by dependabot) and how that could
> >> fit
> >>>>> into
> >>>>>>>>>> the Apache process.
> >>>>>>>>>>>>> This doesn't seem to be quite there just yet, but seeing
> >> that
> >>>>>> those
> >>>>>>>>>>>> start
> >>>>>>>>>>>>> is a good sign, and there is a good chance we can do some
> >>>>> things
> >>>>>>>>>> there.
> >>>>>>>>>>>>> I am not sure whether we should let bots trigger releases,
> >>>>> because
> >>>>>>>>>>>>> a
> >>>>>>>>>>>> final
> >>>>>>>>>>>>> human look at things isn't a bad thing, especially given
> >> the
> >>>>>>>>>>>>> popularity
> >>>>>>>>>>>> of
> >>>>>>>>>>>>> software supply chain attacks recently.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I do share Chesnay's concerns about complexity in tooling,
> >>>>> though.
> >>>>>>>>>>>>> Both release tooling and test tooling. They are not
> >>>>> incompatible
> >>>>>>>>>>>>> with that approach, but they are a task we need to tackle
> >>>>> during
> >>>>>>>>>>>>> this change which will add additional work.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Oct 26, 2021 at 10:31 AM Arvid Heise <
> >> ar...@apache.org
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>> Hi folks,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think some questions came up and I'd like to address the
> >>>>>>>>>>>>>> question of
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>> timing.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Could you clarify what release cadence you're thinking of?
> >>>>>>>>>>>>>> There's
> >>>>>>>>>>>> quite
> >>>>>>>>>>>>>>> a big range that fits "more frequent than Flink"
> >> (per-commit,
> >>>>>>>>>>>>>>> daily, weekly, bi-weekly, monthly, even bi-monthly).
> >>>>>>>>>>>>>> The short answer is: as often as needed:
> >>>>>>>>>>>>>> - If there is a CVE in a dependency and we need to bump
> >> it -
> >>>>>>>>>>>>>> release immediately.
> >>>>>>>>>>>>>> - If there is a new feature merged, release soonish. We
> >> may
> >>>>>>>>>>>>>> collect a
> >>>>>>>>>>>> few
> >>>>>>>>>>>>>> successive features before a release.
> >>>>>>>>>>>>>> - If there is a bugfix, release immediately or soonish
> >>>>> depending
> >>>>>>>>>>>>>> on
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>> severity and if there are workarounds available.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> We should not limit ourselves; the whole idea of
> >> independent
> >>>>>>>>>>>>>> releases
> >>>>>>>>>>>> is
> >>>>>>>>>>>>>> exactly that you release as needed. There is no release
> >>>>> planning
> >>>>>>>>>>>>>> or anything needed, you just go with a release as if it
> >> was an
> >>>>>>>>>>>>>> external artifact.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> (1) is the connector API already stable?
> >>>>>>>>>>>>>>> From another discussion thread [1], connector API is far
> >>>>> from
> >>>>>>>>>>>> stable.
> >>>>>>>>>>>>>>> Currently, it's hard to build connectors against multiple
> >>>>> Flink
> >>>>>>>>>>>>> versions.
> >>>>>>>>>>>>>>> There are breaking API changes both in 1.12 -> 1.13 and
> >> 1.13
> >>>>> ->
> >>>>>>>>>>>>>>> 1.14
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>> maybe also in future versions, because Table
> >> related
> >>>>> APIs
> >>>>>>>>>>>>>>> are
> >>>>>>>>>>>>> still
> >>>>>>>>>>>>>>> @PublicEvolving and new Sink API is still @Experimental.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The question is: what is stable in an evolving system? We
> >>>>>>>>>>>>>> recently discovered that the old SourceFunction needed to
> >> be
> >>>>>>>>>>>>>> refined such that cancellation works correctly [1]. That
> >>>>>>>>>>>>>> interface has been in Flink for 7 years and is heavily used
> >>>>>>>>>>>>>> outside as well, and we still had to
> >> change
> >>>>> the
> >>>>>>>>>>>> contract
> >>>>>>>>>>>>>> in a way that I'd expect any implementer to recheck their
> >>>>>>>>>>>> implementation.
> >>>>>>>>>>>>>> It might not be necessary to change anything, and you can
> >>>>>>>>>>>>>> probably keep the code the same for all Flink versions, but
> >>>>>>>>>>>>>> still, the interface was not stable in the strictest sense.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> If we focus just on API changes on the unified interfaces,
> >>>>> then
> >>>>>>>>>>>>>> we
> >>>>>>>>>>>> expect
> >>>>>>>>>>>>>> one more change to Sink API to support compaction. For
> >> Table
> >>>>> API,
> >>>>>>>>>>>> there
> >>>>>>>>>>>>>> will most likely also be some changes in 1.15. So we could
> >>>>> wait
> >>>>>>>>>>>>>> for
> >>>>>>>>>>>> 1.15.
> >>>>>>>>>>>>>> But I'm questioning if that's really necessary because we
> >> will
> >>>>>>>>>>>>>> add
> >>>>>>>>>>>> more
> >>>>>>>>>>>>>> functionality beyond 1.15 without breaking API. For
> >> example,
> >>>>> we
> >>>>>>>>>>>>>> may
> >>>>>>>>>>>> add
> >>>>>>>>>>>>>> more unified connector metrics. If you want to use it in
> >> your
> >>>>>>>>>>>> connector,
> >>>>>>>>>>>>>> you have to support multiple Flink versions anyhow. So
> >> rather
> >>>>>>>>>>>>>> than
> >>>>>>>>>>>>> focusing
> >>>>>>>>>>>>>> the discussion on "when is stuff stable", I'd rather
> >> focus on
> >>>>>>>>>>>>>> "how
> >>>>>>>>>>>> can we
> >>>>>>>>>>>>>> support building connectors against multiple Flink
> >> versions"
> >>>>> and
> >>>>>>>>>>>>>> make
> >>>>>>>>>>>> it
> >>>>>>>>>>>>> as
> >>>>>>>>>>>>>> painless as possible.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Chesnay suggested using different branches for
> >> different
> >>>>> Flink
> >>>>>>>>>>>>> versions
> >>>>>>>>>>>>>> which sounds like a good suggestion. With a mono-repo, we
> >>>>> can't
> >>>>>>>>>>>>>> use branches differently anyways (there is no way to have
> >>>>> release
> >>>>>>>>>>>>>> branches
> >>>>>>>>>>>>> per
> >>>>>>>>>>>>>> connector without chaos). In these branches, we could
> >> provide
> >>>>>>>>>>>>>> shims to simulate future features in older Flink versions
> >> such
> >>>>>>>>>>>>>> that code-wise,
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>> source code of a specific connector may not diverge
> >> (much).
> >>>>> For
> >>>>>>>>>>>> example,
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>> register unified connector metrics, we could simulate the
> >>>>> current
> >>>>>>>>>>>>> approach
> >>>>>>>>>>>>>> also in some utility package of the mono-repo.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I see the stable core Flink API as a prerequisite for
> >>>>> modularity.
> >>>>>>>>>>>>>> And
> >>>>>>>>>>>>>>> for connectors it is not just the source and sink API
> >> (source
> >>>>>>>>>>>>>>> being stable as of 1.14), but everything that is
> >> required to
> >>>>>>>>>>>>>>> build and maintain a connector downstream, such as the
> >> test
> >>>>>>>>>>>>>>> utilities and infrastructure.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> That is a very fair point. I'm actually surprised to see
> >> that
> >>>>>>>>>>>>>> MiniClusterWithClientResource is not public. I see it
> >> being
> >>>>> used
> >>>>>>>>>>>>>> in
> >>>>>>>>>>>> all
> >>>>>>>>>>>>>> connectors, especially outside of Flink. I fear that as
> >> long
> >>>>> as
> >>>>>>>>>>>>>> we do
> >>>>>>>>>>>> not
> >>>>>>>>>>>>>> have connectors outside, we will not properly annotate and
> >>>>>>>>>>>>>> maintain
> >>>>>>>>>>>> these
> >>>>>>>>>>>>>> utilities in a classic chicken-and-egg problem. I will outline
> >> an
> >>>>> idea
> >>>>>>>>>>>>>> at
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>> end.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> the connectors need to be adopted and require at least
> >> one
> >>>>>>>>>>>>>>> release
> >>>>>>>>>>>> per
> >>>>>>>>>>>>>>> Flink minor release.
> >>>>>>>>>>>>>>> However, this will make the releases of connectors
> >> slower,
> >>>>> e.g.
> >>>>>>>>>>>>> maintain
> >>>>>>>>>>>>>>> features for multiple branches and release multiple
> >> branches.
> >>>>>>>>>>>>>>> I think the main purpose of having an external connector
> >>>>>>>>>>>>>>> repository
> >>>>>>>>>>>> is
> >>>>>>>>>>>>> in
> >>>>>>>>>>>>>>> order to have "faster releases of connectors"?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Imagine a project with a complex set of dependencies.
> >> Let's
> >>>>> say
> >>>>>>>>>>>> Flink
> >>>>>>>>>>>>>>> version A plus Flink reliant dependencies released by
> >> other
> >>>>>>>>>>>>>>> projects (Flink-external connectors, Beam, Iceberg, Hudi,
> >>>>> ..).
> >>>>>>>>>>>>>>> We don't want
> >>>>>>>>>>>> a
> >>>>>>>>>>>>>>> situation where we bump the core Flink version to B and
> >>>>> things
> >>>>>>>>>>>>>>> fall apart (interface changes, utilities that were
> >> useful but
> >>>>>>>>>>>>>>> not public, transitive dependencies etc.).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, that's why I wanted to automate the processes more
> >> which
> >>>>> is
> >>>>>>>>>>>>>> not
> >>>>>>>>>>>> that
> >>>>>>>>>>>>>> easy under ASF. Maybe we automate the source provision
> >> across
> >>>>>>>>>>>> supported
> >>>>>>>>>>>>>> versions and have 1 vote thread for all versions of a
> >>>>> connector?
> >>>>>>>>>>>>>> From the perspective of CDC connector maintainers, the
> >>>>> biggest
> >>>>>>>>>>>> advantage
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>> maintaining it outside of the Flink project is that:
> >>>>>>>>>>>>>>> 1) we can have a more flexible and faster release cycle
> >>>>>>>>>>>>>>> 2) we can be more liberal with committership for
> >> connector
> >>>>>>>>>>>> maintainers
> >>>>>>>>>>>>>>> which can also attract more committers to help the
> >> release.
> >>>>>>>>>>>>>>> Personally, I think maintaining one connector repository
> >>>>> under
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>> ASF
> >>>>>>>>>>>>>> may
> >>>>>>>>>>>>>>> not have the above benefits.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, I also feel that ASF is too restrictive for our
> >> needs.
> >>>>> But
> >>>>>>>>>>>>>> it
> >>>>>>>>>>>> feels
> >>>>>>>>>>>>>> like there are too many who see it differently, and I
> >> think we
> >>>>>>>>>>>>>> need
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> (2) Flink testability without connectors.
> >>>>>>>>>>>>>>> This is a very good question. How can we guarantee the
> >> new
> >>>>>>>>>>>>>>> Source
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>> Sink
> >>>>>>>>>>>>>>> API are stable with only test implementation?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> We can't and shouldn't. Since the connector repo is
> >> managed by
> >>>>>>>>>>>>>> Flink,
> >>>>>>>>>>>> a
> >>>>>>>>>>>>>> Flink release manager needs to check if the Flink
> >> connectors
> >>>>> are
> >>>>>>>>>>>> actually
> >>>>>>>>>>>>>> working prior to creating an RC. That's similar to how
> >>>>>>>>>>>>>> flink-shaded
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>> flink core are related.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> So here is one idea that I had to get things rolling. We
> >> are
> >>>>>>>>>>>>>> going to address the external repo iteratively without
> >>>>>>>>>>>>>> compromising what we
> >>>>>>>>>>>>> already
> >>>>>>>>>>>>>> have:
> >>>>>>>>>>>>>> Phase 1: add new contributions to the external repo. We use
> >>>>>>>>>>>>>> that time to set up the infrastructure accordingly and
> >>>>>>>>>>>>>> optimize release processes. We will identify test utilities
> >>>>>>>>>>>>>> that are not yet public/stable and fix that.
> >>>>>>>>>>>>>> Phase 2: add ports to the new unified interfaces for existing
> >>>>>>>>>>>>>> connectors. That requires a prior Flink release to make the
> >>>>>>>>>>>>>> utilities stable. Keep the old interfaces in flink-core.
> >>>>>>>>>>>>>> Phase 3: remove the old interfaces in flink-core for some
> >>>>>>>>>>>>>> connectors (tbd at a later point).
> >>>>>>>>>>>>>> Phase 4: optionally move all remaining connectors (tbd at a
> >>>>>>>>>>>>>> later point).
> >>>>>>>>>>>>>> I'd envision ~3 months between starting the different phases.
> >>>>>>>>>>>>>> WDYT?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-23527
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, Oct 21, 2021 at 7:12 AM Kyle Bendickson <
> >>>>> k...@tabular.io
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> My name is Kyle and I’m an open source developer
> >> primarily
> >>>>>>>>>>>>>>> focused
> >>>>>>>>>>>> on
> >>>>>>>>>>>>>>> Apache Iceberg.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I’m happy to help clarify or elaborate on any aspect of
> >> our
> >>>>>>>>>>>> experience
> >>>>>>>>>>>>>>> working on a relatively decoupled connector that is
> >>>>> downstream
> >>>>>>>>>>>>>>> and
> >>>>>>>>>>>>> pretty
> >>>>>>>>>>>>>>> popular.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I’d also love to be able to contribute or assist in any
> >> way I
> >>>>>>>>> can.
> >>>>>>>>>>>>>>> I don’t mean to thread jack, but are there any meetings
> >> or
> >>>>>>>>>>>>>>> community
> >>>>>>>>>>>>> sync
> >>>>>>>>>>>>>>> ups, specifically around the connector APIs, that I might
> >>>>> join
> >>>>>>>>>>>>>>> / be
> >>>>>>>>>>>>>> invited
> >>>>>>>>>>>>>>> to?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I did want to add that even though I’ve experienced some
> >> of
> >>>>> the
> >>>>>>>>>>>>>>> pain
> >>>>>>>>>>>>>> points
> >>>>>>>>>>>>>>> of integrating with an evolving system / API (catalog
> >> support
> >>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>> generally
> >>>>>>>>>>>>>>> speaking pretty new everywhere really in this space), I
> >> also
> >>>>>>>>>>>>>>> agree personally that you shouldn’t slow down development
> >>>>>>>>>>>>>>> velocity too
> >>>>>>>>>>>> much
> >>>>>>>>>>>>> for
> >>>>>>>>>>>>>>> the sake of external connectors. Getting to a performant
> >> and
> >>>>>>>>>>>>>>> stable
> >>>>>>>>>>>>> place
> >>>>>>>>>>>>>>> should be the primary goal, and slowing that down to
> >> support
> >>>>>>>>>>>> stragglers
> >>>>>>>>>>>>>>> will (in my personal opinion) always be a losing game.
> >> Some
> >>>>>>>>>>>>>>> folks
> >>>>>>>>>>>> will
> >>>>>>>>>>>>>>> simply stay behind on versions regardless until they
> >> have to
> >>>>>>>>>>>> upgrade.
> >>>>>>>>>>>>>>> I am working on ensuring that the Iceberg community stays
> >>>>>>>>>>>>>>> within 1-2 versions of Flink, so that we can help provide
> >>>>> more
> >>>>>>>>>>>>>>> feedback or
> >>>>>>>>>>>>>> contribute
> >>>>>>>>>>>>>>> things that make it easier to support multiple Flink
> >>>>>>>>>>>>>>> runtimes / versions with one project / codebase and minimal
> >>>>>>>>>>>>>>> to no reflection (our desired goal).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If there’s anything I can do or any way I can be of
> >>>>> assistance,
> >>>>>>>>>>>> please
> >>>>>>>>>>>>>>> don’t hesitate to reach out. Or find me on ASF slack 😀
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I greatly appreciate your general concern for the needs
> >> of
> >>>>>>>>>>>> downstream
> >>>>>>>>>>>>>>> connector integrators!
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>> Kyle Bendickson (GitHub: kbendick) Open Source Developer
> >> kyle
> >>>>>>>>>>>>>>> [at] tabular [dot] io
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise <
> >>>>> t...@apache.org>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I see the stable core Flink API as a prerequisite for
> >>>>>>>>>> modularity.
> >>>>>>>>>>>> And
> >>>>>>>>>>>>>>>> for connectors it is not just the source and sink API
> >>>>> (source
> >>>>>>>>>>>> being
> >>>>>>>>>>>>>>>> stable as of 1.14), but everything that is required to
> >> build
> >>>>>>>>>>>>>>>> and maintain a connector downstream, such as the test
> >>>>>>>>>>>>>>>> utilities and infrastructure.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Without the stable surface of core Flink, changes will
> >> leak
> >>>>>>>>>>>>>>>> into downstream dependencies and force lock step
> >> updates.
> >>>>>>>>>>>>>>>> Refactoring across N repos is more painful than a single
> >>>>>>>>>>>>>>>> repo. Those with experience developing downstream of
> >> Flink
> >>>>>>>>>>>>>>>> will know the pain, and
> >>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>> isn't limited to connectors. I don't remember a Flink
> >> "minor
> >>>>>>>>>>>> version"
> >>>>>>>>>>>>>>>> update that was just a dependency version change and
> >> did not
> >>>>>>>>>>>>>>>> force other downstream changes.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Imagine a project with a complex set of dependencies.
> >> Let's
> >>>>>>>>>>>>>>>> say
> >>>>>>>>>>>> Flink
> >>>>>>>>>>>>>>>> version A plus Flink reliant dependencies released by
> >> other
> >>>>>>>>>>>> projects
> >>>>>>>>>>>>>>>> (Flink-external connectors, Beam, Iceberg, Hudi, ..). We
> >>>>>>>>>>>>>>>> don't
> >>>>>>>>>>>> want a
> >>>>>>>>>>>>>>>> situation where we bump the core Flink version to B and
> >>>>>>>>>>>>>>>> things
> >>>>>>>>>>>> fall
> >>>>>>>>>>>>>>>> apart (interface changes, utilities that were useful
> >> but not
> >>>>>>>>>>>> public,
> >>>>>>>>>>>>>>>> transitive dependencies etc.).
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The discussion here also highlights the benefits of
> >> keeping
> >>>>>>>>>>>> certain
> >>>>>>>>>>>>>>>> connectors outside Flink. Whether that is due to
> >> difference
> >>>>>>>>>>>>>>>> in developer community, maturity of the connectors,
> >> their
> >>>>>>>>>>>>>>>> specialized/limited usage etc. I would like to see that
> >> as a
> >>>>>>>>>>>>>>>> sign
> >>>>>>>>>>>> of
> >>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>> growing ecosystem and most of the ideas that Arvid has
> >> put
> >>>>>>>>>>>>>>>> forward would benefit further growth of the connector
> >>>>>>>>> ecosystem.
> >>>>>>>>>>>>>>>> As for keeping connectors within Apache Flink: I prefer
> >> that
> >>>>>>>>>>>>>>>> as
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> path forward for "essential" connectors like FileSource,
> >>>>>>>>>>>> KafkaSource,
> >>>>>>>>>>>>>>>> ... And we can still achieve a more flexible and faster
> >>>>>>>>>>>>>>>> release
> >>>>>>>>>>>>> cycle.
> >>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>> Thomas
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Wed, Oct 20, 2021 at 3:32 AM Jark Wu <
> >> imj...@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>> Hi Konstantin,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> the connectors need to be adopted and require at least
> >>>>>>>>>>>>>>>>>> one
> >>>>>>>>>>>>> release
> >>>>>>>>>>>>>>> per
> >>>>>>>>>>>>>>>>> Flink minor release.
> >>>>>>>>>>>>>>>>> However, this will make the releases of connectors
> >> slower,
> >>>>>>>>>> e.g.
> >>>>>>>>>>>>>>> maintain
> >>>>>>>>>>>>>>>>> features for multiple branches and release multiple
> >>>>>>>>> branches.
> >>>>>>>>>>>>>>>>> I think the main purpose of having an external
> >> connector
> >>>>>>>>>>>> repository
> >>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>> order to have "faster releases of connectors"?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> From the perspective of CDC connector maintainers, the
> >>>>>>>>>>>>>>>>> biggest
> >>>>>>>>>>>>>>> advantage
> >>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>> maintaining it outside of the Flink project is that:
> >>>>>>>>>>>>>>>>> 1) we can have a more flexible and faster release cycle
> >>>>>>>>>>>>>>>>> 2) we can be more liberal with committership for
> >> connector
> >>>>>>>>>>>>>> maintainers
> >>>>>>>>>>>>>>>>> which can also attract more committers to help the
> >> release.
> >>>>>>>>>>>>>>>>> Personally, I think maintaining one connector
> >> repository
> >>>>>>>>>>>>>>>>> under
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>> ASF
> >>>>>>>>>>>>>>>> may
> >>>>>>>>>>>>>>>>> not have the above benefits.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf <
> >>>>>>>>>>>> kna...@apache.org>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> regarding the stability of the APIs. I think everyone
> >>>>>>>>>>>>>>>>>> agrees
> >>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>> connector APIs which are stable across minor versions
> >>>>>>>>>>>>> (1.13->1.14)
> >>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> mid-term goal. But:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> a) These APIs are still quite young, and we shouldn't
> >>>>>>>>>>>>>>>>>> make
> >>>>>>>>>>>> them
> >>>>>>>>>>>>>>> @Public
> >>>>>>>>>>>>>>>>>> prematurely either.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> b) Isn't this *mostly* orthogonal to where the
> >> connector
> >>>>>>>>>>>>>>>>>> code
> >>>>>>>>>>>>>> lives?
> >>>>>>>>>>>>>>>> Yes,
> >>>>>>>>>>>>>>>>>> as long as there are breaking changes, the connectors
> >>>>>>>>>>>>>>>>>> need to
> >>>>>>>>>>>> be
> >>>>>>>>>>>>>>>> adopted
> >>>>>>>>>>>>>>>>>> and require at least one release per Flink minor
> >> release.
> >>>>>>>>>>>>>>>>>> Documentation-wise this can be addressed via a
> >>>>>>>>>>>>>>>>>> compatibility
> >>>>>>>>>>>>> matrix
> >>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>> each connector as Arvid suggested. IMO we shouldn't
> >> block
> >>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>> effort
> >>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>> the stability of the APIs.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Konstantin
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Wed, Oct 20, 2021 at 8:56 AM Jark Wu
> >>>>>>>>>>>>>>>>>> <imj...@gmail.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I think Thomas raised very good questions and would
> >> like
> >>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>> know
> >>>>>>>>>>>>>>> your
> >>>>>>>>>>>>>>>>>>> opinions if we want to move connectors out of flink
> >> in
> >>>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>> version.
> >>>>>>>>>>>>>>>>>>> (1) is the connector API already stable?
> >>>>>>>>>>>>>>>>>>>> Separate releases would only make sense if the core
> >>>>>>>>>>>>>>>>>>>> Flink
> >>>>>>>>>>>>>> surface
> >>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>> fairly stable though. As evident from Iceberg (and
> >>>>>>>>>>>>>>>>>>>> also
> >>>>>>>>>>>> Beam),
> >>>>>>>>>>>>>>>> that's
> >>>>>>>>>>>>>>>>>>>> not the case currently. We should probably focus on
> >>>>>>>>>>>> addressing
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>> stability first, before splitting code. A success
> >>>>>>>>>>>>>>>>>>>> criteria
> >>>>>>>>>>>>> could
> >>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>> that we are able to build Iceberg and Beam against
> >>>>>>>>>>>>>>>>>>>> multiple
> >>>>>>>>>>>>>> Flink
> >>>>>>>>>>>>>>>>>>>> versions w/o the need to change code. The goal would
> >>>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>> that
> >>>>>>>>>>>>> no
> >>>>>>>>>>>>>>>>>>>> connector breaks when we make changes to Flink core.
> >>>>>>>>>>>>>>>>>>>> Until
> >>>>>>>>>>>>>> that's
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>> case, code separation creates a setup where 1+1 or
> >> N+1
> >>>>>>>>>>>>>>> repositories
> >>>>>>>>>>>>>>>>>>>> need to move lock step.
> >>>>>>>>>>>>>>>>>>> From another discussion thread [1], connector API
> >> is far
> >>>>>>>>>>>>>>>>>>> from
> >>>>>>>>>>>>>>> stable.
> >>>>>>>>>>>>>>>>>>> Currently, it's hard to build connectors against
> >>>>>>>>>>>>>>>>>>> multiple
> >>>>>>>>>>>> Flink
> >>>>>>>>>>>>>>>> versions.
> >>>>>>>>>>>>>>>>>>> There are breaking API changes both in 1.12 -> 1.13
> >> and
> >>>>>>>>>>>>>>>>>>> 1.13
> >>>>>>>>>>>> ->
> >>>>>>>>>>>>>> 1.14
> >>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> maybe also in future versions, because Table
> >>>>>>>>>>>>>>>>>>> related
> >>>>>>>>>>>> APIs
> >>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>> still
> >>>>>>>>>>>>>>>>>>> @PublicEvolving and new Sink API is still
> >> @Experimental.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> (2) Flink testability without connectors.
> >>>>>>>>>>>>>>>>>>>> Flink w/o Kafka connector (and few others) isn't
> >>>>>>>>>>>>>>>>>>>> viable. Testability of Flink was already brought up,
> >>>>>>>>>>>>>>>>>>>> can we
> >>>>>>>>>>>>>> really
> >>>>>>>>>>>>>>>>>>>> certify a Flink core release without Kafka
> >> connector?
> >>>>>>>>>>>>>>>>>>>> Maybe
> >>>>>>>>>>>>>> those
> >>>>>>>>>>>>>>>>>>>> connectors that are used in Flink e2e tests to
> >>>>>>>>>>>>>>>>>>>> validate
> >>>>>>>>>>>>>>>> functionality
> >>>>>>>>>>>>>>>>>>>> of core Flink should not be broken out?
> >>>>>>>>>>>>>>>>>>> This is a very good question. How can we guarantee
> >> the
> >>>>>>>>>>>>>>>>>>> new
> >>>>>>>>>>>>> Source
> >>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>> Sink
> >>>>>>>>>>>>>>>>>>> API are stable with only test implementation?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler <
> >>>>>>>>>>>>>> ches...@apache.org>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Could you clarify what release cadence you're
> >> thinking
> >>>>>>>>>> of?
> >>>>>>>>>>>>>> There's
> >>>>>>>>>>>>>>>> quite
> >>>>>>>>>>>>>>>>>>>> a big range that fits "more frequent than Flink"
> >>>>>>>>>>>> (per-commit,
> >>>>>>>>>>>>>>> daily,
> >>>>>>>>>>>>>>>>>>>> weekly, bi-weekly, monthly, even bi-monthly).
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On 19/10/2021 14:15, Martijn Visser wrote:
> >>>>>>>>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I think it would be a huge benefit if we can
> >> achieve
> >>>>>>>>>>>>>>>>>>>>> more
> >>>>>>>>>>>>>>> frequent
> >>>>>>>>>>>>>>>>>>>> releases
> >>>>>>>>>>>>>>>>>>>>> of connectors, which are not bound to the release
> >>>>>>>>>>>>>>>>>>>>> cycle
> >>>>>>>>>>>> of
> >>>>>>>>>>>>>> Flink
> >>>>>>>>>>>>>>>>>>> itself.
> >>>>>>>>>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>>>>>> agree that in order to get there, we need to have
> >>>>>>>>>>>>>>>>>>>>> stable
> >>>>>>>>>>>>>>>> interfaces
> >>>>>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>> are trustworthy and reliable, so they can be safely
> >>>>>>>>>>>>>>>>>>>>> used
> >>>>>>>>>>>> by
> >>>>>>>>>>>>>>> those
> >>>>>>>>>>>>>>>>>>>>> connectors. I do think that work still needs to be
> >>>>>>>>>>>>>>>>>>>>> done
> >>>>>>>>>>>> on
> >>>>>>>>>>>>>> those
> >>>>>>>>>>>>>>>>>>>>> interfaces, but I am confident that we can get
> >> there
> >>>>>>>>>>>> from a
> >>>>>>>>>>>>>>> Flink
> >>>>>>>>>>>>>>>>>>>>> perspective.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I am worried that we would not be able to achieve
> >>>>>>>>>>>>>>>>>>>>> those
> >>>>>>>>>>>>>> frequent
> >>>>>>>>>>>>>>>>>>> releases
> >>>>>>>>>>>>>>>>>>>>> of connectors if we are putting these connectors
> >>>>>>>>>>>>>>>>>>>>> under
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>> Apache
> >>>>>>>>>>>>>>>>>>>> umbrella,
> >>>>>>>>>>>>>>>>>>>>> because that means that for each connector release
> >>>>>>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>> have
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>> follow
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> Apache release creation process. This requires a
> >> lot
> >>>>>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>> manual
> >>>>>>>>>>>>>>>> steps
> >>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>> prohibits automation and I think it would be hard
> >> to
> >>>>>>>>>>>> scale
> >>>>>>>>>>>>> out
> >>>>>>>>>>>>>>>>>>> frequent
> >>>>>>>>>>>>>>>>>>>>> releases of connectors. I'm curious how others
> >> think
> >>>>>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>> challenge
> >>>>>>>>>>>>>>>>>>> could
> >>>>>>>>>>>>>>>>>>>>> be solved.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Martijn
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Mon, 18 Oct 2021 at 22:22, Thomas Weise <
> >>>>>>>>>>>> t...@apache.org>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>> Thanks for initiating this discussion.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> There are definitely a few things that are not
> >>>>>>>>>>>>>>>>>>>>>> optimal
> >>>>>>>>>>>> with
> >>>>>>>>>>>>>> our
> >>>>>>>>>>>>>>>>>>>>>> current management of connectors. I would not
> >>>>>>>>>>>> necessarily
> >>>>>>>>>>>>>>>>>>> characterize
> >>>>>>>>>>>>>>>>>>>>>> it as a "mess" though. As the points raised so far
> >>>>>>>>>>>> show, it
> >>>>>>>>>>>>>>> isn't
> >>>>>>>>>>>>>>>>>>> easy
> >>>>>>>>>>>>>>>>>>>>>> to find a solution that balances competing
> >>>>>>>>>>>>>>>>>>>>>> requirements
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>>>> leads to
> >>>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>>> net improvement.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> It would be great if we can find a setup that
> >>>>>>>>>>>>>>>>>>>>>> allows for
> >>>>>>>>>>>>>>>> connectors
> >>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>> be released independently of core Flink and that
> >>>>>>>>>>>>>>>>>>>>>> each
> >>>>>>>>>>>>>> connector
> >>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>> released separately. Flink already has separate
> >>>>>>>>>>>>>>>>>>>>> releases (flink-shaded), so that by itself isn't a
> >>>>>>>>>> new thing.
> >>>>>>>>>>>>>>>> Per-connector
> >>>>>>>>>>>>>>>>>>>>>> releases would need to allow for more frequent
> >>>>>>>>>>>>>>>>>>>>>> releases
> >>>>>>>>>>>>>>> (without
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> baggage that a full Flink release comes with).
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Separate releases would only make sense if the core
> >>>>>>>>>>>> Flink
> >>>>>>>>>>>>>>>> surface is
> >>>>>>>>>>>>>>>>>>>>>> fairly stable though. As evident from Iceberg (and
> >>>>>>>>>>>>>>>>>>>>>> also
> >>>>>>>>>>>>>> Beam),
> >>>>>>>>>>>>>>>> that's
> >>>>>>>>>>>>>>>>>>>>>> not the case currently. We should probably focus
> >> on
> >>>>>>>>>>>>>> addressing
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> stability first, before splitting code. A success
> >>>>>>>>>>>> criteria
> >>>>>>>>>>>>>>> could
> >>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>> that we are able to build Iceberg and Beam against
> >>>>>>>>>>>> multiple
> >>>>>>>>>>>>>>> Flink
> >>>>>>>>>>>>>>>>>>>>>> versions w/o the need to change code. The goal
> >>>>>>>>>>>>>>>>>>>>>> would be
> >>>>>>>>>>>>> that
> >>>>>>>>>>>>>> no
> >>>>>>>>>>>>>>>>>>>>>> connector breaks when we make changes to Flink
> >> core.
> >>>>>>>>>>>> Until
> >>>>>>>>>>>>>>>> that's the
> >>>>>>>>>>>>>>>>>>>>>> case, code separation creates a setup where 1+1 or
> >>>>>>>>>>>>>>>>>>>>>> N+1
> >>>>>>>>>>>>>>>> repositories
> >>>>>>>>>>>>>>>>>>>>>> need to move lock step.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Regarding some connectors being more important for
> >>>>>>>>>>>>>>>>>>>>>> Flink
> >>>>>>>>>>>>> than
> >>>>>>>>>>>>>>>> others:
> >>>>>>>>>>>>>>>>>>>>> That's a fact. Flink w/o Kafka connector (and few
> >>>>>>>>>>>> others)
> >>>>>>>>>>>>>> isn't
> >>>>>>>>>>>>>>>>>>>>>> viable. Testability of Flink was already brought
> >>>>>>>>>>>>>>>>>>>>>> up,
> >>>>>>>>>>>> can we
> >>>>>>>>>>>>>>>> really
> >>>>>>>>>>>>>>>>>>>>>> certify a Flink core release without Kafka
> >>>>>>>>> connector?
> >>>>>>>>>>>> Maybe
> >>>>>>>>>>>>>>> those
> >>>>>>>>>>>>>>>>>>>>>> connectors that are used in Flink e2e tests to
> >>>>>>>>>>>>>>>>>>>>>> validate
> >>>>>>>>>>>>>>>> functionality
> >>>>>>>>>>>>>>>>>>>>>> of core Flink should not be broken out?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Finally, I think that the connectors that move
> >> into
> >>>>>>>>>>>>> separate
> >>>>>>>>>>>>>>>> repos
> >>>>>>>>>>>>>>>>>>>>>> should remain part of the Apache Flink project.
> >>>>>>>>>>>>>>>>>>>>>> Larger
> >>>>>>>>>>>>>>>> organizations
> >>>>>>>>>>>>>>>>>>>>>> tend to approve the use of and contribution to
> >> open
> >>>>>>>>>>>> source
> >>>>>>>>>>>>> at
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> project level. Sometimes it is everything ASF.
> >> More
> >>>>>>>>>>>> often
> >>>>>>>>>>>>> it
> >>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>>> "Apache Foo". It would be fatal to end up with a
> >>>>>>>>>>>> patchwork
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>> projects
> >>>>>>>>>>>>>>>>>>>>>> with potentially different licenses and governance
> >>>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>> arrive
> >>>>>>>>>>>>>>> at a
> >>>>>>>>>>>>>>>>>>>>>> working Flink setup. This may mean we prioritize
> >>>>>>>>>>>> usability
> >>>>>>>>>>>>>> over
> >>>>>>>>>>>>>>>>>>>>>> developer convenience, if that's in the best
> >>>>>>>>>>>>>>>>>>>>>> interest of
> >>>>>>>>>>>>>> Flink
> >>>>>>>>>>>>>>>> as a
> >>>>>>>>>>>>>>>>>>>>>> whole.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>>> Thomas
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler <
> >>>>>>>>>>>>>>>> ches...@apache.org
> >>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>> Generally, the issues are reproducibility and
> >>>>>>>>>> control.
> >>>>>>>>>>>>>>>>>>>>>>> Stuffs completely broken on the Flink side for a
> >>>>>>>>>> week?
> >>>>>>>>>>>>> Well
> >>>>>>>>>>>>>>>> then so
> >>>>>>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>>>>>> the connector repos.
> >>>>>>>>>>>>>>>>>>>>>>> (As-is) You can't go back to a previous version
> >> of
> >>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>> snapshot.
> >>>>>>>>>>>>>>>>>>> Which
> >>>>>>>>>>>>>>>>>>>>>>> also means that checking out older commits can be
> >>>>>>>>>>>>>> problematic
> >>>>>>>>>>>>>>>>>>> because
> >>>>>>>>>>>>>>>>>>>>>>> you'd still work against the latest snapshots,
> >> and
> >>>>>>>>>>>>>>>>>>>>>>> they
> >>>>>>>>>>>>> not
> >>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>>> compatible with each other.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On 18/10/2021 15:22, Arvid Heise wrote:
> >>>>>>>>>>>>>>>>>>>>>>>> I was actually betting on snapshots versions.
> >>>>>>>>>>>>>>>>>>>>>>>> What are
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> limits?
> >>>>>>>>>>>>>>>>>>>>>>>> Obviously, we can only do a release of a 1.15
> >>>>>>>>>>>> connector
> >>>>>>>>>>>>>> after
> >>>>>>>>>>>>>>>> 1.15
> >>>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>>>>> release.
> >>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Konstantin Knauf
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> https://twitter.com/snntrable
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> https://github.com/knaufk
> >>>>>>>>>>>>>>>>>>
> >>>>>>>
>
>
