For sharing workflows we should be able to use composite actions. We'd have the main definition files in the flink-connectors repo (which we'd also need to tag/release), and other branches/repos can then import them. These are also versioned, so we don't have to worry about accidentally breaking things. They could also be used to enforce certain standards/interfaces so that we can automate more things (e.g., integration into the Flink documentation).
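
A rough sketch of what that could look like (the action name, inputs, and
build command are made up for illustration):

    # flink-connectors repo: .github/actions/connector-build/action.yml
    name: 'Build and test a connector'
    inputs:
      flink-version:
        description: 'Flink version to build against'
        required: true
    runs:
      using: 'composite'
      steps:
        - run: mvn verify -Dflink.version=${{ inputs.flink-version }}
          shell: bash

    # consuming branch/repo, pinned to a tag so later changes can't break it:
    steps:
      - uses: actions/checkout@v2
      - uses: apache/flink-connectors/.github/actions/connector-build@v1.0
        with:
          flink-version: '1.14.0'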

It is true that Option 2) and dedicated repositories share a lot of properties. While I did say in an offline conversation that in that case we might just as well use separate repositories, I'm not so sure anymore. One repo would make administration a bit easier; for example, secrets wouldn't have to be applied to each repo (we wouldn't want certain secrets to be set up organization-wide). Overall I also like that one repo would present a single access point; you can't "miss" a connector repo, and I would hope that having it as one repo would nurture more collaboration between the connectors, which after all need to solve similar problems.

It is a fair point that the branching model would be quite weird, but I think that would subside pretty quickly.

Personally I'd go with Option 2, and if that doesn't work out we can still split the repo later on (which should then be a trivial matter of copying all <connector>/* branches and renaming them).
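
Splitting could look roughly like this, assuming <connector>/* branch names
(an untested sketch; the target repo is hypothetical):

    # copy all kafka/* branches into a dedicated repo, dropping the prefix
    git clone https://github.com/apache/flink-connectors.git
    cd flink-connectors
    git remote add kafka https://github.com/apache/flink-connector-kafka.git
    for b in $(git for-each-ref --format='%(refname:short)' refs/remotes/origin | grep '^origin/kafka/'); do
      git push kafka "refs/remotes/${b}:refs/heads/${b#origin/kafka/}"
    done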

On 26/11/2021 12:47, Till Rohrmann wrote:
Hi Arvid,

Thanks for updating this thread with the latest findings. The described
limitations for a single connector repo sound suboptimal to me.

* Option 2. sounds as if we are trying to simulate multiple connector repos
inside of a single repo. I also don't know how we would share code between
the different branches (sharing infrastructure would probably be easier,
though). This seems to have the same limitations as dedicated repos, with
the downside of a not very intuitive branching model.
* Isn't option 1. kind of a degenerate version of option 2., where we have
some unrelated code from other connectors in the individual connector
branches?
* Option 3. has the downside that someone creating a release has to release
all connectors. This means that she either has to sync with the different
connector maintainers or has to be able to release all connectors on her
own. We are already seeing in the Flink community that releases require
quite good communication/coordination between the different people working
on different Flink components. Given our goals to make connector releases
easier and more frequent, I think that coupling different connector
releases might be counter-productive.

To me it does not sound very practical to mainly use a mono repository w/o
having some more advanced build infrastructure that, for example, allows
for different git roots in different connector directories. Maybe the mono
repo can be a catch-all repository for connectors that want to be released
in lock-step (Option 3.) with all other connectors the repo contains. But
for connectors that change frequently, having a dedicated repository that
allows independent releases sounds preferable to me.

What utilities and infrastructure code do you intend to share? Using git
submodules can definitely be one option to share code. However, it might
also be ok to depend on flink-connector-common artifacts which could make
things easier. Where I am unsure is whether git submodules can be used to
share infrastructure code (e.g. the .github/workflows) because you need
these files in the repo to trigger the CI infrastructure.
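
For reference, the submodule variant could look roughly like this in each
connector repo (the flink-connector-common repo and path are assumptions
for illustration):

    # pin the shared utilities/tooling at a known commit
    git submodule add https://github.com/apache/flink-connector-common.git tools/common
    git commit -m "Add shared connector tooling"
    # later, pull in a newer version of the shared code explicitly
    git submodule update --remote tools/common
    git commit -am "Bump shared connector tooling"

My suspicion is that the workflow files themselves would still have to live
in each repo's .github/workflows for CI to trigger, so at most they could
delegate to scripts provided by the submodule.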

Cheers,
Till

On Thu, Nov 25, 2021 at 1:59 PM Arvid Heise <ar...@apache.org> wrote:

Hi Brian,

Thank you for sharing. I think your approach is very valid and is in line
with what I had in mind.

Basically the Pravega community aligns the connector releases with the
Pravega mainline release

This certainly would mean that there is little value in coupling connector
versions. So it's making a good case for having separate connector repos.


and maintains the connector for the latest 3 Flink versions (CI will
publish snapshots for all these 3 branches)

I'd like to give connector devs a simple way to express which Flink
versions the current branch is compatible with. From there we can generate
the compatibility matrix automatically and optionally also create different
releases per supported Flink version. I'm not sure if the latter is indeed
better than having just one artifact that happens to run with multiple
Flink versions. I guess it depends on what dependencies we are exposing. If
the connector uses flink-connector-base, then we probably need separate
artifacts with poms anyway.
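
To make this concrete, each connector branch could carry a small descriptor
that tooling picks up (the file name and format here are purely
hypothetical):

    # kafka/compatibility.yml (hypothetical)
    connector: kafka
    version: 2.0.0
    supported-flink-versions:
      - "1.13"
      - "1.14"

From such files we could render the compatibility matrix on the
documentation side and, if we go that route, cut one artifact per supported
Flink version.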

Best,

Arvid

On Fri, Nov 19, 2021 at 10:55 AM Zhou, Brian <b.z...@dell.com> wrote:

Hi Arvid,

For the branching model, the Pravega Flink connector has some experience
that I would like to share. Here [1][2] are the compatibility matrix and
wiki explaining the branching model and releases. Basically the Pravega
community aligns the connector releases with the Pravega mainline release,
and maintains the connector for the latest 3 Flink versions (CI will
publish snapshots for all these 3 branches).
For example, recently we had the 0.10.1 release [3], and in Maven Central
we need to upload three artifacts (for Flink 1.13, 1.12, 1.11) for the
0.10.1 version [4].

There are some alternatives. Another solution that we once discussed but
finally abandoned is to have an independent version just like the current
CDC connector, and then give a big compatibility matrix to users. We think
it would become too confusing as the connector evolves. Conversely, we can
also do the opposite and align with the Flink version, maintaining several
branches for different system versions.

I would say this is only a fairly-OK solution because it is a bit painful
for maintainers, as cherry-picks are very common and releases would require
much work. However, if neither system has nice backward compatibility,
there seems to be no comfortable solution for their connector.

[1] https://github.com/pravega/flink-connectors#compatibility-matrix
[2]

https://github.com/pravega/flink-connectors/wiki/Versioning-strategy-for-Flink-connector
[3] https://github.com/pravega/flink-connectors/releases/tag/v0.10.1
[4] https://search.maven.org/search?q=pravega-connectors-flink

Best Regards,
Brian


-----Original Message-----
From: Arvid Heise <ar...@apache.org>
Sent: Friday, November 19, 2021 4:12 PM
To: dev
Subject: Re: [DISCUSS] Creating an external connector repository


Hi everyone,

we are currently in the process of setting up the flink-connectors repo
[1] for new connectors, but we hit a wall that we currently cannot get
past: the branching model.
To reiterate the original motivation of the external connector repo: we
want to decouple the release cycle of a connector from Flink. However, if
we want to support semantic versioning in the connectors, with the ability
to introduce breaking changes through major version bumps and support for
bugfixes on old versions, then we need release branches similar to how
Flink core operates.
Consider two connectors, let's call them kafka and hbase. We have kafka in
versions 1.0.X, 1.1.Y (small improvement), and 2.0.Z (config option
change), and hbase only on 1.0.A.

Now our current assumption was that we can work with a mono-repo under ASF
(flink-connectors). Then, for release branches, we found 3 options:
1. We would need to create some ugly mess with the cross product of
connector and version: so you have kafka-release-1.0, kafka-release-1.1,
kafka-release-2.0, hbase-release-1.0. The main issue is not the amount of
branches (that's something that git can handle) but that the state of
kafka is undefined in hbase-release-1.0. That's a call for disaster and
makes releasing connectors very cumbersome (CI would only execute and
publish hbase SNAPSHOTS on hbase-release-1.0; see the workflow sketch
after this list).
2. We could avoid the undefined state by having an empty master where each
release branch really only holds the code of one connector. But that's
also not great: any user that looks at the repo and sees no connector
would assume that it's dead.
3. We could have synced releases similar to the CDC connectors [2]. That
means that if any connector introduces a breaking change, all connectors
get a new major version. I find it quite confusing to a user if hbase gets
a new release without any change because kafka introduced a breaking
change.
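
A rough sketch of such a per-branch workflow (paths, module names, and the
deploy command are made up for illustration):

    # .github/workflows/hbase.yml, only present on hbase-release-* branches
    name: hbase-snapshot
    on:
      push:
        branches: ['hbase-release-*']
        paths: ['hbase/**']
    jobs:
      deploy-snapshot:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v2
          # build and deploy only the hbase module and its dependencies
          - run: mvn -pl hbase -am deploy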

To fully decouple release cycles and CI of connectors, we could add
individual repositories under ASF (flink-connector-kafka,
flink-connector-hbase). Then we can apply the same branching model as
before. I quickly checked if there are precedents in the apache community
for that approach and, just by scanning alphabetically, I found cordova
and couchdb with 70 and 77 apache repos respectively. So it certainly
seems like other projects approached our problem in that way and the
apache organization is okay with that. I currently expect max 20
additional repos for connectors and, in the future, max 10 each for
formats and filesystems if we also move them out at some point in time. So
we would be at a total of 50 repos.

Note that for all options, we need to provide a compatibility matrix that
we aim to autogenerate.

Now for the potential downsides that we internally discussed:
- How can we ensure common infrastructure code, utilities, and quality?
I propose to add a flink-connector-common that contains all these things
and is added as a git submodule/subtree to the repos.
- Do we implicitly discourage connector developers from maintaining more
than one connector with a fragmented code base?
That is certainly a risk. However, I currently also see few devs working
on more than one connector. It may actually help keep the devs that
maintain a specific connector on the hook. We could use github issues to
track bugs and feature requests, and a dev can focus his limited time on
getting that one connector right.

So WDYT? Compared to some intermediate suggestions with split repos, the
big difference is that everything remains under Apache umbrella and the
Flink community.

[1] https://github.com/apache/flink-connectors
[2] https://github.com/ververica/flink-cdc-connectors/

On Fri, Nov 12, 2021 at 3:39 PM Arvid Heise <ar...@apache.org> wrote:

Hi everyone,

I created the flink-connectors repo [1] to advance the topic. We would
create a proof-of-concept in the next few weeks as a special branch that
I'd then use for discussions. If the community agrees with the approach,
that special branch will become the master. If not, we can iterate on it
or create competing POCs.

If someone wants to try things out in parallel, just make sure that you
are not accidentally pushing POCs to the master.

As a reminder: We will not move out any current connector from Flink
at this point in time, so everything in Flink will remain as is and be
maintained there.

Best,

Arvid

[1] https://github.com/apache/flink-connectors

On Fri, Oct 29, 2021 at 6:57 PM Till Rohrmann <trohrm...@apache.org> wrote:

Hi everyone,

From the discussion, it seems to me that we have different opinions on
whether to have an ASF umbrella repository or to host them outside of
the ASF. It also seems that this is not really the problem to solve.
Since there are many good arguments for either approach, we could
simply start with an ASF umbrella repository and see how people adopt
it. If the individual connectors cannot move fast enough or if people
prefer not to buy into the more heavy-weight ASF processes, then they
can also host the code somewhere else. We simply need to make sure
that these connectors are discoverable (e.g. via flink-packages).

The more important problem seems to be to provide common tooling
(testing, infrastructure, documentation) that can easily be reused.
Similarly, it has become clear that the Flink community needs to
improve on providing stable APIs. I think it is not realistic to
first complete these tasks before starting to move connectors to
dedicated repositories. As Stephan said, creating a connector
repository will force us to pay more attention to API stability and
also to think about which testing tools are required. Hence, I
believe that starting to add connectors to a different repository
than apache/flink will help improve our connector tooling (declaring
testing classes as public, creating a common test utility repo,
creating a repo
template) and vice versa. Hence, I like Arvid's proposed process as
it will start kicking things off w/o letting this effort fizzle out.

Cheers,
Till

On Thu, Oct 28, 2021 at 11:44 AM Stephan Ewen <se...@apache.org> wrote:

Thank you all for the nice discussion!

From my point of view, I very much like the idea of putting connectors in
a separate repository. But I would argue it should be part of Apache
Flink, similar to flink-statefun, flink-ml, etc.

I share many of the reasons for that:
  - As argued many times, it reduces complexity of the Flink repo and
increases response times of CI, etc.
  - Much lower barrier of contribution, because an unstable connector
would not de-stabilize the whole build. Of course, we would need to make
sure we set this up the right way, with connectors having individual CI
runs, build status, etc. But it certainly seems possible.


I would argue some points a bit differently than some cases made before:

(a) I believe the separation would increase connector stability, because
it really forces us to work with the connectors against the APIs like any
external developer. A mono repo is somehow the wrong thing if you in
practice want to actually guarantee stable internal APIs at some layer,
because the mono repo makes it easy to just change something on both
sides of the API (provider and consumer) seamlessly.

Major refactorings in Flink need to keep all connector API contracts
intact, or we need to have a new version of the connector API.
(b) We may even be able to go towards more lightweight and automated
releases over time, even if we stay in Apache Flink with that repo.
This isn't fully aligned with the Apache release policies yet, but there
are board discussions about whether there can be bot-triggered releases
(by dependabot) and how that could fit into the Apache process.
This doesn't seem to be quite there just yet, but seeing that those
discussions start is a good sign, and there is a good chance we can do
some things there.
I am not sure whether we should let bots trigger releases, because a
final human look at things isn't a bad thing, especially given the
popularity of software supply chain attacks recently.


I do share Chesnay's concerns about complexity in tooling, though, both
release tooling and test tooling. They are not incompatible with that
approach, but they are a task we need to tackle during this change, which
will add additional work.



On Tue, Oct 26, 2021 at 10:31 AM Arvid Heise <ar...@apache.org> wrote:

Hi folks,

I think some questions came up and I'd like to address the question of
the timing.

Could you clarify what release cadence you're thinking of? There's quite
a big range that fits "more frequent than Flink" (per-commit, daily,
weekly, bi-weekly, monthly, even bi-monthly).

The short answer is: as often as needed:
- If there is a CVE in a dependency and we need to bump it - release
immediately.
- If there is a new feature merged, release soonish. We may collect a few
successive features before a release.
- If there is a bugfix, release immediately or soonish, depending on the
severity and whether there are workarounds available.

We should not limit ourselves; the whole idea of independent releases is
exactly that you release as needed. There is no release planning or
anything needed; you just go with a release as if it were an external
artifact.

(1) is the connector API already stable?
From another discussion thread [1], the connector API is far from stable.
Currently, it's hard to build connectors against multiple Flink versions.
There are breaking API changes both in 1.12 -> 1.13 and 1.13 -> 1.14 and
maybe also in future versions, because Table-related APIs are still
@PublicEvolving and the new Sink API is still @Experimental.

The question is: what is stable in an evolving system? We recently
discovered that the old SourceFunction needed to be refined such that
cancellation works correctly [1]. That interface has been in Flink for 7
years, is heavily used also outside of Flink, and we still had to change
the contract in a way that I'd expect any implementer to recheck their
implementation. It might not be necessary to change anything, and you can
probably use the same code for all Flink versions, but still, the
interface was not stable in the strictest sense.

If we focus just on API changes of the unified interfaces, then we expect
one more change to the Sink API to support compaction. For the Table API,
there will most likely also be some changes in 1.15. So we could wait for
1.15. But I'm questioning whether that's really necessary, because we
will add more functionality beyond 1.15 without breaking the API. For
example, we may add more unified connector metrics. If you want to use
them in your connector, you have to support multiple Flink versions
anyhow. So rather than focusing the discussion on "when is stuff stable",
I'd rather focus on "how can we support building connectors against
multiple Flink versions" and make it as painless as possible.

Chesnay pointed out to use different branches for different Flink
versions, which sounds like a good suggestion. With a mono-repo, we can't
use branches differently anyway (there is no way to have release branches
per connector without chaos). In these branches, we could provide shims
to simulate future features in older Flink versions such that, code-wise,
the source code of a specific connector does not diverge (much). For
example, to register unified connector metrics, we could simulate the
current approach also in some utility package of the mono-repo.
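
Such a shim could be as small as a utility class that connector code calls
instead of the version-specific API (a sketch; the class, method, and
metric name are made up):

    // MetricsShim.java - hypothetical helper in the shared utility package
    import org.apache.flink.metrics.Counter;
    import org.apache.flink.metrics.MetricGroup;

    public final class MetricsShim {
        private MetricsShim() {}

        // On Flink versions without the unified connector metrics, fall
        // back to registering a plain counter under the same name, so the
        // connector source stays identical across branches.
        public static Counter numRecordsOutCounter(MetricGroup group) {
            return group.counter("numRecordsOut");
        }
    }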

I see the stable core Flink API as a prerequisite for modularity. And
for connectors it is not just the source and sink API (source being
stable as of 1.14), but everything that is required to build and
maintain a connector downstream, such as the test utilities and
infrastructure.

That is a very fair point. I'm actually surprised to see that
MiniClusterWithClientResource is not public. I see it being used in all
connectors, especially outside of Flink. I fear that as long as we do not
have connectors outside, we will not properly annotate and maintain these
utilities, in a classic chicken-and-egg problem. I will outline an idea
at the end.

the connectors need to be adopted and require at least one release per
Flink minor release.
However, this will make the releases of connectors slower, e.g.
maintaining features for multiple branches and releasing multiple
branches.
I think the main purpose of having an external connector repository is
in order to have "faster releases of connectors"?

Imagine a project with a complex set of dependencies. Let's say Flink
version A plus Flink-reliant dependencies released by other projects
(Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want a
situation where we bump the core Flink version to B and things fall
apart (interface changes, utilities that were useful but not public,
transitive dependencies etc.).

Yes, that's why I wanted to automate the processes more, which is not
that easy under ASF. Maybe we automate the source provision across
supported versions and have 1 vote thread for all versions of a
connector?

From the perspective of CDC connector maintainers, the biggest advantage
of maintaining it outside of the Flink project is that:
1) we can have a more flexible and faster release cycle
2) we can be more liberal with committership for connector maintainers,
which can also attract more committers to help the release.

Personally, I think maintaining one connector repository under the ASF
may not have the above benefits.

Yes, I also feel that the ASF is too restrictive for our needs. But it
feels like there are too many that see it differently, and I think we
need

(2) Flink testability without connectors.
This is a very good question. How can we guarantee the new Source and
Sink API are stable with only test implementations?

We can't and shouldn't. Since the connector repo is managed by Flink, a
Flink release manager needs to check if the Flink connectors are actually
working prior to creating an RC. That's similar to how flink-shaded and
flink core are related.
flink core are related.


So here is one idea that I had to get things rolling. We are going to
address the external repo iteratively without compromising what we
already have:
Phase 1: add new contributions to the external repo. We use that time to
set up infra accordingly and optimize release processes. We will identify
test utilities that are not yet public/stable and fix that.
Phase 2: add ports to the new unified interfaces of existing connectors.
That requires a previous Flink release to make utilities stable. Keep old
interfaces in flink-core.
Phase 3: remove old interfaces in flink-core for some connectors (tbd at
a later point).
Phase 4: optionally move all remaining connectors (tbd at a later point).
I'd envision having ~3 months between starting the different phases.
WDYT?


[1] https://issues.apache.org/jira/browse/FLINK-23527

On Thu, Oct 21, 2021 at 7:12 AM Kyle Bendickson <k...@tabular.io> wrote:

Hi all,

My name is Kyle and I’m an open source developer primarily focused on
Apache Iceberg.

I’m happy to help clarify or elaborate on any aspect of our experience
working on a relatively decoupled connector that is downstream and pretty
popular.

I’d also love to be able to contribute or assist in any way I can. I
don’t mean to thread-jack, but are there any meetings or community
sync-ups, specifically around the connector APIs, that I might join / be
invited to?

I did want to add that even though I’ve experienced some of the pain
points of integrating with an evolving system / API (catalog support is,
generally speaking, pretty new everywhere in this space), I also agree
personally that you shouldn’t slow down development velocity too much for
the sake of external connectors. Getting to a performant and stable place
should be the primary goal, and slowing that down to support stragglers
will (in my personal opinion) always be a losing game. Some folks will
simply stay behind on versions regardless, until they have to upgrade.
I am working on ensuring that the Iceberg community stays within 1-2
versions of Flink, so that we can help provide more feedback or
contribute things that make it easier for us to support multiple Flink
runtimes / versions with one project / codebase and minimal to no
reflection (our desired goal).

If there’s anything I can do or any way I can be of assistance, please
don’t hesitate to reach out. Or find me on ASF slack 😀

I greatly appreciate your general concern for the needs of downstream
connector integrators!

Cheers,
Kyle Bendickson (GitHub: kbendick)
Open Source Developer
kyle [at] tabular [dot] io

On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise <t...@apache.org> wrote:
Hi,

I see the stable core Flink API as a prerequisite for modularity. And
for connectors it is not just the source and sink API (source being
stable as of 1.14), but everything that is required to build and
maintain a connector downstream, such as the test utilities and
infrastructure.

Without the stable surface of core Flink, changes will leak into
downstream dependencies and force lock-step updates. Refactoring across
N repos is more painful than a single repo. Those with experience
developing downstream of Flink will know the pain, and that isn't
limited to connectors. I don't remember a Flink "minor version" update
that was just a dependency version change and did not force other
downstream changes.

Imagine a project with a complex set of dependencies. Let's say Flink
version A plus Flink-reliant dependencies released by other projects
(Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want a
situation where we bump the core Flink version to B and things fall
apart (interface changes, utilities that were useful but not public,
transitive dependencies etc.).

The discussion here also highlights the benefits of keeping certain
connectors outside Flink, whether that is due to differences in
developer community, maturity of the connectors, their
specialized/limited usage, etc. I would like to see that as a sign of a
growing ecosystem, and most of the ideas that Arvid has put forward
would benefit further growth of the connector ecosystem.

As for keeping connectors within Apache Flink: I prefer that as the
path forward for "essential" connectors like FileSource, KafkaSource,
... And we can still achieve a more flexible and faster release cycle.

Thanks,
Thomas





On Wed, Oct 20, 2021 at 3:32 AM Jark Wu <imj...@gmail.com> wrote:

Hi Konstantin,

the connectors need to be adopted and require at least one release per
Flink minor release.

However, this will make the releases of connectors slower, e.g.
maintaining features for multiple branches and releasing multiple
branches. I think the main purpose of having an external connector
repository is in order to have "faster releases of connectors"?

From the perspective of CDC connector maintainers, the biggest advantage
of maintaining it outside of the Flink project is that:
1) we can have a more flexible and faster release cycle
2) we can be more liberal with committership for connector maintainers,
which can also attract more committers to help the release.

Personally, I think maintaining one connector repository under the ASF
may not have the above benefits.

Best,
Jark

On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf <kna...@apache.org> wrote:

Hi everyone,

regarding the stability of the APIs: I think everyone agrees that
connector APIs which are stable across minor versions (1.13->1.14) are
the mid-term goal. But:

a) These APIs are still quite young, and we shouldn't make them @Public
prematurely either.

b) Isn't this *mostly* orthogonal to where the connector code lives?
Yes, as long as there are breaking changes, the connectors need to be
adopted and require at least one release per Flink minor release.
Documentation-wise this can be addressed via a compatibility matrix for
each connector, as Arvid suggested. IMO we shouldn't block this effort
on the stability of the APIs.

Cheers,

Konstantin



On Wed, Oct 20, 2021 at 8:56 AM Jark Wu <imj...@gmail.com> wrote:

Hi,

I think Thomas raised very good questions and I would like to know your
opinions if we want to move connectors out of flink in this version.

(1) is the connector API already stable?

Separate releases would only make sense if the core Flink surface is
fairly stable though. As evident from Iceberg (and also Beam), that's
not the case currently. We should probably focus on addressing the
stability first, before splitting code. A success criterion could be
that we are able to build Iceberg and Beam against multiple Flink
versions w/o the need to change code. The goal would be that no
connector breaks when we make changes to Flink core. Until that's the
case, code separation creates a setup where 1+1 or N+1 repositories
need to move lock step.

From another discussion thread [1], the connector API is far from
stable. Currently, it's hard to build connectors against multiple Flink
versions. There are breaking API changes both in 1.12 -> 1.13 and 1.13
-> 1.14 and maybe also in future versions, because Table-related APIs
are still @PublicEvolving and the new Sink API is still @Experimental.

(2) Flink testability without connectors.

Flink w/o Kafka connector (and few others) isn't viable. Testability
of Flink was already brought up: can we really certify a Flink core
release without the Kafka connector? Maybe those connectors that are
used in Flink e2e tests to validate functionality of core Flink should
not be broken out?

This is a very good question. How can we guarantee the new Source and
Sink API are stable with only test implementations?

Best,
Jark





On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler <
ches...@apache.org>
wrote:

Could you clarify what release cadence you're thinking
of?
There's
quite
a big range that fits "more frequent than Flink"
(per-commit,
daily,
weekly, bi-weekly, monthly, even bi-monthly).

On 19/10/2021 14:15, Martijn Visser wrote:
Hi all,

I think it would be a huge benefit if we can achieve more frequent
releases of connectors, which are not bound to the release cycle of
Flink itself. I agree that in order to get there, we need to have
stable interfaces which are trustworthy and reliable, so they can be
safely used by those connectors. I do think that work still needs to be
done on those interfaces, but I am confident that we can get there from
a Flink perspective.

I am worried that we would not be able to achieve those frequent
releases of connectors if we are putting these connectors under the
Apache umbrella, because that means that for each connector release we
have to follow the Apache release creation process. This requires a lot
of manual steps and prohibits automation, and I think it would be hard
to scale out frequent releases of connectors. I'm curious how others
think this challenge could be solved.

Best regards,

Martijn

On Mon, 18 Oct 2021 at 22:22, Thomas Weise <t...@apache.org> wrote:

Thanks for initiating this discussion.

There are definitely a few things that are not optimal with our
current management of connectors. I would not necessarily characterize
it as a "mess" though. As the points raised so far show, it isn't easy
to find a solution that balances competing requirements and leads to a
net improvement.

It would be great if we can find a setup that allows for connectors to
be released independently of core Flink and that each connector can be
released separately. Flink already has separate releases
(flink-shaded), so that by itself isn't a new thing. Per-connector
releases would need to allow for more frequent releases (without the
baggage that a full Flink release comes with).

Separate releases would only make sense if the core Flink surface is
fairly stable though. As evident from Iceberg (and also Beam), that's
not the case currently. We should probably focus on addressing the
stability first, before splitting code. A success criterion could be
that we are able to build Iceberg and Beam against multiple Flink
versions w/o the need to change code. The goal would be that no
connector breaks when we make changes to Flink core. Until that's the
case, code separation creates a setup where 1+1 or N+1 repositories
need to move lock step.

Regarding some connectors being more important for Flink than others:
That's a fact. Flink w/o Kafka connector (and few others) isn't
viable. Testability of Flink was already brought up: can we really
certify a Flink core release without the Kafka connector? Maybe those
connectors that are used in Flink e2e tests to validate functionality
of core Flink should not be broken out?

Finally, I think that the connectors that move into separate repos
should remain part of the Apache Flink project. Larger organizations
tend to approve the use of and contribution to open source at the
project level. Sometimes it is everything ASF. More often it is
"Apache Foo". It would be fatal to end up with a patchwork of projects
with potentially different licenses and governance to arrive at a
working Flink setup. This may mean we prioritize usability over
developer convenience, if that's in the best interest of Flink as a
whole.

Thanks,
Thomas



On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler <ches...@apache.org>
wrote:

Generally, the issues are reproducibility and control.
Stuff's completely broken on the Flink side for a week? Well, then so
are the connector repos.
(As-is) You can't go back to a previous version of the snapshot. Which
also means that checking out older commits can be problematic, because
you'd still work against the latest snapshots, and they may not be
compatible with each other.


On 18/10/2021 15:22, Arvid Heise wrote:
I was actually betting on snapshot versions. What are the limits?
Obviously, we can only do a release of a 1.15 connector after 1.15 is
released.


--

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk

