For sharing workflows we should be able to use composite actions. We'd have the main definition files in the flink-connectors repo (which we'd also need to tag/release), and other branches/repos can then import them. These are also versioned, so we don't have to worry about accidentally breaking things. They could also be used to enforce certain standards/interfaces so that we can automate more things (e.g., integration into the Flink documentation).
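
A rough sketch of what that could look like (the action name, inputs, and
build command are made up for illustration):

    # flink-connectors repo: .github/actions/connector-build/action.yml
    name: 'Build and test a connector'
    inputs:
      flink-version:
        description: 'Flink version to build against'
        required: true
    runs:
      using: 'composite'
      steps:
        - run: mvn verify -Dflink.version=${{ inputs.flink-version }}
          shell: bash

    # consuming branch/repo, pinned to a tag so later changes can't break it:
    steps:
      - uses: actions/checkout@v2
      - uses: apache/flink-connectors/.github/actions/connector-build@v1.0
        with:
          flink-version: '1.14.0'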

It is true that Option 2) and dedicated repositories share a lot of properties. While I did say in an offline conversation that in that case we might just as well use separate repositories, I'm not so sure anymore. One repo would make administration a bit easier; for example, secrets wouldn't have to be applied to each repo (we wouldn't want certain secrets to be set up organization-wide). Overall I also like that one repo would present a single access point; you can't "miss" a connector repo, and I would hope that having it as one repo would nurture more collaboration between the connectors, which after all need to solve similar problems.

It is a fair point that the branching model would be quite weird, but I think that would subside pretty quickly.

Personally I'd go with Option 2, and if that doesn't work out we can still split the repo later on (which should then be a trivial matter of copying all <connector>/* branches and renaming them).
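
Splitting could look roughly like this, assuming <connector>/* branch names
(an untested sketch; the target repo is hypothetical):

    # copy all kafka/* branches into a dedicated repo, dropping the prefix
    git clone https://github.com/apache/flink-connectors.git
    cd flink-connectors
    git remote add kafka https://github.com/apache/flink-connector-kafka.git
    for b in $(git for-each-ref --format='%(refname:short)' refs/remotes/origin | grep '^origin/kafka/'); do
      git push kafka "refs/remotes/${b}:refs/heads/${b#origin/kafka/}"
    done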

On 26/11/2021 12:47, Till Rohrmann wrote:
Hi Arvid,

Thanks for updating this thread with the latest findings. The described
limitations for a single connector repo sound suboptimal to me.

* Option 2. sounds as if we are trying to simulate multiple connector repos
inside of a single repo. I also don't know how we would share code between
the different branches (sharing infrastructure would probably be easier,
though). This seems to have the same limitations as dedicated repos, with
the downside of a not very intuitive branching model.
* Isn't option 1. kind of a degenerate version of option 2., where we have
some unrelated code from other connectors in the individual connector
branches?
* Option 3. has the downside that someone creating a release has to release
all connectors. This means that she either has to sync with the different
connector maintainers or has to be able to release all connectors on her
own. We are already seeing in the Flink community that releases require
quite good communication/coordination between the different people working
on different Flink components. Given our goals to make connector releases
easier and more frequent, I think that coupling different connector
releases might be counter-productive.

To me it does not sound very practical to mainly use a mono repository w/o
having some more advanced build infrastructure that, for example, allows
for different git roots in different connector directories. Maybe the mono
repo can be a catch-all repository for connectors that want to be released
in lock-step (Option 3.) with all other connectors the repo contains. But
for connectors that change frequently, having a dedicated repository that
allows independent releases sounds preferable to me.

What utilities and infrastructure code do you intend to share? Using git
submodules can definitely be one option to share code. However, it might
also be ok to depend on flink-connector-common artifacts which could make
things easier. Where I am unsure is whether git submodules can be used to
share infrastructure code (e.g. the .github/workflows) because you need
these files in the repo to trigger the CI infrastructure.
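
For reference, the submodule variant could look roughly like this in each
connector repo (the flink-connector-common repo and path are assumptions
for illustration):

    # pin the shared utilities/tooling at a known commit
    git submodule add https://github.com/apache/flink-connector-common.git tools/common
    git commit -m "Add shared connector tooling"
    # later, pull in a newer version of the shared code explicitly
    git submodule update --remote tools/common
    git commit -am "Bump shared connector tooling"

My suspicion is that the workflow files themselves would still have to live
in each repo's .github/workflows for CI to trigger, so at most they could
delegate to scripts provided by the submodule.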

Cheers,
Till

On Thu, Nov 25, 2021 at 1:59 PM Arvid Heise <ar...@apache.org> wrote:

Hi Brian,

Thank you for sharing. I think your approach is very valid and is in line
with what I had in mind.

Basically the Pravega community aligns the connector releases with the
Pravega mainline release

This certainly would mean that there is little value in coupling connector
versions. So it's making a good case for having separate connector repos.


and maintains the connector for the latest 3 Flink versions (CI will
publish snapshots for all these 3 branches)

I'd like to give connector devs a simple way to express which Flink
versions the current branch is compatible with. From there we can generate
the compatibility matrix automatically and optionally also create different
releases per supported Flink version. I'm not sure if the latter is indeed
better than having just one artifact that happens to run with multiple
Flink versions. I guess it depends on what dependencies we are exposing. If
the connector uses flink-connector-base, then we probably need separate
artifacts with poms anyway.
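
To make this concrete, each connector branch could carry a small descriptor
that tooling picks up (the file name and format here are purely
hypothetical):

    # kafka/compatibility.yml (hypothetical)
    connector: kafka
    version: 2.0.0
    supported-flink-versions:
      - "1.13"
      - "1.14"

From such files we could render the compatibility matrix on the
documentation side and, if we go that route, cut one artifact per supported
Flink version.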

Best,

Arvid

On Fri, Nov 19, 2021 at 10:55 AM Zhou, Brian <b.z...@dell.com> wrote:

Hi Arvid,

For the branching model, the Pravega Flink connector has some experience
that I would like to share. Here [1][2] are the compatibility matrix and
wiki explaining the branching model and releases. Basically the Pravega
community aligns the connector releases with the Pravega mainline release,
and maintains the connector for the latest 3 Flink versions (CI will
publish snapshots for all these 3 branches).
For example, recently we had the 0.10.1 release [3], and in Maven Central
we need to upload three artifacts (for Flink 1.13, 1.12, 1.11) for the
0.10.1 version [4].

There are some alternatives. Another solution that we once discussed but
finally abandoned is to have an independent version just like the current
CDC connector, and then give a big compatibility matrix to users. We think
it would become too confusing as the connector evolves. Conversely, we can
also do the opposite and align with the Flink version, maintaining several
branches for different system versions.

I would say this is only a fairly-OK solution because it is a bit painful
for maintainers, as cherry-picks are very common and releases would require
much work. However, if neither system has nice backward compatibility,
there seems to be no comfortable solution for their connector.

[1] https://github.com/pravega/flink-connectors#compatibility-matrix
[2]

https://github.com/pravega/flink-connectors/wiki/Versioning-strategy-for-Flink-connector
[3] https://github.com/pravega/flink-connectors/releases/tag/v0.10.1
[4] https://search.maven.org/search?q=pravega-connectors-flink

Best Regards,
Brian


-----Original Message-----
From: Arvid Heise <ar...@apache.org>
Sent: Friday, November 19, 2021 4:12 PM
To: dev
Subject: Re: [DISCUSS] Creating an external connector repository


Hi everyone,

we are currently in the process of setting up the flink-connectors repo
[1] for new connectors, but we hit a wall that we currently cannot get
past: the branching model.
To reiterate the original motivation of the external connector repo: we
want to decouple the release cycle of a connector from Flink. However, if
we want to support semantic versioning in the connectors, with the ability
to introduce breaking changes through major version bumps and support for
bugfixes on old versions, then we need release branches similar to how
Flink core operates.
Consider two connectors, let's call them kafka and hbase. We have kafka in
versions 1.0.X, 1.1.Y (small improvement), and 2.0.Z (config option
change), and hbase only on 1.0.A.

Now our current assumption was that we can work with a mono-repo under ASF
(flink-connectors). Then, for release branches, we found 3 options:
1. We would need to create some ugly mess with the cross product of
connector and version: so you have kafka-release-1.0, kafka-release-1.1,
kafka-release-2.0, hbase-release-1.0. The main issue is not the amount of
branches (that's something that git can handle) but that the state of
kafka is undefined in hbase-release-1.0. That's a call for disaster and
makes releasing connectors very cumbersome (CI would only execute and
publish hbase SNAPSHOTS on hbase-release-1.0; see the workflow sketch
after this list).
2. We could avoid the undefined state by having an empty master where each
release branch really only holds the code of one connector. But that's
also not great: any user that looks at the repo and sees no connector
would assume that it's dead.
3. We could have synced releases similar to the CDC connectors [2]. That
means that if any connector introduces a breaking change, all connectors
get a new major version. I find it quite confusing to a user if hbase gets
a new release without any change because kafka introduced a breaking
change.
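
A rough sketch of such a per-branch workflow (paths, module names, and the
deploy command are made up for illustration):

    # .github/workflows/hbase.yml, only present on hbase-release-* branches
    name: hbase-snapshot
    on:
      push:
        branches: ['hbase-release-*']
        paths: ['hbase/**']
    jobs:
      deploy-snapshot:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v2
          # build and deploy only the hbase module and its dependencies
          - run: mvn -pl hbase -am deploy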

To fully decouple release cycles and CI of connectors, we could add
individual repositories under ASF (flink-connector-kafka,
flink-connector-hbase). Then we can apply the same branching model as
before. I quickly checked if there are precedents in the apache community
for that approach and, just by scanning alphabetically, I found cordova
and couchdb with 70 and 77 apache repos respectively. So it certainly
seems like other projects approached our problem in that way and the
apache organization is okay with that. I currently expect max 20
additional repos for connectors and, in the future, max 10 each for
formats and filesystems if we also move them out at some point in time. So
we would be at a total of 50 repos.

Note that for all options, we need to provide a compatibility matrix that
we aim to autogenerate.

Now for the potential downsides that we internally discussed:
- How can we ensure common infrastructure code, utilities, and quality?
I propose to add a flink-connector-common that contains all these things
and is added as a git submodule/subtree to the repos.
- Do we implicitly discourage connector developers from maintaining more
than one connector with a fragmented code base?
That is certainly a risk. However, I currently also see few devs working
on more than one connector. It may actually help keep the devs that
maintain a specific connector on the hook. We could use github issues to
track bugs and feature requests, and a dev can focus his limited time on
getting that one connector right.

So WDYT? Compared to some intermediate suggestions with split repos, the
big difference is that everything remains under Apache umbrella and the
Flink community.

[1] https://github.com/apache/flink-connectors
[2] https://github.com/ververica/flink-cdc-connectors/

On Fri, Nov 12, 2021 at 3:39 PM Arvid Heise <ar...@apache.org> wrote:

Hi everyone,

I created the flink-connectors repo [1] to advance the topic. We would
create a proof-of-concept in the next few weeks as a special branch that
I'd then use for discussions. If the community agrees with the approach,
that special branch will become the master. If not, we can iterate on it
or create competing POCs.

If someone wants to try things out in parallel, just make sure that you
are not accidentally pushing POCs to the master.

As a reminder: We will not move out any current connector from Flink
at this point in time, so everything in Flink will remain as is and be
maintained there.

Best,

Arvid

[1] https://github.com/apache/flink-connectors

On Fri, Oct 29, 2021 at 6:57 PM Till Rohrmann <trohrm...@apache.org> wrote:

Hi everyone,

From the discussion, it seems to me that we have different opinions on
whether to have an ASF umbrella repository or to host them outside of
the ASF. It also seems that this is not really the problem to solve.
Since there are many good arguments for either approach, we could
simply start with an ASF umbrella repository and see how people adopt
it. If the individual connectors cannot move fast enough or if people
prefer not to buy into the more heavy-weight ASF processes, then they
can also host the code somewhere else. We simply need to make sure
that these connectors are discoverable (e.g. via flink-packages).

The more important problem seems to be to provide common tooling
(testing, infrastructure, documentation) that can easily be reused.
Similarly, it has become clear that the Flink community needs to
improve on providing stable APIs. I think it is not realistic to
first complete these tasks before starting to move connectors to
dedicated repositories. As Stephan said, creating a connector
repository will force us to pay more attention to API stability and
also to think about which testing tools are required. Hence, I
believe that starting to add connectors to a different repository
than apache/flink will help improve our connector tooling (declaring
testing classes as public, creating a common test utility repo,
creating a repo
template) and vice versa. Hence, I like Arvid's proposed process as
it will start kicking things off w/o letting this effort fizzle out.

Cheers,
Till

On Thu, Oct 28, 2021 at 11:44 AM Stephan Ewen <se...@apache.org> wrote:

Thank you all for the nice discussion!

From my point of view, I very much like the idea of putting connectors in
a separate repository. But I would argue it should be part of Apache
Flink, similar to flink-statefun, flink-ml, etc.

I share many of the reasons for that:
  - As argued many times, it reduces complexity of the Flink repo and
increases response times of CI, etc.
  - Much lower barrier of contribution, because an unstable connector
would not de-stabilize the whole build. Of course, we would need to make
sure we set this up the right way, with connectors having individual CI
runs, build status, etc. But it certainly seems possible.


I would argue some points a bit differently than some cases made before:

(a) I believe the separation would increase connector stability, because
it really forces us to work with the connectors against the APIs like any
external developer. A mono repo is somehow the wrong thing if you in
practice want to actually guarantee stable internal APIs at some layer,
because the mono repo makes it easy to just change something on both
sides of the API (provider and consumer) seamlessly.

Major refactorings in Flink need to keep all connector API contracts
intact, or we need to have a new version of the connector API.
(b) We may even be able to go towards more lightweight and automated
releases over time, even if we stay in Apache Flink with that repo.
This isn't fully aligned with the Apache release policies yet, but there
are board discussions about whether there can be bot-triggered releases
(by dependabot) and how that could fit into the Apache process.
This doesn't seem to be quite there just yet, but seeing that those
discussions start is a good sign, and there is a good chance we can do
some things there.
I am not sure whether we should let bots trigger releases, because a
final human look at things isn't a bad thing, especially given the
popularity of software supply chain attacks recently.


I do share Chesnay's concerns about complexity in tooling, though, both
release tooling and test tooling. They are not incompatible with that
approach, but they are a task we need to tackle during this change, which
will add additional work.



On Tue, Oct 26, 2021 at 10:31 AM Arvid Heise <ar...@apache.org> wrote:

Hi folks,

I think some questions came up and I'd like to address the question of
the timing.

Could you clarify what release cadence you're thinking of? There's quite
a big range that fits "more frequent than Flink" (per-commit, daily,
weekly, bi-weekly, monthly, even bi-monthly).

The short answer is: as often as needed:
- If there is a CVE in a dependency and we need to bump it - release
immediately.
- If there is a new feature merged, release soonish. We may collect a few
successive features before a release.
- If there is a bugfix, release immediately or soonish, depending on the
severity and whether there are workarounds available.

We should not limit ourselves; the whole idea of independent releases is
exactly that you release as needed. There is no release planning or
anything needed; you just go with a release as if it were an external
artifact.

(1) is the connector API already stable?
From another discussion thread [1], the connector API is far from stable.
Currently, it's hard to build connectors against multiple Flink versions.
There are breaking API changes both in 1.12 -> 1.13 and 1.13 -> 1.14 and
maybe also in future versions, because Table-related APIs are still
@PublicEvolving and the new Sink API is still @Experimental.

The question is: what is stable in an evolving system? We recently
discovered that the old SourceFunction needed to be refined such that
cancellation works correctly [1]. That interface has been in Flink for 7
years, is heavily used also outside of Flink, and we still had to change
the contract in a way that I'd expect any implementer to recheck their
implementation. It might not be necessary to change anything, and you can
probably use the same code for all Flink versions, but still, the
interface was not stable in the strictest sense.

If we focus just on API changes of the unified interfaces, then we expect
one more change to the Sink API to support compaction. For the Table API,
there will most likely also be some changes in 1.15. So we could wait for
1.15. But I'm questioning whether that's really necessary, because we
will add more functionality beyond 1.15 without breaking the API. For
example, we may add more unified connector metrics. If you want to use
them in your connector, you have to support multiple Flink versions
anyhow. So rather than focusing the discussion on "when is stuff stable",
I'd rather focus on "how can we support building connectors against
multiple Flink versions" and make it as painless as possible.

Chesnay pointed out to use different branches for different Flink
versions, which sounds like a good suggestion. With a mono-repo, we can't
use branches differently anyway (there is no way to have release branches
per connector without chaos). In these branches, we could provide shims
to simulate future features in older Flink versions such that, code-wise,
the source code of a specific connector does not diverge (much). For
example, to register unified connector metrics, we could simulate the
current approach also in some utility package of the mono-repo.
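
Such a shim could be as small as a utility class that connector code calls
instead of the version-specific API (a sketch; the class, method, and
metric name are made up):

    // MetricsShim.java - hypothetical helper in the shared utility package
    import org.apache.flink.metrics.Counter;
    import org.apache.flink.metrics.MetricGroup;

    public final class MetricsShim {
        private MetricsShim() {}

        // On Flink versions without the unified connector metrics, fall
        // back to registering a plain counter under the same name, so the
        // connector source stays identical across branches.
        public static Counter numRecordsOutCounter(MetricGroup group) {
            return group.counter("numRecordsOut");
        }
    }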

I see the stable core Flink API as a prerequisite for modularity. And
for connectors it is not just the source and sink API (source being
stable as of 1.14), but everything that is required to build and
maintain a connector downstream, such as the test utilities and
infrastructure.

That is a very fair point. I'm actually surprised to see that
MiniClusterWithClientResource is not public. I see it being used in all
connectors, especially outside of Flink. I fear that as long as we do not
have connectors outside, we will not properly annotate and maintain these
utilities, in a classic chicken-and-egg problem. I will outline an idea
at the end.

the connectors need to be adopted and require at least one release per
Flink minor release.
However, this will make the releases of connectors slower, e.g.
maintaining features for multiple branches and releasing multiple
branches.
I think the main purpose of having an external connector repository is
in order to have "faster releases of connectors"?

Imagine a project with a complex set of dependencies. Let's say Flink
version A plus Flink-reliant dependencies released by other projects
(Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want a
situation where we bump the core Flink version to B and things fall
apart (interface changes, utilities that were useful but not public,
transitive dependencies etc.).

Yes, that's why I wanted to automate the processes more, which is not
that easy under ASF. Maybe we automate the source provision across
supported versions and have 1 vote thread for all versions of a
connector?

From the perspective of CDC connector maintainers, the biggest advantage
of maintaining it outside of the Flink project is that:
1) we can have a more flexible and faster release cycle
2) we can be more liberal with committership for connector maintainers,
which can also attract more committers to help the release.

Personally, I think maintaining one connector repository under the ASF
may not have the above benefits.

Yes, I also feel that the ASF is too restrictive for our needs. But it
feels like there are too many that see it differently, and I think we
need

(2) Flink testability without connectors.
This is a very good question. How can we guarantee the new Source and
Sink API are stable with only test implementations?

We can't and shouldn't. Since the connector repo is managed by Flink, a
Flink release manager needs to check if the Flink connectors are actually
working prior to creating an RC. That's similar to how flink-shaded and
flink core are related.
flink core are related.


So here is one idea that I had to get things rolling. We are going to
address the external repo iteratively without compromising what we
already have:
Phase 1: add new contributions to the external repo. We use that time to
set up infra accordingly and optimize release processes. We will identify
test utilities that are not yet public/stable and fix that.
Phase 2: add ports to the new unified interfaces of existing connectors.
That requires a previous Flink release to make utilities stable. Keep old
interfaces in flink-core.
Phase 3: remove old interfaces in flink-core for some connectors (tbd at
a later point).
Phase 4: optionally move all remaining connectors (tbd at a later point).
I'd envision having ~3 months between starting the different phases.
WDYT?


[1] https://issues.apache.org/jira/browse/FLINK-23527

On Thu, Oct 21, 2021 at 7:12 AM Kyle Bendickson <k...@tabular.io> wrote:

Hi all,

My name is Kyle and I’m an open source developer primarily focused on
Apache Iceberg.

I’m happy to help clarify or elaborate on any aspect of our experience
working on a relatively decoupled connector that is downstream and pretty
popular.

I’d also love to be able to contribute or assist in any way I can. I
don’t mean to thread-jack, but are there any meetings or community
sync-ups, specifically around the connector APIs, that I might join / be
invited to?

I did want to add that even though I’ve experienced some of the pain
points of integrating with an evolving system / API (catalog support is,
generally speaking, pretty new everywhere in this space), I also agree
personally that you shouldn’t slow down development velocity too much for
the sake of external connectors. Getting to a performant and stable place
should be the primary goal, and slowing that down to support stragglers
will (in my personal opinion) always be a losing game. Some folks will
simply stay behind on versions regardless, until they have to upgrade.
I am working on ensuring that the Iceberg community stays within 1-2
versions of Flink, so that we can help provide more feedback or
contribute things that make it easier for us to support multiple Flink
runtimes / versions with one project / codebase and minimal to no
reflection (our desired goal).

If there’s anything I can do or any way I can be of assistance, please
don’t hesitate to reach out. Or find me on ASF slack 😀

I greatly appreciate your general concern for the needs of downstream
connector integrators!

Cheers,
Kyle Bendickson (GitHub: kbendick)
Open Source Developer
kyle [at] tabular [dot] io

On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise <t...@apache.org> wrote:
Hi,

I see the stable core Flink API as a prerequisite for modularity. And
for connectors it is not just the source and sink API (source being
stable as of 1.14), but everything that is required to build and
maintain a connector downstream, such as the test utilities and
infrastructure.

Without the stable surface of core Flink, changes will leak into
downstream dependencies and force lock-step updates. Refactoring across
N repos is more painful than a single repo. Those with experience
developing downstream of Flink will know the pain, and that isn't
limited to connectors. I don't remember a Flink "minor version" update
that was just a dependency version change and did not force other
downstream changes.

Imagine a project with a complex set of dependencies. Let's say Flink
version A plus Flink-reliant dependencies released by other projects
(Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want a
situation where we bump the core Flink version to B and things fall
apart (interface changes, utilities that were useful but not public,
transitive dependencies etc.).

The discussion here also highlights the benefits of keeping certain
connectors outside Flink, whether that is due to differences in
developer community, maturity of the connectors, their
specialized/limited usage, etc. I would like to see that as a sign of a
growing ecosystem, and most of the ideas that Arvid has put forward
would benefit further growth of the connector ecosystem.

As for keeping connectors within Apache Flink: I prefer that as the
path forward for "essential" connectors like FileSource, KafkaSource,
... And we can still achieve a more flexible and faster release cycle.

Thanks,
Thomas





On Wed, Oct 20, 2021 at 3:32 AM Jark Wu <imj...@gmail.com> wrote:

Hi Konstantin,

the connectors need to be adopted and require at least one release per
Flink minor release.

However, this will make the releases of connectors slower, e.g.
maintaining features for multiple branches and releasing multiple
branches. I think the main purpose of having an external connector
repository is in order to have "faster releases of connectors"?

From the perspective of CDC connector maintainers, the biggest advantage
of maintaining it outside of the Flink project is that:
1) we can have a more flexible and faster release cycle
2) we can be more liberal with committership for connector maintainers,
which can also attract more committers to help the release.

Personally, I think maintaining one connector repository under the ASF
may not have the above benefits.

Best,
Jark

On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf <kna...@apache.org> wrote:

Hi everyone,

regarding the stability of the APIs: I think everyone agrees that
connector APIs which are stable across minor versions (1.13->1.14) are
the mid-term goal. But:

a) These APIs are still quite young, and we shouldn't make them @Public
prematurely either.

b) Isn't this *mostly* orthogonal to where the connector code lives?
Yes, as long as there are breaking changes, the connectors need to be
adopted and require at least one release per Flink minor release.
Documentation-wise this can be addressed via a compatibility matrix for
each connector, as Arvid suggested. IMO we shouldn't block this effort
on the stability of the APIs.

Cheers,

Konstantin



On Wed, Oct 20, 2021 at 8:56 AM Jark Wu <imj...@gmail.com> wrote:

Hi,

I think Thomas raised very good questions and I would like to know your
opinions if we want to move connectors out of flink in this version.

(1) is the connector API already stable?

Separate releases would only make sense if the core Flink surface is
fairly stable though. As evident from Iceberg (and also Beam), that's
not the case currently. We should probably focus on addressing the
stability first, before splitting code. A success criterion could be
that we are able to build Iceberg and Beam against multiple Flink
versions w/o the need to change code. The goal would be that no
connector breaks when we make changes to Flink core. Until that's the
case, code separation creates a setup where 1+1 or N+1 repositories
need to move lock step.

From another discussion thread [1], the connector API is far from
stable. Currently, it's hard to build connectors against multiple Flink
versions. There are breaking API changes both in 1.12 -> 1.13 and 1.13
-> 1.14 and maybe also in future versions, because Table-related APIs
are still @PublicEvolving and the new Sink API is still @Experimental.

(2) Flink testability without connectors.

Flink w/o Kafka connector (and few others) isn't viable. Testability
of Flink was already brought up: can we really certify a Flink core
release without the Kafka connector? Maybe those connectors that are
used in Flink e2e tests to validate functionality of core Flink should
not be broken out?

This is a very good question. How can we guarantee the new Source and
Sink API are stable with only test implementations?

Best,
Jark





On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler <
ches...@apache.org>
wrote:

Could you clarify what release cadence you're thinking
of?
There's
quite
a big range that fits "more frequent than Flink"
(per-commit,
daily,
weekly, bi-weekly, monthly, even bi-monthly).

On 19/10/2021 14:15, Martijn Visser wrote:
Hi all,

I think it would be a huge benefit if we can achieve more frequent
releases of connectors, which are not bound to the release cycle of
Flink itself. I agree that in order to get there, we need to have
stable interfaces which are trustworthy and reliable, so they can be
safely used by those connectors. I do think that work still needs to be
done on those interfaces, but I am confident that we can get there from
a Flink perspective.

I am worried that we would not be able to achieve those frequent
releases of connectors if we are putting these connectors under the
Apache umbrella, because that means that for each connector release we
have to follow the Apache release creation process. This requires a lot
of manual steps and prohibits automation, and I think it would be hard
to scale out frequent releases of connectors. I'm curious how others
think this challenge could be solved.

Best regards,

Martijn

On Mon, 18 Oct 2021 at 22:22, Thomas Weise <t...@apache.org> wrote:

Thanks for initiating this discussion.

There are definitely a few things that are not optimal with our
current management of connectors. I would not necessarily characterize
it as a "mess" though. As the points raised so far show, it isn't easy
to find a solution that balances competing requirements and leads to a
net improvement.

It would be great if we can find a setup that allows for connectors to
be released independently of core Flink and that each connector can be
released separately. Flink already has separate releases
(flink-shaded), so that by itself isn't a new thing. Per-connector
releases would need to allow for more frequent releases (without the
baggage that a full Flink release comes with).

Separate releases would only make sense if the core Flink surface is
fairly stable though. As evident from Iceberg (and also Beam), that's
not the case currently. We should probably focus on addressing the
stability first, before splitting code. A success criterion could be
that we are able to build Iceberg and Beam against multiple Flink
versions w/o the need to change code. The goal would be that no
connector breaks when we make changes to Flink core. Until that's the
case, code separation creates a setup where 1+1 or N+1 repositories
need to move lock step.

Regarding some connectors being more important for Flink than others:
That's a fact. Flink w/o Kafka connector (and few others) isn't
viable. Testability of Flink was already brought up: can we really
certify a Flink core release without the Kafka connector? Maybe those
connectors that are used in Flink e2e tests to validate functionality
of core Flink should not be broken out?

Finally, I think that the connectors that move into separate repos
should remain part of the Apache Flink project. Larger organizations
tend to approve the use of and contribution to open source at the
project level. Sometimes it is everything ASF. More often it is
"Apache Foo". It would be fatal to end up with a patchwork of projects
with potentially different licenses and governance to arrive at a
working Flink setup. This may mean we prioritize usability over
developer convenience, if that's in the best interest of Flink as a
whole.

Thanks,
Thomas



On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler <ches...@apache.org>
wrote:

Generally, the issues are reproducibility and control.
Stuff's completely broken on the Flink side for a week? Well, then so
are the connector repos.
(As-is) You can't go back to a previous version of the snapshot. Which
also means that checking out older commits can be problematic, because
you'd still work against the latest snapshots, and they may not be
compatible with each other.


On 18/10/2021 15:22, Arvid Heise wrote:
I was actually betting on snapshot versions. What are the limits?
Obviously, we can only do a release of a 1.15 connector after 1.15 is
released.


--

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk

