Re: [DISCUSS] Creating an external connector repository

2021-10-20 Thread Thomas Weise
Hi,

I see the stable core Flink API as a prerequisite for modularity. And
for connectors it is not just the source and sink API (source being
stable as of 1.14), but everything that is required to build and
maintain a connector downstream, such as the test utilities and
infrastructure.

Without the stable surface of core Flink, changes will leak into
downstream dependencies and force lock-step updates. Refactoring
across N repos is more painful than a single repo. Those with
experience developing downstream of Flink will know the pain, and that
isn't limited to connectors. I don't remember a Flink "minor version"
update that was just a dependency version change and did not force
other downstream changes.

Imagine a project with a complex set of dependencies. Let's say Flink
version A plus Flink reliant dependencies released by other projects
(Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want a
situation where we bump the core Flink version to B and things fall
apart (interface changes, utilities that were useful but not public,
transitive dependencies etc.).

The discussion here also highlights the benefits of keeping certain
connectors outside Flink, whether that is due to differences in the
developer community, the maturity of the connectors, their
specialized/limited usage, etc. I would like to see that as a sign of a
growing ecosystem, and most of the ideas that Arvid has put forward
would benefit further growth of the connector ecosystem.

As for keeping connectors within Apache Flink: I prefer that as the
path forward for "essential" connectors like FileSource, KafkaSource,
... And we can still achieve a more flexible and faster release cycle.

Thanks,
Thomas





On Wed, Oct 20, 2021 at 3:32 AM Jark Wu  wrote:
>
> Hi Konstantin,
>
> > the connectors need to be adapted and require at least one release per
> Flink minor release.
> However, this will make connector releases slower, e.g. maintaining
> features for multiple branches and releasing from multiple branches.
> I thought the main purpose of having an external connector repository was
> to have "faster releases of connectors"?
>
>
> From the perspective of CDC connector maintainers, the biggest advantage of
> maintaining it outside of the Flink project is that:
> 1) we can have a more flexible and faster release cycle
> 2) we can be more liberal with committership for connector maintainers
> which can also attract more committers to help the release.
>
> Personally, I think maintaining one connector repository under the ASF may
> not have the above benefits.
>
> Best,
> Jark
>
> On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf  wrote:
>
> > Hi everyone,
> >
> > regarding the stability of the APIs. I think everyone agrees that
> > connector APIs which are stable across minor versions (1.13->1.14) are the
> > mid-term goal. But:
> >
> > a) These APIs are still quite young, and we shouldn't make them @Public
> > prematurely either.
> >
> > b) Isn't this *mostly* orthogonal to where the connector code lives? Yes,
> > as long as there are breaking changes, the connectors need to be adapted
> > and require at least one release per Flink minor release.
> > Documentation-wise this can be addressed via a compatibility matrix for
> > each connector as Arvid suggested. IMO we shouldn't block this effort on
> > the stability of the APIs.
> >
> > Cheers,
> >
> > Konstantin
> >
> >
> >
> > On Wed, Oct 20, 2021 at 8:56 AM Jark Wu  wrote:
> >
> >> Hi,
> >>
> >> I think Thomas raised very good questions, and I would like to know your
> >> opinions on whether we want to move connectors out of Flink in this version.
> >>
> >> (1) is the connector API already stable?
> >> > Separate releases would only make sense if the core Flink surface is
> >> > fairly stable though. As evident from Iceberg (and also Beam), that's
> >> > not the case currently. We should probably focus on addressing the
> >> > stability first, before splitting code. A success criterion could be
> >> > that we are able to build Iceberg and Beam against multiple Flink
> >> > versions w/o the need to change code. The goal would be that no
> >> > connector breaks when we make changes to Flink core. Until that's the
> >> > case, code separation creates a setup where 1+1 or N+1 repositories
> >> > need to move lock step.
> >>
> >> From another discussion thread [1], connector API is far from stable.
> >> Currently, it's hard to build connectors against multiple Flink versions.
> >> There are breaking API changes both in 1.12 -> 1.13 and 1.13 -> 1.14 and
> >>  maybe also in the future versions,  because Table related APIs are still
> >> @PublicEvolving and new Sink API is still @Experimental.
> >>
> >>
> >> (2) Flink testability without connectors.
> >> > Flink w/o Kafka connector (and few others) isn't
> >> > viable. Testability of Flink was already brought up, can we really
> >> > certify a Flink core release without Kafka connector? Maybe those
> >> > connectors that are used in Flink e2e tests to validate functionality
> >> > of core Flink should not be broken out?

Re: [DISCUSS] Creating an external connector repository

2021-10-20 Thread Jark Wu
Hi Konstantin,

> the connectors need to be adapted and require at least one release per
Flink minor release.
However, this will make connector releases slower, e.g. maintaining
features for multiple branches and releasing from multiple branches.
I thought the main purpose of having an external connector repository was
to have "faster releases of connectors"?


From the perspective of CDC connector maintainers, the biggest advantage of
maintaining it outside of the Flink project is that:
1) we can have a more flexible and faster release cycle
2) we can be more liberal with committership for connector maintainers
which can also attract more committers to help the release.

Personally, I think maintaining one connector repository under the ASF may
not have the above benefits.

Best,
Jark

On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf  wrote:

> Hi everyone,
>
> regarding the stability of the APIs. I think everyone agrees that
> connector APIs which are stable across minor versions (1.13->1.14) are the
> mid-term goal. But:
>
> a) These APIs are still quite young, and we shouldn't make them @Public
> prematurely either.
>
> b) Isn't this *mostly* orthogonal to where the connector code lives? Yes,
> as long as there are breaking changes, the connectors need to be adapted
> and require at least one release per Flink minor release.
> Documentation-wise this can be addressed via a compatibility matrix for
> each connector as Arvid suggested. IMO we shouldn't block this effort on
> the stability of the APIs.
>
> Cheers,
>
> Konstantin
>
>
>
> On Wed, Oct 20, 2021 at 8:56 AM Jark Wu  wrote:
>
>> Hi,
>>
>> I think Thomas raised very good questions, and I would like to know your
>> opinions on whether we want to move connectors out of Flink in this version.
>>
>> (1) is the connector API already stable?
>> > Separate releases would only make sense if the core Flink surface is
>> > fairly stable though. As evident from Iceberg (and also Beam), that's
>> > not the case currently. We should probably focus on addressing the
>> > stability first, before splitting code. A success criterion could be
>> > that we are able to build Iceberg and Beam against multiple Flink
>> > versions w/o the need to change code. The goal would be that no
>> > connector breaks when we make changes to Flink core. Until that's the
>> > case, code separation creates a setup where 1+1 or N+1 repositories
>> > need to move lock step.
>>
>> From another discussion thread [1], connector API is far from stable.
>> Currently, it's hard to build connectors against multiple Flink versions.
>> There are breaking API changes both in 1.12 -> 1.13 and 1.13 -> 1.14 and
>>  maybe also in the future versions,  because Table related APIs are still
>> @PublicEvolving and new Sink API is still @Experimental.
>>
>>
>> (2) Flink testability without connectors.
>> > Flink w/o Kafka connector (and few others) isn't
>> > viable. Testability of Flink was already brought up, can we really
>> > certify a Flink core release without Kafka connector? Maybe those
>> > connectors that are used in Flink e2e tests to validate functionality
>> > of core Flink should not be broken out?
>>
>> This is a very good question. How can we guarantee the new Source and Sink
>> API are stable with only test implementation?
>>
>>
>> Best,
>> Jark
>>
>>
>>
>>
>>
>> On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler 
>> wrote:
>>
>> > Could you clarify what release cadence you're thinking of? There's quite
>> > a big range that fits "more frequent than Flink" (per-commit, daily,
>> > weekly, bi-weekly, monthly, even bi-monthly).
>> >
>> > On 19/10/2021 14:15, Martijn Visser wrote:
>> > > Hi all,
>> > >
>> > > I think it would be a huge benefit if we can achieve more frequent
>> > releases
>> > > of connectors, which are not bound to the release cycle of Flink
>> itself.
>> > I
>> > > agree that in order to get there, we need to have stable interfaces
>> which
>> > > are trustworthy and reliable, so they can be safely used by those
>> > > connectors. I do think that work still needs to be done on those
>> > > interfaces, but I am confident that we can get there from a Flink
>> > > perspective.
>> > >
>> > > I am worried that we would not be able to achieve those frequent
>> releases
>> > > of connectors if we are putting these connectors under the Apache
>> > umbrella,
>> > > because that means that for each connector release we have to follow
>> the
>> > > Apache release creation process. This requires a lot of manual steps
>> and
>> > > prohibits automation and I think it would be hard to scale out
>> frequent
>> > > releases of connectors. I'm curious how others think this challenge
>> could
>> > > be solved.
>> > >
>> > > Best regards,
>> > >
>> > > Martijn
>> > >
>> > > On Mon, 18 Oct 2021 at 22:22, Thomas Weise  wrote:
>> > >
>> > >> Thanks for initiating this discussion.
>> > >>
>> > >> There are definitely a few things that are not optimal with our
>> > >> current management of connectors. I would not necessarily characterize
>> > >> it as a "mess" though.

Re: [DISCUSS] Creating an external connector repository

2021-10-20 Thread Konstantin Knauf
Hi everyone,

regarding the stability of the APIs. I think everyone agrees that
connector APIs which are stable across minor versions (1.13->1.14) are the
mid-term goal. But:

a) These APIs are still quite young, and we shouldn't make them @Public
prematurely either.

b) Isn't this *mostly* orthogonal to where the connector code lives? Yes,
as long as there are breaking changes, the connectors need to be adapted
and require at least one release per Flink minor release.
Documentation-wise this can be addressed via a compatibility matrix for
each connector as Arvid suggested. IMO we shouldn't block this effort on
the stability of the APIs.

Cheers,

Konstantin



On Wed, Oct 20, 2021 at 8:56 AM Jark Wu  wrote:

> Hi,
>
> I think Thomas raised very good questions, and I would like to know your
> opinions on whether we want to move connectors out of Flink in this version.
>
> (1) is the connector API already stable?
> > Separate releases would only make sense if the core Flink surface is
> > fairly stable though. As evident from Iceberg (and also Beam), that's
> > not the case currently. We should probably focus on addressing the
> > stability first, before splitting code. A success criterion could be
> > that we are able to build Iceberg and Beam against multiple Flink
> > versions w/o the need to change code. The goal would be that no
> > connector breaks when we make changes to Flink core. Until that's the
> > case, code separation creates a setup where 1+1 or N+1 repositories
> > need to move lock step.
>
> From another discussion thread [1], connector API is far from stable.
> Currently, it's hard to build connectors against multiple Flink versions.
> There are breaking API changes both in 1.12 -> 1.13 and 1.13 -> 1.14 and
>  maybe also in the future versions,  because Table related APIs are still
> @PublicEvolving and new Sink API is still @Experimental.
>
>
> (2) Flink testability without connectors.
> > Flink w/o Kafka connector (and few others) isn't
> > viable. Testability of Flink was already brought up, can we really
> > certify a Flink core release without Kafka connector? Maybe those
> > connectors that are used in Flink e2e tests to validate functionality
> > of core Flink should not be broken out?
>
> This is a very good question. How can we guarantee the new Source and Sink
> API are stable with only test implementation?
>
>
> Best,
> Jark
>
>
>
>
>
> On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler  wrote:
>
> > Could you clarify what release cadence you're thinking of? There's quite
> > a big range that fits "more frequent than Flink" (per-commit, daily,
> > weekly, bi-weekly, monthly, even bi-monthly).
> >
> > On 19/10/2021 14:15, Martijn Visser wrote:
> > > Hi all,
> > >
> > > I think it would be a huge benefit if we can achieve more frequent
> > releases
> > > of connectors, which are not bound to the release cycle of Flink
> itself.
> > I
> > > agree that in order to get there, we need to have stable interfaces
> which
> > > are trustworthy and reliable, so they can be safely used by those
> > > connectors. I do think that work still needs to be done on those
> > > interfaces, but I am confident that we can get there from a Flink
> > > perspective.
> > >
> > > I am worried that we would not be able to achieve those frequent
> releases
> > > of connectors if we are putting these connectors under the Apache
> > umbrella,
> > > because that means that for each connector release we have to follow
> the
> > > Apache release creation process. This requires a lot of manual steps
> and
> > > prohibits automation and I think it would be hard to scale out frequent
> > > releases of connectors. I'm curious how others think this challenge
> could
> > > be solved.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Mon, 18 Oct 2021 at 22:22, Thomas Weise  wrote:
> > >
> > >> Thanks for initiating this discussion.
> > >>
> > >> There are definitely a few things that are not optimal with our
> > >> current management of connectors. I would not necessarily characterize
> > >> it as a "mess" though. As the points raised so far show, it isn't easy
> > >> to find a solution that balances competing requirements and leads to a
> > >> net improvement.
> > >>
> > >> It would be great if we can find a setup that allows for connectors to
> > >> be released independently of core Flink and that each connector can be
> > >> released separately. Flink already has separate releases
> > >> (flink-shaded), so that by itself isn't a new thing. Per-connector
> > >> releases would need to allow for more frequent releases (without the
> > >> baggage that a full Flink release comes with).
> > >>
> > >> Separate releases would only make sense if the core Flink surface is
> > >> fairly stable though. As evident from Iceberg (and also Beam), that's
> > >> not the case currently. We should probably focus on addressing the
> > >> stability first, before splitting code. A success criteria could be
> > >> that we are able to build Iceberg and Beam against multiple Flink
> > >> versions w/o the need to change code.

Re: [DISCUSS] Creating an external connector repository

2021-10-20 Thread Jark Wu
Hi,

I think Thomas raised very good questions, and I would like to know your
opinions on whether we want to move connectors out of Flink in this version.

(1) Is the connector API already stable?
> Separate releases would only make sense if the core Flink surface is
> fairly stable though. As evident from Iceberg (and also Beam), that's
> not the case currently. We should probably focus on addressing the
> stability first, before splitting code. A success criterion could be
> that we are able to build Iceberg and Beam against multiple Flink
> versions w/o the need to change code. The goal would be that no
> connector breaks when we make changes to Flink core. Until that's the
> case, code separation creates a setup where 1+1 or N+1 repositories
> need to move lock step.

From another discussion thread [1], the connector API is far from stable.
Currently, it's hard to build connectors against multiple Flink versions.
There are breaking API changes both in 1.12 -> 1.13 and in 1.13 -> 1.14,
and maybe also in future versions, because Table-related APIs are still
@PublicEvolving and the new Sink API is still @Experimental.
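For illustration, the usual way a connector build is parameterized over Flink versions is a property-driven, provided-scope dependency in the connector's pom.xml. This is only a sketch (the property name and versions here are made up, not an agreed convention):

```xml
<!-- Sketch: CI can rebuild the connector against several Flink versions
     by overriding the property, e.g. mvn verify -Dflink.version=1.14.0 -->
<properties>
  <flink.version>1.13.2</flink.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.12</artifactId>
    <version>${flink.version}</version>
    <!-- provided: the runtime supplies Flink, so the connector jar
         is not pinned to one Flink version -->
    <scope>provided</scope>
  </dependency>
</dependencies>
```

As long as the APIs the connector touches are @PublicEvolving or @Experimental, such a version matrix will still fail across minor releases, which is exactly the instability being discussed.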


(2) Flink testability without connectors.
> Flink w/o Kafka connector (and few others) isn't
> viable. Testability of Flink was already brought up, can we really
> certify a Flink core release without Kafka connector? Maybe those
> connectors that are used in Flink e2e tests to validate functionality
> of core Flink should not be broken out?

This is a very good question. How can we guarantee the new Source and Sink
APIs are stable with only test implementations?


Best,
Jark





On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler  wrote:

> Could you clarify what release cadence you're thinking of? There's quite
> a big range that fits "more frequent than Flink" (per-commit, daily,
> weekly, bi-weekly, monthly, even bi-monthly).
>
> On 19/10/2021 14:15, Martijn Visser wrote:
> > Hi all,
> >
> > I think it would be a huge benefit if we can achieve more frequent
> releases
> > of connectors, which are not bound to the release cycle of Flink itself.
> I
> > agree that in order to get there, we need to have stable interfaces which
> > are trustworthy and reliable, so they can be safely used by those
> > connectors. I do think that work still needs to be done on those
> > interfaces, but I am confident that we can get there from a Flink
> > perspective.
> >
> > I am worried that we would not be able to achieve those frequent releases
> > of connectors if we are putting these connectors under the Apache
> umbrella,
> > because that means that for each connector release we have to follow the
> > Apache release creation process. This requires a lot of manual steps and
> > prohibits automation and I think it would be hard to scale out frequent
> > releases of connectors. I'm curious how others think this challenge could
> > be solved.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Mon, 18 Oct 2021 at 22:22, Thomas Weise  wrote:
> >
> >> Thanks for initiating this discussion.
> >>
> >> There are definitely a few things that are not optimal with our
> >> current management of connectors. I would not necessarily characterize
> >> it as a "mess" though. As the points raised so far show, it isn't easy
> >> to find a solution that balances competing requirements and leads to a
> >> net improvement.
> >>
> >> It would be great if we can find a setup that allows for connectors to
> >> be released independently of core Flink and that each connector can be
> >> released separately. Flink already has separate releases
> >> (flink-shaded), so that by itself isn't a new thing. Per-connector
> >> releases would need to allow for more frequent releases (without the
> >> baggage that a full Flink release comes with).
> >>
> >> Separate releases would only make sense if the core Flink surface is
> >> fairly stable though. As evident from Iceberg (and also Beam), that's
> >> not the case currently. We should probably focus on addressing the
> >> stability first, before splitting code. A success criterion could be
> >> that we are able to build Iceberg and Beam against multiple Flink
> >> versions w/o the need to change code. The goal would be that no
> >> connector breaks when we make changes to Flink core. Until that's the
> >> case, code separation creates a setup where 1+1 or N+1 repositories
> >> need to move lock step.
> >>
> >> Regarding some connectors being more important for Flink than others:
> >> That's a fact. Flink w/o Kafka connector (and few others) isn't
> >> viable. Testability of Flink was already brought up, can we really
> >> certify a Flink core release without Kafka connector? Maybe those
> >> connectors that are used in Flink e2e tests to validate functionality
> >> of core Flink should not be broken out?
> >>
> >> Finally, I think that the connectors that move into separate repos
> >> should remain part of the Apache Flink project. Larger organizations
> >> tend to approve the use of and contribution to open source at the
> >> project level.

Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Chesnay Schepler
Could you clarify what release cadence you're thinking of? There's quite 
a big range that fits "more frequent than Flink" (per-commit, daily, 
weekly, bi-weekly, monthly, even bi-monthly).


On 19/10/2021 14:15, Martijn Visser wrote:

Hi all,

I think it would be a huge benefit if we can achieve more frequent releases
of connectors, which are not bound to the release cycle of Flink itself. I
agree that in order to get there, we need to have stable interfaces which
are trustworthy and reliable, so they can be safely used by those
connectors. I do think that work still needs to be done on those
interfaces, but I am confident that we can get there from a Flink
perspective.

I am worried that we would not be able to achieve those frequent releases
of connectors if we are putting these connectors under the Apache umbrella,
because that means that for each connector release we have to follow the
Apache release creation process. This requires a lot of manual steps and
prohibits automation and I think it would be hard to scale out frequent
releases of connectors. I'm curious how others think this challenge could
be solved.

Best regards,

Martijn

On Mon, 18 Oct 2021 at 22:22, Thomas Weise  wrote:


Thanks for initiating this discussion.

There are definitely a few things that are not optimal with our
current management of connectors. I would not necessarily characterize
it as a "mess" though. As the points raised so far show, it isn't easy
to find a solution that balances competing requirements and leads to a
net improvement.

It would be great if we can find a setup that allows for connectors to
be released independently of core Flink and that each connector can be
released separately. Flink already has separate releases
(flink-shaded), so that by itself isn't a new thing. Per-connector
releases would need to allow for more frequent releases (without the
baggage that a full Flink release comes with).

Separate releases would only make sense if the core Flink surface is
fairly stable though. As evident from Iceberg (and also Beam), that's
not the case currently. We should probably focus on addressing the
stability first, before splitting code. A success criterion could be
that we are able to build Iceberg and Beam against multiple Flink
versions w/o the need to change code. The goal would be that no
connector breaks when we make changes to Flink core. Until that's the
case, code separation creates a setup where 1+1 or N+1 repositories
need to move in lock step.

Regarding some connectors being more important for Flink than others:
That's a fact. Flink w/o Kafka connector (and few others) isn't
viable. Testability of Flink was already brought up, can we really
certify a Flink core release without Kafka connector? Maybe those
connectors that are used in Flink e2e tests to validate functionality
of core Flink should not be broken out?

Finally, I think that the connectors that move into separate repos
should remain part of the Apache Flink project. Larger organizations
tend to approve the use of and contribution to open source at the
project level. Sometimes it is everything ASF. More often it is
"Apache Foo". It would be fatal to end up with a patchwork of projects
with potentially different licenses and governance to arrive at a
working Flink setup. This may mean we prioritize usability over
developer convenience, if that's in the best interest of Flink as a
whole.

Thanks,
Thomas



On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler 
wrote:

Generally, the issues are reproducibility and control.

Stuff is completely broken on the Flink side for a week? Well then so are
the connector repos.
(As-is) You can't go back to a previous version of the snapshot. Which
also means that checking out older commits can be problematic because
you'd still work against the latest snapshots, and they may not be
compatible with each other.


On 18/10/2021 15:22, Arvid Heise wrote:

I was actually betting on snapshot versions. What are the limits?
Obviously, we can only do a release of a 1.15 connector after 1.15 is
released.






Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Chesnay Schepler
TBH I think you're overestimating how much work it is to create a 
non-Flink release. Having done most of the flink-shaded releases, I 
really don't see an issue with even doing weekly releases under that process.


We cannot reduce the number of votes AFAIK; the ASF seems very clear on
that matter to me: 
https://www.apache.org/foundation/voting.html#ReleaseVotes

However, the vote duration is up to us.

Additionally, we only /need/ to vote on the /source/. This means we
don't need to create the Maven artifacts for each RC, but can do that at
the very end.


On 19/10/2021 14:21, Arvid Heise wrote:
Okay I think it is clear that the majority would like to keep 
connectors under the Apache Flink umbrella. That means we will not be 
able to have per-connector repositories and project management, 
automatic dependency bumping with Dependabot, or semi-automatic releases.


So then I'm assuming the directory structure that @Chesnay Schepler
proposed would be the most beneficial:

- A root project with some convenience setup.
- Unrelated subprojects with individual versioning and releases.
- Branches for minor Flink releases. That is needed anyhow to use new 
features independent of API stability.
- Each connector maintains its own documentation that is accessible 
through the main documentation.
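
For illustration only, that proposal could translate into a layout roughly like the following (all repository, directory, and branch names here are invented, not an agreed structure):

```
flink-connectors/                  (root project: shared build, CI, tooling)
  flink-connector-kafka/           (subproject with its own version/releases)
  flink-connector-elasticsearch/   (subproject with its own version/releases)
  docs/                            (per-connector docs, linked from the main docs)

branches: main, release-1.14, release-1.13   (one per supported Flink minor)
```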


Any thoughts on alternatives? Do you see risks?

@Stephan Ewen mentioned offline that we could adjust the bylaws for the
connectors such that we need fewer PMCs to approve a release. Would it be
enough to have one PMC vote per connector release? Do you know of other
ways to tweak the release process so it requires less manual work?


On Mon, Oct 18, 2021 at 10:22 PM Thomas Weise  wrote:

Thanks for initiating this discussion.

There are definitely a few things that are not optimal with our
current management of connectors. I would not necessarily characterize
it as a "mess" though. As the points raised so far show, it isn't easy
to find a solution that balances competing requirements and leads to a
net improvement.

It would be great if we can find a setup that allows for connectors to
be released independently of core Flink and that each connector can be
released separately. Flink already has separate releases
(flink-shaded), so that by itself isn't a new thing. Per-connector
releases would need to allow for more frequent releases (without the
baggage that a full Flink release comes with).

Separate releases would only make sense if the core Flink surface is
fairly stable though. As evident from Iceberg (and also Beam), that's
not the case currently. We should probably focus on addressing the
stability first, before splitting code. A success criterion could be
that we are able to build Iceberg and Beam against multiple Flink
versions w/o the need to change code. The goal would be that no
connector breaks when we make changes to Flink core. Until that's the
case, code separation creates a setup where 1+1 or N+1 repositories
need to move in lock step.

Regarding some connectors being more important for Flink than others:
That's a fact. Flink w/o Kafka connector (and few others) isn't
viable. Testability of Flink was already brought up, can we really
certify a Flink core release without Kafka connector? Maybe those
connectors that are used in Flink e2e tests to validate functionality
of core Flink should not be broken out?

Finally, I think that the connectors that move into separate repos
should remain part of the Apache Flink project. Larger organizations
tend to approve the use of and contribution to open source at the
project level. Sometimes it is everything ASF. More often it is
"Apache Foo". It would be fatal to end up with a patchwork of projects
with potentially different licenses and governance to arrive at a
working Flink setup. This may mean we prioritize usability over
developer convenience, if that's in the best interest of Flink as a
whole.

Thanks,
Thomas



On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler
 wrote:
>
> Generally, the issues are reproducibility and control.
>
> Stuff is completely broken on the Flink side for a week? Well then so are
> the connector repos.
> (As-is) You can't go back to a previous version of the snapshot. Which
> also means that checking out older commits can be problematic because
> you'd still work against the latest snapshots, and they may not be
> compatible with each other.
>
>
> On 18/10/2021 15:22, Arvid Heise wrote:
> > I was actually betting on snapshot versions. What are the limits?
> > Obviously, we can only do a release of a 1.15 connector after 1.15 is
> > released.
>
>



Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Dawid Wysakowicz
Hey all,

I don't have much to add to the general discussion. Just a single
comment on:

that we could adjust the bylaws for the connectors such that we need
fewer PMCs to approve a release. Would it be enough to have one PMC
vote per connector release?

I think it's not an option. This particular rule is one of the few rules
in the bylaws that actually originates from the ASF rather than being
established within the Flink community. I believe we do need 3 PMC votes
for any formal ASF release [1].

Votes on whether a package is ready to release use majority
approval -- i.e. at least three PMC members must vote affirmatively
for release, and there must be more positive than negative votes.
Releases may not be vetoed. Generally the community will cancel the
release vote if anyone identifies serious problems, but in most
cases the ultimate decision lies with the individual serving as
release manager. The specifics of the process may vary from project
to project, but the "minimum quorum of three +1 votes" rule is
universal.

Best,

Dawid

[1] https://www.apache.org/foundation/voting.html#ReleaseVotes

On 19/10/2021 14:21, Arvid Heise wrote:
> Okay I think it is clear that the majority would like to keep connectors
> under the Apache Flink umbrella. That means we will not be able to have
> per-connector repositories and project management, automatic dependency
> bumping with Dependabot, or semi-automatic releases.
>
> So then I'm assuming the directory structure that @Chesnay Schepler
>  proposed would be the most beneficial:
> - A root project with some convenience setup.
> - Unrelated subprojects with individual versioning and releases.
> - Branches for minor Flink releases. That is needed anyhow to use new
> features independent of API stability.
> - Each connector maintains its own documentation that is accessible through
> the main documentation.
>
> Any thoughts on alternatives? Do you see risks?
>
> @Stephan Ewen mentioned offline that we could adjust the
> bylaws for the connectors such that we need fewer PMCs to approve a
> release. Would it be enough to have one PMC vote per connector release? Do
> you know of other ways to tweak the release process so it requires less
> manual work?

Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Konstantin Knauf
Thank you, Arvid & team, for working on this.

I would also favor one connector repository under the ASF. This will
already force us to provide better tools and more stable APIs, which
connectors developed outside of Apache Flink will benefit from, too.

Besides simplifying the formal release process for connectors, I believe
we can also be more liberal with Committership for connector maintainers.

I expect that this setup can scale better than the current one, but it
doesn't scale super well either. In addition, there is still the ASF
barrier to contributions/releases. So, we might have more connectors in
this repository than we have in Apache Flink right now, but not all
connectors will end up in this repository. For those "external" connectors,
we should still aim to improve visibility, documentation and tooling.

It feels like such a hybrid approach might be the only option given
competing requirements.

Thanks,

Konstantin



-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Arvid Heise
Okay I think it is clear that the majority would like to keep connectors
under the Apache Flink umbrella. That means we will not be able to have
per-connector repositories and project management, automatic dependency
bumping with Dependabot, or semi-automatic releases.

So then I'm assuming the directory structure that @Chesnay Schepler
 proposed would be the most beneficial:
- A root project with some convenience setup.
- Unrelated subprojects with individual versioning and releases.
- Branches for minor Flink releases. That is needed anyhow to use new
features independent of API stability.
- Each connector maintains its own documentation that is accessible through
the main documentation.
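To make the proposal concrete, the root project could be sketched as a plain Maven aggregator, assuming a build similar to Flink's (all module names and coordinates below are illustrative, nothing is decided):

```xml
<!-- Hypothetical root pom of a connector repository: aggregation only,
     so each connector subproject keeps its own version and release cadence.
     Module names and versions are purely illustrative. -->
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connectors-root</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>pom</packaging>

  <!-- convenience setup shared by all subprojects (codestyle, CI helpers)
       would live here; the modules themselves stay independently versioned -->
  <modules>
    <module>flink-connector-kafka</module>
    <module>flink-connector-elasticsearch</module>
    <module>flink-connector-testing</module>
  </modules>
</project>
```

Releases would then be cut per module (e.g. via the maven-release-plugin invoked on a single subproject), independent of the sibling connectors.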

Any thoughts on alternatives? Do you see risks?

@Stephan Ewen  mentioned offline that we could adjust the
bylaws for the connectors such that we need fewer PMC votes to approve a
release. Would it be enough to have one PMC vote per connector release? Do
you know of other ways to tweak the release process so that it involves
less manual work?



Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Martijn Visser
Hi all,

I think it would be a huge benefit if we can achieve more frequent releases
of connectors, which are not bound to the release cycle of Flink itself. I
agree that in order to get there, we need to have stable interfaces which
are trustworthy and reliable, so they can be safely used by those
connectors. I do think that work still needs to be done on those
interfaces, but I am confident that we can get there from a Flink
perspective.

I am worried that we would not be able to achieve those frequent releases
of connectors if we are putting these connectors under the Apache umbrella,
because that means that for each connector release we have to follow the
Apache release creation process. This requires a lot of manual steps,
prohibits automation, and I think it would be hard to scale to frequent
releases of connectors. I'm curious how others think this challenge could
be solved.

Best regards,

Martijn



Re: [DISCUSS] Creating an external connector repository

2021-10-18 Thread Thomas Weise
Thanks for initiating this discussion.

There are definitely a few things that are not optimal with our
current management of connectors. I would not necessarily characterize
it as a "mess" though. As the points raised so far show, it isn't easy
to find a solution that balances competing requirements and leads to a
net improvement.

It would be great if we can find a setup that allows for connectors to
be released independently of core Flink and that each connector can be
released separately. Flink already has separate releases
(flink-shaded), so that by itself isn't a new thing. Per-connector
releases would need to allow for more frequent releases (without the
baggage that a full Flink release comes with).

Separate releases would only make sense if the core Flink surface is
fairly stable though. As evident from Iceberg (and also Beam), that's
not the case currently. We should probably focus on addressing the
stability first, before splitting code. A success criterion could be
that we are able to build Iceberg and Beam against multiple Flink
versions w/o the need to change code. The goal would be that no
connector breaks when we make changes to Flink core. Until that's the
case, code separation creates a setup where 1+1 or N+1 repositories
need to move lock step.

Regarding some connectors being more important for Flink than others:
That's a fact. Flink w/o Kafka connector (and few others) isn't
viable. Testability of Flink was already brought up, can we really
certify a Flink core release without Kafka connector? Maybe those
connectors that are used in Flink e2e tests to validate functionality
of core Flink should not be broken out?

Finally, I think that the connectors that move into separate repos
should remain part of the Apache Flink project. Larger organizations
tend to approve the use of and contribution to open source at the
project level. Sometimes it is everything ASF. More often it is
"Apache Foo". It would be fatal to end up with a patchwork of projects
with potentially different licenses and governance to arrive at a
working Flink setup. This may mean we prioritize usability over
developer convenience, if that's in the best interest of Flink as a
whole.

Thanks,
Thomas





Re: [DISCUSS] Creating an external connector repository

2021-10-18 Thread Chesnay Schepler

Generally, the issues are reproducibility and control.

Stuff is completely broken on the Flink side for a week? Well then so are 
the connector repos.
(As-is) You can't go back to a previous version of the snapshot. Which 
also means that checking out older commits can be problematic, because 
you'd still work against the latest snapshots, and they may not be 
compatible with each other.
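A concrete illustration of the snapshot problem, assuming a Maven-based downstream connector build (the version below is illustrative):

```xml
<!-- Hypothetical downstream connector pom fragment. Maven resolves a
     -SNAPSHOT version to the *latest* deployed snapshot at build time,
     so checking out an older connector commit still pulls today's
     flink-core snapshot, which may no longer match the code that the
     commit was originally written against. -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-core</artifactId>
  <version>1.15-SNAPSHOT</version>
</dependency>
```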



On 18/10/2021 15:22, Arvid Heise wrote:

I was actually betting on snapshot versions. What are the limits?
Obviously, we can only do a release of a 1.15 connector after 1.15 is
released.





Re: [DISCUSS] Creating an external connector repository

2021-10-18 Thread Chesnay Schepler

I think you're misinterpreting my comment.

Independent from the repo split we should only keep the connectors in 
the Flink project that we actively maintain.

The rest we might as well just drop.
If some external people are interested in maintaining these connectors 
then there's nothing stopping them from doing so.


For example, I don't think our Cassandra connector is in good shape, nor 
does it appear to be a big priority.
I would not mind us dropping it (== or moving it into some external 
repo, to me that's all the same).

Kafka would be a different story.

On 18/10/2021 15:22, Arvid Heise wrote:

I would like to avoid treating some connectors differently from other
connectors by design. In reality, we can assume that some connectors will
receive more love than others. However, if we already treat some connectors
"better" than others, we may run into a vicious cycle where the "bad" ones
never improve.
Nevertheless, I'd also be fine to just start with some of them and move
others later.





Re: [DISCUSS] Creating an external connector repository

2021-10-18 Thread Arvid Heise
Hi folks,

thanks for joining the discussion. I'd like to give some ideas on how
certain concerns are going to be addressed:

Ingo:
> In general I think breaking up the big repo would be a good move with many
> benefits (which you have outlined already). One concern would be how to
> proceed with our docs / examples if we were to really separate out all
> connectors.
>

I don't see any issue at all with either option. You'd just have to update
the dependency to the connector for blog posts and starter examples.
Each connector page should provide specific examples themselves.
Note that I would keep File Source/Sink in the main repo as they don't add
dependencies on their own. Formats and Filesystem may be externalized at a
much later point, after we have gained more knowledge on how to build a real
ecosystem with connectors.


> 1. More real-life examples would essentially now depend on external
> projects. Particularly if hosted outside the ASF, this would feel somewhat
> odd. Or to put it differently, if flink-connector-foo is not part of Flink
> itself, should the Flink Docs use it for any examples?
>
Why not? We also have blog posts that use external dependencies.

2. Generation of documentation (config options) wouldn't be possible unless
> the docs depend on these external projects, which would create weird
> version dependency cycles (Flink 1.X's docs depend on flink-connector-foo
> 1.X which depends on Flink 1.X).
>
Config options that are connector specific should only appear on the
connector pages. So we need to incorporate the config option generation in
the connector template.
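To make that second point concrete, here is a minimal sketch of how a docs generator in such a connector template could render the option table for a connector's own page. The `Option` class below is a hand-rolled stand-in purely for illustration, not Flink's actual `ConfigOption` API, and all option names are made up:

```java
// Hypothetical sketch: a connector template's docs build could collect the
// options a connector declares and emit the markdown table for its page.
// The Option class is a stand-in, NOT Flink's real ConfigOption API.
import java.util.List;

public class ConnectorOptionsDoc {

    static final class Option {
        final String key;
        final String defaultValue;
        final String description;

        Option(String key, String defaultValue, String description) {
            this.key = key;
            this.defaultValue = defaultValue;
            this.description = description;
        }
    }

    // options a hypothetical connector might declare (names are invented)
    static final List<Option> OPTIONS = List.of(
            new Option("connector.topic", "(none)", "Topic to read from."),
            new Option("connector.poll-interval", "500ms", "How often to poll for new data."));

    // render the markdown table that the connector's doc page would embed
    static String renderTable(List<Option> options) {
        StringBuilder sb = new StringBuilder("| Key | Default | Description |\n|---|---|---|\n");
        for (Option o : options) {
            sb.append("| ").append(o.key)
              .append(" | ").append(o.defaultValue)
              .append(" | ").append(o.description)
              .append(" |\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(renderTable(OPTIONS));
    }
}
```

Running the generator as part of the connector's docs build would keep connector-specific options off the main Flink pages while still producing a consistently formatted table.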


> 3. Documentation would inevitably be much less consistent when split across
> many repositories.
>
Fair point. If we use the same template as Flink Web UI for connectors, we
could embed subpages directly in the main documentation. If we allow that
for all connectors, it would be actually less fragmented as now where some
connectors are only described in Bahir or on external pages.


> As for your approaches, how would (A) allow hosting personal / company
> projects if only Flink committers can write to it?
>
That's entirely independent. In both options and even now, there are
several connectors living on other pages. They are currently only findable
through a search engine and we should fix that anyhow. See [1] for an
example on how Kafka connect is doing it.

> Connectors may receive some sort of quality seal
>
> This sounds like a lot of work and process, and could easily become a
> source of frustration.
>
Yes, this is definitely some effort, but strictly less than maintaining the
connector in the community, as it only requires an occasional review.


Chesnay:
> What I'm concerned about, and which we never really covered in past
> discussions about split repositories, are
> a) ways to share infrastructure (e.g., CI/release utilities/codestyle)
>
I'd provide a common GitHub connector template that has everything included.
That means of course making things public.

> b) testing
>
See below

> c) documentation integration
>
See Ingo's response.

>
> Particularly for b) we still lack any real public utilities.
> Even fundamental things such as the MiniClusterResource are not
> annotated in any way.
> I would argue that we need to sort this out before a split can happen.
> We've seen with the flink-benchmarks repo and recent discussions how
> easily things can break.
>
Yes, I agree but given that we already have connectors outside of the main
repo, the situation can only improve. By moving the connectors out, we are
actually forced to provide a level playing field for everyone, and thus truly
enable the community to contribute connectors.
We also plan to finish the connector testing framework in 1.15.

> Related to that, there is the question of how Flink is then supposed to
> ensure that things don't break. My impression is that we heavily rely on
> the connector tests to that end at the moment.
> Similarly, what connector (version) would be used for examples (like the
> WordCount which reads from Kafka) or (e2e) tests that want to read
> something other than a file? You end up with circular dependencies, which
> are always troublesome.
>
I agree that we must avoid any kind of circular dependencies. There are a
couple of options that we probably are going to mix:
* Move connector specific e2e tests into connector repo
* Have nightly builds on connector repo and collect results in some
overview.
* React on failures, especially if several connectors fail at once.
* Have an e2e repo/module in Flink that has cross-connector tests etc.
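The nightly-build option could look roughly like the following, assuming GitHub Actions and a Maven build that takes the Flink version as a property (workflow name, versions, and property name are all illustrative):

```yaml
# Hypothetical nightly workflow for an externalized connector repo:
# build the connector against several Flink versions so breakage caused
# by core changes surfaces quickly and can be collected in an overview.
name: nightly-flink-compat
on:
  schedule:
    - cron: "0 3 * * *"   # once per night
jobs:
  compat:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false     # report every broken Flink version, not just the first
      matrix:
        flink: ["1.13.3", "1.14.0", "1.15-SNAPSHOT"]
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-java@v2
        with:
          distribution: temurin
          java-version: 11
      - run: mvn -B verify -Dflink.version=${{ matrix.flink }}
```

A failure of a single matrix entry then points at an incompatible Flink version; several connectors failing at once would point back at a core change.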

> As for the repo structure, I would think that a single one could
> work quite well (because having 10+ connector repositories is just a
> mess), but currently I wouldn't set it up as a single project.
> I would rather have something like N + 1 projects (one for each
> connectors + a shared testing project) which are released individually
> as required, without any snapshot dependencies in-between.
> Then 1 branch for 

Re: [DISCUSS] Creating an external connector repository

2021-10-18 Thread Leonard Xu
Hi, all

I understand very well that the community's maintainers want to move the 
connectors to an external repository. Indeed, developing and maintaining 
connectors requires a lot of energy, and it mostly does not involve the Flink 
core framework, so moving them out can reduce the maintenance pressure on the 
community side.

I only have one concern. Once we migrate these connectors to external projects, 
how can we ensure their high quality? All the built-in connectors of Flink 
are developed or reviewed by committers. Connector bugs reported via JIRA and 
the mailing lists are currently fixed quickly; how does the Flink community 
ensure the development rhythm of the connectors after the move? In other 
words, are these connectors still first-class citizens of the Flink 
community? And if they are, how do we guarantee that?

Recently, I have maintained a series of CDC connectors in the Flink CDC project 
[1]. My feeling is that it is not easy to develop and maintain connectors. 
Contributors to the Flink CDC project have made some attempts in this area, 
such as building connector integration tests [2] and documentation management [3]. 
Personally, I don't have a strong tendency toward moving the built-in connectors 
out or keeping them. If the final decision of this discussion turns out to be 
moving them out, I'm happy to share our experience and provide help in the new 
connector project.

Best,
Leonard
[1]https://github.com/ververica/flink-cdc-connectors
[2]https://github.com/ververica/flink-cdc-connectors/runs/3902664601
[3]https://ververica.github.io/flink-cdc-connectors/master/


Re: [DISCUSS] Creating an external connector repository

2021-10-18 Thread David Morávek
We are mostly talking about the freedom this would bring to the connector
authors, but we still don't have answers for the important topics:

- How exactly are we going to maintain the high quality standard of the
connectors?
- What would the connector release cycle look like? Is this going to
affect the Flink release cycle?
- How would the documentation process / generation look like?
- Not all of the connectors rely solely on the Stable APIs. Moving them
outside of the Flink code-base will make any refactoring on the Flink side
significantly more complex, as it potentially needs to be reflected in all
connectors. There are some possible solutions, such as Gradle's included
builds, but we're far away from that. How are we planning to address this?
- How would we develop connectors against unreleased Flink version? Java
snapshots have many limits when used for the cross-repository development.
- With appropriate tooling, this whole thing is achievable even with the
single repository that we already have. It's just a matter of having a more
fine-grained build / release process. Have you tried to research this
option?

I'd personally strongly advise against moving the connectors out of the
ASF umbrella. The ASF brings legal guarantees, the hard-earned trust of the
users, and high quality standards to the table. I still fail to see any good
reason for giving this up. Also this decision would be hard to reverse,
because it would most likely require a new donation to the ASF (would this
require a consent from all contributors as there is no clear ownership?).

Best,
D.


On Mon, Oct 18, 2021 at 12:12 PM Qingsheng Ren  wrote:

> Thanks for driving this discussion, Arvid! I think this will be one giant
> leap for the Flink community. Externalizing connectors would give connector
> developers more freedom in developing, releasing and maintaining, which can
> attract more developers for contributing their connectors and expand the
> Flink ecosystems.
>
> Considering the position for hosting connectors, I prefer to use an
> individual organization outside the Apache umbrella. If we keep all connectors
> under Apache, I think there's not much difference compared to keeping them
> in the Flink main repo. Connector developers still require permissions from
> Flink committers to contribute, and the release process should follow Apache
> rules, which goes against our initial motivations of externalizing
> connectors.
>
> Using an individual GitHub organization will maximize the freedom provided
> to developers. An ideal structure in my mind would be like "
> github.com/flink-connectors/flink-connector-xxx". The new established
> flink-extended org might be another choice, but considering the amount of
> connectors, I prefer to use an individual org for connectors to avoid
> flushing other repos under flink-extended.
>
> In the meantime, we need to provide a well-established standard /
> guideline for contributing connectors, including CI, testing, docs (maybe
> we can’t provide resources for running them, but we should give enough
> guide on how to setup one) to keep the high quality of connectors. I’m
> happy to help building these fundamental bricks. Also since Kafka connector
> is widely used among Flink users, we can make Kafka connector a “model” of
> how to build and contribute a well-qualified connector into Flink
> ecosystem, and we can still use this trusted one for Flink E2E tests.
>
> Again I believe this will definitely boost the expansion of Flink
> ecosystem. Very excited to see the progress!
>
> Best,
>
> Qingsheng Ren
> On Oct 15, 2021, 8:47 PM +0800, Arvid Heise , wrote:
> > Dear community,
> > Today I would like to kickstart a series of discussions around creating
> an external connector repository. The main idea is to decouple the release
> cycle of Flink with the release cycles of the connectors. This is a common
> approach in other big data analytics projects and seems to scale better
> than the current approach. In particular, it will yield the following
> changes.
> >  • Faster releases of connectors: New features can be added more
> quickly, bugs can be fixed immediately, and we can have faster security
> patches in case of direct or indirect (through dependencies) security
> flaws. • New features can be added to old Flink versions: If the connector
> API didn’t change, the same connector jar may be used with different Flink
> versions. Thus, new features can also immediately be used with older Flink
> versions. A compatibility matrix on each connector page will help users to
> find suitable connector versions for their Flink versions. • More activity
> and contributions around connectors: If we ease the contribution and
> development process around connectors, we will see faster development and
> also more connectors. Since that heavily depends on the chosen approach
> discussed below, more details will be shown there. • An overhaul of the
> connector page: In the future, all known connectors will be shown on the
> 

Re: [DISCUSS] Creating an external connector repository

2021-10-18 Thread Qingsheng Ren
Thanks for driving this discussion, Arvid! I think this will be one giant leap
for the Flink community. Externalizing connectors would give connector
developers more freedom in developing, releasing and maintaining them, which
can attract more developers to contribute their connectors and expand the
Flink ecosystem.

Considering where to host the connectors, I prefer to use an individual
organization outside the Apache umbrella. If we keep all connectors under
Apache, I don't think there's much difference compared to keeping them in the
Flink main repo. Connector developers would still require permissions from
Flink committers to contribute, and the release process would have to follow
Apache rules, which works against our initial motivation for externalizing
connectors.

Using an individual GitHub organization would maximize the freedom given to
developers. An ideal structure in my mind would be like
"github.com/flink-connectors/flink-connector-xxx". The newly established
flink-extended org might be another choice, but considering the number of
connectors, I prefer an individual org for connectors to avoid crowding out
the other repos under flink-extended.

In the meantime, we need to provide a well-established standard / guideline
for contributing connectors, covering CI, testing, and docs (maybe we can’t
provide the resources for running them, but we should give enough guidance on
how to set them up) to keep the quality of connectors high. I’m happy to help
build these fundamental bricks. Also, since the Kafka connector is widely used
among Flink users, we could make it a “model” of how to build and contribute a
well-qualified connector to the Flink ecosystem, and we can still use this
trusted connector for Flink E2E tests.

Again, I believe this will definitely boost the expansion of the Flink
ecosystem. Very excited to see the progress!

Best,

Qingsheng Ren
On Oct 15, 2021, 8:47 PM +0800, Arvid Heise , wrote:

Re: [DISCUSS] Creating an external connector repository

2021-10-15 Thread Chesnay Schepler
My opinion of splitting the Flink repositories hasn't changed; I'm still 
in favor of it.


While it would technically be possible to release individual connectors
even if they are part of the Flink repo, it is quite a hassle to do so and
error-prone due to the current branch structure.

A split would also force us to watch out much more for API stability.

I'm gonna assume that we will move out all connectors:

What I'm concerned about, and which we never really covered in past 
discussions about split repositories, are

a) ways to share infrastructure (e.g., CI/release utilities/codestyle)
b) testing
c) documentation integration

Particularly for b) we still lack any real public utilities.
Even fundamental things such as the MiniClusterResource are not 
annotated in any way.

I would argue that we need to sort this out before a split can happen.
We've seen with the flink-benchmarks repo and recent discussions how 
easily things can break.


Related to that, there is the question of how Flink is then supposed to
ensure that things don't break. My impression is that we heavily rely on
the connector tests to that end at the moment.
Similarly, which connector (version) would be used for examples (like the
WordCount which reads from Kafka) or (e2e) tests that want to read
something other than a file? You end up with a circular dependency,
which is always troublesome.


As for the repo structure, I would think that a single one could
work quite well (because having 10+ connector repositories is just a
mess), but currently I wouldn't set it up as a single project.
I would rather have something like N + 1 projects (one for each
connector + a shared testing project) which are released individually
as required, without any snapshot dependencies in-between.
Then 1 branch for each major Flink version (again, no snapshot 
dependencies). Individual connectors can be released at any time against 
any of the latest bugfix releases, which due to lack of binaries (and 
python releases) would be a breeze.
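To illustrate (repository and branch names below are made up, not a proposal), that N + 1 layout with one branch per major Flink version might look like:

```
github.com/apache/flink-connectors     branches: release-1.13, release-1.14, ...
├── flink-connector-testing/    shared testing project, released on its own
├── flink-connector-kafka/      released individually, no snapshot dependencies
├── flink-connector-cassandra/
└── ...
```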


I don't like the idea of moving existing connectors out of the Apache 
organization. At the very least, not all of them. While some are 
certainly ill-maintained (e.g., Cassandra) where it would be neat if 
external projects could maintain them, others (like Kafka) are not and 
quite fundamental to actually using Flink.


On 15/10/2021 14:47, Arvid Heise wrote:


Re: [DISCUSS] Creating an external connector repository

2021-10-15 Thread Ingo Bürk
Hi Arvid,

In general I think breaking up the big repo would be a good move with many
benefits (which you have outlined already). One concern would be how to
proceed with our docs / examples if we were to really separate out all
connectors.

1. More real-life examples would essentially now depend on external
projects. Particularly if hosted outside the ASF, this would feel somewhat
odd. Or to put it differently, if flink-connector-foo is not part of Flink
itself, should the Flink Docs use it for any examples?
2. Generation of documentation (config options) wouldn't be possible unless
the docs depend on these external projects, which would create weird
version dependency cycles (Flink 1.X's docs depend on flink-connector-foo
1.X which depends on Flink 1.X).
3. Documentation would inevitably be much less consistent when split across
many repositories.

As for your approaches, how would (A) allow hosting personal / company
projects if only Flink committers can write to it?

> Connectors may receive some sort of quality seal

This sounds like a lot of work and process, and could easily become a
source of frustration.


Best
Ingo

On Fri, Oct 15, 2021 at 2:47 PM Arvid Heise  wrote:


[DISCUSS] Creating an external connector repository

2021-10-15 Thread Arvid Heise
Dear community,

Today I would like to kickstart a series of discussions around creating an
external connector repository. The main idea is to decouple the release
cycle of Flink with the release cycles of the connectors. This is a common
approach in other big data analytics projects and seems to scale better
than the current approach. In particular, it will yield the following
changes.


   - Faster releases of connectors: New features can be added more quickly,
     bugs can be fixed immediately, and we can have faster security patches in
     case of direct or indirect (through dependencies) security flaws.

   - New features can be added to old Flink versions: If the connector API
     didn’t change, the same connector jar may be used with different Flink
     versions. Thus, new features can also immediately be used with older
     Flink versions. A compatibility matrix on each connector page will help
     users find suitable connector versions for their Flink versions.

   - More activity and contributions around connectors: If we ease the
     contribution and development process around connectors, we will see
     faster development and also more connectors. Since that heavily depends
     on the chosen approach discussed below, more details will be shown there.

   - An overhaul of the connector page: In the future, all known connectors
     will be shown on the same page in a similar layout, independent of where
     they reside. They could be hosted on external project pages (e.g.,
     Iceberg and Hudi), on some company page, or may stay within the main
     Flink repository. Connectors may receive some sort of quality seal such
     that users can quickly assess the production-readiness, and we could also
     add which community/company promises which kind of support.

   - If we move (some) connectors out of Flink, Flink CI will be faster and
     Flink devs will experience fewer build instabilities (which mostly come
     from connectors). That would also speed up Flink development.
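Such a compatibility matrix on a connector page could be as simple as the following sketch (connector name and versions are made up for illustration):

```
flink-connector-foo | Supported Flink versions
--------------------+-------------------------
1.0.x               | 1.13.x, 1.14.x
1.1.x               | 1.14.x, 1.15.x
```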


Now I’d first like to collect your viewpoints on the ideal state. Let’s
first recap which approaches we currently have:


   - We have half of the connectors in the main Flink repository. Relatively
     few of them have received updates in the past couple of months.

   - Another large chunk of connectors is in Apache Bahir. It recently has
     seen its first release in 3 years.

   - There are a few other (Apache) projects that maintain a Flink connector,
     such as Apache Iceberg, Apache Hudi, and Pravega.

   - A few connectors are listed on company-related repositories, such as
     Apache Pulsar on StreamNative and CDC connectors on Ververica.


My personal observation is that having a repository per connector seems to
increase the activity on a connector as it’s easier to maintain. For
example, in Apache Bahir all connectors are built against the same Flink
version, which may not be desirable when certain APIs change; for example,
SinkFunction will eventually be deprecated and removed, but the new Sink
interface may gain more features.

Now, I'd like to outline different approaches. All approaches will allow
you to host your connector on any kind of personal, project, or company
repository. We still want to provide a default place where users can
contribute their connectors and hopefully grow a community around it. The
approaches are:


   1. Create a mono-repo under the Apache umbrella where all connectors will
      reside, for example, github.com/apache/flink-connectors. That repository
      needs to follow its rules: no GitHub issues, no Dependabot or similar
      tools, and a strict manual release process. It would be under the Flink
      community, such that Flink committers can write to that repository but
      no-one else.

   2. Create a GitHub organization with small repositories, for example
      github.com/flink-connectors. Since it’s not under the Apache umbrella,
      we are free to use whatever process we deem best (up to a future
      discussion). Each repository can have a shared list of maintainers +
      connector-specific committers. We can provide more automation. We may
      even allow different licenses to incorporate things like a connector to
      Oracle that cannot be released under the ASL.

   3. ??? <- please provide your additional approaches


In both cases, we will provide opinionated module/repository templates
based on a connector testing framework and guidelines. Depending on the
approach, we may need to enforce certain things.

I’d like to first focus on what the community would ideally seek and
minimize the discussions around legal issues, which we would discuss later.
For now, I’d also like to postpone the discussion if we move all or only a
subset of connectors from Flink to the new default place as it seems to be
orthogonal to the fundamental discussion.

PS: If the external repository for connectors is successful, I’d also like
to move out other things like formats, filesystems, and