Re: [DISCUSS] Creating an external connector repository

2022-01-13 Thread Martijn Visser
> [...] but I think that would subside pretty quickly.
>
> Personally I'd go with Option 2, and if that doesn't work out we can
> still split the repo later on. (Which should then be a trivial matter
> of copying all <connector>/* branches and renaming them.)
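That branch-copying step could look roughly like this (the target repo URL
is a hypothetical example; `git push <remote> <src>:<dst>` publishes a
branch under a new name):

    # sketch: split the kafka branches out of the mono repo
    git clone https://github.com/apache/flink-connectors.git
    cd flink-connectors
    git push git@github.com:apache/flink-connector-kafka.git \
        origin/kafka-release-1.0:refs/heads/release-1.0 \
        origin/kafka-release-1.1:refs/heads/release-1.1 \
        origin/kafka-release-2.0:refs/heads/release-2.0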
> On 26/11/2021 12:47, Till Rohrmann wrote:
>> Hi Arvid,
>>
>> Thanks for updating this thread with the latest findings. The described
>> limitations for a single connector repo sound suboptimal to me.
>>
>> * Option 2 sounds as if we try to simulate multi-connector repos inside
>> of a single repo. I also don't know how we would share code between the
>> different branches (sharing infrastructure would probably be easier,
>> though). This seems to have the same limitations as dedicated repos,
>> with the downside of a not very intuitive branching model.
>> * Isn't option 1 kind of a degenerate version of option 2 where we have
>> some unrelated code from other connectors in the individual connector
>> branches?
>> * Option 3 has the downside that someone creating a release has to
>> release all connectors. This means that she either has to sync with the
>> different connector maintainers or has to be able to release all
>> connectors on her own. We are already seeing in the Flink community
>> that releases require quite good communication/coordination between the
>> different people working on different Flink components. Given our goals
>> to make connector releases easier and more frequent, I think that
>> coupling different connector releases might be counter-productive.
>>
>> To me it sounds not very practical to mainly use a mono repository w/o
>> having some more advanced build infrastructure that, for example,
>> allows having different git roots in different connector directories.
>> Maybe the mono repo can be a catch-all repository for connectors that
>> want to be released in lock-step (Option 3) with all other connectors
>> the repo contains. But for connectors that get changed frequently,
>> having a dedicated repository that allows independent releases sounds
>> preferable to me.
>>
>> What utilities and infrastructure code do you intend to share? Using
>> git submodules can definitely be one option to share code. However, it
>> might also be ok to depend on flink-connector-common artifacts, which
>> could make things easier. Where I am unsure is whether git submodules
>> can be used to share infrastructure code (e.g. the .github/workflows)
>> because you need these files in the repo to trigger the CI
>> infrastructure.
>>
>> Cheers,
>> Till
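A minimal sketch of the submodule option (the shared repo URL is
hypothetical; note that GitHub Actions only picks up workflow files that
physically live under .github/workflows of the repo itself, which is
exactly the limitation described above):

    # in each connector repo, vendor the shared tooling once
    git submodule add https://github.com/apache/flink-connector-common.git tools/common

    # contributors pick it up when cloning
    git clone --recurse-submodules https://github.com/apache/flink-connector-kafka.git

Workflow files would therefore have to be thin wrappers (or synced copies)
that call scripts from the submodule.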

Re: [DISCUSS] Creating an external connector repository

2021-11-25 Thread Arvid Heise
Hi Brian,

Thank you for sharing. I think your approach is very valid and is in line
with what I had in mind.

> Basically Pravega community aligns the connector releases with the
> Pravega mainline release

This certainly would mean that there is little value in coupling connector
versions. So it's making a good case for having separate connector repos.

> and maintains the connector with the latest 3 Flink versions (CI will
> publish snapshots for all these 3 branches)

I'd like to give connector devs a simple way to express to which Flink
versions the current branch is compatible. From there we can generate the
compatibility matrix automatically and optionally also create different
releases per supported Flink version. I'm not sure if the latter is indeed
better than having just one artifact that happens to run with multiple
Flink versions. I guess it depends on what dependencies we are exposing. If
the connector uses flink-connector-base, then we probably need separate
artifacts with poms anyway.
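One lightweight way to express this (a sketch only; the file name and the
flink.version build property are assumptions, not an agreed convention)
would be a small metadata file at the root of each release branch that CI
loops over:

    # .flink-versions -- one supported Flink version per line
    1.13
    1.14

    # CI sketch: build and test the branch against every declared version
    for v in $(cat .flink-versions); do
        mvn clean verify -Dflink.version="$v"
    done

The same file could then serve as input for the autogenerated
compatibility matrix.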

Best,

Arvid

RE: [DISCUSS] Creating an external connector repository

2021-11-19 Thread Zhou, Brian
Hi Arvid,

For the branching model, the Pravega Flink connector has some experience
that I would like to share. Here [1][2] are the compatibility matrix and a
wiki page explaining the branching model and releases. Basically the
Pravega community aligns the connector releases with the Pravega mainline
release and maintains the connector for the latest 3 Flink versions (CI
will publish snapshots for all three branches).
For example, recently we had the 0.10.1 release [3], and in Maven Central
we needed to upload three artifacts (for Flink 1.13, 1.12, and 1.11) for
the 0.10.1 version [4].
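Concretely, that means one artifact per supported Flink version under the
same connector release, roughly like this (the exact naming scheme is an
assumption here; see the Maven Central search in [4]):

    io.pravega:pravega-connectors-flink-1.13_2.12:0.10.1
    io.pravega:pravega-connectors-flink-1.12_2.12:0.10.1
    io.pravega:pravega-connectors-flink-1.11_2.12:0.10.1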

There are some alternatives. Another solution that we once discussed but
finally abandoned is to have an independent version, just like the current
CDC connector, and then give users a big compatibility matrix. We thought
that would become too confusing as the connector evolves. On the contrary,
we could also go the opposite way: align with the Flink version and
maintain several branches for different system versions.

I would say this is only a fairly-OK solution, because it is a bit painful
for maintainers: cherry-picks are very common and releases require much
work. However, if neither system has nice backward compatibility, there
seems to be no comfortable solution for their connector.

[1] https://github.com/pravega/flink-connectors#compatibility-matrix
[2] https://github.com/pravega/flink-connectors/wiki/Versioning-strategy-for-Flink-connector
[3] https://github.com/pravega/flink-connectors/releases/tag/v0.10.1
[4] https://search.maven.org/search?q=pravega-connectors-flink

Best Regards,
Brian



Re: [DISCUSS] Creating an external connector repository

2021-11-19 Thread Arvid Heise
Hi everyone,

we are currently in the process of setting up the flink-connectors repo [1]
for new connectors, but we hit a wall that we currently cannot get past:
the branching model.
To reiterate the original motivation of the external connector repo: we
want to decouple the release cycle of a connector from Flink. However, if
we want to support semantic versioning in the connectors, with the ability
to introduce breaking changes through major version bumps and to support
bugfixes on old versions, then we need release branches similar to how
Flink core operates.
Consider two connectors, let's call them kafka and hbase. We have kafka in
versions 1.0.X, 1.1.Y (small improvement), and 2.0.Z (config option
change), and hbase only on 1.0.A.

Now our current assumption was that we can work with a mono-repo under ASF
(flink-connectors). Then, for release branches, we found 3 options (the
resulting layouts are sketched below):
1. We would need to create some ugly mess with the cross product of
connector and version: so you have kafka-release-1.0, kafka-release-1.1,
kafka-release-2.0, hbase-release-1.0. The main issue is not the number of
branches (that's something git can handle) but that the state of kafka is
undefined in hbase-release-1.0. That's a recipe for disaster and makes
releasing connectors very cumbersome (CI would only execute and publish
hbase SNAPSHOTS on hbase-release-1.0).
2. We could avoid the undefined state by having an empty master where each
release branch really only holds the code of its connector. But that's
also not great: any user that looks at the repo and sees no connector
would assume that it's dead.
3. We could have synced releases similar to the CDC connectors [2]. That
means that if any connector introduces a breaking change, all connectors
get a new major version. I find that quite confusing for a user if hbase
gets a new release without any change because kafka introduced a breaking
change.
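To make the difference concrete, the kafka/hbase example from above would
give the following branch layouts (option 1 versus dedicated repos):

    option 1, one repo (cross product of connector and version):
      flink-connectors:       kafka-release-1.0, kafka-release-1.1,
                              kafka-release-2.0, hbase-release-1.0

    dedicated repos, uniform branch names:
      flink-connector-kafka:  release-1.0, release-1.1, release-2.0
      flink-connector-hbase:  release-1.0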

To fully decouple release cycles and CI of connectors, we could add
individual repositories under ASF (flink-connector-kafka,
flink-connector-hbase). Then we can apply the same branching model as
before. I quickly checked whether there are precedents in the Apache
community for that approach, and just by scanning alphabetically I found
cordova with 70 and couchdb with 77 Apache repos, respectively. So it
certainly seems like other projects have approached our problem in that
way and the Apache organization is okay with it. I currently expect at
most 20 additional repos for connectors and, in the future, at most 10
each for formats and filesystems if we also move those out at some point
in time. So we would be at a total of 50 repos.

Note that for all options we need to provide a compatibility matrix, which
we aim to autogenerate.
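Such a matrix could look roughly as follows (the entries are made up for
illustration; the real ones would be derived from per-branch metadata,
e.g. a file listing the supported Flink versions):

    connector   | Flink 1.12 | Flink 1.13 | Flink 1.14
    kafka 1.1.y |     x      |     x      |
    kafka 2.0.z |            |     x      |     x
    hbase 1.0.a |     x      |            |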

Now for the potential downsides that we internally discussed:
- How can we ensure common infrastructure code, utilities, and quality?
I propose to add a flink-connector-common that contains all these things
and is added as a git submodule/subtree to the repos.
- Do we implicitly discourage connector developers from maintaining more
than one connector with a fragmented code base?
That is certainly a risk. However, I currently see few devs working on
more than one connector, and it may actually help keep the devs that
maintain a specific connector on the hook. We could use GitHub issues to
track bugs and feature requests, and a dev can focus their limited time on
getting that one connector right.

So WDYT? Compared to some intermediate suggestions with split repos, the
big difference is that everything remains under the Apache umbrella and
within the Flink community.

[1] https://github.com/apache/flink-connectors
[2] https://github.com/ververica/flink-cdc-connectors/


Re: [DISCUSS] Creating an external connector repository

2021-11-12 Thread Arvid Heise
Hi everyone,

I created the flink-connectors repo [1] to advance the topic. We would
create a proof-of-concept in the next few weeks as a special branch that
I'd then use for discussions. If the community agrees with the approach,
that special branch will become the master. If not, we can iterate on it or
create competing POCs.

If someone wants to try things out in parallel, just make sure that you are
not accidentally pushing POCs to the master.

As a reminder: We will not move out any current connector from Flink at
this point in time, so everything in Flink will remain as is and be
maintained there.

Best,

Arvid

[1] https://github.com/apache/flink-connectors

On Fri, Oct 29, 2021 at 6:57 PM Till Rohrmann  wrote:

> Hi everyone,
>
> From the discussion, it seems to me that we have different opinions
> whether to have an ASF umbrella repository or to host them outside of the
> ASF. It also seems that this is not really the problem to solve. Since
> there are many good arguments for either approach, we could simply start
> with an ASF umbrella repository and see how people adopt it. If the
> individual connectors cannot move fast enough or if people prefer to not
> buy into the more heavy-weight ASF processes, then they can host the code
> also somewhere else. We simply need to make sure that these connectors are
> discoverable (e.g. via flink-packages).
>
> The more important problem seems to be to provide common tooling (testing,
> infrastructure, documentation) that can easily be reused. Similarly, it has
> become clear that the Flink community needs to improve on providing stable
> APIs. I think it is not realistic to first complete these tasks before
> starting to move connectors to dedicated repositories. As Stephan said,
> creating a connector repository will force us to pay more attention to API
> stability and also to think about which testing tools are required. Hence,
> I believe that starting to add connectors to a different repository than
> apache/flink will help improve our connector tooling (declaring testing
> classes as public, creating a common test utility repo, creating a repo
> template) and vice versa. Hence, I like Arvid's proposed process as it will
> start kicking things off w/o letting this effort fizzle out.
>
> Cheers,
> Till
>
> On Thu, Oct 28, 2021 at 11:44 AM Stephan Ewen  wrote:
>
> > Thank you all, for the nice discussion!
> >
> > From my point of view, I very much like the idea of putting connectors
> in a
> > separate repository. But I would argue it should be part of Apache Flink,
> > similar to flink-statefun, flink-ml, etc.
> >
> > I share many of the reasons for that:
> >   - As argued many times, reduces complexity of the Flink repo, increases
> > response times of CI, etc.
> >   - Much lower barrier of contribution, because an unstable connector
> would
> > not de-stabilize the whole build. Of course, we would need to make sure
> we
> > set this up the right way, with connectors having individual CI runs,
> build
> > status, etc. But it certainly seems possible.
> >
> >
> > I would argue some points a bit different than some cases made before:
> >
> > (a) I believe the separation would increase connector stability. Because
> it
> > really forces us to work with the connectors against the APIs like any
> > external developer. A mono repo is somehow the wrong thing if you in
> > practice want to actually guarantee stable internal APIs at some layer.
> > Because the mono repo makes it easy to just change something on both
> sides
> > of the API (provider and consumer) seamlessly.
> >
> > Major refactorings in Flink need to keep all connector API contracts
> > intact, or we need to have a new version of the connector API.
> >
> > (b) We may even be able to go towards more lightweight and automated
> > releases over time, even if we stay in Apache Flink with that repo.
> > This isn't fully aligned with the Apache release policies yet, but
> > there are board discussions about whether there can be bot-triggered
> > releases (by dependabot) and how that could fit into the Apache process.
> >
> > This doesn't seem to be quite there just yet, but seeing those
> > discussions start is a good sign, and there is a good chance we can do
> > some things there.
> > I am not sure whether we should let bots trigger releases, because a
> final
> > human look at things isn't a bad thing, especially given the popularity
> of
> > software supply chain attacks recently.
> >
> >
> > I do share Chesnay's concerns about complexity in tooling, though. Both
> > release tooling and test tooling. They are not incompatible with that
> > approach, but they are a task we need to tackle during this change which
> > will add additional work.
> >
> >
> >
> > On Tue, Oct 26, 2021 at 10:31 AM Arvid Heise  wrote:
> >
> > > Hi folks,
> > >
> > > I think some questions came up and I'd like to address the question of
> > the
> > > timing.
> > >
> > > Could you clarify what release cadence you're 

Re: [DISCUSS] Creating an external connector repository

2021-10-29 Thread Till Rohrmann
Hi everyone,

From the discussion, it seems to me that we have different opinions
whether to have an ASF umbrella repository or to host them outside of the
ASF. It also seems that this is not really the problem to solve. Since
there are many good arguments for either approach, we could simply start
with an ASF umbrella repository and see how people adopt it. If the
individual connectors cannot move fast enough or if people prefer to not
buy into the more heavy-weight ASF processes, then they can host the code
also somewhere else. We simply need to make sure that these connectors are
discoverable (e.g. via flink-packages).

The more important problem seems to be to provide common tooling (testing,
infrastructure, documentation) that can easily be reused. Similarly, it has
become clear that the Flink community needs to improve on providing stable
APIs. I think it is not realistic to first complete these tasks before
starting to move connectors to dedicated repositories. As Stephan said,
creating a connector repository will force us to pay more attention to API
stability and also to think about which testing tools are required. Hence,
I believe that starting to add connectors to a different repository than
apache/flink will help improve our connector tooling (declaring testing
classes as public, creating a common test utility repo, creating a repo
template) and vice versa. Hence, I like Arvid's proposed process as it will
start kicking things off w/o letting this effort fizzle out.

Cheers,
Till

On Thu, Oct 28, 2021 at 11:44 AM Stephan Ewen  wrote:

> Thank you all, for the nice discussion!
>
> From my point of view, I very much like the idea of putting connectors in a
> separate repository. But I would argue it should be part of Apache Flink,
> similar to flink-statefun, flink-ml, etc.
>
> I share many of the reasons for that:
>   - As argued many times, reduces complexity of the Flink repo, increases
> response times of CI, etc.
>   - Much lower barrier of contribution, because an unstable connector would
> not de-stabilize the whole build. Of course, we would need to make sure we
> set this up the right way, with connectors having individual CI runs, build
> status, etc. But it certainly seems possible.
>
>
> I would argue some points a bit different than some cases made before:
>
> (a) I believe the separation would increase connector stability. Because it
> really forces us to work with the connectors against the APIs like any
> external developer. A mono repo is somehow the wrong thing if you in
> practice want to actually guarantee stable internal APIs at some layer.
> Because the mono repo makes it easy to just change something on both sides
> of the API (provider and consumer) seamlessly.
>
> Major refactorings in Flink need to keep all connector API contracts
> intact, or we need to have a new version of the connector API.
>
> (b) We may even be able to go towards more lightweight and automated
> releases over time, even if we stay in Apache Flink with that repo.
> This isn't fully aligned with the Apache release policies yet, but
> there are board discussions about whether there can be bot-triggered
> releases (by dependabot) and how that could fit into the Apache process.
>
> This doesn't seem to be quite there just yet, but seeing those discussions
> start is a good sign, and there is a good chance we can do some things there.
> I am not sure whether we should let bots trigger releases, because a final
> human look at things isn't a bad thing, especially given the popularity of
> software supply chain attacks recently.
>
>
> I do share Chesnay's concerns about complexity in tooling, though. Both
> release tooling and test tooling. They are not incompatible with that
> approach, but they are a task we need to tackle during this change which
> will add additional work.
>
>
>
> On Tue, Oct 26, 2021 at 10:31 AM Arvid Heise  wrote:
>
> > Hi folks,
> >
> > I think some questions came up and I'd like to address the question of
> the
> > timing.
> >
> > Could you clarify what release cadence you're thinking of? There's quite
> > > a big range that fits "more frequent than Flink" (per-commit, daily,
> > > weekly, bi-weekly, monthly, even bi-monthly).
> >
> > The short answer is: as often as needed:
> > - If there is a CVE in a dependency and we need to bump it - release
> > immediately.
> > - If there is a new feature merged, release soonish. We may collect a few
> > successive features before a release.
> > - If there is a bugfix, release immediately or soonish depending on the
> > severity and if there are workarounds available.
> >
> > We should not limit ourselves; the whole idea of independent releases is
> > exactly that you release as needed. There is no release planning or
> > anything needed, you just go with a release as if it was an external
> > artifact.
> >
> > (1) is the connector API already stable?
> > > From another discussion thread [1], connector API is far from stable.
> > > 

Re: [DISCUSS] Creating an external connector repository

2021-10-28 Thread Stephan Ewen
Thank you all, for the nice discussion!

From my point of view, I very much like the idea of putting connectors in a
separate repository. But I would argue it should be part of Apache Flink,
similar to flink-statefun, flink-ml, etc.

I share many of the reasons for that:
  - As argued many times, reduces complexity of the Flink repo, increases
response times of CI, etc.
  - Much lower barrier of contribution, because an unstable connector would
not de-stabilize the whole build. Of course, we would need to make sure we
set this up the right way, with connectors having individual CI runs, build
status, etc. But it certainly seems possible.


I would argue some points a bit different than some cases made before:

(a) I believe the separation would increase connector stability. Because it
really forces us to work with the connectors against the APIs like any
external developer. A mono repo works against you if, in practice, you want
to actually guarantee stable internal APIs at some layer.
Because the mono repo makes it easy to just change something on both sides
of the API (provider and consumer) seamlessly.

Major refactorings in Flink need to keep all connector API contracts
intact, or we need to have a new version of the connector API.

(b) We may even be able to go towards more lightweight and automated
releases over time, even if we stay in Apache Flink with that repo.
This isn't fully aligned with the Apache release policies yet, but
there are board discussions about whether there can be bot-triggered
releases (by dependabot) and how that could fit into the Apache process.

This doesn't seem to be quite there just yet, but seeing those discussions
start is a good sign, and there is a good chance we can do some things there.
I am not sure whether we should let bots trigger releases, because a final
human look at things isn't a bad thing, especially given the popularity of
software supply chain attacks recently.


I do share Chesnay's concerns about complexity in tooling, though. Both
release tooling and test tooling. They are not incompatible with that
approach, but they are a task we need to tackle during this change which
will add additional work.



On Tue, Oct 26, 2021 at 10:31 AM Arvid Heise  wrote:

> Hi folks,
>
> I think some questions came up and I'd like to address the question of the
> timing.
>
> Could you clarify what release cadence you're thinking of? There's quite
> > a big range that fits "more frequent than Flink" (per-commit, daily,
> > weekly, bi-weekly, monthly, even bi-monthly).
>
> The short answer is: as often as needed:
> - If there is a CVE in a dependency and we need to bump it - release
> immediately.
> - If there is a new feature merged, release soonish. We may collect a few
> successive features before a release.
> - If there is a bugfix, release immediately or soonish depending on the
> severity and if there are workarounds available.
>
> We should not limit ourselves; the whole idea of independent releases is
> exactly that you release as needed. There is no release planning or
> anything needed, you just go with a release as if it was an external
> artifact.
>
> (1) is the connector API already stable?
> > From another discussion thread [1], connector API is far from stable.
> > Currently, it's hard to build connectors against multiple Flink versions.
> > There are breaking API changes both in 1.12 -> 1.13 and 1.13 -> 1.14 and
> >  maybe also in the future versions,  because Table related APIs are still
> > @PublicEvolving and new Sink API is still @Experimental.
> >
>
> The question is: what is stable in an evolving system? We recently
> discovered that the old SourceFunction needed to be refined such that
> cancellation works correctly [1]. That interface has been in Flink for 7
> years, is heavily used also outside of Flink, and we still had to change
> the contract in a way that I'd expect any implementer to recheck their
> implementation. It might not be necessary to change anything, and you can
> probably change the code for all Flink versions, but still, the interface
> was not stable in the strictest sense.
>
> If we focus just on API changes on the unified interfaces, then we expect
> one more change to Sink API to support compaction. For Table API, there
> will most likely also be some changes in 1.15. So we could wait for 1.15.
>
> But I'm questioning if that's really necessary because we will add more
> functionality beyond 1.15 without breaking API. For example, we may add
> more unified connector metrics. If you want to use it in your connector,
> you have to support multiple Flink versions anyhow. So rather than focusing
> the discussion on "when is stuff stable", I'd focus on "how can we support
> building connectors against multiple Flink versions" and make it as
> painless as possible.
>
> Chesnay suggested using different branches for different Flink versions,
> which sounds like a good suggestion. With a mono-repo, we can't use
> branches 

Re: [DISCUSS] Creating an external connector repository

2021-10-26 Thread Arvid Heise
Hi folks,

I think some questions came up and I'd like to address the question of the
timing.

Could you clarify what release cadence you're thinking of? There's quite
> a big range that fits "more frequent than Flink" (per-commit, daily,
> weekly, bi-weekly, monthly, even bi-monthly).

The short answer is: as often as needed:
- If there is a CVE in a dependency and we need to bump it - release
immediately.
- If there is a new feature merged, release soonish. We may collect a few
successive features before a release.
- If there is a bugfix, release immediately or soonish depending on the
severity and if there are workarounds available.

We should not limit ourselves; the whole idea of independent releases is
exactly that you release as needed. There is no release planning or
anything needed, you just go with a release as if it was an external
artifact.

(1) is the connector API already stable?
> From another discussion thread [1], connector API is far from stable.
> Currently, it's hard to build connectors against multiple Flink versions.
> There are breaking API changes both in 1.12 -> 1.13 and 1.13 -> 1.14 and
>  maybe also in the future versions,  because Table related APIs are still
> @PublicEvolving and new Sink API is still @Experimental.
>

The question is: what is stable in an evolving system? We recently
discovered that the old SourceFunction needed to be refined such that
cancellation works correctly [1]. That interface has been in Flink for 7
years, is heavily used also outside of Flink, and we still had to change the
contract in a way that I'd expect any implementer to recheck their
implementation. It might not be necessary to change anything, and you can
probably change the code for all Flink versions, but still, the interface
was not stable in the strictest sense.
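
For readers who have not implemented one: the basic contract is that
cancel() is called from a different thread and must make run() return,
typically via a volatile flag. A minimal sketch (illustrative only; the
refined contract referenced in [1] adds requirements not shown here):

import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Canonical cancellable source: cancel() flips a volatile flag that the
// run() loop observes promptly.
public class CountingSource implements SourceFunction<Long> {

    private volatile boolean running = true;
    private long counter;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        while (running) {
            // Emit under the checkpoint lock so emission and checkpointing
            // do not interleave.
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(counter++);
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}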

If we focus just on API changes on the unified interfaces, then we expect
one more change to Sink API to support compaction. For Table API, there
will most likely also be some changes in 1.15. So we could wait for 1.15.

But I'm questioning if that's really necessary because we will add more
functionality beyond 1.15 without breaking API. For example, we may add
more unified connector metrics. If you want to use it in your connector,
you have to support multiple Flink versions anyhow. So rather than focusing
the discussion on "when is stuff stable", I'd focus on "how can we support
building connectors against multiple Flink versions" and make it as
painless as possible.

Chesnay suggested using different branches for different Flink versions,
which sounds like a good suggestion. With a mono-repo, we can't use branches
differently anyway (there is no way to have release branches per
connector without chaos). In these branches, we could provide shims to
simulate future features in older Flink versions such that code-wise, the
source code of a specific connector may not diverge (much). For example, to
register unified connector metrics, we could simulate the current approach
also in some utility package of the mono-repo.
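
As an illustration of the shim idea: a connector could probe the classpath
for a newer Flink feature and branch accordingly. The probed class name is
an assumption, and both branch bodies are placeholders rather than real
Flink API usage:

// Sketch of a classpath-probing version shim.
public final class VersionShim {

    private VersionShim() {}

    public static boolean hasUnifiedSourceMetrics() {
        try {
            // Assumed to exist only in newer Flink versions.
            Class.forName("org.apache.flink.metrics.groups.SourceReaderMetricGroup");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (hasUnifiedSourceMetrics()) {
            System.out.println("Newer Flink: register unified connector metrics");
        } else {
            System.out.println("Older Flink: fall back to hand-rolled metrics");
        }
    }
}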

I see the stable core Flink API as a prerequisite for modularity. And
> for connectors it is not just the source and sink API (source being
> stable as of 1.14), but everything that is required to build and
> maintain a connector downstream, such as the test utilities and
> infrastructure.
>

That is a very fair point. I'm actually surprised to see that
MiniClusterWithClientResource is not public. I see it being used in all
connectors, especially outside of Flink. I fear that as long as we do not
have connectors outside, we will not properly annotate and maintain these
utilities, in a classic chicken-and-egg problem. I will outline an idea at
the end.
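
For reference, the usage pattern seen across connector tests is roughly the
following JUnit 4 sketch (exact builder options may vary between Flink
versions):

import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
import org.apache.flink.test.util.MiniClusterWithClientResource;
import org.junit.ClassRule;

// Typical connector integration test setup: start one local Flink mini
// cluster per test class and submit jobs against it.
public class ExampleConnectorITCase {

    @ClassRule
    public static final MiniClusterWithClientResource MINI_CLUSTER =
            new MiniClusterWithClientResource(
                    new MiniClusterResourceConfiguration.Builder()
                            .setNumberTaskManagers(1)
                            .setNumberSlotsPerTaskManager(4)
                            .build());

    // Tests would create a StreamExecutionEnvironment and execute jobs here.
}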

> the connectors need to be adopted and require at least one release per
> Flink minor release.
> However, this will make the releases of connectors slower, e.g. maintain
> features for multiple branches and release multiple branches.
> I think the main purpose of having an external connector repository is in
> order to have "faster releases of connectors"?
>

> Imagine a project with a complex set of dependencies. Let's say Flink
> version A plus Flink reliant dependencies released by other projects
> (Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want a
> situation where we bump the core Flink version to B and things fall
> apart (interface changes, utilities that were useful but not public,
> transitive dependencies etc.).
>

Yes, that's why I wanted to automate the processes more, which is not that
easy under the ASF. Maybe we could automate the source provisioning across
supported versions and have one vote thread for all versions of a connector?

> From the perspective of CDC connector maintainers, the biggest advantage of
> maintaining it outside of the Flink project is that:
> 1) we can have a more flexible and faster release cycle
> 2) we can be more liberal with committership for connector maintainers
> which can also attract more committers to help the release.
>
> 

Re: [DISCUSS] Creating an external connector repository

2021-10-20 Thread Kyle Bendickson
Hi all,

My name is Kyle and I’m an open source developer primarily focused on
Apache Iceberg.

I’m happy to help clarify or elaborate on any aspect of our experience
working on a relatively decoupled connector that is downstream and pretty
popular.

I’d also love to be able to contribute or assist in any way I can.

I don’t mean to thread jack, but are there any meetings or community sync
ups, specifically around the connector APIs, that I might join / be invited
to?

I did want to add that even though I’ve experienced some of the pain points
of integrating with an evolving system / API (catalog support is generally
speaking pretty new everywhere really in this space), I also agree
personally that you shouldn't slow down development velocity too much for
the sake of external connectors. Getting to a performant and stable place
should be the primary goal, and slowing that down to support stragglers
will (in my personal opinion) always be a losing game. Some folks will
simply stay behind on versions regardless until they have to upgrade.

I am working on ensuring that the Iceberg community stays within 1-2
versions of Flink, so that we can help provide more feedback or contribute
things that might improve our ability to support multiple Flink runtimes /
versions with one project / codebase and minimal to no reflection (our
desired goal).

If there’s anything I can do or any way I can be of assistance, please
don’t hesitate to reach out. Or find me on ASF slack 

I greatly appreciate your general concern for the needs of downstream
connector integrators!

Cheers
Kyle Bendickson (GitHub: kbendick)
Open Source Developer
kyle [at] tabular [dot] io

On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise  wrote:

> Hi,
>
> I see the stable core Flink API as a prerequisite for modularity. And
> for connectors it is not just the source and sink API (source being
> stable as of 1.14), but everything that is required to build and
> maintain a connector downstream, such as the test utilities and
> infrastructure.
>
> Without the stable surface of core Flink, changes will leak into
> downstream dependencies and force lock step updates. Refactoring
> across N repos is more painful than a single repo. Those with
> experience developing downstream of Flink will know the pain, and that
> isn't limited to connectors. I don't remember a Flink "minor version"
> update that was just a dependency version change and did not force
> other downstream changes.
>
> Imagine a project with a complex set of dependencies. Let's say Flink
> version A plus Flink reliant dependencies released by other projects
> (Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want a
> situation where we bump the core Flink version to B and things fall
> apart (interface changes, utilities that were useful but not public,
> transitive dependencies etc.).
>
> The discussion here also highlights the benefits of keeping certain
> connectors outside Flink. Whether that is due to difference in
> developer community, maturity of the connectors, their
> specialized/limited usage etc. I would like to see that as a sign of a
> growing ecosystem and most of the ideas that Arvid has put forward
> would benefit further growth of the connector ecosystem.
>
> As for keeping connectors within Apache Flink: I prefer that as the
> path forward for "essential" connectors like FileSource, KafkaSource,
> ... And we can still achieve a more flexible and faster release cycle.
>
> Thanks,
> Thomas
>
>
>
>
>
> On Wed, Oct 20, 2021 at 3:32 AM Jark Wu  wrote:
> >
> > Hi Konstantin,
> >
> > > the connectors need to be adopted and require at least one release per
> > Flink minor release.
> > However, this will make the releases of connectors slower, e.g. maintain
> > features for multiple branches and release multiple branches.
> > I think the main purpose of having an external connector repository is in
> > order to have "faster releases of connectors"?
> >
> >
> > From the perspective of CDC connector maintainers, the biggest advantage
> of
> > maintaining it outside of the Flink project is that:
> > 1) we can have a more flexible and faster release cycle
> > 2) we can be more liberal with committership for connector maintainers
> > which can also attract more committers to help the release.
> >
> > Personally, I think maintaining one connector repository under the ASF
> may
> > not have the above benefits.
> >
> > Best,
> > Jark
> >
> > On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf 
> wrote:
> >
> > > Hi everyone,
> > >
> > > regarding the stability of the APIs. I think everyone agrees that
> > > connector APIs which are stable across minor versions (1.13->1.14) are
> the
> > > mid-term goal. But:
> > >
> > > a) These APIs are still quite young, and we shouldn't make them @Public
> > > prematurely either.
> > >
> > > b) Isn't this *mostly* orthogonal to where the connector code lives?
> Yes,
> > > as long as there are breaking changes, the connectors need to be
> adopted
> > > 

Re: [DISCUSS] Creating an external connector repository

2021-10-20 Thread Thomas Weise
Hi,

I see the stable core Flink API as a prerequisite for modularity. And
for connectors it is not just the source and sink API (source being
stable as of 1.14), but everything that is required to build and
maintain a connector downstream, such as the test utilities and
infrastructure.

Without the stable surface of core Flink, changes will leak into
downstream dependencies and force lock step updates. Refactoring
across N repos is more painful than a single repo. Those with
experience developing downstream of Flink will know the pain, and that
isn't limited to connectors. I don't remember a Flink "minor version"
update that was just a dependency version change and did not force
other downstream changes.

Imagine a project with a complex set of dependencies. Let's say Flink
version A plus Flink reliant dependencies released by other projects
(Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want a
situation where we bump the core Flink version to B and things fall
apart (interface changes, utilities that were useful but not public,
transitive dependencies etc.).

The discussion here also highlights the benefits of keeping certain
connectors outside Flink. Whether that is due to difference in
developer community, maturity of the connectors, their
specialized/limited usage etc. I would like to see that as a sign of a
growing ecosystem and most of the ideas that Arvid has put forward
would benefit further growth of the connector ecosystem.

As for keeping connectors within Apache Flink: I prefer that as the
path forward for "essential" connectors like FileSource, KafkaSource,
... And we can still achieve a more flexible and faster release cycle.

Thanks,
Thomas





On Wed, Oct 20, 2021 at 3:32 AM Jark Wu  wrote:
>
> Hi Konstantin,
>
> > the connectors need to be adopted and require at least one release per
> Flink minor release.
> However, this will make the releases of connectors slower, e.g. maintain
> features for multiple branches and release multiple branches.
> I think the main purpose of having an external connector repository is in
> order to have "faster releases of connectors"?
>
>
> From the perspective of CDC connector maintainers, the biggest advantage of
> maintaining it outside of the Flink project is that:
> 1) we can have a more flexible and faster release cycle
> 2) we can be more liberal with committership for connector maintainers
> which can also attract more committers to help the release.
>
> Personally, I think maintaining one connector repository under the ASF may
> not have the above benefits.
>
> Best,
> Jark
>
> On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf  wrote:
>
> > Hi everyone,
> >
> > regarding the stability of the APIs. I think everyone agrees that
> > connector APIs which are stable across minor versions (1.13->1.14) are the
> > mid-term goal. But:
> >
> > a) These APIs are still quite young, and we shouldn't make them @Public
> > prematurely either.
> >
> > b) Isn't this *mostly* orthogonal to where the connector code lives? Yes,
> > as long as there are breaking changes, the connectors need to be adopted
> > and require at least one release per Flink minor release.
> > Documentation-wise this can be addressed via a compatibility matrix for
> > each connector as Arvid suggested. IMO we shouldn't block this effort on
> > the stability of the APIs.
> >
> > Cheers,
> >
> > Konstantin
> >
> >
> >
> > On Wed, Oct 20, 2021 at 8:56 AM Jark Wu  wrote:
> >
> >> Hi,
> >>
> >> I think Thomas raised very good questions and would like to know your
> >> opinions if we want to move connectors out of flink in this version.
> >>
> >> (1) is the connector API already stable?
> >> > Separate releases would only make sense if the core Flink surface is
> >> > fairly stable though. As evident from Iceberg (and also Beam), that's
> >> > not the case currently. We should probably focus on addressing the
> >> > stability first, before splitting code. A success criteria could be
> >> > that we are able to build Iceberg and Beam against multiple Flink
> >> > versions w/o the need to change code. The goal would be that no
> >> > connector breaks when we make changes to Flink core. Until that's the
> >> > case, code separation creates a setup where 1+1 or N+1 repositories
> >> > need to move lock step.
> >>
> >> From another discussion thread [1], connector API is far from stable.
> >> Currently, it's hard to build connectors against multiple Flink versions.
> >> There are breaking API changes both in 1.12 -> 1.13 and 1.13 -> 1.14 and
> >>  maybe also in the future versions,  because Table related APIs are still
> >> @PublicEvolving and new Sink API is still @Experimental.
> >>
> >>
> >> (2) Flink testability without connectors.
> >> > Flink w/o Kafka connector (and few others) isn't
> >> > viable. Testability of Flink was already brought up, can we really
> >> > certify a Flink core release without Kafka connector? Maybe those
> >> > connectors that are used in Flink e2e tests to 

Re: [DISCUSS] Creating an external connector repository

2021-10-20 Thread Jark Wu
Hi Konstantin,

> the connectors need to be adopted and require at least one release per
Flink minor release.
However, this will make the releases of connectors slower, e.g. maintain
features for multiple branches and release multiple branches.
I think the main purpose of having an external connector repository is in
order to have "faster releases of connectors"?


From the perspective of CDC connector maintainers, the biggest advantage of
maintaining it outside of the Flink project is that:
1) we can have a more flexible and faster release cycle
2) we can be more liberal with committership for connector maintainers
which can also attract more committers to help the release.

Personally, I think maintaining one connector repository under the ASF may
not have the above benefits.

Best,
Jark

On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf  wrote:

> Hi everyone,
>
> regarding the stability of the APIs. I think everyone agrees that
> connector APIs which are stable across minor versions (1.13->1.14) are the
> mid-term goal. But:
>
> a) These APIs are still quite young, and we shouldn't make them @Public
> prematurely either.
>
> b) Isn't this *mostly* orthogonal to where the connector code lives? Yes,
> as long as there are breaking changes, the connectors need to be adopted
> and require at least one release per Flink minor release.
> Documentation-wise this can be addressed via a compatibility matrix for
> each connector as Arvid suggested. IMO we shouldn't block this effort on
> the stability of the APIs.
>
> Cheers,
>
> Konstantin
>
>
>
> On Wed, Oct 20, 2021 at 8:56 AM Jark Wu  wrote:
>
>> Hi,
>>
>> I think Thomas raised very good questions and would like to know your
>> opinions if we want to move connectors out of flink in this version.
>>
>> (1) is the connector API already stable?
>> > Separate releases would only make sense if the core Flink surface is
>> > fairly stable though. As evident from Iceberg (and also Beam), that's
>> > not the case currently. We should probably focus on addressing the
>> > stability first, before splitting code. A success criteria could be
>> > that we are able to build Iceberg and Beam against multiple Flink
>> > versions w/o the need to change code. The goal would be that no
>> > connector breaks when we make changes to Flink core. Until that's the
>> > case, code separation creates a setup where 1+1 or N+1 repositories
>> > need to move lock step.
>>
>> From another discussion thread [1], connector API is far from stable.
>> Currently, it's hard to build connectors against multiple Flink versions.
>> There are breaking API changes both in 1.12 -> 1.13 and 1.13 -> 1.14 and
>>  maybe also in the future versions,  because Table related APIs are still
>> @PublicEvolving and new Sink API is still @Experimental.
>>
>>
>> (2) Flink testability without connectors.
>> > Flink w/o Kafka connector (and few others) isn't
>> > viable. Testability of Flink was already brought up, can we really
>> > certify a Flink core release without Kafka connector? Maybe those
>> > connectors that are used in Flink e2e tests to validate functionality
>> > of core Flink should not be broken out?
>>
>> This is a very good question. How can we guarantee the new Source and Sink
>> API are stable with only test implementation?
>>
>>
>> Best,
>> Jark
>>
>>
>>
>>
>>
>> On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler 
>> wrote:
>>
>> > Could you clarify what release cadence you're thinking of? There's quite
>> > a big range that fits "more frequent than Flink" (per-commit, daily,
>> > weekly, bi-weekly, monthly, even bi-monthly).
>> >
>> > On 19/10/2021 14:15, Martijn Visser wrote:
>> > > Hi all,
>> > >
>> > > I think it would be a huge benefit if we can achieve more frequent
>> > releases
>> > > of connectors, which are not bound to the release cycle of Flink
>> itself.
>> > I
>> > > agree that in order to get there, we need to have stable interfaces
>> which
>> > > are trustworthy and reliable, so they can be safely used by those
>> > > connectors. I do think that work still needs to be done on those
>> > > interfaces, but I am confident that we can get there from a Flink
>> > > perspective.
>> > >
>> > > I am worried that we would not be able to achieve those frequent
>> releases
>> > > of connectors if we are putting these connectors under the Apache
>> > umbrella,
>> > > because that means that for each connector release we have to follow
>> the
>> > > Apache release creation process. This requires a lot of manual steps
>> and
>> > > prohibits automation and I think it would be hard to scale out
>> frequent
>> > > releases of connectors. I'm curious how others think this challenge
>> could
>> > > be solved.
>> > >
>> > > Best regards,
>> > >
>> > > Martijn
>> > >
>> > > On Mon, 18 Oct 2021 at 22:22, Thomas Weise  wrote:
>> > >
>> > >> Thanks for initiating this discussion.
>> > >>
>> > >> There are definitely a few things that are not optimal with our
>> > >> current management of connectors. I 

Re: [DISCUSS] Creating an external connector repository

2021-10-20 Thread Konstantin Knauf
Hi everyone,

regarding the stability of the APIs. I think everyone agrees that
connector APIs which are stable across minor versions (1.13->1.14) are the
mid-term goal. But:

a) These APIs are still quite young, and we shouldn't make them @Public
prematurely either.

b) Isn't this *mostly* orthogonal to where the connector code lives? Yes,
as long as there are breaking changes, the connectors need to be adopted
and require at least one release per Flink minor release.
Documentation-wise this can be addressed via a compatibility matrix for
each connector as Arvid suggested. IMO we shouldn't block this effort on
the stability of the APIs.

Cheers,

Konstantin



On Wed, Oct 20, 2021 at 8:56 AM Jark Wu  wrote:

> Hi,
>
> I think Thomas raised very good questions and would like to know your
> opinions if we want to move connectors out of flink in this version.
>
> (1) is the connector API already stable?
> > Separate releases would only make sense if the core Flink surface is
> > fairly stable though. As evident from Iceberg (and also Beam), that's
> > not the case currently. We should probably focus on addressing the
> > stability first, before splitting code. A success criteria could be
> > that we are able to build Iceberg and Beam against multiple Flink
> > versions w/o the need to change code. The goal would be that no
> > connector breaks when we make changes to Flink core. Until that's the
> > case, code separation creates a setup where 1+1 or N+1 repositories
> > need to move lock step.
>
> From another discussion thread [1], connector API is far from stable.
> Currently, it's hard to build connectors against multiple Flink versions.
> There are breaking API changes both in 1.12 -> 1.13 and 1.13 -> 1.14 and
>  maybe also in the future versions,  because Table related APIs are still
> @PublicEvolving and new Sink API is still @Experimental.
>
>
> (2) Flink testability without connectors.
> > Flink w/o Kafka connector (and few others) isn't
> > viable. Testability of Flink was already brought up, can we really
> > certify a Flink core release without Kafka connector? Maybe those
> > connectors that are used in Flink e2e tests to validate functionality
> > of core Flink should not be broken out?
>
> This is a very good question. How can we guarantee the new Source and Sink
> API are stable with only test implementation?
>
>
> Best,
> Jark
>
>
>
>
>
> On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler  wrote:
>
> > Could you clarify what release cadence you're thinking of? There's quite
> > a big range that fits "more frequent than Flink" (per-commit, daily,
> > weekly, bi-weekly, monthly, even bi-monthly).
> >
> > On 19/10/2021 14:15, Martijn Visser wrote:
> > > Hi all,
> > >
> > > I think it would be a huge benefit if we can achieve more frequent
> > releases
> > > of connectors, which are not bound to the release cycle of Flink
> itself.
> > I
> > > agree that in order to get there, we need to have stable interfaces
> which
> > > are trustworthy and reliable, so they can be safely used by those
> > > connectors. I do think that work still needs to be done on those
> > > interfaces, but I am confident that we can get there from a Flink
> > > perspective.
> > >
> > > I am worried that we would not be able to achieve those frequent
> releases
> > > of connectors if we are putting these connectors under the Apache
> > umbrella,
> > > because that means that for each connector release we have to follow
> the
> > > Apache release creation process. This requires a lot of manual steps
> and
> > > prohibits automation and I think it would be hard to scale out frequent
> > > releases of connectors. I'm curious how others think this challenge
> could
> > > be solved.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Mon, 18 Oct 2021 at 22:22, Thomas Weise  wrote:
> > >
> > >> Thanks for initiating this discussion.
> > >>
> > >> There are definitely a few things that are not optimal with our
> > >> current management of connectors. I would not necessarily characterize
> > >> it as a "mess" though. As the points raised so far show, it isn't easy
> > >> to find a solution that balances competing requirements and leads to a
> > >> net improvement.
> > >>
> > >> It would be great if we can find a setup that allows for connectors to
> > >> be released independently of core Flink and that each connector can be
> > >> released separately. Flink already has separate releases
> > >> (flink-shaded), so that by itself isn't a new thing. Per-connector
> > >> releases would need to allow for more frequent releases (without the
> > >> baggage that a full Flink release comes with).
> > >>
> > >> Separate releases would only make sense if the core Flink surface is
> > >> fairly stable though. As evident from Iceberg (and also Beam), that's
> > >> not the case currently. We should probably focus on addressing the
> > >> stability first, before splitting code. A success criteria could be
> > >> that we are able to build 

Re: [DISCUSS] Creating an external connector repository

2021-10-20 Thread Jark Wu
Hi,

I think Thomas raised very good questions and would like to know your
opinions if we want to move connectors out of flink in this version.

(1) is the connector API already stable?
> Separate releases would only make sense if the core Flink surface is
> fairly stable though. As evident from Iceberg (and also Beam), that's
> not the case currently. We should probably focus on addressing the
> stability first, before splitting code. A success criteria could be
> that we are able to build Iceberg and Beam against multiple Flink
> versions w/o the need to change code. The goal would be that no
> connector breaks when we make changes to Flink core. Until that's the
> case, code separation creates a setup where 1+1 or N+1 repositories
> need to move lock step.

From another discussion thread [1], connector API is far from stable.
Currently, it's hard to build connectors against multiple Flink versions.
There are breaking API changes both in 1.12 -> 1.13 and 1.13 -> 1.14 and
 maybe also in the future versions,  because Table related APIs are still
@PublicEvolving and new Sink API is still @Experimental.
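
For context, these stability markers are plain annotations from Flink's
org.apache.flink.annotation package. A tiny sketch (the two annotated
interfaces are made up; only the annotations are real):

import org.apache.flink.annotation.Experimental;
import org.apache.flink.annotation.PublicEvolving;

// Hypothetical interfaces showing how API maturity is marked.
@PublicEvolving
interface ExampleTableSourceFactory {}

@Experimental
interface ExampleUnifiedSink {}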


(2) Flink testability without connectors.
> Flink w/o Kafka connector (and few others) isn't
> viable. Testability of Flink was already brought up, can we really
> certify a Flink core release without Kafka connector? Maybe those
> connectors that are used in Flink e2e tests to validate functionality
> of core Flink should not be broken out?

This is a very good question. How can we guarantee the new Source and Sink
API are stable with only test implementation?


Best,
Jark





On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler  wrote:

> Could you clarify what release cadence you're thinking of? There's quite
> a big range that fits "more frequent than Flink" (per-commit, daily,
> weekly, bi-weekly, monthly, even bi-monthly).
>
> On 19/10/2021 14:15, Martijn Visser wrote:
> > Hi all,
> >
> > I think it would be a huge benefit if we can achieve more frequent
> releases
> > of connectors, which are not bound to the release cycle of Flink itself.
> I
> > agree that in order to get there, we need to have stable interfaces which
> > are trustworthy and reliable, so they can be safely used by those
> > connectors. I do think that work still needs to be done on those
> > interfaces, but I am confident that we can get there from a Flink
> > perspective.
> >
> > I am worried that we would not be able to achieve those frequent releases
> > of connectors if we are putting these connectors under the Apache
> umbrella,
> > because that means that for each connector release we have to follow the
> > Apache release creation process. This requires a lot of manual steps and
> > prohibits automation and I think it would be hard to scale out frequent
> > releases of connectors. I'm curious how others think this challenge could
> > be solved.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Mon, 18 Oct 2021 at 22:22, Thomas Weise  wrote:
> >
> >> Thanks for initiating this discussion.
> >>
> >> There are definitely a few things that are not optimal with our
> >> current management of connectors. I would not necessarily characterize
> >> it as a "mess" though. As the points raised so far show, it isn't easy
> >> to find a solution that balances competing requirements and leads to a
> >> net improvement.
> >>
> >> It would be great if we can find a setup that allows for connectors to
> >> be released independently of core Flink and that each connector can be
> >> released separately. Flink already has separate releases
> >> (flink-shaded), so that by itself isn't a new thing. Per-connector
> >> releases would need to allow for more frequent releases (without the
> >> baggage that a full Flink release comes with).
> >>
> >> Separate releases would only make sense if the core Flink surface is
> >> fairly stable though. As evident from Iceberg (and also Beam), that's
> >> not the case currently. We should probably focus on addressing the
> >> stability first, before splitting code. A success criteria could be
> >> that we are able to build Iceberg and Beam against multiple Flink
> >> versions w/o the need to change code. The goal would be that no
> >> connector breaks when we make changes to Flink core. Until that's the
> >> case, code separation creates a setup where 1+1 or N+1 repositories
> >> need to move lock step.
> >>
> >> Regarding some connectors being more important for Flink than others:
> >> That's a fact. Flink w/o Kafka connector (and few others) isn't
> >> viable. Testability of Flink was already brought up, can we really
> >> certify a Flink core release without Kafka connector? Maybe those
> >> connectors that are used in Flink e2e tests to validate functionality
> >> of core Flink should not be broken out?
> >>
> >> Finally, I think that the connectors that move into separate repos
> >> should remain part of the Apache Flink project. Larger organizations
> >> tend to approve the use of and contribution to open 

Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Chesnay Schepler
Could you clarify what release cadence you're thinking of? There's quite 
a big range that fits "more frequent than Flink" (per-commit, daily, 
weekly, bi-weekly, monthly, even bi-monthly).


On 19/10/2021 14:15, Martijn Visser wrote:

Hi all,

I think it would be a huge benefit if we can achieve more frequent releases
of connectors, which are not bound to the release cycle of Flink itself. I
agree that in order to get there, we need to have stable interfaces which
are trustworthy and reliable, so they can be safely used by those
connectors. I do think that work still needs to be done on those
interfaces, but I am confident that we can get there from a Flink
perspective.

I am worried that we would not be able to achieve those frequent releases
of connectors if we are putting these connectors under the Apache umbrella,
because that means that for each connector release we have to follow the
Apache release creation process. This requires a lot of manual steps and
prohibits automation and I think it would be hard to scale out frequent
releases of connectors. I'm curious how others think this challenge could
be solved.

Best regards,

Martijn

On Mon, 18 Oct 2021 at 22:22, Thomas Weise  wrote:


Thanks for initiating this discussion.

There are definitely a few things that are not optimal with our
current management of connectors. I would not necessarily characterize
it as a "mess" though. As the points raised so far show, it isn't easy
to find a solution that balances competing requirements and leads to a
net improvement.

It would be great if we can find a setup that allows for connectors to
be released independently of core Flink and that each connector can be
released separately. Flink already has separate releases
(flink-shaded), so that by itself isn't a new thing. Per-connector
releases would need to allow for more frequent releases (without the
baggage that a full Flink release comes with).

Separate releases would only make sense if the core Flink surface is
fairly stable though. As evident from Iceberg (and also Beam), that's
not the case currently. We should probably focus on addressing the
stability first, before splitting code. A success criteria could be
that we are able to build Iceberg and Beam against multiple Flink
versions w/o the need to change code. The goal would be that no
connector breaks when we make changes to Flink core. Until that's the
case, code separation creates a setup where 1+1 or N+1 repositories
need to move lock step.

Regarding some connectors being more important for Flink than others:
That's a fact. Flink w/o Kafka connector (and few others) isn't
viable. Testability of Flink was already brought up, can we really
certify a Flink core release without Kafka connector? Maybe those
connectors that are used in Flink e2e tests to validate functionality
of core Flink should not be broken out?

Finally, I think that the connectors that move into separate repos
should remain part of the Apache Flink project. Larger organizations
tend to approve the use of and contribution to open source at the
project level. Sometimes it is everything ASF. More often it is
"Apache Foo". It would be fatal to end up with a patchwork of projects
with potentially different licenses and governance to arrive at a
working Flink setup. This may mean we prioritize usability over
developer convenience, if that's in the best interest of Flink as a
whole.

Thanks,
Thomas



On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler 
wrote:

Generally, the issues are reproducibility and control.

Stuff's completely broken on the Flink side for a week? Well then so are
the connector repos.
(As-is) You can't go back to a previous version of the snapshot, which
also means that checking out older commits can be problematic, because
you'd still work against the latest snapshots, and they might not be
compatible with each other.


On 18/10/2021 15:22, Arvid Heise wrote:

I was actually betting on snapshots versions. What are the limits?
Obviously, we can only do a release of a 1.15 connector after 1.15 is
release.






Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Chesnay Schepler
TBH I think you're overestimating how much work it is to create a
non-Flink release. Having done most of the flink-shaded releases, I
really don't see an issue with even doing weekly releases using that process.


We cannot reduce the number of votes AFAIK; the ASF seems very clear on
that matter to me: 
https://www.apache.org/foundation/voting.html#ReleaseVotes

However, the vote duration is up to us.

Additionally, we only /need/ to vote on the /source/. This means we
don't need to create the Maven artifacts for each RC, but can do that at
the very end.


On 19/10/2021 14:21, Arvid Heise wrote:
Okay I think it is clear that the majority would like to keep 
connectors under the Apache Flink umbrella. That means we will not be 
able to have per-connector repositories and project management, 
automatic dependency bumping with Dependabot, or semi-automatic releases.


So then I'm assuming the directory structure that @Chesnay Schepler
proposed would be the most beneficial:

- A root project with some convenience setup.
- Unrelated subprojects with individual versioning and releases.
- Branches for minor Flink releases. That is needed anyhow to use new 
features independent of API stability.
- Each connector maintains its own documentation that is accessible 
through the main documentation.
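
A rough sketch of that layout (module names are hypothetical):

flink-connectors/
  pom.xml                   <- root project, convenience setup only
  flink-connector-kafka/    <- own version, own release cadence
    pom.xml
  flink-connector-hbase/
    pom.xml

plus one branch per supported Flink minor release (e.g. a v1.14 and a v1.15
branch).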


Any thoughts on alternatives? Do you see risks?

@Stephan Ewen mentioned offline that we
could adjust the bylaws for the connectors such that we need fewer
PMCs to approve a release. Would it be enough to have one PMC vote per
connector release? Do you know of other ways to tweak the release
process to involve less manual work?


On Mon, Oct 18, 2021 at 10:22 PM Thomas Weise  wrote:

Thanks for initiating this discussion.

There are definitely a few things that are not optimal with our
current management of connectors. I would not necessarily characterize
it as a "mess" though. As the points raised so far show, it isn't easy
to find a solution that balances competing requirements and leads to a
net improvement.

It would be great if we can find a setup that allows for connectors to
be released independently of core Flink and that each connector can be
released separately. Flink already has separate releases
(flink-shaded), so that by itself isn't a new thing. Per-connector
releases would need to allow for more frequent releases (without the
baggage that a full Flink release comes with).

Separate releases would only make sense if the core Flink surface is
fairly stable though. As evident from Iceberg (and also Beam), that's
not the case currently. We should probably focus on addressing the
stability first, before splitting code. A success criteria could be
that we are able to build Iceberg and Beam against multiple Flink
versions w/o the need to change code. The goal would be that no
connector breaks when we make changes to Flink core. Until that's the
case, code separation creates a setup where 1+1 or N+1 repositories
need to move lock step.

Regarding some connectors being more important for Flink than others:
That's a fact. Flink w/o Kafka connector (and few others) isn't
viable. Testability of Flink was already brought up, can we really
certify a Flink core release without Kafka connector? Maybe those
connectors that are used in Flink e2e tests to validate functionality
of core Flink should not be broken out?

Finally, I think that the connectors that move into separate repos
should remain part of the Apache Flink project. Larger organizations
tend to approve the use of and contribution to open source at the
project level. Sometimes it is everything ASF. More often it is
"Apache Foo". It would be fatal to end up with a patchwork of projects
with potentially different licenses and governance to arrive at a
working Flink setup. This may mean we prioritize usability over
developer convenience, if that's in the best interest of Flink as a
whole.

Thanks,
Thomas



On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler
 wrote:
>
> Generally, the issues are reproducibility and control.
>
> Stuff's completely broken on the Flink side for a week? Well then so are
> the connector repos.
> (As-is) You can't go back to a previous version of the snapshot, which
> also means that checking out older commits can be problematic, because
> you'd still work against the latest snapshots, and they might not be
> compatible with each other.
>
>
> On 18/10/2021 15:22, Arvid Heise wrote:
> > I was actually betting on snapshot versions. What are the limits?
> > Obviously, we can only do a release of a 1.15 connector after 1.15 is
> > released.
>
>



Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Dawid Wysakowicz
Hey all,

I don't have much to add to the general discussion. Just a single
comment on:

that we could adjust the bylaws for the connectors such that we need
fewer PMCs to approve a release. Would it be enough to have one PMC
vote per connector release?

I think it's not an option. This particular rule is one of few rules
from the bylaws that actually originates from ASF rather than was
established within the Flink community. I believe we do need 3 PMC votes
for any formal ASF releases [1].

Votes on whether a package is ready to release use majority
approval -- i.e. at least three PMC members must vote affirmatively
for release, and there must be more positive than negative votes.
Releases may not be vetoed. Generally the community will cancel the
release vote if anyone identifies serious problems, but in most
cases the ultimate decision lies with the individual serving as
release manager. The specifics of the process may vary from project
to project, but the 'minimum quorum of three +1 votes' rule is
universal.

Best,

Dawid

[1] https://www.apache.org/foundation/voting.html#ReleaseVotes


Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Konstantin Knauf
Thank you, Arvid & team, for working on this.

I would also favor one connector repository under the ASF. This will
already force us to provide better tools and more stable APIs, which
connectors developed outside of Apache Flink will benefit from, too.

Besides simplifying the formal release process for connectors, I believe we
can also be more liberal with Committership for connector maintainers.

I expect that this setup can scale better than the current one, but it
doesn't scale super well either. In addition, there is still the ASF
barrier to contributions/releases. So, we might have more connectors in
this repository than we have in Apache Flink right now, but not all
connectors will end up in this repository. For those "external" connectors,
we should still aim to improve visibility, documentation and tooling.

It feels like such a hybrid approach might be the only option given
competing requirements.

Thanks,

Konstantin



-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Arvid Heise
Okay I think it is clear that the majority would like to keep connectors
under the Apache Flink umbrella. That means we will not be able to have
per-connector repositories and project management, automatic dependency
bumping with Dependabot, or semi-automatic releases.

So then I'm assuming the directory structure that @Chesnay Schepler proposed
would be the most beneficial:
- A root project with some convenience setup.
- Unrelated subprojects with individual versioning and releases.
- Branches for minor Flink releases. That is needed anyhow to use new
features independent of API stability.
- Each connector maintains its own documentation that is accessible through
the main documentation.

Any thoughts on alternatives? Do you see risks?

@Stephan Ewen mentioned offline that we could adjust the
bylaws for the connectors such that we need fewer PMCs to approve a
release. Would it be enough to have one PMC vote per connector release? Do
you know of other ways to tweak the release process to require less manual
work?

>


Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Martijn Visser
Hi all,

I think it would be a huge benefit if we can achieve more frequent releases
of connectors, which are not bound to the release cycle of Flink itself. I
agree that in order to get there, we need to have stable interfaces which
are trustworthy and reliable, so they can be safely used by those
connectors. I do think that work still needs to be done on those
interfaces, but I am confident that we can get there from a Flink
perspective.

I am worried that we would not be able to achieve those frequent releases
of connectors if we are putting these connectors under the Apache umbrella,
because that means that for each connector release we have to follow the
Apache release creation process. This requires a lot of manual steps,
prohibits automation, and I think would make it hard to scale out frequent
releases of connectors. I'm curious how others think this challenge could
be solved.

Best regards,

Martijn

>


Re: [DISCUSS] Creating an external connector repository

2021-10-18 Thread Thomas Weise
Thanks for initiating this discussion.

There are definitely a few things that are not optimal with our
current management of connectors. I would not necessarily characterize
it as a "mess" though. As the points raised so far show, it isn't easy
to find a solution that balances competing requirements and leads to a
net improvement.

It would be great if we can find a setup that allows for connectors to
be released independently of core Flink and that each connector can be
released separately. Flink already has separate releases
(flink-shaded), so that by itself isn't a new thing. Per-connector
releases would need to allow for more frequent releases (without the
baggage that a full Flink release comes with).

Separate releases would only make sense if the core Flink surface is
fairly stable though. As evident from Iceberg (and also Beam), that's
not the case currently. We should probably focus on addressing the
stability first, before splitting code. A success criterion could be
that we are able to build Iceberg and Beam against multiple Flink
versions w/o the need to change code. The goal would be that no
connector breaks when we make changes to Flink core. Until that's the
case, code separation creates a setup where 1+1 or N+1 repositories
need to move in lockstep.

Regarding some connectors being more important for Flink than others:
That's a fact. Flink w/o the Kafka connector (and a few others) isn't
viable. Testability of Flink was already brought up: can we really
certify a Flink core release without the Kafka connector? Maybe those
connectors that are used in Flink e2e tests to validate functionality
of core Flink should not be broken out?

Finally, I think that the connectors that move into separate repos
should remain part of the Apache Flink project. Larger organizations
tend to approve the use of and contribution to open source at the
project level. Sometimes it is everything ASF. More often it is
"Apache Foo". It would be fatal to end up with a patchwork of projects
with potentially different licenses and governance to arrive at a
working Flink setup. This may mean we prioritize usability over
developer convenience, if that's in the best interest of Flink as a
whole.

Thanks,
Thomas





Re: [DISCUSS] Creating an external connector repository

2021-10-18 Thread Chesnay Schepler

Generally, the issues are reproducibility and control.

Stuff's completely broken on the Flink side for a week? Well then so are
the connector repos.
(As-is) You can't go back to a previous version of the snapshot. Which 
also means that checking out older commits can be problematic because 
you'd still work against the latest snapshots, and they may not be
compatible with each other.



On 18/10/2021 15:22, Arvid Heise wrote:

I was actually betting on snapshot versions. What are the limits?
Obviously, we can only do a release of a 1.15 connector after 1.15 is
released.





Re: [DISCUSS] Creating an external connector repository

2021-10-18 Thread Chesnay Schepler

I think you're misinterpreting my comment.

Independent of the repo split, we should only keep the connectors in
the Flink project that we actively maintain.

The rest we might as well just drop.
If some external people are interested in maintaining these connectors 
then there's nothing stopping them from doing so.


For example, I don't think our Cassandra connector is in good
shape, nor does it appear to be a big priority.
I would not mind us dropping it (== moving it into some external
repo; to me that's all the same).

Kafka would be a different story.

On 18/10/2021 15:22, Arvid Heise wrote:

I would like to avoid treating some connectors different from other
connectors by design. In reality, we can assume that some connectors will
receive more love than others. However, if we already treat some connectors
"better" than others we may run in a vicious cycle where the "bad" ones
never improve.
Nevertheless, I'd also be fine to just start with some of them and move
others later.





Re: [DISCUSS] Creating an external connector repository

2021-10-18 Thread Arvid Heise
Hi folks,

thanks for joining the discussion. I'd like to give some ideas on how
certain concerns are going to be addressed:

Ingo:
> In general I think breaking up the big repo would be a good move with many
> benefits (which you have outlined already). One concern would be how to
> proceed with our docs / examples if we were to really separate out all
> connectors.
>

I don't see any issue at all with either option. You'd just have to update
the connector dependency for blog posts and starter examples.
Each connector page should provide specific examples itself.
Note that I would keep File Source/Sink in the main repo as they don't add
dependencies on their own. Formats and Filesystem may be externalized at a
much later point, after we have gained more knowledge on how to build a real
ecosystem with connectors.


> 1. More real-life examples would essentially now depend on external
> projects. Particularly if hosted outside the ASF, this would feel somewhat
> odd. Or to put it differently, if flink-connector-foo is not part of Flink
> itself, should the Flink Docs use it for any examples?
>
Why not? We also have blog posts that use external dependencies.

> 2. Generation of documentation (config options) wouldn't be possible unless
> the docs depend on these external projects, which would create weird
> version dependency cycles (Flink 1.X's docs depend on flink-connector-foo
> 1.X which depends on Flink 1.X).
>
Config options that are connector specific should only appear on the
connector pages. So we need to incorporate the config option generation in
the connector template.
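
To make this concrete: each connector could keep declaring its options as
plain ConfigOption fields, which the config option generation in the
connector template can then scan. A rough sketch (the class and the option
names below are made up):

    import org.apache.flink.configuration.ConfigOption;
    import org.apache.flink.configuration.ConfigOptions;

    /** Options of a hypothetical flink-connector-foo, scanned by the docs generator. */
    public class FooConnectorOptions {

        public static final ConfigOption<String> ENDPOINT =
                ConfigOptions.key("foo.endpoint")
                        .stringType()
                        .noDefaultValue()
                        .withDescription("Address of the Foo service to read from.");

        public static final ConfigOption<Integer> MAX_RETRIES =
                ConfigOptions.key("foo.max-retries")
                        .intType()
                        .defaultValue(3)
                        .withDescription("Number of retries before a request fails.");
    }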


> 3. Documentation would inevitably be much less consistent when split across
> many repositories.
>
Fair point. If we use the same template as Flink Web UI for connectors, we
could embed subpages directly in the main documentation. If we allow that
for all connectors, it would actually be less fragmented than now, where some
connectors are only described in Bahir or on external pages.


> As for your approaches, how would (A) allow hosting personal / company
> projects if only Flink committers can write to it?
>
That's entirely independent. In both options and even now, there are
several connectors living on other pages. They are currently only findable
through a search engine, and we should fix that anyhow. See [1] for an
example of how Kafka Connect is doing it.

> Connectors may receive some sort of quality seal
>
> This sounds like a lot of work and process, and could easily become a
> source of frustration.
>
Yes, this is definitely some effort, but strictly less than maintaining the
connector in the community, as it's only an irregular review.


Chesnay:
> What I'm concerned about, and which we never really covered in past
> discussions about split repositories, are
> a) ways to share infrastructure (e.g., CI/release utilities/codestyle)
>
I'd provide a common GitHub connector template that contains everything. That
of course means making things public.

> b) testing
>
See below

> c) documentation integration
>
See Ingo's response.

>
> Particularly for b) we still lack any real public utilities.
> Even fundamental things such as the MiniClusterResource are not
> annotated in any way.
> I would argue that we need to sort this out before a split can happen.
> We've seen with the flink-benchmarks repo and recent discussions how
> easily things can break.
>
Yes, I agree, but given that we already have connectors outside of the main
repo, the situation can only improve. By moving the connectors out, we are
actually forced to provide a level playing field for everyone and thus really
enable the community to contribute connectors.
We also plan to finish the connector testing framework in 1.15.
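
To give an idea of what connector repos would build their integration tests
on today, here is a minimal sketch against flink-test-utils (JUnit 4
assumed; FooSourceITCase and the Foo connector itself are made up):

    import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.DiscardingSink;
    import org.apache.flink.test.util.MiniClusterWithClientResource;
    import org.junit.ClassRule;
    import org.junit.Test;

    public class FooSourceITCase {

        // Shared mini cluster for all tests in this class; exactly the kind of
        // test utility an external connector repo would depend on.
        @ClassRule
        public static final MiniClusterWithClientResource MINI_CLUSTER =
                new MiniClusterWithClientResource(
                        new MiniClusterResourceConfiguration.Builder()
                                .setNumberTaskManagers(1)
                                .setNumberSlotsPerTaskManager(2)
                                .build());

        @Test
        public void testJobRunsAgainstMiniCluster() throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(2);
            // A real test would read from the Foo source instead of fromElements.
            env.fromElements(1, 2, 3).addSink(new DiscardingSink<>());
            env.execute("foo-source-smoke-test");
        }
    }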

> Related to that, there is the question of how Flink is then supposed to
> ensure that things don't break. My impression is that we heavily rely on
> the connector tests to that end at the moment.
> Similarly, what connector (version) would be used for examples (like the
> WordCount which reads from Kafka) or (e2e) tests that want to read
> something other than a file? You end up with this circular dependency
> which are always troublesome.
>
I agree that we must avoid any kind of circular dependencies. There are a
couple of options that we probably are going to mix:
* Move connector specific e2e tests into connector repo
* Have nightly builds on connector repo and collect results in some
overview.
* React on failures, especially if several connectors fail at once.
* Have an e2e repo/module in Flink that has cross-connector tests etc.

> As for the repo structure, I would think that a single one could
> work quite well (because having 10+ connector repositories is just a
> mess), but currently I wouldn't set it up as a single project.
> I would rather have something like N + 1 projects (one for each
> connectors + a shared testing project) which are released individually
> as required, without any snapshot dependencies in-between.
> Then 1 branch for 

Re: [DISCUSS] Creating an external connector repository

2021-10-18 Thread Leonard Xu
Hi all,

I understand very well that the maintainers in the community want to move the
connectors to an external system. Indeed, developing and maintaining the
connectors requires a lot of energy, and they do not involve the Flink core
framework, so moving them out can reduce the maintenance pressure on the
community side.

I only have one concern. Once we migrate these connectors to external projects,
how can we ensure their high quality? All the built-in connectors of Flink
are developed or reviewed by the committers, and reported connector bugs from
JIRA and the mailing lists are currently fixed quickly. How does the Flink
community ensure the development rhythm of the connectors after the move? In
other words, will these connectors still be first-class citizens of the Flink
community, and if so, how do we guarantee that?

Recently I have been maintaining a series of CDC connectors in the Flink CDC
project [1], and my feeling is that developing and maintaining connectors is
not easy. Contributors to the Flink CDC project have taken some approaches in
this direction, such as building connector integration tests [2] and
documentation management [3]. Personally, I don't have a strong preference for
moving the built-in connectors out or keeping them. If the final decision of
this discussion turns out to be moving them out, I'm happy to share our
experience and provide help in the new connector project.

Best,
Leonard
[1]https://github.com/ververica/flink-cdc-connectors
[2]https://github.com/ververica/flink-cdc-connectors/runs/3902664601
[3]https://ververica.github.io/flink-cdc-connectors/master/


Re: [DISCUSS] Creating an external connector repository

2021-10-18 Thread David Morávek
We are mostly talking about the freedom this would bring to the connector
authors, but we still don't have answers for the important topics:

- How exactly are we going to maintain the high quality standard of the
connectors?
- What would the connector release cycle look like? Is this going to
affect the Flink release cycle?
- What would the documentation process / generation look like?
- Not all of the connectors rely solely on the Stable APIs. Moving them
outside of the Flink code-base will make any refactoring on the Flink side
significantly more complex, as it potentially needs to be reflected in all
connectors. There are some possible solutions, such as Gradle's included
builds, but we're far away from that. How are we planning to address this?
- How would we develop connectors against an unreleased Flink version? Java
snapshots have many limits when used for cross-repository development.
- With appropriate tooling, this whole thing is achievable even with the
single repository that we already have. It's just a matter of having a more
fine-grained build / release process. Have you tried to research this
option?

I'd personally strongly suggest against moving the connectors out of the
ASF umbrella. The ASF brings legal guarantees, hard-gained trust of the
users and high quality standards to the table. I still fail to see any good
reason for giving this up. Also this decision would be hard to reverse,
because it would most likely require a new donation to the ASF (would this
require consent from all contributors, as there is no clear ownership?).

Best,
D.



Re: [DISCUSS] Creating an external connector repository

2021-10-18 Thread Qingsheng Ren
Thanks for driving this discussion, Arvid! I think this will be one giant leap
for the Flink community. Externalizing connectors would give connector
developers more freedom in developing, releasing and maintaining them, which
can attract more developers to contribute their connectors and expand the
Flink ecosystem.

Considering where to host connectors, I prefer to use an individual
organization outside the Apache umbrella. If we keep all connectors under
Apache, I think there's not much difference compared to keeping them in the
Flink main repo. Connector developers would still require permissions from
Flink committers to contribute, and the release process would have to follow
Apache rules, which goes against our initial motivation for externalizing
connectors.

Using an individual GitHub organization will maximize the freedom provided to
developers. An ideal structure in my mind would be like
"github.com/flink-connectors/flink-connector-xxx". The newly established
flink-extended org might be another choice, but considering the number of
connectors, I prefer an individual org for connectors to avoid flooding the
other repos under flink-extended.

In the meantime, we need to provide a well-established standard / guideline
for contributing connectors, including CI, testing and docs (maybe we can't
provide resources for running them, but we should give enough guidance on how
to set them up), to keep the quality of connectors high. I'm happy to help
build these fundamental bricks. Also, since the Kafka connector is widely used
among Flink users, we can make it a "model" of how to build and contribute a
well-qualified connector into the Flink ecosystem, and we can still use this
trusted one for Flink E2E tests.

Again, I believe this will definitely boost the expansion of the Flink ecosystem.
Very excited to see the progress!

Best,

Qingsheng Ren

Re: [DISCUSS] Creating an external connector repository

2021-10-15 Thread Chesnay Schepler
My opinion of splitting the Flink repositories hasn't changed; I'm still 
in favor of it.


While it would technically be possible to release individual connectors 
even if they are part of the Flink repo,
it is quite a hassle to do so and error-prone due to the current branch
structure.

A split would also force us to watch out much more for API stability.

I'm gonna assume that we will move out all connectors:

What I'm concerned about, and which we never really covered in past 
discussions about split repositories, are

a) ways to share infrastructure (e.g., CI/release utilities/codestyle)
b) testing
c) documentation integration

Particularly for b) we still lack any real public utilities.
Even fundamental things such as the MiniClusterResource are not 
annotated in any way.

I would argue that we need to sort this out before a split can happen.
We've seen with the flink-benchmarks repo and recent discussions how 
easily things can break.
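
To illustrate, sorting this out could start with putting our existing
stability annotations onto the test utilities that connector repos are
supposed to rely on. A sketch (the annotations exist in flink-annotations;
the annotated test base below is hypothetical):

    import org.apache.flink.annotation.Public;
    import org.apache.flink.annotation.PublicEvolving;

    /** Hypothetical example: a test base explicitly marked as stable for connector repos. */
    @Public
    public class ConnectorTestBase {

        /** Newer, still-moving pieces can be marked separately. */
        @PublicEvolving
        public void runWithMiniCluster(Runnable test) {
            // Start a mini cluster, run the test, shut the cluster down (details elided).
            test.run();
        }
    }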


Related to that, there is the question of how Flink is then supposed to
ensure that things don't break. My impression is that we heavily rely on 
the connector tests to that end at the moment.
Similarly, what connector (version) would be used for examples (like the 
WordCount which reads from Kafka) or (e2e) tests that want to read 
something other than a file? You end up with these circular dependencies,
which are always troublesome.


As for the repo structure, I would think that a single one could
work quite well (because having 10+ connector repositories is just a 
mess), but currently I wouldn't set it up as a single project.
I would rather have something like N + 1 projects (one for each 
connectors + a shared testing project) which are released individually 
as required, without any snapshot dependencies in-between.
Then 1 branch for each major Flink version (again, no snapshot 
dependencies). Individual connectors can be released at any time against 
any of the latest bugfix releases, which due to lack of binaries (and 
python releases) would be a breeze.


I don't like the idea of moving existing connectors out of the Apache 
organization. At the very least, not all of them. While some are 
certainly ill-maintained (e.g., Cassandra), where it would be neat if
external projects could maintain them, others (like Kafka) are not, and are
quite fundamental to actually using Flink.



Re: [DISCUSS] Creating an external connector repository

2021-10-15 Thread Ingo Bürk
Hi Arvid,

In general I think breaking up the big repo would be a good move with many
benefits (which you have outlined already). One concern would be how to
proceed with our docs / examples if we were to really separate out all
connectors.

1. More real-life examples would essentially now depend on external
projects. Particularly if hosted outside the ASF, this would feel somewhat
odd. Or to put it differently, if flink-connector-foo is not part of Flink
itself, should the Flink Docs use it for any examples?
2. Generation of documentation (config options) wouldn't be possible unless
the docs depend on these external projects, which would create weird
version dependency cycles (Flink 1.X's docs depend on flink-connector-foo
1.X which depends on Flink 1.X).
3. Documentation would inevitably be much less consistent when split across
many repositories.

As for your approaches, how would (A) allow hosting personal / company
projects if only Flink committers can write to it?

> Connectors may receive some sort of quality seal

This sounds like a lot of work and process, and could easily become a
source of frustration.


Best
Ingo

On Fri, Oct 15, 2021 at 2:47 PM Arvid Heise wrote:

> Dear community,
>
> Today I would like to kickstart a series of discussions around creating an
> external connector repository. The main idea is to decouple the release
> cycle of Flink from the release cycles of the connectors. This is a common
> approach in other big data analytics projects and seems to scale better
> than the current approach. In particular, it will yield the following
> changes.
>
>
> - Faster releases of connectors: New features can be added more quickly,
>   bugs can be fixed immediately, and we can have faster security patches in
>   case of direct or indirect (through dependencies) security flaws.
> - New features can be added to old Flink versions: If the connector API
>   didn’t change, the same connector jar may be used with different Flink
>   versions. Thus, new features can also immediately be used with older Flink
>   versions. A compatibility matrix on each connector page will help users to
>   find suitable connector versions for their Flink versions.
> - More activity and contributions around connectors: If we ease the
>   contribution and development process around connectors, we will see faster
>   development and also more connectors. Since that heavily depends on the
>   chosen approach discussed below, more details will be shown there.
> - An overhaul of the connector page: In the future, all known connectors
>   will be shown on the same page in a similar layout independent of where
>   they reside. They could be hosted on external project pages (e.g., Iceberg
>   and Hudi), on some company page, or may stay within the main Flink
>   repository. Connectors may receive some sort of quality seal such that
>   users can quickly assess the production-readiness, and we could also add
>   which community/company promises which kind of support.
> - If we take (some) connectors out of Flink, Flink CI will be faster and
>   Flink devs will experience fewer build instabilities (which mostly come
>   from connectors). That would also speed up Flink development.
>
>
> Now I’d first like to collect your viewpoints on the ideal state. Let’s
> first recap which approaches we currently have:
>
>
> - We have half of the connectors in the main Flink repository.
>   Relatively few of them have received updates in the past couple of months.
> - Another large chunk of connectors are in Apache Bahir. It recently has
>   seen the first release in 3 years.
> - There are a few other (Apache) projects that maintain a Flink
>   connector, such as Apache Iceberg, Apache Hudi, and Pravega.
> - A few connectors are listed on company-related repositories, such as
>   Apache Pulsar on StreamNative and CDC connectors on Ververica.
>
>
> My personal observation is that having a repository per connector seems to
> increase the activity on a connector as it’s easier to maintain. For
> example, in Apache Bahir all connectors are built against the same Flink
> version, which may not be desirable when certain APIs change; for example,
> SinkFunction will be eventually deprecated and removed but new Sink
> interface may gain more features.
>
> Now, I'd like to outline different approaches. All approaches will allow
> you to host your connector on any kind of personal, project, or company
> repository. We still want to provide a default place where users can
> contribute their connectors and hopefully grow a community around it. The
> approaches are:
>
>
> 1. Create a mono-repo under the Apache umbrella where all connectors will
>    reside, for example, github.com/apache/flink-connectors. That
>    repository needs to follow its rules: No GitHub issues, no Dependabot or
>    similar tools, and a strict