Hi Arvid,

In general I think breaking up the big repo would be a good move with many
benefits (which you have outlined already). One concern would be how to
proceed with our docs / examples if we were to really separate out all
connectors.

1. Real-life examples would then essentially depend on external
projects. Particularly if these are hosted outside the ASF, this would feel
somewhat odd. Or, to put it differently: if flink-connector-foo is not part of
Flink itself, should the Flink docs use it for any examples?
2. Generating documentation (e.g., for config options) wouldn't be possible unless
the docs depend on these external projects, which would create weird
version dependency cycles (Flink 1.X's docs depend on flink-connector-foo
1.X which depends on Flink 1.X).
3. Documentation would inevitably be much less consistent when split across
many repositories.

As for your approaches, how would (A), the mono-repo under the Apache
umbrella, allow hosting personal / company projects if only Flink committers
can write to it?

> Connectors may receive some sort of quality seal

This sounds like a lot of work and process, and could easily become a
source of frustration.


Best
Ingo

On Fri, Oct 15, 2021 at 2:47 PM Arvid Heise <ar...@apache.org> wrote:

> Dear community,
>
> Today I would like to kickstart a series of discussions around creating an
> external connector repository. The main idea is to decouple the release
> cycle of Flink from the release cycles of the connectors. This is a common
> approach in other big data analytics projects and seems to scale better
> than the current approach. In particular, it will yield the following
> changes.
>
> - Faster releases of connectors: New features can be added more quickly,
>   bugs can be fixed immediately, and we can have faster security patches in
>   case of direct or indirect (through dependencies) security flaws.
> - New features can be added to old Flink versions: If the connector API
>   didn’t change, the same connector jar may be used with different Flink
>   versions. Thus, new features can also immediately be used with older Flink
>   versions. A compatibility matrix on each connector page will help users
>   find suitable connector versions for their Flink versions.
> - More activity and contributions around connectors: If we ease the
>   contribution and development process around connectors, we will see faster
>   development and also more connectors. Since this depends heavily on the
>   chosen approach, more details follow in the discussion of approaches below.
> - An overhaul of the connector page: In the future, all known connectors
>   will be shown on the same page in a similar layout, independent of where
>   they reside. They could be hosted on external project pages (e.g., Iceberg
>   and Hudi), on some company page, or may stay within the main Flink
>   repository. Connectors may receive some sort of quality seal such that users
>   can quickly assess their production readiness, and we could also indicate
>   which community/company promises which kind of support.
> - If we move (some) connectors out of Flink, Flink CI will be faster
>   and Flink devs will experience fewer build instabilities (which mostly come
>   from connectors). That would also speed up Flink development.
>
> Now I’d first like to collect your viewpoints on the ideal state. Let’s
> first recap the approaches we currently have:
>
> - We have half of the connectors in the main Flink repository.
>   Relatively few of them have received updates in the past couple of months.
> - Another large chunk of connectors is in Apache Bahir. It recently saw
>   its first release in 3 years.
> - There are a few other (Apache) projects that maintain a Flink
>   connector, such as Apache Iceberg, Apache Hudi, and Pravega.
> - A few connectors are hosted in company-related repositories, such as
>   the Apache Pulsar connector on StreamNative and the CDC connectors on
>   Ververica.
>
> My personal observation is that having a repository per connector seems to
> increase the activity on a connector as it’s easier to maintain. For
> example, in Apache Bahir all connectors are built against the same Flink
> version, which may not be desirable when certain APIs change; for example,
> SinkFunction will eventually be deprecated and removed, while the new Sink
> interface may gain more features.
>
> Now, I'd like to outline different approaches. All approaches will allow
> you to host your connector on any kind of personal, project, or company
> repository. We still want to provide a default place where users can
> contribute their connectors and hopefully grow a community around it. The
> approaches are:
>
> 1. Create a mono-repo under the Apache umbrella where all connectors will
>    reside, for example, github.com/apache/flink-connectors. That
>    repository would need to follow the ASF's rules: No GitHub issues, no
>    Dependabot or similar tools, and a strict manual release process. It would
>    be under the Flink community, such that only Flink committers can write to
>    that repository.
> 2. Create a GitHub organization with small repositories, for example
>    github.com/flink-connectors. Since it’s not under the Apache umbrella,
>    we are free to use whatever process we deem best (up to a future
>    discussion). Each repository can have a shared list of maintainers +
>    connector-specific committers. We can provide more automation. We may even
>    allow different licenses to incorporate things like a connector to Oracle
>    that cannot be released under ASL.
> 3. ??? <- please provide your additional approaches
>
> In both cases, we will provide opinionated module/repository templates
> based on a connector testing framework and guidelines. Depending on the
> approach, we may need to enforce certain things.
>
> I’d like to first focus on what the community would ideally seek and
> minimize the discussions around legal issues, which we would discuss later.
> For now, I’d also like to postpone the discussion of whether we move all or
> only a subset of connectors from Flink to the new default place, as it seems
> to be orthogonal to the fundamental discussion.
>
> PS: If the external repository for connectors is successful, I’d also like
> to move out other things like formats, filesystems, and metric reporters in
> the far future. So I’m actually aiming for
> github.com/(apache/)flink-packages. But again this discussion is
> orthogonal to the basic one.
>
> PPS: Depending on the chosen approach, there may be synergies with the
> recently approved flink-extended organization.
>
