Hi all,

My name is Kyle, and I'm an open source developer primarily focused on Apache Iceberg.
I'm happy to help clarify or elaborate on any aspect of our experience working on a relatively decoupled connector that is downstream and pretty popular. I'd also love to be able to contribute or assist in any way I can. I don't mean to thread-jack, but are there any meetings or community sync-ups, specifically around the connector APIs, that I might join or be invited to?

I did want to add that even though I've experienced some of the pain points of integrating with an evolving system / API (catalog support is, generally speaking, pretty new everywhere in this space), I personally agree that you shouldn't slow down development velocity too much for the sake of external connectors. Getting to a performant and stable place should be the primary goal, and slowing that down to support stragglers will (in my opinion) always be a losing game. Some folks will simply stay behind on versions regardless, until they have to upgrade.

I am working on ensuring that the Iceberg community stays within 1-2 versions of Flink, so that we can help provide more feedback or contribute things that might improve our ability to support multiple Flink runtimes / versions from one project / codebase with minimal to no reflection (our desired goal).

If there's anything I can do or any way I can be of assistance, please don't hesitate to reach out. Or find me on ASF Slack 😀 I greatly appreciate your general concern for the needs of downstream connector integrators!

Cheers,
Kyle Bendickson (GitHub: kbendick)
Open Source Developer
kyle [at] tabular [dot] io

On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise <t...@apache.org> wrote:
> Hi,
>
> I see the stable core Flink API as a prerequisite for modularity. And for connectors it is not just the source and sink API (the source being stable as of 1.14), but everything that is required to build and maintain a connector downstream, such as the test utilities and infrastructure.
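Kyle's goal of supporting multiple Flink runtimes / versions from one codebase with minimal reflection can be sketched as a small version shim that routes to per-version adapters. Everything here is hypothetical illustration, not real Iceberg or Flink API: the class and adapter names are invented, and a real connector would return adapter instances compiled in version-specific source sets rather than name strings.

```java
import java.util.Map;

// Hypothetical sketch: a connector supporting several Flink minor versions
// from one codebase picks a version-specific adapter once, up front, so no
// reflection is needed on the hot path. Names are illustrative only.
public final class FlinkVersionShim {

    /** Extracts "major.minor" from a full version like "1.14.3" or "1.15-SNAPSHOT". */
    static String majorMinor(String fullVersion) {
        String[] parts = fullVersion.split("[.-]");
        return parts[0] + "." + parts[1];
    }

    // One adapter per supported Flink minor version; in a real project these
    // would live in separate source sets compiled against that Flink version.
    private static final Map<String, String> ADAPTERS = Map.of(
            "1.13", "SinkAdapterV113",
            "1.14", "SinkAdapterV114");

    static String adapterFor(String flinkVersion) {
        String adapter = ADAPTERS.get(majorMinor(flinkVersion));
        if (adapter == null) {
            throw new IllegalStateException("Unsupported Flink version: " + flinkVersion);
        }
        return adapter;
    }

    public static void main(String[] args) {
        System.out.println(adapterFor("1.14.3"));   // SinkAdapterV114
        System.out.println(adapterFor("1.13.6"));   // SinkAdapterV113
    }
}
```

The design point is that version dispatch happens exactly once, at startup, and unsupported combinations fail fast instead of breaking at some later call site.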
> Without the stable surface of core Flink, changes will leak into downstream dependencies and force lock-step updates. Refactoring across N repos is more painful than in a single repo. Those with experience developing downstream of Flink will know the pain, and that isn't limited to connectors. I don't remember a Flink "minor version" update that was just a dependency version change and did not force other downstream changes.
>
> Imagine a project with a complex set of dependencies. Let's say Flink version A plus Flink-reliant dependencies released by other projects (Flink-external connectors, Beam, Iceberg, Hudi, ...). We don't want a situation where we bump the core Flink version to B and things fall apart (interface changes, utilities that were useful but not public, transitive dependencies, etc.).
>
> The discussion here also highlights the benefits of keeping certain connectors outside Flink, whether that is due to differences in developer community, maturity of the connectors, their specialized/limited usage, etc. I would like to see that as a sign of a growing ecosystem, and most of the ideas that Arvid has put forward would benefit further growth of the connector ecosystem.
>
> As for keeping connectors within Apache Flink: I prefer that as the path forward for "essential" connectors like FileSource, KafkaSource, ... And we can still achieve a more flexible and faster release cycle.
>
> Thanks,
> Thomas
>
> On Wed, Oct 20, 2021 at 3:32 AM Jark Wu <imj...@gmail.com> wrote:
> >
> > Hi Konstantin,
> >
> > > the connectors need to be adapted and require at least one release per Flink minor release.
> >
> > However, this will make the releases of connectors slower, e.g. maintaining features across multiple branches and releasing from multiple branches. I think the main purpose of having an external connector repository is to have "faster releases of connectors"?
> >
> > From the perspective of the CDC connector maintainers, the biggest advantage of maintaining it outside of the Flink project is that:
> > 1) we can have a more flexible and faster release cycle
> > 2) we can be more liberal with committership for connector maintainers, which can also attract more committers to help with releases.
> >
> > Personally, I think maintaining one connector repository under the ASF may not have the above benefits.
> >
> > Best,
> > Jark
> >
> > On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf <kna...@apache.org> wrote:
> > >
> > > Hi everyone,
> > >
> > > Regarding the stability of the APIs: I think everyone agrees that connector APIs which are stable across minor versions (1.13 -> 1.14) are the mid-term goal. But:
> > >
> > > a) These APIs are still quite young, and we shouldn't make them @Public prematurely either.
> > >
> > > b) Isn't this *mostly* orthogonal to where the connector code lives? Yes, as long as there are breaking changes, the connectors need to be adapted and require at least one release per Flink minor release. Documentation-wise this can be addressed via a compatibility matrix for each connector, as Arvid suggested. IMO we shouldn't block this effort on the stability of the APIs.
> > >
> > > Cheers,
> > >
> > > Konstantin
> > >
> > > On Wed, Oct 20, 2021 at 8:56 AM Jark Wu <imj...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I think Thomas raised very good questions and would like to know your opinions if we want to move connectors out of Flink in this version.
> > > >
> > > > (1) Is the connector API already stable?
> > > >
> > > > > Separate releases would only make sense if the core Flink surface is fairly stable though. As evident from Iceberg (and also Beam), that's not the case currently. We should probably focus on addressing the stability first, before splitting code.
> > > > > A success criterion could be that we are able to build Iceberg and Beam against multiple Flink versions w/o the need to change code. The goal would be that no connector breaks when we make changes to Flink core. Until that's the case, code separation creates a setup where 1+1 or N+1 repositories need to move in lock step.
> > > >
> > > > From another discussion thread [1], the connector API is far from stable. Currently, it's hard to build connectors against multiple Flink versions. There are breaking API changes both in 1.12 -> 1.13 and in 1.13 -> 1.14, and maybe also in future versions, because the Table-related APIs are still @PublicEvolving and the new Sink API is still @Experimental.
> > > >
> > > > (2) Flink testability without connectors.
> > > >
> > > > > Flink w/o the Kafka connector (and a few others) isn't viable. Testability of Flink was already brought up: can we really certify a Flink core release without the Kafka connector? Maybe those connectors that are used in Flink e2e tests to validate functionality of core Flink should not be broken out?
> > > >
> > > > This is a very good question. How can we guarantee the new Source and Sink APIs are stable with only test implementations?
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler <ches...@apache.org> wrote:
> > > >
> > > > > Could you clarify what release cadence you're thinking of? There's quite a big range that fits "more frequent than Flink" (per-commit, daily, weekly, bi-weekly, monthly, even bi-monthly).
> > > > >
> > > > > On 19/10/2021 14:15, Martijn Visser wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > I think it would be a huge benefit if we can achieve more frequent releases of connectors, which are not bound to the release cycle of Flink itself.
> > > > > > I agree that in order to get there, we need to have stable interfaces which are trustworthy and reliable, so that they can be safely used by those connectors. I do think that work still needs to be done on those interfaces, but I am confident that we can get there from a Flink perspective.
> > > > > >
> > > > > > I am worried that we would not be able to achieve those frequent releases of connectors if we put these connectors under the Apache umbrella, because that means that for each connector release we have to follow the Apache release creation process. This requires a lot of manual steps, prohibits automation, and I think it would be hard to scale out frequent releases of connectors. I'm curious how others think this challenge could be solved.
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Martijn
> > > > > >
> > > > > > On Mon, 18 Oct 2021 at 22:22, Thomas Weise <t...@apache.org> wrote:
> > > > > >
> > > > > > > Thanks for initiating this discussion.
> > > > > > >
> > > > > > > There are definitely a few things that are not optimal with our current management of connectors. I would not necessarily characterize it as a "mess" though. As the points raised so far show, it isn't easy to find a solution that balances competing requirements and leads to a net improvement.
> > > > > > >
> > > > > > > It would be great if we can find a setup that allows connectors to be released independently of core Flink, and each connector to be released separately. Flink already has separate releases (flink-shaded), so that by itself isn't a new thing.
> > > > > > > Per-connector releases would need to allow for more frequent releases (without the baggage that a full Flink release comes with).
> > > > > > >
> > > > > > > Separate releases would only make sense if the core Flink surface is fairly stable though. As evident from Iceberg (and also Beam), that's not the case currently. We should probably focus on addressing the stability first, before splitting code. A success criterion could be that we are able to build Iceberg and Beam against multiple Flink versions w/o the need to change code. The goal would be that no connector breaks when we make changes to Flink core. Until that's the case, code separation creates a setup where 1+1 or N+1 repositories need to move in lock step.
> > > > > > >
> > > > > > > Regarding some connectors being more important for Flink than others: that's a fact. Flink w/o the Kafka connector (and a few others) isn't viable. Testability of Flink was already brought up: can we really certify a Flink core release without the Kafka connector? Maybe those connectors that are used in Flink e2e tests to validate functionality of core Flink should not be broken out?
> > > > > > >
> > > > > > > Finally, I think that the connectors that move into separate repos should remain part of the Apache Flink project. Larger organizations tend to approve the use of and contribution to open source at the project level. Sometimes it is everything ASF; more often it is "Apache Foo". It would be fatal to end up with a patchwork of projects with potentially different licenses and governance to arrive at a working Flink setup. This may mean we prioritize usability over developer convenience, if that's in the best interest of Flink as a whole.
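The "fairly stable surface" criterion above ties back to Jark's point about @PublicEvolving and @Experimental: Flink encodes its compatibility promises as annotations on API classes. The real annotations live in Flink's org.apache.flink.annotation package; the sketch below defines a stand-in annotation of the same name purely so the snippet compiles without Flink on the classpath, and the annotated class is hypothetical.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Stand-in demonstration of how stability annotations gate compatibility
// guarantees. Not Flink's actual annotation; a real connector would check
// the docs/annotation of each API it touches before relying on it.
public class StabilityDemo {

    @Retention(RetentionPolicy.RUNTIME)
    @interface PublicEvolving {}  // stand-in: API may break across minor releases (1.13 -> 1.14)

    @PublicEvolving
    static class TableSinkApi {}  // hypothetical API class a connector depends on

    public static void main(String[] args) {
        boolean evolving = TableSinkApi.class.isAnnotationPresent(PublicEvolving.class);
        System.out.println(evolving ? "may break across minor releases" : "stable");
    }
}
```

A downstream connector that only depends on @Public API could in principle compile unchanged against several Flink minors; any @PublicEvolving or @Experimental dependency is where the lock-step pain described in this thread comes from.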
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Thomas
> > > > > > >
> > > > > > > On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler <ches...@apache.org> wrote:
> > > > > > >
> > > > > > > > Generally, the issues are reproducibility and control.
> > > > > > > >
> > > > > > > > Stuff's completely broken on the Flink side for a week? Well, then so are the connector repos.
> > > > > > > >
> > > > > > > > (As-is) You can't go back to a previous version of the snapshot. Which also means that checking out older commits can be problematic, because you'd still work against the latest snapshots, and they may not be compatible with each other.
> > > > > > > >
> > > > > > > > On 18/10/2021 15:22, Arvid Heise wrote:
> > > > > > > > > I was actually betting on snapshot versions. What are the limits? Obviously, we can only do a release of a 1.15 connector after 1.15 is released.
>
> > >
> > > --
> > > Konstantin Knauf
> > > https://twitter.com/snntrable
> > > https://github.com/knaufk
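Chesnay's reproducibility concern reduces to a simple invariant: a build of a given commit is only reproducible if none of its dependency coordinates are `-SNAPSHOT` versions, because a Maven snapshot artifact is republished over time and two resolutions of the same coordinate can yield different bits. A minimal sketch of that invariant (the helper is illustrative, not part of any real build tool):

```java
import java.util.List;

// Illustrative check: flags a dependency set as non-reproducible if any
// version is a Maven -SNAPSHOT (a moving target re-resolved on each build).
public final class SnapshotCheck {

    static boolean isReproducible(List<String> dependencyVersions) {
        return dependencyVersions.stream().noneMatch(v -> v.endsWith("-SNAPSHOT"));
    }

    public static void main(String[] args) {
        // A connector repo pinned to a released Flink: same inputs every build.
        System.out.println(isReproducible(List.of("1.14.0")));          // true
        // Tracking Flink master: yesterday's checkout may not build today.
        System.out.println(isReproducible(List.of("1.15-SNAPSHOT")));   // false
    }
}
```

This is why checking out an older connector commit against snapshot dependencies is problematic, as Chesnay notes: the commit is fixed, but the snapshot it resolves is not.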