+1 for repo per connector from my side also. Thanks for trying out the different approaches.
Where would the common/infra pieces live? In a separate repository with its own release?

Thomas

On Thu, Dec 9, 2021 at 12:42 PM Till Rohrmann <trohrm...@apache.org> wrote:
> Sorry if I was a bit unclear. +1 for the single repo per connector approach.
>
> Cheers,
> Till
>
> On Thu, Dec 9, 2021 at 5:41 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > +1 for the single repo approach.
> >
> > Cheers,
> > Till
> >
> > On Thu, Dec 9, 2021 at 3:54 PM Martijn Visser <mart...@ververica.com> wrote:
> > > I also agree that it feels more natural to go with a repo for each individual connector. Each repository can be made available at flink-packages.org so users can find them, next to referring to them in documentation. +1 from my side.
> > >
> > > On Thu, 9 Dec 2021 at 15:38, Arvid Heise <ar...@apache.org> wrote:
> > > > Hi all,
> > > >
> > > > We tried out Chesnay's proposal and went with Option 2. Unfortunately, we ran into some tough nuts to crack and feel like we hit a dead end:
> > > > - The main pain point with the outlined Frankensteinian connector repo is how to handle shared code / infra code. If we have it in some <common> branch, then we need to merge the common branch into the connector branch on update. However, it's unclear to me how improvements in the common branch that naturally appear while working on a specific connector go back into the common branch. You can't use a pull request from your branch or else your connector code would poison the connector-less common branch. So you would probably manually copy the files over to a common branch and create a PR branch for that.
> > > > - A weird solution could be to have the common branch as a submodule in the repo itself (if that's even possible). I'm sure that this setup would blow the minds of all newcomers.
> > > > - Similarly, it's mandatory to have safeguards against code from connector A poisoning connector B, common, or main. I had a similar setup in the past and code from two "distinct" branch types constantly swept over.
> > > > - We could also say that we simply release <common> independently and just have a maven (SNAPSHOT) dependency on it. But that would create a weird flow if you need to change something in common, where you need to constantly switch branches back and forth.
> > > > - In general, the Frankensteinian approach is very switch-intensive. If you maintain 3 connectors and need to fix 1 build stability issue each at the same time (quite common nowadays for some reason) and you have 2 review rounds, you need to switch branches 9 times, ignoring changes to common.
> > > >
> > > > Additionally, we still have the rather user/dev-unfriendly main that is mostly empty. I'm also not sure we can generate an overview README.md to make it more friendly here because in theory every connector branch should be based on main and we would get merge conflicts.
> > > >
> > > > I'd like to propose once again to go with individual repositories.
> > > > - The only downside that we discussed so far is that we have more initial setup to do. Since we organically grow the number of connectors/repositories, that load is quite distributed. We can offer templates after finding a good approach that can even be used by outside organizations.
> > > > - Regarding secrets, I think it's actually an advantage that the Kafka connector has no access to the AWS secrets. If there are secrets to be shared across connectors, we can and should use Azure's Variable Groups (I have used them in the past to share Nexus creds across repos). That would also make rotation easy.
> > > > - Working on different connectors would be rather easy as all modern IDEs support multiple-repo setups in the same project. You still need to do multiple releases in case you update common code (either accessed through Nexus or a git submodule) and you want to release your connector.
> > > > - There is no difference with respect to how many CI runs there are in both approaches.
> > > > - Individual repositories also have the advantage of allowing external incubation. Let's assume someone builds connector A and hosts it in their organization (very common setup). If they want to contribute the code to Flink, we could simply transfer the repository into ASF after ensuring Flink coding standards. Then we retain git history and Github issues.
> > > >
> > > > Is there any point that I'm missing?
> > > >
> > > > On Fri, Nov 26, 2021 at 1:32 PM Chesnay Schepler <ches...@apache.org> wrote:
> > > > > For sharing workflows we should be able to use composite actions. We'd have the main definition files in the flink-connectors repo, that we also need to tag/release, which other branches/repos can then import. These are also versioned, so we don't have to worry about accidentally breaking stuff. These could also be used to enforce certain standards / interfaces such that we can automate more things (e.g., integration into the Flink documentation).
> > > > >
> > > > > It is true that Option 2) and dedicated repositories share a lot of properties. While I did say in an offline conversation that we in that case might just as well use separate repositories, I'm not so sure anymore. One repo would make administration a bit easier; for example, secrets wouldn't have to be applied to each repo (we wouldn't want certain secrets to be set up organization-wide).
> > > > > I overall also like that one repo would present a single access point; you can't "miss" a connector repo, and I would hope that having it as one repo would nurture more collaboration between the connectors, which after all need to solve similar problems.
> > > > >
> > > > > It is a fair point that the branching model would be quite weird, but I think that would subside pretty quickly.
> > > > >
> > > > > Personally I'd go with Option 2, and if that doesn't work out we can still split the repo later on. (Which should then be a trivial matter of copying all <connector>/* branches and renaming them.)
> > > > >
> > > > > On 26/11/2021 12:47, Till Rohrmann wrote:
> > > > > > Hi Arvid,
> > > > > >
> > > > > > Thanks for updating this thread with the latest findings. The described limitations for a single connector repo sound suboptimal to me.
> > > > > >
> > > > > > * Option 2. sounds as if we try to simulate multi connector repos inside of a single repo. I also don't know how we would share code between the different branches (sharing infrastructure would probably be easier though). This seems to have the same limitations as dedicated repos with the downside of having a not very intuitive branching model.
> > > > > > * Isn't option 1. kind of a degenerated version of option 2. where we have some unrelated code from other connectors in the individual connector branches?
> > > > > > * Option 3. has the downside that someone creating a release has to release all connectors. This means that she either has to sync with the different connector maintainers or has to be able to release all connectors on her own.
> > > > > > We are already seeing in the Flink community that releases require quite good communication/coordination between the different people working on different Flink components. Given our goals to make connector releases easier and more frequent, I think that coupling different connector releases might be counter-productive.
> > > > > >
> > > > > > To me it sounds not very practical to mainly use a mono repository w/o having some more advanced build infrastructure that, for example, allows having different git roots in different connector directories. Maybe the mono repo can be a catch-all repository for connectors that want to be released in lock-step (Option 3.) with all other connectors the repo contains. But for connectors that get changed frequently, having a dedicated repository that allows independent releases sounds preferable to me.
> > > > > >
> > > > > > What utilities and infrastructure code do you intend to share? Using git submodules can definitely be one option to share code. However, it might also be ok to depend on flink-connector-common artifacts, which could make things easier. Where I am unsure is whether git submodules can be used to share infrastructure code (e.g. the .github/workflows) because you need these files in the repo to trigger the CI infrastructure.
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Thu, Nov 25, 2021 at 1:59 PM Arvid Heise <ar...@apache.org> wrote:
> > > > > > > Hi Brian,
> > > > > > >
> > > > > > > Thank you for sharing. I think your approach is very valid and is in line with what I had in mind.
> > > > > > > > Basically Pravega community aligns the connector releases with the Pravega mainline release
> > > > > > >
> > > > > > > This certainly would mean that there is little value in coupling connector versions. So it's making a good case for having separate connector repos.
> > > > > > >
> > > > > > > > and maintains the connector with the latest 3 Flink versions (CI will publish snapshots for all these 3 branches)
> > > > > > >
> > > > > > > I'd like to give connector devs a simple way to express to which Flink versions the current branch is compatible. From there we can generate the compatibility matrix automatically and optionally also create different releases per supported Flink version. Not sure if the latter is indeed better than having just one artifact that happens to run with multiple Flink versions. I guess it depends on what dependencies we are exposing. If the connector uses flink-connector-base, then we probably need separate artifacts with poms anyways.
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Arvid
> > > > > > >
> > > > > > > On Fri, Nov 19, 2021 at 10:55 AM Zhou, Brian <b.z...@dell.com> wrote:
> > > > > > > > Hi Arvid,
> > > > > > > >
> > > > > > > > For the branching model, the Pravega Flink connector has some experience that I would like to share. Here[1][2] is the compatibility matrix and wiki explaining the branching model and releases. Basically the Pravega community aligns the connector releases with the Pravega mainline release, and maintains the connector with the latest 3 Flink versions (CI will publish snapshots for all these 3 branches).
> > > > > > > > For example, recently we had the 0.10.1 release[3], and in maven central we need to upload three artifacts (for Flink 1.13, 1.12, 1.11) for the 0.10.1 version[4].
> > > > > > > >
> > > > > > > > There are some alternatives. Another solution that we once discussed but finally abandoned is to have an independent version just like the current CDC connector, and then give a big compatibility matrix to users. We think it would be too confusing as the connector develops. On the contrary, we can also do the opposite way: align with the Flink version and maintain several branches for different system versions.
> > > > > > > >
> > > > > > > > I would say this is only a fairly-OK solution because it is a bit painful for maintainers, as cherry-picks are very common and releases would require much work. However, if neither system has nice backward compatibility, there seems to be no comfortable solution for their connector.
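The "one connector version, one artifact per supported Flink version" scheme described above can be sketched as follows; the group/artifact naming is a simplified assumption for illustration, not Pravega's exact coordinates.

```python
# Sketch of a per-Flink-version artifact scheme: a single connector release
# is published once per supported Flink version. The coordinate format below
# is hypothetical and only illustrates the idea.

def artifact_coordinates(connector_version, flink_versions):
    """Return one Maven-style coordinate per supported Flink version."""
    return [
        f"io.pravega:pravega-connectors-flink-{fv}_2.12:{connector_version}"
        for fv in flink_versions
    ]

for coord in artifact_coordinates("0.10.1", ["1.13", "1.12", "1.11"]):
    print(coord)
```

The point is that one source release fans out into several published artifacts, which is exactly why the release effort grows with the number of supported Flink versions.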
> > > > > > > > [1] https://github.com/pravega/flink-connectors#compatibility-matrix
> > > > > > > > [2] https://github.com/pravega/flink-connectors/wiki/Versioning-strategy-for-Flink-connector
> > > > > > > > [3] https://github.com/pravega/flink-connectors/releases/tag/v0.10.1
> > > > > > > > [4] https://search.maven.org/search?q=pravega-connectors-flink
> > > > > > > >
> > > > > > > > Best Regards,
> > > > > > > > Brian
> > > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Arvid Heise <ar...@apache.org>
> > > > > > > > Sent: Friday, November 19, 2021 4:12 PM
> > > > > > > > To: dev
> > > > > > > > Subject: Re: [DISCUSS] Creating an external connector repository
> > > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > we are currently in the process of setting up the flink-connectors repo [1] for new connectors, but we hit a wall that we currently cannot get past: the branching model.
> > > > > > > > To reiterate the original motivation of the external connector repo: We want to decouple the release cycle of a connector from Flink. However, if we want to support semantic versioning in the connectors with the ability to introduce breaking changes through major version bumps and support bugfixes on old versions, then we need release branches similar to how Flink core operates.
> > > > > > > > Consider two connectors, let's call them kafka and hbase. We have kafka in versions 1.0.X, 1.1.Y (small improvement), 2.0.Z (config option change) and hbase only on 1.0.A.
> > > > > > > > Now our current assumption was that we can work with a mono-repo under ASF (flink-connectors). Then, for release branches, we found 3 options:
> > > > > > > > 1. We would need to create some ugly mess with the cross product of connector and version: so you have kafka-release-1.0, kafka-release-1.1, kafka-release-2.0, hbase-release-1.0. The main issue is not the amount of branches (that's something that git can handle) but that the state of kafka is undefined in hbase-release-1.0. That's a recipe for disaster and makes releasing connectors very cumbersome (CI would only execute and publish hbase SNAPSHOTs on hbase-release-1.0).
> > > > > > > > 2. We could avoid the undefined state by having an empty master where each release branch really only holds the code of the connector. But that's also not great: any user that looks at the repo and sees no connector would assume that it's dead.
> > > > > > > > 3. We could have synced releases similar to the CDC connectors [2]. That means that if any connector introduces a breaking change, all connectors get a new major version. I find it quite confusing to a user if hbase gets a new release without any change because kafka introduced a breaking change.
> > > > > > > >
> > > > > > > > To fully decouple release cycles and CI of connectors, we could add individual repositories under ASF (flink-connector-kafka, flink-connector-hbase). Then we can apply the same branching model as before.
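With one repository per connector, branch names no longer need to encode the connector at all; each repo can mirror Flink core's own release-branch naming. A minimal sketch (the branch-name scheme is an assumption):

```python
# Hypothetical helper: map a connector's semantic version to the release
# branch it would live on in its own dedicated repository, mirroring how
# Flink core names branches (e.g. release-1.14).

def release_branch(version):
    major, minor, _patch = version.split(".")
    return f"release-{major}.{minor}"

# In a dedicated flink-connector-kafka repo, the three kafka lines from the
# example each get their own branch, while hbase lives in a separate repo:
for v in ["1.0.7", "1.1.2", "2.0.0"]:
    print(release_branch(v))  # release-1.0, release-1.1, release-2.0
```

Because the branch namespace is per repository, the "kafka is undefined in hbase-release-1.0" problem from option 1 cannot arise.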
> > > > > > > > I quickly checked if there are precedents in the apache community for that approach and just by scanning alphabetically I found cordova with 70 and couchdb with 77 apache repos respectively. So it certainly seems like other projects approached our problem in that way and the apache organization is okay with that. I currently expect max 20 additional repos for connectors and in the future max 10 each for formats and filesystems if we would also move them out at some point in time. So we would be at a total of 50 repos.
> > > > > > > >
> > > > > > > > Note that for all options, we need to provide a compatibility matrix that we aim to autogenerate.
> > > > > > > >
> > > > > > > > Now for the potential downsides that we internally discussed:
> > > > > > > > - How can we ensure common infrastructure code, utilities, and quality? I propose to add a flink-connector-common that contains all these things and is added as a git submodule/subtree to the repos.
> > > > > > > > - Do we implicitly discourage connector developers from maintaining more than one connector with a fragmented code base? That is certainly a risk. However, I currently also see few devs working on more than one connector. It may actually help keep the devs that maintain a specific connector on the hook. We could use github issues to track bugs and feature requests, and a dev can focus his limited time on getting that one connector right.
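The autogenerated compatibility matrix mentioned above could start from per-repo metadata declaring which Flink versions each connector branch supports. A minimal sketch, with invented example data:

```python
# Sketch: render a Markdown compatibility matrix from per-connector
# metadata. The connector names and supported versions are illustrative.

def render_matrix(supported):
    """supported: dict mapping connector name -> list of Flink versions."""
    flink_versions = sorted({v for vs in supported.values() for v in vs})
    header = "| Connector | " + " | ".join(flink_versions) + " |"
    separator = "|---" * (len(flink_versions) + 1) + "|"
    rows = [
        "| " + name + " | "
        + " | ".join("x" if fv in versions else " " for fv in flink_versions)
        + " |"
        for name, versions in sorted(supported.items())
    ]
    return "\n".join([header, separator] + rows)

print(render_matrix({
    "flink-connector-kafka": ["1.13", "1.14"],
    "flink-connector-hbase": ["1.13"],
}))
```

Keeping the metadata next to each branch would let CI regenerate the matrix on every release instead of maintaining it by hand.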
Compared to some intermediate suggestions with split > >> repos, > >> > > the > >> > > >>> big difference is that everything remains under Apache umbrella > >> and > >> > the > >> > > >>> Flink community. > >> > > >>> > >> > > >>> [1] > >> > > >>> > >> > > >> > >> > > > >> > > >> https://urldefense.com/v3/__https://github.com/apache/flink-connectors__;!!LpKI!2a1uSGfMmwc8HNwqBUIGtFPzLHP5m9yS0sC3n3IpLgdke_-XjpYgXzxxweh4$ > >> > > >>> [github[.]com] [2] > >> > > >>> > >> > > >> > >> > > > >> > > >> https://urldefense.com/v3/__https://github.com/ververica/flink-cdc-connectors/__;!!LpKI!2a1uSGfMmwc8HNwqBUIGtFPzLHP5m9yS0sC3n3IpLgdke_-XjpYgXzgoPGA8$ > >> > > >>> [github[.]com] > >> > > >>> > >> > > >>> On Fri, Nov 12, 2021 at 3:39 PM Arvid Heise <ar...@apache.org> > >> > wrote: > >> > > >>> > >> > > >>>> Hi everyone, > >> > > >>>> > >> > > >>>> I created the flink-connectors repo [1] to advance the topic. We > >> > would > >> > > >>>> create a proof-of-concept in the next few weeks as a special > >> branch > >> > > >>>> that I'd then use for discussions. If the community agrees with > >> the > >> > > >>>> approach, that special branch will become the master. If not, we > >> can > >> > > >>>> reiterate over it or create competing POCs. > >> > > >>>> > >> > > >>>> If someone wants to try things out in parallel, just make sure > >> that > >> > > >>>> you are not accidentally pushing POCs to the master. > >> > > >>>> > >> > > >>>> As a reminder: We will not move out any current connector from > >> Flink > >> > > >>>> at this point in time, so everything in Flink will remain as is > >> and > >> > be > >> > > >>>> maintained there. 
> > > > > > > > > Best,
> > > > > > > > >
> > > > > > > > > Arvid
> > > > > > > > >
> > > > > > > > > [1] https://github.com/apache/flink-connectors
> > > > > > > > >
> > > > > > > > > On Fri, Oct 29, 2021 at 6:57 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > > > > > > > > > Hi everyone,
> > > > > > > > > >
> > > > > > > > > > From the discussion, it seems to me that we have different opinions on whether to have an ASF umbrella repository or to host them outside of the ASF. It also seems that this is not really the problem to solve. Since there are many good arguments for either approach, we could simply start with an ASF umbrella repository and see how people adopt it. If the individual connectors cannot move fast enough or if people prefer to not buy into the more heavy-weight ASF processes, then they can host the code also somewhere else. We simply need to make sure that these connectors are discoverable (e.g. via flink-packages).
> > > > > > > > > >
> > > > > > > > > > The more important problem seems to be to provide common tooling (testing, infrastructure, documentation) that can easily be reused. Similarly, it has become clear that the Flink community needs to improve on providing stable APIs. I think it is not realistic to first complete these tasks before starting to move connectors to dedicated repositories. As Stephan said, creating a connector repository will force us to pay more attention to API stability and also to think about which testing tools are required.
> > > > > > > > > > Hence, I believe that starting to add connectors to a different repository than apache/flink will help improve our connector tooling (declaring testing classes as public, creating a common test utility repo, creating a repo template) and vice versa. I therefore like Arvid's proposed process, as it will start kicking things off w/o letting this effort fizzle out.
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Till
> > > > > > > > > >
> > > > > > > > > > On Thu, Oct 28, 2021 at 11:44 AM Stephan Ewen <se...@apache.org> wrote:
> > > > > > > > > > > Thank you all for the nice discussion!
> > > > > > > > > > >
> > > > > > > > > > > From my point of view, I very much like the idea of putting connectors in a separate repository. But I would argue it should be part of Apache Flink, similar to flink-statefun, flink-ml, etc.
> > > > > > > > > > >
> > > > > > > > > > > I share many of the reasons for that:
> > > > > > > > > > > - As argued many times, it reduces the complexity of the Flink repo, improves response times of CI, etc.
> > > > > > > > > > > - Much lower barrier of contribution, because an unstable connector would not de-stabilize the whole build. Of course, we would need to make sure we set this up the right way, with connectors having individual CI runs, build status, etc. But it certainly seems possible.
> > > > > > > > > > >
> > > > > > > > > > > I would argue some points a bit differently than some cases made before:
> > > > > > > > > > >
> > > > > > > > > > > (a) I believe the separation would increase connector stability.
> > > > > > > > > > > Because it really forces us to work with the connectors against the APIs like any external developer. A mono repo is somehow the wrong thing if you in practice want to actually guarantee stable internal APIs at some layer, because the mono repo makes it easy to just change something on both sides of the API (provider and consumer) seamlessly.
> > > > > > > > > > >
> > > > > > > > > > > Major refactorings in Flink need to keep all connector API contracts intact, or we need to have a new version of the connector API.
> > > > > > > > > > >
> > > > > > > > > > > (b) We may even be able to go towards more lightweight and automated releases over time, even if we stay in Apache Flink with that repo. This isn't fully aligned with the Apache release policies yet, but there are board discussions about whether there can be bot-triggered releases (by dependabot) and how that could fit into the Apache process. This doesn't seem to be quite there just yet, but seeing that those discussions start is a good sign, and there is a good chance we can do some things there.
> > > > > > > > > > > I am not sure whether we should let bots trigger releases, because a final human look at things isn't a bad thing, especially given the popularity of software supply chain attacks recently.
> > > > > > > > > > >
> > > > > > > > > > > I do share Chesnay's concerns about complexity in tooling, though. Both release tooling and test tooling.
> > > > > > > > > > > They are not incompatible with that approach, but they are a task we need to tackle during this change, which will add additional work.
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Oct 26, 2021 at 10:31 AM Arvid Heise <ar...@apache.org> wrote:
> > > > > > > > > > > > Hi folks,
> > > > > > > > > > > >
> > > > > > > > > > > > I think some questions came up and I'd like to address the question of the timing.
> > > > > > > > > > > >
> > > > > > > > > > > > > Could you clarify what release cadence you're thinking of? There's quite a big range that fits "more frequent than Flink" (per-commit, daily, weekly, bi-weekly, monthly, even bi-monthly).
> > > > > > > > > > > >
> > > > > > > > > > > > The short answer is: as often as needed:
> > > > > > > > > > > > - If there is a CVE in a dependency and we need to bump it - release immediately.
> > > > > > > > > > > > - If there is a new feature merged, release soonish. We may collect a few successive features before a release.
> > > > > > > > > > > > - If there is a bugfix, release immediately or soonish depending on the severity and whether there are workarounds available.
> > > > > > > > > > > >
> > > > > > > > > > > > We should not limit ourselves; the whole idea of independent releases is exactly that you release as needed. There is no release planning or anything needed, you just go with a release as if it was an external artifact.
> > > > > > > > > > > >
> > > > > > > > > > > > > (1) Is the connector API already stable? From another discussion thread [1], the connector API is far from stable. Currently, it's hard to build connectors against multiple Flink versions.
> > > > > > > > > > > > > There are breaking API changes both in 1.12 -> 1.13 and 1.13 -> 1.14 and maybe also in the future versions, because Table related APIs are still @PublicEvolving and the new Sink API is still @Experimental.
> > > > > > > > > > > >
> > > > > > > > > > > > The question is: what is stable in an evolving system? We recently discovered that the old SourceFunction needed to be refined such that cancellation works correctly [1]. So that interface has been in Flink for 7 years, heavily used also outside, and we still had to change the contract in a way that I'd expect any implementer to recheck their implementation. It might not be necessary to change anything and you can probably change the code for all Flink versions, but still, the interface was not stable in the strictest sense.
> > > > > > > > > > > >
> > > > > > > > > > > > If we focus just on API changes on the unified interfaces, then we expect one more change to the Sink API to support compaction. For the Table API, there will most likely also be some changes in 1.15. So we could wait for 1.15. But I'm questioning if that's really necessary, because we will add more functionality beyond 1.15 without breaking the API. For example, we may add more unified connector metrics. If you want to use them in your connector, you have to support multiple Flink versions anyhow.
> > > > > > > > > > > > So rather than focusing the discussion on "when is stuff stable", I'd focus on "how can we support building connectors against multiple Flink versions" and make it as painless as possible.
> > > > > > > > > > > >
> > > > > > > > > > > > Chesnay pointed out that we could use different branches for different Flink versions, which sounds like a good suggestion. With a mono-repo, we can't use branches differently anyways (there is no way to have release branches per connector without chaos). In these branches, we could provide shims to simulate future features in older Flink versions such that, code-wise, the source code of a specific connector may not diverge (much). For example, to register unified connector metrics, we could simulate the current approach also in some utility package of the mono-repo.
> > > > > > > > > > > >
> > > > > > > > > > > > > I see the stable core Flink API as a prerequisite for modularity. And for connectors it is not just the source and sink API (source being stable as of 1.14), but everything that is required to build and maintain a connector downstream, such as the test utilities and infrastructure.
> > > > > > > > > > > >
> > > > > > > > > > > > That is a very fair point. I'm actually surprised to see that MiniClusterWithClientResource is not public. I see it being used in all connectors, especially outside of Flink.
> > > > > > > > > > > > I fear that as long as we do not have connectors outside, we will not properly annotate and maintain these utilities, in a classic chicken-and-egg problem. I will outline an idea at the end.
> > > > > > > > > > > >
> > > > > > > > > > > > > The connectors need to be adopted and require at least one release per Flink minor release. However, this will make the releases of connectors slower, e.g. maintaining features for multiple branches and releasing multiple branches. I think the main purpose of having an external connector repository is in order to have "faster releases of connectors"?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Imagine a project with a complex set of dependencies. Let's say Flink version A plus Flink-reliant dependencies released by other projects (Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want a situation where we bump the core Flink version to B and things fall apart (interface changes, utilities that were useful but not public, transitive dependencies etc.).
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, that's why I wanted to automate the processes more, which is not that easy under ASF. Maybe we automate the source provision across supported versions and have 1 vote thread for all versions of a connector?
> From the perspective of CDC connector maintainers, the biggest
> advantage of maintaining it outside of the Flink project is that:
> 1) we can have a more flexible and faster release cycle
> 2) we can be more liberal with committership for connector maintainers
> which can also attract more committers to help the release.
>
> Personally, I think maintaining one connector repository under the ASF
> may not have the above benefits.

Yes, I also feel that ASF is too restrictive for our needs. But it feels like there are too many that see it differently and I think we need

> (2) Flink testability without connectors.
> This is a very good question. How can we guarantee the new Source and
> Sink API are stable with only test implementation?

We can't and shouldn't. Since the connector repo is managed by Flink, a Flink release manager needs to check if the Flink connectors are actually working prior to creating an RC. That's similar to how flink-shaded and flink core are related.

So here is one idea that I had to get things rolling. We are going to address the external repo iteratively without compromising what we already have:

Phase 1: add new contributions to the external repo.
We use that time to set up infra accordingly and optimize release processes. We will identify test utilities that are not yet public/stable and fix that.
Phase 2: add ports to the new unified interfaces of existing connectors. That requires a previous Flink release to make utilities stable. Keep old interfaces in flink-core.
Phase 3: remove old interfaces in flink-core of some connectors (tbd at a later point).
Phase 4: optionally move all remaining connectors (tbd at a later point).
I'd envision having ~3 months between starting the different phases. WDYT?

[1] https://issues.apache.org/jira/browse/FLINK-23527

On Thu, Oct 21, 2021 at 7:12 AM Kyle Bendickson <k...@tabular.io> wrote:

Hi all,

My name is Kyle and I’m an open source developer primarily focused on Apache Iceberg.

I’m happy to help clarify or elaborate on any aspect of our experience working on a relatively decoupled connector that is downstream and pretty popular.

I’d also love to be able to contribute or assist in any way I can.
I don’t mean to thread jack, but are there any meetings or community sync-ups, specifically around the connector APIs, that I might join / be invited to?

I did want to add that even though I’ve experienced some of the pain points of integrating with an evolving system / API (catalog support is generally speaking pretty new everywhere really in this space), I also agree personally that you shouldn’t slow down development velocity too much for the sake of external connectors. Getting to a performant and stable place should be the primary goal, and slowing that down to support stragglers will (in my personal opinion) always be a losing game. Some folks will simply stay behind on versions regardless until they have to upgrade.

I am working on ensuring that the Iceberg community stays within 1-2 versions of Flink, so that we can help provide more feedback or contribute things that might improve our ability to support multiple Flink runtimes / versions with one project / codebase and minimal to no reflection (our desired goal).

If there’s anything I can do or any way I can be of assistance, please don’t hesitate to reach out.
Or find me on ASF Slack 😀

I greatly appreciate your general concern for the needs of downstream connector integrators!

Cheers,
Kyle Bendickson (GitHub: kbendick)
Open Source Developer
kyle [at] tabular [dot] io

On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise <t...@apache.org> wrote:

Hi,

I see the stable core Flink API as a prerequisite for modularity. And for connectors it is not just the source and sink API (source being stable as of 1.14), but everything that is required to build and maintain a connector downstream, such as the test utilities and infrastructure.

Without the stable surface of core Flink, changes will leak into downstream dependencies and force lock-step updates. Refactoring across N repos is more painful than in a single repo. Those with experience developing downstream of Flink will know the pain, and that isn't limited to connectors. I don't remember a Flink "minor version" update that was just a dependency version change and did not force other downstream changes.

Imagine a project with a complex set of dependencies. Let's say Flink version A plus Flink-reliant dependencies released by other projects (Flink-external connectors, Beam, Iceberg, Hudi, ..).
We don't want a situation where we bump the core Flink version to B and things fall apart (interface changes, utilities that were useful but not public, transitive dependencies etc.).

The discussion here also highlights the benefits of keeping certain connectors outside Flink, whether that is due to differences in developer community, maturity of the connectors, their specialized/limited usage etc. I would like to see that as a sign of a growing ecosystem, and most of the ideas that Arvid has put forward would benefit further growth of the connector ecosystem.

As for keeping connectors within Apache Flink: I prefer that as the path forward for "essential" connectors like FileSource, KafkaSource, ... And we can still achieve a more flexible and faster release cycle.

Thanks,
Thomas

On Wed, Oct 20, 2021 at 3:32 AM Jark Wu <imj...@gmail.com> wrote:

Hi Konstantin,

> the connectors need to be adopted and require at least one release per
> Flink minor release.

However, this will make the releases of connectors slower, e.g. maintaining features for multiple branches and releasing multiple branches.
I think the main purpose of having an external connector repository is in order to have "faster releases of connectors"?

From the perspective of CDC connector maintainers, the biggest advantage of maintaining it outside of the Flink project is that:
1) we can have a more flexible and faster release cycle
2) we can be more liberal with committership for connector maintainers, which can also attract more committers to help the release.

Personally, I think maintaining one connector repository under the ASF may not have the above benefits.

Best,
Jark

On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf <kna...@apache.org> wrote:

Hi everyone,

regarding the stability of the APIs: I think everyone agrees that connector APIs which are stable across minor versions (1.13->1.14) are the mid-term goal. But:

a) These APIs are still quite young, and we shouldn't make them @Public prematurely either.

b) Isn't this *mostly* orthogonal to where the connector code lives?
Yes, as long as there are breaking changes, the connectors need to be adopted and require at least one release per Flink minor release. Documentation-wise this can be addressed via a compatibility matrix for each connector, as Arvid suggested. IMO we shouldn't block this effort on the stability of the APIs.

Cheers,

Konstantin

On Wed, Oct 20, 2021 at 8:56 AM Jark Wu <imj...@gmail.com> wrote:

Hi,

I think Thomas raised very good questions and I would like to know your opinions on whether we want to move connectors out of Flink in this version.

(1) Is the connector API already stable?

> Separate releases would only make sense if the core Flink surface is
> fairly stable though. As evident from Iceberg (and also Beam), that's
> not the case currently. We should probably focus on addressing the
> stability first, before splitting code.
> A success criterion could be that we are able to build Iceberg and
> Beam against multiple Flink versions w/o the need to change code. The
> goal would be that no connector breaks when we make changes to Flink
> core. Until that's the case, code separation creates a setup where 1+1
> or N+1 repositories need to move in lock step.

From another discussion thread [1], the connector API is far from stable. Currently, it's hard to build connectors against multiple Flink versions. There are breaking API changes both in 1.12 -> 1.13 and in 1.13 -> 1.14, and maybe also in future versions, because Table-related APIs are still @PublicEvolving and the new Sink API is still @Experimental.

(2) Flink testability without connectors.

> Flink w/o Kafka connector (and few others) isn't viable. Testability
> of Flink was already brought up, can we really certify a Flink core
> release without Kafka connector? Maybe those connectors that are used
> in Flink e2e tests to validate functionality of core Flink should not
> be broken out?
This is a very good question. How can we guarantee the new Source and Sink API are stable with only test implementations?

Best,
Jark

On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler <ches...@apache.org> wrote:

Could you clarify what release cadence you're thinking of? There's quite a big range that fits "more frequent than Flink" (per-commit, daily, weekly, bi-weekly, monthly, even bi-monthly).

On 19/10/2021 14:15, Martijn Visser wrote:

Hi all,

I think it would be a huge benefit if we can achieve more frequent releases of connectors, which are not bound to the release cycle of Flink itself. I agree that in order to get there, we need to have stable interfaces which are trustworthy and reliable, so they can be safely used by those connectors.
I do think that work still needs to be done on those interfaces, but I am confident that we can get there from a Flink perspective.

I am worried that we would not be able to achieve those frequent releases of connectors if we are putting these connectors under the Apache umbrella, because that means that for each connector release we have to follow the Apache release creation process. This requires a lot of manual steps and prohibits automation, and I think it would be hard to scale out frequent releases of connectors. I'm curious how others think this challenge could be solved.

Best regards,

Martijn

On Mon, 18 Oct 2021 at 22:22, Thomas Weise <t...@apache.org> wrote:

Thanks for initiating this discussion.

There are definitely a few things that are not optimal with our current management of connectors.
I would not necessarily characterize it as a "mess" though. As the points raised so far show, it isn't easy to find a solution that balances competing requirements and leads to a net improvement.

It would be great if we can find a setup that allows for connectors to be released independently of core Flink and that each connector can be released separately. Flink already has separate releases (flink-shaded), so that by itself isn't a new thing. Per-connector releases would need to allow for more frequent releases (without the baggage that a full Flink release comes with).

Separate releases would only make sense if the core Flink surface is fairly stable though. As evident from Iceberg (and also Beam), that's not the case currently. We should probably focus on addressing the stability first, before splitting code.
A success criterion could be that we are able to build Iceberg and Beam against multiple Flink versions w/o the need to change code. The goal would be that no connector breaks when we make changes to Flink core. Until that's the case, code separation creates a setup where 1+1 or N+1 repositories need to move in lock step.

Regarding some connectors being more important for Flink than others: that's a fact. Flink w/o the Kafka connector (and a few others) isn't viable. Testability of Flink was already brought up; can we really certify a Flink core release without the Kafka connector? Maybe those connectors that are used in Flink e2e tests to validate functionality of core Flink should not be broken out?

Finally, I think that the connectors that move into separate repos should remain part of the Apache Flink project. Larger organizations tend to approve the use of and contribution to open source at the project level. Sometimes it is everything ASF.
More often it is "Apache Foo". It would be fatal to end up with a patchwork of projects with potentially different licenses and governance to arrive at a working Flink setup. This may mean we prioritize usability over developer convenience, if that's in the best interest of Flink as a whole.

Thanks,
Thomas

On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler <ches...@apache.org> wrote:

Generally, the issues are reproducibility and control.

Stuff's completely broken on the Flink side for a week? Well then so are the connector repos.

(As-is) You can't go back to a previous version of the snapshot. Which also means that checking out older commits can be problematic, because you'd still work against the latest snapshots, and they may not be compatible with each other.

On 18/10/2021 15:22, Arvid Heise wrote:

I was actually betting on snapshot versions.
What are the limits? Obviously, we can only do a release of a 1.15 connector after 1.15 is released.

--

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk