Sorry if I was a bit unclear. +1 for the single repo per connector approach.
Cheers, Till On Thu, Dec 9, 2021 at 5:41 PM Till Rohrmann <trohrm...@apache.org> wrote: > +1 for the single repo approach. > > Cheers, > Till > > On Thu, Dec 9, 2021 at 3:54 PM Martijn Visser <mart...@ververica.com> > wrote: > >> I also agree that it feels more natural to go with a repo for each >> individual connector. Each repository can be made available at >> flink-packages.org so users can find them, next to referring to them in >> documentation. +1 from my side. >> >> On Thu, 9 Dec 2021 at 15:38, Arvid Heise <ar...@apache.org> wrote: >> >> > Hi all, >> > >> > We tried out Chesnay's proposal and went with Option 2. Unfortunately, >> we >> > experienced tough nuts to crack and feel like we hit a dead end: >> > - The main pain point with the outlined Frankensteinian connector repo >> is >> > how to handle shared code / infra code. If we have it in some <common> >> > branch, then we need to merge the common branch in the connector branch >> on >> > update. However, it's unclear to me how improvements in the common >> branch >> > that naturally appear while working on a specific connector go back into >> > the common branch. You can't use a pull request from your branch or else >> > your connector code would poison the connector-less common branch. So >> you >> > would probably manually copy the files over to a common branch and >> create a >> > PR branch for that. >> > - A weird solution could be to have the common branch as a submodule in >> the >> > repo itself (if that's even possible). I'm sure that this setup would >> blow >> > up the minds of all newcomers. >> > - Similarly, it's mandatory to have safeguards against code from >> connector >> > A poisoning connector B, common, or main. I had some similar setup in >> the >> > past and code from two "distinct" branch types constantly swept over. >> > - We could also say that we simply release <common> independently and >> just >> > have a maven (SNAPSHOT) dependency on it. 
But that would create a weird >> > flow if you need to make changes in common, where you need to constantly switch branches back and forth.
>> > - In general, the Frankensteinian approach is very switch-intensive. If you maintain 3 connectors and need to fix 1 build instability in each at the same time (quite common nowadays for some reason) and you have 2 review rounds, you need to switch branches 9 times, ignoring changes to common.
>> >
>> > Additionally, we still have the rather user/dev-unfriendly main that is mostly empty. I'm also not sure we can generate an overview README.md to make it more friendly here, because in theory every connector branch should be based on main and we would get merge conflicts.
>> >
>> > I'd like to propose once again to go with individual repositories.
>> > - The only downside that we discussed so far is that we have more initial setup to do. Since we organically grow the number of connector repositories, that load is quite distributed. We can offer templates after finding a good approach that can even be used by outside organizations.
>> > - Regarding secrets, I think it's actually an advantage that the Kafka connector has no access to the AWS secrets. If there are secrets to be shared across connectors, we can and should use Azure's Variable Groups (I have used them in the past to share Nexus creds across repos). That would also make rotation easy.
>> > - Working on different connectors would be rather easy, as all modern IDEs support multiple-repo setups in the same project. You still need to do multiple releases in case you update common code (either accessed through Nexus or a git submodule) and want to release your connector.
>> > - There is no difference with respect to how many CI runs there are in both approaches.
>> > - Individual repositories also have the advantage of allowing external incubation.
Let's assume someone builds connector A and hosts it in >> their >> > organization (very common setup). If they want to contribute the code to >> > Flink, we could simply transfer the repository into ASF after ensuring >> > Flink coding standards. Then we retain git history and Github issues. >> > >> > Is there any point that I'm missing? >> > >> > On Fri, Nov 26, 2021 at 1:32 PM Chesnay Schepler <ches...@apache.org> >> > wrote: >> > >> > > For sharing workflows we should be able to use composite actions. We'd >> > > have the main definition files in the flink-connectors repo, that we >> > > also need to tag/release, which other branches/repos can then import. >> > > These are also versioned, so we don't have to worry about accidentally >> > > breaking stuff. >> > > These could also be used to enforce certain standards / interfaces >> such >> > > that we can automate more things (e.g., integration into the Flink >> > > documentation). >> > > >> > > It is true that Option 2) and dedicated repositories share a lot of >> > > properties. While I did say in an offline conversation that we in that >> > > case might just as well use separate repositories, I'm not so sure >> > > anymore. One repo would make administration a bit easier, for example >> > > secrets wouldn't have to be applied to each repo (we wouldn't want >> > > certain secrets to be set up organization-wide). >> > > I overall also like that one repo would present a single access point; >> > > you can't "miss" a connector repo, and I would hope that having it as >> > > one repo would nurture more collaboration between the connectors, >> which >> > > after all need to solve similar problems. >> > > >> > > It is a fair point that the branching model would be quite weird, but >> I >> > > think that would subside pretty quickly. >> > > >> > > Personally I'd go with Option 2, and if that doesn't work out we can >> > > still split the repo later on. 
(Which should then be a trivial matter >> of >> > > copying all <connector>/* branches and renaming them). >> > > >> > > On 26/11/2021 12:47, Till Rohrmann wrote: >> > > > Hi Arvid, >> > > > >> > > > Thanks for updating this thread with the latest findings. The >> described >> > > > limitations for a single connector repo sound suboptimal to me. >> > > > >> > > > * Option 2. sounds as if we try to simulate multi connector repos >> > inside >> > > of >> > > > a single repo. I also don't know how we would share code between the >> > > > different branches (sharing infrastructure would probably be easier >> > > > though). This seems to have the same limitations as dedicated repos >> > with >> > > > the downside of having a not very intuitive branching model. >> > > > * Isn't option 1. kind of a degenerated version of option 2. where >> we >> > > have >> > > > some unrelated code from other connectors in the individual >> connector >> > > > branches? >> > > > * Option 3. has the downside that someone creating a release has to >> > > release >> > > > all connectors. This means that she either has to sync with the >> > different >> > > > connector maintainers or has to be able to release all connectors on >> > her >> > > > own. We are already seeing in the Flink community that releases >> require >> > > > quite good communication/coordination between the different people >> > > working >> > > > on different Flink components. Given our goals to make connector >> > releases >> > > > easier and more frequent, I think that coupling different connector >> > > > releases might be counter-productive. >> > > > >> > > > To me it sounds not very practical to mainly use a mono repository >> w/o >> > > > having some more advanced build infrastructure that, for example, >> > allows >> > > to >> > > > have different git roots in different connector directories. 
Maybe >> the >> > > mono >> > > > repo can be a catch all repository for connectors that want to be >> > > released >> > > > in lock-step (Option 3.) with all other connectors the repo >> contains. >> > But >> > > > for connectors that get changed frequently, having a dedicated >> > repository >> > > > that allows independent releases sounds preferable to me. >> > > > >> > > > What utilities and infrastructure code do you intend to share? Using >> > git >> > > > submodules can definitely be one option to share code. However, it >> > might >> > > > also be ok to depend on flink-connector-common artifacts which could >> > make >> > > > things easier. Where I am unsure is whether git submodules can be >> used >> > to >> > > > share infrastructure code (e.g. the .github/workflows) because you >> need >> > > > these files in the repo to trigger the CI infrastructure. >> > > > >> > > > Cheers, >> > > > Till >> > > > >> > > > On Thu, Nov 25, 2021 at 1:59 PM Arvid Heise <ar...@apache.org> >> wrote: >> > > > >> > > >> Hi Brian, >> > > >> >> > > >> Thank you for sharing. I think your approach is very valid and is >> in >> > > line >> > > >> with what I had in mind. >> > > >> >> > > >> Basically Pravega community aligns the connector releases with the >> > > Pravega >> > > >>> mainline release >> > > >>> >> > > >> This certainly would mean that there is little value in coupling >> > > connector >> > > >> versions. So it's making a good case for having separate connector >> > > repos. >> > > >> >> > > >> >> > > >>> and maintains the connector with the latest 3 Flink versions(CI >> will >> > > >>> publish snapshots for all these 3 branches) >> > > >>> >> > > >> I'd like to give connector devs a simple way to express to which >> Flink >> > > >> versions the current branch is compatible. From there we can >> generate >> > > the >> > > >> compatibility matrix automatically and optionally also create >> > different >> > > >> releases per supported Flink version. 
Not sure if the latter is >> > > >> indeed better than having just one artifact that happens to run with multiple Flink versions. I guess it depends on what dependencies we are exposing. If the connector uses flink-connector-base, then we probably need separate artifacts with poms anyways.
>> > > >>
>> > > >> Best,
>> > > >>
>> > > >> Arvid
>> > > >>
>> > > >> On Fri, Nov 19, 2021 at 10:55 AM Zhou, Brian <b.z...@dell.com> wrote:
>> > > >>
>> > > >>> Hi Arvid,
>> > > >>>
>> > > >>> For the branching model, the Pravega Flink connector has some experience that I would like to share. Here[1][2] is the compatibility matrix and a wiki explaining the branching model and releases. Basically, the Pravega community aligns the connector releases with the Pravega mainline release, and maintains the connector for the latest 3 Flink versions (CI will publish snapshots for all these 3 branches).
>> > > >>> For example, recently we had the 0.10.1 release[3], and in Maven Central we need to upload three artifacts (for Flink 1.13, 1.12, 1.11) for the 0.10.1 version[4].
>> > > >>>
>> > > >>> There are some alternatives. Another solution that we once discussed but finally abandoned is to have an independent version, just like the current CDC connector, and then give users a big compatibility matrix. We think it would become too confusing as the connector evolves. On the contrary, we could also go the opposite way and align with the Flink version, maintaining several branches for different system versions.
>> > > >>>
>> > > >>> I would say this is only a fairly-OK solution, because it is a bit painful for maintainers as cherry-picks are very common and releases would require much work.
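Brian's scheme above (one connector release fanned out into one artifact per supported Flink version) can be sketched as a small release helper. This is a hedged illustration: the `release_artifacts` helper, the group ID, and the exact artifact-naming pattern are assumptions for the sketch, not Pravega's actual build code.

```python
# Sketch of the "one connector version x N supported Flink versions"
# release scheme described above. The artifact naming pattern here is
# an illustrative assumption, not Pravega's actual build configuration.

def release_artifacts(connector_version, flink_versions):
    """Return one Maven coordinate per supported Flink version."""
    return [
        f"io.pravega:pravega-connectors-flink-{fv}_2.12:{connector_version}"
        for fv in flink_versions
    ]

# One 0.10.1 connector release fans out into three uploads:
for gav in release_artifacts("0.10.1", ["1.13", "1.12", "1.11"]):
    print(gav)
```

The point of the sketch is the fan-out: growing the supported-Flink list changes the number of uploads without touching the connector version itself.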
However, if neither system has nice backward >> > > >>> compatibility, there seems to be no comfortable solution for their connector.
>> > > >>>
>> > > >>> [1] https://github.com/pravega/flink-connectors#compatibility-matrix
>> > > >>> [2] https://github.com/pravega/flink-connectors/wiki/Versioning-strategy-for-Flink-connector
>> > > >>> [3] https://github.com/pravega/flink-connectors/releases/tag/v0.10.1
>> > > >>> [4] https://search.maven.org/search?q=pravega-connectors-flink
>> > > >>>
>> > > >>> Best Regards,
>> > > >>> Brian
>> > > >>>
>> > > >>> -----Original Message-----
>> > > >>> From: Arvid Heise <ar...@apache.org>
>> > > >>> Sent: Friday, November 19, 2021 4:12 PM
>> > > >>> To: dev
>> > > >>> Subject: Re: [DISCUSS] Creating an external connector repository
>> > > >>>
>> > > >>> Hi everyone,
>> > > >>>
>> > > >>> we are currently in the process of setting up the flink-connectors repo [1] for new connectors, but we hit a wall that we currently cannot get past: the branching model.
>> > > >>> To reiterate the original motivation of the external connector repo: We want to decouple the release cycle of a connector from Flink. However, if we want to support semantic versioning in the connectors, with the ability to introduce breaking changes through major version bumps and to support bugfixes on old versions, then we need release branches similar to how Flink core operates.
>> > > >>> Consider two connectors, let's call them kafka and hbase. We have kafka in versions 1.0.X, 1.1.Y (small improvement), 2.0.Z (config option change) and hbase only on 1.0.A.
>> > > >>> Now our current assumption was that we can work with a mono-repo under ASF (flink-connectors). Then, for release branches, we found 3 options:
>> > > >>> 1. We would need to create some ugly mess with the cross product of connector and version: so you have kafka-release-1.0, kafka-release-1.1, kafka-release-2.0, hbase-release-1.0. The main issue is not the amount of branches (that's something that git can handle) but that the state of kafka is undefined in hbase-release-1.0. That's a recipe for disaster and makes releasing connectors very cumbersome (CI would only execute and publish hbase SNAPSHOTS on hbase-release-1.0).
>> > > >>> 2. We could avoid the undefined state by having an empty master, where each release branch really only holds the code of its connector. But that's also not great: any user that looks at the repo and sees no connector would assume that it's dead.
>> > > >>> 3. We could have synced releases similar to the CDC connectors [2]. That means that if any connector introduces a breaking change, all connectors get a new major version. I find it quite confusing to a user if hbase gets a new release without any change because kafka introduced a breaking change.
>> > > >>>
>> > > >>> To fully decouple the release cycles and CI of connectors, we could add individual repositories under ASF (flink-connector-kafka, flink-connector-hbase). Then we can apply the same branching model as before. I quickly checked if there are precedents in the Apache community for that approach and, just by scanning alphabetically, I found cordova with 70 and couchdb with 77 Apache repos respectively.
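The cross product of option 1 above, and the CI guard it would require, can be sketched as follows. The `<connector>-release-<major.minor>` branch-name pattern and both helper functions are illustrative assumptions, not existing Flink tooling.

```python
# Sketch of option 1's branch cross-product and the CI guard it needs:
# a snapshot job on branch "hbase-release-1.0" must only build and
# publish hbase, because every other connector's state on that branch
# is undefined. The "<connector>-release-<major.minor>" pattern is an
# illustrative assumption.

def release_branches(versions_by_connector):
    """Cross product of connector and released major.minor versions."""
    return [
        f"{connector}-release-{version}"
        for connector, versions in sorted(versions_by_connector.items())
        for version in versions
    ]

def connector_to_publish(branch):
    """Which connector a CI run on this branch may publish snapshots for."""
    name, sep, _ = branch.partition("-release-")
    return name if sep else None  # e.g. on main: publish nothing

branches = release_branches({"kafka": ["1.0", "1.1", "2.0"], "hbase": ["1.0"]})
# -> ['hbase-release-1.0', 'kafka-release-1.0', 'kafka-release-1.1', 'kafka-release-2.0']
```

Even this toy version shows why the thread calls the setup a mess: the branch list grows with every connector major/minor, and every CI job needs the publish guard to avoid building undefined code.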
So it certainly >> > > >>> seems like other projects approached our problem in that way and the Apache organization is okay with that. I currently expect max 20 additional repos for connectors, and in the future 10 max each for formats and filesystems if we would also move them out at some point in time. So we would be at a total of 50 repos.
>> > > >>>
>> > > >>> Note that for all options, we need to provide a compatibility matrix that we aim to autogenerate.
>> > > >>>
>> > > >>> Now for the potential downsides that we internally discussed:
>> > > >>> - How can we ensure common infrastructure code, utilities, and quality?
>> > > >>> I propose to add a flink-connector-common that contains all these things and is added as a git submodule/subtree to the repos.
>> > > >>> - Do we implicitly discourage connector developers from maintaining more than one connector with a fragmented code base?
>> > > >>> That is certainly a risk. However, I currently also see few devs working on more than one connector, and it may actually help keeping the devs that maintain a specific connector on the hook. We could use GitHub issues to track bugs and feature requests, and a dev can focus their limited time on getting that one connector right.
>> > > >>>
>> > > >>> So WDYT? Compared to some intermediate suggestions with split repos, the big difference is that everything remains under the Apache umbrella and the Flink community.
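The compatibility matrix that the thread wants to autogenerate could be derived from per-branch metadata along these lines. The metadata shape (each connector release declaring the Flink versions it supports) and the Markdown rendering are assumptions for illustration, not an existing Flink script.

```python
# Sketch of autogenerating the compatibility matrix mentioned above.
# Assumed input: a mapping from connector release to the Flink versions
# its branch declares support for (the metadata shape is illustrative).

def compatibility_matrix(support):
    """Render a Markdown matrix from {connector_release: [flink_versions]}."""
    flink_versions = sorted({fv for fvs in support.values() for fv in fvs})
    header = "| Connector | " + " | ".join(flink_versions) + " |"
    divider = "|---" * (len(flink_versions) + 1) + "|"
    rows = [
        "| " + name + " | "
        + " | ".join("x" if fv in fvs else "" for fv in flink_versions)
        + " |"
        for name, fvs in sorted(support.items())
    ]
    return "\n".join([header, divider] + rows)

print(compatibility_matrix({
    "kafka-2.0": ["1.13", "1.14"],
    "kafka-1.1": ["1.12", "1.13"],
    "hbase-1.0": ["1.12"],
}))
```

With a declaration like this in every release branch, CI could rebuild the matrix on each connector release instead of maintaining it by hand.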
>> > > >>> [1] https://github.com/apache/flink-connectors
>> > > >>> [2] https://github.com/ververica/flink-cdc-connectors/
>> > > >>>
>> > > >>> On Fri, Nov 12, 2021 at 3:39 PM Arvid Heise <ar...@apache.org> wrote:
>> > > >>>
>> > > >>>> Hi everyone,
>> > > >>>>
>> > > >>>> I created the flink-connectors repo [1] to advance the topic. We would create a proof-of-concept in the next few weeks as a special branch that I'd then use for discussions. If the community agrees with the approach, that special branch will become the master. If not, we can iterate on it or create competing POCs.
>> > > >>>>
>> > > >>>> If someone wants to try things out in parallel, just make sure that you are not accidentally pushing POCs to the master.
>> > > >>>>
>> > > >>>> As a reminder: We will not move out any current connector from Flink at this point in time, so everything in Flink will remain as is and be maintained there.
>> > > >>>> Best,
>> > > >>>>
>> > > >>>> Arvid
>> > > >>>>
>> > > >>>> [1] https://github.com/apache/flink-connectors
>> > > >>>>
>> > > >>>> On Fri, Oct 29, 2021 at 6:57 PM Till Rohrmann <trohrm...@apache.org> wrote:
>> > > >>>>
>> > > >>>>> Hi everyone,
>> > > >>>>>
>> > > >>>>> From the discussion, it seems to me that we have different opinions whether to have an ASF umbrella repository or to host them outside of the ASF. It also seems that this is not really the problem to solve. Since there are many good arguments for either approach, we could simply start with an ASF umbrella repository and see how people adopt it. If the individual connectors cannot move fast enough or if people prefer to not buy into the more heavy-weight ASF processes, then they can host the code also somewhere else. We simply need to make sure that these connectors are discoverable (e.g. via flink-packages).
>> > > >>>>>
>> > > >>>>> The more important problem seems to be to provide common tooling (testing, infrastructure, documentation) that can easily be reused. Similarly, it has become clear that the Flink community needs to improve on providing stable APIs. I think it is not realistic to first complete these tasks before starting to move connectors to dedicated repositories. As Stephan said, creating a connector repository will force us to pay more attention to API stability and also to think about which testing tools are required.
Hence, I >> > > >>>>> believe that starting to add connectors to a different >> repository >> > > >>>>> than apache/flink will help improve our connector tooling >> > (declaring >> > > >>>>> testing classes as public, creating a common test utility repo, >> > > >>>>> creating a repo >> > > >>>>> template) and vice versa. Hence, I like Arvid's proposed >> process as >> > > >>>>> it will start kicking things off w/o letting this effort fizzle >> > out. >> > > >>>>> >> > > >>>>> Cheers, >> > > >>>>> Till >> > > >>>>> >> > > >>>>> On Thu, Oct 28, 2021 at 11:44 AM Stephan Ewen <se...@apache.org >> > >> > > >> wrote: >> > > >>>>>> Thank you all, for the nice discussion! >> > > >>>>>> >> > > >>>>>> From my point of view, I very much like the idea of putting >> > > >>>>>> connectors >> > > >>>>> in a >> > > >>>>>> separate repository. But I would argue it should be part of >> Apache >> > > >>>>> Flink, >> > > >>>>>> similar to flink-statefun, flink-ml, etc. >> > > >>>>>> >> > > >>>>>> I share many of the reasons for that: >> > > >>>>>> - As argued many times, reduces complexity of the Flink >> repo, >> > > >>>>> increases >> > > >>>>>> response times of CI, etc. >> > > >>>>>> - Much lower barrier of contribution, because an unstable >> > > >>>>>> connector >> > > >>>>> would >> > > >>>>>> not de-stabilize the whole build. Of course, we would need to >> make >> > > >>>>>> sure >> > > >>>>> we >> > > >>>>>> set this up the right way, with connectors having individual CI >> > > >>>>>> runs, >> > > >>>>> build >> > > >>>>>> status, etc. But it certainly seems possible. >> > > >>>>>> >> > > >>>>>> >> > > >>>>>> I would argue some points a bit different than some cases made >> > > >> before: >> > > >>>>>> (a) I believe the separation would increase connector >> stability. >> > > >>>>> Because it >> > > >>>>>> really forces us to work with the connectors against the APIs >> like >> > > >>>>>> any external developer. 
A mono repo is somehow the wrong thing >> if >> > > >>>>>> you in practice want to actually guarantee stable internal >> APIs at >> > > >>> some layer. >> > > >>>>>> Because the mono repo makes it easy to just change something on >> > > >>>>>> both >> > > >>>>> sides >> > > >>>>>> of the API (provider and consumer) seamlessly. >> > > >>>>>> >> > > >>>>>> Major refactorings in Flink need to keep all connector API >> > > >>>>>> contracts intact, or we need to have a new version of the >> > connector >> > > >>> API. >> > > >>>>>> (b) We may even be able to go towards more lightweight and >> > > >>>>>> automated releases over time, even if we stay in Apache Flink >> with >> > > >>> that repo. >> > > >>>>>> This isn't yet fully aligned with the Apache release policies, >> > yet, >> > > >>>>>> but there are board discussions about whether there can be >> > > >>>>>> bot-triggered releases (by dependabot) and how that could fit >> into >> > > >>> the Apache process. >> > > >>>>>> This doesn't seem to be quite there just yet, but seeing that >> > those >> > > >>>>> start >> > > >>>>>> is a good sign, and there is a good chance we can do some >> things >> > > >>> there. >> > > >>>>>> I am not sure whether we should let bots trigger releases, >> because >> > > >>>>>> a >> > > >>>>> final >> > > >>>>>> human look at things isn't a bad thing, especially given the >> > > >>>>>> popularity >> > > >>>>> of >> > > >>>>>> software supply chain attacks recently. >> > > >>>>>> >> > > >>>>>> >> > > >>>>>> I do share Chesnay's concerns about complexity in tooling, >> though. >> > > >>>>>> Both release tooling and test tooling. They are not >> incompatible >> > > >>>>>> with that approach, but they are a task we need to tackle >> during >> > > >>>>>> this change which will add additional work. 
>> > > >>>>>> >> > > >>>>>> >> > > >>>>>> >> > > >>>>>> On Tue, Oct 26, 2021 at 10:31 AM Arvid Heise <ar...@apache.org >> > >> > > >>> wrote: >> > > >>>>>>> Hi folks, >> > > >>>>>>> >> > > >>>>>>> I think some questions came up and I'd like to address the >> > > >>>>>>> question of >> > > >>>>>> the >> > > >>>>>>> timing. >> > > >>>>>>> >> > > >>>>>>> Could you clarify what release cadence you're thinking of? >> > > >>>>>>> There's >> > > >>>>> quite >> > > >>>>>>>> a big range that fits "more frequent than Flink" (per-commit, >> > > >>>>>>>> daily, weekly, bi-weekly, monthly, even bi-monthly). >> > > >>>>>>> The short answer is: as often as needed: >> > > >>>>>>> - If there is a CVE in a dependency and we need to bump it - >> > > >>>>>>> release immediately. >> > > >>>>>>> - If there is a new feature merged, release soonish. We may >> > > >>>>>>> collect a >> > > >>>>> few >> > > >>>>>>> successive features before a release. >> > > >>>>>>> - If there is a bugfix, release immediately or soonish >> depending >> > > >>>>>>> on >> > > >>>>> the >> > > >>>>>>> severity and if there are workarounds available. >> > > >>>>>>> >> > > >>>>>>> We should not limit ourselves; the whole idea of independent >> > > >>>>>>> releases >> > > >>>>> is >> > > >>>>>>> exactly that you release as needed. There is no release >> planning >> > > >>>>>>> or anything needed, you just go with a release as if it was an >> > > >>>>>>> external artifact. >> > > >>>>>>> >> > > >>>>>>> (1) is the connector API already stable? >> > > >>>>>>>> From another discussion thread [1], connector API is far >> from >> > > >>>>> stable. >> > > >>>>>>>> Currently, it's hard to build connectors against multiple >> Flink >> > > >>>>>> versions. 
There are breaking API changes both in 1.12 -> 1.13 and in 1.13 -> 1.14, and maybe also in future versions, because Table related APIs are still @PublicEvolving and the new Sink API is still @Experimental.
>> >>>>>>>>
>> >>>>>>> The question is: what is stable in an evolving system? We recently discovered that the old SourceFunction needed to be refined such that cancellation works correctly [1]. So that interface has been in Flink for 7 years, heavily used also outside, and we still had to change the contract in a way that I'd expect any implementer to recheck their implementation. It might not be necessary to change anything, and you can probably change the code for all Flink versions, but still, the interface was not stable in the strictest sense.
>> >>>>>>>
>> >>>>>>> If we focus just on API changes on the unified interfaces, then we expect one more change to the Sink API to support compaction. For the Table API, there will most likely also be some changes in 1.15. So we could wait for 1.15. But I'm questioning if that's really necessary, because we will add more functionality beyond 1.15 without breaking the API. For example, we may add more unified connector metrics. If you want to use them in your connector, you have to support multiple Flink versions anyhow.
Rather than focusing the discussion on "when is stuff stable", I'd rather focus on "how can we support building connectors against multiple Flink versions" and make it as painless as possible.
>> >>>>>>>
>> >>>>>>> Chesnay pointed out that we could use different branches for different Flink versions, which sounds like a good suggestion. With a mono-repo, we can't use branches differently anyways (there is no way to have release branches per connector without chaos). In these branches, we could provide shims to simulate future features in older Flink versions such that, code-wise, the source code of a specific connector may not diverge (much). For example, to register unified connector metrics, we could simulate the current approach also in some utility package of the mono-repo.
>> >>>>>>>
>> >>>>>>>> I see the stable core Flink API as a prerequisite for modularity. And for connectors it is not just the source and sink API (source being stable as of 1.14), but everything that is required to build and maintain a connector downstream, such as the test utilities and infrastructure.
>> >>>>>>>>
>> >>>>>>> That is a very fair point. I'm actually surprised to see that MiniClusterWithClientResource is not public. I see it being used in all connectors, especially outside of Flink.
I fear that as long >> as >> > > >>>>>>> we do >> > > >>>>> not >> > > >>>>>>> have connectors outside, we will not properly annotate and >> > > >>>>>>> maintain >> > > >>>>> these >> > > >>>>>>> utilties in a classic hen-and-egg-problem. I will outline an >> idea >> > > >>>>>>> at >> > > >>>>> the >> > > >>>>>>> end. >> > > >>>>>>> >> > > >>>>>>>> the connectors need to be adopted and require at least one >> > > >>>>>>>> release >> > > >>>>> per >> > > >>>>>>>> Flink minor release. >> > > >>>>>>>> However, this will make the releases of connectors slower, >> e.g. >> > > >>>>>> maintain >> > > >>>>>>>> features for multiple branches and release multiple branches. >> > > >>>>>>>> I think the main purpose of having an external connector >> > > >>>>>>>> repository >> > > >>>>> is >> > > >>>>>> in >> > > >>>>>>>> order to have "faster releases of connectors"? >> > > >>>>>>>> >> > > >>>>>>>> Imagine a project with a complex set of dependencies. Let's >> say >> > > >>>>> Flink >> > > >>>>>>>> version A plus Flink reliant dependencies released by other >> > > >>>>>>>> projects (Flink-external connectors, Beam, Iceberg, Hudi, >> ..). >> > > >>>>>>>> We don't want >> > > >>>>> a >> > > >>>>>>>> situation where we bump the core Flink version to B and >> things >> > > >>>>>>>> fall apart (interface changes, utilities that were useful but >> > > >>>>>>>> not public, transitive dependencies etc.). >> > > >>>>>>>> >> > > >>>>>>> Yes, that's why I wanted to automate the processes more which >> is >> > > >>>>>>> not >> > > >>>>> that >> > > >>>>>>> easy under ASF. Maybe we automate the source provision across >> > > >>>>> supported >> > > >>>>>>> versions and have 1 vote thread for all versions of a >> connector? 
>> > > >>>>>>> >> > > >>>>>>> From the perspective of CDC connector maintainers, the >> biggest >> > > >>>>> advantage >> > > >>>>>> of >> > > >>>>>>>> maintaining it outside of the Flink project is that: >> > > >>>>>>>> 1) we can have a more flexible and faster release cycle >> > > >>>>>>>> 2) we can be more liberal with committership for connector >> > > >>>>> maintainers >> > > >>>>>>>> which can also attract more committers to help the release. >> > > >>>>>>>> >> > > >>>>>>>> Personally, I think maintaining one connector repository >> under >> > > >>>>>>>> the >> > > >>>>> ASF >> > > >>>>>>> may >> > > >>>>>>>> not have the above benefits. >> > > >>>>>>>> >> > > >>>>>>> Yes, I also feel that ASF is too restrictive for our needs. >> But >> > > >>>>>>> it >> > > >>>>> feels >> > > >>>>>>> like there are too many that see it differently and I think we >> > > >>>>>>> need >> > > >>>>>>> >> > > >>>>>>> (2) Flink testability without connectors. >> > > >>>>>>>> This is a very good question. How can we guarantee the new >> > > >>>>>>>> Source >> > > >>>>> and >> > > >>>>>>> Sink >> > > >>>>>>>> API are stable with only test implementation? >> > > >>>>>>>> >> > > >>>>>>> We can't and shouldn't. Since the connector repo is managed by >> > > >>>>>>> Flink, >> > > >>>>> a >> > > >>>>>>> Flink release manager needs to check if the Flink connectors >> are >> > > >>>>> actually >> > > >>>>>>> working prior to creating an RC. That's similar to how >> > > >>>>>>> flink-shaded >> > > >>>>> and >> > > >>>>>>> flink core are related. >> > > >>>>>>> >> > > >>>>>>> >> > > >>>>>>> So here is one idea that I had to get things rolling. We are >> > > >>>>>>> going to address the external repo iteratively without >> > > >>>>>>> compromising what we >> > > >>>>>> already >> > > >>>>>>> have: >> > > >>>>>>> 1.Phase, add new contributions to external repo. We use that >> time >> > > >>>>>>> to >> > > >>>>>> setup >> > > >>>>>>> infra accordingly and optimize release processes. 
We will identify test utilities that are not yet public/stable and fix that.
Phase 2: Add ports to the new unified interfaces of existing connectors. That requires a previous Flink release to make utilities stable. Keep old interfaces in flink-core.
Phase 3: Remove old interfaces in flink-core for some connectors (tbd at a later point).
Phase 4: Optionally move all remaining connectors (tbd at a later point).
I'd envision having ~3 months between starting the different phases. WDYT?

[1] https://issues.apache.org/jira/browse/FLINK-23527

On Thu, Oct 21, 2021 at 7:12 AM Kyle Bendickson <k...@tabular.io> wrote:

Hi all,

My name is Kyle and I'm an open source developer primarily focused on Apache Iceberg.

I'm happy to help clarify or elaborate on any aspect of our experience working on a relatively decoupled connector that is downstream and pretty popular.

I'd also love to be able to contribute or assist in any way I can.

I don't mean to thread jack, but are there any meetings or community sync-ups, specifically around the connector APIs, that I might join / be invited to?
I did want to add that even though I've experienced some of the pain points of integrating with an evolving system / API (catalog support is, generally speaking, pretty new everywhere in this space), I also personally agree that you shouldn't slow down development velocity too much for the sake of external connectors. Getting to a performant and stable place should be the primary goal, and slowing that down to support stragglers will (in my personal opinion) always be a losing game. Some folks will simply stay behind on versions regardless, until they have to upgrade.

I am working on ensuring that the Iceberg community stays within 1-2 versions of Flink, so that we can provide more feedback or contribute things that might improve our ability to support multiple Flink runtimes / versions with one project / codebase and minimal to no reflection (our desired goal).

If there's anything I can do or any way I can be of assistance, please don't hesitate to reach out. Or find me on ASF Slack 😀

I greatly appreciate your general concern for the needs of downstream connector integrators!
Cheers,
Kyle Bendickson (GitHub: kbendick)
Open Source Developer
kyle [at] tabular [dot] io

On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise <t...@apache.org> wrote:

Hi,

I see the stable core Flink API as a prerequisite for modularity. And for connectors it is not just the source and sink API (source being stable as of 1.14), but everything that is required to build and maintain a connector downstream, such as the test utilities and infrastructure.

Without the stable surface of core Flink, changes will leak into downstream dependencies and force lock-step updates. Refactoring across N repos is more painful than in a single repo. Those with experience developing downstream of Flink will know the pain, and that isn't limited to connectors. I don't remember a Flink "minor version" update that was just a dependency version change and did not force other downstream changes.

Imagine a project with a complex set of dependencies. Let's say Flink version A plus Flink-reliant dependencies released by other projects (Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want a situation where we bump the core Flink version to B and things fall apart (interface changes, utilities that were useful but not public, transitive dependencies etc.).
The discussion here also highlights the benefits of keeping certain connectors outside Flink, whether that is due to differences in the developer community, the maturity of the connectors, their specialized/limited usage, etc. I would like to see that as a sign of a growing ecosystem, and most of the ideas that Arvid has put forward would benefit further growth of the connector ecosystem.

As for keeping connectors within Apache Flink: I prefer that as the path forward for "essential" connectors like FileSource, KafkaSource, ... And we can still achieve a more flexible and faster release cycle.

Thanks,
Thomas

On Wed, Oct 20, 2021 at 3:32 AM Jark Wu <imj...@gmail.com> wrote:

Hi Konstantin,

> the connectors need to be adopted and require at least one release per Flink minor release.

However, this will make the releases of connectors slower, e.g. maintaining features for multiple branches and releasing multiple branches. I think the main purpose of having an external connector repository is to have "faster releases of connectors"?
From the perspective of CDC connector maintainers, the biggest advantage of maintaining it outside of the Flink project is that:
1) we can have a more flexible and faster release cycle
2) we can be more liberal with committership for connector maintainers, which can also attract more committers to help with releases.

Personally, I think maintaining one connector repository under the ASF may not have the above benefits.

Best,
Jark

On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf <kna...@apache.org> wrote:

Hi everyone,

regarding the stability of the APIs: I think everyone agrees that connector APIs which are stable across minor versions (1.13 -> 1.14) are the mid-term goal. But:

a) These APIs are still quite young, and we shouldn't make them @Public prematurely either.

b) Isn't this *mostly* orthogonal to where the connector code lives? Yes, as long as there are breaking changes, the connectors need to be adopted and require at least one release per Flink minor release.
Documentation-wise this can be addressed via a compatibility matrix for each connector, as Arvid suggested. IMO we shouldn't block this effort on the stability of the APIs.

Cheers,

Konstantin

On Wed, Oct 20, 2021 at 8:56 AM Jark Wu <imj...@gmail.com> wrote:

Hi,

I think Thomas raised very good questions and would like to know your opinions if we want to move connectors out of Flink in this version.

(1) Is the connector API already stable?
> Separate releases would only make sense if the core Flink surface is fairly stable though. As evident from Iceberg (and also Beam), that's not the case currently. We should probably focus on addressing the stability first, before splitting code. A success criterion could be that we are able to build Iceberg and Beam against multiple Flink versions w/o the need to change code. The goal would be that no connector breaks when we make changes to Flink core.
> Until that's the case, code separation creates a setup where 1+1 or N+1 repositories need to move in lock step.

From another discussion thread [1], the connector API is far from stable. Currently, it's hard to build connectors against multiple Flink versions. There are breaking API changes both in 1.12 -> 1.13 and in 1.13 -> 1.14, and maybe also in future versions, because the Table-related APIs are still @PublicEvolving and the new Sink API is still @Experimental.

(2) Flink testability without connectors.
> Flink w/o Kafka connector (and few others) isn't viable. Testability of Flink was already brought up; can we really certify a Flink core release without the Kafka connector? Maybe those connectors that are used in Flink e2e tests to validate functionality of core Flink should not be broken out?

This is a very good question. How can we guarantee the new Source and Sink API are stable with only a test implementation?
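For readers following along outside the dev list: Flink marks these API maturity tiers with marker annotations (`@Public`, `@PublicEvolving`, `@Experimental` in the `org.apache.flink.annotation` package). Below is a minimal, self-contained sketch of how such tier annotations can be declared and inspected; the annotation and interface declarations are illustrative stand-ins, not Flink's actual classes:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class StabilityTiers {
    // Stand-in declarations mirroring the idea of Flink's annotation tiers.
    @Retention(RetentionPolicy.RUNTIME) @interface Public {}
    @Retention(RetentionPolicy.RUNTIME) @interface PublicEvolving {}
    @Retention(RetentionPolicy.RUNTIME) @interface Experimental {}

    // Hypothetical connector-facing interfaces at different maturity levels.
    @Public interface SourceApi {}            // stable across minor releases
    @PublicEvolving interface TableApi {}     // may still change between minors
    @Experimental interface UnifiedSinkApi {} // no stability guarantee yet

    public static void main(String[] args) {
        // A downstream build could scan for these markers to flag
        // dependencies on surface area that may break between minors.
        System.out.println(SourceApi.class.isAnnotationPresent(Public.class));      // true
        System.out.println(UnifiedSinkApi.class.isAnnotationPresent(Public.class)); // false
    }
}
```

Because the markers are retained at runtime, a connector build could fail fast when it links against interfaces below a chosen tier, which is exactly the guarantee the thread is asking for.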
Best,
Jark

On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler <ches...@apache.org> wrote:

Could you clarify what release cadence you're thinking of? There's quite a big range that fits "more frequent than Flink" (per-commit, daily, weekly, bi-weekly, monthly, even bi-monthly).

On 19/10/2021 14:15, Martijn Visser wrote:

Hi all,

I think it would be a huge benefit if we can achieve more frequent releases of connectors, which are not bound to the release cycle of Flink itself. I agree that in order to get there, we need to have stable interfaces which are trustworthy and reliable, so they can be safely used by those connectors. I do think that work still needs to be done on those interfaces, but I am confident that we can get there from a Flink perspective.
I am worried that we would not be able to achieve those frequent releases of connectors if we put these connectors under the Apache umbrella, because that means that for each connector release we have to follow the Apache release creation process. This requires a lot of manual steps and prohibits automation, and I think it would be hard to scale out to frequent releases of connectors. I'm curious how others think this challenge could be solved.

Best regards,

Martijn

On Mon, 18 Oct 2021 at 22:22, Thomas Weise <t...@apache.org> wrote:

Thanks for initiating this discussion.

There are definitely a few things that are not optimal with our current management of connectors. I would not necessarily characterize it as a "mess" though. As the points raised so far show, it isn't easy to find a solution that balances competing requirements and leads to a net improvement.
It would be great if we can find a setup that allows connectors to be released independently of core Flink, and each connector to be released separately. Flink already has separate releases (flink-shaded), so that by itself isn't a new thing. Per-connector releases would need to allow for more frequent releases (without the baggage that a full Flink release comes with).

Separate releases would only make sense if the core Flink surface is fairly stable though. As evident from Iceberg (and also Beam), that's not the case currently. We should probably focus on addressing the stability first, before splitting code. A success criterion could be that we are able to build Iceberg and Beam against multiple Flink versions w/o the need to change code. The goal would be that no connector breaks when we make changes to Flink core. Until that's the case, code separation creates a setup where 1+1 or N+1 repositories need to move in lock step.
Regarding some connectors being more important for Flink than others: that's a fact. Flink w/o Kafka connector (and few others) isn't viable. Testability of Flink was already brought up; can we really certify a Flink core release without the Kafka connector? Maybe those connectors that are used in Flink e2e tests to validate functionality of core Flink should not be broken out?

Finally, I think that the connectors that move into separate repos should remain part of the Apache Flink project. Larger organizations tend to approve the use of and contribution to open source at the project level. Sometimes it is everything ASF; more often it is "Apache Foo". It would be fatal to end up with a patchwork of projects with potentially different licenses and governance to arrive at a working Flink setup. This may mean we prioritize usability over developer convenience, if that's in the best interest of Flink as a whole.
Thanks,
Thomas

On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler <ches...@apache.org> wrote:

Generally, the issues are reproducibility and control.

Stuff's completely broken on the Flink side for a week? Well, then so are the connector repos.

(As-is) You can't go back to a previous version of the snapshot. Which also means that checking out older commits can be problematic, because you'd still work against the latest snapshots, and they may not be compatible with each other.

On 18/10/2021 15:22, Arvid Heise wrote:

I was actually betting on snapshot versions. What are the limits? Obviously, we can only do a release of a 1.15 connector after 1.15 is released.

--

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk
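The per-connector compatibility matrix that Arvid and Konstantin mention could, in its simplest form, be a published lookup table from a connector release to the Flink minor versions it supports. A hypothetical sketch follows, with all connector names and version numbers invented for illustration:

```java
import java.util.List;
import java.util.Map;

public class CompatibilityMatrix {
    // Hypothetical matrix: connector release -> supported Flink minor versions.
    static final Map<String, List<String>> MATRIX = Map.of(
            "flink-connector-foo-1.0.0", List.of("1.13", "1.14"),
            "flink-connector-foo-2.0.0", List.of("1.14", "1.15"));

    static boolean isCompatible(String connectorRelease, String flinkMinor) {
        return MATRIX.getOrDefault(connectorRelease, List.of()).contains(flinkMinor);
    }

    public static void main(String[] args) {
        System.out.println(isCompatible("flink-connector-foo-1.0.0", "1.13")); // true
        System.out.println(isCompatible("flink-connector-foo-2.0.0", "1.13")); // false
    }
}
```

Such a table can double as CI input: a matrix build rebuilds each connector against every Flink version it claims to support, which would operationalize Thomas's success criterion of building downstream projects against multiple Flink versions without code changes.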