+1 for the single repo approach. Cheers, Till
On Thu, Dec 9, 2021 at 3:54 PM Martijn Visser <mart...@ververica.com> wrote:

> I also agree that it feels more natural to go with a repo for each
> individual connector. Each repository can be made available at
> flink-packages.org so users can find them, next to referring to them in
> documentation. +1 from my side.
>
> On Thu, 9 Dec 2021 at 15:38, Arvid Heise <ar...@apache.org> wrote:
>
> > Hi all,
> >
> > We tried out Chesnay's proposal and went with Option 2. Unfortunately, we
> > ran into some tough nuts to crack and feel like we hit a dead end:
> > - The main pain point with the outlined Frankensteinian connector repo is
> > how to handle shared code / infra code. If we keep it in some <common>
> > branch, then we need to merge the common branch into the connector branch
> > on update. However, it's unclear to me how improvements to the common
> > code that naturally appear while working on a specific connector go back
> > into the common branch. You can't use a pull request from your branch, or
> > else your connector code would poison the connector-less common branch.
> > So you would probably manually copy the files over to a common branch and
> > create a PR branch for that.
> > - A weird solution could be to have the common branch as a submodule in
> > the repo itself (if that's even possible). I'm sure that this setup would
> > blow the minds of all newcomers.
> > - Similarly, it's mandatory to have safeguards against code from
> > connector A poisoning connector B, common, or main. I had a similar setup
> > in the past and code from two "distinct" branch types constantly swept
> > over.
> > - We could also say that we simply release <common> independently and
> > just have a Maven (SNAPSHOT) dependency on it. But that would create a
> > weird flow whenever you need to change something in common, where you
> > need to constantly switch branches back and forth.
> > - In general, the Frankensteinian approach is very switch intensive. If
> > you maintain 3 connectors and need to fix one build stability issue in
> > each at the same time (quite common nowadays for some reason) and you
> > have 2 review rounds, you need to switch branches 9 times, ignoring
> > changes to common.
> >
> > Additionally, we still have the rather user/dev-unfriendly main branch
> > that is mostly empty. I'm also not sure we can generate an overview
> > README.md to make it more friendly here, because in theory every
> > connector branch should be based on main, and we would get merge
> > conflicts.
> >
> > I'd like to propose once again to go with individual repositories.
> > - The only downside that we discussed so far is that we have more initial
> > setup to do. Since we organically grow the number of
> > connectors/repositories, that load is quite distributed. We can offer
> > templates after finding a good approach, which can even be used by
> > outside organizations.
> > - Regarding secrets, I think it's actually an advantage that the Kafka
> > connector has no access to the AWS secrets. If there are secrets to be
> > shared across connectors, we can and should use Azure's Variable Groups
> > (I have used them in the past to share Nexus credentials across repos).
> > That would also make rotation easy.
> > - Working on different connectors would be rather easy, as all modern
> > IDEs support multi-repo setups in the same project. You still need to do
> > multiple releases in case you update common code (either accessed through
> > Nexus or a git submodule) and you want to release your connector.
> > - There is no difference in the number of CI runs between the two
> > approaches.
> > - Individual repositories also have the advantage of allowing external
> > incubation. Let's assume someone builds connector A and hosts it in their
> > organization (a very common setup). If they want to contribute the code
> > to Flink, we could simply transfer the repository into the ASF after
> > ensuring Flink coding standards.
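The switch-count figure Arvid gives above (3 connectors, one build-stability fix each, 2 review rounds, hence 9 switches) works out as follows; a small sketch of the arithmetic, where the formula is my reading of the mail:

```python
def branch_switches(connectors: int, fixes_per_connector: int, review_rounds: int) -> int:
    # One checkout to start each fix, plus one checkout back to that branch
    # for every review round -- changes to common are ignored, as in the mail.
    return connectors * fixes_per_connector * (1 + review_rounds)

print(branch_switches(connectors=3, fixes_per_connector=1, review_rounds=2))  # 9
```

In per-connector repos the same work needs no branch switches at all, only moving between IDE projects, which is the point being made.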
> > Then we retain git history and GitHub issues.
> >
> > Is there any point that I'm missing?
> >
> > On Fri, Nov 26, 2021 at 1:32 PM Chesnay Schepler <ches...@apache.org>
> > wrote:
> >
> > > For sharing workflows we should be able to use composite actions. We'd
> > > have the main definition files in the flink-connectors repo, which we
> > > also need to tag/release, and which other branches/repos can then
> > > import. These are also versioned, so we don't have to worry about
> > > accidentally breaking stuff.
> > > These could also be used to enforce certain standards / interfaces
> > > such that we can automate more things (e.g., integration into the
> > > Flink documentation).
> > >
> > > It is true that Option 2) and dedicated repositories share a lot of
> > > properties. While I did say in an offline conversation that we might
> > > in that case just as well use separate repositories, I'm not so sure
> > > anymore. One repo would make administration a bit easier; for example,
> > > secrets wouldn't have to be applied to each repo (we wouldn't want
> > > certain secrets to be set up organization-wide).
> > > Overall I also like that one repo would present a single access point;
> > > you can't "miss" a connector repo, and I would hope that having it as
> > > one repo would nurture more collaboration between the connectors,
> > > which after all need to solve similar problems.
> > >
> > > It is a fair point that the branching model would be quite weird, but
> > > I think that would subside pretty quickly.
> > >
> > > Personally I'd go with Option 2, and if that doesn't work out we can
> > > still split the repo later on. (Which should then be a trivial matter
> > > of copying all <connector>/* branches and renaming them.)
> > >
> > > On 26/11/2021 12:47, Till Rohrmann wrote:
> > > > Hi Arvid,
> > > >
> > > > Thanks for updating this thread with the latest findings. The
> > > > described limitations for a single connector repo sound suboptimal
> > > > to me.
> > > >
> > > > * Option 2 sounds as if we try to simulate multiple connector repos
> > > > inside of a single repo. I also don't know how we would share code
> > > > between the different branches (sharing infrastructure would
> > > > probably be easier, though). This seems to have the same limitations
> > > > as dedicated repos, with the downside of a not very intuitive
> > > > branching model.
> > > > * Isn't option 1 kind of a degenerate version of option 2 where we
> > > > have some unrelated code from other connectors in the individual
> > > > connector branches?
> > > > * Option 3 has the downside that someone creating a release has to
> > > > release all connectors. This means that she either has to sync with
> > > > the different connector maintainers or has to be able to release all
> > > > connectors on her own. We are already seeing in the Flink community
> > > > that releases require quite good communication/coordination between
> > > > the different people working on different Flink components. Given
> > > > our goals to make connector releases easier and more frequent, I
> > > > think that coupling different connector releases might be
> > > > counter-productive.
> > > >
> > > > To me it does not sound very practical to use a mono repository w/o
> > > > having some more advanced build infrastructure that, for example,
> > > > allows different git roots in different connector directories.
> > > > Maybe the mono repo can be a catch-all repository for connectors
> > > > that want to be released in lock-step (Option 3) with all other
> > > > connectors the repo contains. But for connectors that get changed
> > > > frequently, having a dedicated repository that allows independent
> > > > releases sounds preferable to me.
> > > >
> > > > What utilities and infrastructure code do you intend to share? Using
> > > > git submodules can definitely be one option to share code. However,
> > > > it might also be ok to depend on flink-connector-common artifacts,
> > > > which could make things easier. Where I am unsure is whether git
> > > > submodules can be used to share infrastructure code (e.g. the
> > > > .github/workflows) because you need these files in the repo to
> > > > trigger the CI infrastructure.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Thu, Nov 25, 2021 at 1:59 PM Arvid Heise <ar...@apache.org>
> > > > wrote:
> > > >
> > > >> Hi Brian,
> > > >>
> > > >> Thank you for sharing. I think your approach is very valid and is
> > > >> in line with what I had in mind.
> > > >>
> > > >>> Basically the Pravega community aligns the connector releases
> > > >>> with the Pravega mainline release
> > > >>
> > > >> This certainly would mean that there is little value in coupling
> > > >> connector versions. So it's making a good case for having separate
> > > >> connector repos.
> > > >>
> > > >>> and maintains the connector with the latest 3 Flink versions (CI
> > > >>> will publish snapshots for all these 3 branches)
> > > >>
> > > >> I'd like to give connector devs a simple way to express which Flink
> > > >> versions the current branch is compatible with. From there we can
> > > >> generate the compatibility matrix automatically and optionally also
> > > >> create different releases per supported Flink version. I'm not sure
> > > >> if the latter is indeed better than having just one artifact that
> > > >> happens to run with multiple Flink versions. I guess it depends on
> > > >> what dependencies we are exposing. If the connector uses
> > > >> flink-connector-base, then we probably need separate artifacts with
> > > >> poms anyway.
> > > >>
> > > >> Best,
> > > >>
> > > >> Arvid
> > > >>
> > > >> On Fri, Nov 19, 2021 at 10:55 AM Zhou, Brian <b.z...@dell.com>
> > > >> wrote:
> > > >>
> > > >>> Hi Arvid,
> > > >>>
> > > >>> Regarding the branching model, the Pravega Flink connector has
> > > >>> some experience that I would like to share. Here [1][2] are the
> > > >>> compatibility matrix and the wiki explaining the branching model
> > > >>> and releases. Basically the Pravega community aligns the connector
> > > >>> releases with the Pravega mainline release, and maintains the
> > > >>> connector for the latest 3 Flink versions (CI will publish
> > > >>> snapshots for all these 3 branches).
> > > >>> For example, recently we had the 0.10.1 release [3], and in Maven
> > > >>> Central we needed to upload three artifacts (for Flink 1.13, 1.12,
> > > >>> 1.11) for the 0.10.1 version [4].
> > > >>>
> > > >>> There are some alternatives. Another solution that we once
> > > >>> discussed but finally abandoned is to have an independent version,
> > > >>> just like the current CDC connector, and then give a big
> > > >>> compatibility matrix to users. We thought it would become too
> > > >>> confusing as the connector develops. On the contrary, we could
> > > >>> also go the opposite way: align with the Flink version and
> > > >>> maintain several branches for different system versions.
> > > >>>
> > > >>> I would say this is only a fairly-OK solution, because it is a bit
> > > >>> painful for maintainers, as cherry-picks are very common and
> > > >>> releases require much work. However, if neither system has nice
> > > >>> backward compatibility, there seems to be no comfortable solution
> > > >>> for their connector.
> > > >>> > > > >>> [1] > https://github.com/pravega/flink-connectors#compatibility-matrix > > > >>> [2] > > > >>> > > > >> > > > > > > https://github.com/pravega/flink-connectors/wiki/Versioning-strategy-for-Flink-connector > > > >>> [3] > https://github.com/pravega/flink-connectors/releases/tag/v0.10.1 > > > >>> [4] https://search.maven.org/search?q=pravega-connectors-flink > > > >>> > > > >>> Best Regards, > > > >>> Brian > > > >>> > > > >>> > > > >>> Internal Use - Confidential > > > >>> > > > >>> -----Original Message----- > > > >>> From: Arvid Heise <ar...@apache.org> > > > >>> Sent: Friday, November 19, 2021 4:12 PM > > > >>> To: dev > > > >>> Subject: Re: [DISCUSS] Creating an external connector repository > > > >>> > > > >>> > > > >>> [EXTERNAL EMAIL] > > > >>> > > > >>> Hi everyone, > > > >>> > > > >>> we are currently in the process of setting up the flink-connectors > > repo > > > >>> [1] for new connectors but we hit a wall that we currently cannot > > take: > > > >>> branching model. > > > >>> To reiterate the original motivation of the external connector > repo: > > We > > > >>> want to decouple the release cycle of a connector with Flink. > > However, > > > if > > > >>> we want to support semantic versioning in the connectors with the > > > ability > > > >>> to introduce breaking changes through major version bumps and > support > > > >>> bugfixes on old versions, then we need release branches similar to > > how > > > >>> Flink core operates. > > > >>> Consider two connectors, let's call them kafka and hbase. We have > > kafka > > > >> in > > > >>> version 1.0.X, 1.1.Y (small improvement), 2.0.Z (config option) > > change > > > >> and > > > >>> hbase only on 1.0.A. > > > >>> > > > >>> Now our current assumption was that we can work with a mono-repo > > under > > > >> ASF > > > >>> (flink-connectors). Then, for release-branches, we found 3 options: > > > >>> 1. 
We would need to create some ugly mess with the cross product of > > > >>> connector and version: so you have kafka-release-1.0, > > > kafka-release-1.1, > > > >>> kafka-release-2.0, hbase-release-1.0. The main issue is not the > > amount > > > of > > > >>> branches (that's something that git can handle) but there the state > > of > > > >>> kafka is undefined in hbase-release-1.0. That's a call for desaster > > and > > > >>> makes releasing connectors very cumbersome (CI would only execute > and > > > >>> publish hbase SNAPSHOTS on hbase-release-1.0). > > > >>> 2. We could avoid the undefined state by having an empty master and > > > each > > > >>> release branch really only holds the code of the connector. But > > that's > > > >> also > > > >>> not great: any user that looks at the repo and sees no connector > > would > > > >>> assume that it's dead. > > > >>> 3. We could have synced releases similar to the CDC connectors [2]. > > > That > > > >>> means that if any connector introduces a breaking change, all > > > connectors > > > >>> get a new major. I find that quite confusing to a user if hbase > gets > > a > > > >> new > > > >>> release without any change because kafka introduced a breaking > > change. > > > >>> > > > >>> To fully decouple release cycles and CI of connectors, we could add > > > >>> individual repositories under ASF (flink-connector-kafka, > > > >>> flink-connector-hbase). Then we can apply the same branching model > as > > > >>> before. I quickly checked if there are precedences in the apache > > > >> community > > > >>> for that approach and just by scanning alphabetically I found > cordova > > > >> with > > > >>> 70 and couchdb with 77 apache repos respectively. So it certainly > > seems > > > >>> like other projects approached our problem in that way and the > apache > > > >>> organization is okay with that. 
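Option 1's constraint that "CI would only execute and publish hbase SNAPSHOTs on hbase-release-1.0" implies a per-branch build filter; a minimal sketch, assuming the branch naming used in the mail (everything else here is an assumption):

```python
import re

# Branch names follow the cross-product scheme from the mail:
# <connector>-release-<major>.<minor>, e.g. kafka-release-1.1.
BRANCH_PATTERN = re.compile(r"^(?P<connector>[a-z0-9]+)-release-(?P<version>\d+\.\d+)$")

def modules_to_build(branch: str, all_connectors: list[str]) -> list[str]:
    """Return which connector modules CI should build and publish on a branch."""
    m = BRANCH_PATTERN.match(branch)
    if m and m.group("connector") in all_connectors:
        return [m.group("connector")]
    # main / unrecognized branches: build everything.
    return all_connectors

print(modules_to_build("hbase-release-1.0", ["kafka", "hbase"]))  # ['hbase']
```

Even with such a filter in place, the undefined state of the other connectors' code on that branch remains, which is the actual objection raised above.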
> > >>> I currently expect max 20 additional repos for connectors, and in
> > >>> the future max 10 each for formats and filesystems if we also move
> > >>> them out at some point in time. So we would be at a total of 50
> > >>> repos.
> > >>>
> > >>> Note that for all options, we need to provide a compatibility
> > >>> matrix that we aim to autogenerate.
> > >>>
> > >>> Now for the potential downsides that we internally discussed:
> > >>> - How can we ensure common infrastructure code, utilities, and
> > >>> quality?
> > >>> I propose to add a flink-connector-common that contains all these
> > >>> things and is added as a git submodule/subtree to the repos.
> > >>> - Do we implicitly discourage connector developers from maintaining
> > >>> more than one connector with a fragmented code base?
> > >>> That is certainly a risk. However, I currently also see few devs
> > >>> working on more than one connector. It may actually help keep the
> > >>> devs that maintain a specific connector on the hook. We could use
> > >>> GitHub issues to track bugs and feature requests, and a dev can
> > >>> focus his limited time on getting that one connector right.
> > >>>
> > >>> So WDYT? Compared to some intermediate suggestions with split
> > >>> repos, the big difference is that everything remains under the
> > >>> Apache umbrella and the Flink community.
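The compatibility matrix Arvid wants to autogenerate could be produced from simple per-connector metadata; a minimal sketch, where the metadata shape (connector name mapped to supported Flink versions) is an assumption of mine, not something the proposal defines:

```python
def compatibility_matrix(support: dict[str, list[str]]) -> str:
    """Render a Markdown compatibility matrix from per-connector metadata."""
    flink_versions = sorted({v for versions in support.values() for v in versions})
    header = "| Connector | " + " | ".join(flink_versions) + " |"
    separator = "|---" * (len(flink_versions) + 1) + "|"
    rows = [
        "| " + name + " | " + " | ".join(
            "yes" if v in versions else "no" for v in flink_versions
        ) + " |"
        for name, versions in sorted(support.items())
    ]
    return "\n".join([header, separator] + rows)

print(compatibility_matrix({"kafka": ["1.13", "1.14"], "hbase": ["1.13"]}))
```

With one metadata file per repo (or per branch), a scheduled job could aggregate them and publish the matrix, which works the same under the mono-repo and the per-repo options.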
> > > >>> > > > >>> [1] > > > >>> > > > >> > > > > > > https://urldefense.com/v3/__https://github.com/apache/flink-connectors__;!!LpKI!2a1uSGfMmwc8HNwqBUIGtFPzLHP5m9yS0sC3n3IpLgdke_-XjpYgXzxxweh4$ > > > >>> [github[.]com] [2] > > > >>> > > > >> > > > > > > https://urldefense.com/v3/__https://github.com/ververica/flink-cdc-connectors/__;!!LpKI!2a1uSGfMmwc8HNwqBUIGtFPzLHP5m9yS0sC3n3IpLgdke_-XjpYgXzgoPGA8$ > > > >>> [github[.]com] > > > >>> > > > >>> On Fri, Nov 12, 2021 at 3:39 PM Arvid Heise <ar...@apache.org> > > wrote: > > > >>> > > > >>>> Hi everyone, > > > >>>> > > > >>>> I created the flink-connectors repo [1] to advance the topic. We > > would > > > >>>> create a proof-of-concept in the next few weeks as a special > branch > > > >>>> that I'd then use for discussions. If the community agrees with > the > > > >>>> approach, that special branch will become the master. If not, we > can > > > >>>> reiterate over it or create competing POCs. > > > >>>> > > > >>>> If someone wants to try things out in parallel, just make sure > that > > > >>>> you are not accidentally pushing POCs to the master. > > > >>>> > > > >>>> As a reminder: We will not move out any current connector from > Flink > > > >>>> at this point in time, so everything in Flink will remain as is > and > > be > > > >>>> maintained there. > > > >>>> > > > >>>> Best, > > > >>>> > > > >>>> Arvid > > > >>>> > > > >>>> [1] > > > >>>> > > > https://urldefense.com/v3/__https://github.com/apache/flink-connectors > > > >>>> > > __;!!LpKI!2a1uSGfMmwc8HNwqBUIGtFPzLHP5m9yS0sC3n3IpLgdke_-XjpYgXzxxweh4 > > > >>>> $ [github[.]com] > > > >>>> > > > >>>> On Fri, Oct 29, 2021 at 6:57 PM Till Rohrmann < > trohrm...@apache.org > > > > > > >>>> wrote: > > > >>>> > > > >>>>> Hi everyone, > > > >>>>> > > > >>>>> From the discussion, it seems to me that we have different > > opinions > > > >>>>> whether to have an ASF umbrella repository or to host them > outside > > of > > > >>>>> the ASF. 
It also seems that this is not really the problem to > > solve. > > > >>>>> Since there are many good arguments for either approach, we could > > > >>>>> simply start with an ASF umbrella repository and see how people > > adopt > > > >>>>> it. If the individual connectors cannot move fast enough or if > > people > > > >>>>> prefer to not buy into the more heavy-weight ASF processes, then > > they > > > >>>>> can host the code also somewhere else. We simply need to make > sure > > > >>>>> that these connectors are discoverable (e.g. via flink-packages). > > > >>>>> > > > >>>>> The more important problem seems to be to provide common tooling > > > >>>>> (testing, infrastructure, documentation) that can easily be > reused. > > > >>>>> Similarly, it has become clear that the Flink community needs to > > > >>>>> improve on providing stable APIs. I think it is not realistic to > > > >>>>> first complete these tasks before starting to move connectors to > > > >>>>> dedicated repositories. As Stephan said, creating a connector > > > >>>>> repository will force us to pay more attention to API stability > and > > > >>>>> also to think about which testing tools are required. Hence, I > > > >>>>> believe that starting to add connectors to a different repository > > > >>>>> than apache/flink will help improve our connector tooling > > (declaring > > > >>>>> testing classes as public, creating a common test utility repo, > > > >>>>> creating a repo > > > >>>>> template) and vice versa. Hence, I like Arvid's proposed process > as > > > >>>>> it will start kicking things off w/o letting this effort fizzle > > out. > > > >>>>> > > > >>>>> Cheers, > > > >>>>> Till > > > >>>>> > > > >>>>> On Thu, Oct 28, 2021 at 11:44 AM Stephan Ewen <se...@apache.org> > > > >> wrote: > > > >>>>>> Thank you all, for the nice discussion! > > > >>>>>> > > > >>>>>> From my point of view, I very much like the idea of putting > > > >>>>>> connectors > > > >>>>> in a > > > >>>>>> separate repository. 
But I would argue it should be part of > Apache > > > >>>>> Flink, > > > >>>>>> similar to flink-statefun, flink-ml, etc. > > > >>>>>> > > > >>>>>> I share many of the reasons for that: > > > >>>>>> - As argued many times, reduces complexity of the Flink repo, > > > >>>>> increases > > > >>>>>> response times of CI, etc. > > > >>>>>> - Much lower barrier of contribution, because an unstable > > > >>>>>> connector > > > >>>>> would > > > >>>>>> not de-stabilize the whole build. Of course, we would need to > make > > > >>>>>> sure > > > >>>>> we > > > >>>>>> set this up the right way, with connectors having individual CI > > > >>>>>> runs, > > > >>>>> build > > > >>>>>> status, etc. But it certainly seems possible. > > > >>>>>> > > > >>>>>> > > > >>>>>> I would argue some points a bit different than some cases made > > > >> before: > > > >>>>>> (a) I believe the separation would increase connector stability. > > > >>>>> Because it > > > >>>>>> really forces us to work with the connectors against the APIs > like > > > >>>>>> any external developer. A mono repo is somehow the wrong thing > if > > > >>>>>> you in practice want to actually guarantee stable internal APIs > at > > > >>> some layer. > > > >>>>>> Because the mono repo makes it easy to just change something on > > > >>>>>> both > > > >>>>> sides > > > >>>>>> of the API (provider and consumer) seamlessly. > > > >>>>>> > > > >>>>>> Major refactorings in Flink need to keep all connector API > > > >>>>>> contracts intact, or we need to have a new version of the > > connector > > > >>> API. > > > >>>>>> (b) We may even be able to go towards more lightweight and > > > >>>>>> automated releases over time, even if we stay in Apache Flink > with > > > >>> that repo. 
> > > >>>>>> This isn't yet fully aligned with the Apache release policies, > > yet, > > > >>>>>> but there are board discussions about whether there can be > > > >>>>>> bot-triggered releases (by dependabot) and how that could fit > into > > > >>> the Apache process. > > > >>>>>> This doesn't seem to be quite there just yet, but seeing that > > those > > > >>>>> start > > > >>>>>> is a good sign, and there is a good chance we can do some things > > > >>> there. > > > >>>>>> I am not sure whether we should let bots trigger releases, > because > > > >>>>>> a > > > >>>>> final > > > >>>>>> human look at things isn't a bad thing, especially given the > > > >>>>>> popularity > > > >>>>> of > > > >>>>>> software supply chain attacks recently. > > > >>>>>> > > > >>>>>> > > > >>>>>> I do share Chesnay's concerns about complexity in tooling, > though. > > > >>>>>> Both release tooling and test tooling. They are not incompatible > > > >>>>>> with that approach, but they are a task we need to tackle during > > > >>>>>> this change which will add additional work. > > > >>>>>> > > > >>>>>> > > > >>>>>> > > > >>>>>> On Tue, Oct 26, 2021 at 10:31 AM Arvid Heise <ar...@apache.org> > > > >>> wrote: > > > >>>>>>> Hi folks, > > > >>>>>>> > > > >>>>>>> I think some questions came up and I'd like to address the > > > >>>>>>> question of > > > >>>>>> the > > > >>>>>>> timing. > > > >>>>>>> > > > >>>>>>> Could you clarify what release cadence you're thinking of? > > > >>>>>>> There's > > > >>>>> quite > > > >>>>>>>> a big range that fits "more frequent than Flink" (per-commit, > > > >>>>>>>> daily, weekly, bi-weekly, monthly, even bi-monthly). > > > >>>>>>> The short answer is: as often as needed: > > > >>>>>>> - If there is a CVE in a dependency and we need to bump it - > > > >>>>>>> release immediately. > > > >>>>>>> - If there is a new feature merged, release soonish. We may > > > >>>>>>> collect a > > > >>>>> few > > > >>>>>>> successive features before a release. 
> > > >>>>>>> - If there is a bugfix, release immediately or soonish > depending > > > >>>>>>> on > > > >>>>> the > > > >>>>>>> severity and if there are workarounds available. > > > >>>>>>> > > > >>>>>>> We should not limit ourselves; the whole idea of independent > > > >>>>>>> releases > > > >>>>> is > > > >>>>>>> exactly that you release as needed. There is no release > planning > > > >>>>>>> or anything needed, you just go with a release as if it was an > > > >>>>>>> external artifact. > > > >>>>>>> > > > >>>>>>> (1) is the connector API already stable? > > > >>>>>>>> From another discussion thread [1], connector API is far from > > > >>>>> stable. > > > >>>>>>>> Currently, it's hard to build connectors against multiple > Flink > > > >>>>>> versions. > > > >>>>>>>> There are breaking API changes both in 1.12 -> 1.13 and 1.13 > -> > > > >>>>>>>> 1.14 > > > >>>>>> and > > > >>>>>>>> maybe also in the future versions, because Table related > APIs > > > >>>>>>>> are > > > >>>>>> still > > > >>>>>>>> @PublicEvolving and new Sink API is still @Experimental. > > > >>>>>>>> > > > >>>>>>> The question is: what is stable in an evolving system? We > > > >>>>>>> recently discovered that the old SourceFunction needed to be > > > >>>>>>> refined such that cancellation works correctly [1]. So that > > > >>>>>>> interface is in Flink since > > > >>>>> 7 > > > >>>>>>> years, heavily used also outside, and we still had to change > the > > > >>>>> contract > > > >>>>>>> in a way that I'd expect any implementer to recheck their > > > >>>>> implementation. > > > >>>>>>> It might not be necessary to change anything and you can > probably > > > >>>>> change > > > >>>>>>> the the code for all Flink versions but still, the interface > was > > > >>>>>>> not > > > >>>>>> stable > > > >>>>>>> in the closest sense. 
> > > >>>>>>> > > > >>>>>>> If we focus just on API changes on the unified interfaces, then > > > >>>>>>> we > > > >>>>> expect > > > >>>>>>> one more change to Sink API to support compaction. For Table > API, > > > >>>>> there > > > >>>>>>> will most likely also be some changes in 1.15. So we could wait > > > >>>>>>> for > > > >>>>> 1.15. > > > >>>>>>> But I'm questioning if that's really necessary because we will > > > >>>>>>> add > > > >>>>> more > > > >>>>>>> functionality beyond 1.15 without breaking API. For example, we > > > >>>>>>> may > > > >>>>> add > > > >>>>>>> more unified connector metrics. If you want to use it in your > > > >>>>> connector, > > > >>>>>>> you have to support multiple Flink versions anyhow. So rather > > > >>>>>>> then > > > >>>>>> focusing > > > >>>>>>> the discussion on "when is stuff stable", I'd rather focus on > > > >>>>>>> "how > > > >>>>> can we > > > >>>>>>> support building connectors against multiple Flink versions" > and > > > >>>>>>> make > > > >>>>> it > > > >>>>>> as > > > >>>>>>> painless as possible. > > > >>>>>>> > > > >>>>>>> Chesnay pointed out to use different branches for different > Flink > > > >>>>>> versions > > > >>>>>>> which sounds like a good suggestion. With a mono-repo, we can't > > > >>>>>>> use branches differently anyways (there is no way to have > release > > > >>>>>>> branches > > > >>>>>> per > > > >>>>>>> connector without chaos). In these branches, we could provide > > > >>>>>>> shims to simulate future features in older Flink versions such > > > >>>>>>> that code-wise, > > > >>>>> the > > > >>>>>>> source code of a specific connector may not diverge (much). For > > > >>>>> example, > > > >>>>>> to > > > >>>>>>> register unified connector metrics, we could simulate the > current > > > >>>>>> approach > > > >>>>>>> also in some utility package of the mono-repo. > > > >>>>>>> > > > >>>>>>> I see the stable core Flink API as a prerequisite for > modularity. 
> > > >>>>>>> And > > > >>>>>>>> for connectors it is not just the source and sink API (source > > > >>>>>>>> being stable as of 1.14), but everything that is required to > > > >>>>>>>> build and maintain a connector downstream, such as the test > > > >>>>>>>> utilities and infrastructure. > > > >>>>>>>> > > > >>>>>>> That is a very fair point. I'm actually surprised to see that > > > >>>>>>> MiniClusterWithClientResource is not public. I see it being > used > > > >>>>>>> in > > > >>>>> all > > > >>>>>>> connectors, especially outside of Flink. I fear that as long as > > > >>>>>>> we do > > > >>>>> not > > > >>>>>>> have connectors outside, we will not properly annotate and > > > >>>>>>> maintain > > > >>>>> these > > > >>>>>>> utilties in a classic hen-and-egg-problem. I will outline an > idea > > > >>>>>>> at > > > >>>>> the > > > >>>>>>> end. > > > >>>>>>> > > > >>>>>>>> the connectors need to be adopted and require at least one > > > >>>>>>>> release > > > >>>>> per > > > >>>>>>>> Flink minor release. > > > >>>>>>>> However, this will make the releases of connectors slower, > e.g. > > > >>>>>> maintain > > > >>>>>>>> features for multiple branches and release multiple branches. > > > >>>>>>>> I think the main purpose of having an external connector > > > >>>>>>>> repository > > > >>>>> is > > > >>>>>> in > > > >>>>>>>> order to have "faster releases of connectors"? > > > >>>>>>>> > > > >>>>>>>> Imagine a project with a complex set of dependencies. Let's > say > > > >>>>> Flink > > > >>>>>>>> version A plus Flink reliant dependencies released by other > > > >>>>>>>> projects (Flink-external connectors, Beam, Iceberg, Hudi, ..). > > > >>>>>>>> We don't want > > > >>>>> a > > > >>>>>>>> situation where we bump the core Flink version to B and things > > > >>>>>>>> fall apart (interface changes, utilities that were useful but > > > >>>>>>>> not public, transitive dependencies etc.). 
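The shims Arvid mentions earlier (simulating future features in older Flink versions so connector source code doesn't diverge) can be illustrated roughly as follows; this is a Python analogy of what would be Java in practice, and every name in it is hypothetical:

```python
class MetricShim:
    """Facade the connector codes against, regardless of Flink version."""

    def __init__(self, runtime_context):
        self._ctx = runtime_context

    def counter(self, name: str):
        # Use the (hypothetical) unified metric API when the running Flink
        # version provides it; otherwise fall back to a legacy registration.
        unified = getattr(self._ctx, "register_unified_counter", None)
        if unified is not None:
            return unified(name)
        return self._ctx.legacy_counter(name)


class _OldFlinkContext:
    """Stand-in for a runtime context from an older Flink version."""
    def legacy_counter(self, name):
        return f"legacy:{name}"

print(MetricShim(_OldFlinkContext()).counter("numRecordsIn"))  # legacy:numRecordsIn
```

In Java the same effect would come from a per-branch utility package compiled against that branch's Flink version, rather than a runtime feature probe; the sketch only shows the shape of the facade.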
> > > >>>>>>>> > > > >>>>>>> Yes, that's why I wanted to automate the processes more which > is > > > >>>>>>> not > > > >>>>> that > > > >>>>>>> easy under ASF. Maybe we automate the source provision across > > > >>>>> supported > > > >>>>>>> versions and have 1 vote thread for all versions of a > connector? > > > >>>>>>> > > > >>>>>>> From the perspective of CDC connector maintainers, the biggest > > > >>>>> advantage > > > >>>>>> of > > > >>>>>>>> maintaining it outside of the Flink project is that: > > > >>>>>>>> 1) we can have a more flexible and faster release cycle > > > >>>>>>>> 2) we can be more liberal with committership for connector > > > >>>>> maintainers > > > >>>>>>>> which can also attract more committers to help the release. > > > >>>>>>>> > > > >>>>>>>> Personally, I think maintaining one connector repository under > > > >>>>>>>> the > > > >>>>> ASF > > > >>>>>>> may > > > >>>>>>>> not have the above benefits. > > > >>>>>>>> > > > >>>>>>> Yes, I also feel that ASF is too restrictive for our needs. But > > > >>>>>>> it > > > >>>>> feels > > > >>>>>>> like there are too many that see it differently and I think we > > > >>>>>>> need > > > >>>>>>> > > > >>>>>>> (2) Flink testability without connectors. > > > >>>>>>>> This is a very good question. How can we guarantee the new > > > >>>>>>>> Source > > > >>>>> and > > > >>>>>>> Sink > > > >>>>>>>> API are stable with only test implementation? > > > >>>>>>>> > > > >>>>>>> We can't and shouldn't. Since the connector repo is managed by > > > >>>>>>> Flink, > > > >>>>> a > > > >>>>>>> Flink release manager needs to check if the Flink connectors > are > > > >>>>> actually > > > >>>>>>> working prior to creating an RC. That's similar to how > > > >>>>>>> flink-shaded > > > >>>>> and > > > >>>>>>> flink core are related. > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> So here is one idea that I had to get things rolling. 
> We are going to address the external repo iteratively without
> compromising what we already have:
>
> Phase 1: add new contributions to the external repo. We use that time
> to set up infra accordingly and optimize release processes. We will
> identify test utilities that are not yet public/stable and fix that.
> Phase 2: add ports to the new unified interfaces of existing
> connectors. That requires a previous Flink release to make utilities
> stable. Keep old interfaces in flink-core.
> Phase 3: remove old interfaces in flink-core of some connectors (tbd at
> a later point).
> Phase 4: optionally move all remaining connectors (tbd at a later
> point).
>
> I'd envision having ~3 months between starting the different phases.
> WDYT?
>
> [1] https://issues.apache.org/jira/browse/FLINK-23527
>
> On Thu, Oct 21, 2021 at 7:12 AM Kyle Bendickson <k...@tabular.io> wrote:
>
> Hi all,
>
> My name is Kyle and I'm an open source developer primarily focused on
> Apache Iceberg.
>
> I'm happy to help clarify or elaborate on any aspect of our experience
> working on a relatively decoupled connector that is downstream and
> pretty popular.
> I'd also love to be able to contribute or assist in any way I can.
>
> I don't mean to thread jack, but are there any meetings or community
> sync-ups, specifically around the connector APIs, that I might join /
> be invited to?
>
> I did want to add that even though I've experienced some of the pain
> points of integrating with an evolving system / API (catalog support
> is, generally speaking, pretty new everywhere in this space), I also
> agree personally that you shouldn't slow down development velocity too
> much for the sake of external connectors. Getting to a performant and
> stable place should be the primary goal, and slowing that down to
> support stragglers will (in my personal opinion) always be a losing
> game. Some folks will simply stay behind on versions regardless until
> they have to upgrade.
>
> I am working on ensuring that the Iceberg community stays within 1-2
> versions of Flink, so that we can help provide more feedback or
> contribute things that might improve our ability to support multiple
> Flink runtimes / versions with one project / codebase and minimal to no
> reflection (our desired goal).
>
> If there's anything I can do or any way I can be of assistance, please
> don't hesitate to reach out.
> Or find me on ASF slack 😀
>
> I greatly appreciate your general concern for the needs of downstream
> connector integrators!
>
> Cheers,
> Kyle Bendickson (GitHub: kbendick)
> Open Source Developer
> kyle [at] tabular [dot] io
>
> On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise <t...@apache.org> wrote:
>
> Hi,
>
> I see the stable core Flink API as a prerequisite for modularity. And
> for connectors it is not just the source and sink API (source being
> stable as of 1.14), but everything that is required to build and
> maintain a connector downstream, such as the test utilities and
> infrastructure.
>
> Without the stable surface of core Flink, changes will leak into
> downstream dependencies and force lock-step updates. Refactoring across
> N repos is more painful than in a single repo. Those with experience
> developing downstream of Flink will know the pain, and that isn't
> limited to connectors. I don't remember a Flink "minor version" update
> that was just a dependency version change and did not force other
> downstream changes.
>
> Imagine a project with a complex set of dependencies. Let's say Flink
> version A plus Flink-reliant dependencies released by other projects
> (Flink-external connectors, Beam, Iceberg, Hudi, ..).
> We don't want a situation where we bump the core Flink version to B and
> things fall apart (interface changes, utilities that were useful but
> not public, transitive dependencies etc.).
>
> The discussion here also highlights the benefits of keeping certain
> connectors outside Flink, whether that is due to differences in
> developer community, maturity of the connectors, their
> specialized/limited usage, etc. I would like to see that as a sign of a
> growing ecosystem, and most of the ideas that Arvid has put forward
> would benefit further growth of the connector ecosystem.
>
> As for keeping connectors within Apache Flink: I prefer that as the
> path forward for "essential" connectors like FileSource, KafkaSource,
> ... And we can still achieve a more flexible and faster release cycle.
>
> Thanks,
> Thomas
>
> On Wed, Oct 20, 2021 at 3:32 AM Jark Wu <imj...@gmail.com> wrote:
>
> Hi Konstantin,
>
> > the connectors need to be adopted and require at least one release
> > per Flink minor release.
>
> However, this will make the releases of connectors slower, e.g.
> maintaining features for multiple branches and releasing multiple
> branches.
> > > >>>>>>>>>> I think the main purpose of having an external connector > > > >>>>> repository > > > >>>>>>> is > > > >>>>>>>> in > > > >>>>>>>>>> order to have "faster releases of connectors"? > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>>> From the perspective of CDC connector maintainers, the > > > >>>>>>>>>> biggest > > > >>>>>>>> advantage > > > >>>>>>>>> of > > > >>>>>>>>>> maintaining it outside of the Flink project is that: > > > >>>>>>>>>> 1) we can have a more flexible and faster release cycle > > > >>>>>>>>>> 2) we can be more liberal with committership for connector > > > >>>>>>> maintainers > > > >>>>>>>>>> which can also attract more committers to help the release. > > > >>>>>>>>>> > > > >>>>>>>>>> Personally, I think maintaining one connector repository > > > >>>>>>>>>> under > > > >>>>> the > > > >>>>>>> ASF > > > >>>>>>>>> may > > > >>>>>>>>>> not have the above benefits. > > > >>>>>>>>>> > > > >>>>>>>>>> Best, > > > >>>>>>>>>> Jark > > > >>>>>>>>>> > > > >>>>>>>>>> On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf < > > > >>>>> kna...@apache.org> > > > >>>>>>>>> wrote: > > > >>>>>>>>>>> Hi everyone, > > > >>>>>>>>>>> > > > >>>>>>>>>>> regarding the stability of the APIs. I think everyone > > > >>>>>>>>>>> agrees > > > >>>>> that > > > >>>>>>>>>>> connector APIs which are stable across minor versions > > > >>>>>> (1.13->1.14) > > > >>>>>>>> are > > > >>>>>>>>> the > > > >>>>>>>>>>> mid-term goal. But: > > > >>>>>>>>>>> > > > >>>>>>>>>>> a) These APIs are still quite young, and we shouldn't > > > >>>>>>>>>>> make > > > >>>>> them > > > >>>>>>>> @Public > > > >>>>>>>>>>> prematurely either. > > > >>>>>>>>>>> > > > >>>>>>>>>>> b) Isn't this *mostly* orthogonal to where the connector > > > >>>>>>>>>>> code > > > >>>>>>> lives? 
> > > >>>>>>>>> Yes, > > > >>>>>>>>>>> as long as there are breaking changes, the connectors > > > >>>>>>>>>>> need to > > > >>>>> be > > > >>>>>>>>> adopted > > > >>>>>>>>>>> and require at least one release per Flink minor release. > > > >>>>>>>>>>> Documentation-wise this can be addressed via a > > > >>>>>>>>>>> compatibility > > > >>>>>> matrix > > > >>>>>>>> for > > > >>>>>>>>>>> each connector as Arvid suggested. IMO we shouldn't block > > > >>>>>>>>>>> this > > > >>>>>>> effort > > > >>>>>>>>> on > > > >>>>>>>>>>> the stability of the APIs. > > > >>>>>>>>>>> > > > >>>>>>>>>>> Cheers, > > > >>>>>>>>>>> > > > >>>>>>>>>>> Konstantin > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> On Wed, Oct 20, 2021 at 8:56 AM Jark Wu > > > >>>>>>>>>>> <imj...@gmail.com> > > > >>>>>> wrote: > > > >>>>>>>>>>>> Hi, > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> I think Thomas raised very good questions and would like > > > >>>>>>>>>>>> to > > > >>>>> know > > > >>>>>>>> your > > > >>>>>>>>>>>> opinions if we want to move connectors out of flink in > > > >>>>>>>>>>>> this > > > >>>>>>> version. > > > >>>>>>>>>>>> (1) is the connector API already stable? > > > >>>>>>>>>>>>> Separate releases would only make sense if the core > > > >>>>>>>>>>>>> Flink > > > >>>>>>> surface > > > >>>>>>>> is > > > >>>>>>>>>>>>> fairly stable though. As evident from Iceberg (and > > > >>>>>>>>>>>>> also > > > >>>>> Beam), > > > >>>>>>>>> that's > > > >>>>>>>>>>>>> not the case currently. We should probably focus on > > > >>>>> addressing > > > >>>>>>> the > > > >>>>>>>>>>>>> stability first, before splitting code. A success > > > >>>>>>>>>>>>> criteria > > > >>>>>> could > > > >>>>>>>> be > > > >>>>>>>>>>>>> that we are able to build Iceberg and Beam against > > > >>>>>>>>>>>>> multiple > > > >>>>>>> Flink > > > >>>>>>>>>>>>> versions w/o the need to change code. 
The goal would > > > >>>>>>>>>>>>> be > > > >>>>> that > > > >>>>>> no > > > >>>>>>>>>>>>> connector breaks when we make changes to Flink core. > > > >>>>>>>>>>>>> Until > > > >>>>>>> that's > > > >>>>>>>>> the > > > >>>>>>>>>>>>> case, code separation creates a setup where 1+1 or N+1 > > > >>>>>>>> repositories > > > >>>>>>>>>>>>> need to move lock step. > > > >>>>>>>>>>>> From another discussion thread [1], connector API is far > > > >>>>>>>>>>>> from > > > >>>>>>>> stable. > > > >>>>>>>>>>>> Currently, it's hard to build connectors against > > > >>>>>>>>>>>> multiple > > > >>>>> Flink > > > >>>>>>>>> versions. > > > >>>>>>>>>>>> There are breaking API changes both in 1.12 -> 1.13 and > > > >>>>>>>>>>>> 1.13 > > > >>>>> -> > > > >>>>>>> 1.14 > > > >>>>>>>>> and > > > >>>>>>>>>>>> maybe also in the future versions, because Table > > > >>>>>>>>>>>> related > > > >>>>> APIs > > > >>>>>>> are > > > >>>>>>>>> still > > > >>>>>>>>>>>> @PublicEvolving and new Sink API is still @Experimental. > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> (2) Flink testability without connectors. > > > >>>>>>>>>>>>> Flink w/o Kafka connector (and few others) isn't > > > >>>>>>>>>>>>> viable. Testability of Flink was already brought up, > > > >>>>>>>>>>>>> can we > > > >>>>>>> really > > > >>>>>>>>>>>>> certify a Flink core release without Kafka connector? > > > >>>>>>>>>>>>> Maybe > > > >>>>>>> those > > > >>>>>>>>>>>>> connectors that are used in Flink e2e tests to > > > >>>>>>>>>>>>> validate > > > >>>>>>>>> functionality > > > >>>>>>>>>>>>> of core Flink should not be broken out? > > > >>>>>>>>>>>> This is a very good question. How can we guarantee the > > > >>>>>>>>>>>> new > > > >>>>>> Source > > > >>>>>>>> and > > > >>>>>>>>> Sink > > > >>>>>>>>>>>> API are stable with only test implementation? 
> > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> Best, > > > >>>>>>>>>>>> Jark > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler < > > > >>>>>>> ches...@apache.org> > > > >>>>>>>>>>>> wrote: > > > >>>>>>>>>>>> > > > >>>>>>>>>>>>> Could you clarify what release cadence you're thinking > > > >>> of? > > > >>>>>>> There's > > > >>>>>>>>> quite > > > >>>>>>>>>>>>> a big range that fits "more frequent than Flink" > > > >>>>> (per-commit, > > > >>>>>>>> daily, > > > >>>>>>>>>>>>> weekly, bi-weekly, monthly, even bi-monthly). > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> On 19/10/2021 14:15, Martijn Visser wrote: > > > >>>>>>>>>>>>>> Hi all, > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> I think it would be a huge benefit if we can achieve > > > >>>>>>>>>>>>>> more > > > >>>>>>>> frequent > > > >>>>>>>>>>>>> releases > > > >>>>>>>>>>>>>> of connectors, which are not bound to the release > > > >>>>>>>>>>>>>> cycle > > > >>>>> of > > > >>>>>>> Flink > > > >>>>>>>>>>>> itself. > > > >>>>>>>>>>>>> I > > > >>>>>>>>>>>>>> agree that in order to get there, we need to have > > > >>>>>>>>>>>>>> stable > > > >>>>>>>>> interfaces > > > >>>>>>>>>>>> which > > > >>>>>>>>>>>>>> are trustworthy and reliable, so they can be safely > > > >>>>>>>>>>>>>> used > > > >>>>> by > > > >>>>>>>> those > > > >>>>>>>>>>>>>> connectors. I do think that work still needs to be > > > >>>>>>>>>>>>>> done > > > >>>>> on > > > >>>>>>> those > > > >>>>>>>>>>>>>> interfaces, but I am confident that we can get there > > > >>>>> from a > > > >>>>>>>> Flink > > > >>>>>>>>>>>>>> perspective. 
> > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> I am worried that we would not be able to achieve > > > >>>>>>>>>>>>>> those > > > >>>>>>> frequent > > > >>>>>>>>>>>> releases > > > >>>>>>>>>>>>>> of connectors if we are putting these connectors > > > >>>>>>>>>>>>>> under > > > >>>>> the > > > >>>>>>>> Apache > > > >>>>>>>>>>>>> umbrella, > > > >>>>>>>>>>>>>> because that means that for each connector release > > > >>>>>>>>>>>>>> we > > > >>>>> have > > > >>>>>> to > > > >>>>>>>>> follow > > > >>>>>>>>>>>> the > > > >>>>>>>>>>>>>> Apache release creation process. This requires a lot > > > >>>>>>>>>>>>>> of > > > >>>>>> manual > > > >>>>>>>>> steps > > > >>>>>>>>>>>> and > > > >>>>>>>>>>>>>> prohibits automation and I think it would be hard to > > > >>>>> scale > > > >>>>>> out > > > >>>>>>>>>>>> frequent > > > >>>>>>>>>>>>>> releases of connectors. I'm curious how others think > > > >>>>>>>>>>>>>> this > > > >>>>>>>>> challenge > > > >>>>>>>>>>>> could > > > >>>>>>>>>>>>>> be solved. > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> Best regards, > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> Martijn > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> On Mon, 18 Oct 2021 at 22:22, Thomas Weise < > > > >>>>> t...@apache.org> > > > >>>>>>>>> wrote: > > > >>>>>>>>>>>>>>> Thanks for initiating this discussion. > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> There are definitely a few things that are not > > > >>>>>>>>>>>>>>> optimal > > > >>>>> with > > > >>>>>>> our > > > >>>>>>>>>>>>>>> current management of connectors. I would not > > > >>>>> necessarily > > > >>>>>>>>>>>> characterize > > > >>>>>>>>>>>>>>> it as a "mess" though. As the points raised so far > > > >>>>> show, it > > > >>>>>>>> isn't > > > >>>>>>>>>>>> easy > > > >>>>>>>>>>>>>>> to find a solution that balances competing > > > >>>>>>>>>>>>>>> requirements > > > >>>>> and > > > >>>>>>>>> leads to > > > >>>>>>>>>>>> a > > > >>>>>>>>>>>>>>> net improvement. 
> > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> It would be great if we can find a setup that > > > >>>>>>>>>>>>>>> allows for > > > >>>>>>>>> connectors > > > >>>>>>>>>>>> to > > > >>>>>>>>>>>>>>> be released independently of core Flink and that > > > >>>>>>>>>>>>>>> each > > > >>>>>>> connector > > > >>>>>>>>> can > > > >>>>>>>>>>>> be > > > >>>>>>>>>>>>>>> released separately. Flink already has separate > > > >>>>>>>>>>>>>>> releases (flink-shaded), so that by itself isn't a > > > >>> new thing. > > > >>>>>>>>> Per-connector > > > >>>>>>>>>>>>>>> releases would need to allow for more frequent > > > >>>>>>>>>>>>>>> releases > > > >>>>>>>> (without > > > >>>>>>>>> the > > > >>>>>>>>>>>>>>> baggage that a full Flink release comes with). > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> Separate releases would only make sense if the core > > > >>>>> Flink > > > >>>>>>>>> surface is > > > >>>>>>>>>>>>>>> fairly stable though. As evident from Iceberg (and > > > >>>>>>>>>>>>>>> also > > > >>>>>>> Beam), > > > >>>>>>>>> that's > > > >>>>>>>>>>>>>>> not the case currently. We should probably focus on > > > >>>>>>> addressing > > > >>>>>>>>> the > > > >>>>>>>>>>>>>>> stability first, before splitting code. A success > > > >>>>> criteria > > > >>>>>>>> could > > > >>>>>>>>> be > > > >>>>>>>>>>>>>>> that we are able to build Iceberg and Beam against > > > >>>>> multiple > > > >>>>>>>> Flink > > > >>>>>>>>>>>>>>> versions w/o the need to change code. The goal > > > >>>>>>>>>>>>>>> would be > > > >>>>>> that > > > >>>>>>> no > > > >>>>>>>>>>>>>>> connector breaks when we make changes to Flink core. > > > >>>>> Until > > > >>>>>>>>> that's the > > > >>>>>>>>>>>>>>> case, code separation creates a setup where 1+1 or > > > >>>>>>>>>>>>>>> N+1 > > > >>>>>>>>> repositories > > > >>>>>>>>>>>>>>> need to move lock step. 
> > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> Regarding some connectors being more important for > > > >>>>>>>>>>>>>>> Flink > > > >>>>>> than > > > >>>>>>>>> others: > > > >>>>>>>>>>>>>>> That's a fact. Flink w/o Kafka connector (and few > > > >>>>> others) > > > >>>>>>> isn't > > > >>>>>>>>>>>>>>> viable. Testability of Flink was already brought > > > >>>>>>>>>>>>>>> up, > > > >>>>> can we > > > >>>>>>>>> really > > > >>>>>>>>>>>>>>> certify a Flink core release without Kafka > > > >> connector? > > > >>>>> Maybe > > > >>>>>>>> those > > > >>>>>>>>>>>>>>> connectors that are used in Flink e2e tests to > > > >>>>>>>>>>>>>>> validate > > > >>>>>>>>> functionality > > > >>>>>>>>>>>>>>> of core Flink should not be broken out? > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> Finally, I think that the connectors that move into > > > >>>>>> separate > > > >>>>>>>>> repos > > > >>>>>>>>>>>>>>> should remain part of the Apache Flink project. > > > >>>>>>>>>>>>>>> Larger > > > >>>>>>>>> organizations > > > >>>>>>>>>>>>>>> tend to approve the use of and contribution to open > > > >>>>> source > > > >>>>>> at > > > >>>>>>>> the > > > >>>>>>>>>>>>>>> project level. Sometimes it is everything ASF. More > > > >>>>> often > > > >>>>>> it > > > >>>>>>> is > > > >>>>>>>>>>>>>>> "Apache Foo". It would be fatal to end up with a > > > >>>>> patchwork > > > >>>>>> of > > > >>>>>>>>>>>> projects > > > >>>>>>>>>>>>>>> with potentially different licenses and governance > > > >>>>>>>>>>>>>>> to > > > >>>>>> arrive > > > >>>>>>>> at a > > > >>>>>>>>>>>>>>> working Flink setup. This may mean we prioritize > > > >>>>> usability > > > >>>>>>> over > > > >>>>>>>>>>>>>>> developer convenience, if that's in the best > > > >>>>>>>>>>>>>>> interest of > > > >>>>>>> Flink > > > >>>>>>>>> as a > > > >>>>>>>>>>>>>>> whole. 
> > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> Thanks, > > > >>>>>>>>>>>>>>> Thomas > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler < > > > >>>>>>>>> ches...@apache.org > > > >>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>> Generally, the issues are reproducibility and > > > >>> control. > > > >>>>>>>>>>>>>>>> Stuffs completely broken on the Flink side for a > > > >>> week? > > > >>>>>> Well > > > >>>>>>>>> then so > > > >>>>>>>>>>>> are > > > >>>>>>>>>>>>>>>> the connector repos. > > > >>>>>>>>>>>>>>>> (As-is) You can't go back to a previous version of > > > >>>>>>>>>>>>>>>> the > > > >>>>>>>> snapshot. > > > >>>>>>>>>>>> Which > > > >>>>>>>>>>>>>>>> also means that checking out older commits can be > > > >>>>>>> problematic > > > >>>>>>>>>>>> because > > > >>>>>>>>>>>>>>>> you'd still work against the latest snapshots, and > > > >>>>>>>>>>>>>>>> they > > > >>>>>> not > > > >>>>>>> be > > > >>>>>>>>>>>>>>>> compatible with each other. > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> On 18/10/2021 15:22, Arvid Heise wrote: > > > >>>>>>>>>>>>>>>>> I was actually betting on snapshots versions. > > > >>>>>>>>>>>>>>>>> What are > > > >>>>>> the > > > >>>>>>>>> limits? > > > >>>>>>>>>>>>>>>>> Obviously, we can only do a release of a 1.15 > > > >>>>> connector > > > >>>>>>> after > > > >>>>>>>>> 1.15 > > > >>>>>>>>>>>> is > > > >>>>>>>>>>>>>>>>> release. > > > >>>>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> -- > > > >>>>>>>>>>> > > > >>>>>>>>>>> Konstantin Knauf > > > >>>>>>>>>>> > > > >>>>>>>>>>> https://urldefense.com/v3/__https://twitter.com/snntrable > > > >>>>>>>>>>> __;!!LpKI!2a1uSGfMmwc8HNwqBUIGtFPzLHP5m9yS0sC3n3IpLgdke_- > > > >>>>>>>>>>> XjpYgX5MUy9M4$ [twitter[.]com] > > > >>>>>>>>>>> > > > >>>>>>>>>>> https://urldefense.com/v3/__https://github.com/knaufk__;! 
> > > >>>>>>>>>>> !LpKI!2a1uSGfMmwc8HNwqBUIGtFPzLHP5m9yS0sC3n3IpLgdke_-XjpY > > > >>>>>>>>>>> gXyX8u50S$ [github[.]com] > > > >>>>>>>>>>> > > > > > > > > >