TBH, I have doubts about the "a single repository per connector" approach, considering there are hundreds of connectors out there (Airbyte [1], Kafka [2]). I don't think it is feasible for the community to maintain hundreds of repositories. It makes sense to combine some connectors to reduce the maintenance burden. I can imagine we would have a flink-jdbc-connector repo in the future to support PG, MySQL, MS SQL Server, Oracle, etc., together.
Best,
Jark

[1]: https://airbyte.com/connectors
[2]: https://www.confluent.io/product/connectors/

> On 25 Oct 2022, at 06:56, Thomas Weise <t...@apache.org> wrote:
>
> Hi Danny,
>
> I'm also leaning slightly towards the single AWS connector repo direction.
>
> Bumps in the underlying AWS SDK would bump all of the connectors in any
> case. And if a change occurs that is isolated to a single connector, then
> those that do not use that connector can just skip the release.
>
> Cheers,
> Thomas
>
>
> On Mon, Oct 24, 2022 at 3:01 PM Teoh, Hong <lian...@amazon.co.uk.invalid>
> wrote:
>
>> I like the single repo with single version idea.
>>
>> Pros:
>> - Better discoverability of connectors for AWS services means a better
>>   experience for Flink users
>> - Natural placement of AWS-related utils (credentials, SDK retry strategy)
>>
>> Caveats:
>> - As you mentioned, it is not desirable if we have to evolve the major
>>   version of the connector just for a change in a single connector (e.g.
>>   DynamoDB). However, I think it is reasonable to only evolve the major
>>   version of the AWS connector repo when there are Flink Source/Sink API
>>   upgrades or AWS SDK major upgrades (probably quite rare). Any new
>>   features for individual connectors can be collapsed into minor releases.
>> - An additional callout here is that we should be careful about adopting
>>   any AWS connectors that don't use the AWS SDK directly (e.g. how the
>>   Kinesis connector used the KPL for a long time). In my opinion, any new
>>   connectors like that would be better placed in their own repositories;
>>   otherwise we will have a complex mesh of dependencies to manage.
>>
>> Regards,
>> Hong
>>
>>
>> On 21/10/2022, 16:59, "Danny Cranmer" <dannycran...@apache.org> wrote:
>>
>> Thanks Chesnay for the suggestion, I will investigate this option.
>>
>> Related to the single repo idea, I have considered it in the past. Are you
>> proposing we also use a single version between all connectors? If we have
>> a single version then it makes sense to combine them in a single repo; if
>> they are separate versions, then splitting them makes sense. This was
>> discussed last year more generally [1], and the consensus was "we
>> ultimately propose to have a single repository per connector".
>>
>> Combining all AWS connectors into a single repo with a single version is
>> in line with how the AWS SDK works, so AWS users are familiar with this
>> approach. However, it is frustrating that we would have to release all
>> connectors to fix a bug or add a feature in one of them. Example: a user
>> is using Kinesis Data Streams only (the most popular and mature
>> connector), and we evolve the version from 1.x to 2.y (or 1.x to 1.y) for
>> a DynamoDB change.
>>
>> I am torn and will think some more, but it would be great to hear other
>> people's opinions.
>>
>> [1] https://lists.apache.org/thread/bywh947r2f5hfocxq598zhyh06zhksrm
>>
>> Thanks,
>> Danny
>>
>> On Fri, Oct 21, 2022 at 3:11 PM Jing Ge <j...@ververica.com> wrote:
>>
>>> I agree with Jark. It would be easier for further development and
>>> maintenance if all AWS-related connectors and the base module are in the
>>> same repo. It might make sense to upgrade flink-connector-dynamodb to
>>> flink-connector-aws and move the other modules, including
>>> flink-connector-aws-base, into it. The AWS SDK could be managed in
>>> flink-connector-aws-base. Any future common connector features could
>>> also be developed in the base module.
>>>
>>> Best regards,
>>> Jing
>>>
>>> On Fri, Oct 21, 2022 at 1:26 PM Jark Wu <imj...@gmail.com> wrote:
>>>
>>>> How about creating a new repository flink-connector-aws and merging
>>>> dynamodb, kinesis, and firehose into it?
>>>> This can reduce the maintenance of complex dependencies and make the
>>>> release easy.
>>>> I think the maintainers of the AWS-related connectors are the same
>>>> people.
>>>>
>>>> Best,
>>>> Jark
>>>>
>>>>> On 21 Oct 2022, at 17:41, Chesnay Schepler <ches...@apache.org> wrote:
>>>>>
>>>>> I would not go with 2); I think it'd just be messy.
>>>>>
>>>>> Here's another option:
>>>>>
>>>>> Create another repository (aws-connector-base) (following the
>>>>> externalization model), add it as a submodule to the downstream
>>>>> repositories, and make it part of the release process of each such
>>>>> connector.
>>>>>
>>>>> I.e., we never create a release for aws-connector-base, but release it
>>>>> as part of the connector.
>>>>> The main benefit here is that we'd always be able to make changes to
>>>>> the aws-base code without delaying connector releases.
>>>>> I would assume that any added overhead due to _technically_ releasing
>>>>> the aws code multiple times would be negligible.
>>>>>
>>>>>
>>>>> On 20/10/2022 22:38, Danny Cranmer wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> Currently we have 2 AWS Flink connectors in the main Flink codebase
>>>>>> (Kinesis Data Streams and Kinesis Data Firehose) and one new
>>>>>> externalized connector in progress (DynamoDB). All three of these use
>>>>>> common AWS utilities from the flink-connector-aws-base module [1].
>>>>>> Common code includes client builders, property keys, validation,
>>>>>> utils, etc.
>>>>>>
>>>>>> Once we externalize the connectors, leaving flink-connector-aws-base
>>>>>> in the main Flink repository will restrict our ability to evolve the
>>>>>> connectors quickly.
>>>>>> For example, as part of the DynamoDB connector build we are
>>>>>> considering adding a general retry strategy config that can be
>>>>>> leveraged by all connectors. We would need to block on Flink 1.17 for
>>>>>> this.
>>>>>>
>>>>>> In the past we have tried to keep the AWS SDK version consistent
>>>>>> across connectors; with the externalization this is more likely to
>>>>>> diverge.
>>>>>>
>>>>>> Option 1: I propose we create a new repository, flink-connector-aws,
>>>>>> which we can move the flink-connector-aws-base module to, and create
>>>>>> a new flink-connector-aws-parent to manage SDK versions. Each of the
>>>>>> externalized AWS connectors will depend on this new module and
>>>>>> parent. The downside is an additional module to release per Flink
>>>>>> version; however, I will volunteer to manage this.
>>>>>>
>>>>>> Option 2: We can move the flink-connector-aws-base module and create
>>>>>> flink-connector-parent within the flink-connector-shared-utils repo
>>>>>> [2].
>>>>>>
>>>>>> Option 3: We do nothing.
>>>>>>
>>>>>> For options 1+2 we will follow the general externalized connector
>>>>>> versioning strategy and rules.
>>>>>>
>>>>>> I am inclined towards option 1, and would appreciate feedback from
>>>>>> the community.
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/apache/flink/tree/master/flink-connectors/flink-connector-aws-base
>>>>>> [2] https://github.com/apache/flink-connector-shared-utils
>>>>>>
>>>>>> Thanks,
>>>>>> Danny
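To make Option 1 concrete, the flink-connector-aws-parent idea could look roughly like the sketch below: a single aggregator POM that pins the AWS SDK version for every connector in the repo via the SDK v2 BOM. All module names, the SDK version, and the repo layout here are illustrative assumptions, not the actual proposal:

```xml
<!-- Hypothetical sketch of flink-connector-aws-parent (names and versions
     are illustrative). A single property pins the AWS SDK for all modules;
     bumping it here updates every connector in one release. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-aws-parent</artifactId>
  <version>1.0.0</version>
  <packaging>pom</packaging>

  <properties>
    <!-- Assumed SDK version; one bump here flows to all connectors -->
    <aws.sdkv2.version>2.17.247</aws.sdkv2.version>
  </properties>

  <dependencyManagement>
    <dependencies>
      <!-- Importing the AWS SDK v2 BOM keeps all SDK artifacts on one
           consistent version across the connectors -->
      <dependency>
        <groupId>software.amazon.awssdk</groupId>
        <artifactId>bom</artifactId>
        <version>${aws.sdkv2.version}</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <modules>
    <module>flink-connector-aws-base</module>
    <module>flink-connector-kinesis</module>
    <module>flink-connector-aws-kinesis-firehose</module>
    <module>flink-connector-dynamodb</module>
  </modules>
</project>
```

Under this layout, a DynamoDB-only change would still trigger a repo-wide version bump, which is exactly the trade-off discussed above; connectors that don't use DynamoDB could simply skip that release.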