TBH, I have doubts about the "a single repository per connector" approach, considering there are hundreds of connectors out there (Airbyte [1], Kafka [2]). I don't think it is feasible for the community to maintain hundreds of repositories. It makes sense to combine some connectors to reduce the maintenance burden. I can imagine we would have a flink-jdbc-connector repo in the future to support PG, MySQL, MS SQL Server, Oracle, etc., together.
Best,
Jark

[1]: https://airbyte.com/connectors
[2]: https://www.confluent.io/product/connectors/

> On 25 Oct 2022, at 06:56, Thomas Weise <t...@apache.org> wrote:
>
> Hi Danny,
>
> I'm also leaning slightly towards the single AWS connector repo direction.
>
> Bumps in the underlying AWS SDK would bump all of the connectors in any
> case. And if a change occurs that is isolated to a single connector, then
> those that do not use that connector can just skip the release.
>
> Cheers,
> Thomas
>
>
> On Mon, Oct 24, 2022 at 3:01 PM Teoh, Hong <lian...@amazon.co.uk.invalid>
> wrote:
>
>> I like the single repo with single version idea.
>>
>> Pros:
>> - Better discoverability of connectors for AWS services means a better
>>   experience for Flink users
>> - Natural placement of AWS-related utils (credentials, SDK retry strategy)
>>
>> Caveats:
>> - As you mentioned, it is not desirable if we have to evolve the major
>>   version of the connector just for a change in a single connector (e.g.
>>   DynamoDB). However, I think it is reasonable to only evolve the major
>>   version of the AWS connector repo when there are Flink Source/Sink API
>>   upgrades or AWS SDK major upgrades (probably quite rare). Any new
>>   features for individual connectors can be collapsed into minor releases.
>> - An additional callout here is that we should be careful about adopting
>>   any AWS connectors that don't use the AWS SDK directly (e.g. how the
>>   Kinesis connector used the KPL for a long time). In my opinion, any new
>>   connectors like that would be better placed in their own repositories;
>>   otherwise we will have a complex mesh of dependencies to manage.
>>
>> Regards,
>> Hong
>>
>>
>> On 21/10/2022, 16:59, "Danny Cranmer" <dannycran...@apache.org> wrote:
>>
>> Thanks Chesnay for the suggestion, I will investigate this option.
>>
>> Related to the single repo idea, I have considered it in the past. Are you
>> proposing we also use a single version between all connectors? If we have
>> a single version then it makes sense to combine them in a single repo; if
>> they are separate versions, then splitting them makes sense. This was
>> discussed last year more generally [1], and the consensus was "we
>> ultimately propose to have a single repository per connector".
>>
>> Combining all AWS connectors into a single repo with a single version is
>> in line with how the AWS SDK works, so AWS users are familiar with this
>> approach. However, it is frustrating that we would have to release all
>> connectors to fix a bug or add a feature in one of them. Example: a user
>> is using Kinesis Data Streams only (the most popular and mature
>> connector), and we evolve the version from 1.x to 2.y (or 1.x to 1.y) for
>> a DynamoDB change.
>>
>> I am torn and will think some more, but it would be great to hear other
>> people's opinions.
>>
>> [1] https://lists.apache.org/thread/bywh947r2f5hfocxq598zhyh06zhksrm
>>
>> Thanks,
>> Danny
>>
>> On Fri, Oct 21, 2022 at 3:11 PM Jing Ge <j...@ververica.com> wrote:
>>
>>> I agree with Jark. It would be easier for further development and
>>> maintenance if all AWS-related connectors and the base module are in the
>>> same repo. It might make sense to upgrade flink-connector-dynamodb to
>>> flink-connector-aws and move the other modules, including
>>> flink-connector-aws-base, into it. The AWS SDK could be managed in
>>> flink-connector-aws-base. Any future common connector features could
>>> also be developed in the base module.
>>>
>>> Best regards,
>>> Jing
>>>
>>> On Fri, Oct 21, 2022 at 1:26 PM Jark Wu <imj...@gmail.com> wrote:
>>>
>>>> How about creating a new repository flink-connector-aws and merging
>>>> dynamodb, kinesis, and firehose into it?
>>>> This can reduce the maintenance of complex dependencies and make the
>>>> release easy.
>>>> I think the maintainers of the AWS-related connectors are the same
>>>> people.
>>>>
>>>> Best,
>>>> Jark
>>>>
>>>>> On 21 Oct 2022, at 17:41, Chesnay Schepler <ches...@apache.org> wrote:
>>>>>
>>>>> I would not go with 2); I think it'd just be messy.
>>>>>
>>>>> Here's another option:
>>>>>
>>>>> Create another repository (aws-connector-base) (following the
>>>>> externalization model), add it as a submodule to the downstream
>>>>> repositories, and make it part of the release process of each such
>>>>> connector.
>>>>>
>>>>> I.e., we never create a release for aws-connector-base, but release it
>>>>> as part of the connector.
>>>>> The main benefit here is that we'd always be able to make changes to
>>>>> the aws-base code without delaying connector releases.
>>>>> I would assume that any added overhead due to _technically_ releasing
>>>>> the aws code multiple times would be negligible.
>>>>>
>>>>>
>>>>> On 20/10/2022 22:38, Danny Cranmer wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> Currently we have 2 AWS Flink connectors in the main Flink codebase
>>>>>> (Kinesis Data Streams and Kinesis Data Firehose) and one new
>>>>>> externalized connector in progress (DynamoDB). All three of these use
>>>>>> common AWS utilities from the flink-connector-aws-base module [1].
>>>>>> Common code includes client builders, property keys, validation,
>>>>>> utils, etc.
>>>>>>
>>>>>> Once we externalize the connectors, leaving flink-connector-aws-base
>>>>>> in the main Flink repository will restrict our ability to evolve the
>>>>>> connectors quickly.
>>>>>> For example, as part of the DynamoDB connector build we are
>>>>>> considering adding a general retry strategy config that can be
>>>>>> leveraged by all connectors. We would need to block on Flink 1.17 for
>>>>>> this.
>>>>>>
>>>>>> In the past we have tried to keep the AWS SDK version consistent
>>>>>> across connectors; with the externalization this is more likely to
>>>>>> diverge.
>>>>>>
>>>>>> Option 1: I propose we create a new repository, flink-connector-aws,
>>>>>> which we can move the flink-connector-aws-base module to, and create
>>>>>> a new flink-connector-aws-parent to manage SDK versions. Each of the
>>>>>> externalized AWS connectors will depend on this new module and
>>>>>> parent. The downside is an additional module to release per Flink
>>>>>> version; however, I will volunteer to manage this.
>>>>>>
>>>>>> Option 2: We can move the flink-connector-aws-base module and create
>>>>>> flink-connector-parent within the flink-connector-shared-utils repo
>>>>>> [2].
>>>>>>
>>>>>> Option 3: We do nothing.
>>>>>>
>>>>>> For options 1+2 we will follow the general externalized connector
>>>>>> versioning strategy and rules.
>>>>>>
>>>>>> I am inclined towards option 1, and would appreciate feedback from
>>>>>> the community.
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/apache/flink/tree/master/flink-connectors/flink-connector-aws-base
>>>>>> [2] https://github.com/apache/flink-connector-shared-utils
>>>>>>
>>>>>> Thanks,
>>>>>> Danny
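To make Option 1 concrete, the flink-connector-aws-parent idea could look roughly like the sketch below: a single aggregator POM that pins the AWS SDK version for every connector in the repo via the SDK v2 BOM. All module names, the SDK version, and the repo layout here are illustrative assumptions, not the actual proposal:

```xml
<!-- Hypothetical sketch of flink-connector-aws-parent (names and versions
     are illustrative). A single property pins the AWS SDK for all modules;
     bumping it here updates every connector in one release. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-aws-parent</artifactId>
  <version>1.0.0</version>
  <packaging>pom</packaging>

  <properties>
    <!-- Assumed SDK version; one bump here flows to all connectors -->
    <aws.sdkv2.version>2.17.247</aws.sdkv2.version>
  </properties>

  <dependencyManagement>
    <dependencies>
      <!-- Importing the AWS SDK v2 BOM keeps all SDK artifacts on one
           consistent version across the connectors -->
      <dependency>
        <groupId>software.amazon.awssdk</groupId>
        <artifactId>bom</artifactId>
        <version>${aws.sdkv2.version}</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <modules>
    <module>flink-connector-aws-base</module>
    <module>flink-connector-kinesis</module>
    <module>flink-connector-aws-kinesis-firehose</module>
    <module>flink-connector-dynamodb</module>
  </modules>
</project>
```

Under this layout, a DynamoDB-only change would still trigger a repo-wide version bump, which is exactly the trade-off discussed above; connectors that don't use DynamoDB could simply skip that release.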