I don't think we want to talk about the Flink community accepting the
Iceberg connector just yet. The goal of Abid's exploration is to see
what it would look like as an external connector. We'd need to decide
in the Iceberg community if that's something that we'd want to do long
term. If it were me, I'd probably say wait until the connector APIs
are stable and there is a best practice for releasing.

Ryan

On Mon, Oct 24, 2022 at 11:16 AM Martijn Visser
<martijnvis...@apache.org> wrote:
>
> Hi all,
>
> There are many valid points raised in this discussion thread, but I think we
> should not mix up different topics. From my perspective, there are two things
> going on:
>
> 1. This thread is about the Flink community accepting the Iceberg connector, 
> with various maintainers from Iceberg volunteering to help with the 
> maintenance of the connector itself.
> 2. Also included in this thread are discussions about the externalization of 
> connectors from Flink. There have been recent discussions on this [1] and 
> there is engineering activity happening on that topic and it is a big focus 
> point for the next couple of weeks/months. As for differing opinions, I
> don't actually see those on the mailing list: after the discussions, the
> votes have been passing.
>
> Best regards,
>
> Martijn
>
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development
>
> On Fri, Oct 21, 2022 at 3:01 AM Jark Wu <imj...@gmail.com> wrote:
>>
>> Hi Abid and all,
>>
>> I added the Iceberg dev community for a wider discussion.
>>
>> I agree with Yuxia and have the same concern as Steven Wu.
>>
>> There were long discussions around externalizing connectors, with many
>> different opinions.
>> If I remember correctly[1][2], in the end we decided to externalize
>> ElasticSearch as an example,
>> and see how it works and what we can standardize (e.g., docs, releases,
>> versions, CI).
>> When everything works well, we can externalize other connectors.
>>
>> However, from what I see, the externalized ElasticSearch connector is
>> still at an early stage and has not released any versions.
>> It looks like we still don't have a mature workflow.
>> It's also not clear to me how much the maintenance burden has increased.
>> Is this a scalable way to support dozens of connectors?
>> Does the community have enough resources/committers to merge PRs?
>> How much does it affect contributors when the connector is not in the main
>> repo?
>>
>> IMO, the Iceberg connector is a very important connector for the Flink
>> ecosystem.
>> It's a mature connector and many users like it! I hope it can have a better
>> future.
>> However, the externalizing workflow is still evolving and under
>> verification.
>> It might not be the best place for popular connectors at the current point
>> in time.
>>
>> As for the reasons for moving the Iceberg connector that Abid mentioned:
>> 1) API stability to reduce multiple version maintenance.
>> 2) Flink experts to help maintain the connector.
>>
>> I think the move doesn't help much with the API issues because the
>> connector would still be in a separate repo.
>> On the contrary, the connector would have to struggle with additional API
>> issues from the Iceberg project.
>> Besides, the connector may need to maintain 6 more versions (3x3 vs 3),
>> which is unmaintainable.
>> Actually, Flink API is becoming stable in recent versions. We have also
>> verified the latest Iceberg
>> connector on the upcoming 1.16 release, and it works well. Flink community
>> also proposed FLIPs[3][4]
>> for API stability guarantees. On the other side, I also don't like the
>> version matrix modules/branches.
>> We use a shim layer to support different versions of Hive for
>> flink-connector-hive with only 1 module
>> for different hive versions. We have similar practices in
>> flink-cdc-connectors[5] and end-to-end tests
>> to guarantee compatibility with different Flink versions[6]. The
>> maintenance has been acceptable to us so far.
>>
>> In short, I think we have ways to solve API issues and Flink API is
>> becoming stable.
>> As for the Flink experts, Yuxia is the component owner of
>> flink-connector-hive. He has plenty
>> of knowledge of cross-version compatibility. He is willing to join the
>> Iceberg community to
>> help address the version problem and maintain the connector. What do you
>> think about it?
>>
>> Best,
>> Jark
>> Ververica (Alibaba)
>>
>> [1] https://lists.apache.org/thread/8k1xonqt7hn0xldbky1cxfx3fzh6sj7h
>> [2] https://lists.apache.org/thread/9mzxnl4948ddq07f980mmzoz0c9stnlb
>> [3]:
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-196%3A+Source+API+stability+guarantees
>> [4]:
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-197%3A+API+stability+graduation+process
>> [5] https://github.com/ververica/flink-cdc-connectors/
>> [6]
>> https://github.com/ververica/flink-cdc-connectors/blob/master/flink-cdc-e2e-tests/src/test/java/com/ververica/cdc/connectors/tests/utils/FlinkContainerTestEnvironment.java#L124
>>
>> On Thu, 20 Oct 2022 at 22:41, Jing Ge <j...@ververica.com> wrote:
>>
>> > I agree with Steven Wu that those points are applicable to every
>> > externalized connector. So those were actually concerns about externalizing
>> > connector development; there were already some discussions, and consensus
>> > has already been reached to do it.
>> >
>> > Speaking of the 3x3 concern, I think the concept[1] proposed by Chesnay and
>> > voted on at [2] could help you.
>> >
>> > [1]
>> >
>> > https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development
>> > [2] https://lists.apache.org/thread/7qr8jc053y8xpygcwbhlqq4r7c7fj1p3
>> >
>> > Best regards,
>> > Jing
>> >
>> > On Thu, Oct 20, 2022 at 3:46 PM Steven Wu <stevenz...@gmail.com> wrote:
>> >
>> > > Yuxia, those are valid points. But they are applicable to every connector
>> > > (not just Iceberg).
>> > >
>> > > I also had a similar concern expressed in the discussion thread of
>> > > "Externalized connector release details&workflow". My main concern is the
>> > > multiplication factor of two upstream projects (Flink & storage/Iceberg).
>> > > If we limit both to two versions, it will be 2x2, which might still be
>> > > OK. But if we need to do 3x3, that will probably be too many to manage.
>> > >
>> > > On Thu, Oct 20, 2022 at 5:27 AM yuxia <luoyu...@alumni.sjtu.edu.cn>
>> > wrote:
>> > >
>> > > > Hi, abmo, Abid!
>> > > > Thank you guys for driving it.
>> > > >
>> > > > As Iceberg is becoming more and more popular and is an important
>> > > > upstream/downstream system for Flink, I believe the Flink community has
>> > > > paid much attention to Iceberg and hopes to be closer to the Iceberg
>> > > > community. Whether or not it's moved under the Flink umbrella, I believe
>> > > > Flink experts are glad to give feedback to Iceberg and take part in the
>> > > > development of the Iceberg Flink connector.
>> > > >
>> > > >
>> > > > Personally, as a Flink contributor and the main maintainer of the Hive
>> > > > Flink connector, I'm really glad to take part in the Iceberg community for
>> > > > the maintenance and future development of the Iceberg Flink connector. I
>> > > > think I can provide some views from the Flink side and bring some feedback
>> > > > from the Iceberg community to the Flink community.
>> > > >
>> > > > But I have some concerns about moving the connector from the Iceberg
>> > > > repository to a separate connector under the Flink umbrella:
>> > > >
>> > > > 1: If Iceberg develops new features, the Iceberg Flink connector has to
>> > > > wait for Iceberg to be released before starting the development and
>> > > > release that make use of the new features. Users may need to wait much
>> > > > longer before enjoying the new features of Iceberg through Flink.
>> > > >
>> > > > 2: If we move it to a separate repository, I'm afraid it will lose
>> > > > attention from both the Flink and Iceberg sides, which would definitely
>> > > > harm the Flink and Iceberg communities. What's more, whenever Flink or
>> > > > Iceberg releases a version, we need to update the version in the separate
>> > > > repository, which I think is tedious and may be easily forgotten.
>> > > >
>> > > > Sorry for raising a different voice in this discussion, but I think it
>> > > > deserves further discussion on the dev mailing list; at least it will
>> > > > help to get Flink developers' attention on Iceberg.
>> > > >
>> > > > Best regards,
>> > > > Yuxia
>> > > >
>> > > > ----- Original Message -----
>> > > > From: "abmo work" <abmo.w...@icloud.com.INVALID>
>> > > > To: "dev" <d...@flink.apache.org>
>> > > > Sent: Thursday, October 20, 2022, 6:33:40 AM
>> > > > Subject: Re: [Discuss]- Donate Iceberg Flink Connector
>> > > >
>> > > > Hi Martijn,
>> > > >
>> > > > I created a FLIP for this; it's FLIP 267: Iceberg Connector  <
>> > > >
>> > >
>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP+267:+Iceberg+Connector
>> > > > >
>> > > > Please let me know if anything else is needed. My email on Confluence is
>> > > > abmo.w...@icloud.com.
>> > > >
>> > > > As 1.0 was released today, from the Iceberg perspective we need to figure
>> > > > out which versions of Flink we will support, and the release timeline for
>> > > > when the connector will be built and released from the new repo rather
>> > > > than from Iceberg.
>> > > >
>> > > > Thanks
>> > > > Abid
>> > > >
>> > > > > On Oct 19, 2022, at 12:43 PM, Martijn Visser <
>> > martijnvis...@apache.org
>> > > >
>> > > > wrote:
>> > > > >
>> > > > > Hi Abid,
>> > > > >
>> > > > > We should have a FLIP as this would be a code contribution. If you
>> > > > > provide your Confluence user name, we can grant you access to create one.
>> > > > >
>> > > > > Is there also something needed from an Iceberg point of view to agree
>> > > > > with the code contribution?
>> > > > >
>> > > > > Best regards,
>> > > > >
>> > > > > Martijn
>> > > > >
>> > > > > On Wed, Oct 19, 2022 at 19:11, <abmo.w...@icloud.com.invalid> wrote:
>> > > > >
>> > > > >> Thanks Martijn!
>> > > > >>
>> > > > >> Thanks for all the support and positive responses. I will start a vote
>> > > > >> thread and send it out to the dev list.
>> > > > >>
>> > > > >> Also, we need help with creation of a new repo for the Iceberg Connector.
>> > > > >>
>> > > > >> Can someone help with the creation of a repo? Please let me know if I
>> > > > >> need to create an issue or FLIP for that.
>> > > > >> Following similar naming for other connectors, I propose
>> > > > >> https://github.com/apache/flink-connector-iceberg (doesn’t exist)
>> > > > >>
>> > > > >> Thanks
>> > > > >> Abid
>> > > > >>
>> > > > >> On 2022/10/19 08:41:02 Martijn Visser wrote:
>> > > > >>> Hi all,
>> > > > >>>
>> > > > >>> Thanks for the info and also thanks Peter and Steven for offering
>> > to
>> > > > >>> volunteer. I think that's a great idea and a necessity.
>> > > > >>>
>> > > > >>> Overall +1 given the current ideas to make this contribution
>> > happen.
>> > > > >>>
>> > > > >>> BTW congrats on reaching Iceberg 1.0, a great accomplishment :)
>> > > > >>>
>> > > > >>> Thanks,
>> > > > >>>
>> > > > >>> Martijn
>> > > > >>>
>> > > > >>> On Tue, Oct 18, 2022 at 12:31 AM Steven Wu <st...@gmail.com>
>> > wrote:
>> > > > >>>
>> > > > >>>> I was one of the maintainers for the Flink Iceberg connector in
>> > > > Iceberg
>> > > > >>>> repo. I can volunteer as one of the initial maintainers if we
>> > decide
>> > > > to
>> > > > >>>> move forward.
>> > > > >>>>
>> > > > >>>> On Mon, Oct 17, 2022 at 3:26 PM <ab...@icloud.com.invalid> wrote:
>> > > > >>>>
>> > > > >>>>> Hi Martijn,
>> > > > >>>>>
>> > > > >>>>> Yes, it is considered a connector in Flink terms.
>> > > > >>>>>
>> > > > >>>>> We wanted to join the Flink connector externalization effort so that
>> > > > >>>>> we can bring the Iceberg connector closer to the Flink community. We
>> > > > >>>>> are hoping any issues with the APIs for the Iceberg connector will
>> > > > >>>>> surface sooner and get more attention from the Flink community when
>> > > > >>>>> the connector is within the Flink umbrella rather than in the Iceberg
>> > > > >>>>> repo. We also hope to get better feedback from Flink experts on
>> > > > >>>>> whether something belongs in a connector or in Flink itself.
>> > > > >>>>>
>> > > > >>>>> Thanks everyone for all your responses! Looking forward to the
>> > next
>> > > > >>>> steps.
>> > > > >>>>>
>> > > > >>>>> Thanks
>> > > > >>>>> Abid
>> > > > >>>>>
>> > > > >>>>> On 2022/10/14 03:37:09 Jark Wu wrote:
>> > > > >>>>>> Thanks, Abid, for the discussion.
>> > > > >>>>>>
>> > > > >>>>>> I'm also fine with maintaining it under the Flink project.
>> > > > >>>>>> But I'm also interested in the response to Martijn's question.
>> > > > >>>>>>
>> > > > >>>>>> Besides, once the code is moved to the Flink project, are there
>> > > any
>> > > > >>>>> initial
>> > > > >>>>>> maintainers for the connector we can find?
>> > > > >>>>>> In addition, do we still maintain documentation under Iceberg
>> > > > >>>>>> https://iceberg.apache.org/docs/latest/flink/ ?
>> > > > >>>>>>
>> > > > >>>>>> Best,
>> > > > >>>>>> Jark
>> > > > >>>>>>
>> > > > >>>>>>
>> > > > >>>>>> On Thu, 13 Oct 2022 at 17:52, yuxia <lu...@alumni.sjtu.edu.cn>
>> > > > >> wrote:
>> > > > >>>>>>
>> > > > >>>>>>> +1. Thanks for driving it. Hope I can find some chances to take
>> > > > >> part
>> > > > >>>> in
>> > > > >>>>>>> the future development of Iceberg Flink Connector.
>> > > > >>>>>>>
>> > > > >>>>>>> Best regards,
>> > > > >>>>>>> Yuxia
>> > > > >>>>>>>
>> > > > >>>>>>> ----- Original Message -----
>> > > > >>>>>>> From: "Zheng Yu Chen" <ja...@gmail.com>
>> > > > >>>>>>> To: "dev" <de...@flink.apache.org>
>> > > > >>>>>>> Sent: Thursday, October 13, 2022, 11:26:29 AM
>> > > > >>>>>>> Subject: Re: [Discuss]- Donate Iceberg Flink Connector
>> > > > >>>>>>>
>> > > > >>>>>>> +1, thanks for driving it
>> > > > >>>>>>>
>> > > > >>>>>>> On Mon, Oct 10, 2022 at 09:22, Abid Mohammed <ab...@icloud.com.invalid> wrote:
>> > > > >>>>>>>
>> > > > >>>>>>>> Hi,
>> > > > >>>>>>>>
>> > > > >>>>>>>> I would like to start a discussion about contributing Iceberg
>> > > > >> Flink
>> > > > >>>>>>>> Connector to Flink.
>> > > > >>>>>>>>
>> > > > >>>>>>>> I created a doc <
>> > > > >>>>>>>>
>> > > > >>>>>>>
>> > > > >>>>>
>> > > > >>>>
>> > > > >>
>> > > >
>> > >
>> > https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing
>> > > > >>>>>>>>
>> > > > >>>>>>>> with all the details following the Flink Connector template as
>> > > > >> I
>> > > > >>>>> don’t
>> > > > >>>>>>> have
>> > > > >>>>>>>> permissions to create a FLIP yet.
>> > > > >>>>>>>> High level details are captured below:
>> > > > >>>>>>>>
>> > > > >>>>>>>> Motivation:
>> > > > >>>>>>>>
>> > > > >>>>>>>> This FLIP aims to contribute the existing Apache Iceberg Flink
>> > > > >>>>> Connector
>> > > > >>>>>>>> to Flink.
>> > > > >>>>>>>>
>> > > > >>>>>>>> Apache Iceberg is an open table format for huge analytic
>> > > > >> datasets.
>> > > > >>>>>>> Iceberg
>> > > > >>>>>>>> adds tables to compute engines including Spark, Trino,
>> > > > >> PrestoDB,
>> > > > >>>>> Flink,
>> > > > >>>>>>>> Hive and Impala using a high-performance table format that
>> > > > >> works
>> > > > >>>> just
>> > > > >>>>>>> like
>> > > > >>>>>>>> a SQL table.
>> > > > >>>>>>>> Iceberg avoids unpleasant surprises. Schema evolution works
>> > and
>> > > > >>>> won’t
>> > > > >>>>>>>> inadvertently un-delete data. Users don’t need to know about
>> > > > >>>>> partitioning
>> > > > >>>>>>>> to get fast queries. Iceberg was designed to solve correctness
>> > > > >>>>> problems
>> > > > >>>>>>> in
>> > > > >>>>>>>> eventually-consistent cloud object stores.
>> > > > >>>>>>>>
>> > > > >>>>>>>> Iceberg supports both Flink’s DataStream API and Table API.
>> > > > >> Based
>> > > > >>>> on
>> > > > >>>>> the
>> > > > >>>>>>>> guideline of the Flink community, only the latest 2 minor
>> > > > >> versions
>> > > > >>>>> are
>> > > > >>>>>>>> actively maintained. See the Multi-Engine Support#apache-flink
>> > > > >> for
>> > > > >>>>>>> further
>> > > > >>>>>>>> details.
>> > > > >>>>>>>>
>> > > > >>>>>>>>
>> > > > >>>>>>>> Iceberg connector supports:
>> > > > >>>>>>>>
>> > > > >>>>>>>>        • Source: detailed Source design <
>> > > > >>>>>>>>
>> > > > >>>>>>>
>> > > > >>>>>
>> > > > >>>>
>> > > > >>
>> > > >
>> > >
>> > https://docs.google.com/document/d/1q6xaBxUPFwYsW9aXWxYUh7die6O7rDeAPFQcTAMQ0GM/edit#
>> > > > >>>>>>>> ,
>> > > > >>>>>>>> based on FLIP-27
>> > > > >>>>>>>>        • Sink: detailed Sink design and interfaces used <
>> > > > >>>>>>>>
>> > > > >>>>>>>
>> > > > >>>>>
>> > > > >>>>
>> > > > >>
>> > > >
>> > >
>> > https://docs.google.com/document/d/1O-dPaFct59wUWQECXEEYIkl9_MOoG3zTbC2V-fZRwrg/edit#
>> > > > >>>>>>>>>
>> > > > >>>>>>>>        • Usable in both DataStream and Table API/SQL
>> > > > >>>>>>>>        • DataStream read/append/overwrite
>> > > > >>>>>>>>        • SQL create/alter/drop table, select, insert into,
>> > > > >> insert
>> > > > >>>>>>>> overwrite
>> > > > >>>>>>>>        • Streaming or batch read in Java API
>> > > > >>>>>>>>        • Support for Flink’s Python API
>> > > > >>>>>>>>
>> > > > >>>>>>>> See Iceberg Flink  <
>> > > > >>>>> https://iceberg.apache.org/docs/latest/flink/#flink
>> > > > >>>>>>>> for
>> > > > >>>>>>>> detailed usage instructions.
>> > > > >>>>>>>>
>> > > > >>>>>>>> Looking forward to the discussion!
>> > > > >>>>>>>>
>> > > > >>>>>>>> Thanks
>> > > > >>>>>>>> Abid
>> > > > >>>>>>>
>> > > > >>>>>>
>> > > > >>>>
>> > > > >>>
>> > > > >
>> > > > > --
>> > > > > Martijn
>> > > > > https://twitter.com/MartijnVisser82
>> > > > > https://github.com/MartijnVisser
>> > > >
>> > >
>> >



-- 
Ryan Blue
Tabular
