Hello,
Thanks for the proposal, +1 for the FLIP.

Best Regards
Ahmed Hamdy

On Mon, 11 Mar 2024 at 15:12, wudi <676366...@qq.com.invalid> wrote:

> Hi, Leonard
> Thank you for your suggestion.
> I referred to other Connectors[1], modified the naming and types of
> relevant parameters[2], and also updated the FLIP.
>
> [1] https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/table/overview/
> [2] https://github.com/apache/doris-flink-connector/blob/master/flink-doris-connector/src/main/java/org/apache/doris/flink/table/DorisConfigOptions.java
>
> Brs,
> di.wu
>
> > On Mar 7, 2024, at 14:33, Leonard Xu <xbjt...@gmail.com> wrote:
> >
> > Thanks wudi for the update, the FLIP generally looks good to me. I only
> > left two minor suggestions:
> >
> > (1) The suffix `.s` in the config option doris.request.query.timeout.s
> > looks strange to me. Could we change the value type of all time-interval
> > options to Duration?
> >
> > (2) Could you check and improve all config options like
> > `doris.exec.mem.limit` so that they follow Flink's config option naming
> > and value types?
> >
> > Best,
> > Leonard
> >
> >>
> >>> On Mar 6, 2024, at 06:12, Jing Ge <j...@ververica.com.INVALID> wrote:
> >>>
> >>> Hi Di,
> >>>
> >>> Thanks for your proposal. +1 for the contribution. I'd like to know
> >>> your thoughts about the following questions:
> >>>
> >>> 1. According to your clarification of the exactly-once, thanks for it
> >>> BTW, no PreCommitTopology is required. Does it make sense to let
> >>> DorisSink[1] implement SupportsCommitter, since the
> >>> TwoPhaseCommittingSink is deprecated[2], before turning the Doris
> >>> connector into a Flink connector?
> >>> 2. OLAP engines are commonly used as the tail/downstream of a data
> >>> pipeline to support further e.g. ad-hoc query or cube with feasible
> >>> pre-aggregation. Just out of curiosity, would you like to share some
> >>> real use cases that will use OLAP engines as the source of a streaming
> >>> data pipeline? Or will it only be used as the source for batch?
> >>> 3. The E2E test only covered the sink[3], if I am not mistaken. Would
> >>> you like to test the source in E2E too?
> >>>
> >>> [1] https://github.com/apache/doris-flink-connector/blob/43e0e5cf9b832854ea228fb093077872e3a311b6/flink-doris-connector/src/main/java/org/apache/doris/flink/sink/DorisSink.java#L55
> >>> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Enhance+and+synchronize+Sink+API+to+match+the+Source+API
> >>> [3] https://github.com/apache/doris-flink-connector/blob/43e0e5cf9b832854ea228fb093077872e3a311b6/flink-doris-connector/src/test/java/org/apache/doris/flink/tools/cdc/MySQLDorisE2ECase.java#L96
> >>>
> >>> Best regards,
> >>> Jing
> >>>
> >>> On Tue, Mar 5, 2024 at 11:18 AM wudi <676366...@qq.com.invalid> wrote:
> >>>
> >>>> Hi, Jeyhun Karimov.
> >>>> Thanks for your question.
> >>>>
> >>>> - How to ensure Exactly-Once?
> >>>> 1. When the Checkpoint Barrier arrives, DorisSink will trigger the
> >>>> precommit api of StreamLoad to complete the persistence of data in
> >>>> Doris (the data will not be visible at this time), and will also pass
> >>>> this TxnID to the Committer.
> >>>> 2. When this Checkpoint of the entire Job is completed, the Committer
> >>>> will call the commit api of StreamLoad and commit the TxnID to make the
> >>>> transaction visible.
> >>>> 3. When the task is restarted, any Txn with a successful precommit and
> >>>> a failed commit will be aborted based on the label-prefix, by calling
> >>>> Doris' abort API. (Doris will also abort transactions that have not
> >>>> been committed for a long time.)
> >>>>
> >>>> PS: this part of the content has also been updated in the FLIP.
> >>>>
> >>>> - Because the default table model in Doris is Duplicate (
> >>>> https://doris.apache.org/docs/data-table/data-model/), which does not
> >>>> have a primary key, batch writing may cause data duplication. The UNIQ
> >>>> model, however, has a primary key, which ensures the idempotence of
> >>>> writing, thus achieving Exactly-Once.
> >>>>
> >>>> Brs,
> >>>> di.wu
> >>>>
> >>>>
> >>>>> On Mar 2, 2024, at 17:50, Jeyhun Karimov <je.kari...@gmail.com> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> Thanks for the proposal. +1 for the FLIP.
> >>>>> I have a few questions:
> >>>>>
> >>>>> - How exactly will the combination of the two (Stream Load's
> >>>>> two-phase commit and Flink's two-phase commit) ensure e2e exactly-once
> >>>>> semantics?
> >>>>>
> >>>>> - The FLIP proposes to combine Doris's batch writing with the primary
> >>>>> key table to achieve Exactly-Once semantics. Could you elaborate more
> >>>>> on that? Why is it not the default behavior but a workaround?
> >>>>>
> >>>>> Regards,
> >>>>> Jeyhun
> >>>>>
> >>>>> On Sat, Mar 2, 2024 at 10:14 AM Yanquan Lv <decq12y...@gmail.com> wrote:
> >>>>>
> >>>>>> Thanks for driving this.
> >>>>>> The content is very detailed. It is recommended to add a section on
> >>>>>> Test Plan for more completeness.
> >>>>>>
> >>>>>> On Thu, Jan 25, 2024 at 15:40, Di Wu <d...@apache.org> wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> Previously, we had some discussions about contributing the Flink
> >>>>>>> Doris Connector to the Flink community [1]. I want to further
> >>>>>>> promote this work. I hope everyone will help participate in this
> >>>>>>> FLIP discussion and provide more valuable opinions and suggestions.
> >>>>>>> Thanks.
> >>>>>>>
> >>>>>>> [1] https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p
> >>>>>>>
> >>>>>>> Brs,
> >>>>>>> di.wu
> >>>>>>>
> >>>>>>>
> >>>>>>> On 2023/12/07 05:02:46 wudi wrote:
> >>>>>>>>
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> As discussed in the previous email [1], this is about contributing
> >>>>>>>> the Flink Doris Connector to the Flink community.
> >>>>>>>>
> >>>>>>>> Apache Doris[2] is a high-performance, real-time analytical
> >>>>>>>> database based on MPP architecture. For scenarios where Flink is
> >>>>>>>> used for data analysis, processing, or real-time writing on Doris,
> >>>>>>>> the Flink Doris Connector is an effective tool.
> >>>>>>>>
> >>>>>>>> At the same time, contributing the Flink Doris Connector to the
> >>>>>>>> Flink community will further expand the Flink Connectors ecosystem.
> >>>>>>>>
> >>>>>>>> So I would like to start an official discussion of FLIP-399: Flink
> >>>>>>>> Connector Doris[3].
> >>>>>>>>
> >>>>>>>> Looking forward to comments, feedback, and suggestions from the
> >>>>>>>> community on the proposal.
> >>>>>>>>
> >>>>>>>> [1] https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p
> >>>>>>>> [2] https://doris.apache.org/docs/dev/get-starting/what-is-apache-doris/
> >>>>>>>> [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-399%3A+Flink+Connector+Doris
> >>>>>>>>
> >>>>>>>> Brs,
> >>>>>>>>
> >>>>>>>> di.wu
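
For readers following the exactly-once answer quoted above (precommit when the checkpoint barrier arrives, commit once the checkpoint completes, abort by label-prefix on restart), here is a minimal sketch of that flow in Python. It is only an in-memory illustration under stated assumptions: `FakeDoris`, `SketchSink`, and all of their method names are hypothetical stand-ins invented for this example, not the Doris Stream Load API and not the Flink Sink API, and the restart handling is simplified to the case where the job fails before the checkpoint completes.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoris:
    """In-memory stand-in for Stream Load 2PC. Hypothetical, not the real API."""

    visible: list = field(default_factory=list)  # rows readers can see
    pending: dict = field(default_factory=dict)  # txn_id -> (label, rows)
    next_txn_id: int = 1

    def precommit(self, label, rows):
        """Persist rows under a new transaction without making them visible."""
        txn_id = self.next_txn_id
        self.next_txn_id += 1
        self.pending[txn_id] = (label, list(rows))
        return txn_id

    def commit(self, txn_id):
        """Make a precommitted transaction visible; unknown ids are ignored."""
        entry = self.pending.pop(txn_id, None)
        if entry is not None:
            self.visible.extend(entry[1])

    def abort_by_label_prefix(self, prefix):
        """Drop every precommitted transaction whose label starts with prefix."""
        stale = [t for t, (label, _) in self.pending.items()
                 if label.startswith(prefix)]
        for txn_id in stale:
            del self.pending[txn_id]
        return stale


class SketchSink:
    """Hypothetical sink that buffers rows between checkpoints."""

    def __init__(self, doris, label_prefix):
        self.doris = doris
        self.label_prefix = label_prefix
        self.buffer = []

    def write(self, row):
        self.buffer.append(row)

    def on_checkpoint_barrier(self, checkpoint_id):
        """Step 1: precommit buffered rows and return the TxnID for the committer."""
        label = f"{self.label_prefix}_{checkpoint_id}"
        txn_id = self.doris.precommit(label, self.buffer)
        self.buffer = []
        return txn_id

    def on_restart(self):
        """Step 3: abort leftover precommitted transactions of this job."""
        self.buffer = []
        return self.doris.abort_by_label_prefix(self.label_prefix)


doris = FakeDoris()
sink = SketchSink(doris, label_prefix="job42")

sink.write("a")
sink.write("b")
txn1 = sink.on_checkpoint_barrier(1)      # precommitted, not yet visible
visible_after_precommit = list(doris.visible)
doris.commit(txn1)                        # step 2: checkpoint 1 completed

sink.write("c")
txn2 = sink.on_checkpoint_barrier(2)      # job fails before checkpoint 2 completes
aborted = sink.on_restart()               # txn2 is aborted by label prefix

sink.write("c")                           # source replays from checkpoint 1
txn3 = sink.on_checkpoint_barrier(3)
doris.commit(txn3)
print(doris.visible)                      # ['a', 'b', 'c']
```

In this sketch the replayed row "c" appears only once because the transaction that first carried it was never committed and was aborted on restart, which is the property the quoted steps rely on.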