Thanks for the explanation, Di Wu.

Regards,
Jeyhun

On Thu, Mar 7, 2024 at 7:48 PM Jing Ge <[email protected]> wrote:

> Thanks for the clarification. It makes sense to me.
>
> Best regards,
> Jing
>
> On Wed, Mar 6, 2024 at 10:03 AM wudi <[email protected]> wrote:
>
> > Hi Jing Ge, thanks for your suggestions.
> >
> > 1. Currently, the Flink Doris Connector is compatible with Flink
> > versions 1.15-1.18. SupportsCommitter[1] seems to have been introduced
> > in Flink 1.19, and most users may not have upgraded their Flink
> > environments to that version yet. Modifying it now could lead to
> > incompatibilities. I think we can postpone the modification and make
> > it together with other connectors. What do you think?
> >
> > 2. Yes, currently DorisSource only supports batch reading, typically
> > used for data synchronization and ETL. Streaming reading is not
> > supported yet, since it requires the capability of Doris Binlog
> > (mentioned in the Doris 2024 RoadMap[2]). Streaming reading can be
> > used to capture incremental events from the database, making it more
> > convenient for users to process real-time data newly added to Doris.
> >
> > 3. An E2ECase[3] for DorisSource has been added, and the TestPlan in
> > the FLIP has been modified.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Enhance+and+synchronize+Sink+API+to+match+the+Source+API
> > [2] https://github.com/apache/doris/issues/30669
> > [3]
> > https://github.com/apache/doris-flink-connector/blob/master/flink-doris-connector/src/test/java/org/apache/doris/flink/tools/cdc/DorisDorisE2ECase.java
> >
> > Brs,
> > di.wu
> >
> > > On Mar 6, 2024, at 06:12, Jing Ge <[email protected]> wrote:
> > >
> > > Hi Di,
> > >
> > > Thanks for your proposal. +1 for the contribution. I'd like to know
> > > your thoughts about the following questions:
> > >
> > > 1. According to your clarification of the exactly-once, thanks for
> > > it BTW, no PreCommitTopology is required. Does it make sense to let
> > > DorisSink[1] implement SupportsCommitter, since the
> > > TwoPhaseCommittingSink is deprecated[2], before turning the Doris
> > > connector into a Flink connector?
> > > 2. OLAP engines are commonly used as the tail/downstream of a data
> > > pipeline to support further e.g. ad-hoc query or cube with feasible
> > > pre-aggregation. Just out of curiosity, would you like to share some
> > > real use cases that will use OLAP engines as the source of a
> > > streaming data pipeline? Or will it only be used as the source for
> > > the batch?
> > > 3. The E2E test only covered sink[3], if I am not mistaken. Would
> > > you like to test the source in E2E too?
> > >
> > > [1]
> > > https://github.com/apache/doris-flink-connector/blob/43e0e5cf9b832854ea228fb093077872e3a311b6/flink-doris-connector/src/main/java/org/apache/doris/flink/sink/DorisSink.java#L55
> > > [2]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Enhance+and+synchronize+Sink+API+to+match+the+Source+API
> > > [3]
> > > https://github.com/apache/doris-flink-connector/blob/43e0e5cf9b832854ea228fb093077872e3a311b6/flink-doris-connector/src/test/java/org/apache/doris/flink/tools/cdc/MySQLDorisE2ECase.java#L96
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Tue, Mar 5, 2024 at 11:18 AM wudi <[email protected]> wrote:
> > >
> > >> Hi, Jeyhun Karimov.
> > >> Thanks for your question.
> > >>
> > >> - How to ensure Exactly-Once?
> > >> 1. When the Checkpoint Barrier arrives, DorisSink will trigger the
> > >> precommit api of StreamLoad to complete the persistence of data in
> > >> Doris (the data will not be visible at this time), and will also
> > >> pass this TxnID to the Committer.
> > >> 2. When this Checkpoint of the entire Job is completed, the
> > >> Committer will call the commit api of StreamLoad and commit TxnID
> > >> to complete the visibility of the transaction.
> > >> 3. When the task is restarted, the Txn with successful precommit
> > >> and failed commit will be aborted based on the label-prefix, and
> > >> Doris' abort API will be called. (At the same time, Doris will also
> > >> abort transactions that have not been committed for a long time)
> > >>
> > >> ps: At the same time, this part of the content has been updated in
> > >> the FLIP
> > >>
> > >> - Because the default table model in Doris is Duplicate (
> > >> https://doris.apache.org/docs/data-table/data-model/), which does
> > >> not have a primary key, batch writing may cause data duplication,
> > >> but the UNIQ model has a primary key, which ensures the idempotence
> > >> of writing, thus achieving Exactly-Once
> > >>
> > >> Brs,
> > >> di.wu
> > >>
> > >>> On Mar 2, 2024, at 17:50, Jeyhun Karimov <[email protected]> wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> Thanks for the proposal. +1 for the FLIP.
> > >>> I have a few questions:
> > >>>
> > >>> - How exactly will the combination of the two (Stream Load's
> > >>> two-phase commit and Flink's two-phase commit) ensure the e2e
> > >>> exactly-once semantics?
> > >>>
> > >>> - The FLIP proposes to combine Doris's batch writing with the
> > >>> primary key table to achieve Exactly-Once semantics. Could you
> > >>> elaborate more on that? Why is it not the default behavior but a
> > >>> workaround?
> > >>>
> > >>> Regards,
> > >>> Jeyhun
> > >>>
> > >>> On Sat, Mar 2, 2024 at 10:14 AM Yanquan Lv <[email protected]> wrote:
> > >>>
> > >>>> Thanks for driving this.
> > >>>> The content is very detailed; I recommend adding a Test Plan
> > >>>> section for completeness.
> > >>>>
> > >>>> On Thu, Jan 25, 2024 at 15:40, Di Wu <[email protected]> wrote:
> > >>>>
> > >>>>> Hi all,
> > >>>>>
> > >>>>> Previously, we had some discussions about contributing Flink
> > >>>>> Doris Connector to the Flink community [1]. I want to further
> > >>>>> promote this work.
> > >>>>> I hope everyone will help participate in this FLIP discussion
> > >>>>> and provide more valuable opinions and suggestions.
> > >>>>> Thanks.
> > >>>>>
> > >>>>> [1] https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p
> > >>>>>
> > >>>>> Brs,
> > >>>>> di.wu
> > >>>>>
> > >>>>> On 2023/12/07 05:02:46 wudi wrote:
> > >>>>>>
> > >>>>>> Hi all,
> > >>>>>>
> > >>>>>> This follows the previous email [1] about contributing the
> > >>>>>> Flink Doris Connector to the Flink community.
> > >>>>>>
> > >>>>>> Apache Doris[2] is a high-performance, real-time analytical
> > >>>>>> database based on MPP architecture. For scenarios where Flink
> > >>>>>> is used for data analysis, processing, or real-time writing on
> > >>>>>> Doris, Flink Doris Connector is an effective tool.
> > >>>>>>
> > >>>>>> At the same time, contributing Flink Doris Connector to the
> > >>>>>> Flink community will further expand the Flink Connectors
> > >>>>>> ecosystem.
> > >>>>>>
> > >>>>>> So I would like to start an official discussion of FLIP-399:
> > >>>>>> Flink Connector Doris[3].
> > >>>>>>
> > >>>>>> Looking forward to comments, feedback and suggestions from the
> > >>>>>> community on the proposal.
> > >>>>>>
> > >>>>>> [1] https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p
> > >>>>>> [2] https://doris.apache.org/docs/dev/get-starting/what-is-apache-doris/
> > >>>>>> [3]
> > >>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-399%3A+Flink+Connector+Doris
> > >>>>>>
> > >>>>>> Brs,
> > >>>>>>
> > >>>>>> di.wu
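
The exactly-once flow described in Di Wu's Mar 5 reply (pre-commit at the checkpoint barrier, commit once the checkpoint completes, abort by label prefix on restart) can be sketched as a small in-memory simulation. This is only a sketch in Python, not the connector's Java code: FakeDoris, SketchSink, demo, and every method name below are hypothetical, and the real connector talks to Doris Stream Load over HTTP rather than to an in-process object.

```python
from dataclasses import dataclass, field
from enum import Enum


class TxnState(Enum):
    PRECOMMITTED = "precommitted"  # persisted in Doris, not yet visible
    VISIBLE = "visible"            # committed, readable by queries
    ABORTED = "aborted"


@dataclass
class FakeDoris:
    """Hypothetical in-memory stand-in for Stream Load (not the real API)."""
    txns: dict = field(default_factory=dict)  # txn_id -> [label, state, rows]
    next_id: int = 1

    def precommit(self, label, rows):
        txn_id = self.next_id
        self.next_id += 1
        self.txns[txn_id] = [label, TxnState.PRECOMMITTED, list(rows)]
        return txn_id

    def commit(self, txn_id):
        # A repeated commit is a no-op, so a retried commit stays safe.
        if self.txns[txn_id][1] is TxnState.PRECOMMITTED:
            self.txns[txn_id][1] = TxnState.VISIBLE

    def abort_by_label_prefix(self, prefix):
        for txn in self.txns.values():
            if txn[0].startswith(prefix) and txn[1] is TxnState.PRECOMMITTED:
                txn[1] = TxnState.ABORTED

    def visible_rows(self):
        return [row for _, state, rows in self.txns.values()
                if state is TxnState.VISIBLE for row in rows]


class SketchSink:
    """Hypothetical sink + committer pair following the three steps."""

    def __init__(self, doris, label_prefix):
        self.doris, self.label_prefix = doris, label_prefix
        self.buffer, self.pending = [], {}  # pending: checkpoint_id -> txn_id

    def write(self, row):
        self.buffer.append(row)

    def on_checkpoint_barrier(self, checkpoint_id):
        # Step 1: pre-commit buffered rows; remember the TxnID for the committer.
        label = f"{self.label_prefix}_{checkpoint_id}"
        self.pending[checkpoint_id] = self.doris.precommit(label, self.buffer)
        self.buffer = []

    def on_checkpoint_complete(self, checkpoint_id):
        # Step 2: the whole job finished the checkpoint, so make the txn visible.
        self.doris.commit(self.pending.pop(checkpoint_id))

    def on_restart(self):
        # Step 3: abort txns that were pre-committed but never committed.
        self.doris.abort_by_label_prefix(self.label_prefix)
        self.buffer, self.pending = [], {}


def demo():
    doris = FakeDoris()
    sink = SketchSink(doris, "job42")
    sink.write("a")
    sink.write("b")
    sink.on_checkpoint_barrier(1)
    sink.on_checkpoint_complete(1)
    sink.write("c")
    sink.on_checkpoint_barrier(2)   # job fails before checkpoint 2 completes
    sink.on_restart()               # pre-committed "c" is aborted
    sink.write("c")                 # Flink replays from checkpoint 1
    sink.on_checkpoint_barrier(3)
    sink.on_checkpoint_complete(3)
    return doris.visible_rows()
```

Calling demo() returns ['a', 'b', 'c']: the row "c" that was pre-committed before the failure never becomes visible, and its replayed copy is committed exactly once.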

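
The point about the Duplicate and UNIQ models can be illustrated the same way. The sketch below assumes a retried batch is delivered twice and models the two table types with plain Python containers; write_duplicate, write_unique, and replay_twice are hypothetical helpers, not Doris or connector APIs.

```python
def write_duplicate(table, batch):
    """Duplicate model: no primary key, so every delivered row is appended."""
    table.extend(batch)


def write_unique(table, batch):
    """Unique model: a row with an existing key overwrites the old row."""
    for key, value in batch:
        table[key] = value


def replay_twice(batch):
    """Deliver the same batch twice, as a retry after a failure would."""
    dup_table, uniq_table = [], {}
    for _ in range(2):
        write_duplicate(dup_table, batch)
        write_unique(uniq_table, batch)
    return dup_table, uniq_table
```

With batch = [(1, "alice"), (2, "bob")], the Duplicate-style list ends up with four rows while the Unique-style dict still holds two, which is why batch writing needs a primary-key table to be idempotent.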