Thanks for the explanation, Di Wu.

Regards,
Jeyhun

On Thu, Mar 7, 2024 at 7:48 PM Jing Ge <j...@ververica.com.invalid> wrote:

> Thanks for the clarification. It makes sense to me.
>
> Best regards,
> Jing
>
> On Wed, Mar 6, 2024 at 10:03 AM wudi <676366...@qq.com.invalid> wrote:
>
> > Hi Jing Ge, thanks for your suggestions.
> >
> > 1. Currently, the Flink Doris Connector is compatible with Flink versions
> > 1.15-1.18. SupportsCommitter[1] seems to be introduced in Flink 1.19, and
> > most users may not have upgraded their Flink environments to that version
> > yet. Modifying it now could lead to incompatibilities. I think we can
> > postpone the modification and make it together with other connectors.
> > What do you think?
> >
> > 2. Yes, currently DorisSource only supports batch reading, typically used
> > for data synchronization and ETL. Streaming reading is not supported yet,
> > which requires the capability of Doris Binlog (mentioned in the Doris
> > 2024 RoadMap[2]). Streaming reading can be used to capture incremental
> > events from the database, making it more convenient for users to process
> > real-time data newly added to Doris.
> >
> > 3. An E2ECase[3] for DorisSource has been added, and the TestPlan in the
> > FLIP has been modified.
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Enhance+and+synchronize+Sink+API+to+match+the+Source+API
> > [2] https://github.com/apache/doris/issues/30669
> > [3] https://github.com/apache/doris-flink-connector/blob/master/flink-doris-connector/src/test/java/org/apache/doris/flink/tools/cdc/DorisDorisE2ECase.java
> >
> > Brs,
> > di.wu
> >
> >
> > > On Mar 6, 2024, at 06:12, Jing Ge <j...@ververica.com.INVALID> wrote:
> > >
> > > Hi Di,
> > >
> > > Thanks for your proposal. +1 for the contribution. I'd like to know your
> > > thoughts about the following questions:
> > >
> > > 1. According to your clarification of the exactly-once semantics (thanks
> > > for it, BTW), no PreCommitTopology is required. Does it make sense to let
> > > DorisSink[1] implement SupportsCommitter, since the TwoPhaseCommittingSink
> > > is deprecated[2], before turning the Doris connector into a Flink
> > > connector?
> > > 2. OLAP engines are commonly used as the tail/downstream of a data
> > > pipeline to support further processing, e.g. ad-hoc queries or cubes with
> > > feasible pre-aggregation. Just out of curiosity, would you like to share
> > > some real use cases that will use OLAP engines as the source of a
> > > streaming data pipeline? Or will it only be used as a source for batch?
> > > 3. The E2E test only covered the sink[3], if I am not mistaken. Would you
> > > like to test the source in E2E too?
> > >
> > > [1] https://github.com/apache/doris-flink-connector/blob/43e0e5cf9b832854ea228fb093077872e3a311b6/flink-doris-connector/src/main/java/org/apache/doris/flink/sink/DorisSink.java#L55
> > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Enhance+and+synchronize+Sink+API+to+match+the+Source+API
> > > [3] https://github.com/apache/doris-flink-connector/blob/43e0e5cf9b832854ea228fb093077872e3a311b6/flink-doris-connector/src/test/java/org/apache/doris/flink/tools/cdc/MySQLDorisE2ECase.java#L96
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Tue, Mar 5, 2024 at 11:18 AM wudi <676366...@qq.com.invalid> wrote:
> > >
> > >> Hi, Jeyhun Karimov.
> > >> Thanks for your question.
> > >>
> > >> - How to ensure Exactly-Once?
> > >> 1. When the Checkpoint Barrier arrives, DorisSink will trigger the
> > >> precommit API of StreamLoad to complete the persistence of data in Doris
> > >> (the data will not be visible at this time), and will also pass this
> > >> TxnID to the Committer.
> > >> 2. When this Checkpoint of the entire Job is completed, the Committer
> > >> will call the commit API of StreamLoad and commit the TxnID to make the
> > >> transaction visible.
> > >> 3. When the task is restarted, any Txn with a successful precommit and a
> > >> failed commit will be aborted based on the label-prefix, by calling
> > >> Doris' abort API. (Doris will also abort transactions that have not been
> > >> committed for a long time.)
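
A minimal sketch of the three steps above, assuming an in-memory stand-in for Doris. The class `FakeDoris` and its methods `precommit`, `commit`, and `abort_by_prefix` are hypothetical names chosen for illustration; they are not the Stream Load API or the connector's code, and the label format is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoris:
    """Hypothetical in-memory model of Doris transactions (not the real API)."""
    txns: dict = field(default_factory=dict)     # label -> [state, rows]
    visible: list = field(default_factory=list)  # rows that queries can see

    def precommit(self, label, rows):
        # Step 1: data is persisted but not yet visible.
        self.txns[label] = ["PRECOMMITTED", list(rows)]
        return label

    def commit(self, label):
        # Step 2: called once the checkpoint has completed for the whole job.
        state, rows = self.txns[label]
        if state == "PRECOMMITTED":
            self.visible.extend(rows)
            self.txns[label][0] = "VISIBLE"

    def abort_by_prefix(self, prefix):
        # Step 3: on restart, abort precommitted-but-uncommitted transactions.
        for label, txn in self.txns.items():
            if label.startswith(prefix) and txn[0] == "PRECOMMITTED":
                txn[0] = "ABORTED"


doris = FakeDoris()
prefix = "job1"

# Checkpoint 1: barrier arrives -> precommit; checkpoint completes -> commit.
txn1 = doris.precommit(f"{prefix}_1", ["a", "b"])
doris.commit(txn1)

# Checkpoint 2: precommit succeeds, but the job fails before the commit.
doris.precommit(f"{prefix}_2", ["c"])

# Restart: abort leftovers by label prefix, then replay the data after checkpoint 1.
doris.abort_by_prefix(prefix)
txn3 = doris.precommit(f"{prefix}_3", ["c"])
doris.commit(txn3)

print(doris.visible)
```

In this toy run the row "c" becomes visible only once, because the precommitted transaction from the failed attempt is aborted before the replay.
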
> > >>
> > >> P.S. This part of the content has also been updated in the FLIP.
> > >>
> > >> - Because the default table model in Doris is Duplicate (
> > >> https://doris.apache.org/docs/data-table/data-model/), which does not
> > >> have a primary key, batch writing may cause data duplication. The Unique
> > >> model, however, has a primary key, which ensures the idempotence of
> > >> writing, thus achieving Exactly-Once.
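
To illustrate the idempotence argument, here is a small sketch that models a keyless table as a list and a primary-key table as a dict. The functions `write_duplicate` and `write_unique` are hypothetical and only mimic the behavior described above; they do not use any Doris API.

```python
def write_duplicate(table, batch):
    # Keyless (Duplicate-like) table: every written row is appended.
    table.extend(batch)


def write_unique(table, batch):
    # Primary-key (Unique-like) table: a rewrite replaces the row with the same key.
    for key, value in batch:
        table[key] = value


batch = [(1, "x"), (2, "y")]
dup_table, uniq_table = [], {}

# The same batch is written twice, as could happen when a batch is retried.
for _ in range(2):
    write_duplicate(dup_table, batch)
    write_unique(uniq_table, batch)

print(len(dup_table), len(uniq_table))
```

The keyless table ends up with duplicated rows after the retry, while the keyed table holds each row once.
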
> > >>
> > >> Brs,
> > >> di.wu
> > >>
> > >>
> > >>> On Mar 2, 2024, at 17:50, Jeyhun Karimov <je.kari...@gmail.com> wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> Thanks for the proposal. +1 for the FLIP.
> > >>> I have a few questions:
> > >>>
> > >>> - How exactly will the combination of the two (Stream Load's two-phase
> > >>> commit and Flink's two-phase commit) ensure the e2e exactly-once
> > >>> semantics?
> > >>>
> > >>> - The FLIP proposes to combine Doris's batch writing with the primary key
> > >>> table to achieve Exactly-Once semantics. Could you elaborate more on
> > >>> that? Why is it not the default behavior but a workaround?
> > >>>
> > >>> Regards,
> > >>> Jeyhun
> > >>>
> > >>> On Sat, Mar 2, 2024 at 10:14 AM Yanquan Lv <decq12y...@gmail.com> wrote:
> > >>>
> > >>>> Thanks for driving this.
> > >>>> The content is very detailed. I recommend adding a Test Plan section
> > >>>> for completeness.
> > >>>>
> > >>>> On Thu, Jan 25, 2024 at 15:40, Di Wu <d...@apache.org> wrote:
> > >>>>
> > >>>>> Hi all,
> > >>>>>
> > >>>>> Previously, we had some discussions about contributing Flink Doris
> > >>>>> Connector to the Flink community [1]. I want to further promote this
> > >>>>> work. I hope everyone will help participate in this FLIP discussion and
> > >>>>> provide more valuable opinions and suggestions.
> > >>>>> Thanks.
> > >>>>>
> > >>>>> [1] https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p
> > >>>>>
> > >>>>> Brs,
> > >>>>> di.wu
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On 2023/12/07 05:02:46 wudi wrote:
> > >>>>>>
> > >>>>>> Hi all,
> > >>>>>>
> > >>>>>> This follows up on the previous email [1] about contributing the
> > >>>>>> Flink Doris Connector to the Flink community.
> > >>>>>>
> > >>>>>>
> > >>>>>> Apache Doris[2] is a high-performance, real-time analytical database
> > >>>>>> based on MPP architecture. For scenarios where Flink is used for data
> > >>>>>> analysis, processing, or real-time writing on Doris, Flink Doris
> > >>>>>> Connector is an effective tool.
> > >>>>>>
> > >>>>>> At the same time, contributing Flink Doris Connector to the Flink
> > >>>>>> community will further expand the Flink Connectors ecosystem.
> > >>>>>>
> > >>>>>> So I would like to start an official discussion on FLIP-399: Flink
> > >>>>>> Connector Doris[3].
> > >>>>>>
> > >>>>>> Looking forward to comments, feedback and suggestions from the
> > >>>>>> community on the proposal.
> > >>>>>>
> > >>>>>> [1] https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p
> > >>>>>> [2] https://doris.apache.org/docs/dev/get-starting/what-is-apache-doris/
> > >>>>>> [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-399%3A+Flink+Connector+Doris
> > >>>>>>
> > >>>>>>
> > >>>>>> Brs,
> > >>>>>>
> > >>>>>> di.wu
> > >>>>>>
> > >>>>>
> > >>>>
> > >>
> > >>
> >
> >
>
