Hi, Jeyhun Karimov.
Thanks for your question.

- How to ensure Exactly-Once?
1. When the Checkpoint Barrier arrives, DorisSink will trigger the precommit 
api of StreamLoad to complete the persistence of data in Doris (the data will 
not be visible at this time), and will also pass this TxnID to the Committer.
2. When this Checkpoint of the entire Job is completed, the Committer will call 
the commit api of StreamLoad and commit TxnID to complete the visibility of the 
transaction.
3. When the task is restarted, the Txn with successful precommit and failed 
commit will be aborted based on the label-prefix, and Doris' abort API will be 
called. (At the same time, Doris will also abort transactions that have not 
been committed for a long time)

ps: At the same time, this part of the content has been updated in FLIP

- Because the default table model in Doris is Duplicate 
(https://doris.apache.org/docs/data-table/data-model/), which does not have a 
primary key, batch writing may cause data duplication, but UNIQ The model has a 
primary key, which ensures the idempotence of writing, thus achieving 
Exactly-Once

Brs,
di.wu


> 2024年3月2日 17:50,Jeyhun Karimov <je.kari...@gmail.com> 写道:
> 
> Hi,
> 
> Thanks for the proposal. +1 for the FLIP.
> I have a few questions:
> 
> - How exactly the two (Stream Load's two-phase commit and Flink's two-phase
> commit) combination will ensure the e2e exactly-once semantics?
> 
> - The FLIP proposes to combine Doris's batch writing with the primary key
> table to achieve Exactly-Once semantics. Could you elaborate more on that?
> Why it is not the default behavior but a workaround?
> 
> Regards,
> Jeyhun
> 
> On Sat, Mar 2, 2024 at 10:14 AM Yanquan Lv <decq12y...@gmail.com> wrote:
> 
>> Thanks for driving this.
>> The content is very detailed, it is recommended to add a section on Test
>> Plan for more completeness.
>> 
>> Di Wu <d...@apache.org> 于2024年1月25日周四 15:40写道:
>> 
>>> Hi all,
>>> 
>>> Previously, we had some discussions about contributing Flink Doris
>>> Connector to the Flink community [1]. I want to further promote this
>> work.
>>> I hope everyone will help participate in this FLIP discussion and provide
>>> more valuable opinions and suggestions.
>>> Thanks.
>>> 
>>> [1] https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p
>>> 
>>> Brs,
>>> di.wu
>>> 
>>> 
>>> 
>>> On 2023/12/07 05:02:46 wudi wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> As discussed in the previous email [1], about contributing the Flink
>>> Doris Connector to the Flink community.
>>>> 
>>>> 
>>>> Apache Doris[2] is a high-performance, real-time analytical database
>>> based on MPP architecture, for scenarios where Flink is used for data
>>> analysis, processing, or real-time writing on Doris, Flink Doris
>> Connector
>>> is an effective tool.
>>>> 
>>>> At the same time, Contributing Flink Doris Connector to the Flink
>>> community will further expand the Flink Connectors ecosystem.
>>>> 
>>>> So I would like to start an official discussion FLIP-399: Flink
>>> Connector Doris[3].
>>>> 
>>>> Looking forward to comments, feedbacks and suggestions from the
>>> community on the proposal.
>>>> 
>>>> [1] https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p
>>>> [2]
>> https://doris.apache.org/docs/dev/get-starting/what-is-apache-doris/
>>>> [3]
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-399%3A+Flink+Connector+Doris
>>>> 
>>>> 
>>>> Brs,
>>>> 
>>>> di.wu
>>>> 
>>> 
>> 

Reply via email to