Re: File Source Exactly Once Delivery Semantics

2023-08-02 Thread Shammon FY
Hi Kirti,

Simply put, the sink needs to support a `two-phase commit`: it can `write`
data as normal, but it only `commits` that data after the checkpoint has
completed successfully. This ensures that even if a failover occurs and data
has to be replayed, the data written before the failure is never visible to
the user, so the replayed records do not show up as duplicates. However, this
approach increases latency: data becomes visible only after the checkpoint
completes and the data is committed, rather than as soon as the sink writes it.
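
To illustrate the idea, here is a minimal, self-contained sketch in plain Python. It is only a toy simulation and not Flink code: the names `TwoPhaseSink`, `run`, `checkpoint_every` and `crash_after` are made up for this example, and a real sink would pre-commit to external storage instead of keeping a list in memory.

```python
class TwoPhaseSink:
    """Toy sink: written records stay invisible until a checkpoint commits them."""

    def __init__(self):
        self.pending = []    # written since the last checkpoint, not yet visible
        self.committed = []  # visible to downstream readers

    def write(self, record):
        self.pending.append(record)

    def commit(self):
        # Called only after a checkpoint has completed successfully.
        self.committed.extend(self.pending)
        self.pending.clear()

    def abort(self):
        # Called on failover: drop everything written since the last checkpoint.
        self.pending.clear()


def run(records, checkpoint_every, crash_after=None):
    """Read `records`, checkpointing every `checkpoint_every` records.

    If `crash_after` is set, simulate one crash right after that many records
    have been written, then restore the source position from the checkpoint.
    """
    sink = TwoPhaseSink()
    checkpointed_pos = 0  # source position stored in the last checkpoint
    pos = 0
    crashed = False
    while pos < len(records):
        sink.write(records[pos])
        pos += 1
        if crash_after is not None and not crashed and pos == crash_after:
            crashed = True
            sink.abort()            # uncommitted writes are discarded
            pos = checkpointed_pos  # source replays from the checkpointed position
            continue
        if pos % checkpoint_every == 0:
            checkpointed_pos = pos  # checkpoint the source position
            sink.commit()           # and make the written data visible
    sink.commit()  # final checkpoint at end of input
    return sink.committed


if __name__ == "__main__":
    # The crash after the third record forces "c" to be read and written twice,
    # but it is committed only once.
    print(run(["a", "b", "c", "d", "e"], checkpoint_every=2, crash_after=3))
```

In this sketch the source does re-emit the record read after the last checkpoint, as you observed, but because the first copy was never committed, the committed output contains each record exactly once.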

Best,
Shammon FY

On Thu, Aug 3, 2023 at 12:23 PM Kirti Dhar Upadhyay K via user <
user@flink.apache.org> wrote:

> Hi Team,
>
> I am using the Flink File Source in one of my use cases.
>
> I observed that, while reading a file, the source reader stores its
> position in the checkpointed data.
>
> If the application crashes, it restores its position from the checkpointed
> data once the application comes back up, which may result in re-emitting a
> few records that were emitted between the last checkpoint and the
> application crash.
>
> However, in the documentation at
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/datastream/guarantees/
> I found that the File source ensures exactly-once delivery semantics with
> the help of the data sink:
>
> "To guarantee end-to-end exactly-once record delivery (in addition to
> exactly-once state semantics), the data sink needs to take part in the
> checkpointing mechanism."
>
> Can someone shed some light on this?
>
> Regards,
>
> Kirti Dhar


File Source Exactly Once Delivery Semantics

2023-08-02 Thread Kirti Dhar Upadhyay K via user
Hi Team,

I am using the Flink File Source in one of my use cases.
I observed that, while reading a file, the source reader stores its position
in the checkpointed data.
If the application crashes, it restores its position from the checkpointed
data once the application comes back up, which may result in re-emitting a few
records that were emitted between the last checkpoint and the application
crash.
However, in the documentation at
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/datastream/guarantees/
I found that the File source ensures exactly-once delivery semantics with the
help of the data sink:
"To guarantee end-to-end exactly-once record delivery (in addition to
exactly-once state semantics), the data sink needs to take part in the
checkpointing mechanism."

Can someone shed some light on this?

Regards,
Kirti Dhar