Hi Shekhar and Hongshun,

Thank you for the questions and for participating. Please let me know if you have any further questions; if not, I will start a voting thread.
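For anyone skimming the thread below, the data-loss scenario we discussed can be sketched as a toy model. This is plain Python, not the real Kafka or Flink APIs; all class and method names are illustrative. It models a broker-side coordinator that aborts an open transaction when its timeout fires, and a 2PC participant whose checkpoint completed but whose commit call never reached the broker:

```python
from dataclasses import dataclass, field

@dataclass
class ToyTxnCoordinator:
    """Toy stand-in for Kafka's transaction coordinator (illustrative only)."""
    open_txns: dict = field(default_factory=dict)  # txn_id -> buffered records

    def begin(self, txn_id):
        self.open_txns[txn_id] = []

    def send(self, txn_id, record):
        self.open_txns[txn_id].append(record)

    def commit(self, txn_id):
        # Records become visible to read_committed consumers.
        return self.open_txns.pop(txn_id)

    def on_timeout(self, txn_id):
        # The broker cannot know whether the data *should* be committed;
        # by design it aborts, discarding the buffered records.
        self.open_txns.pop(txn_id, None)
        return []

coordinator = ToyTxnCoordinator()
coordinator.begin("flink-txn-1")
coordinator.send("flink-txn-1", "event-A")

# Flink completes a checkpoint: by 2PC rules this transaction MUST commit...
checkpoint_completed = True

# ...but the JVM crashes before commit() reaches the broker,
# and eventually the broker-side transaction timeout fires:
visible = coordinator.on_timeout("flink-txn-1")

# Data loss: the checkpoint promised the record, the timeout-abort discarded it.
assert checkpoint_completed and visible == []
```

The point of the sketch is that the broker's timeout can only ever abort; resolving the transaction correctly (commit vs. abort) requires the checkpoint knowledge that only the Flink side holds, which is what the proposed tool supplies.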
On Fri, 10 Apr 2026 at 14:07, Aleksandr Savonin <[email protected]> wrote:
>
> Hi Shekhar Prasad Rajak,
>
> > Don't we have timeout, session timeout at kafka side ?
>
> Yes, Kafka does have a timeout, and when the timeout fires, the
> transaction is aborted by the broker. However, if Flink successfully
> completed a checkpoint but didn't commit the transaction (e.g. the JVM
> crashed before committing the transaction to Kafka), the data should
> be committed to satisfy the 2PC protocol's promises. Relying on the
> timeout may lead to data loss in this scenario.
> Moreover, some Flink-Kafka setups use high values for Kafka
> transaction timeouts; in that case, even transactions that must be
> aborted (because they were not checkpointed) will not be aborted
> until the timeout fires, blocking the LSO. The tool provides a way to
> resolve this immediately.
>
> Kind regards,
> Aleksandr
>
> On Wed, 8 Apr 2026 at 05:34, Shekhar Rajak <[email protected]> wrote:
> >
> > Thanks for clarifying, but
> > > it holds the transaction open because it has no way to know whether the
> > > data should be committed or aborted. That decision belongs to Flink,
> > > which is no longer running.
> >
> > Don't we have timeout, session timeout at kafka side ? After enabling 2PC
> > we must be using those configs to make sure the Kafka broker releases the
> > locks and we have durability.
> >
> > Regards,
> > Shekhar Prasad Rajak
> >
> >
> > On Monday 30 March 2026 at 04:34:07 pm GMT+5:30, Aleksandr Savonin
> > <[email protected]> wrote:
> >
> > Hi Shekhar Rajak,
> > Good questions.
> > Let me clarify the transaction coordination model and why a
> > Flink-specific tool is needed.
> >
> > > Curious to know if we debug and analyse the root cause and try to
> > > fix/enhance in the Kafka transaction coordinator side, will this issue be
> > > resolved ?
> >
> > Kafka has a built-in Transaction Coordinator, but it only manages the
> > Kafka protocol layer; it executes commit/abort when told to.
> > For exactly-once delivery, Flink acts as the orchestrator on top of
> > Kafka's transactions, deciding when to commit based on checkpoint
> > completion. The problem arises when Flink disappears (there is no
> > running Flink job anymore) and nobody is left to tell Kafka's
> > coordinator to commit or abort. Kafka's coordinator in this case is
> > behaving as designed: it holds the transaction open because it has no
> > way to know whether the data should be committed or aborted. That
> > decision belongs to Flink, which is no longer running.
> >
> > > Do we really need to think about a transaction management tool
> > > specifically for kafka ?
> >
> > The Kafka sink is one of the most widely used Flink connectors, and
> > exactly-once delivery with Kafka transactions is a primary production
> > use case in some companies.
> >
> > --
> > Kind regards,
> > Aleksandr
>
>
> --
> Kind regards,
> Aleksandr

--
Kind regards,
Aleksandr
