It is probably me. I copied the original message subject into a new email.
Perhaps that is not enough to link them.

It was not my understanding from reading KIP-939 that we are doing away
with any transactional timeout in the Kafka broker. As I understand it, we
are allowing the application to set the transaction timeout to a value that
exceeds the *transaction.max.timeout.ms* setting
on the broker, and having no timeout if the application does not set
*transaction.timeout.ms* on the producer. The KIP says that the
semantics of *transaction.timeout.ms* are
not being changed, so I take that to mean that the broker will continue to
enforce a timeout if provided, and abort transactions that exceed it. From
the KIP:

Client Configuration Changes

*transaction.two.phase.commit.enable* The default would be ‘false’.  If set
to ‘true’, then the broker is informed that the client is participating in
two phase commit protocol and can set transaction timeout to values that
exceed *transaction.max.timeout.ms* setting
on the broker (if the timeout is not set explicitly on the client and the
two phase commit is set to ‘true’ then the transaction never expires).

*transaction.timeout.ms* The semantics is
not changed, but it can be set to values that exceed
*transaction.max.timeout.ms* if two.phase.commit.enable is set to
‘true’.
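
My reading of those two config paragraphs can be sketched as follows. This is purely my interpretation, not actual broker code; the helper function and its names are made up for illustration, and `None` stands in for "no client timeout set":

```python
# Sketch of how the broker would resolve the effective transaction timeout
# under KIP-939, per the quoted config semantics. Hypothetical helper, not
# Kafka source code.
def effective_timeout(client_timeout_ms, broker_max_ms, two_phase_commit_enabled):
    if two_phase_commit_enabled:
        # 2PC clients may exceed the broker cap; with no client timeout set,
        # the transaction never expires (returned as None here).
        return client_timeout_ms
    # Existing behavior: the broker rejects timeouts above its cap
    # (InvalidTransactionTimeout) rather than silently capping them.
    if client_timeout_ms is not None and client_timeout_ms > broker_max_ms:
        raise ValueError("transaction.timeout.ms exceeds transaction.max.timeout.ms")
    return client_timeout_ms
```

Under this reading, the broker still enforces a timeout whenever the client provides one, even for 2PC transactions.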


Thinking about this more, I believe we would also have a possible race
condition if the broker is unaware that a transaction has been prepared.
The application might call prepare and get a positive response, but the
broker might have already aborted the transaction for exceeding the
timeout. It is a general rule of 2PC that once a transaction has been
prepared, it must remain possible to either commit or abort it. In this
case a prepared transaction might already have been aborted by the broker,
so it would be impossible to commit.
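
The two-timeout behavior that an explicit "prepare" RPC would enable could look something like the following. This is hypothetical broker-side logic for illustration only, not anything KIP-939 actually proposes:

```python
# Hypothetical broker-side expiry check, not KIP-939's design: if the broker
# tracked a PREPARED state, it could abort unprepared transactions on a short
# timeout while leaving prepared ones to the external coordinator.
ONGOING, PREPARED, ABORTED = "ongoing", "prepared", "aborted"

def expire_if_timed_out(state, age_ms, short_timeout_ms, long_timeout_ms=None):
    """long_timeout_ms=None means prepared transactions never expire."""
    if state == ONGOING and age_ms > short_timeout_ms:
        return ABORTED  # safe: the client never saw a successful prepare
    if state == PREPARED and long_timeout_ms is not None and age_ms > long_timeout_ms:
        return ABORTED  # violates the 2PC rule unless the coordinator decided
    return state
```

Without a PREPARED state, the broker can only apply one timeout to every transaction, so a client's successful prepare can race with a broker-side abort.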

I hope this is making sense and I am not misunderstanding the KIP. Please
let me know if I am.

- Rowland


On Thu, Jan 4, 2024 at 12:56 PM Justine Olshan <jols...@confluent.io.invalid>
wrote:

> Hey Rowland,
>
> Not sure why this message showed up in a different thread from the other
> KIP-939 discussion (is it just me?)
>
> In KIP-939, we do away with having any transactional timeout on the Kafka
> side. The external coordinator is fully responsible for controlling whether
> the transaction completes.
>
> While I think there is some use in having a prepare stage, I just wanted to
> clarify what the current KIP is proposing.
>
> Thanks,
> Justine
>
> On Wed, Jan 3, 2024 at 7:49 PM Rowland Smith <rowl...@gmail.com> wrote:
>
> > Hi Artem,
> >
> > I saw your response in the thread I started discussing Kafka distributed
> > transaction support and the XA interface. I would like to work with you
> > to add XA support to Kafka on top of the excellent foundational work
> > that you have started with KIP-939. I agree that explicit XA support
> > should not be included in the Kafka codebase as long as the right set
> > of basic operations are provided. I will begin pulling together a KIP
> > to follow KIP-939.
> >
> > I did have one comment on KIP-939 itself. I see that you considered an
> > explicit "prepare" RPC, but decided not to add it. If I understand your
> > design correctly, that would mean that a 2PC transaction would have a
> > single timeout that would need to be long enough to ensure that
> > prepared transactions are not aborted when an external coordinator
> > fails. However, this also means that an unprepared transaction would
> > not be aborted without waiting for the same timeout. Since long-running
> > transactions block transactional consumers, having a long timeout for
> > all transactions could be disruptive. An explicit "prepare" RPC would
> > allow the server to abort unprepared transactions after a relatively
> > short timeout, and apply a much longer timeout only to prepared
> > transactions. The explicit "prepare" RPC would make the Kafka server
> > more resilient to client failure at the cost of an extra synchronous
> > RPC call. I think it's worth reconsidering this.
> >
> > With an XA implementation this might become a more significant issue,
> > since the transaction coordinator has no memory of unprepared
> > transactions across restarts. Such transactions would need to be
> > cleared by hand through the admin client even when the transaction
> > coordinator restarts successfully.
> >
> > - Rowland
> >
>
