Hi Artem,

I don't think that you understand what I am saying. In any transaction,
there is work done before the call to prepareTransaction() and work done
afterwards. Any work performed before the call to prepareTransaction() can
be aborted after a relatively short timeout if the client fails. It is only
after the prepareTransaction() call that a transaction becomes in-doubt and
must be remembered for a much longer period of time to allow the client to
recover and make the decision to either commit or abort. A considerable
amount of time might be spent before prepareTransaction() is called, and if
the client fails in this period, relatively quick transaction abort would
unblock any partitions and make the system fully available. So a prepare
RPC would reduce the window where a client failure results in potentially
long-lived blocking transactions.

Here is the proposed sequence from the KIP with 2 added steps (4 and 5):


   1. Begin database transaction
   2. Begin Kafka transaction
   3. Produce data to Kafka
   4. Make updates to the database
   5. Repeat steps 3 and 4 as many times as necessary based on application
   needs.
   6. Prepare Kafka transaction [currently implicit operation, expressed as
   flush]
   7. Write produced data to the database
   8. Write offsets of produced data to the database
   9. Commit database transaction
   10. Commit Kafka transaction


If the client application crashes before step 6, it is safe to abort the
Kafka transaction after a relatively short timeout.
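
For illustration, steps 1-10 might look roughly like this in application
code (a sketch only: prepareTransaction() is the producer method proposed
in KIP-939, while the JDBC calls and helpers such as Work, pendingWork(),
recordProducedData() and recordOffsets() are hypothetical stand-ins for
whatever the application actually does):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Fragment of a hypothetical application class; error handling and
    // the recovery path after a crash are omitted.
    void runOnce(KafkaProducer<String, String> producer, String dbUrl)
            throws Exception {
        try (Connection db = DriverManager.getConnection(dbUrl)) {
            db.setAutoCommit(false);            // 1. begin database transaction
            producer.beginTransaction();        // 2. begin Kafka transaction
            for (Work w : pendingWork()) {      // 5. repeat 3 and 4 as needed
                producer.send(new ProducerRecord<>("events", w.key(), w.value())); // 3
                w.updateDatabase(db);           // 4. make updates to the database
            }
            producer.prepareTransaction();      // 6. today an implicit flush
            recordProducedData(db);             // 7. write produced data to the DB
            recordOffsets(db);                  // 8. write offsets to the DB
            db.commit();                        // 9. commit database transaction
            producer.commitTransaction();       // 10. commit Kafka transaction
        }
    }

A crash anywhere before the prepareTransaction() call in step 6 leaves
nothing the broker needs to remember, which is exactly why a short abort
timeout would be safe up to that point.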

I fully agree with a layered approach. However, the XA layer will require
certain capabilities from the layer below it, and one of those capabilities
is the ability to identify and report prepared transactions during
recovery.
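
For example, during recovery a JTA transaction manager calls
XAResource.recover() on each resource, and a Kafka XAResource
implementation would have to answer with the still-prepared transactions.
A minimal sketch (recover(int) is the real JTA signature;
fetchPreparedTxnsFromBroker() and the PreparedTxn type are hypothetical,
since supplying that list is precisely the capability being requested):

    import javax.transaction.xa.XAException;
    import javax.transaction.xa.XAResource;
    import javax.transaction.xa.Xid;

    // Inside a hypothetical KafkaXAResource implements XAResource.
    public Xid[] recover(int flag) throws XAException {
        // The broker would need to report transactions that were prepared
        // but not yet committed or aborted; this call does not exist today.
        return fetchPreparedTxnsFromBroker().stream()
                .map(PreparedTxn::xid)  // the XID recorded at prepare time
                .toArray(Xid[]::new);
    }

Without a prepare RPC, the broker keeps no record that distinguishes these
transactions, so there would be nothing for recover() to return.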

- Rowland

On Mon, Feb 5, 2024 at 12:46 AM Artem Livshits
<alivsh...@confluent.io.invalid> wrote:

> Hi Rowland,
>
> Thank you for your feedback. Using an explicit prepare RPC was discussed
> and is listed in the rejected alternatives:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC#KIP939:SupportParticipationin2PC-Explicit%E2%80%9Cprepare%E2%80%9DRPC
>
> Basically, even if we had an explicit prepare RPC, it doesn't avoid the
> fact that a crashed client could cause a blocking transaction. This, by
> the way, is not just a property of this concrete proposal; it's a
> fundamental trade-off of any form of 2PC -- any 2PC implementation must
> allow for infinitely "in-doubt" transactions that cannot be unilaterally
> and automatically resolved within one participant.
>
> To mitigate the issue, using 2PC requires a special permission, so that
> the Kafka admin can ensure that only applications that follow proper
> availability standards (i.e. will automatically restart and clean up after
> a crash) are allowed to utilize 2PC. It is also assumed that any practical
> deployment utilizing 2PC has monitoring set up, so that an operator can be
> alerted to investigate and manually resolve in-doubt transactions (the
> metric and tooling support for doing so are also described in the KIP).
>
> For XA support, I wonder if we could take a layered approach and store XA
> information in a separate store, say in a compacted topic.  This way, the
> core Kafka protocol could be decoupled from specific implementations (and
> extra performance requirements that a specific implementation may impose)
> and serve as a foundation for multiple implementations.
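>
> For instance (a sketch of the idea only; the topic name and record layout
> are invented), the XA layer could keep the XID state in a compacted topic
> keyed by XID, with a tombstone once the transaction is resolved:
>
>     // Assumes a KafkaProducer<byte[], byte[]>. Written by the XA layer
>     // at prepare time; log compaction keeps the latest record per key.
>     producer.send(new ProducerRecord<>("__xa_state", xidBytes, preparedStateBytes));
>     // Written once the transaction is committed or aborted; the null
>     // value is a tombstone that lets compaction drop the key entirely.
>     producer.send(new ProducerRecord<>("__xa_state", xidBytes, null));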
>
> -Artem
>
> On Sun, Feb 4, 2024 at 1:37 PM Rowland Smith <rowl...@gmail.com> wrote:
>
> > Hi Artem,
> >
> > It has been a while, but I have gotten back to this. I understand that
> > when 2PC is used, the transaction timeout will be effectively infinite.
> > I don't think that this behavior is desirable. A long-running
> > transaction can be extremely disruptive since it blocks consumers on any
> > partitions written to within the pending transaction. The primary reason
> > for a long-running transaction is a failure of the client, or of the
> > network connecting the client to the broker. If such a failure occurs
> > before the client calls the new prepareTransaction() method, it should
> > be OK to abort the transaction after a relatively short timeout period.
> > This approach would minimize the inconvenience and disruption of a
> > long-running transaction blocking consumers, and provide higher
> > availability for a system using Kafka.
> >
> > In order to achieve this behavior, I think we would need a 'prepare' RPC
> > call so that the server knows that a transaction has been prepared, and
> > does not time out and abort such transactions. There will be some cost to
> > this extra RPC call, but there will also be a benefit of better system
> > availability in case of failures.
> >
> > There is another reason why I would prefer this implementation. I am
> > working on an XA KIP, and XA requires that Kafka brokers be able to
> > provide a list of prepared transactions during recovery. The broker can
> > only know that a transaction has been prepared if an RPC call is made,
> > so my KIP will need this functionality. In the XA KIP, I would like to
> > use as much of the KIP-939 solution as possible, so it would be helpful
> > if prepareTransaction() sent a 'prepare' RPC, and the broker recorded
> > the prepared transaction state.
> >
> > This could be made configurable behavior if we are concerned that the
> > cost of the extra RPC call is too much, and that some users would prefer
> > to have speed in exchange for less system availability in some cases of
> > client or network failure.
> >
> > Let me know what you think.
> >
> > -Rowland
> >
> > On Fri, Jan 5, 2024 at 8:03 PM Artem Livshits
> > <alivsh...@confluent.io.invalid> wrote:
> >
> > > Hi Rowland,
> > >
> > > Thank you for the feedback. For the 2PC cases, the expectation is
> > > that the timeout on the client would be set to "effectively infinite",
> > > i.e. a value that would exceed all practical 2PC delays. Now I think
> > > that this flexibility is confusing and can be misused, so I have
> > > updated the KIP to just say that if 2PC is used, the transaction never
> > > expires.
> > >
> > > -Artem
> > >
> > > On Thu, Jan 4, 2024 at 6:14 PM Rowland Smith <rowl...@gmail.com> wrote:
> > >
> > > > It is probably me. I copied the original message subject into a new
> > > > email. Perhaps that is not enough to link them.
> > > >
> > > > It was not my understanding from reading KIP-939 that we are doing
> > > > away with any transactional timeout in the Kafka broker. As I
> > > > understand it, we are allowing the application to set the transaction
> > > > timeout to a value that exceeds the *transaction.max.timeout.ms*
> > > > setting on the broker, and having no timeout if the application does
> > > > not set *transaction.timeout.ms* on the producer. The KIP says that
> > > > the semantics of *transaction.timeout.ms* are not being changed, so I
> > > > take that to mean that the broker will continue to enforce a timeout
> > > > if provided, and abort transactions that exceed it. From the KIP:
> > > >
> > > > Client Configuration Changes
> > > >
> > > > *transaction.two.phase.commit.enable* The default would be ‘false’.
> > > > If set to ‘true’, then the broker is informed that the client is
> > > > participating in the two phase commit protocol and can set the
> > > > transaction timeout to values that exceed the
> > > > *transaction.max.timeout.ms* setting on the broker (if the timeout is
> > > > not set explicitly on the client and two phase commit is set to
> > > > ‘true’, then the transaction never expires).
> > > >
> > > > *transaction.timeout.ms* The semantics are not changed, but it can be
> > > > set to values that exceed *transaction.max.timeout.ms* if
> > > > two.phase.commit.enable is set to ‘true’.
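> > > >
> > > > So a 2PC participant would configure something like this (a sketch;
> > > > the transactional.id value is illustrative, and only the
> > > > two.phase.commit property is new in the KIP):
> > > >
> > > >     import java.util.Properties;
> > > >
> > > >     Properties props = new Properties();
> > > >     props.put("transactional.id", "payments-app-1"); // illustrative
> > > >     props.put("transaction.two.phase.commit.enable", "true");
> > > >     // transaction.timeout.ms deliberately left unset: with 2PC
> > > >     // enabled, the transaction never expires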
> > > >
> > > >
> > > > Thinking about this more, I believe we would also have a possible
> > > > race condition if the broker is unaware that a transaction has been
> > > > prepared. The application might call prepare and get a positive
> > > > response, but the broker might have already aborted the transaction
> > > > for exceeding the timeout. It is a general rule of 2PC that once a
> > > > transaction has been prepared, it must be possible for it to be
> > > > committed or aborted. It seems in this case a prepared transaction
> > > > might already be aborted by the broker, so it would be impossible to
> > > > commit.
> > > >
> > > > I hope this is making sense and I am not misunderstanding the KIP.
> > > > Please let me know if I am.
> > > >
> > > > - Rowland
> > > >
> > > >
> > > > On Thu, Jan 4, 2024 at 12:56 PM Justine Olshan
> > > > <jols...@confluent.io.invalid>
> > > > wrote:
> > > >
> > > > > Hey Rowland,
> > > > >
> > > > > Not sure why this message showed up in a different thread from the
> > > > > other KIP-939 discussion (is it just me?)
> > > > >
> > > > > In KIP-939, we do away with having any transactional timeout on the
> > > > > Kafka side. The external coordinator is fully responsible for
> > > > > controlling whether the transaction completes.
> > > > >
> > > > > While I think there is some use in having a prepare stage, I just
> > > > > wanted to clarify what the current KIP is proposing.
> > > > >
> > > > > Thanks,
> > > > > Justine
> > > > >
> > > > > On Wed, Jan 3, 2024 at 7:49 PM Rowland Smith <rowl...@gmail.com> wrote:
> > > > >
> > > > > > Hi Artem,
> > > > > >
> > > > > > I saw your response in the thread I started discussing Kafka
> > > > > > distributed transaction support and the XA interface. I would
> > > > > > like to work with you to add XA support to Kafka on top of the
> > > > > > excellent foundational work that you have started with KIP-939.
> > > > > > I agree that explicit XA support should not be included in the
> > > > > > Kafka codebase as long as the right set of basic operations is
> > > > > > provided. I will begin pulling together a KIP to follow KIP-939.
> > > > > >
> > > > > > I did have one comment on KIP-939 itself. I see that you
> > > > > > considered an explicit "prepare" RPC, but decided not to add it.
> > > > > > If I understand your design correctly, that would mean that a
> > > > > > 2PC transaction would have a single timeout that would need to
> > > > > > be long enough to ensure that prepared transactions are not
> > > > > > aborted when an external coordinator fails. However, this also
> > > > > > means that an unprepared transaction would not be aborted
> > > > > > without waiting for the same timeout. Since long-running
> > > > > > transactions block transactional consumers, having a long
> > > > > > timeout for all transactions could be disruptive. An explicit
> > > > > > "prepare" RPC would allow the server to abort unprepared
> > > > > > transactions after a relatively short timeout, and apply a much
> > > > > > longer timeout only to prepared transactions. The explicit
> > > > > > "prepare" RPC would make the Kafka server more resilient to
> > > > > > client failure at the cost of an extra synchronous RPC call. I
> > > > > > think it's worth reconsidering this.
> > > > > >
> > > > > > With an XA implementation this might become a more significant
> > > > > > issue, since the transaction coordinator has no memory of
> > > > > > unprepared transactions across restarts. Such transactions would
> > > > > > need to be cleared by hand through the admin client even when
> > > > > > the transaction coordinator restarts successfully.
> > > > > >
> > > > > > - Rowland
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > *Rowland E. Smith*
> > P: (862) 260-4163
> > M: (201) 396-3842
> >
>
