Re: [DISCUSS] KIP-939: Support Participation in 2PC

Artem Livshits Wed, 21 Feb 2024 15:59:41 -0800

Hi Rowland,

> The Open Group DTP model and the XA interface requires that resource
managers be able to report prepared transactions only, so a prepare RPC
will be required.


It's required in the XA protocol, but I'm not sure we have to build it into
a Kafka.

Looks like we just need a catalog of prepared transactions and I wonder if
XA protocol could implement it outside of Kafka transactional state.  As an
example you can take a look at Flink that keeps track of prepared
transactions in its own storage.  I think it would be desirable if all
protocols kept their details outside of Kafka, so that we keep Kafka to be
the most open and protocol agnostic (and most efficient and simple) system.

-Artem

On Mon, Feb 19, 2024 at 12:13 PM Rowland Smith <rowl...@gmail.com> wrote:

> Hi Artem,
>
> I think that we both have the same understanding. An explicit prepare RPC
> does not eliminate any conditions, it just reduces the window for possible
> undesirable conditions like pending in-doubt transactions. So there is no
> right or wrong answer, a prepare RPC will reduce the number of
> occurrences of in-doubt transactions, but with a performance cost of an
> extra RPC call on every transaction.
>
> The Open Group DTP model and the XA interface requires that resource
> managers be able to report prepared transactions only, so a prepare RPC
> will be required. I will include it in my KIP for XA interface support, and
> will propose an implementation where clients can choose whether they want a
> prepare RPC when not using the XA interface. How does that sound?
>
> - Rowland
>
> On Fri, Feb 16, 2024 at 7:15 PM Artem Livshits
> <alivsh...@confluent.io.invalid> wrote:
>
> > Hi Rowland,
> >
> > > I am not sure what you mean by guarantee,
> >
> > A guarantee would be an elimination of complexity or a condition.  E.g.
> if
> > adding an explicit prepare RPC eliminated in-doubt transactions, or
> > eliminated a significant complexity in implementation.
> >
> > > 1. Transactions that haven’t reached “prepared” state can be aborted
> via
> > timeout.
> >
> > The argument is that it doesn't eliminate any conditions, it merely
> reduces
> > a subset of circumstances for the conditions to happen, but the
> conditions
> > still happen and must be handled.  The operator still needs to set up
> > monitoring for run-away transactions, there still needs to be an
> > "out-of-band" channel to resolve run-away transactions (i.e. the
> operation
> > would need a way that's not a part of the 2PC protocol to reconcile with
> > the application owner), there still needs to be tooling for resolving
> > run-away transactions.
> >
> > On the downside, an explicit prepare RPC would have a performance hit on
> > the happy path in every single transaction.
> >
> > -Artem
> >
> > On Tue, Feb 6, 2024 at 7:35 PM Rowland Smith <rowl...@gmail.com> wrote:
> >
> > > Hi Artem,
> > >
> > > I am not sure what you mean by guarantee, but I am referring to a
> better
> > > operational experience. You mentioned this as the first benefit of an
> > > explicit "prepare" RPC in the KIP.
> > >
> > >
> > > 1. Transactions that haven’t reached “prepared” state can be aborted
> via
> > > timeout.
> > >
> > > However, in explaining why an explicit "prepare" RPC was not included
> in
> > > the design, you make no further mention of this benefit. So what I am
> > > saying is this benefit is quite significant operationally. Many client
> > > application failures may occur before the transaction reaches the
> > prepared
> > > state, and the ability to automatically abort those transactions and
> > > unblock affected partitions without administrative intervention or fast
> > > restart of the client would be a worthwhile benefit. An explicit
> > "prepare"
> > > RPC will also be needed by the XA implementation, so I would like to
> see
> > it
> > > implemented for that reason. Otherwise, I will need to add this work to
> > my
> > > KIP.
> > >
> > > - Rowland
> > >
> > > On Mon, Feb 5, 2024 at 9:35 PM Artem Livshits
> > > <alivsh...@confluent.io.invalid> wrote:
> > >
> > > > Hi Rowland,
> > > >
> > > > Thank you for your reply.  I think I understand what you're saying
> and
> > > just
> > > > tried to provide a quick summary.  The
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC#KIP939:SupportParticipationin2PC-Explicit%E2%80%9Cprepare%E2%80%9DRPC
> > > > actually goes into the details on what would be the benefits of
> adding
> > an
> > > > explicit prepare RPC and why those won't really add any advantages
> such
> > > as
> > > > elimination the needs for monitoring, tooling or providing additional
> > > > guarantees.  Let me know if you think of a guarantee that prepare RPC
> > > would
> > > > provide.
> > > >
> > > > -Artem
> > > >
> > > > On Mon, Feb 5, 2024 at 6:22 PM Rowland Smith <rowl...@gmail.com>
> > wrote:
> > > >
> > > > > Hi Artem,
> > > > >
> > > > > I don't think that you understand what I am saying. In any
> > transaction,
> > > > > there is work done before the call to prepareTranscation() and work
> > > done
> > > > > afterwards. Any work performed before the call to
> > prepareTransaction()
> > > > can
> > > > > be aborted after a relatively short timeout if the client fails. It
> > is
> > > > only
> > > > > after the prepareTransaction() call that a transaction becomes
> > in-doubt
> > > > and
> > > > > must be remembered for a much longer period of time to allow the
> > client
> > > > to
> > > > > recover and make the decision to either commit or abort. A
> > considerable
> > > > > amount of time might be spent before prepareTransaction() is
> called,
> > > and
> > > > if
> > > > > the client fails in this period, relatively quick transaction abort
> > > would
> > > > > unblock any partitions and make the system fully available. So a
> > > prepare
> > > > > RPC would reduce the window where a client failure results in
> > > potentially
> > > > > long-lived blocking transactions.
> > > > >
> > > > > Here is the proposed sequence from the KIP with 2 added steps (4
> and
> > > 5):
> > > > >
> > > > >
> > > > >    1. Begin database transaction
> > > > >    2. Begin Kafka transaction
> > > > >    3. Produce data to Kafka
> > > > >    4. Make updates to the database
> > > > >    5. Repeat steps 3 and 4 as many times as necessary based on
> > > > application
> > > > >    needs.
> > > > >    6. Prepare Kafka transaction [currently implicit operation,
> > > expressed
> > > > as
> > > > >    flush]
> > > > >    7. Write produced data to the database
> > > > >    8. Write offsets of produced data to the database
> > > > >    9. Commit database transaction
> > > > >    10. Commit Kafka transaction
> > > > >
> > > > >
> > > > > If the client application crashes before step 6, it is safe to
> abort
> > > the
> > > > > Kafka transaction after a relatively short timeout.
> > > > >
> > > > > I fully agree with a layered approach. However, the XA layer is
> going
> > > to
> > > > > require certain capabilities from the layer below it, and one of
> > those
> > > > > capabilities is to be able to identify and report prepared
> > transactions
> > > > > during recovery.
> > > > >
> > > > > - Rowland
> > > > >
> > > > > On Mon, Feb 5, 2024 at 12:46 AM Artem Livshits
> > > > > <alivsh...@confluent.io.invalid> wrote:
> > > > >
> > > > > > Hi Rowland,
> > > > > >
> > > > > > Thank you for your feedback.  Using an explicit prepare RPC was
> > > > discussed
> > > > > > and is listed in the rejected alternatives:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC#KIP939:SupportParticipationin2PC-Explicit%E2%80%9Cprepare%E2%80%9DRPC
> > > > > > .
> > > > > > Basically, even if we had an explicit prepare RPC, it doesn't
> avoid
> > > the
> > > > > > fact that a crashed client could cause a blocking transaction.
> > This
> > > > is,
> > > > > > btw, is not just a specific property of this concrete proposal,
> > it's
> > > a
> > > > > > fundamental trade off of any form of 2PC -- any 2PC
> implementation
> > > must
> > > > > > allow for infinitely "in-doubt" transactions that may not be
> > > > unilaterally
> > > > > > automatically resolved within one participant.
> > > > > >
> > > > > > To mitigate the issue, using 2PC requires a special permission,
> so
> > > that
> > > > > the
> > > > > > Kafka admin could control that only applications that follow
> proper
> > > > > > standards in terms of availability (i.e. will automatically
> restart
> > > and
> > > > > > cleanup after a crash) would be allowed to utilize 2PC.  It is
> also
> > > > > assumed
> > > > > > that any practical deployment utilizing 2PC would have monitoring
> > set
> > > > up,
> > > > > > so that an operator could be alerted to investigate and manually
> > > > resolve
> > > > > > in-doubt transactions (the metric and tooling support for doing
> so
> > > are
> > > > > also
> > > > > > described in the KIP).
> > > > > >
> > > > > > For XA support, I wonder if we could take a layered approach and
> > > store
> > > > XA
> > > > > > information in a separate store, say in a compacted topic.  This
> > way,
> > > > the
> > > > > > core Kafka protocol could be decoupled from specific
> > implementations
> > > > (and
> > > > > > extra performance requirements that a specific implementation may
> > > > impose)
> > > > > > and serve as a foundation for multiple implementations.
> > > > > >
> > > > > > -Artem
> > > > > >
> > > > > > On Sun, Feb 4, 2024 at 1:37 PM Rowland Smith <rowl...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > Hi Artem,
> > > > > > >
> > > > > > > It has been a while, but I have gotten back to this. I
> understand
> > > > that
> > > > > > when
> > > > > > > 2PC is used, the transaction timeout will be effectively
> > infinite.
> > > I
> > > > > > don't
> > > > > > > think that this behavior is desirable. A long running
> transaction
> > > can
> > > > > be
> > > > > > > extremely disruptive since it blocks consumers on any
> partitions
> > > > > written
> > > > > > to
> > > > > > > within the pending transaction. The primary reason for a long
> > > running
> > > > > > > transaction is a failure of the client, or the network
> connecting
> > > the
> > > > > > > client to the broker. If such a failure occurs before the
> client
> > > > calls
> > > > > > > the new prepareTransaction() method, it should be OK to abort
> the
> > > > > > > transaction after a relatively short timeout period. This
> > approach
> > > > > would
> > > > > > > minimize the inconvenience and disruption of a long running
> > > > transaction
> > > > > > > blocking consumers, and provide higher availability for a
> system
> > > > using
> > > > > > > Kafka.
> > > > > > >
> > > > > > > In order to achieve this behavior, I think we would need a
> > > 'prepare'
> > > > > RPC
> > > > > > > call so that the server knows that a transaction has been
> > prepared,
> > > > and
> > > > > > > does not timeout and abort such transactions. There will be
> some
> > > cost
> > > > > to
> > > > > > > this extra RPC call, but there will also be a benefit of better
> > > > system
> > > > > > > availability in case of failures.
> > > > > > >
> > > > > > > There is another reason why I would prefer this
> implementation. I
> > > am
> > > > > > > working on an XA KIP, and XA requires that Kafka brokers be
> able
> > to
> > > > > > provide
> > > > > > > a list of prepared transactions during recovery.  The broker
> can
> > > only
> > > > > > know
> > > > > > > that a transaction has been prepared if an RPC call is made.,
> so
> > my
> > > > KIP
> > > > > > > will need this functionality. In the XA KIP, I would like to
> use
> > as
> > > > > much
> > > > > > of
> > > > > > > the KIP-939 solution as possible, so it would be helpful if
> > > > > > > prepareTransactions() sent a 'prepare' RPC, and the broker
> > recorded
> > > > the
> > > > > > > prepared transaction state.
> > > > > > >
> > > > > > > This could be made configurable behavior if we are concerned
> that
> > > the
> > > > > > cost
> > > > > > > of the extra RPC call is too much, and that some users would
> > prefer
> > > > to
> > > > > > have
> > > > > > > speed in exchange for less system availability in some cases of
> > > > client
> > > > > or
> > > > > > > network failure.
> > > > > > >
> > > > > > > Let me know what you think.
> > > > > > >
> > > > > > > -Rowland
> > > > > > >
> > > > > > > On Fri, Jan 5, 2024 at 8:03 PM Artem Livshits
> > > > > > > <alivsh...@confluent.io.invalid> wrote:
> > > > > > >
> > > > > > > > Hi Rowland,
> > > > > > > >
> > > > > > > > Thank you for the feedback.  For the 2PC cases, the
> expectation
> > > is
> > > > > that
> > > > > > > the
> > > > > > > > timeout on the client would be set to "effectively infinite",
> > > that
> > > > > > would
> > > > > > > > exceed all practical 2PC delays.  Now I think that this
> > > flexibility
> > > > > is
> > > > > > > > confusing and can be misused, I have updated the KIP to just
> > say
> > > > that
> > > > > > if
> > > > > > > > 2PC is used, the transaction never expires.
> > > > > > > >
> > > > > > > > -Artem
> > > > > > > >
> > > > > > > > On Thu, Jan 4, 2024 at 6:14 PM Rowland Smith <
> > rowl...@gmail.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > It is probably me. I copied the original message subject
> > into a
> > > > new
> > > > > > > > email.
> > > > > > > > > Perhaps that is not enough to link them.
> > > > > > > > >
> > > > > > > > > It was not my understanding from reading KIP-939 that we
> are
> > > > doing
> > > > > > away
> > > > > > > > > with any transactional timeout in the Kafka broker. As I
> > > > understand
> > > > > > it,
> > > > > > > > we
> > > > > > > > > are allowing the application to set the transaction timeout
> > to
> > > a
> > > > > > value
> > > > > > > > that
> > > > > > > > > exceeds the *transaction.max.timeout.ms
> > > > > > > > > <http://transaction.max.timeout.ms>* setting
> > > > > > > > > on the broker, and having no timeout if the application
> does
> > > not
> > > > > set
> > > > > > > > > *transaction.timeout.ms
> > > > > > > > > <http://transaction.timeout.ms>* on the producer. The KIP
> > says
> > > > > that
> > > > > > > the
> > > > > > > > > semantics of *transaction.timeout.ms <
> > > > > http://transaction.timeout.ms
> > > > > > >*
> > > > > > > > are
> > > > > > > > > not being changed, so I take that to mean that the broker
> > will
> > > > > > continue
> > > > > > > > to
> > > > > > > > > enforce a timeout if provided, and abort transactions that
> > > exceed
> > > > > it.
> > > > > > > > From
> > > > > > > > > the KIP:
> > > > > > > > >
> > > > > > > > > Client Configuration Changes
> > > > > > > > >
> > > > > > > > > *transaction.two.phase.commit.enable* The default would be
> > > > ‘false’.
> > > > > > If
> > > > > > > > set
> > > > > > > > > to ‘true’, then the broker is informed that the client is
> > > > > > participating
> > > > > > > > in
> > > > > > > > > two phase commit protocol and can set transaction timeout
> to
> > > > values
> > > > > > > that
> > > > > > > > > exceed *transaction.max.timeout.ms <
> > > > > > http://transaction.max.timeout.ms
> > > > > > > >*
> > > > > > > > > setting
> > > > > > > > > on the broker (if the timeout is not set explicitly on the
> > > client
> > > > > and
> > > > > > > the
> > > > > > > > > two phase commit is set to ‘true’ then the transaction
> never
> > > > > > expires).
> > > > > > > > >
> > > > > > > > > *transaction.timeout.ms <http://transaction.timeout.ms>*
> The
> > > > > > semantics
> > > > > > > > is
> > > > > > > > > not changed, but it can be set to values that exceed
> > > > > > > > > *transaction.max.timeout.ms
> > > > > > > > > <http://transaction.max.timeout.ms>* if
> > > two.phase.commit.enable
> > > > is
> > > > > > set
> > > > > > > > to
> > > > > > > > > ‘true’.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thinking about this more I believe we would also have a
> > > possible
> > > > > race
> > > > > > > > > condition if the broker is unaware that a transaction has
> > been
> > > > > > > prepared.
> > > > > > > > > The application might call prepare and get a positive
> > response,
> > > > but
> > > > > > the
> > > > > > > > > broker might have already aborted the transaction for
> > exceeding
> > > > the
> > > > > > > > > timeout. It is a general rule of 2PC that once a
> transaction
> > > has
> > > > > been
> > > > > > > > > prepared it must be possible for it to be committed or
> > aborted.
> > > > It
> > > > > > > seems
> > > > > > > > in
> > > > > > > > > this case a prepared transaction might already be aborted
> by
> > > the
> > > > > > > broker,
> > > > > > > > so
> > > > > > > > > it would be impossible to commit.
> > > > > > > > >
> > > > > > > > > I hope this is making sense and I am not misunderstanding
> the
> > > > KIP.
> > > > > > > Please
> > > > > > > > > let me know if I am.
> > > > > > > > >
> > > > > > > > > - Rowland
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Jan 4, 2024 at 12:56 PM Justine Olshan
> > > > > > > > > <jols...@confluent.io.invalid>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hey Rowland,
> > > > > > > > > >
> > > > > > > > > > Not sure why this message showed up in a different thread
> > > from
> > > > > the
> > > > > > > > other
> > > > > > > > > > KIP-939 discussion (is it just me?)
> > > > > > > > > >
> > > > > > > > > > In KIP-939, we do away with having any transactional
> > timeout
> > > on
> > > > > the
> > > > > > > > Kafka
> > > > > > > > > > side. The external coordinator is fully responsible for
> > > > > controlling
> > > > > > > > > whether
> > > > > > > > > > the transaction completes.
> > > > > > > > > >
> > > > > > > > > > While I think there is some use in having a prepare
> stage,
> > I
> > > > just
> > > > > > > > wanted
> > > > > > > > > to
> > > > > > > > > > clarify what the current KIP is proposing.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Justine
> > > > > > > > > >
> > > > > > > > > > On Wed, Jan 3, 2024 at 7:49 PM Rowland Smith <
> > > > rowl...@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Artem,
> > > > > > > > > > >
> > > > > > > > > > > I saw your response in the thread I started discussing
> > > Kafka
> > > > > > > > > distributed
> > > > > > > > > > > transaction support and the XA interface. I would like
> to
> > > > work
> > > > > > with
> > > > > > > > you
> > > > > > > > > > to
> > > > > > > > > > > add XA support to Kafka on top of the excellent
> > > foundational
> > > > > work
> > > > > > > > that
> > > > > > > > > > you
> > > > > > > > > > > have started with KIP-939. I agree that explicit XA
> > support
> > > > > > should
> > > > > > > > not
> > > > > > > > > be
> > > > > > > > > > > included in the Kafka codebase as long as the right set
> > of
> > > > > basic
> > > > > > > > > > operations
> > > > > > > > > > > are provided. I will begin pulling together a KIP to
> > follow
> > > > > > > KIP-939.
> > > > > > > > > > >
> > > > > > > > > > > I did have one comment on KIP-939 itself. I see that
> you
> > > > > > considered
> > > > > > > > an
> > > > > > > > > > > explicit "prepare" RPC, but decided not to add it. If I
> > > > > > understand
> > > > > > > > your
> > > > > > > > > > > design correctly, that would mean that a 2PC
> transaction
> > > > would
> > > > > > > have a
> > > > > > > > > > > single timeout that would need to be long enough to
> > ensure
> > > > that
> > > > > > > > > prepared
> > > > > > > > > > > transactions are not aborted when an external
> coordinator
> > > > > fails.
> > > > > > > > > However,
> > > > > > > > > > > this also means that an unprepared transaction would
> not
> > be
> > > > > > aborted
> > > > > > > > > > without
> > > > > > > > > > > waiting for the same timeout. Since long running
> > > transactions
> > > > > > block
> > > > > > > > > > > transactional consumers, having a long timeout for all
> > > > > > transactions
> > > > > > > > > could
> > > > > > > > > > > be disruptive. An explicit "prepare " RPC would allow
> the
> > > > > server
> > > > > > to
> > > > > > > > > abort
> > > > > > > > > > > unprepared transactions after a relatively short
> timeout,
> > > and
> > > > > > > apply a
> > > > > > > > > > much
> > > > > > > > > > > longer timeout only to prepared transactions. The
> > explicit
> > > > > > > "prepare"
> > > > > > > > > RPC
> > > > > > > > > > > would make Kafka server more resilient to client
> failure
> > at
> > > > the
> > > > > > > cost
> > > > > > > > of
> > > > > > > > > > an
> > > > > > > > > > > extra synchronous RPC call. I think its worth
> > reconsidering
> > > > > this.
> > > > > > > > > > >
> > > > > > > > > > > With an XA implementation this might become a more
> > > > significant
> > > > > > > issue
> > > > > > > > > > since
> > > > > > > > > > > the transaction coordinator has no memory of unprepared
> > > > > > > transactions
> > > > > > > > > > across
> > > > > > > > > > > restarts. Such transactions would need to be cleared by
> > > hand
> > > > > > > through
> > > > > > > > > the
> > > > > > > > > > > admin client even when the transaction coordinator
> > restarts
> > > > > > > > > successfully.
> > > > > > > > > > >
> > > > > > > > > > > - Rowland
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > *Rowland E. Smith*
> > > > > > > P: (862) 260-4163
> > > > > > > M: (201) 396-3842
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > *Rowland E. Smith*
> > > P: (862) 260-4163
> > > M: (201) 396-3842
> > >
> >
>
>
> --
> *Rowland E. Smith*
> P: (862) 260-4163
> M: (201) 396-3842
>

Re: [DISCUSS] KIP-939: Support Participation in 2PC

Reply via email to