Hi, Justine,

I don't see this creating any issue. It just makes it a bit hard to explain
what this non-tagged producer id field means. We are essentially trying to
combine two actions (completing a txn and initializing a new producer id)
in a single record. But this may be fine too.
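
To make sure I understand, in the overflow case the single complete record
would carry both outcomes at once (values illustrative, not a concrete
schema):

  ProducerId     = newPid  // non-tagged, as if InitProducerId had run
  ProducerEpoch  = 0
  PrevProducerId = oldPid  // tagged, to recognize retried EndTxn requests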

A few other follow-up comments.

101.3 I guess the reason that we only set the previous producer id tagged
field in the complete marker, but not in the prepare marker, is that in the
prepare state, we always return CONCURRENT_TRANSACTIONS on retried EndTxn
requests?

110. "I believe your second point is mentioned in the KIP. I can add more
text on
this if it is helpful.
> The delayed message case can also violate EOS if the delayed message
comes in after the next addPartitionsToTxn request comes in. Effectively we
may see a message from a previous (aborted) transaction become part of the
next transaction."

The above is the case when a delayed message is appended to the data
partition. What I mentioned is a slightly different case when a delayed
marker is appended to the transaction log partition.
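
Concretely, the sequence I have in mind is something like:

  t1: an EndTxn for txn1 is delayed on the coordinator
  t2: txn1 completes and txn2 starts (same producer id and epoch today)
  t3: the delayed EndTxn is processed and appends an unexpected prepare
      marker for txn2 to the transaction log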

111. The KIP says "Once we move past the Prepare and Complete states, we
don’t need to worry about lastSeen fields and clear them, just handle state
transitions as normal." Is the lastSeen field the same as the previous
producer Id tagged field in TransactionLogValue?

112. Since the KIP changes the inter-broker protocol, should we bump up the
MV/IBP version? Is this feature only for the KRaft mode?

Thanks,

Jun


On Wed, Jan 17, 2024 at 11:13 AM Justine Olshan
<jols...@confluent.io.invalid> wrote:

> Hey Jun,
>
> I'm glad we are getting to convergence on the design. :)
>
> While I understand it seems a little "weird," I'm not sure what the
> benefit of writing an extra record to the log is.
> Is the concern that a tool to describe transactions won't work (i.e., the
> complete state is needed to calculate the time since the transaction
> completed)?
> If we have a reason like this, it is enough to convince me we need such an
> extra record. It seems like it would be replacing the record written on
> InitProducerId. Is this correct?
>
> Thanks,
> Justine
>
> On Tue, Jan 16, 2024 at 5:14 PM Jun Rao <j...@confluent.io.invalid> wrote:
>
> > Hi, Justine,
> >
> > Thanks for the explanation. I understand the intention now. In the
> > overflow case, we set the non-tagged field to the old pid (and the max
> > epoch) in the prepare marker so that we could correctly write the marker
> > to the data partition if the broker downgrades. When writing the
> > complete marker, we know the marker has already been written to the data
> > partition. We set the non-tagged field to the new pid to avoid
> > InvalidPidMappingException in the client if the broker downgrades.
> >
> > The above seems to work. It's just a bit inconsistent for a prepare
> > marker and a complete marker to use different pids in this special case.
> > If we downgrade with the complete marker, it seems that we will never be
> > able to write the complete marker with the old pid. Not sure if it
> > causes any issue, but it seems a bit weird. Instead of writing the
> > complete marker with the new pid, could we write two records: a complete
> > marker with the old pid followed by a TransactionLogValue with the new
> > pid and an empty state? We could make the two records in the same batch
> > so that they will be added to the log atomically.
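> >
> > Roughly, what I am suggesting (illustrative, not a concrete schema):
> >
> >   // one batch, appended atomically to the transaction log
> >   record 1: producerId = oldPid, epoch = max, state = CompleteCommit
> >   record 2: producerId = newPid, epoch = 0,   state = Empty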
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Fri, Jan 12, 2024 at 5:40 PM Justine Olshan
> > <jols...@confluent.io.invalid>
> > wrote:
> >
> > > (1) the prepare marker is written, but the endTxn response is not
> > > received by the client when the server downgrades
> > > (2) the prepare marker is written, the endTxn response is received by
> > > the client when the server downgrades.
> > >
> > > I think I am still a little confused. In both of these cases, the
> > > transaction log has the old producer ID. We don't write the new
> > > producer ID in the prepare marker's non-tagged fields.
> > > If the server downgrades now, it would read only the non-tagged
> > > fields, and the complete marker will also have the old producer ID.
> > > (If we had used the new producer ID, we would not have transactional
> > > correctness since the producer id doesn't match the transaction and
> > > the state would not be correct on the data partition.)
> > >
> > > In the overflow case, I'd expect the following to happen on the
> > > client side:
> > > Case 1 -- we retry EndTxn -- it is the same producer ID and epoch - 1;
> > > this would fence the producer.
> > > Case 2 -- we don't retry EndTxn and use the new producer id, which
> > > would result in InvalidPidMappingException.
> > >
> > > Maybe we can have special handling for when a server downgrades. When
> > > it reconnects, we could get an ApiVersions response showing KIP-890
> > > part 2 is not supported. In that case, we can call InitProducerId to
> > > abort the transaction. (In the overflow case, this correctly gives us
> > > a new producer ID.)
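> > >
> > > Roughly, in pseudocode (names illustrative, not actual client code):
> > >
> > >   // on reconnect, if the broker no longer advertises KIP-890 part 2
> > >   if (!supportsKip890Part2 && hasInFlightTransaction) {
> > >       initProducerId();  // aborts the pending txn; in the overflow
> > >   }                      // case this also gives us a new producer ID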
> > >
> > > I guess the corresponding case would be where the *complete marker*
> > > is written but the endTxn is not received by the client and the
> > > server downgrades? This would result in the transaction coordinator
> > > having the new ID and not the old one. If the client retries, it will
> > > receive an InvalidPidMappingException. The InitProducerId scenario
> > > above would help here too.
> > >
> > > To be clear, my compatibility story is meant to support server-side
> > > downgrades while keeping transactional correctness. Keeping the client
> > > from fencing itself is not the priority.
> > >
> > > Hope this helps. I can also add text in the KIP about InitProducerId
> > > if we think that fixes some edge cases.
> > >
> > > Justine
> > >
> > > On Fri, Jan 12, 2024 at 4:10 PM Jun Rao <j...@confluent.io.invalid>
> > > wrote:
> > >
> > > > Hi, Justine,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > I agree that we don't need to optimize for fencing during
> > > > downgrades. Regarding consistency, there are two possible cases: (1)
> > > > the prepare marker is written, but the endTxn response is not
> > > > received by the client when the server downgrades; (2) the prepare
> > > > marker is written, the endTxn response is received by the client
> > > > when the server downgrades. In (1), the client will have the old
> > > > producer Id and in (2), the client will have the new producer Id. If
> > > > we downgrade right after the prepare marker, we can't be consistent
> > > > with both (1) and (2) since we can only put one value in the existing
> > > > producer Id field. It's also not clear which case is more likely. So
> > > > we could probably be consistent with either case. By putting the new
> > > > producer Id in the prepare marker, we are consistent with case (2)
> > > > and it also has the slight benefit that the producer Id field in the
> > > > prepare and complete markers is consistent in the overflow case.
> > > >
> > > > Jun
> > > >
> > > > On Fri, Jan 12, 2024 at 3:11 PM Justine Olshan
> > > > <jols...@confluent.io.invalid>
> > > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > In the case you describe, we would need to have a delayed request,
> > > > > send a successful EndTxn, and a successful AddPartitionsToTxn and
> > > > > then have the delayed EndTxn request go through for a given
> > > > > producer. I'm trying to figure out if it is possible for the client
> > > > > to transition if a previous request is delayed somewhere. But yes,
> > > > > in this case I think we would fence the client.
> > > > >
> > > > > Not for the overflow case. In the overflow case, the producer ID
> > > > > and the epoch are different on the marker and on the new
> > > > > transaction. So we want the marker to use the max epoch, but the
> > > > > new transaction should start with the new ID and epoch 0 in the
> > > > > transactional state.
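> > > > >
> > > > > Roughly (values illustrative):
> > > > >
> > > > >   marker (data partitions):  producerId = oldPid, epoch = max
> > > > >   new transactional state:   producerId = newPid, epoch = 0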
> > > > >
> > > > > In the server downgrade case, we want to see the producer ID as
> > > > > that is what the client will have. If we complete the commit, and
> > > > > the transaction state is reloaded, we need the new producer ID in
> > > > > the state so there isn't an invalid producer ID mapping.
> > > > > The server downgrade cases are about transactional correctness and
> > > > > not regressing from previous behavior -- they are not concerned
> > > > > with preserving the safety from fencing retries (since we have
> > > > > downgraded, we don't need to support that). Perhaps this is a
> > > > > trade-off, but I think it is the right one.
> > > > >
> > > > > (If the client downgrades, it will have restarted and it is ok for
> > > > > it to have a new producer ID too).
> > > > >
> > > > > Justine
> > > > >
> > > > > On Fri, Jan 12, 2024 at 11:42 AM Jun Rao <j...@confluent.io.invalid>
> > > > > wrote:
> > > > >
> > > > > > Hi, Justine,
> > > > > >
> > > > > > Thanks for the reply.
> > > > > >
> > > > > > 101.4 "If the marker is written by the new client, we can as I
> > > > mentioned
> > > > > in
> > > > > > the last email guarantee that any EndTxn requests with the same
> > epoch
> > > > are
> > > > > > from the same producer and the same transaction. Then we don't
> have
> > > to
> > > > > > return a fenced error but can handle gracefully as described in
> the
> > > > KIP."
> > > > > > When a delayed EndTnx request is processed, the txn state could
> be
> > > > > ongoing
> > > > > > for the next txn. I guess in this case we still return the fenced
> > > error
> > > > > for
> > > > > > the delayed request?
> > > > > >
> > > > > > 102. Sorry, my question was inaccurate. What you described is
> > > > > > accurate. "The downgrade compatibility I mention is that we keep
> > > > > > the same producer ID and epoch in the main (non-tagged) fields as
> > > > > > we did before the code on the server side." If we want to do
> > > > > > this, it seems that we should use the current producer Id and max
> > > > > > epoch in the existing producerId and producerEpoch fields for
> > > > > > both the prepare and the complete marker, right? The downgrade
> > > > > > can happen after the complete marker is written. With what you
> > > > > > described, the downgraded coordinator will see the new producer
> > > > > > Id instead of the old one.
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Fri, Jan 12, 2024 at 10:44 AM Justine Olshan
> > > > > > <jols...@confluent.io.invalid> wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > > > > >
> > > > > > > I can update the description.
> > > > > > >
> > > > > > > I believe your second point is mentioned in the KIP. I can add
> > > > > > > more text on this if it is helpful.
> > > > > > > > The delayed message case can also violate EOS if the delayed
> > > > > > > > message comes in after the next addPartitionsToTxn request
> > > > > > > > comes in. Effectively we may see a message from a previous
> > > > > > > > (aborted) transaction become part of the next transaction.
> > > > > > >
> > > > > > > If the marker is written by the new client, we can, as I
> > > > > > > mentioned in the last email, guarantee that any EndTxn requests
> > > > > > > with the same epoch are from the same producer and the same
> > > > > > > transaction. Then we don't have to return a fenced error but
> > > > > > > can handle gracefully as described in the KIP.
> > > > > > > I don't think a boolean is useful since it is directly encoded
> > > > > > > by the presence or absence of the tagged field.
> > > > > > > In the prepare marker we will have the same producer ID in the
> > > > > > > non-tagged field. In the Complete state we may not.
> > > > > > > I'm not sure why the ongoing state matters for this KIP. It
> > > > > > > does matter for KIP-939.
> > > > > > >
> > > > > > > I'm not sure what you are referring to about writing the
> > > > > > > previous producer ID in the prepare marker. This is not in the
> > > > > > > KIP. In the overflow case, we write the nextProducerId in the
> > > > > > > prepare state. This is so we know what we assigned when we
> > > > > > > reload the transaction log. Once we complete, we transition
> > > > > > > this ID to the main (non-tagged) field and have the previous
> > > > > > > producer ID field filled in. This is so we can identify in a
> > > > > > > retry case that the operation completed successfully and we
> > > > > > > don't fence our producer. The downgrade compatibility I mention
> > > > > > > is that we keep the same producer ID and epoch in the main
> > > > > > > (non-tagged) fields as we did before the new server-side code.
> > > > > > > If the server downgrades, we are still compatible. This
> > > > > > > addresses both the prepare and complete state downgrades.
> > > > > > >
> > > > > > > Justine
> > > > > > >
> > > > > > > On Fri, Jan 12, 2024 at 10:21 AM Jun Rao <j...@confluent.io.invalid>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi, Justine,
> > > > > > > >
> > > > > > > > Thanks for the reply. Sorry for the delay. I have a few more
> > > > > comments.
> > > > > > > >
> > > > > > > > 110. I think the motivation section could be improved. One of
> > > > > > > > the motivations listed by the KIP is "This can happen when a
> > > > > > > > message gets stuck or delayed due to networking issues or a
> > > > > > > > network partition, the transaction aborts, and then the
> > > > > > > > delayed message finally comes in." This seems not very
> > > > > > > > accurate. Without KIP-890, currently, if the coordinator times
> > > > > > > > out and aborts an ongoing transaction, it already bumps up the
> > > > > > > > epoch in the marker, which prevents the delayed produce
> > > > > > > > message from being added to the user partition. What can cause
> > > > > > > > a hanging transaction is that the producer completes (either
> > > > > > > > aborts or commits) a transaction before receiving a successful
> > > > > > > > ack on messages published in the same txn. In this case, it's
> > > > > > > > possible for the delayed message to be appended to the
> > > > > > > > partition after the marker, causing a transaction to hang.
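> > > > > > > >
> > > > > > > > For example:
> > > > > > > >
> > > > > > > >   t1: the producer sends message m to partition p; the
> > > > > > > >       request is delayed
> > > > > > > >   t2: the producer commits (or aborts) the txn; the marker is
> > > > > > > >       written to p
> > > > > > > >   t3: m finally arrives and is appended after the marker,
> > > > > > > >       leaving an open txn on p that never completes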
> > > > > > > >
> > > > > > > > A similar issue (not mentioned in the motivation) could
> > > > > > > > happen on the marker in the coordinator's log. For example,
> > > > > > > > it's possible for an EndTxnRequest to be delayed on the
> > > > > > > > coordinator. By the time the delayed EndTxnRequest is
> > > > > > > > processed, it's possible that the previous txn has already
> > > > > > > > completed and a new txn has started. Currently, since the
> > > > > > > > epoch is not bumped on every txn, the delayed EndTxnRequest
> > > > > > > > will add an unexpected prepare marker (and eventually a
> > > > > > > > complete marker) to the ongoing txn. This won't cause the
> > > > > > > > transaction to hang, but it will break the EoS semantic. The
> > > > > > > > proposal in this KIP will address this issue too.
> > > > > > > >
> > > > > > > > 101. "However, I was writing it so that we can distinguish
> > > between
> > > > > > > > old clients where we don't have the ability do this operation
> > and
> > > > new
> > > > > > > > clients that can. (Old clients don't bump the epoch on
> commit,
> > so
> > > > we
> > > > > > > can't
> > > > > > > > say for sure the write belongs to the given transaction)."
> > > > > > > > 101.1 I am wondering why we need to distinguish whether the
> > > > > > > > marker is written by the old or the new client. Could you
> > > > > > > > describe what we do differently if we know the marker is
> > > > > > > > written by the new client?
> > > > > > > > 101.2 If we do need a way to distinguish whether the marker
> > > > > > > > is written by the old or the new client, would it be simpler
> > > > > > > > to just introduce a boolean field instead of indirectly using
> > > > > > > > the previous producer ID field?
> > > > > > > > 101.3 It's not clear to me why we only add the previous
> > > > > > > > producer ID field in the complete marker, but not in the
> > > > > > > > prepare marker. If we want to know whether a marker is
> > > > > > > > written by the new client or not, it seems that we want to do
> > > > > > > > this consistently for all markers.
> > > > > > > > 101.4 What about the TransactionLogValue record representing
> > > > > > > > the ongoing state? Should we also distinguish whether it's
> > > > > > > > written by the old or the new client?
> > > > > > > >
> > > > > > > > 102. In the overflow case, it's still not clear to me why we
> > > > > > > > write the previous producer Id in the prepare marker while
> > > > > > > > writing the next producer Id in the complete marker. You
> > > > > > > > mentioned that it's for downgrading. However, we could
> > > > > > > > downgrade with either the prepare marker or the complete
> > > > > > > > marker. In either case, the downgraded coordinator should see
> > > > > > > > the same producer Id (probably the previous producer Id),
> > > > > > > > right?
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > > On Wed, Dec 20, 2023 at 6:00 PM Justine Olshan
> > > > > > > > <jols...@confluent.io.invalid>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hey Jun,
> > > > > > > > >
> > > > > > > > > Thanks for taking a look at the KIP again.
> > > > > > > > >
> > > > > > > > > 100. For the epoch overflow case, only the marker will have
> > > > > > > > > max epoch. This keeps the behavior of the rest of the
> > > > > > > > > markers, where the last marker is the epoch of the
> > > > > > > > > transaction records + 1.
> > > > > > > > >
> > > > > > > > > 101. You are correct that we don't need to write the
> > > > > > > > > producer ID since it is the same. However, I was writing it
> > > > > > > > > so that we can distinguish between old clients where we
> > > > > > > > > don't have the ability to do this operation and new clients
> > > > > > > > > that can. (Old clients don't bump the epoch on commit, so we
> > > > > > > > > can't say for sure the write belongs to the given
> > > > > > > > > transaction). If we receive an EndTxn request from a new
> > > > > > > > > client, we will fill this field. We can guarantee that any
> > > > > > > > > EndTxn requests with the same epoch are from the same
> > > > > > > > > producer and the same transaction.
> > > > > > > > >
> > > > > > > > > 102. In prepare phase, we have the same producer ID and
> > > > > > > > > epoch we always had. It is the producer ID and epoch that
> > > > > > > > > are on the marker. In commit phase, we stay the same unless
> > > > > > > > > it is the overflow case. In that case, we set the producer
> > > > > > > > > ID to the new one we generated and epoch to 0 after
> > > > > > > > > complete. This is for downgrade compatibility. The tagged
> > > > > > > > > fields are just safety guards for retries and failovers.
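> > > > > > > > >
> > > > > > > > > For example, in the overflow case (illustrative; old pid P,
> > > > > > > > > new pid P', tagged fields in parentheses):
> > > > > > > > >
> > > > > > > > >   Prepare:  ProducerId = P,  Epoch = max (NextProducerId = P')
> > > > > > > > >   Complete: ProducerId = P', Epoch = 0   (PrevProducerId = P)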
> > > > > > > > >
> > > > > > > > > In prepare phase, for the epoch overflow case only, we
> > > > > > > > > store the next producer ID. This is for the case where we
> > > > > > > > > reload the transaction coordinator in prepare state. Once
> > > > > > > > > the transaction is committed, we can use the producer ID the
> > > > > > > > > client is already using.
> > > > > > > > >
> > > > > > > > > In commit phase, we store the previous producer ID in case
> > > > > > > > > of retries.
> > > > > > > > >
> > > > > > > > > I think it is easier to think of it as just how we were
> > > > > > > > > storing producer ID and epoch before, with some extra
> > > > > > > > > bookkeeping and edge-case handling in the tagged fields. We
> > > > > > > > > have to do it this way for compatibility with downgrades.
> > > > > > > > >
> > > > > > > > > 103. Next producer ID is for prepare status and previous
> > > > > > > > > producer ID is for after complete. The reason why we need
> > > > > > > > > two separate (tagged) fields is for backwards compatibility.
> > > > > > > > > We need to keep the same semantics for the non-tagged field
> > > > > > > > > in case we downgrade.
> > > > > > > > >
> > > > > > > > > 104. We set the fields as we do in the transactional state
> > > > > > > > > (as we need to do this for compatibility -- if we
> > > > > > > > > downgrade, we will only have the non-tagged fields). It will
> > > > > > > > > be the old producer ID and max epoch.
> > > > > > > > >
> > > > > > > > > Hope this helps. Let me know if you have further questions.
> > > > > > > > >
> > > > > > > > > Justine
> > > > > > > > >
> > > > > > > > > On Wed, Dec 20, 2023 at 3:33 PM Jun Rao <j...@confluent.io.invalid>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi, Justine,
> > > > > > > > > >
> > > > > > > > > > It seems that you have made some changes to KIP-890 since
> > > > > > > > > > the vote. In particular, we are changing the format of
> > > > > > > > > > TransactionLogValue. A few comments related to that.
> > > > > > > > > >
> > > > > > > > > > 100. Just to be clear, the overflow case (i.e. when a new
> > > > > > > > > > producerId is generated) is when the current epoch equals
> > > > > > > > > > max - 1 and not max?
> > > > > > > > > >
> > > > > > > > > > 101. For the "not epoch overflow" case, we write the
> > previous
> > > > ID
> > > > > in
> > > > > > > the
> > > > > > > > > > tagged field in the complete phase. Do we need to do that
> > > since
> > > > > > > produce
> > > > > > > > > id
> > > > > > > > > > doesn't change in this case?
> > > > > > > > > >
> > > > > > > > > > 102. It seems that the meaning of the
> > > > > > > > > > ProducerId/ProducerEpoch fields in TransactionLogValue
> > > > > > > > > > changes depending on the TransactionStatus. When the
> > > > > > > > > > TransactionStatus is ongoing, they represent the current
> > > > > > > > > > ProducerId and the current ProducerEpoch. When the
> > > > > > > > > > TransactionStatus is PrepareCommit/PrepareAbort, they
> > > > > > > > > > represent the current ProducerId and the next
> > > > > > > > > > ProducerEpoch. When the TransactionStatus is Commit/Abort,
> > > > > > > > > > they further depend on whether the epoch overflows or not.
> > > > > > > > > > If there is no overflow, they represent the current
> > > > > > > > > > ProducerId and the next ProducerEpoch (max). Otherwise,
> > > > > > > > > > they represent the newly generated ProducerId and a
> > > > > > > > > > ProducerEpoch of 0. Is that right? This seems not easy to
> > > > > > > > > > understand. Could we provide some examples like what Artem
> > > > > > > > > > has done in KIP-939? Have we considered a simpler design
> > > > > > > > > > where ProducerId/ProducerEpoch always represent the same
> > > > > > > > > > value (e.g. for the current transaction) independent of
> > > > > > > > > > the TransactionStatus and epoch overflow?
> > > > > > > > > >
> > > > > > > > > > 103. It's not clear to me why we need 3 fields:
> > > > > > > > > > ProducerId, PrevProducerId, NextProducerId. Could we just
> > > > > > > > > > have ProducerId and NextProducerId?
> > > > > > > > > >
> > > > > > > > > > 104. For WriteTxnMarkerRequests, if the producer epoch
> > > > > > > > > > overflows, what do we set the producerId and the
> > > > > > > > > > producerEpoch to?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Jun
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
