Hey there -- small update to the KIP. The KIP previously mentioned introducing ABORTABLE_ERROR and bumping the TxnOffsetCommit and Produce requests. I've changed the name in the KIP to ABORTABLE_TRANSACTION and the corresponding exception to AbortableTransactionException to match the pattern we had for other errors. I also mentioned bumping all 6 transactional APIs so we can future-proof/support the error on the client going forward. If a future change wants to have an error scenario that requires us to abort the transaction, we can rely on the 3.8+ clients to support it. We ran into issues finding good/generic error codes that older clients could support while working on this KIP, so this should help in the future.
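To make the forward-compatibility point concrete, here is a minimal sketch (in Python, purely illustrative -- the error names other than ABORTABLE_TRANSACTION and the action strings are assumptions for this sketch, not real client code) of how a 3.8+ client could map server errors to actions, with ABORTABLE_TRANSACTION always resolving to an abort:

```python
# Illustrative model only: how a 3.8+ client could classify server
# errors. ABORTABLE_TRANSACTION is the error from this KIP; the other
# names and the action strings are assumptions for this sketch.

ABORTABLE_TRANSACTION = "ABORTABLE_TRANSACTION"  # new in this KIP
RETRIABLE = {"COORDINATOR_LOAD_IN_PROGRESS", "NOT_COORDINATOR"}

def handle_txn_error(error):
    """Return the action the client takes for a given error code."""
    if error == ABORTABLE_TRANSACTION:
        # Any future server-side scenario mapped to this error tells
        # the client the only safe recovery is to abort the transaction
        # and start a new one.
        return "abort_and_retry"
    if error in RETRIABLE:
        return "retry"
    return "fail"  # unknown/fatal errors surface to the application
```

The point of the generic error is exactly this last-resort branch: older clients had no code they could safely map new abort scenarios onto, while 3.8+ clients can always take the abort path.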
The features discussion is still ongoing in KIP-1022. Will update again here when that concludes. Justine On Tue, Feb 6, 2024 at 8:39 AM Justine Olshan <jols...@confluent.io> wrote: > I don't think AddPartitions is a good example since we currently don't gate > the version on TV or MV. (We only set a different flag depending on the TV) > > Even if we did want to gate it on TV, I think the idea is to move away > from MV gating inter-broker protocols. Ideally we can get to a state where > MV is just used for metadata changes. > > I think some of this discussion might fit more with the feature version > KIP, so I can try to open that up soon. Until we settle that, some of the > work in KIP-890 is blocked. > > Justine > > On Mon, Feb 5, 2024 at 5:38 PM Jun Rao <j...@confluent.io.invalid> wrote: > >> Hi, Justine, >> >> Thanks for the reply. >> >> Since AddPartitions is an inter-broker request, will its version be gated >> only by TV or other features like MV too? For example, if we need to >> change >> the protocol for AddPartitions for reasons other than txn verification in >> the future, will the new version be gated by a new MV? If so, does >> downgrading a TV imply a potential downgrade of MV too? >> >> Jun >> >> >> >> On Mon, Feb 5, 2024 at 5:07 PM Justine Olshan >> <jols...@confluent.io.invalid> >> wrote: >> >> > One TV gates the flexible feature version (no RPCs involved, only the >> > transactional records that should only be gated by TV) >> > Another TV gates the ability to turn on KIP-890 part 2. This would gate >> the >> > version of Produce and EndTxn (likely only used by transactions), and >> > specifies a flag in AddPartitionsToTxn though the version is already >> used >> > without TV. >> > >> > I think the only concern is the Produce request and we could consider >> > workarounds similar to the AddPartitionsToTxn call. 
>> > >> > Justine >> > >> > On Mon, Feb 5, 2024 at 4:56 PM Jun Rao <j...@confluent.io.invalid> >> wrote: >> > >> > > Hi, Justine, >> > > >> > > Which RPC/record protocols will TV guard? Going forward, will those >> > > RPC/record protocols only be guarded by TV and not by other features >> like >> > > MV? >> > > >> > > Thanks, >> > > >> > > Jun >> > > >> > > On Mon, Feb 5, 2024 at 2:41 PM Justine Olshan >> > <jols...@confluent.io.invalid >> > > > >> > > wrote: >> > > >> > > > Hi Jun, >> > > > >> > > > Sorry, I think I misunderstood your question or answered incorrectly. >> > The >> > > TV >> > > > version should ideally be fully independent from MV. >> > > > At least for the changes I proposed, TV should not affect MV and MV >> > > should >> > > > not affect TV. >> > > > >> > > > I think if we downgrade TV, only that feature should downgrade. >> > Likewise >> > > > the same with MV. The finalizedFeatures should just reflect the >> feature >> > > > downgrade we made. >> > > > >> > > > I also plan to write a new KIP for managing the disk format and >> upgrade >> > > > tool, as we will need new flags to support these features. That >> should >> > > help >> > > > clarify some things. >> > > > >> > > > Justine >> > > > >> > > > On Mon, Feb 5, 2024 at 11:03 AM Jun Rao <j...@confluent.io.invalid> >> > > wrote: >> > > > >> > > > > Hi, Justine, >> > > > > >> > > > > Thanks for the reply. >> > > > > >> > > > > So, if we downgrade TV, we could implicitly downgrade another >> feature >> > > > (say >> > > > > MV) that has a dependency (e.g. RPC). What would we return for >> > > > > FinalizedFeatures for MV in ApiVersionsResponse in that case? 
>> > > > > >> > > > > Thanks, >> > > > > >> > > > > Jun >> > > > > >> > > > > On Fri, Feb 2, 2024 at 1:06 PM Justine Olshan >> > > > <jols...@confluent.io.invalid >> > > > > > >> > > > > wrote: >> > > > > >> > > > > > Hey Jun, >> > > > > > >> > > > > > Yes, the idea is that if we downgrade TV (transaction version) >> we >> > > will >> > > > > stop >> > > > > > using the add partitions to txn optimization and stop writing >> the >> > > > > flexible >> > > > > > feature version of the log. >> > > > > > In the compatibility section I included some explanations on how >> > this >> > > > is >> > > > > > done. >> > > > > > >> > > > > > Thanks, >> > > > > > Justine >> > > > > > >> > > > > > On Fri, Feb 2, 2024 at 11:12 AM Jun Rao >> <j...@confluent.io.invalid> >> > > > > wrote: >> > > > > > >> > > > > > > Hi, Justine, >> > > > > > > >> > > > > > > Thanks for the update. >> > > > > > > >> > > > > > > If we ever downgrade the transaction feature, any feature >> > depending >> > > > on >> > > > > > > changes on top of those RPC/record >> > > > > > > (AddPartitionsToTxnRequest/TransactionLogValue) changes made >> in >> > > > KIP-890 >> > > > > > > will be automatically downgraded too? >> > > > > > > >> > > > > > > Jun >> > > > > > > >> > > > > > > On Tue, Jan 30, 2024 at 3:32 PM Justine Olshan >> > > > > > > <jols...@confluent.io.invalid> >> > > > > > > wrote: >> > > > > > > >> > > > > > > > Hey Jun, >> > > > > > > > >> > > > > > > > I wanted to get back to you about your questions about >> MV/IBP. >> > > > > > > > >> > > > > > > > Looking at the options, I think it makes the most sense to >> > > create a >> > > > > > > > separate feature for transactions and use that to version >> gate >> > > the >> > > > > > > features >> > > > > > > > we need to version gate (flexible transactional state >> records >> > and >> > > > > using >> > > > > > > the >> > > > > > > > new protocol) >> > > > > > > > I've updated the KIP to include this change. 
Hopefully >> that's >> > > > > > everything >> > > > > > > we >> > > > > > > > need for this KIP :) >> > > > > > > > >> > > > > > > > Justine >> > > > > > > > >> > > > > > > > >> > > > > > > > On Mon, Jan 22, 2024 at 3:17 PM Justine Olshan < >> > > > jols...@confluent.io >> > > > > > >> > > > > > > > wrote: >> > > > > > > > >> > > > > > > > > Thanks Jun, >> > > > > > > > > >> > > > > > > > > I will update the KIP with the prev field for prepare as >> > well. >> > > > > > > > > >> > > > > > > > > PREPARE >> > > > > > > > > producerId: x >> > > > > > > > > previous/lastProducerId (tagged field): x >> > > > > > > > > nextProducerId (tagged field): empty or z if y will >> overflow >> > > > > > > > > producerEpoch: y + 1 >> > > > > > > > > >> > > > > > > > > COMPLETE >> > > > > > > > > producerId: x or z if y overflowed >> > > > > > > > > previous/lastProducerId (tagged field): x >> > > > > > > > > nextProducerId (tagged field): empty >> > > > > > > > > producerEpoch: y + 1 or 0 if we overflowed >> > > > > > > > > >> > > > > > > > > Thanks again, >> > > > > > > > > Justine >> > > > > > > > > >> > > > > > > > > On Mon, Jan 22, 2024 at 3:15 PM Jun Rao >> > > <j...@confluent.io.invalid >> > > > > >> > > > > > > > wrote: >> > > > > > > > > >> > > > > > > > >> Hi, Justine, >> > > > > > > > >> >> > > > > > > > >> 101.3 Thanks for the explanation. >> > > > > > > > >> (1) My point was that the coordinator could fail right >> after >> > > > > writing >> > > > > > > the >> > > > > > > > >> prepare marker. When the new txn coordinator generates >> the >> > > > > complete >> > > > > > > > marker >> > > > > > > > >> after the failover, it needs some field from the prepare >> > > marker >> > > > to >> > > > > > > > >> determine whether it's written by the new client. >> > > > > > > > >> >> > > > > > > > >> (2) The changing of the behavior sounds good to me. 
We >> only >> > > want >> > > > > to >> > > > > > > > return >> > > > > > > > >> success if the prepare state is written by the new >> client. >> So, >> > > > in >> > > > > > the >> > > > > > > > >> non-overflow case, it seems that we also need something in the >> > > prepare >> > > > > > > marker >> > > > > > > > to >> > > > > > > > >> tell us whether it's written by the new client. >> > > > > > > > >> >> > > > > > > > >> 112. Thanks for the explanation. That sounds good to me. >> > > > > > > > >> >> > > > > > > > >> Jun >> > > > > > > > >> >> > > > > > > > >> On Mon, Jan 22, 2024 at 11:32 AM Justine Olshan >> > > > > > > > >> <jols...@confluent.io.invalid> wrote: >> > > > > > > > >> >> > > > > > > > >> > 101.3 I realized that I actually have two questions. >> > > > > > > > >> > > (1) In the non-overflow case, we need to write the >> > > previous >> > > > > > > producer >> > > > > > > > ID >> > > > > > > > >> > tagged field in the end marker so that we know if the >> > marker >> > > is >> > > > > > from >> > > > > > > > the >> > > > > > > > >> new >> > > > > > > > >> > client. Since the end marker is derived from the prepare >> > > > marker, >> > > > > > > should >> > > > > > > > >> we >> > > > > > > > >> > write the previous producer ID in the prepare marker >> field >> > > too? >> > > > > > > > >> Otherwise, >> > > > > > > > >> > we will lose this information when deriving the end >> > marker. >> > > > > > > > >> > >> > > > > > > > >> > The "previous" producer ID is in the normal producer ID >> > > field. >> > > > > So >> > > > > > > yes, >> > > > > > > > >> we >> > > > > > > > >> > need it in prepare and that was always the plan. >> > > > > > > > >> > >> > > > > > > > >> > Maybe it is a bit unclear, so I will enumerate the >> fields >> > and >> > > > add >> > > > > > > them >> > > > > > > > to >> > > > > > > > >> > the KIP if that helps. >> > > > > > > > >> > Say we have producer ID x and epoch y. 
When we overflow >> > > epoch >> > > > y >> > > > > we >> > > > > > > get >> > > > > > > > >> > producer ID Z. >> > > > > > > > >> > >> > > > > > > > >> > PREPARE >> > > > > > > > >> > producerId: x >> > > > > > > > >> > previous/lastProducerId (tagged field): empty >> > > > > > > > >> > nextProducerId (tagged field): empty or z if y will >> > overflow >> > > > > > > > >> > producerEpoch: y + 1 >> > > > > > > > >> > >> > > > > > > > >> > COMPLETE >> > > > > > > > >> > producerId: x or z if y overflowed >> > > > > > > > >> > previous/lastProducerId (tagged field): x >> > > > > > > > >> > nextProducerId (tagged field): empty >> > > > > > > > >> > producerEpoch: y + 1 or 0 if we overflowed >> > > > > > > > >> > >> > > > > > > > >> > (2) In the prepare phase, if we retry and see epoch - >> 1 + >> > ID >> > > > in >> > > > > > last >> > > > > > > > >> seen >> > > > > > > > >> > fields and are issuing the same command (ie commit not >> > > abort), >> > > > > we >> > > > > > > > return >> > > > > > > > >> > success. The logic before KIP-890 seems to return >> > > > > > > > >> CONCURRENT_TRANSACTIONS >> > > > > > > > >> > in this case. Are we intentionally making this change? >> > > > > > > > >> > >> > > > > > > > >> > Hmm -- we would fence the producer if the epoch is >> bumped >> > > and >> > > > we >> > > > > > > get a >> > > > > > > > >> > lower epoch. Yes -- we are intentionally adding this to >> > > > prevent >> > > > > > > > fencing. >> > > > > > > > >> > >> > > > > > > > >> > >> > > > > > > > >> > 112. We already merged the code that adds the >> VerifyOnly >> > > field >> > > > > in >> > > > > > > > >> > AddPartitionsToTxnRequest, which is an inter broker >> > request. >> > > > It >> > > > > > > seems >> > > > > > > > >> that >> > > > > > > > >> > we didn't bump up the IBP for that. Do you know why? 
>> > > > > > > > >> > >> > > > > > > > >> > We no longer need IBP for all interbroker requests as >> > > > > ApiVersions >> > > > > > > > should >> > > > > > > > >> > correctly gate versioning. >> > > > > > > > >> > We also handle unsupported version errors correctly if >> we >> > > > > receive >> > > > > > > them >> > > > > > > > >> in >> > > > > > > > >> > edge cases like upgrades/downgrades. >> > > > > > > > >> > >> > > > > > > > >> > Justine >> > > > > > > > >> > >> > > > > > > > >> > On Mon, Jan 22, 2024 at 11:00 AM Jun Rao >> > > > > <j...@confluent.io.invalid >> > > > > > > >> > > > > > > > >> wrote: >> > > > > > > > >> > >> > > > > > > > >> > > Hi, Justine, >> > > > > > > > >> > > >> > > > > > > > >> > > Thanks for the reply. >> > > > > > > > >> > > >> > > > > > > > >> > > 101.3 I realized that I actually have two questions. >> > > > > > > > >> > > (1) In the non-overflow case, we need to write the >> > > previous >> > > > > > > produce >> > > > > > > > Id >> > > > > > > > >> > > tagged field in the end maker so that we know if the >> > > marker >> > > > is >> > > > > > > from >> > > > > > > > >> the >> > > > > > > > >> > new >> > > > > > > > >> > > client. Since the end maker is derived from the >> prepare >> > > > > marker, >> > > > > > > > >> should we >> > > > > > > > >> > > write the previous produce Id in the prepare marker >> > field >> > > > too? >> > > > > > > > >> Otherwise, >> > > > > > > > >> > > we will lose this information when deriving the end >> > > marker. >> > > > > > > > >> > > (2) In the prepare phase, if we retry and see epoch >> - 1 >> > + >> > > ID >> > > > > in >> > > > > > > last >> > > > > > > > >> seen >> > > > > > > > >> > > fields and are issuing the same command (ie commit >> not >> > > > abort), >> > > > > > we >> > > > > > > > >> return >> > > > > > > > >> > > success. The logic before KIP-890 seems to return >> > > > > > > > >> CONCURRENT_TRANSACTIONS >> > > > > > > > >> > > in this case. 
Are we intentionally making this >> change? >> > > > > > > > >> > > >> > > > > > > > >> > > 112. We already merged the code that adds the >> VerifyOnly >> > > > field >> > > > > > in >> > > > > > > > >> > > AddPartitionsToTxnRequest, which is an inter broker >> > > request. >> > > > > It >> > > > > > > > seems >> > > > > > > > >> > that >> > > > > > > > >> > > we didn't bump up the IBP for that. Do you know why? >> > > > > > > > >> > > >> > > > > > > > >> > > Jun >> > > > > > > > >> > > >> > > > > > > > >> > > On Fri, Jan 19, 2024 at 4:50 PM Justine Olshan >> > > > > > > > >> > > <jols...@confluent.io.invalid> >> > > > > > > > >> > > wrote: >> > > > > > > > >> > > >> > > > > > > > >> > > > Hi Jun, >> > > > > > > > >> > > > >> > > > > > > > >> > > > 101.3 I can change "last seen" to "current >> producer id >> > > and >> > > > > > > epoch" >> > > > > > > > if >> > > > > > > > >> > that >> > > > > > > > >> > > > was the part that was confusing >> > > > > > > > >> > > > 110 I can mention this >> > > > > > > > >> > > > 111 I can do that >> > > > > > > > >> > > > 112 We still need it. But I am still finalizing the >> > > > design. >> > > > > I >> > > > > > > will >> > > > > > > > >> > update >> > > > > > > > >> > > > the KIP once I get the information finalized. Sorry >> > for >> > > > the >> > > > > > > > delays. >> > > > > > > > >> > > > >> > > > > > > > >> > > > Justine >> > > > > > > > >> > > > >> > > > > > > > >> > > > On Fri, Jan 19, 2024 at 10:50 AM Jun Rao >> > > > > > > <j...@confluent.io.invalid >> > > > > > > > > >> > > > > > > > >> > > wrote: >> > > > > > > > >> > > > >> > > > > > > > >> > > > > Hi, Justine, >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > Thanks for the reply. 
>> > > > > > > > >> > > > > >> > > > > > > > >> > > > > 101.3 In the non-overflow case, the previous ID >> is >> > the >> > > > > same >> > > > > > as >> > > > > > > > the >> > > > > > > > >> > > > producer >> > > > > > > > >> > > > > ID for the complete marker too, but we set the >> > > previous >> > > > ID >> > > > > > in >> > > > > > > > the >> > > > > > > > >> > > > complete >> > > > > > > > >> > > > > marker. Earlier you mentioned that this is to >> know >> > > that >> > > > > the >> > > > > > > > >> marker is >> > > > > > > > >> > > > > written by the new client so that we could return >> > > > success >> > > > > on >> > > > > > > > >> retried >> > > > > > > > >> > > > > endMarker requests. I was trying to understand >> why >> > > this >> > > > is >> > > > > > not >> > > > > > > > >> needed >> > > > > > > > >> > > for >> > > > > > > > >> > > > > the prepare marker since retry can happen in the >> > > prepare >> > > > > > state >> > > > > > > > >> too. >> > > > > > > > >> > Is >> > > > > > > > >> > > > the >> > > > > > > > >> > > > > reason that in the prepare state, we return >> > > > > > > > >> CONCURRENT_TRANSACTIONS >> > > > > > > > >> > > > instead >> > > > > > > > >> > > > > of success on retried endMarker requests? If so, >> > should >> > > > we >> > > > > > > change >> > > > > > > > >> "If >> > > > > > > > >> > we >> > > > > > > > >> > > > > retry and see epoch - 1 + ID in last seen fields >> and >> > > are >> > > > > > > issuing >> > > > > > > > >> the >> > > > > > > > >> > > same >> > > > > > > > >> > > > > command (ie commit not abort) we can return (with >> > the >> > > > new >> > > > > > > > epoch)" >> > > > > > > > >> > > > > accordingly? >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > 110. Yes, without this KIP, a delayed endMarker >> > request >> > > > > > carries >> > > > > > > > the >> > > > > > > > >> > same >> > > > > > > > >> > > > > epoch and won't be fenced. 
This can commit/abort >> a >> > > > future >> > > > > > > > >> transaction >> > > > > > > > >> > > > > unexpectedly. I am not sure if we have seen this >> in >> > > > > practice >> > > > > > > > >> though. >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > 111. Sounds good. It would be useful to make it >> > clear >> > > > that >> > > > > > we >> > > > > > > > can >> > > > > > > > >> now >> > > > > > > > >> > > > > populate the lastSeen field from the log >> reliably. >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > 112. Yes, I was referring to >> > AddPartitionsToTxnRequest >> > > > > since >> > > > > > > > it's >> > > > > > > > >> > > called >> > > > > > > > >> > > > > across brokers and we are changing its schema. >> Are >> > you >> > > > > > saying >> > > > > > > we >> > > > > > > > >> > don't >> > > > > > > > >> > > > need >> > > > > > > > >> > > > > it any more? I thought that we already >> implemented >> > the >> > > > > > server >> > > > > > > > side >> > > > > > > > >> > > > > verification logic based on >> > AddPartitionsToTxnRequest >> > > > > across >> > > > > > > > >> brokers. >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > Jun >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > On Thu, Jan 18, 2024 at 5:05 PM Justine Olshan >> > > > > > > > >> > > > > <jols...@confluent.io.invalid> >> > > > > > > > >> > > > > wrote: >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > > Hey Jun, >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > > 101.3 We don't set the previous ID in the >> Prepare >> > > > field >> > > > > > > since >> > > > > > > > we >> > > > > > > > >> > > don't >> > > > > > > > >> > > > > need >> > > > > > > > >> > > > > > it. It is the same producer ID as the main >> > producer >> > > ID >> > > > > > > field. >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > > 110 Hmm -- maybe I need to reread your message >> > about >> > > > > > delayed >> > > > > > > > >> > markers. 
>> > > > > > > > >> > > > If >> > > > > > > > >> > > > > we >> > > > > > > > >> > > > > > receive a delayed endTxn marker after the >> > > transaction >> > > > is >> > > > > > > > already >> > > > > > > > >> > > > > complete? >> > > > > > > > >> > > > > > So we will commit the next transaction early >> > without >> > > > the >> > > > > > > fixes >> > > > > > > > >> in >> > > > > > > > >> > > part >> > > > > > > > >> > > > 2? >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > > 111 Yes -- this terminology was used in a >> previous >> > > KIP >> > > > > and >> > > > > > > > was never >> > > > > > > > >> > > > > > implemented in the log -- only in memory >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > > 112 Hmm -- which inter-broker protocol are you >> > > > referring >> > > > > > to? >> > > > > > > I >> > > > > > > > am >> > > > > > > > >> > > > working >> > > > > > > > >> > > > > on >> > > > > > > > >> > > > > > the design for the work to remove the extra add >> > > > > partitions >> > > > > > > > call, >> > > > > > > > >> > and >> > > > > > > > >> > > > > right >> > > > > > > > >> > > > > > now the design bumps MV. I have yet to update >> that >> > > > > section >> > > > > > > as >> > > > > > > > I >> > > > > > > > >> > > > finalize >> > > > > > > > >> > > > > > the design so please stay tuned. Was there >> > anything >> > > > else >> > > > > > you >> > > > > > > > >> > thought >> > > > > > > > >> > > > > needed >> > > > > > > > >> > > > > > an MV bump? >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > > Justine >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > > On Thu, Jan 18, 2024 at 3:07 PM Jun Rao >> > > > > > > > >> <j...@confluent.io.invalid> >> > > > > > > > >> > > > > wrote: >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > > > Hi, Justine, >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > I don't see this creating any issue. 
It just >> makes >> > > it >> > a >> > > > bit >> > > > > > >> hard to >> > > > > explain >> > > > > > > what this non-tagged producer ID field means. >> We >> > > are >> > > > > > >> essentially >> > > > trying >> > > > > to >> > > > > > > combine two actions (completing a txn and initializing a >> new >> > > > > producer >> > > > >> ID) >> > > in a >> > > > > > > single record. But this may be fine too. >> > > > > > > >> > > > > > > A few other follow-up comments. >> > > > > > > >> > > > > > > 101.3 I guess the reason that we only set the >> > > > previous >> > > > > > >> producer ID >> > > > > tagged >> > > > > > > field in the complete marker, but not in the >> > > prepare >> > > > > > marker, >> > > > >> is >> > > that >> > > > in >> > > > > > the >> > > > > > > prepare state, we always return >> > > > > CONCURRENT_TRANSACTIONS >> on >> > retried >> > > > > > endMarker >> > > > > > > requests? >> > > > > > > >> > > > > > > 110. "I believe your second point is mentioned >> in >> > > the >> > > > > > KIP. I >> can >> > > add >> > > > > more >> > > > > > > text on >> > > > > > > this if it is helpful. >> > > > > > > > The delayed message case can also violate EOS >> if >> > > > the >> > > > > > > delayed >> > > > message >> > > > > > > comes in after the next addPartitionsToTxn >> request >> > > > > comes >> > > > > > > in. 
>> > > > > > > > >> > > > > Effectively >> > > > > > > > >> > > > > > we >> > > > > > > > >> > > > > > > may see a message from a previous (aborted) >> > > > > transaction >> > > > > > > > become >> > > > > > > > >> > part >> > > > > > > > >> > > > of >> > > > > > > > >> > > > > > the >> > > > > > > > >> > > > > > > next transaction." >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > The above is the case when a delayed message >> is >> > > > > appended >> > > > > > > to >> > > > > > > > >> the >> > > > > > > > >> > > data >> > > > > > > > >> > > > > > > partition. What I mentioned is a slightly >> > > different >> > > > > case >> > > > > > > > when >> > > > > > > > >> a >> > > > > > > > >> > > > delayed >> > > > > > > > >> > > > > > > marker is appended to the transaction log >> > > partition. >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > 111. The KIP says "Once we move past the >> Prepare >> > > and >> > > > > > > > Complete >> > > > > > > > >> > > states, >> > > > > > > > >> > > > > we >> > > > > > > > >> > > > > > > don’t need to worry about lastSeen fields and >> > > clear >> > > > > > them, >> > > > > > > > just >> > > > > > > > >> > > handle >> > > > > > > > >> > > > > > state >> > > > > > > > >> > > > > > > transitions as normal.". Is the lastSeen >> field >> > the >> > > > > same >> > > > > > as >> > > > > > > > the >> > > > > > > > >> > > > previous >> > > > > > > > >> > > > > > > Produce Id tagged field in >> TransactionLogValue? >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > 112. Since the kip changes the inter-broker >> > > > protocol, >> > > > > > > should >> > > > > > > > >> we >> > > > > > > > >> > > bump >> > > > > > > > >> > > > up >> > > > > > > > >> > > > > > the >> > > > > > > > >> > > > > > > MV/IBP version? Is this feature only for the >> > KRaft >> > > > > mode? 
>> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > Thanks, >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > Jun >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > On Wed, Jan 17, 2024 at 11:13 AM Justine >> Olshan >> > > > > > > > >> > > > > > > <jols...@confluent.io.invalid> wrote: >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > > Hey Jun, >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > I'm glad we are getting to convergence on >> the >> > > > > design. >> > > > > > :) >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > While I understand it seems a little >> "weird". >> > > I'm >> > > > > not >> > > > > > > sure >> > > > > > > > >> what >> > > > > > > > >> > > the >> > > > > > > > >> > > > > > > benefit >> > > > > > > > >> > > > > > > > of writing an extra record to the log. >> > > > > > > > >> > > > > > > > Is the concern a tool to describe >> transactions >> > > > won't >> > > > > > > work >> > > > > > > > >> (ie, >> > > > > > > > >> > > the >> > > > > > > > >> > > > > > > complete >> > > > > > > > >> > > > > > > > state is needed to calculate the time since >> > the >> > > > > > > > transaction >> > > > > > > > >> > > > > completed?) >> > > > > > > > >> > > > > > > > If we have a reason like this, it is >> enough to >> > > > > > convince >> > > > > > > me >> > > > > > > > >> we >> > > > > > > > >> > > need >> > > > > > > > >> > > > > such >> > > > > > > > >> > > > > > > an >> > > > > > > > >> > > > > > > > extra record. It seems like it would be >> > > replacing >> > > > > the >> > > > > > > > record >> > > > > > > > >> > > > written >> > > > > > > > >> > > > > on >> > > > > > > > >> > > > > > > > InitProducerId. Is this correct? 
>> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > Thanks, >> > > > > > > > >> > > > > > > > Justine >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > On Tue, Jan 16, 2024 at 5:14 PM Jun Rao >> > > > > > > > >> > <j...@confluent.io.invalid >> > > > > > > > >> > > > >> > > > > > > > >> > > > > > > wrote: >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > > Hi, Justine, >> > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > > Thanks for the explanation. I understand >> the >> > > > > > intention >> > > > > > > > >> now. >> > > > > > > > >> > In >> > > > > > > > >> > > > the >> > > > > > > > >> > > > > > > > overflow >> > > > > > > > >> > > > > > > > > case, we set the non-tagged field to the >> old >> > > pid >> > > > > > (and >> > > > > > > > the >> > > > > > > > >> max >> > > > > > > > >> > > > > epoch) >> > > > > > > > >> > > > > > in >> > > > > > > > >> > > > > > > > the >> > > > > > > > >> > > > > > > > > prepare marker so that we could correctly >> > > write >> > > > > the >> > > > > > > > >> marker to >> > > > > > > > >> > > the >> > > > > > > > >> > > > > > data >> > > > > > > > >> > > > > > > > > partition if the broker downgrades. When >> > > writing >> > > > > the >> > > > > > > > >> complete >> > > > > > > > >> > > > > marker, >> > > > > > > > >> > > > > > > we >> > > > > > > > >> > > > > > > > > know the marker has already been written >> to >> > > the >> > > > > data >> > > > > > > > >> > partition. >> > > > > > > > >> > > > We >> > > > > > > > >> > > > > > set >> > > > > > > > >> > > > > > > > the >> > > > > > > > >> > > > > > > > > non-tagged field to the new pid to avoid >> > > > > > > > >> > > > InvalidPidMappingException >> > > > > > > > >> > > > > > in >> > > > > > > > >> > > > > > > > the >> > > > > > > > >> > > > > > > > > client if the broker downgrades. >> > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > > The above seems to work. 
It's just a bit >> > > > > > inconsistent >> > > > > > > > for >> > > > > > > > >> a >> > > > > > > > >> > > > prepare >> > > > > > > > >> > > > > > > > marker >> > > > > > > > >> > > > > > > > > and a complete marker to use different >> pids >> > in >> > > > > this >> > > > > > > > >> special >> > > > > > > > >> > > case. >> > > > > > > > >> > > > > If >> > > > > > > > >> > > > > > we >> > > > > > > > >> > > > > > > > > downgrade with the complete marker, it >> seems >> > > > that >> > > > > we >> > > > > > > > will >> > > > > > > > >> > never >> > > > > > > > >> > > > be >> > > > > > > > >> > > > > > able >> > > > > > > > >> > > > > > > > to >> > > > > > > > >> > > > > > > > > write the complete marker with the old >> pid. >> > > Not >> > > > > sure >> > > > > > > if >> > > > > > > > it >> > > > > > > > >> > > causes >> > > > > > > > >> > > > > any >> > > > > > > > >> > > > > > > > > issue, but it seems a bit weird. Instead >> of >> > > > > writing >> > > > > > > the >> > > > > > > > >> > > complete >> > > > > > > > >> > > > > > marker >> > > > > > > > >> > > > > > > > > with the new pid, could we write two >> > records: >> > > a >> > > > > > > complete >> > > > > > > > >> > marker >> > > > > > > > >> > > > > with >> > > > > > > > >> > > > > > > the >> > > > > > > > >> > > > > > > > > old pid followed by a TransactionLogValue >> > with >> > > > the >> > > > > > new >> > > > > > > > pid >> > > > > > > > >> > and >> > > > > > > > >> > > an >> > > > > > > > >> > > > > > empty >> > > > > > > > >> > > > > > > > > state? We could make the two records in >> the >> > > same >> > > > > > batch >> > > > > > > > so >> > > > > > > > >> > that >> > > > > > > > >> > > > they >> > > > > > > > >> > > > > > > will >> > > > > > > > >> > > > > > > > be >> > > > > > > > >> > > > > > > > > added to the log atomically. 
Thanks,

Jun

On Fri, Jan 12, 2024 at 5:40 PM Justine Olshan <jols...@confluent.io.invalid> wrote:

(1) the prepare marker is written, but the endTxn response is not received by the client when the server downgrades
(2) the prepare marker is written, the endTxn response is received by the client when the server downgrades.

I think I am still a little confused. In both of these cases, the transaction log has the old producer ID. We don't write the new producer ID in the prepare marker's non-tagged fields. If the server downgrades now, it would read the records not in tagged fields, and the complete marker will also have the old producer ID. (If we had used the new producer ID, we would not have transactional correctness, since the producer ID doesn't match the transaction and the state would not be correct on the data partition.)

In the overflow case, I'd expect the following to happen on the client side:
Case 1 -- we retry EndTxn -- it is the same producer ID and epoch - 1. This would fence the producer.
Case 2 -- we don't retry EndTxn and use the new producer ID, which would result in InvalidPidMappingException.

Maybe we can have special handling for when a server downgrades. When it reconnects we could get an ApiVersions response showing KIP-890 part 2 is not supported. In that case, we can call InitProducerId to abort the transaction. (In the overflow case, this correctly gives us a new producer ID.)

I guess the corresponding case would be where the *complete marker* is written but the EndTxn response is not received by the client and the server downgrades? This would result in the transaction coordinator having the new ID and not the old one. If the client retries, it will receive an InvalidPidMappingException. The InitProducerId scenario above would help here too.
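A minimal sketch of that reconnect handling, assuming a hypothetical version gate and helper names (the real logic would live in Kafka's Java producer):

```python
# Sketch of the client-side special handling described above: after a
# reconnect, inspect the broker's advertised EndTxn version range; if
# the broker no longer supports the KIP-890 part 2 version, abort via
# InitProducerId instead of retrying EndTxn. The version number and
# names here are illustrative assumptions, not Kafka's actual values.

KIP_890_PART_2_END_TXN_VERSION = 4  # hypothetical version gate

def handle_reconnect(broker_api_versions, txn_in_flight):
    """Return the action the producer should take after a reconnect.

    broker_api_versions: dict mapping API name -> max supported version.
    """
    max_end_txn = broker_api_versions.get("EndTxn", 0)
    if txn_in_flight and max_end_txn < KIP_890_PART_2_END_TXN_VERSION:
        # Server downgraded: retrying EndTxn could fence us (same pid,
        # epoch - 1), or hit InvalidPidMappingException in the overflow
        # case. InitProducerId aborts the transaction and, on overflow,
        # correctly hands back a new producer ID.
        return "InitProducerId"
    return "RetryEndTxn"

# Downgraded broker advertising an older EndTxn version:
assert handle_reconnect({"EndTxn": 3}, txn_in_flight=True) == "InitProducerId"
# Broker still on KIP-890 part 2:
assert handle_reconnect({"EndTxn": 4}, txn_in_flight=True) == "RetryEndTxn"
```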
To be clear, my compatibility story is meant to support downgrades server side in keeping the transactional correctness. Keeping the client from fencing itself is not the priority.

Hope this helps. I can also add text in the KIP about InitProducerId if we think that fixes some edge cases.

Justine

On Fri, Jan 12, 2024 at 4:10 PM Jun Rao <j...@confluent.io.invalid> wrote:

Hi, Justine,

Thanks for the reply.

I agree that we don't need to optimize for fencing during downgrades. Regarding consistency, there are two possible cases: (1) the prepare marker is written, but the endTxn response is not received by the client when the server downgrades; (2) the prepare marker is written, the endTxn response is received by the client when the server downgrades. In (1), the client will have the old producer Id and in (2), the client will have the new producer Id. If we downgrade right after the prepare marker, we can't be consistent with both (1) and (2) since we can only put one value in the existing producer Id field. It's also not clear which case is more likely. So we could probably be consistent with either case. By putting the new producer Id in the prepare marker, we are consistent with case (2), and it also has the slight benefit that the producer Id field in the prepare and complete markers is consistent in the overflow case.

Jun

On Fri, Jan 12, 2024 at 3:11 PM Justine Olshan <jols...@confluent.io.invalid> wrote:

Hi Jun,

In the case you describe, we would need to have a delayed request, send a successful EndTxn, and a successful AddPartitionsToTxn, and then have the delayed EndTxn request go through for a given producer. I'm trying to figure out if it is possible for the client to transition if a previous request is delayed somewhere. But yes, in this case I think we would fence the client.

Not for the overflow case. In the overflow case, the producer ID and the epoch are different on the marker and on the new transaction. So we want the marker to use the max epoch, but the new transaction should start with the new ID and epoch 0 in the transactional state.
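The overflow transition could be sketched roughly as follows; the exact overflow threshold and helper names are assumptions for illustration (the producer epoch is a 16-bit value in Kafka's protocol):

```python
# Sketch of the epoch-overflow transition: when bumping the epoch would
# exhaust the 16-bit epoch space, the end-of-transaction marker keeps
# the old producer ID at the max epoch, while the next transaction
# starts with a freshly allocated producer ID at epoch 0.
# The threshold check below is an assumption for illustration.

MAX_EPOCH = 32767  # max of the int16 producer epoch

def end_txn_transition(pid, epoch, allocate_pid):
    """Return ((marker_pid, marker_epoch), (next_pid, next_epoch))."""
    if epoch + 1 >= MAX_EPOCH:
        # Overflow: the marker uses the old ID at the max epoch; the
        # new transactional state gets a new ID and starts at epoch 0.
        return (pid, MAX_EPOCH), (allocate_pid(), 0)
    # Normal case (KIP-890 part 2): bump the epoch on every EndTxn.
    return (pid, epoch + 1), (pid, epoch + 1)

# Overflow: marker keeps pid 100 at max epoch, next txn gets pid 101.
assert end_txn_transition(100, 32766, lambda: 101) == ((100, 32767), (101, 0))
# Normal bump: marker and next txn share the same pid and epoch.
assert end_txn_transition(100, 5, lambda: 101) == ((100, 6), (100, 6))
```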
In the server downgrade case, we want to see the producer ID as that is what the client will have. If we complete the commit, and the transaction state is reloaded, we need the new producer ID in the state so there isn't an invalid producer ID mapping. The server downgrade cases are considering transactional correctness and not regressing from previous behavior -- and are not concerned about supporting the safety from fencing retries (since we have downgraded, we don't need to support that). Perhaps this is a trade-off, but I think it is the right one.

(If the client downgrades, it will have restarted and it is ok for it to have a new producer ID too.)

Justine

On Fri, Jan 12, 2024 at 11:42 AM Jun Rao <j...@confluent.io.invalid> wrote:

Hi, Justine,

Thanks for the reply.
101.4 "If the marker is written by the new client, we can as I mentioned in the last email guarantee that any EndTxn requests with the same epoch are from the same producer and the same transaction. Then we don't have to return a fenced error but can handle gracefully as described in the KIP."
When a delayed EndTxn request is processed, the txn state could be ongoing for the next txn. I guess in this case we still return the fenced error for the delayed request?

102. Sorry, my question was inaccurate. What you described is accurate. "The downgrade compatibility I mention is that we keep the same producer ID and epoch in the main (non-tagged) fields as we did before the code on the server side." If we want to do this, it seems that we should use the current producer Id and max epoch in the existing producerId and producerEpoch fields for both the prepare and the complete marker, right? The downgrade can happen after the complete marker is written. With what you described, the downgraded coordinator will see the new producer Id instead of the old one.

Jun

On Fri, Jan 12, 2024 at 10:44 AM Justine Olshan <jols...@confluent.io.invalid> wrote:

Hi Jun,

I can update the description.

I believe your second point is mentioned in the KIP. I can add more text on this if it is helpful.
> The delayed message case can also violate EOS if the delayed message comes in after the next addPartitionsToTxn request comes in.
Effectively we may see a message from a previous (aborted) transaction become part of the next transaction.

If the marker is written by the new client, we can as I mentioned in the last email guarantee that any EndTxn requests with the same epoch are from the same producer and the same transaction. Then we don't have to return a fenced error but can handle gracefully as described in the KIP.
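A rough sketch of why the per-transaction epoch bump enables this graceful handling; the state and return names are illustrative, not Kafka's actual coordinator code:

```python
# Sketch: because new (KIP-890 part 2) clients bump the epoch on every
# EndTxn, a request whose epoch is older than the coordinator's current
# epoch must belong to an earlier, already-completed transaction, so
# the coordinator can detect it instead of mutating the ongoing txn.
# Return values here are illustrative labels, not Kafka error codes.

def handle_end_txn(current_pid, current_epoch, req_pid, req_epoch):
    if req_pid != current_pid:
        # No mapping for this producer ID on the coordinator.
        return "INVALID_PID_MAPPING"
    if req_epoch < current_epoch:
        # Delayed request from a prior transaction. With old clients the
        # epoch did not change per txn, so such a request could silently
        # add markers to the ongoing txn; with new clients we detect it.
        return "STALE_EPOCH"
    # Same epoch: same producer, same transaction.
    return "PROCEED"

# A delayed EndTxn from the previous transaction (epoch 5 vs current 6):
assert handle_end_txn(100, 6, 100, 5) == "STALE_EPOCH"
# The current transaction's EndTxn:
assert handle_end_txn(100, 6, 100, 6) == "PROCEED"
```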
I don't think a boolean is useful since it is directly encoded by the existence or lack of the tagged field being written. In the prepare marker we will have the same producer ID in the non-tagged field. In the Complete state we may not. I'm not sure why the ongoing state matters for this KIP. It does matter for KIP-939.

I'm not sure what you are referring to about writing the previous producer ID in the prepare marker. This is not in the KIP. In the overflow case, we write the nextProducerId in the prepare state. This is so we know what we assigned when we reload the transaction log. Once we complete, we transition this ID to the main (non-tagged) field and have the previous producer ID field filled in. This is so we can identify in a retry case that the operation completed successfully and we don't fence our producer. The downgrade compatibility I mention is that we keep the same producer ID and epoch in the main (non-tagged) fields as we did before the code on the server side. If the server downgrades, we are still compatible. This addresses both the prepare and complete state downgrades.

Justine

On Fri, Jan 12, 2024 at 10:21 AM Jun Rao <j...@confluent.io.invalid> wrote:

Hi, Justine,

Thanks for the reply. Sorry for the delay. I have a few more comments.

110. I think the motivation section could be improved.
One of the motivations listed by the KIP is "This can happen when a message gets stuck or delayed due to networking issues or a network partition, the transaction aborts, and then the delayed message finally comes in.". This seems not very accurate. Without KIP-890, currently, if the coordinator times out and aborts an ongoing transaction, it already bumps up the epoch in the marker, which prevents the delayed produce message from being added to the user partition. What can cause a hanging transaction is that the producer completes (either aborts or commits) a transaction before receiving a successful ack on messages published in the same txn. In this case, it's possible for the delayed message to be appended to the partition after the marker, causing the transaction to hang.

A similar issue (not mentioned in the motivation) could happen on the marker in the coordinator's log. For example, it's possible for an EndTxnRequest to be delayed on the coordinator. By the time the delayed EndTxnRequest is processed, it's possible that the previous txn has already completed and a new txn has started. Currently, since the epoch is not bumped on every txn, the delayed EndTxnRequest will add an unexpected prepare marker (and eventually a complete marker) to the ongoing txn. This won't cause the transaction to hang, but it will break the EoS semantics. The proposal in this KIP will address this issue too.

101. "However, I was writing it so that we can distinguish between old clients where we don't have the ability to do this operation and new clients that can. (Old clients don't bump the epoch on commit, so we can't say for sure the write belongs to the given transaction)."
101.1 I am wondering why we need to distinguish whether the marker is written by the old or the new client. Could you describe what we do differently if we know the marker is written by the new client?
101.2 If we do need a way to distinguish whether the marker is written by the old or the new client, would it be simpler to just introduce a boolean field instead of indirectly through the previous producer Id field?
101.3 It's not clear to me why we only add the previous producer Id field in the complete marker, but not in the prepare marker. If we want to know whether a marker is written by the new client or not, it seems that we want to do this consistently for all markers.
101.4 What about the TransactionLogValue record representing the ongoing state?
Should we also >> > distinguish >> > > > > > whether >> > > > > > > > it's >> > > > > > > > >> > > > written >> > > > > > > > >> > > > > by >> > > > > > > > >> > > > > > > the >> > > > > > > > >> > > > > > > > > old >> > > > > > > > >> > > > > > > > > > > or >> > > > > > > > >> > > > > > > > > > > > > the >> > > > > > > > >> > > > > > > > > > > > > > > new client? >> > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > 102. In the overflow case, >> it's >> > > > still >> > > > > > not >> > > > > > > > >> clear >> > > > > > > > >> > to >> > > > > > > > >> > > me >> > > > > > > > >> > > > > why >> > > > > > > > >> > > > > > > we >> > > > > > > > >> > > > > > > > > > write >> > > > > > > > >> > > > > > > > > > > > the >> > > > > > > > >> > > > > > > > > > > > > > > previous produce Id in the >> > prepare >> > > > > > marker >> > > > > > > > >> while >> > > > > > > > >> > > > writing >> > > > > > > > >> > > > > > the >> > > > > > > > >> > > > > > > > > next >> > > > > > > > >> > > > > > > > > > > > > produce >> > > > > > > > >> > > > > > > > > > > > > > Id >> > > > > > > > >> > > > > > > > > > > > > > > in the complete marker. You >> > > > mentioned >> > > > > > that >> > > > > > > > >> it's >> > > > > > > > >> > for >> > > > > > > > >> > > > > > > > > downgrading. >> > > > > > > > >> > > > > > > > > > > > > However, >> > > > > > > > >> > > > > > > > > > > > > > > we could downgrade with >> either >> > the >> > > > > > prepare >> > > > > > > > >> marker >> > > > > > > > >> > > or >> > > > > > > > >> > > > > the >> > > > > > > > >> > > > > > > > > complete >> > > > > > > > >> > > > > > > > > > > > > marker. 
>> > > > > > > > >> > > > > > > > > > > > > > > In either case, the >> downgraded >> > > > > > coordinator >> > > > > > > > >> should >> > > > > > > > >> > > see >> > > > > > > > >> > > > > the >> > > > > > > > >> > > > > > > > same >> > > > > > > > >> > > > > > > > > > > > produce >> > > > > > > > >> > > > > > > > > > > > > id >> > > > > > > > >> > > > > > > > > > > > > > > (probably the previous >> produce >> > > Id), >> > > > > > right? >> > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > Jun >> > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > On Wed, Dec 20, 2023 at >> 6:00 PM >> > > > > Justine >> > > > > > > > Olshan >> > > > > > > > >> > > > > > > > > > > > > > > <jols...@confluent.io.invalid >> > >> > > > > > > > >> > > > > > > > > > > > > > > wrote: >> > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > Hey Jun, >> > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > Thanks for taking a look at >> > the >> > > > KIP >> > > > > > > again. >> > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > 100. For the epoch overflow >> > > case, >> > > > > only >> > > > > > > the >> > > > > > > > >> > marker >> > > > > > > > >> > > > > will >> > > > > > > > >> > > > > > > have >> > > > > > > > >> > > > > > > > > max >> > > > > > > > >> > > > > > > > > > > > > epoch. 
>> > > > > > > > >> > > > > > > > > > > > > > > This >> > > > > > > > >> > > > > > > > > > > > > > > > keeps the behavior of the >> rest >> > > of >> > > > > the >> > > > > > > > >> markers >> > > > > > > > >> > > where >> > > > > > > > >> > > > > the >> > > > > > > > >> > > > > > > > last >> > > > > > > > >> > > > > > > > > > > marker >> > > > > > > > >> > > > > > > > > > > > > is >> > > > > > > > >> > > > > > > > > > > > > > > the >> > > > > > > > >> > > > > > > > > > > > > > > > epoch of the transaction >> > > records + >> > > > > 1. >> > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > 101. You are correct that >> we >> > > don't >> > > > > > need >> > > > > > > to >> > > > > > > > >> > write >> > > > > > > > >> > > > the >> > > > > > > > >> > > > > > > > producer >> > > > > > > > >> > > > > > > > > > ID >> > > > > > > > >> > > > > > > > > > > > > since >> > > > > > > > >> > > > > > > > > > > > > > it >> > > > > > > > >> > > > > > > > > > > > > > > > is the same. However, I was >> > > > writing >> > > > > it >> > > > > > > so >> > > > > > > > >> that >> > > > > > > > >> > we >> > > > > > > > >> > > > can >> > > > > > > > >> > > > > > > > > > distinguish >> > > > > > > > >> > > > > > > > > > > > > > between >> > > > > > > > >> > > > > > > > > > > > > > > > old clients where we don't >> > have >> > > > the >> > > > > > > > ability >> > > > > > > > >> do >> > > > > > > > >> > > this >> > > > > > > > >> > > > > > > > operation >> > > > > > > > >> > > > > > > > > > and >> > > > > > > > >> > > > > > > > > > > > new >> > > > > > > > >> > > > > > > > > > > > > > > > clients that can. 
(Old >> clients >> > > > don't >> > > > > > > bump >> > > > > > > > >> the >> > > > > > > > >> > > epoch >> > > > > > > > >> > > > > on >> > > > > > > > >> > > > > > > > > commit, >> > > > > > > > >> > > > > > > > > > so >> > > > > > > > >> > > > > > > > > > > > we >> > > > > > > > >> > > > > > > > > > > > > > > can't >> > > > > > > > >> > > > > > > > > > > > > > > > say for sure the write >> belongs >> > > to >> > > > > the >> > > > > > > > given >> > > > > > > > >> > > > > > transaction). >> > > > > > > > >> > > > > > > > If >> > > > > > > > >> > > > > > > > > we >> > > > > > > > >> > > > > > > > > > > > > receive >> > > > > > > > >> > > > > > > > > > > > > > > an >> > > > > > > > >> > > > > > > > > > > > > > > > EndTxn request from a new >> > > client, >> > > > we >> > > > > > > will >> > > > > > > > >> fill >> > > > > > > > >> > > this >> > > > > > > > >> > > > > > > field. >> > > > > > > > >> > > > > > > > We >> > > > > > > > >> > > > > > > > > > can >> > > > > > > > >> > > > > > > > > > > > > > > guarantee >> > > > > > > > >> > > > > > > > > > > > > > > > that any EndTxn requests >> with >> > > the >> > > > > same >> > > > > > > > epoch >> > > > > > > > >> > are >> > > > > > > > >> > > > from >> > > > > > > > >> > > > > > the >> > > > > > > > >> > > > > > > > > same >> > > > > > > > >> > > > > > > > > > > > > producer >> > > > > > > > >> > > > > > > > > > > > > > > and >> > > > > > > > >> > > > > > > > > > > > > > > > the same transaction. >> > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > 102. In prepare phase, we >> have >> > > the >> > > > > > same >> > > > > > > > >> > producer >> > > > > > > > >> > > ID >> > > > > > > > >> > > > > and >> > > > > > > > >> > > > > > > > epoch >> > > > > > > > >> > > > > > > > > > we >> > > > > > > > >> > > > > > > > > > > > > always >> > > > > > > > >> > > > > > > > > > > > > > > > had. 
It is the producer ID >> and >> > > > epoch >> > > > > > > that >> > > > > > > > >> are >> > > > > > > > >> > on >> > > > > > > > >> > > > the >> > > > > > > > >> > > > > > > > marker. >> > > > > > > > >> > > > > > > > > In >> > > > > > > > >> > > > > > > > > > > > > commit >> > > > > > > > >> > > > > > > > > > > > > > > > phase, we stay the same >> unless >> > > it >> > > > is >> > > > > > the >> > > > > > > > >> > overflow >> > > > > > > > >> > > > > case. >> > > > > > > > >> > > > > > > In >> > > > > > > > >> > > > > > > > > that >> > > > > > > > >> > > > > > > > > > > > case, >> > > > > > > > >> > > > > > > > > > > > > > we >> > > > > > > > >> > > > > > > > > > > > > > > > set the producer ID to the >> new >> > > one >> > > > > we >> > > > > > > > >> generated >> > > > > > > > >> > > and >> > > > > > > > >> > > > > > epoch >> > > > > > > > >> > > > > > > > to >> > > > > > > > >> > > > > > > > > 0 >> > > > > > > > >> > > > > > > > > > > > after >> > > > > > > > >> > > > > > > > > > > > > > > > complete. This is for >> > downgrade >> > > > > > > > >> compatibility. >> > > > > > > > >> > > The >> > > > > > > > >> > > > > > tagged >> > > > > > > > >> > > > > > > > > > fields >> > > > > > > > >> > > > > > > > > > > > are >> > > > > > > > >> > > > > > > > > > > > > > just >> > > > > > > > >> > > > > > > > > > > > > > > > safety guards for retries >> and >> > > > > > failovers. >> > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > In prepare phase for epoch >> > > > overflow >> > > > > > case >> > > > > > > > >> only >> > > > > > > > >> > we >> > > > > > > > >> > > > > store >> > > > > > > > >> > > > > > > the >> > > > > > > > >> > > > > > > > > next >> > > > > > > > >> > > > > > > > > > > > > > producer >> > > > > > > > >> > > > > > > > > > > > > > > > ID. 
This is for the case >> where >> > > we >> > > > > > reload >> > > > > > > > the >> > > > > > > > >> > > > > > transaction >> > > > > > > > >> > > > > > > > > > > > coordinator >> > > > > > > > >> > > > > > > > > > > > > in >> > > > > > > > >> > > > > > > > > > > > > > > > prepare state. Once the >> > > > transaction >> > > > > is >> > > > > > > > >> > committed, >> > > > > > > > >> > > > we >> > > > > > > > >> > > > > > can >> > > > > > > > >> > > > > > > > use >> > > > > > > > >> > > > > > > > > > the >> > > > > > > > >> > > > > > > > > > > > > > producer >> > > > > > > > >> > > > > > > > > > > > > > > > ID the client already is >> > using. >> > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > In commit phase, we store >> the >> > > > > previous >> > > > > > > > >> producer >> > > > > > > > >> > > ID >> > > > > > > > >> > > > in >> > > > > > > > >> > > > > > > case >> > > > > > > > >> > > > > > > > of >> > > > > > > > >> > > > > > > > > > > > > retries. >> > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > I think it is easier to >> think >> > of >> > > > it >> > > > > as >> > > > > > > > just >> > > > > > > > >> how >> > > > > > > > >> > > we >> > > > > > > > >> > > > > were >> > > > > > > > >> > > > > > > > > storing >> > > > > > > > >> > > > > > > > > > > > > > producer >> > > > > > > > >> > > > > > > > > > > > > > > ID >> > > > > > > > >> > > > > > > > > > > > > > > > and epoch before, with some >> > > extra >> > > > > > > > bookeeping >> > > > > > > > >> > and >> > > > > > > > >> > > > edge >> > > > > > > > >> > > > > > > case >> > > > > > > > >> > > > > > > > > > > handling >> > > > > > > > >> > > > > > > > > > > > > in >> > > > > > > > >> > > > > > > > > > > > > > > the >> > > > > > > > >> > > > > > > > > > > > > > > > tagged fields. 
We have to >> do >> > it >> > > > this >> > > > > > way >> > > > > > > > for >> > > > > > > > >> > > > > > > compatibility >> > > > > > > > >> > > > > > > > > with >> > > > > > > > >> > > > > > > > > > > > > > > downgrades. >> > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > 103. Next producer ID is >> for >> > > > prepare >> > > > > > > > status >> > > > > > > > >> and >> > > > > > > > >> > > > > > previous >> > > > > > > > >> > > > > > > > > > producer >> > > > > > > > >> > > > > > > > > > > > ID >> > > > > > > > >> > > > > > > > > > > > > is >> > > > > > > > >> > > > > > > > > > > > > > > for >> > > > > > > > >> > > > > > > > > > > > > > > > after complete. The reason >> why >> > > we >> > > > > need >> > > > > > > two >> > > > > > > > >> > > separate >> > > > > > > > >> > > > > > > > (tagged) >> > > > > > > > >> > > > > > > > > > > fields >> > > > > > > > >> > > > > > > > > > > > > is >> > > > > > > > >> > > > > > > > > > > > > > > for >> > > > > > > > >> > > > > > > > > > > > > > > > backwards compatibility. We >> > need >> > > > to >> > > > > > keep >> > > > > > > > the >> > > > > > > > >> > same >> > > > > > > > >> > > > > > > semantics >> > > > > > > > >> > > > > > > > > for >> > > > > > > > >> > > > > > > > > > > the >> > > > > > > > >> > > > > > > > > > > > > > > > non-tagged field in case we >> > > > > downgrade. >> > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > 104. 
We set the fields as >> we >> > do >> > > in >> > > > > the >> > > > > > > > >> > > > transactional >> > > > > > > > >> > > > > > > state >> > > > > > > > >> > > > > > > > > (as >> > > > > > > > >> > > > > > > > > > we >> > > > > > > > >> > > > > > > > > > > > > need >> > > > > > > > >> > > > > > > > > > > > > > to >> > > > > > > > >> > > > > > > > > > > > > > > > do this for compatibility >> -- >> > if >> > > we >> > > > > > > > >> downgrade, >> > > > > > > > >> > we >> > > > > > > > >> > > > will >> > > > > > > > >> > > > > > > only >> > > > > > > > >> > > > > > > > > have >> > > > > > > > >> > > > > > > > > > > the >> > > > > > > > >> > > > > > > > > > > > > > > > non-tagged fields) It will >> be >> > > the >> > > > > old >> > > > > > > > >> producer >> > > > > > > > >> > ID >> > > > > > > > >> > > > and >> > > > > > > > >> > > > > > max >> > > > > > > > >> > > > > > > > > > epoch. >> > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > Hope this helps. Let me >> know >> > if >> > > > you >> > > > > > have >> > > > > > > > >> > further >> > > > > > > > >> > > > > > > questions. 
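Justine's walkthrough of 100-104 can be summarized in a small state sketch. This is an illustrative model only, under assumed names (`TxnStateSketch`, `prepare`, `complete`, `allocateProducerId` are all invented); it is not Kafka's implementation, though Kafka's producer epoch really is an int16 (Java `short`):

```java
// Rough, illustrative model of the bookkeeping described above -- not
// Kafka's actual implementation. Names are invented for the sketch.
public class TxnStateSketch {
    static final short MAX_EPOCH = Short.MAX_VALUE;

    long producerId;      // non-tagged field: stays downgrade-compatible
    short producerEpoch;  // non-tagged field
    Long nextProducerId;  // tagged: set in prepare only if the epoch would overflow
    Long prevProducerId;  // tagged: set after complete, a safety guard for retries

    void prepare() {
        if (producerEpoch == MAX_EPOCH - 1) {
            // Overflow case: remember the newly allocated id so a coordinator
            // reloaded in the prepare state can still complete the transaction.
            nextProducerId = allocateProducerId();
        }
        producerEpoch += 1; // the marker carries the records' epoch + 1
    }

    void complete() {
        prevProducerId = producerId; // retained for retry bookkeeping
        if (nextProducerId != null) {
            producerId = nextProducerId; // roll to the new id
            nextProducerId = null;
            producerEpoch = 0;           // new id starts at epoch 0
        }
    }

    // Placeholder: real ids would come from the coordinator's id manager.
    long allocateProducerId() {
        return producerId + 1;
    }

    public static void main(String[] args) {
        TxnStateSketch t = new TxnStateSketch();
        t.producerId = 100;
        t.producerEpoch = (short) (Short.MAX_VALUE - 1);
        t.prepare();
        t.complete();
        System.out.println(t.producerId + " " + t.producerEpoch); // prints "101 0"
    }
}
```

In this model, question 100's "overflow at max - 1" arises because the prepare marker is written at the records' epoch + 1, so bumping out of `MAX_EPOCH - 1` consumes the last usable epoch of the current producer ID.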
>> > > > Justine
>> > > >
>> > > > On Wed, Dec 20, 2023 at 3:33 PM Jun Rao
>> > > > <j...@confluent.io.invalid> wrote:
>> > > >
>> > > > > Hi, Justine,
>> > > > >
>> > > > > It seems that you have made some changes to KIP-890 since the
>> > > > > vote. In particular, we are changing the format of
>> > > > > TransactionLogValue. A few comments related to that.
>> > > > >
>> > > > > 100. Just to be clear, the overflow case (i.e. when a new
>> > > > > producerId is generated) is when the current epoch equals
>> > > > > max - 1 and not max?
>> > > > >
>> > > > > 101. For the "not epoch overflow" case, we write the previous
>> > > > > ID in the tagged field in the complete phase. Do we need to do
>> > > > > that, since the producer ID doesn't change in this case?
>> > > > >
>> > > > > 102. It seems that the meaning of the ProducerId/ProducerEpoch
>> > > > > fields in TransactionLogValue changes depending on the
>> > > > > TransactionStatus. When the TransactionStatus is Ongoing, they
>> > > > > represent the current ProducerId and the current ProducerEpoch.
>> > > > > When the TransactionStatus is PrepareCommit/PrepareAbort, they
>> > > > > represent the current ProducerId and the next ProducerEpoch.
>> > > > > When the TransactionStatus is Commit/Abort, they further depend
>> > > > > on whether the epoch overflows or not. If there is no overflow,
>> > > > > they represent the current ProducerId and the next
>> > > > > ProducerEpoch (max). Otherwise, they represent the newly
>> > > > > generated ProducerId and a ProducerEpoch of 0. Is that right?
>> > > > > This seems not easy to understand. Could we provide some
>> > > > > examples like what Artem has done in KIP-939? Have we
>> > > > > considered a simpler design where ProducerId/ProducerEpoch
>> > > > > always represent the same value (e.g. for the current
>> > > > > transaction), independent of the TransactionStatus and epoch
>> > > > > overflow?
>> > > > >
>> > > > > 103. It's not clear to me why we need 3 fields: ProducerId,
>> > > > > PrevProducerId, NextProducerId. Could we just have ProducerId
>> > > > > and NextProducerId?
>> > > > >
>> > > > > 104. For WriteTxnMarkerRequests, if the producer epoch
>> > > > > overflows, what do we set the producerId and the producerEpoch
>> > > > > to?
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Jun
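The status-dependent reading that question 102 above describes can be written down as a small decision function. This is a sketch of the semantics as summarized in that question, with invented names (`logFields`, the String-valued statuses); it is not the actual TransactionLogValue code:

```java
public class TxnLogFieldsSketch {
    // Sketch of the status-dependent semantics summarized in 102 above.
    // Returns {producerId, producerEpoch} as the non-tagged
    // TransactionLogValue fields would read under that interpretation.
    static long[] logFields(String status, long curId, int curEpoch,
                            boolean epochOverflow, long newId) {
        switch (status) {
            case "Ongoing":
                return new long[] {curId, curEpoch};        // current id, current epoch
            case "PrepareCommit":
            case "PrepareAbort":
                return new long[] {curId, curEpoch + 1};    // current id, next epoch
            case "Commit":
            case "Abort":
                return epochOverflow
                        ? new long[] {newId, 0}             // new id, epoch resets to 0
                        : new long[] {curId, curEpoch + 1}; // current id, next epoch
            default:
                throw new IllegalArgumentException("unknown status: " + status);
        }
    }

    public static void main(String[] args) {
        // Overflow commit: fields switch to the newly generated producer id.
        long[] f = logFields("Commit", 7L, Short.MAX_VALUE - 1, true, 8L);
        System.out.println(f[0] + " " + f[1]); // prints "8 0"
    }
}
```

Writing the rule out this way also makes Jun's point concrete: the same two fields carry three different meanings depending on status and overflow, which is the complexity the question pushes back on.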