Hi Roger, Arjun,

Thank you for the questions.

> It looks like the application must have stable transactional ids over time?

The transactional id should uniquely identify a producer instance and needs to be stable across restarts. If the transactional id is not stable across restarts, then zombie messages from a previous incarnation of the producer may violate atomicity. If two producer instances concurrently produce data with the same transactional id, they will constantly fence each other and most likely make little or no progress.

The name might be a little confusing, as it may be mistaken for a transaction id / TID that uniquely identifies every transaction. The name and the semantics were defined in the original exactly-once-semantics (EoS) proposal (https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging), and KIP-939 just builds on top of that.

> I'm curious to understand what happens if the producer dies, and does not come up and recover the pending transaction within the transaction timeout interval.

If the producer / application never comes back, the transaction will remain in the prepared (a.k.a. "in-doubt") state until an operator forcefully terminates the transaction. That's why a new ACL is defined in this proposal: this functionality should only be provided to applications that implement proper recovery logic.

-Artem

On Tue, Aug 22, 2023 at 12:52 AM Arjun Satish <arjun.sat...@gmail.com> wrote:

> Hello Artem,
>
> Thanks for the KIP.
>
> I have the same question as Roger on concurrent writes, and an additional
> one on consumer behavior. Typically, transactions will timeout if not
> committed within some time interval. With the proposed changes in this KIP,
> consumers cannot consume past the ongoing transaction. I'm curious to
> understand what happens if the producer dies, and does not come up and
> recover the pending transaction within the transaction timeout interval.
> Or are we saying that when used in this 2PC context, we should configure these
> transaction timeouts to very large durations?
>
> Thanks in advance!
>
> Best,
> Arjun
>
>
> On Mon, Aug 21, 2023 at 1:06 PM Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > Hi Artem,
> >
> > Thanks for writing this KIP. Can you clarify the requirements a bit more
> > for managing transaction state? It looks like the application must have
> > stable transactional ids over time? What is the granularity of those ids
> > and producers? Say the application is a multi-threaded Java web server,
> > can/should all the concurrent threads share a transactional id and
> > producer? That doesn't seem right to me unless the application is using
> > global DB locks that serialize all requests. Instead, if the application
> > uses row-level DB locks, there could be multiple, concurrent, independent
> > txns happening in the same JVM so it seems like the granularity managing
> > transactional ids and txn state needs to line up with granularity of the DB
> > locking.
> >
> > Does that make sense or am I misunderstanding?
> >
> > Thanks,
> >
> > Roger
> >
> > On Wed, Aug 16, 2023 at 11:40 PM Artem Livshits
> > <alivsh...@confluent.io.invalid> wrote:
> >
> > > Hello,
> > >
> > > This is a discussion thread for
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC
> > > .
> > >
> > > The KIP proposes extending Kafka transaction support (that already uses 2PC
> > > under the hood) to enable atomicity of dual writes to Kafka and an external
> > > database, and helps to fix a long standing Flink issue.
> > >
> > > An example of code that uses the dual write recipe with JDBC and should
> > > work for most SQL databases is here
> > > https://github.com/apache/kafka/pull/14231.
> > >
> > > The FLIP for the sister fix in Flink is here
> > >
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255071710
> > >
> > > -Artem
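
For readers following the thread, the fencing behaviour described in the reply at the top can be illustrated with a small toy model. This is only a sketch under stated assumptions: the classes `Coordinator` and `Producer` below are hypothetical stand-ins and are not the Kafka client API, and the model assumes fencing works by bumping a per-transactional-id epoch each time a producer instance registers. It shows why a stable transactional id lets a restarted producer fence its zombie predecessor, and why a changed id does not.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: these classes are hypothetical and are NOT the Kafka client API.
final class FencingSketch {

    /** Stand-in for the transaction coordinator: tracks the latest epoch per transactional id. */
    static final class Coordinator {
        private final Map<String, Integer> latestEpoch = new HashMap<>();

        /** A (re)starting producer registers its transactional id and gets a bumped epoch. */
        synchronized int register(String transactionalId) {
            return latestEpoch.merge(transactionalId, 1, Integer::sum);
        }

        /** Writes are accepted only from the newest incarnation of a transactional id. */
        synchronized boolean accept(String transactionalId, int epoch) {
            Integer latest = latestEpoch.get(transactionalId);
            return latest != null && latest == epoch;
        }
    }

    /** Stand-in for one producer instance (one incarnation of a transactional id). */
    static final class Producer {
        private final Coordinator coordinator;
        private final String transactionalId;
        private final int epoch;

        Producer(Coordinator coordinator, String transactionalId) {
            this.coordinator = coordinator;
            this.transactionalId = transactionalId;
            this.epoch = coordinator.register(transactionalId);
        }

        /** Returns true if the write is accepted, false if this instance has been fenced. */
        boolean send(String payload) {
            return coordinator.accept(transactionalId, epoch);
        }
    }

    public static void main(String[] args) {
        // Scenario 1: stable id across the restart, so the zombie is fenced.
        Coordinator stable = new Coordinator();
        Producer zombie = new Producer(stable, "orders-app-0");
        Producer restarted = new Producer(stable, "orders-app-0");
        System.out.println("stable id, zombie accepted:    " + zombie.send("a"));    // false
        System.out.println("stable id, restarted accepted: " + restarted.send("b")); // true

        // Scenario 2: the id changed across the restart, so nothing fences the zombie.
        Coordinator unstable = new Coordinator();
        Producer oldInstance = new Producer(unstable, "orders-app-run-1");
        Producer newInstance = new Producer(unstable, "orders-app-run-2");
        System.out.println("changed id, zombie accepted:   " + oldInstance.send("c")); // true
        System.out.println("changed id, new accepted:      " + newInstance.send("d")); // true
    }
}
```

In the second scenario both instances are accepted, which is the situation where zombie messages from the previous incarnation can slip through. Two live instances sharing one id would instead keep re-registering and invalidating each other, which is the "constantly fence each other" case.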