Looks great! A few questions:
1. What is the relationship between transaction.app.id and the existing config application.id in streams?

2. The initTransactions() call is a little annoying. Can we get rid of it and call it automatically on the first message send if you set a transaction.app.id, as we do with metadata? Arguably we should have included a general connect() or init() call in the producer, but given that we didn't, it seems weird that the cluster metadata initializes automatically on demand and the transaction metadata doesn't.

3. The equivalent of what we call "fetch.mode" is called "isolation level" in databases and takes values like "serializable", "read committed", and "read uncommitted". Since we went with "transaction" as the name for the thing between the begin/commit, it might make sense to use this terminology for the concept and its levels? I think the behavior we are planning is "read committed" and the alternative re-ordering behavior is equivalent to "serializable"?

4. Can the PID be made 4 bytes if we handle roll-over gracefully? 2 billion concurrent producers should be enough for anyone, right?

5. One implication of factoring out the message set seems to be that you can't ever "repack" messages to improve compression beyond what is done by the producer. We'd talked about doing this either by buffering when writing or during log cleaning. This isn't a show stopper, but I think it means we won't be able to do that. Furthermore, with log cleaning you'd assume that over time ALL messages would collapse down to a single wrapper as compaction removes the others.
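
To make question 2 concrete, here is a minimal sketch of what "initialize on first send" could look like. It is only an illustration under stated assumptions: LazyTxnProducer and its fields are hypothetical stand-ins, not the real producer class or the API proposed in the KIP, and the sketch ignores threading and error handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a producer; NOT the real KafkaProducer API.
class LazyTxnProducer {
    // Assumption: a null id means transactions are not configured.
    private final String transactionAppId;
    private boolean transactionsInitialized = false;
    private int initCalls = 0;
    private final List<String> sent = new ArrayList<>();

    LazyTxnProducer(String transactionAppId) {
        this.transactionAppId = transactionAppId;
    }

    // Stand-in for the explicit initTransactions() call discussed above.
    private void initTransactions() {
        initCalls++;
        transactionsInitialized = true;
    }

    // Initialize transaction state on demand, mirroring how cluster
    // metadata is fetched lazily on the first send.
    private void ensureInitialized() {
        if (transactionAppId != null && !transactionsInitialized) {
            initTransactions();
        }
    }

    void send(String message) {
        ensureInitialized();
        sent.add(message); // stand-in for the actual network send
    }

    int initCalls() { return initCalls; }

    int sentCount() { return sent.size(); }
}
```

With this shape the caller never invokes an init method: the first send() pays the initialization cost once, and a producer without a transaction.app.id never initializes transaction state at all.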

-Jay

On Wed, Nov 30, 2016 at 2:19 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> Hi all,
>
> I have just created KIP-98 to enhance Kafka with exactly once delivery
> semantics:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging
>
> This KIP adds a transactional messaging mechanism along with an idempotent
> producer implementation to make sure that 1) duplicated messages sent from
> the same identified producer can be detected on the broker side, and 2) a
> group of messages sent within a transaction will atomically be either
> reflected and fetchable to consumers or not as a whole.
>
> The above wiki page provides a high-level view of the proposed changes as
> well as summarized guarantees. Initial draft of the detailed implementation
> design is described in this Google doc:
>
> https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8
>
> We would love to hear your comments and suggestions.
>
> Thanks,
>
> -- Guozhang