@jun - good proposal. i was willing to concede that read-uncommitted was impossible under my proposal but if LSO/NSO is introduced is becomes possible.
On Tue, Jan 3, 2017 at 7:50 AM, Jun Rao <j...@confluent.io> wrote: > Just to follow up on Radai's idea of pushing the buffering logic to the > broker. It may be possible to do this efficiently if we assume aborted > transactions are rare. The following is a draft proposal. For each > partition, the broker maintains the last stable offset (LSO) as described > in the document, and only exposes messages up to this point if the reader > is in the read-committed mode. When a new stable offset (NSO) is > determined, if there is no aborted message in this window, the broker > simply advances the LSO to the NSO. If there is at least one aborted > message, the broker first replaces the current log segment with new log > segments excluding the aborted messages and then advances the LSO. To make > the replacement efficient, we can replace the current log segment with 3 > new segments: (1) a new "shadow" log segment that simply references the > portion of the current log segment from the beginning to the LSO, (2) a log > segment created by copying only committed messages between the LSO and the > NSO, (3) a new "shadow" log segment that references the portion of the > current log segment from the NSO (open ended). Note that only (2) involves > real data copying. If aborted transactions are rare, this overhead will be > insignificant. Assuming that applications typically don't abort > transactions, transactions will only be aborted by transaction coordinators > during hard failure of the producers, which should be rare. > > This way, the consumer library's logic will be simplified. We can still > expose uncommitted messages to readers in the read-uncommitted mode and > therefore leave the door open for speculative reader in the future. > > Thanks, > > Jun > > > On Wed, Dec 21, 2016 at 10:44 AM, Apurva Mehta <apu...@confluent.io> > wrote: > > > Hi Joel, > > > > The alternatives are embedded in the 'discussion' sections which are > spread > > throughout the google doc. > > > > Admittedly, we have not covered high level alternatives like those which > > have been brought up in this thread. In particular, having a separate log > > for transactional mesages and also having multiple producers participate > in > > a single transaction. > > > > This is an omission which we will correct. > > > > Thanks, > > Apurva > > > > On Wed, Dec 21, 2016 at 10:34 AM, Joel Koshy <jjkosh...@gmail.com> > wrote: > > > > > > > > > > > > > > @Joel, > > > > > > > > I read over your wiki, and apart from the introduction of the notion > of > > > > journal partitions --whose pros and cons are already being > discussed-- > > > you > > > > also introduce the notion of a 'producer group' which enables > multiple > > > > producers to participate in a single transaction. This is completely > > > > opposite of the model in the KIP where a transaction is defined by a > > > > producer id, and hence there is a 1-1 mapping between producers and > > > > transactions. Further, each producer can have exactly one in-flight > > > > transaction at a time in the KIP. > > > > > > > > > > Hi Apurva - yes I did notice those differences among other things :) > > BTW, I > > > haven't yet gone through the google-doc carefully but on a skim it does > > not > > > seem to contain any rejected alternatives as the wiki states. > > > > > >