Hi Radai

> 1. how do you handle possible duplications caused by the "special"
> producer timing-out/retrying? are you explicitly relying on the
> "exactly once" sequencing?
A duplicate ProduceRequest would be rejected with an INVALID_PRODUCE_OFFSET error.
We envision using an idempotent producer for cluster replication, but do not
require it.

> 2. what about the combination of log compacted topics + replicator
> downtime? by the time the replicator comes back up there might be
> "holes" in the source offsets (some msgs might have been compacted
> out)? how is that recoverable?

> 3. similarly, what if you try and fire up replication on a non-empty
> source topic? does the kip allow for offsets starting at some
> arbitrary X > 0 ? or would this have to be designed from the start.

Neither case poses a problem. As mentioned in the KIP, each producer batch
must not contain offset gaps, but gaps can exist between batches.
The companion PR has an implementation with tests that cover these cases.

> and lastly, since this KIP seems to be designed for active-passive
> failover (there can be no produce traffic except the replicator)
> wouldn't a solution based on seeking to a time offset be more generic?
> your producers could checkpoint the last (say log append) timestamp of
> records they've seen, and when restoring in the remote site seek to
> those timestamps (which will be metadata in their committed offsets) -
> assuming replication takes > 0 time you'd need to handle some dups,
> but every kafka consumer setup needs to know how to handle those
> anyway. can you please clarify?

We do not expect any cooperation from users' applications.

thanks!
E&M

> On Fri, Nov 23, 2018 at 2:27 AM Edoardo Comar <eco...@uk.ibm.com> wrote:
> >
> > Hi Stanislav
> >
> > > > The flag is needed to distinguish a batch with a desired base offset of 0,
> > > > from a regular batch for which offsets need to be generated.
> > > If the producer can provide offsets, why not provide a base offset of 0?
> >
> > a regular batch (for which offsets are generated by the broker on write)
> > is sent with a base offset of 0.
> > How could you distinguish it from a batch where you *want* the first
> > record to be written at offset 0 (i.e. be the first in the partition and
> > be rejected if there are records on the log already) ?
> > We wanted to avoid a "deep" inspection (and potentially decompression) of
> > the records.
> >
> > For the replicator use case, a single produce request where all the data
> > is to be assumed with offset,
> > or all without offsets, seems to suffice,
> > So we added only a toplevel flag, not a per-topic-partition one.
> >
> > Thanks for your interest !
> > cheers
> > Edo
> > --------------------------------------------------
> >
> > Edoardo Comar
> >
> > IBM Event Streams
> > IBM UK Ltd, Hursley Park, SO21 2JN
> >
> >
> > Stanislav Kozlovski <stanis...@confluent.io> wrote on 22/11/2018 22:32:42:
> >
> > > From: Stanislav Kozlovski <stanis...@confluent.io>
> > > To: dev@kafka.apache.org
> > > Date: 22/11/2018 22:33
> > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > Cluster Replication
> > >
> > > Hey Edo & Mickael,
> > >
> > > > The flag is needed to distinguish a batch with a desired base offset of 0,
> > > > from a regular batch for which offsets need to be generated.
> > > If the producer can provide offsets, why not provide a base offset of 0?
> > >
> > > > (I am reading your post thinking about
> > > > partitions rather than topics).
> > > Yes, I meant partitions. Sorry about that.
> > >
> > > Thanks for answering my questions :)
> > >
> > > Best,
> > > Stanislav
> > >
> > > On Thu, Nov 22, 2018 at 5:28 PM Edoardo Comar <eco...@uk.ibm.com> wrote:
> > >
> > > > Hi Stanislav,
> > > >
> > > > you're right we envision the replicator use case to have a single producer
> > > > with offsets per partition (I am reading your post thinking about
> > > > partitions rather than topics).
> > > >
> > > > If a regular producer was to send its own records at the same time, it's
> > > > very likely that the one sending with an offset will fail because of
> > > > invalid offsets.
> > > > Same if two producers were sending with offsets, likely both would then
> > > > fail.
> > > >
> > > > > Does it make sense to *lock* the topic from other producers while there is
> > > > > one that uses offsets?
> > > >
> > > > You could do that with ACL permissions if you wanted, I don't think it
> > > > needs to be mandated by changing the broker logic.
> > > >
> > > > > Since we are tying the produce-with-offset request to the ACL, do we need
> > > > > the `use_offset` field in the produce request? Maybe we make it mandatory
> > > > > for produce requests with that ACL to have offsets.
> > > >
> > > > The flag is needed to distinguish a batch with a desired base offset of 0,
> > > > from a regular batch for which offsets need to be generated.
> > > > I would not restrict a principal to only send-with-offsets (by making that
> > > > mandatory via the ACL).
> > > >
> > > > Thanks
> > > > Edo & Mickael
> > > >
> > > > --------------------------------------------------
> > > >
> > > > Edoardo Comar
> > > >
> > > > IBM Event Streams
> > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > >
> > > >
> > > > Stanislav Kozlovski <stanis...@confluent.io> wrote on 22/11/2018 16:17:11:
> > > >
> > > > > From: Stanislav Kozlovski <stanis...@confluent.io>
> > > > > To: dev@kafka.apache.org
> > > > > Date: 22/11/2018 16:17
> > > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > > > Cluster Replication
> > > > >
> > > > > Hey Edurdo, thanks for the KIP!
> > > > >
> > > > > I have some questions, apologies if they are naive:
> > > > > Is this intended to work for a single producer use case only?
> > > > > How would it work if two producers were producing to the same topic with
> > > > > offsets?
> > > > > How would it work if two producers, one with offsets and one without were
> > > > > producing to a topic?
> > > > > Does it make sense to *lock* the topic from other producers while there is
> > > > > one that uses offsets?
> > > > >
> > > > > Since we are tying the produce-with-offset request to the ACL, do we need
> > > > > the `use_offset` field in the produce request? Maybe we make it mandatory
> > > > > for produce requests with that ACL to have offsets.
> > > > >
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > > > On Wed, Nov 21, 2018 at 5:14 PM Edoardo Comar <eco...@uk.ibm.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > > we've opened a KIP to improve data replication between Kafka clusters:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication
> > > > > >
> > > > > > We'd like to start a discussion, please post your feedback in this thread.
> > > > > >
> > > > > > Thank you
> > > > > > Edo and Mickael
> > > > > >
> > > > > > --------------------------------------------------
> > > > > >
> > > > > > Edoardo Comar
> > > > > >
> > > > > > IBM Event Streams
> > > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > > >
> > > > > > Unless stated otherwise above:
> > > > > > IBM United Kingdom Limited - Registered in England and Wales with number
> > > > > > 741598.
> > > > > > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
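
The offset rules discussed in this thread (no gaps inside a batch, gaps allowed between batches, a retried duplicate rejected, and a top-level flag to tell "write at offset 0" apart from "generate offsets for me") can be illustrated with a small toy model. This is only a sketch under stated assumptions and is not the code from the KIP or its companion PR: `PartitionLog`, `append_regular`, `append_with_offsets` and `InvalidProduceOffset` are hypothetical names, the two separate methods stand in for the flag being unset or set, and the rule "the base offset must not be lower than the current log end offset" is an assumed reading of the replies above.

```python
from dataclasses import dataclass, field
from typing import List


class InvalidProduceOffset(Exception):
    """Toy stand-in for the INVALID_PRODUCE_OFFSET error named in the thread."""


@dataclass
class PartitionLog:
    """Hypothetical in-memory model of one partition; not Kafka broker code."""

    offsets: List[int] = field(default_factory=list)

    @property
    def log_end_offset(self) -> int:
        # Next offset that would be assigned to a regular batch.
        return self.offsets[-1] + 1 if self.offsets else 0

    def append_regular(self, record_count: int) -> int:
        """Regular batch (flag unset): the log generates the offsets."""
        base = self.log_end_offset
        self.offsets.extend(range(base, base + record_count))
        return base

    def append_with_offsets(self, batch_offsets: List[int]) -> int:
        """Batch sent with the flag set: the producer supplies the offsets."""
        if not batch_offsets:
            raise ValueError("a batch needs at least one record")
        # Rule from the thread: no offset gaps inside a single batch.
        for prev, cur in zip(batch_offsets, batch_offsets[1:]):
            if cur != prev + 1:
                raise InvalidProduceOffset(f"gap inside batch at {prev} -> {cur}")
        # Assumed rule: a batch may start past the log end (a gap between
        # batches) but never before it, so duplicates are rejected.
        if batch_offsets[0] < self.log_end_offset:
            raise InvalidProduceOffset(
                f"base offset {batch_offsets[0]} < log end {self.log_end_offset}"
            )
        self.offsets.extend(batch_offsets)
        return batch_offsets[0]


if __name__ == "__main__":
    log = PartitionLog()
    log.append_with_offsets([5, 6, 7])   # replication starting at X > 0
    log.append_with_offsets([10, 11])    # "hole" left by compaction
    try:
        log.append_with_offsets([10, 11])  # retried duplicate
    except InvalidProduceOffset as err:
        print("rejected:", err)
```

In this model a producer that wants its first record at offset 0 calls `append_with_offsets([0, ...])` and is rejected if the log already holds records, whereas `append_regular` always succeeds; this mirrors why the flag is needed in addition to the base offset field.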