Retitled the thread to conform to the common format.

On Fri, Feb 5, 2021 at 4:00 PM Ning Zhang <ning2008w...@gmail.com> wrote:
> Hello Henry,
>
> This is a very interesting proposal.
> https://issues.apache.org/jira/browse/KAFKA-10728 reflects the similar
> concern of re-compressing data in mirror maker.
>
> Probably one thing may need to clarify is: how "shallow" mirroring is only
> applied to mirrormaker use case, if the changes need to be made on generic
> consumer and producer (e.g. by adding `fetch.raw.bytes` and
> `send.raw.bytes` to producer and consumer config)
>
> On 2021/02/05 00:59:57, Henry Cai <h...@pinterest.com.INVALID> wrote:
> > Dear Community members,
> >
> > We are proposing a new feature to improve the performance of Kafka mirror
> > maker:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-712%3A+Shallow+Mirroring
> >
> > The current Kafka MirrorMaker process (with the underlying Consumer and
> > Producer library) uses significant CPU cycles and memory to
> > decompress/recompress, deserialize/re-serialize messages and copy multiple
> > times of messages bytes along the mirroring/replicating stages.
> >
> > The KIP proposes a *shallow mirror* feature which brings back the shallow
> > iterator concept to the mirror process and also proposes to skip the
> > unnecessary message decompression and recompression steps. We argue in
> > many cases users just want a simple replication pipeline to replicate the
> > message as it is from the source cluster to the destination cluster. In
> > many cases the messages in the source cluster are already compressed and
> > properly batched, users just need an identical copy of the message bytes
> > through the mirroring without any transformation or repartitioning.
> >
> > We have a prototype implementation in house with MirrorMaker v1 and
> > observed *CPU usage dropped from 50% to 15%* for some mirror pipelines.
> >
> > We name this feature: *shallow mirroring* since it has some resemblance to
> > the old Kafka 0.7 namesake feature but the implementations are not quite
> > the same. ‘*Shallow*’ means 1. we *shallowly* iterate RecordBatches inside
> > MemoryRecords structure instead of deep iterating records inside
> > RecordBatch; 2. We *shallowly* copy (share) pointers inside ByteBuffer
> > instead of deep copying and deserializing bytes into objects.
> >
> > Please share discussions/feedback along this email thread.
> >
>

--
Thanks!
--Vahid