On the question of whether shallow mirroring applies only to MirrorMaker
v1: most of the code change is in the consumer and producer code paths,
and the change to MirrorMaker v1 itself is trivial.  We chose to modify
the consumer/producer path (instead of creating a new mirror product) so
that other use cases can use the feature as well.  The change to
MirrorMaker v2 should also be straightforward, but we don't have that
environment in house.  I think the community can easily port this change
to MirrorMaker v2.
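
To make the two senses of "shallow" in the proposal quoted below more
concrete, here is a minimal sketch that uses only java.nio.ByteBuffer. It
is not the KIP's code and does not use any Kafka classes: the
length-prefixed batch framing and the names ShallowCopySketch, frame,
shallowBatches and deepBatches are all hypothetical, chosen for
illustration only.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical framing, not Kafka's MemoryRecords format.
final class ShallowCopySketch {

    // Builds a buffer of "batches": a 4-byte length followed by the payload.
    static ByteBuffer frame(byte[]... payloads) {
        int size = 0;
        for (byte[] p : payloads) {
            size += 4 + p.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[] p : payloads) {
            buf.putInt(p.length).put(p);
        }
        buf.flip();
        return buf;
    }

    // Shallow iteration: each batch is a view over the source's memory.
    // Only the length prefix is read; payload bytes are neither copied
    // nor decoded.
    static List<ByteBuffer> shallowBatches(ByteBuffer source) {
        List<ByteBuffer> views = new ArrayList<>();
        ByteBuffer cursor = source.duplicate(); // own position, shared content
        while (cursor.remaining() >= 4) {
            int len = cursor.getInt();
            ByteBuffer view = cursor.slice();   // starts at the payload
            view.limit(len);                    // ends at the batch boundary
            views.add(view);
            cursor.position(cursor.position() + len);
        }
        return views;
    }

    // Deep copy, for contrast: every payload is copied into a new array.
    static List<byte[]> deepBatches(ByteBuffer source) {
        List<byte[]> copies = new ArrayList<>();
        ByteBuffer cursor = source.duplicate();
        while (cursor.remaining() >= 4) {
            byte[] copy = new byte[cursor.getInt()];
            cursor.get(copy);
            copies.add(copy);
        }
        return copies;
    }

    public static void main(String[] args) {
        ByteBuffer source = frame(new byte[] {1, 2}, new byte[] {3, 4, 5});
        List<ByteBuffer> views = shallowBatches(source);
        List<byte[]> copies = deepBatches(source);

        // Overwrite the first payload byte in the source buffer.
        source.put(4, (byte) 9);

        // The shallow view shares memory with the source; the deep copy
        // keeps the bytes it was given at copy time.
        System.out.println("view sees:  " + views.get(0).get(0));
        System.out.println("copy keeps: " + copies.get(0)[0]);
    }
}
```

The shallow path hands downstream code views of the same bytes, which is
the property that would let a mirror forward already compressed batches
without touching their contents; the deep path pays for an allocation and
a copy per batch.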



On Wed, Feb 10, 2021 at 12:58 PM Vahid Hashemian <vahid.hashem...@gmail.com>
wrote:

> Retitled the thread to conform to the common format.
>
> On Fri, Feb 5, 2021 at 4:00 PM Ning Zhang <ning2008w...@gmail.com> wrote:
>
> > Hello Henry,
> >
> > This is a very interesting proposal.
> > https://issues.apache.org/jira/browse/KAFKA-10728 reflects a similar
> > concern about re-compressing data in mirror maker.
> >
> > One thing that probably needs clarifying is how "shallow" mirroring
> > is applied only to the mirrormaker use case if the changes need to be
> > made to the generic consumer and producer (e.g. by adding
> > `fetch.raw.bytes` and `send.raw.bytes` to the producer and consumer
> > config).
> >
> > On 2021/02/05 00:59:57, Henry Cai <h...@pinterest.com.INVALID> wrote:
> > > Dear Community members,
> > >
> > > We are proposing a new feature to improve the performance of Kafka
> > > mirror maker:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-712%3A+Shallow+Mirroring
> > >
> > > The current Kafka MirrorMaker process (with the underlying Consumer and
> > > Producer library) uses significant CPU cycles and memory to
> > > decompress/recompress and deserialize/re-serialize messages, and to
> > > copy message bytes multiple times along the mirroring/replication
> > > stages.
> > >
> > > The KIP proposes a *shallow mirror* feature which brings back the
> > > shallow iterator concept to the mirror process and also proposes to
> > > skip the unnecessary message decompression and recompression steps.
> > > We argue that in many cases users just want a simple replication
> > > pipeline that replicates messages as they are from the source cluster
> > > to the destination cluster.  In many cases the messages in the source
> > > cluster are already compressed and properly batched; users just need
> > > an identical copy of the message bytes through the mirroring, without
> > > any transformation or repartitioning.
> > >
> > > We have a prototype implementation in house with MirrorMaker v1 and
> > > observed *CPU usage dropped from 50% to 15%* for some mirror pipelines.
> > >
> > > We name this feature *shallow mirroring* since it has some resemblance
> > > to the old Kafka 0.7 namesake feature, but the implementations are not
> > > quite the same.  ‘*Shallow*’ means: 1. we *shallowly* iterate
> > > RecordBatches inside the MemoryRecords structure instead of deep
> > > iterating records inside each RecordBatch; 2. we *shallowly* copy
> > > (share) pointers inside the ByteBuffer instead of deep copying and
> > > deserializing bytes into objects.
> > >
> > > Please share your feedback and discussion on this email thread.
> > >
> >
>
>
> --
>
> Thanks!
> --Vahid
>
