Here's my understanding: when log compaction kicks in, the system time at
the moment would be larger than the message timestamp to be compacted, so
the modification on the batch timestamp would practically be increasing its
value, and hence the deltas for each inner message would be negative to
maintain their actual timestamp. Depending on the time diff between the
actual timestamp of the message and the time when log compaction happens,
this negative delta can be large or small since it not long depends on the
cleaner thread wakeup frequency but also dirty ratio etc.

With varInt encoding, the num.bytes needed for encode an int varies from 1
to 5 bytes; before compaction, the deltas should be relatively small
positive values compared with the base timestamp, and hence most likely 1
or 2 bytes needed to encode, after compaction, the deltas could be
relatively large negative values that may take more bytes to encode. With a
record batch of 512 in practice, and suppose after compaction each record
would take 2 more byte for encoding deltas, that would be 1K more per
batch. Usually it would not be too big of an issue with reasonable sized
message, but I just wanted to point out this as a potential regression.


Guozhang

On Wed, Oct 16, 2019 at 9:36 PM Richard Yu <yohan.richard...@gmail.com>
wrote:

> Hi Guozhang,
>
> Your understanding basically is on point.
>
> I haven't looked into the details for what happens if we change the base
> timestamp and how its calculated, so I'm not clear on how small the delta
> (or big) is.
> To be fair, would the the delta size pose a big problem if it takes up more
> bytes to encode?
>
> Cheers,
> Richard
>
> On Wed, Oct 16, 2019 at 7:36 PM Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hello Richard,
> >
> > Thanks for the KIP, I just have one clarification regarding "So the idea
> is
> > to set the base timestamp to the delete horizon and adjust the deltas
> > accordingly." My understanding is that during compaction, for each
> > compacted new segment, we would set its base offset of each batch as the
> > delete horizon, which is the "current system time that cleaner has seen
> so
> > far", and adjust the delta timestamps of each of the inner records of the
> > batch (and practically the deltas will be all negative)?
> >
> > If that's case, could we do some back of the envelope calculation on
> what's
> > the possible smallest case of deltas? Note that since we use varInt for
> > delta values for each record, the smaller the negative delta, that would
> > take more bytes to encode.
> >
> > Guozhang
> >
> > On Wed, Oct 16, 2019 at 6:48 PM Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > +1. Thanks Richard.
> > >
> > > On Wed, Oct 16, 2019 at 10:04 AM Richard Yu <
> yohan.richard...@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Want to try to get this KIP wrapped up. So it would be great if we
> can
> > > get
> > > > some votes.
> > > >
> > > > Cheers,
> > > > Richard
> > > >
> > > > On Tue, Oct 15, 2019 at 12:58 PM Jun Rao <j...@confluent.io> wrote:
> > > >
> > > > > Hi, Richard,
> > > > >
> > > > > Thanks for the updated KIP. +1 from me.
> > > > >
> > > > > Jun
> > > > >
> > > > > On Tue, Oct 15, 2019 at 12:46 PM Richard Yu <
> > > yohan.richard...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > I've updated the link accordingly. :)
> > > > > > Here is the updated KIP link:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+and+transaction+markers+for+approximately+delete.retention.ms+milliseconds
> > > > > >
> > > > > > Cheers,
> > > > > > Richard
> > > > > >
> > > > > > On Mon, Oct 14, 2019 at 5:12 PM Jun Rao <j...@confluent.io>
> wrote:
> > > > > >
> > > > > > > Hi, Richard,
> > > > > > >
> > > > > > > Thanks for the KIP. Looks good to me overall. A few minor
> > comments
> > > > > below.
> > > > > > >
> > > > > > > 1. Could you change the title from "Retain tombstones" to
> "Retain
> > > > > > > tombstones and transaction markers" to make it more general?
> > > > > > >
> > > > > > > 2. Could you document which bit in the batch attribute will be
> > used
> > > > for
> > > > > > the
> > > > > > > new flag? The current format of the batch attribute is the
> > > following.
> > > > > > >
> > > > > > > *
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> -------------------------------------------------------------------------------------------------
> > > > > > > *  | Unused (6-15) | Control (5) | Transactional (4) |
> Timestamp
> > > Type
> > > > > > > (3) | Compression Type (0-2) |
> > > > > > >
> > > > > > > *
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> -------------------------------------------------------------------------------------------------
> > > > > > >
> > > > > > > 3. Could you provide the reasons for the rejected proposals?
> For
> > > > > > > proposal 1, one reason is that it doesn't cover the transaction
> > > > > > > markers. For proposal 2, one reason is that the interval record
> > > > header
> > > > > > > could be exposed to the clients.
> > > > > > >
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Oct 14, 2019 at 4:42 PM Richard Yu <
> > > > yohan.richard...@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > The discussion for KIP-534 seems to have concluded.
> > > > > > > > So I wish to vote this in so that we can get it done. Its a
> > small
> > > > bug
> > > > > > > fix.
> > > > > > > > :)
> > > > > > > >
> > > > > > > > Below is the KIP link:
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+for+approximately+delete.retention.ms+milliseconds
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Richard
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


-- 
-- Guozhang

Reply via email to