A magic byte bump would be required for the addition of new fields; or
removal of existing fields. Changing the interpretation of an existing
field (e.g., switching from absolute to relative offsets) almost always
needs a magic byte bump as well.

One concern Nacho alluded to (I think) is if a newer client sends a
tombstone with the tombstone attribute set (and non-null value field) to an
older broker. While it is true that a higher magic byte would cause the
(older) broker to reject the message, the fact that it is sending a newer
version of the produce request which the broker does not accept will also
cause it to be rejected. So that safeguard already exists for the broker to
indicate that it does not support flag-based compaction.

So strictly speaking we could do without a magic byte bump. However in this
case, we are also changing the interpretation of the value field. Up until
now, a null entry in the value field of the message would cause it to be
interpreted as a tombstone. With this proposal, that is no longer the case.
In fact if an older client issues a fetch request for a tombstone on a
newer broker we should decide on whether we should down-convert it to the
older format and consider replacing the value field (if non-null) to be
null. Another weird scenario is if there is a non-tombstone message with a
null value. An older client that fetches this message would be unable to
tell whether it is a tombstone or not.

Also, under "rejected alternatives" - I'm not a huge fan of using headers
(should they materialize) to mark tombstones. I strongly prefer the
attribute bit or even the existing mechanism of nulls in payloads over
custom headers.

Thanks,

Joel

On Wed, Oct 26, 2016 at 11:23 AM, Magnus Edenhill <mag...@edenhill.se>
wrote:

> Hi Renu,
>
> that is not completely true, the LZ4 compression codec was added without a
> MagicByte bump.
> (LZ4 might be a bad example though since this feature was added without
> updating the protocol docs..)
>
> Unless the broker needs the MagicByte internally (for translating old logs
> on disk or whatever)
> I dont see a need for bumping the MagicByte when just adding a bit
> *definition* to an existing field.
>
> With just one MagicByte bump so far it is hard to say what is right or
> wrong, but this might be
> a good time for us to decide on some rules.
>
> I dont have a strong opinion, either way is fine with me.
>
> Regards,
> Magnus
>
>
> 2016-10-26 20:01 GMT+02:00 Renu Tewari <tewa...@gmail.com>:
>
> > +1 @Magnus.
> > It is also in line with traditional use of the magic field to indicate a
> > change in the format of the message. Thus a change in the magic field
> > indicates a different "schema" which in this case would reflect adding a
> > new field, removing a field or changing the type of fields etc.
> >
> > The version number bump does not change the message format but just the
> > interpretation of the values.
> >
> > Going with what @Michael suggested on keeping it simple we don't need to
> a
> > add a new field to indicate a tombstone and therefore require a change in
> > the magic byte field. The attribute bit is sufficient for indicating the
> > new interpretation of attribute.bit5 to indicate a tombstone.
> >
> > Bumping the version number and changing the attribute bit keeps it
> simple.
> >
> >
> > Thanks
> > Renu
> >
> >
> >
> > On Wed, Oct 26, 2016 at 10:09 AM, Mayuresh Gharat <
> > gharatmayures...@gmail.com> wrote:
> >
> > > I see the reasoning Magnus described.
> > > If you check the docs https://kafka.apache.org/
> > documentation#messageformat
> > > ,
> > > it says :
> > >
> > > 1 byte "magic" identifier to allow format changes, value is 0 or 1
> > >
> > > Moreover as per comments in the code :
> > > /**
> > >    * The "magic" value
> > >    * When magic value is 0, the message uses absolute offset and does
> not
> > > have a timestamp field.
> > >    * When magic value is 1, the message uses relative offset and has a
> > > timestamp field.
> > >    */
> > >
> > > Since timeStamp was added as an actual field we bumped the the magic
> byte
> > > to 1 for this change.
> > > But since we are not adding an actual field, we can do away with
> bumping
> > up
> > > the magic byte.
> > >
> > > If we really want to go the standard route of bumping up the magic byte
> > for
> > > any change to message format we should actually add a new field for
> > > handling log compaction instead of just using the attribute field,
> which
> > > might sound like an overkill.
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Oct 26, 2016 at 1:32 AM, Magnus Edenhill <mag...@edenhill.se>
> > > wrote:
> > >
> > > > 2016-10-25 21:36 GMT+02:00 Nacho Solis <nso...@linkedin.com.invalid
> >:
> > > >
> > > > > I think you probably require a MagicByte bump if you expect correct
> > > > > behavior of the system as a whole.
> > > > >
> > > > > From a client perspective you want to make sure that when you
> > deliver a
> > > > > message that the broker supports the feature you're expecting
> > > > > (compaction).  So, depending on the behavior of the broker on
> > > > encountering
> > > > > a previously undefined bit flag I would suggest making some change
> to
> > > > make
> > > > > certain that flag-based compaction is supported.  I'm going to
> guess
> > > that
> > > > > the MagicByte would do this.
> > > > >
> > > >
> > > > I dont believe this is needed since it is already attributed through
> > the
> > > > request's API version.
> > > >
> > > > Producer:
> > > >  * if a client sends ProduceRequest V4 then attributes.bit5
> indicates a
> > > > tombstone
> > > >  * if a clients sends ProduceRequest <V4 then attributes.bit5 is
> > ignored
> > > > and value==null indicates a tombstone
> > > >  * in both cases the on-disk messages are stored with attributes.bit5
> > (I
> > > > assume?)
> > > >
> > > > Consumer:
> > > >  * if a clients sends FetchRequest V4 messages are sendfile():ed
> > directly
> > > > from disk (with attributes.bit5)
> > > >  * if a client sends FetchRequest <V4 messages are slowpathed and
> > > > translated from attributes.bit5 to value=null as required.
> > > >
> > > >
> > > > That's my understanding anyway, please correct me if I'm wrong.
> > > >
> > > > /Magnus
> > > >
> > > >
> > > >
> > > > > On Tue, Oct 25, 2016 at 10:17 AM, Magnus Edenhill <
> > mag...@edenhill.se>
> > > > > wrote:
> > > > >
> > > > > > It is safe to assume that a previously undefined attributes bit
> > will
> > > be
> > > > > > unset in protocol requests from existing clients, if not, such a
> > > client
> > > > > is
> > > > > > already violating the protocol and needs to be fixed.
> > > > > >
> > > > > > So I dont see a need for a MagicByte bump, both broker and client
> > has
> > > > the
> > > > > > information it needs to construct or parse the message according
> to
> > > > > request
> > > > > > version.
> > > > > >
> > > > > >
> > > > > > 2016-10-25 18:48 GMT+02:00 Michael Pearce <michael.pea...@ig.com
> >:
> > > > > >
> > > > > > > Hi Magnus,
> > > > > > >
> > > > > > > I was wondering if I even needed to change those also, as
> > > technically
> > > > > > > we’re just making use of a non used attribute bit, but im not
> > 100%
> > > > that
> > > > > > it
> > > > > > > be always false currently.
> > > > > > >
> > > > > > > If someone can say 100% it will already be set false with
> current
> > > and
> > > > > > > historic bit wise masking techniques used over the time, we
> could
> > > do
> > > > > away
> > > > > > > with both, and simply just start to use it. Unfortunately I
> don’t
> > > > have
> > > > > > that
> > > > > > > historic knowledge so was hoping it would be flagged up in this
> > > > > > discussion
> > > > > > > thread ☺
> > > > > > >
> > > > > > > Cheers
> > > > > > > Mike
> > > > > > >
> > > > > > > On 10/25/16, 5:36 PM, "Magnus Edenhill" <mag...@edenhill.se>
> > > wrote:
> > > > > > >
> > > > > > >     Hi Michael,
> > > > > > >
> > > > > > >     With the version bumps for Produce and Fetch requests, do
> you
> > > > > really
> > > > > > > need
> > > > > > >     to bump MagicByte too?
> > > > > > >
> > > > > > >     Regards,
> > > > > > >     Magnus
> > > > > > >
> > > > > > >
> > > > > > >     2016-10-25 18:09 GMT+02:00 Michael Pearce <
> > > michael.pea...@ig.com
> > > > >:
> > > > > > >
> > > > > > >     > Hi All,
> > > > > > >     >
> > > > > > >     > I would like to discuss the following KIP proposal:
> > > > > > >     > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > >     > 87+-+Add+Compaction+Tombstone+Flag
> > > > > > >     >
> > > > > > >     > This is off the back of the discussion on KIP-82  / KIP
> > > meeting
> > > > > > > where it
> > > > > > >     > was agreed to separate this issue and feature. See:
> > > > > > >     > http://mail-archives.apache.or
> g/mod_mbox/kafka-dev/201610.
> > > > > > >     > mbox/%3cCAJS3ho8OcR==EcxsJ8OP99pD2hz=iiGecWsv-
> > > > > > >     > EZsBsNyDcKr=g...@mail.gmail.com%3e
> > > > > > >     >
> > > > > > >     > Thanks
> > > > > > >     > Mike
> > > > > > >     >
> > > > > > >     > The information contained in this email is strictly
> > > > confidential
> > > > > > and
> > > > > > > for
> > > > > > >     > the use of the addressee only, unless otherwise
> indicated.
> > If
> > > > you
> > > > > > > are not
> > > > > > >     > the intended recipient, please do not read, copy, use or
> > > > disclose
> > > > > > to
> > > > > > > others
> > > > > > >     > this message or any attachment. Please also notify the
> > sender
> > > > by
> > > > > > > replying
> > > > > > >     > to this email or by telephone (+44(020 7896 0011) and
> then
> > > > delete
> > > > > > > the email
> > > > > > >     > and any copies of it. Opinions, conclusion (etc) that do
> > not
> > > > > relate
> > > > > > > to the
> > > > > > >     > official business of this company shall be understood as
> > > > neither
> > > > > > > given nor
> > > > > > >     > endorsed by it. IG is a trading name of IG Markets
> Limited
> > (a
> > > > > > company
> > > > > > >     > registered in England and Wales, company number 04008957)
> > and
> > > > IG
> > > > > > > Index
> > > > > > >     > Limited (a company registered in England and Wales,
> company
> > > > > number
> > > > > > >     > 01190902). Registered address at Cannon Bridge House, 25
> > > > Dowgate
> > > > > > > Hill,
> > > > > > >     > London EC4R 2YA. Both IG Markets Limited (register number
> > > > 195355)
> > > > > > > and IG
> > > > > > >     > Index Limited (register number 114059) are authorised and
> > > > > regulated
> > > > > > > by the
> > > > > > >     > Financial Conduct Authority.
> > > > > > >     >
> > > > > > >
> > > > > > >
> > > > > > > The information contained in this email is strictly
> confidential
> > > and
> > > > > for
> > > > > > > the use of the addressee only, unless otherwise indicated. If
> you
> > > are
> > > > > not
> > > > > > > the intended recipient, please do not read, copy, use or
> disclose
> > > to
> > > > > > others
> > > > > > > this message or any attachment. Please also notify the sender
> by
> > > > > replying
> > > > > > > to this email or by telephone (+44(020 7896 0011) and then
> delete
> > > the
> > > > > > email
> > > > > > > and any copies of it. Opinions, conclusion (etc) that do not
> > relate
> > > > to
> > > > > > the
> > > > > > > official business of this company shall be understood as
> neither
> > > > given
> > > > > > nor
> > > > > > > endorsed by it. IG is a trading name of IG Markets Limited (a
> > > company
> > > > > > > registered in England and Wales, company number 04008957) and
> IG
> > > > Index
> > > > > > > Limited (a company registered in England and Wales, company
> > number
> > > > > > > 01190902). Registered address at Cannon Bridge House, 25
> Dowgate
> > > > Hill,
> > > > > > > London EC4R 2YA. Both IG Markets Limited (register number
> 195355)
> > > and
> > > > > IG
> > > > > > > Index Limited (register number 114059) are authorised and
> > regulated
> > > > by
> > > > > > the
> > > > > > > Financial Conduct Authority.
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Nacho (Ignacio) Solis
> > > > > Kafka
> > > > > nso...@linkedin.com
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -Regards,
> > > Mayuresh R. Gharat
> > > (862) 250-7125
> > >
> >
>

Reply via email to