Hi All,

Following the discussion in the thread below, the final changes have now been
made to the KIP wiki.

We would now like to put the following KIP to a vote:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-87+-+Add+Compaction+Tombstone+Flag

This KIP adds a distinct “tombstone” flag as a compaction attribute, instead of
relying on a null value, which allows delete messages to carry a non-null value.
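
As a rough illustration of the idea (a sketch only, not the Kafka implementation): the delete marker becomes a bit in the message attributes, so a delete message can still carry a value. The class name, the helper names and the bit position below are assumptions made for this example; the KIP wiki defines the actual wire format.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: models delete semantics that come from an attribute bit
// rather than from a null value.
final class TombstoneFlagSketch {
    // Hypothetical bit position in the attributes byte (assumption, see the KIP wiki).
    static final byte TOMBSTONE_MASK = 0x10;

    // Returns the attributes byte with the tombstone bit set or cleared,
    // leaving all other bits untouched.
    static byte withTombstone(byte attributes, boolean tombstone) {
        return (byte) (tombstone ? (attributes | TOMBSTONE_MASK) : (attributes & ~TOMBSTONE_MASK));
    }

    // True when the tombstone bit is set, regardless of the value.
    static boolean isTombstone(byte attributes) {
        return (attributes & TOMBSTONE_MASK) != 0;
    }

    public static void main(String[] args) {
        byte attributes = withTombstone((byte) 0, true);
        // A delete marker that still carries a non-null value.
        byte[] value = "deleted-by=audit-job".getBytes(StandardCharsets.UTF_8);
        System.out.println("tombstone=" + isTombstone(attributes) + ", valueLength=" + value.length);
    }
}
```

Here the value plays no part in deciding whether the message is a delete; only the flag does.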

Many thanks,
Michael



On 22/11/2016, 15:52, "Michael Pearce" <michael.pea...@ig.com> wrote:

    Hi Mayuresh,
    
    LGTM. I've just made one small adjustment: updating the wire protocol to show
    the magic byte bump.
    
    Do we think we’re good to put to a vote? Are there any other bits needing
    discussion?
    
    Cheers
    Mike
    
    On 21/11/2016, 18:26, "Mayuresh Gharat" <gharatmayures...@gmail.com> wrote:
    
        Hi Michael,
    
        I have updated the migration section of the KIP. Can you please take a 
look?
    
        Thanks,
    
        Mayuresh
    
        On Fri, Nov 18, 2016 at 9:07 AM, Mayuresh Gharat 
<gharatmayures...@gmail.com
        > wrote:
    
        > Hi Michael,
        >
        > That, whilst sending a tombstone with a non-null value, the consumer can
        > expect to receive the non-null message only in step (3). Is this correct?
        > ---> I do agree with you here.
        >
        > Becket, Ismael: can you review the migration plan listed above, which uses
        > the magic byte?
        >
        > Thanks,
        >
        > Mayuresh
        >
        > On Fri, Nov 18, 2016 at 8:58 AM, Michael Pearce 
<michael.pea...@ig.com>
        > wrote:
        >
        >> Many thanks for this Mayuresh. I don't have any objections.
        >>
        >> I assume we should state:
        >>
        >> That, whilst sending a tombstone with a non-null value, the consumer can
        >> expect to receive the non-null message only in step (3). Is this correct?
        >>
        >> Cheers
        >> Mike
        >>
        >>
        >>
        >> Sent using OWA for iPhone
        >> ________________________________________
        >> From: Mayuresh Gharat <gharatmayures...@gmail.com>
        >> Sent: Thursday, November 17, 2016 5:18:41 PM
        >> To: dev@kafka.apache.org
        >> Subject: Re: [DISCUSS] KIP-87 - Add Compaction Tombstone Flag
        >>
        >> Hi Ismael,
        >>
        >> Thanks for the explanation.
        >> I especially like the part where you mentioned that we can get rid of the
        >> older null value support for log compaction later on:
        >> We can't change semantics of the message format without having a long
        >> transition period. And we can't rely on people reading documentation or
        >> acting on a warning for something so fundamental. As such, my take is
        >> that we need to bump the magic byte. The good news is that we don't have
        >> to support all versions forever. We have said that we will support
        >> direct upgrades for 2 years. That means that message format version n
        >> could, in theory, be removed 2 years after it is introduced.
        >>
        >> Just a heads up: even without bumping the magic byte, we will *NOT* lose
        >> zero copy, because client(x+1) in my explanation above will internally
        >> convert a null value to have the tombstone bit set, and a set tombstone
        >> bit to have a null value, and by the time we move to version (x+2) the
        >> clients will have upgraded.
        >> Obviously, if we support a request from consumer(x), we will lose zero
        >> copy, but that is also the case with the magic byte bump.
        >>
        >> But if the magic byte bump makes the transition easier for the reasons
        >> you explained above, I am OK with it, since we will reach the end goal
        >> down the road :)
        >>
        >> On a side note, can we update the doc here on magic byte to say that "*it
        >> should be bumped whenever the message format is changed or the
        >> interpretation of message format (usage of the reserved bits as well) is
        >> changed*".
        >>
        >>
        >> Hi Michael,
        >>
        >> Here is the update plan that we discussed offline yesterday :
        >>
        >> Currently the magic-byte, which corresponds to the
        >> "message.format.version", is set to 1.
        >>
        >> 1) On broker it will be set to 1 initially.
        >>
        >> 2) When a producer client sends a message with magic-byte = 2, since the
        >> broker is on magic-byte = 1, we will down convert it, which means if the
        >> tombstone bit is set, the value will be set to null. A consumer
        >> understanding magic-byte = 1 will still work with this. A consumer
        >> working with magic-byte = 2 will also be able to understand this, since
        >> it understands the tombstone.
        >> Now there is still the question of supporting a non-tombstone and null
        >> value from a producer client with magic-byte = 2. *(I am not sure if we
        >> should support this. Ismael/Becket can comment here.)*
        >>
        >> 3) When almost all the clients have upgraded, the message.format.version
        >> on the broker can be changed to 2, at which point the down conversion in
        >> the above step will not happen. If at this point we get a consumer
        >> request from an older consumer, we might have to down convert, in which
        >> case we lose zero copy, but these cases should be rare.
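
The down conversion described in steps 2) and 3) above could be sketched roughly as follows. This is an illustration under assumptions, not broker code: the `Message` class, `downConvert` and the tombstone bit position are hypothetical names and values.

```java
// Illustrative only: down-converting a magic-byte 2 message to magic-byte 1.
final class DownConversionSketch {
    // Hypothetical bit position in the attributes byte (assumption).
    static final byte TOMBSTONE_MASK = 0x10;

    // Minimal stand-in for a message; not the real Kafka message class.
    static final class Message {
        final byte magic;
        final byte attributes;
        final byte[] key;
        final byte[] value;

        Message(byte magic, byte attributes, byte[] key, byte[] value) {
            this.magic = magic;
            this.attributes = attributes;
            this.key = key;
            this.value = value;
        }
    }

    // If the tombstone bit is set, the value is set to null so that a
    // magic-byte 1 consumer still sees a delete. Any non-null value carried by
    // the tombstone is dropped in the process.
    static Message downConvert(Message m) {
        if (m.magic < 2) {
            return m; // already in the older format
        }
        boolean tombstone = (m.attributes & TOMBSTONE_MASK) != 0;
        byte attributes = (byte) (m.attributes & ~TOMBSTONE_MASK);
        // Note: a non-tombstone message with a null value would be read as a
        // delete by magic-byte 1 consumers; the thread leaves that case open.
        return new Message((byte) 1, attributes, m.key, tombstone ? null : m.value);
    }

    public static void main(String[] args) {
        Message in = new Message((byte) 2, TOMBSTONE_MASK, new byte[] {1}, new byte[] {42});
        Message out = downConvert(in);
        System.out.println("magic=" + out.magic + ", valueIsNull=" + (out.value == null));
    }
}
```

Because the message has to be rewritten, this is the path on which zero copy is lost.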
        >>
        >> Becket, can you review this plan and add more details if I have missed
        >> something or got it wrong, before we put it on the KIP?
        >>
        >> Thanks,
        >>
        >> Mayuresh
        >>
        >> On Wed, Nov 16, 2016 at 11:07 PM, Michael Pearce 
<michael.pea...@ig.com>
        >> wrote:
        >>
        >> > Thanks guys, for discussing this offline and getting some consensus.
        >> >
        >> > So that it's clear to myself and others what is proposed now (I think I
        >> > understand, but want to make sure):
        >> >
        >> > Could I ask that you either update the KIP directly to detail the
        >> > migration strategy, or (re-)state in this thread the migration strategy
        >> > based on a magic byte that you discussed and agreed offline?
        >> >
        >> >
        >> > The main original driver for the KIP was to support compaction where the
        >> > value isn't null, based on the discussions in the KIP-82 thread.
        >> >
        >> > We should be able to support non-tombstone + null value by the
        >> > completion of the KIP. As we noted when discussing this KIP, having logic
        >> > based on a null value isn't very clean, and a flag also separates the
        >> > concerns.
        >> >
        >> > As discussed already, though, we can split this into KIP-87a and KIP-87b.
        >> >
        >> > KIP-87a would deliver the following on a compacted topic (to address the
        >> > immediate issues):
        >> > * tombstone + null value
        >> > * tombstone + non-null value
        >> > * non-tombstone + non-null value
        >> >
        >> > Then, once KIP-87a is completed, we can discuss options for how we
        >> > support the second part, KIP-87b, to deliver:
        >> > * non-tombstone + null value
        >> >
        >> > Cheers
        >> > Mike
        >> >
        >> >
        >> >
        >> > ________________________________________
        >> > From: Becket Qin <becket....@gmail.com>
        >> > Sent: Thursday, November 17, 2016 1:43 AM
        >> > To: dev@kafka.apache.org
        >> > Subject: Re: [DISCUSS] KIP-87 - Add Compaction Tombstone Flag
        >> >
        >> > Renu, Mayuresh and I had an offline discussion, and the following is a
        >> > brief summary.
        >> >
        >> > 1. We agreed that not bumping up the magic value may result in losing
        >> > zero copy during migration.
        >> > 2. Given that bumping up the magic value is almost free and has the
        >> > benefit of avoiding a potential performance issue, it is probably worth
        >> > doing.
        >> >
        >> > One issue we still need to think about is whether we want to support a
        >> > non-tombstone message with null value.
        >> > Currently it is not supported by Kafka. If we allow a non-tombstone
        >> > null value message to exist after KIP-87, the problem is that such a
        >> > message will not be supported by consumers prior to KIP-87, because a
        >> > null value will always be interpreted as a tombstone.
        >> >
        >> > One option is that we keep the current way, i.e. do not support such
        >> > a message. It would be good to know if there is a concrete use case for
        >> > such a message. If there is not, we can probably just not support it.
        >> >
        >> > Thanks,
        >> >
        >> > Jiangjie (Becket) Qin
        >> >
        >> >
        >> >
        >> > On Wed, Nov 16, 2016 at 1:28 PM, Mayuresh Gharat <
        >> > gharatmayures...@gmail.com
        >> > > wrote:
        >> >
        >> > > Hi Ismael,
        >> > >
        >> > > Here is what I can think of for a migration plan with up conversion:
        >> > >
        >> > > 1) Currently, let's say we have the broker at version x.
        >> > > 2) Currently we have clients at version x.
        >> > > 3) a) We move the broker to version Broker(x+1): it supports both
        >> > >       tombstone and null for log compaction.
        >> > >    b) We upgrade the client to version client(x+1): if the value is
        >> > >       set to null in the producer client(x+1), we will automatically
        >> > >       set the tombstone bit internally. If the producer client(x+1)
        >> > >       sets the tombstone itself, well and good. For producer
        >> > >       client(x), the broker will up convert to have the tombstone bit.
        >> > >       Broker(x+1) supports both. Consumer client(x+1) will be aware of
        >> > >       this and should be able to handle it. For consumer client(x) we
        >> > >       will down convert the message on the broker side.
        >> > >    c) At this point we will have to issue a warning or clearly specify
        >> > >       in the docs that this behavior is about to change for log
        >> > >       compaction.
        >> > > 4) a) In the next release, Broker(x+2), we say that only the tombstone
        >> > >       is used for log compaction on the broker side. Clients(x+1) are
        >> > >       still supported.
        >> > >    b) We upgrade the client to version client(x+2): if the value is set
        >> > >       to null, the tombstone will not be set automatically. The client
        >> > >       will have to call setTombstone() to actually set the tombstone.
        >> > >
        >> > > We should compare this migration plan with the migration plan for the
        >> > > magic byte bump and do whatever looks good.
        >> > > I am just worried that if we go down the magic byte route, unless I am
        >> > > missing something, it sounds like Kafka will be stuck with supporting
        >> > > both null value and tombstone bit for log compaction forever, which
        >> > > does not look like a good end state.
        >> > >
        >> > > Thanks,
        >> > >
        >> > > Mayuresh
        >> > >
        >> > >
        >> > >
        >> > >
        >> > > On Wed, Nov 16, 2016 at 9:32 AM, Mayuresh Gharat <
        >> > > gharatmayures...@gmail.com
        >> > > > wrote:
        >> > >
        >> > > > Hi Ismael,
        >> > > >
        >> > > > That's a very good point which I might not have considered earlier.
        >> > > >
        >> > > > Here is a plan that I can think of:
        >> > > >
        >> > > > Stage 1) From now on, the broker up converts the message to have the
        >> > > > tombstone marker. The log compaction thread does log compaction based
        >> > > > on both null and the tombstone marker. This is our transition period.
        >> > > > Stage 2) In the next release, we say that log compaction is based only
        >> > > > on the tombstone marker. (Open source Kafka makes this a policy.) By
        >> > > > this time, an organization moving to this release will be sure that
        >> > > > it has gone through the entire transition period.
        >> > > >
        >> > > > My only goal in doing this is that Kafka clearly specifies the end
        >> > > > state of what log compaction means (is it a null value or a tombstone
        >> > > > marker, but not both).
        >> > > >
        >> > > > What do you think?
        >> > > >
        >> > > > Thanks,
        >> > > >
        >> > > > Mayuresh
        >> > > > .
        >> > > >
        >> > > > On Wed, Nov 16, 2016 at 9:17 AM, Ismael Juma 
<ism...@juma.me.uk>
        >> > wrote:
        >> > > >
        >> > > >> One comment below.
        >> > > >>
        >> > > >> On Wed, Nov 16, 2016 at 5:08 PM, Mayuresh Gharat <
        >> > > >> gharatmayures...@gmail.com
        >> > > >> > wrote:
        >> > > >>
        >> > > >> >    - If we don't bump up the magic byte, on the broker side, the
        >> > > >> >    broker will always have to look at both the tombstone bit and
        >> > > >> >    the value when doing the compaction. Assuming we do not bump up
        >> > > >> >    the magic byte, imagine the broker sees a message which does not
        >> > > >> >    have the tombstone bit set. The broker does not know when the
        >> > > >> >    message was produced (i.e. whether the message has been up
        >> > > >> >    converted or not), so it has to take a further look at the value
        >> > > >> >    to see if it is null or not in order to determine if it is a
        >> > > >> >    tombstone. The same logic has to be put on the consumer as well,
        >> > > >> >    because the consumer does not know if the message has been up
        >> > > >> >    converted or not.
        >> > > >> >       - If we upconvert while appending, this is not the case, right?
        >> > > >>
        >> > > >>
        >> > > >> If I understand you correctly, this is not sufficient because the log
        >> > > >> may have messages appended before it was upgraded to include KIP-87.
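
The point about messages appended before the upgrade can be illustrated with a small sketch. Without a magic byte bump, the delete check has to look at both the bit and the value for every message; with a bump, the magic byte tells the reader which rule applies. All names and the bit position are hypothetical, and treating the bit as the only delete signal for magic-byte 2 is an assumption, since the thread leaves non-tombstone + null value open.

```java
// Illustrative only: how a compaction-side delete check might differ with and
// without a magic byte bump.
final class DeleteMarkerCheckSketch {
    // Hypothetical bit position in the attributes byte (assumption).
    static final byte TOMBSTONE_MASK = 0x10;

    // No bump: the reader cannot tell whether a message was written before or
    // after the upgrade, so both the bit and the value must be inspected.
    static boolean isDeleteWithoutBump(byte attributes, byte[] value) {
        return (attributes & TOMBSTONE_MASK) != 0 || value == null;
    }

    // With a bump: older messages use the null-value rule, newer messages use
    // the tombstone bit only (assumption for magic-byte 2 and above).
    static boolean isDeleteWithBump(byte magic, byte attributes, byte[] value) {
        if (magic >= 2) {
            return (attributes & TOMBSTONE_MASK) != 0;
        }
        return value == null;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[] {7};
        System.out.println("old format, null value: " + isDeleteWithBump((byte) 1, (byte) 0, null));
        System.out.println("new format, flag + value: " + isDeleteWithBump((byte) 2, TOMBSTONE_MASK, payload));
    }
}
```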
        >> > > >>
        >> > > >> Ismael
        >> > > >>
        >> > > >
        >> > > >
        >> > > >
        >> > > > --
        >> > > > -Regards,
        >> > > > Mayuresh R. Gharat
        >> > > > (862) 250-7125
        >> > > >
        >> > >
        >> > >
        >> > >
        >> > > --
        >> > > -Regards,
        >> > > Mayuresh R. Gharat
        >> > > (862) 250-7125
        >> > >
        >> > The information contained in this email is strictly confidential 
and for
        >> > the use of the addressee only, unless otherwise indicated. If you 
are
        >> not
        >> > the intended recipient, please do not read, copy, use or disclose 
to
        >> others
        >> > this message or any attachment. Please also notify the sender by
        >> replying
        >> > to this email or by telephone (+44(020 7896 0011) and then delete 
the
        >> email
        >> > and any copies of it. Opinions, conclusion (etc) that do not 
relate to
        >> the
        >> > official business of this company shall be understood as neither 
given
        >> nor
        >> > endorsed by it. IG is a trading name of IG Markets Limited (a 
company
        >> > registered in England and Wales, company number 04008957) and IG 
Index
        >> > Limited (a company registered in England and Wales, company number
        >> > 01190902). Registered address at Cannon Bridge House, 25 Dowgate 
Hill,
        >> > London EC4R 2YA. Both IG Markets Limited (register number 195355) 
and IG
        >> > Index Limited (register number 114059) are authorised and 
regulated by
        >> the
        >> > Financial Conduct Authority.
        >> >
        >>
        >>
        >>
        >> --
        >> -Regards,
        >> Mayuresh R. Gharat
        >> (862) 250-7125
        >>
        >
        >
        >
        > --
        > -Regards,
        > Mayuresh R. Gharat
        > (862) 250-7125
        >
    
    
    
        --
        -Regards,
        Mayuresh R. Gharat
        (862) 250-7125
    
    
    
