Re: Broker Interceptors

2019-12-03 Thread Ignacio Solis
> > > > One of the motivations for leading with client interceptors was to gain
> > > > experience and see how useable they are before tackling the server-side
> > > > implementation, which would ultimately "allow us to have a more
> > > > complete/detailed message monitoring".
> > > >
> > > > Broker interceptors could also provide more value than just more complete
> > > > and detailed monitoring, such as server-side schema validation, so I am
> > > > curious to learn if anyone in the community has progressed this work,
> > > > has ideas about other potential server-side interceptor uses, or has
> > > > actually implemented something similar.
> > > >
> > >
> > >  I personally feel that the cost here is the impact on performance. If I am
> > > right, this interceptor is going to tap into nearly everything. If you have
> > > a strong guarantee (min.insync.replicas = N-1) then this may incur some
> > > delay (and let's not forget inter-broker comms protection by TLS config).
> > > This may not be desirable for some systems. That said, it would be good to
> > > know what others think about this.
> > >
> > > Thanks,
> > >
> > > >
> > > > Regards,
> > > >
> > > > Tom Aley
> > > > thomas.a...@ibm.com
> > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> >
>


-- 
Nacho - Ignacio Solis - iso...@igso.net


Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-16 Thread Ignacio Solis
g serialisation/deserialization to and from the primitives,
>> > > > string and byte[]. This is akin to some other messaging systems.
>> > > > 2) We are making it optional, so that those not wanting headers have 0
>> > > > bytes of overhead (think of it as a feature flag). I don’t think this is
>> > > > complex, especially compared to changes proposed in other KIPs like
>> > > > KIP-98.
>> > > > a. If you really, really don’t like this, we can drop it, but it would
>> > > > mean buying into 4 bytes of extra overhead for users who do not want to
>> > > > use headers.
>> > > > 3) In the summary, yes, it is at a higher level, but I think this is well
>> > > > documented in the proposed changes section.
>> > > > a. Added getHeaders method to Producer/Consumer record (that is it)
>> > > > b. We’ve also detailed the new Headers class that this method returns,
>> > > > which encapsulates the headers protocol and logic.
>> > > >
>> > > > Best,
>> > > > Mike
>> > > >
>> > > > ==Original questions from the vote thread from Jay.==
>> > > >
>> > > > Couple of things I think we still need to work out:
>> > > >
>> > > >1. I think we agree about the key, but I think we haven't talked about
>> > > >the value yet. I think if our goal is an open ecosystem of these headers
>> > > >spread across many plugins from many systems we should consider making
>> > > >this a string as well so it can be printed, set via a UI, set in config,
>> > > >etc. Basically encouraging pluggable serialization formats here will
>> > > >lead to a bit of a tower of babel.
>> > > >2. This proposal still includes a pretty big change to our serialization
>> > > >and protocol definition layer. Essentially it is introducing an optional
>> > > >type, where the format is data dependent. I think this is actually a big
>> > > >change though it doesn't seem like it. It means you can no longer specify
>> > > >this type with our type definition DSL, and likewise it requires custom
>> > > >handling in client libs. This isn't a huge thing, since the Record
>> > > >definition is custom anyway, but I think this kind of protocol
>> > > >inconsistency is very undesirable and ties you to hand-coding things. I
>> > > >think the type should instead be [Key Value] in our BNF, where key and
>> > > >value are both short strings as used elsewhere. This brings it in line
>> > > >with the rest of the protocol.
>> > > >3. Could we get more specific about the exact Java API change to
>> > > >ProducerRecord, ConsumerRecord, Record, etc?
>> > > >
>> > > > -Jay
>> > > >
>> > >
>> >
>> >
>> >
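
Point 2 in the quoted reply above describes headers as optional, so that records without headers pay no extra bytes. A minimal sketch of that idea in Java follows. The layout, the class name `OptionalHeaderSketch`, the `HEADERS_PRESENT` bit and the int32 header count are all assumptions made for illustration; this is not the KIP-82 wire format.

```java
import java.nio.ByteBuffer;

// Hypothetical layout, for illustration only (not the KIP-82 wire format):
//   attributes (1 byte) | headerCount (int32, only if the flag is set) | payload
final class OptionalHeaderSketch {
    static final byte HEADERS_PRESENT = 0x01; // hypothetical attribute bit

    // Encodes a payload; the header count is written only when there are headers.
    static byte[] encode(int headerCount, byte[] payload) {
        boolean hasHeaders = headerCount > 0;
        int size = 1 + (hasHeaders ? Integer.BYTES : 0) + payload.length;
        ByteBuffer buf = ByteBuffer.allocate(size);
        byte attributes = hasHeaders ? HEADERS_PRESENT : (byte) 0;
        buf.put(attributes);
        if (hasHeaders) {
            buf.putInt(headerCount);
        }
        buf.put(payload);
        return buf.array();
    }

    // Reads the header count back, returning 0 when the flag is not set.
    static int decodeHeaderCount(byte[] encoded) {
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        byte attributes = buf.get();
        return (attributes & HEADERS_PRESENT) != 0 ? buf.getInt() : 0;
    }
}
```

With this layout a record without headers is exactly 4 bytes shorter than it would be if the count were always written, which is the trade-off discussed in point 2a. The cost is that the format becomes data dependent, which is the concern raised in Jay's second question.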



-- 
Nacho - Ignacio Solis - iso...@igso.net


[DISCUSS] Control Messages - [Was: KIP-82 - Add Record Headers]

2016-12-14 Thread Ignacio Solis
I'm renaming this thread in case we start deep diving.

I'm in favor of so called "control messages", at least the notion of
those.  However, I'm not sure about the design.

What I understood from the original mail:

A. Provide a message that does not get returned by poll()
B. Provide a way for applications to consume these messages (sign up?)
C. Control messages would be associated with a topic.
D. Control messages should be _in_ the topic.



1. The first thing to point out is that this can be done with headers.
I assume that's why you sent it on the header thread. As you state, if
we had headers, you would not require a separate KIP.  So, in a way,
you're trying to provide a concrete use case for headers.  I wanted to
separate the discussion to a separate thread mostly because while I
like the idea, and I like the fact that it can be done by headers,
people might want to discuss alternatives.

2. I'm also assuming that you're intentionally trying to preserve
order. Headers could do this natively of course. You could also
achieve this with the separate topic given identifiers, sequence
numbers, headers, etc.  However...

3. There are a few use cases where ordering is important but
out-of-band is even more important. We have a few large workloads
where this is of interest to us.  Obviously we can achieve this with a
separate topic, but having a control channel for a topic that can send
high priority data would be interesting.   And yes, we would learn a
lot from the TCP experiences with the urgent pointer (
https://tools.ietf.org/html/rfc6093 ) and other out-of-band
communication techniques.

You have an example of a "shutdown marker".  This works ok as a
terminator, however, it is not very fast.  If I have 4 TB of data
because of asynchronous processing, then a shutdown marker at the end
of the 4TB is not as useful as having an out-of-band message that will
tell me immediately that those 4TB should not be processed.   So, from
this perspective, I prefer to have a separate topic and not embed
control messages with the data.

If the messages are part of the data, or associated to specific data,
then they should be in the data. If they are about process, we need an
out-of-band mechanism.
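
To make the difference concrete, here is a minimal sketch, assuming an in-memory queue stands in for the data topic and an `AtomicBoolean` stands in for the out-of-band control channel. Neither is a Kafka API, and the names `OutOfBandSketch` and `SHUTDOWN_MARKER` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

final class OutOfBandSketch {
    static final String SHUTDOWN_MARKER = "__shutdown__"; // hypothetical in-band marker

    // In-band: the consumer only sees the marker after draining everything before it.
    static int processInBand(Queue<String> data) {
        int processed = 0;
        String record;
        while ((record = data.poll()) != null) {
            if (record.equals(SHUTDOWN_MARKER)) {
                break;
            }
            processed++;
        }
        return processed;
    }

    // Out-of-band: the control flag is checked before each record, so a stop
    // request takes effect without reading the remaining backlog.
    static int processOutOfBand(Queue<String> data, AtomicBoolean stop) {
        int processed = 0;
        while (!stop.get() && data.poll() != null) {
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        Queue<String> backlog = new ArrayDeque<>();
        backlog.add("a");
        backlog.add("b");
        backlog.add(SHUTDOWN_MARKER);
        System.out.println(processInBand(backlog)); // both records are read before the marker

        Queue<String> backlog2 = new ArrayDeque<>();
        backlog2.add("a");
        backlog2.add("b");
        System.out.println(processOutOfBand(backlog2, new AtomicBoolean(true))); // nothing is read
    }
}
```

The in-band version has to read the whole backlog before it learns it should stop, while the out-of-band version stops before touching it.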


4. The general feeling I have gotten from a few people on the list is:
Why not just do this above the kafka clients?  After all, you could
have a system to ignore certain schemas.

Effectively, if we had headers, it would be done from a client
perspective, without the need to modify anything major.
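
As a rough illustration of doing this above the clients, the sketch below filters a polled batch on a header key. Everything in it is hypothetical: `Rec` is a stand-in record type with int-keyed headers, `CONTROL_KEY` is an arbitrary key value, and `filter` is not part of any Kafka client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class ControlMessageSketch {
    static final int CONTROL_KEY = 1; // hypothetical header key marking control messages

    // Hypothetical record type: a value plus int-keyed headers.
    static final class Rec {
        final String value;
        final Map<Integer, byte[]> headers;

        Rec(String value, Map<Integer, byte[]> headers) {
            this.value = value;
            this.headers = headers;
        }
    }

    // Wraps a poll result: control records go to the listener (if any),
    // everything else is returned to the application in the original order.
    static List<Rec> filter(List<Rec> polled, Consumer<Rec> controlListener) {
        List<Rec> data = new ArrayList<>();
        for (Rec r : polled) {
            if (r.headers.containsKey(CONTROL_KEY)) {
                if (controlListener != null) {
                    controlListener.accept(r);
                }
            } else {
                data.add(r);
            }
        }
        return data;
    }
}
```

An application that does not care about control messages passes `null` as the listener and never sees them; one that has "signed up" passes a callback. Ordering relative to the data is preserved because the control records travel in the same stream.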

If we wanted to do it with a separate topic, that could also be done
without any broker changes. But you could imagine wanting some broker
changes: if the broker understands that 2 streams are tied together,
then it may make decisions based on that.  This would be similar to
the handling of file system forks (
https://en.wikipedia.org/wiki/Fork_(file_system) )


5. Also heard on discussions about headers: we don't know if this is
generally useful. Maybe only a couple of institutions?  It may not be
worth it to modify the whole stack for that.

I would again say that with headers you could pull it off easily, even
if only a subset of clients/applications wanted to use it.


So, in summary: I like the idea.  I see benefits in implementing it
through headers, but I also see benefits of having it as a separate
stream.  I'm not too in favor of having a separate message handling
pipeline for the same topic though.

Nacho





On Wed, Dec 14, 2016 at 9:51 AM, Matthias J. Sax  wrote:
> Yes and no. I did overload the term "control message".
>
> EOS control messages are for client-broker communication and thus never
> exposed to any application. And I think this is a good design because
> the broker needs to understand those control messages. Thus, this should be
> a protocol change.
>
> The type of control messages I have in mind are for client-client
> (application-application) communication and the broker is agnostic to
> them. Thus, it should not be a protocol change.
>
>
> -Matthias
>
>
>
> On 12/14/16 9:42 AM, radai wrote:
>> aren't control messages getting pushed as their own top level protocol
>> change (and a fairly massive one) for the transactions KIP ?
>>
>> On Tue, Dec 13, 2016 at 5:54 PM, Matthias J. Sax 
>> wrote:
>>
>>> Hi,
>>>
>>> I want to add a completely new angle to this discussion. For this, I
>>> want to propose an extension for the headers feature that enables new
>>> use cases -- and those new use cases might convince people to support
>>> headers (of course including the larger scoped proposal).
>>>
>>> Extended Proposal:
>>>
>>> Allow messages with a certain header key to be special "control
>>> messages" (w/ or w/o payload) that are not exposed to an application via
>>> .poll().
>>>
>>> Thus, a consumer client would automatically skip over those messages. If
>>> an application knows about embedded control messages, it can "sign up"
>>> to those messages by the 

Re: [DISCUSS] KIP-82 - Add Record Headers

2016-12-01 Thread Ignacio Solis
each message
> for auditing. These metadata are really at the producer level though. So, a
> more efficient way is to only include a "producerId" per message and send
> the producerId -> metadata mapping independently. KIP-98 is actually
> proposing including such a producerId natively in the message.
>
> So, overall, I am not sure that I am fully convinced of the strong third-party
> use cases of headers yet. Perhaps we could discuss a bit more to make one
> or two really convincing use cases.
>
> Another orthogonal question is whether headers should be exposed in stream
> processing systems such as Kafka Streams, Samza, and Spark Streaming.
> Currently, those systems just deal with key/value pairs. Should we expose a
> third thing, the header, there too, or somehow map the header to the key or value?
>
> Thanks,
>
> Jun
>
>

Re: [DISCUSS] 0.10.1.1 Plan

2016-11-29 Thread Ignacio Solis
Sorry, that was a hasty reply.  There are also various logging things that
change format. This could break parsers.

None of them are important; my only argument is that the final keyword
removal is not important either.

Nacho


On Tue, Nov 29, 2016 at 1:25 PM, Ignacio Solis <iso...@igso.net> wrote:

> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=commit;h=10cfc1628df024f7596d3af5c168fa90f59035ca
>
> On Tue, Nov 29, 2016 at 1:24 PM, Ismael Juma <ism...@juma.me.uk> wrote:
>
>> Which changes break compatibility in the 0.10.1 branch? It would be good
>> to fix before the release goes out.
>>
>> Ismael
>>



-- 
Nacho - Ignacio Solis - iso...@igso.net


Re: [DISCUSS] 0.10.1.1 Plan

2016-11-29 Thread Ignacio Solis
https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=commit;h=10cfc1628df024f7596d3af5c168fa90f59035ca

On Tue, Nov 29, 2016 at 1:24 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> Which changes break compatibility in the 0.10.1 branch? It would be good to
> fix before the release goes out.
>
> Ismael
>



-- 
Nacho - Ignacio Solis - iso...@igso.net


Re: [DISCUSS] 0.10.1.1 Plan

2016-11-29 Thread Ignacio Solis
Some of the changes in the 0.10.1 branch already are not bug fixes. Some
break compatibility.

Having said that, at this level we should maintain a stable API and leave
any changes for real version bumps.  This should be only a bugfix release.

Nacho




On Tue, Nov 29, 2016 at 8:35 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> I disagree, but let's discuss it another time and in a separate thread. :)
>
> Ismael
>
> On Tue, Nov 29, 2016 at 4:30 PM, radai <radai.rosenbl...@gmail.com> wrote:
>
> > designing kafka code for stable extensibility is a worthy and noble cause.
> > however, seeing as there are no such derivatives out in the wild yet i
> > think investing the effort right now is a bit premature from kafka's pov.
> > I think it's enough simply not to purposefully prevent such extensions.
> >
> > On Tue, Nov 29, 2016 at 4:05 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > On Sat, Nov 26, 2016 at 11:08 PM, radai <radai.rosenbl...@gmail.com>
> > > wrote:
> > >
> > > > "compatibility guarantees that are expected by people who subclass
> > > > these classes"
> > > >
> > > > sorry if this is not the best thread for this discussion, but I just
> > > > wanted to pop in and say that since any subclassing of these will
> > > > obviously not be done within the kafka codebase - what guarantees are
> > > > needed?
> > > >
> > >
> > > I elaborated a little in my other message in this thread. A simple and
> > > somewhat contrived example: `ConsumerRecord.toString` calls the `topic`
> > > method. Someone overrides the `topic` method and it all works as
> > > expected. In a subsequent release, we change `toString` to use the field
> > > directly (like it's done for other fields like `key` and `value`) and it
> > > will break `toString` for this user. One may wonder: why would one
> > > override a method like `topic`? That is a good question, but part of the
> > > exercise is deciding how we approach these issues. We could make the
> > > methods final and eliminate the possibility, we could document it so
> > > that users can choose to do weird things if they want, etc.
> > >
> > > Another thing that is usually good to think about is the expectation for
> > > `equals` and `hashCode`. What if subclasses implement them to have value
> > > semantics instead of identity semantics? Is that OK or would it break
> > > things?
> > >
> > > Designing for implementation inheritance is generally complex, although
> > > for simple "record"-like classes it can be easier by following a few
> > > guidelines.
> > >
> > > Ismael
> > >
> >
>
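
The `toString`/`topic` example in the quoted message can be sketched in Java. The classes below are hypothetical stand-ins and not the real `ConsumerRecord`: `RecordV1` and `RecordV2` model two releases of the same class, and `UserV1`/`UserV2` apply the same user override against each release.

```java
final class FragileBaseSketch {
    // "Release 1" of a hypothetical record class: toString calls the accessor.
    static class RecordV1 {
        private final String topic;
        RecordV1(String topic) { this.topic = topic; }
        String topic() { return topic; }
        @Override public String toString() { return "Record(topic=" + topic() + ")"; }
    }

    // "Release 2": same API, but toString now reads the field directly.
    static class RecordV2 {
        private final String topic;
        RecordV2(String topic) { this.topic = topic; }
        String topic() { return topic; }
        @Override public String toString() { return "Record(topic=" + topic + ")"; }
    }

    // The same user override, applied against each release.
    static class UserV1 extends RecordV1 {
        UserV1(String topic) { super(topic); }
        @Override String topic() { return "tenant." + super.topic(); }
    }

    static class UserV2 extends RecordV2 {
        UserV2(String topic) { super(topic); }
        @Override String topic() { return "tenant." + super.topic(); }
    }

    public static void main(String[] args) {
        System.out.println(new UserV1("orders")); // Record(topic=tenant.orders)
        System.out.println(new UserV2("orders")); // Record(topic=orders)
    }
}
```

The user's code is unchanged between the two runs, yet the output differs. Declaring `topic()` as `final` would rule the override out at compile time, which is one of the options mentioned above.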



-- 
Nacho - Ignacio Solis - iso...@igso.net


Re: [DISCUSS] KIP-82 - Add Record Headers

2016-11-29 Thread Ignacio Solis
I'm ok with 32 bit keys and leaving the interpretation out of this
discussion/KIP.

Nacho

On Tue, Nov 29, 2016 at 3:35 AM, Michael Pearce <michael.pea...@ig.com>
wrote:

> I assume that, after a period of a week, there are no concerns now
> with points 1 and 2, and we now have agreement that headers are useful and
> needed in Kafka. As such, if put to a KIP vote, this wouldn’t be a reason to
> reject.
>
> @Ignacio on point 4).
> I think for the purpose of getting this KIP moving past this, we can state the
> key will be a 4-byte space that will be naturally interpreted as an
> Int32 (if namespacing is later wanted you can easily split this into two
> int16 spaces). I don’t believe this makes any difference to the wire
> protocol implementation. Is this reasonable to all?
>
> On 5), as per point 4, I am therefore happy to keep 32 bits.
>
Re: [DISCUSS] KIP-82 - Add Record Headers

2016-11-18 Thread Ignacio Solis
Summary:

3) Yes - Header value as byte[]

4a) Int,Int - No
4b) Int - Yes
4c) String - Reluctant maybe

5) I believe the header system should take a single int.  I think 32 bits is
a good size; if you want to interpret this as two 16-bit numbers in the layer
above, go right ahead.  If somebody wants to argue for 16 bits or 64 bits of
header key space I would listen.


Discussion:
Dividing the key space into sub_key_1 and sub_key_2 makes no sense to me at
this layer.  Are we going to start providing APIs to get all the
sub_key_1s? or all the sub_key_2s?  If there are no distinguishing functions
that are applied to each one, then they should be a single value.  At this
layer all we're doing is equality.
If the above layer wants to interpret this as 2, 3 or more values, that's a
different question.  I personally think it's all one keyspace that is
getting assigned using some structure, but if you want to sub-assign parts
of it then that's fine.

The same discussion applies to strings.  If somebody argued for strings,
would we be arguing to divide the strings with dots ('.') as a requirement?
Would we want them to give us the different name segments separately?
Would we be performing any actions on this key other than matching?
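
The point that a single 32-bit key can still be read as two 16-bit values by a higher layer can be sketched as follows. The class and method names are hypothetical, and the "namespace"/"id" split is only one possible interpretation; at the header layer the only operation is equality on the whole int.

```java
final class HeaderKeySketch {
    // Packs two 16-bit values (hypothetical "namespace" and "id") into one int32 key.
    static int pack(int namespace, int id) {
        return ((namespace & 0xFFFF) << 16) | (id & 0xFFFF);
    }

    // Upper 16 bits, as an unsigned value in the range 0..65535.
    static int namespace(int key) {
        return key >>> 16;
    }

    // Lower 16 bits, as an unsigned value in the range 0..65535.
    static int id(int key) {
        return key & 0xFFFF;
    }

    // The header layer itself only needs equality on the whole key.
    static boolean sameKey(int a, int b) {
        return a == b;
    }
}
```

For example, `pack(0x1234, 0xABCD)` yields `0x1234ABCD`, and `namespace` and `id` recover the two halves. Nothing in the wire format has to know about the split.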

Nacho



On Fri, Nov 18, 2016 at 9:30 AM, Michael Pearce <michael.pea...@ig.com>
wrote:

> #jay #jun any concerns on 1 and 2 still?
>
> @all
> To get this moving along a bit more I'd also like to ask to get clarity on
> the below last points:
>
> 3) I believe we're all roughly happy with the header value being a byte[]?
>
> 4) I believe consensus has been for a namespace-based int approach
> {int,int} for the key. Any objections if this is what we go with?
>
> 5) If the assumption in (4) is correct and we have {int,int} keys,
> should both ints be int16 or int32?
> I'm for them being int16 (2 bytes), as combined that is a space of 4 bytes as
> per the original, gives plenty of combinations for the foreseeable future,
> and keeps the overhead small.
>
> Do we see any benefit in another kip call to discuss these at all?
>
> Cheers
> Mike
> 
> From: K Burstev <k.burs...@yandex.com>
> Sent: Friday, November 18, 2016 7:07:07 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>
> For what it is worth, I also agree. As a user:
>
>  1) Yes - Headers are worthwhile
>  2) Yes - Headers should be a top level option
>
> 14.11.2016, 21:15, "Ignacio Solis" <iso...@igso.net>:
> > 1) Yes - Headers are worthwhile
> > 2) Yes - Headers should be a top level option
> >
> > On Mon, Nov 14, 2016 at 9:16 AM, Michael Pearce <michael.pea...@ig.com>
> > wrote:
> >
> >>  Hi Roger,
> >>
> >>  The KIP details and gives examples of the original proposal for key
> >>  spacing, not the new namespace idea mentioned in the discussion.
> >>
> >>  We will need to update the KIP when we get agreement that this is a
> >>  better approach (which seems to be the case, if I have understood the
> >>  general feeling in the conversation).
> >>
> >>  Re the variable ints, at a very early stage we did think about this. I
> >>  think the added complexity isn't worth the saving. If we want to reduce
> >>  overhead and size, I'd rather go with int16 (2 bytes) keys, as it keeps
> >>  it simple.
> >>
> >>  On the note of no headers: as per the KIP, we use an attribute bit to
> >>  denote whether headers are present, so there is currently zero overhead
> >>  if headers are not used.
> >>
> >>  I think, as radai mentions, it would be good first to get clarity on
> >>  whether we now have general consensus that (1) headers are worthwhile
> >>  and useful, and (2) we want them as a top level entity.
> >>
> >>  Just to state the obvious, I believe (1) headers are worthwhile and (2)
> >>  agree as a top level entity.
> >>
> >>  Cheers
> >>  Mike
> >>  
> >>  From: Roger Hoover <roger.hoo...@gmail.com>
> >>  Sent: Wednesday, November 9, 2016 9:10:47 PM
> >>  To: dev@kafka.apache.org
> >>  Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
> >>
> >>  Sorry for going a little in the weeds but thanks for the replies
> >>  regarding varint.
> >>
> >>  Agreed that a prefix and {int, int} can be the same. It doesn't look
> >>  like that's what the KIP is saying in the "Open" section. The example
> >>  shows 211 for New Relic and 210002 for App Dynamics implying th

Re: [DISCUSS] KIP-82 - Add Record Headers

2016-11-14 Thread Ignacio Solis
1) Yes - Headers are worthwhile
2) Yes - Headers should be a top level option

On Mon, Nov 14, 2016 at 9:16 AM, Michael Pearce 
wrote:

> Hi Roger,
>
> The KIP details and gives examples of the original proposal for key
> spacing, not the new namespace idea mentioned in the discussion.
>
> We will need to update the KIP when we get agreement that this is a better
> approach (which seems to be the case, if I have understood the general
> feeling in the conversation).
>
> Re the variable ints, at a very early stage we did think about this. I think
> the added complexity isn't worth the saving. If we want to reduce overhead
> and size, I'd rather go with int16 (2 bytes) keys, as it keeps it simple.
>
> On the note of no headers: as per the KIP, we use an attribute bit to
> denote whether headers are present, so there is currently zero overhead if
> headers are not used.
>
> I think, as radai mentions, it would be good first to get clarity on whether
> we now have general consensus that (1) headers are worthwhile and useful,
> and (2) we want them as a top level entity.
>
> Just to state the obvious, I believe (1) headers are worthwhile and (2)
> agree as a top level entity.
>
> Cheers
> Mike
> 
> From: Roger Hoover 
> Sent: Wednesday, November 9, 2016 9:10:47 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>
> Sorry for going a little in the weeds but thanks for the replies regarding
> varint.
>
> Agreed that a prefix and {int, int} can be the same.  It doesn't look like
> that's what the KIP is saying in the "Open" section.   The example shows
> 211
> for New Relic and 210002 for App Dynamics implying that the New Relic
> organization will have only a single header id to work with.  Or is 211
> a prefix?  The main point of a namespace or prefix is to reduce the
> overhead of config mapping or registration depending on how
> namespaces/prefixes are managed.
>
> Would love to hear more feedback on the higher-level questions though...
>
> Cheers,
>
> Roger
>
>
> On Wed, Nov 9, 2016 at 11:38 AM, radai  wrote:
>
> > I think this discussion is getting a bit into the weeds on technical
> > implementation details.
> > I'd like to step back a minute and try and establish where we are in the
> > larger picture:
> >
> > (re-wording nacho's last paragraph)
> > 1. are we all in agreement that headers are a worthwhile and useful
> > addition to have? this was contested early on
> > 2. are we all in agreement on headers as top level entity vs headers
> > squirreled-away in V?
> >
> > are there still concerns around these 2 points (#jay? #jun?)?
> >
> > (and now back to our normal programming ...)
> >
> > varints are nice. having said that, it's adding complexity (see
> > https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/util/Varint.java
> > as 1st google result) and would require anyone writing other clients (C?
> > Python? Go? Bash? ;-) ) to get/implement the same, and for relatively
> > little gain (int vs string is an order of magnitude, this isn't).
> >
> > int namespacing vs {int, int} namespacing are basically the same thing -
> > you're just namespacing an int64 and giving people whole 2^32 ranges at a
> > time. the part i like about this is letting people have a large swath of
> > numbers with one registration so they don't have to come back for every
> > single plugin/header they want to "reserve".
> >
> >
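To illustrate why a prefix and an {int, int} pair come to the same thing, here is a rough sketch assuming two unsigned 32-bit halves packed into one 64-bit key. The widths follow the int64 / 2^32 figures mentioned above; the function names are hypothetical and not part of the KIP.

```python
ID_BITS = 32
ID_MASK = (1 << ID_BITS) - 1


def compose(namespace: int, local_id: int) -> int:
    """Combine a registered namespace and a local id into one 64-bit key."""
    if not (0 <= namespace <= ID_MASK and 0 <= local_id <= ID_MASK):
        raise ValueError("namespace and local_id must fit in 32 bits")
    return (namespace << ID_BITS) | local_id


def split(key: int):
    """Recover the (namespace, local_id) pair from a 64-bit key."""
    return key >> ID_BITS, key & ID_MASK


def namespace_range(namespace: int):
    """First and last 64-bit key covered by a single namespace registration."""
    start = namespace << ID_BITS
    return start, start + ID_MASK
```

One registration of a namespace hands out a contiguous block of 2^32 keys, so the owner can mint new header ids without going back to the registry.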
> > On Wed, Nov 9, 2016 at 11:01 AM, Roger Hoover 
> > wrote:
> >
> > > > Since some of the debate has been about overhead + performance, I'm
> > > > wondering if we have considered a varint encoding (
> > > > https://developers.google.com/protocol-buffers/docs/encoding#varints)
> > > > for the header length field (int32 in the proposal) and for header
> > > > ids?  If you don't use headers, the overhead would be a single byte
> > > > and for each header id < 128 would also need only a single byte?
> > >
> > >
> > >
> > > On Wed, Nov 9, 2016 at 6:43 AM, radai 
> > wrote:
> > >
> > > > > @magnus - and very dangerous (you're essentially downloading and
> > > > > executing arbitrary code off the internet on your servers ... bad
> > > > > idea without a sandbox, even with)
> > > > >
> > > > > as for it being a purely administrative task - i disagree.
> > > > >
> > > > > i wish it would, really, because then my earlier point on the
> > > > > complexity of the remapping process would be invalid, but at
> > > > > linkedin, for example, we (the team i'm in) run kafka as a service.
> > > > > we don't really know what our users (developing applications that use
> > > > > kafka) are up to at any given moment. it is very possible (given the
> > > > > existence of headers and a corresponding plugin ecosystem) for some
> > > > > application to "equip" their producers and
> > 

Re: [DISCUSS] KIP-87 - Add Compaction Tombstone Flag

2016-11-10 Thread Ignacio Solis
> > > > > > > > > > >>> clarify the behavior for null messages where the tombstone flag is not set.
> > > > > > > > > > >>>
> > > > > > > > > > >>> On Wed, Oct 26, 2016 at 1:32 AM Magnus Edenhill <mag...@edenhill.se>
> > > > > > > > > > >>> wrote:
> > > > > > > > > > >>>
> > > > > > > > > > >>>> 2016-10-25 21:36 GMT+02:00 Nacho Solis <nso...@linkedin.com.invalid>:
> > > > > > > > > > >>>>
> > > > > > > > > > >>>>> I think you probably require a MagicByte bump if you expect correct
> > > > > > > > > > >>>>> behavior of the system as a whole.
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>> From a client perspective you want to make sure that when you deliver a
> > > > > > > > > > >>>>> message that the broker supports the feature you're expecting
> > > > > > > > > > >>>>> (compaction).  So, depending on the behavior of the broker on
> > > > > > > > > > >>>>> encountering a previously undefined bit flag I would suggest making
> > > > > > > > > > >>>>> some change to make certain that flag-based compaction is supported.
> > > > > > > > > > >>>>> I'm going to guess that the MagicByte would do this.
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> I don't believe this is needed since it is already attributed through
> > > > > > > > > > >>>> the request's API version.
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> Producer:
> > > > > > > > > > >>>> * if a client sends ProduceRequest V4 then attributes.bit5 indicates a
> > > > > > > > > > >>>> tombstone
> > > > > > > > > > >>>> * if a client sends ProduceRequest  then attributes.bit5 is ignored
> > > > > > > > > > >>>> and value==null in