Re: [VOTE] 1.0.0 RC3

2017-10-25 Thread Dana Powers
Does the voting deadline also need an update?

> *** Please download, test and vote by Friday, October 20, 8pm PT

On Wed, Oct 25, 2017 at 10:37 AM, Guozhang Wang  wrote:

> Ted:
>
> Thanks for the reminder. Yes it is a typo. In fact this is the "fourth"
> candidate of the release, not the "third" one :)
>
>
> Jaikiran:
>
> That's a fair point. Though I do not know how to achieve that with the
> maven central staging repository mechanism today [1]. If anyone has ideas
> how to do that I'm all ears.
>
>
> All:
>
> The passing Jenkins builds for this RC can now be found here:
>
> System test: https://jenkins.confluent.io/job/system-test-kafka-1.0/14/
> Unit test: https://builds.apache.org/job/kafka-1.0-jdk7/55/
>
>
> Please help verify the quickstarts / tutorials / binary signatures /
> anything you can and cast your vote before the voting deadline.
>
> Guozhang
>
>
> [1] repository.apache.org/#stagingRepositories
>
>
> On Mon, Oct 23, 2017 at 6:06 PM, Ted Yu  wrote:
>
> > bq. Tag to be voted upon (off 1.0 branch) is the 1.0.0-rc2 tag:
> >
> > There seems to be a typo above: 1.0.0-rc3 tag
> >
> > FYI
> >
> > On Mon, Oct 23, 2017 at 6:00 PM, Guozhang Wang 
> wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the third candidate for release of Apache Kafka 1.0.0. The
> > > main PRs that got merged in after RC1 are the following:
> > >
> > > https://github.com/apache/kafka/commit/dc6bfa553e73ffccd1e604963e076c78d8ddcd69
> > >
> > > It's worth noting that starting in this version we are using a
> > > different versioning scheme with three digits: *major.minor.bug-fix*
> > >
> > > Any and all testing is welcome, but the following areas are worth
> > > highlighting:
> > >
> > > 1. Client developers should verify that their clients can
> produce/consume
> > > to/from 1.0.0 brokers (ideally with compressed and uncompressed data).
> > > 2. Performance and stress testing. Heroku and LinkedIn have helped with
> > > this in the past (and issues have been found and fixed).
> > > 3. End users can verify that their apps work correctly with the new
> > > release.
> > >
> > > This is a major version release of Apache Kafka. It includes 29 new
> KIPs.
> > > See the release notes and release plan
> > > (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=71764913)
> > > for more details. A few feature highlights:
> > >
> > > * Java 9 support with significantly faster TLS and CRC32C
> implementations
> > > * JBOD improvements: disk failure only disables failed disk but not the
> > > broker (KIP-112/KIP-113 part I)
> > > * Controller improvements: reduced logging to greatly accelerate
> > > admin request handling.
> > > * Newly added metrics across all the modules (KIP-164, KIP-168,
> KIP-187,
> > > KIP-188, KIP-196)
> > > * Kafka Streams API improvements (KIP-120 / 130 / 138 / 150 / 160 /
> 161),
> > > and drop compatibility "Evolving" annotations
> > >
> > > Release notes for the 1.0.0 release:
> > > http://home.apache.org/~guozhang/kafka-1.0.0-rc3/RELEASE_NOTES.html
> > >
> > >
> > >
> > > *** Please download, test and vote by Friday, October 20, 8pm PT
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > http://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > http://home.apache.org/~guozhang/kafka-1.0.0-rc3/
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > >
> > > * Javadoc:
> > > http://home.apache.org/~guozhang/kafka-1.0.0-rc3/javadoc/
> > >
> > > * Tag to be voted upon (off 1.0 branch) is the 1.0.0-rc2 tag:
> > >
> > > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=7774b0da8ead0d9edd1d4b2f7e1cd743af694112
> > >
> > >
> > > * Documentation:
> > > Note the documentation can't be pushed live due to changes that will
> not
> > go
> > > live until the release. You can manually verify by downloading
> > > http://home.apache.org/~guozhang/kafka-1.0.0-rc3/kafka_2.11-1.0.0-site-docs.tgz
> > >
> > > I will update this thread with upcoming Jenkins builds for this RC
> > > later; they are currently being executed and will be done tomorrow.
> > >
> > >
> > > /**
> > >
> > >
> > > Thanks,
> > > -- Guozhang
> > >
> >
>
>
>
> --
> -- Guozhang
>


Re: [kafka-clients] [VOTE] 1.0.0 RC3

2017-10-23 Thread Dana Powers
+1. passed kafka-python integration tests, and manually verified
producer/consumer on both compressed and non-compressed data.
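For anyone who wants to repeat that manual check, a minimal sketch with kafka-python might look like this (the broker address and topic name are assumptions, a running 1.0.0 broker is required, and this is only an illustration, not the actual integration test suite):

```python
def verify_roundtrip(bootstrap='localhost:9092', topic='rc-smoke-test'):
    # kafka-python is imported lazily so the sketch reads without the
    # dependency installed; a running broker at `bootstrap` is assumed.
    from kafka import KafkaProducer, KafkaConsumer

    # Produce the same payload uncompressed and gzip-compressed.
    for codec in (None, 'gzip'):
        producer = KafkaProducer(bootstrap_servers=bootstrap,
                                 compression_type=codec)
        producer.send(topic, b'rc3-check')
        producer.flush()
        producer.close()

    # Read everything back and confirm both payloads survived intact.
    consumer = KafkaConsumer(topic,
                             bootstrap_servers=bootstrap,
                             auto_offset_reset='earliest',
                             consumer_timeout_ms=5000)
    values = [msg.value for msg in consumer]
    consumer.close()
    assert values.count(b'rc3-check') >= 2, values
```

Running it once against an RC broker exercises both the compressed and non-compressed produce/consume paths mentioned above.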

-Dana

On Mon, Oct 23, 2017 at 6:00 PM, Guozhang Wang  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the third candidate for release of Apache Kafka 1.0.0. The main
> PRs that got merged in after RC1 are the following:
>
> https://github.com/apache/kafka/commit/dc6bfa553e73ffccd1e604963e076c78d8ddcd69
>
> It's worth noting that starting in this version we are using a different
> versioning scheme with three digits: *major.minor.bug-fix*
>
> Any and all testing is welcome, but the following areas are worth
> highlighting:
>
> 1. Client developers should verify that their clients can produce/consume
> to/from 1.0.0 brokers (ideally with compressed and uncompressed data).
> 2. Performance and stress testing. Heroku and LinkedIn have helped with
> this in the past (and issues have been found and fixed).
> 3. End users can verify that their apps work correctly with the new
> release.
>
> This is a major version release of Apache Kafka. It includes 29 new KIPs.
> See the release notes and release plan 
> (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=71764913)
> for more details. A few feature highlights:
>
> * Java 9 support with significantly faster TLS and CRC32C implementations
> * JBOD improvements: disk failure only disables failed disk but not the
> broker (KIP-112/KIP-113 part I)
> * Controller improvements: reduced logging to greatly accelerate
> admin request handling.
> * Newly added metrics across all the modules (KIP-164, KIP-168, KIP-187,
> KIP-188, KIP-196)
> * Kafka Streams API improvements (KIP-120 / 130 / 138 / 150 / 160 / 161),
> and drop compatibility "Evolving" annotations
>
> Release notes for the 1.0.0 release:
> http://home.apache.org/~guozhang/kafka-1.0.0-rc3/RELEASE_NOTES.html
>
>
>
> *** Please download, test and vote by Friday, October 20, 8pm PT
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~guozhang/kafka-1.0.0-rc3/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc:
> http://home.apache.org/~guozhang/kafka-1.0.0-rc3/javadoc/
>
> * Tag to be voted upon (off 1.0 branch) is the 1.0.0-rc2 tag:
>
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=7774b0da8ead0d9edd1d4b2f7e1cd743af694112
>
>
> * Documentation:
> Note the documentation can't be pushed live due to changes that will not go
> live until the release. You can manually verify by downloading
> http://home.apache.org/~guozhang/kafka-1.0.0-rc3/kafka_2.11-1.0.0-site-docs.tgz
>
> I will update this thread with upcoming Jenkins builds for this RC later;
> they are currently being executed and will be done tomorrow.
>
>
> /**
>
>
> Thanks,
> -- Guozhang
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at https://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/kafka-clients/CAHwHRrU8QrK7cPSRAj7uaEQ1vgnwv
> o8Y5rJxa1-54dLqxLAsHw%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>


Re: [kafka-clients] [VOTE] 1.0.0 RC2

2017-10-23 Thread Dana Powers
1.0.0-RC2 passed all kafka-python integration tests. Excited for this
release -- great work everyone!

-Dana

On Tue, Oct 17, 2017 at 9:47 AM, Guozhang Wang  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the third candidate for release of Apache Kafka 1.0.0. The main
> PRs that got merged in after RC1 are the following:
>
> https://github.com/apache/kafka/commit/dc6bfa553e73ffccd1e604963e076c78d8ddcd69
>
> It's worth noting that starting in this version we are using a different
> versioning scheme with three digits: *major.minor.bug-fix*
>
> Any and all testing is welcome, but the following areas are worth
> highlighting:
>
> 1. Client developers should verify that their clients can produce/consume
> to/from 1.0.0 brokers (ideally with compressed and uncompressed data).
> 2. Performance and stress testing. Heroku and LinkedIn have helped with
> this in the past (and issues have been found and fixed).
> 3. End users can verify that their apps work correctly with the new
> release.
>
> This is a major version release of Apache Kafka. It includes 29 new KIPs.
> See the release notes and release plan 
> (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=71764913)
> for more details. A few feature highlights:
>
> * Java 9 support with significantly faster TLS and CRC32C implementations
> * JBOD improvements: disk failure only disables failed disk but not the
> broker (KIP-112/KIP-113)
> * Controller improvements: async ZK access for faster administrative
> request handling
> * Newly added metrics across all the modules (KIP-164, KIP-168, KIP-187,
> KIP-188, KIP-196)
> * Kafka Streams API improvements (KIP-120 / 130 / 138 / 150 / 160 / 161),
> and drop compatibility "Evolving" annotations
>
> Release notes for the 1.0.0 release:
> http://home.apache.org/~guozhang/kafka-1.0.0-rc2/RELEASE_NOTES.html
>
>
>
> *** Please download, test and vote by Friday, October 20, 8pm PT
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~guozhang/kafka-1.0.0-rc2/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc:
> http://home.apache.org/~guozhang/kafka-1.0.0-rc2/javadoc/
>
> * Tag to be voted upon (off 1.0 branch) is the 1.0.0-rc2 tag:
>
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=51d5f12e190a38547839c7d2710c97faaeaca586
>
> * Documentation:
> Note the documentation can't be pushed live due to changes that will not go
> live until the release. You can manually verify by downloading
> http://home.apache.org/~guozhang/kafka-1.0.0-rc2/kafka_2.11-1.0.0-site-docs.tgz
>
> * Successful Jenkins builds for the 1.0.0 branch:
> Unit/integration tests: https://builds.apache.org/job/kafka-1.0-jdk7/40/
> System test: https://jenkins.confluent.io/job/system-test-kafka-1.0/6/
>
>
> /**
>
>
> Thanks,
> -- Guozhang
>
>


[jira] [Created] (KAFKA-5465) FetchResponse v0 does not return any messages when max_bytes smaller than v2 message set

2017-06-17 Thread Dana Powers (JIRA)
Dana Powers created KAFKA-5465:
--

 Summary: FetchResponse v0 does not return any messages when 
max_bytes smaller than v2 message set 
 Key: KAFKA-5465
 URL: https://issues.apache.org/jira/browse/KAFKA-5465
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.11.0.0
Reporter: Dana Powers
Priority: Minor


In prior releases, when consuming uncompressed messages, FetchResponse v0
returns a message if it is smaller than the max_bytes sent in the FetchRequest.
In 0.11.0.0 RC0, when messages are stored as v2 internally, the response will 
be empty unless the full message set is smaller than max_bytes. In some 
configurations, this may cause some old consumers to get stuck on large 
messages where previously they were able to make progress one message at a time.

For example, when I produce 10 5KB messages using ProduceRequest v0 and then 
attempt FetchRequest v0 with partition max bytes = 6KB (larger than a single 
message but smaller than all 10 messages together), I get an empty message set 
from 0.11.0.0. Previous brokers would have returned a single message.
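The atomicity difference can be modeled with a toy size-accounting sketch (sizes in bytes; this is a simplification for illustration, not the broker's actual down-conversion code):

```python
def fetch_v0_from_v2_batches(batches, max_bytes):
    # batches: list of record batches, each a list of message sizes (bytes).
    # A v2 batch is atomic: it can only be returned if the WHOLE batch
    # fits in the remaining byte budget.
    out, budget = [], max_bytes
    for batch in batches:
        if sum(batch) > budget:
            break  # cannot split the batch, so nothing more is returned
        out.extend(batch)
        budget -= sum(batch)
    return out

def fetch_v0_legacy(messages, max_bytes):
    # Pre-0.11 behavior: messages are independent, so as many whole
    # messages as fit are returned, one at a time.
    out, budget = [], max_bytes
    for size in messages:
        if size > budget:
            break
        out.append(size)
        budget -= size
    return out
```

With ten 5 KB (5120-byte) messages in a single v2 batch and a 6 KB (6144-byte) max_bytes, the batch model returns nothing while the legacy model returns one message, matching the behavior described above.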



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [DISCUSS] KIP-144: Exponential backoff for broker reconnect attempts

2017-05-08 Thread Dana Powers
s/back-office/backoff/

Final note: although the goal here is not to resolve contention (as in the
aws article), I think we do still want a relatively smooth rate of
reconnects across all clients to avoid storm spikes. Full Jitter does that.
I expect that narrower jitter bands will lead to more clumping of
reconnects, but that's maybe ok.

Another idea would be to make jitter configurable. Full jitter would be
100%. No jitter 0%. Equal Jitter 50%. Etc.
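That configurable-jitter idea can be sketched as a single formula (the function name and defaults are invented for illustration; the actual KIP-144 configs are reconnect.backoff.ms and reconnect.backoff.max.ms):

```python
import random

def next_backoff_ms(attempt, base_ms=100, max_ms=1000,
                    jitter=1.0, rng=random.random):
    """Exponential backoff with a configurable jitter fraction.

    jitter=1.0 -> "Full Jitter": uniform over [0, capped]
    jitter=0.5 -> "Equal Jitter": uniform over [capped/2, capped]
    jitter=0.0 -> no jitter: exactly the capped exponential value
    """
    capped = min(base_ms * 2 ** attempt, max_ms)   # exponential, capped at max
    low = capped * (1.0 - jitter)                  # lower edge of jitter band
    return low + rng() * (capped - low)
```

With jitter=1.0 this is Full Jitter; 0.5 gives Equal Jitter; 0.0 disables jitter entirely, so narrower bands trade smoothness for predictability as described above.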

On May 8, 2017 5:28 PM, "Dana Powers" <dana.pow...@gmail.com> wrote:

> For some discussion of jitter and exponential back-office, I found this
> article useful:
>
> https://www.awsarchitectureblog.com/2015/03/backoff.html
>
> My initial POC used the "Full Jitter" approach described therein. Equal
> Jitter is good too, and may perform a little better. It is random
> distribution between 50% and 100% of calculated backoff.
>
> Dana
>
> On May 4, 2017 8:50 PM, "Ismael Juma" <ism...@juma.me.uk> wrote:
>
>> Thanks for the feedback Gwen and Colin. I agree that the original formula
>> was not intuitive. I updated it to include a max jitter as was suggested.
>> I also updated the config name to include `ms`:
>>
>> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=69408222=3=1
>>
>> If there are no other concerns, I will start the vote tomorrow.
>>
>> Ismael
>>
>> On Mon, May 1, 2017 at 6:18 PM, Colin McCabe <cmcc...@apache.org> wrote:
>>
>> > Thanks for the KIP, Ismael & Dana!  This could be pretty important for
>> > avoiding congestion collapse when there are a lot of clients.
>> >
>> > It seems like a good idea to keep the "ms" suffix, like we have with
>> > "reconnect.backoff.ms".  So maybe we should use
>> > "reconnect.backoff.max.ms"?  In general unitless timeouts can be the
>> > source of a lot of confusion (is it seconds, milliseconds, etc.?)
>> >
>> > It's good that the KIP injects random delays (jitter) into the timeout.
>> > As per Gwen's point, does it make sense to put an upper bound on the
>> > jitter, though?  If someone sets reconnect.backoff.max to 5 minutes,
>> > they probably would be a little surprised to find it doing three retries
>> > after 100 ms in a row (as it could under the current scheme.)  Maybe a
>> > maximum jitter configuration would help address that, and make the
>> > behavior a little more intuitive.
>> >
>> > best,
>> > Colin
>> >
>> >
>> > On Thu, Apr 27, 2017, at 09:39, Gwen Shapira wrote:
>> > > This is a great suggestion. I like how we just do it by default
>> instead
>> > > of
>> > > making it a choice users need to figure out.
>> > > Avoiding connection storms is great.
>> > >
>> > > One concern. If I understand the formula for effective maximum backoff
>> > > correctly, then with default maximum of 1000ms and default backoff of
>> > > 100ms, the effective maximum backoff will be 450ms rather than 1000ms.
>> > > This
>> > > isn't exactly intuitive.
>> > > I'm wondering if it makes more sense to allow "one last doubling"
>> which
>> > > may
>> > > bring us slightly over the maximum, but much closer to it. I.e. have
>> the
>> > > effective maximum be in [max.backoff - backoff, max.backoff + backoff]
>> > > range rather than half that. Does that make sense?
>> > >
>> > > Gwen
>> > >
>> > > On Thu, Apr 27, 2017 at 9:06 AM, Ismael Juma <ism...@juma.me.uk>
>> wrote:
>> > >
>> > > > Hi all,
>> > > >
>> > > > Dana Powers posted a PR a while back for exponential backoff for
>> broker
>> > > > reconnect attempts. Because it adds a config, a KIP is required and
>> > Dana
>> > > > seems to be busy so I posted it:
>> > > >
>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-144%3A+Exponential+backoff+for+broker+reconnect+attempts
>> > > >
>> > > > Please take a look. Your feedback is appreciated.
>> > > >
>> > > > Thanks,
>> > > > Ismael
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > *Gwen Shapira*
>> > > Product Manager | Confluent
>> > > 650.450.2760 | @gwenshap
>> > > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
>> > > <http://www.confluent.io/blog>
>> >
>>
>


Re: [DISCUSS] KIP-144: Exponential backoff for broker reconnect attempts

2017-05-08 Thread Dana Powers
For some discussion of jitter and exponential back-office, I found this
article useful:

https://www.awsarchitectureblog.com/2015/03/backoff.html

My initial POC used the "Full Jitter" approach described therein. Equal
Jitter is good too, and may perform a little better. It is random
distribution between 50% and 100% of calculated backoff.

Dana

On May 4, 2017 8:50 PM, "Ismael Juma" <ism...@juma.me.uk> wrote:

> Thanks for the feedback Gwen and Colin. I agree that the original formula
> was not intuitive. I updated it to include a max jitter as was suggested. I
> also updated the config name to include `ms`:
>
> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=69408222=3=1
>
> If there are no other concerns, I will start the vote tomorrow.
>
> Ismael
>
> On Mon, May 1, 2017 at 6:18 PM, Colin McCabe <cmcc...@apache.org> wrote:
>
> > Thanks for the KIP, Ismael & Dana!  This could be pretty important for
> > avoiding congestion collapse when there are a lot of clients.
> >
> > It seems like a good idea to keep the "ms" suffix, like we have with
> > "reconnect.backoff.ms".  So maybe we should use
> > "reconnect.backoff.max.ms"?  In general unitless timeouts can be the
> > source of a lot of confusion (is it seconds, milliseconds, etc.?)
> >
> > It's good that the KIP injects random delays (jitter) into the timeout.
> > As per Gwen's point, does it make sense to put an upper bound on the
> > jitter, though?  If someone sets reconnect.backoff.max to 5 minutes,
> > they probably would be a little surprised to find it doing three retries
> > after 100 ms in a row (as it could under the current scheme.)  Maybe a
> > maximum jitter configuration would help address that, and make the
> > behavior a little more intuitive.
> >
> > best,
> > Colin
> >
> >
> > On Thu, Apr 27, 2017, at 09:39, Gwen Shapira wrote:
> > > This is a great suggestion. I like how we just do it by default instead
> > > of
> > > making it a choice users need to figure out.
> > > Avoiding connection storms is great.
> > >
> > > One concern. If I understand the formula for effective maximum backoff
> > > correctly, then with default maximum of 1000ms and default backoff of
> > > 100ms, the effective maximum backoff will be 450ms rather than 1000ms.
> > > This
> > > isn't exactly intuitive.
> > > I'm wondering if it makes more sense to allow "one last doubling" which
> > > may
> > > bring us slightly over the maximum, but much closer to it. I.e. have
> the
> > > effective maximum be in [max.backoff - backoff, max.backoff + backoff]
> > > range rather than half that. Does that make sense?
> > >
> > > Gwen
> > >
> > > On Thu, Apr 27, 2017 at 9:06 AM, Ismael Juma <ism...@juma.me.uk>
> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Dana Powers posted a PR a while back for exponential backoff for
> broker
> > > > reconnect attempts. Because it adds a config, a KIP is required and
> > Dana
> > > > seems to be busy so I posted it:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-144%3A+Exponential+backoff+for+broker+reconnect+attempts
> > > >
> > > > Please take a look. Your feedback is appreciated.
> > > >
> > > > Thanks,
> > > > Ismael
> > > >
> > >
> > >
> > >
> > > --
> > > *Gwen Shapira*
> > > Product Manager | Confluent
> > > 650.450.2760 | @gwenshap
> > > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> > > <http://www.confluent.io/blog>
> >
>


Re: [VOTE] KIP-144: Exponential backoff for broker reconnect attempts

2017-05-06 Thread Dana Powers
+1 !

On May 6, 2017 4:49 AM, "Edoardo Comar"  wrote:

> +1 (non binding)
> thanks
> --
> Edoardo Comar
> IBM MessageHub
> eco...@uk.ibm.com
> IBM UK Ltd, Hursley Park, SO21 2JN
>
> IBM United Kingdom Limited Registered in England and Wales with number
> 741598 Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6
> 3AU
>
>
>
> From:   Jay Kreps 
> To: dev@kafka.apache.org
> Date:   05/05/2017 23:19
> Subject:Re: [VOTE] KIP-144: Exponential backoff for broker
> reconnect attempts
>
>
>
> +1
> On Fri, May 5, 2017 at 7:29 PM Sriram Subramanian 
> wrote:
>
> > +1
> >
> > On Fri, May 5, 2017 at 6:04 PM, Gwen Shapira  wrote:
> >
> > > +1
> > >
> > > On Fri, May 5, 2017 at 3:32 PM, Ismael Juma  wrote:
> > >
> > > > Hi all,
> > > >
> > > > Given the simple and non controversial nature of the KIP, I would
> like
> > to
> > > > start the voting process for KIP-144: Exponential backoff for broker
> > > > reconnect attempts:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-144%3A+Exponential+backoff+for+broker+reconnect+attempts
> > > >
> > > > The vote will run for a minimum of 72 hours.
> > > >
> > > > Thanks,
> > > > Ismael
> > > >
> > >
> > >
> > >
> > > --
> > > *Gwen Shapira*
> > > Product Manager | Confluent
> > > 650.450.2760 | @gwenshap
> > > Follow us: Twitter  | blog
> > > 
> > >
> >
>
>
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>


Re: [VOTE] Add REST Server to Apache Kafka

2016-10-26 Thread Dana Powers
-1


On Wed, Oct 26, 2016 at 9:23 AM, Shekar Tippur  wrote:
> +1
>
> Thanks


Re: [kafka-clients] [VOTE] 0.10.1.0 RC3

2016-10-17 Thread Dana Powers
+1 -- passes kafka-python integration tests

On Mon, Oct 17, 2016 at 1:28 PM, Jun Rao  wrote:
> Thanks for preparing the release. Verified quick start on the Scala 2.11 binary.
> +1
>
> Jun
>
> On Fri, Oct 14, 2016 at 4:29 PM, Jason Gustafson  wrote:
>>
>> Hello Kafka users, developers and client-developers,
>>
>> One more RC for 0.10.1.0. We're hoping this is the final one so that we
>> can meet the release target date of Oct. 17 (Monday). Please let me know as
>> soon as possible if you find any major problems.
>>
>> Release plan:
>> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+0.10.1.
>>
>> Release notes for the 0.10.1.0 release:
>> http://home.apache.org/~jgus/kafka-0.10.1.0-rc3/RELEASE_NOTES.html
>>
>> *** Please download, test and vote by Monday, Oct 17, 5pm PT
>>
>> Kafka's KEYS file containing PGP keys we use to sign the release:
>> http://kafka.apache.org/KEYS
>>
>> * Release artifacts to be voted upon (source and binary):
>> http://home.apache.org/~jgus/kafka-0.10.1.0-rc3/
>>
>> * Maven artifacts to be voted upon:
>> https://repository.apache.org/content/groups/staging/
>>
>> * Javadoc:
>> http://home.apache.org/~jgus/kafka-0.10.1.0-rc3/javadoc/
>>
>> * Tag to be voted upon (off 0.10.1 branch) is the 0.10.1.0-rc3 tag:
>>
>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=50f30a44f31fca1bd9189d2814388d51bd56b06b
>>
>> * Documentation:
>> http://kafka.apache.org/0101/documentation.html
>>
>> * Protocol:
>> http://kafka.apache.org/0101/protocol.html
>>
>> * Tests:
>> Unit tests: https://builds.apache.org/job/kafka-0.10.1-jdk7/71/
>> System tests:
>> http://testing.confluent.io/confluent-kafka-0-10-1-system-test-results/?prefix=2016-10-13--001.1476369986--apache--0.10.1--ee212d1/
>>
>> (Note that these tests do not include a couple patches merged today. I
>> will send links to updated test builds as soon as they are available)
>>
>> Thanks,
>>
>> Jason
>>
>
>


Re: [kafka-clients] [VOTE] 0.10.1.0 RC2

2016-10-12 Thread Dana Powers
+1; all kafka-python integration tests pass.

-Dana


On Wed, Oct 12, 2016 at 10:41 AM, Jason Gustafson  wrote:
> Hello Kafka users, developers and client-developers,
>
> One more RC for 0.10.1.0. I think we're getting close!
>
> Release plan:
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+0.10.1.
>
> Release notes for the 0.10.1.0 release:
> http://home.apache.org/~jgus/kafka-0.10.1.0-rc2/RELEASE_NOTES.html
>
> *** Please download, test and vote by Saturday, Oct 15, 11am PT
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~jgus/kafka-0.10.1.0-rc2/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~jgus/kafka-0.10.1.0-rc2/javadoc/
>
> * Tag to be voted upon (off 0.10.1 branch) is the 0.10.1.0-rc2 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=8702d66434b86092a3738472f9186d6845ab0720
>
> * Documentation:
> http://kafka.apache.org/0101/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0101/protocol.html
>
> * Tests:
> Unit tests: https://builds.apache.org/job/kafka-0.10.1-jdk7/68/
> System tests:
> http://confluent-kafka-0-10-1-system-test-results.s3-us-west-2.amazonaws.com/2016-10-11--001.1476197348--apache--0.10.1--d981dd2/
>
> Thanks,
>
> Jason
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at https://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kafka-clients/CAJDuW%3DDk7Mi6ZsiniHcdbCCBdBhasjSeb7_N3EW%3D97OrfvFyew%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.


Re: [DISCUSS] KIP-82 - Add Record Headers

2016-10-07 Thread Dana Powers
> I agree with the critique of compaction not having a value. I think we should 
> consider fixing that directly.

Agree that the compaction issue is troubling: compacted "null" deletes
are incompatible w/ headers that must be packed into the message
value. Are there any alternatives on compaction delete semantics that
could address this? The KIP wiki discussion I think mostly assumes
that compaction-delete is what it is and can't be changed/fixed.
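To make the incompatibility concrete, here is a toy sketch (the JSON framing is invented; any scheme that packs headers into the message value hits the same problem):

```python
import json

def pack(headers, payload):
    # Hypothetical framing that packs headers inside the message value.
    return json.dumps({"headers": headers, "payload": payload}).encode()

def is_tombstone(value):
    # Log compaction purges a key only when the record value is exactly null.
    return value is None

assert is_tombstone(None)                   # a plain delete compacts away
packed = pack({"trace-id": "abc"}, None)    # a "delete" that carries headers...
assert not is_tombstone(packed)             # ...no longer looks like a delete
```

Any headers-in-value encoding turns a delete into a non-null value, so the broker can no longer purge the key during compaction.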

-Dana

On Fri, Oct 7, 2016 at 1:38 PM, Michael Pearce  wrote:
>
> Hi Jay,
>
> Thanks for the comments and feedback.
>
> I think it's quite clear that if a problem keeps arising, it needs
> resolving and addressing properly.
>
> Fair enough: at LinkedIn, and historically for the very first use cases,
> addressing this may not have been a big priority. But as Kafka is now
> Apache open source and being picked up by many, including my company, it
> is clear and evident that this is a requirement and an issue that now
> needs to be addressed.
>
> The fact that almost every transport mechanism I've worked with in the
> enterprise, including networking layers, has headers clearly shows their
> need and success in a transport mechanism.
>
> I understand some concerns with regard to the impact on others not needing it.
>
> What we are proposing is a flexible solution that adds no overhead at the
> storage or network layers if you choose not to use headers, but does
> enable those who need or want them to use them.
>
>
> On your response to 1), there is nothing saying that changes should go in
> any faster or without diligence, and the same KIP process can still apply
> for adding Kafka-scope headers; having headers just makes them easier to
> add, without constant message and record changes. Timestamp is a clear
> real example of what should be in a header (along with other fields), but
> instead the whole message/record object needed to be changed to add it,
> as will any further headers deemed needed by Kafka.
>
> On your response to 2), why, as a platform designer within my company,
> should I enforce that all teams use the same serialization for their
> payloads? What I do need is some core cross-cutting concerns and
> information addressed at my platform level, without imposing on my
> development teams. This is the same argument for why byte[] is the
> exposed value and key: as a messaging platform you don't want to impose
> that on my company.
>
> On your response to 3), actually this isn't true: there are many 3rd-party
> tools we need to hook into our messaging flows, and they only build onto
> standardised interfaces, as obviously the cost of a custom implementation
> for every company would be very high.
> APM tooling is a clear case in point: every enterprise-level APM tool on
> the market is able to stitch transaction flow end-to-end across a platform
> over HTTP or JMS, because they can stitch some "magic" data in a
> uniform/standardised way; for both of those transports they stitch it
> into the headers. In its current form they cannot do this with Kafka.
> Providing a standardised interface will, I believe, actually benefit the
> project, as commercial companies like these will now be able to plug in
> their tooling uniformly, making it attractive and possible.
>
> Some of your other concerns, as Joel mentions, are more implementation
> details that should be agreed upon, but I think they can be addressed.
>
> E.g. re your concern on the hashmap: it is more than possible to avoid
> every record having a hashmap unless it actually has a header (just like
> we have managed to do on the serialized message), which addresses any
> concern about in-memory record size for those using Kafka without
> headers.
>
> On your second-to-last comment about every team choosing their own
> format: actually we do want this a little. As first mentioned, no, we
> don't want a free-for-all, but some freedom, as different serializations
> have different benefits and drawbacks across our business. I can
> enumerate these if needed. One of the use cases for headers provided by
> LinkedIn on top of my KIP even shows where headers could be beneficial
> here, as a header could be used to indicate which data format the message
> is serialized in, allowing me to consume different formats.
>
> Also we have some systems to integrate whose binary payloads are pretty near 
> impossible to wrap or touch, or that we're not allowed to touch (historic 
> systems, or inter/intra-corporate).
>
> Headers really give us a solution for providing a pluggable platform, and a 
> standardisation that allows users to build platforms that adapt to their 
> needs.
>
>
> Cheers
> Mike
>
>
> 
> From: Jay Kreps 
> Sent: Friday, October 7, 2016 4:45 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] 

Consul / Zookeeper [was Re: any update on this?]

2016-09-19 Thread Dana Powers
[+ dev list]

I have not worked on KAFKA-1793 directly, but I believe most of the
work so far has been in removing all zookeeper dependencies from
clients. The two main areas for that are (1) consumer rebalancing, and
(2) administrative commands.

1) Consumer rebalancing APIs were added to the broker in 0.9. The "new
consumer" uses these apis and does not connect directly to zookeeper
to manage group leadership and rebalancing. So my understanding is
that this work is complete.

2) Admin commands are being converted to direct API calls instead of
direct zookeeper access with KIP-4. A small part of this project was
released in 0.10.0.0, and there are open PRs for additional chunks that
may make it into 0.10.1.0. If you are interested in helping or getting
involved, you can follow the KIP-4 discussions on the dev mailing
list.

When the client issues are completed I think the next step will be to
refactor the broker's zookeeper access (zkUtils) into an abstract
interface that could potentially be provided by consul or etcd. With
an interface in place, it should be possible to write an alternate
implementation of that interface for consul.
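[As a rough illustration of what such an abstraction might look like: method names here are hypothetical, and the in-memory implementation exists only to show the plug-in shape a consul- or etcd-backed version would fill.]

```python
from abc import ABC, abstractmethod

class CoordinationClient(ABC):
    """Hypothetical abstraction over zkUtils-style coordination calls.
    A consul or etcd backend would subclass this."""

    @abstractmethod
    def create_path(self, path, data): ...

    @abstractmethod
    def read_path(self, path): ...

    @abstractmethod
    def delete_path(self, path): ...

class InMemoryCoordinationClient(CoordinationClient):
    """Trivial dict-backed stand-in, useful only for tests/illustration."""

    def __init__(self):
        self._store = {}

    def create_path(self, path, data):
        self._store[path] = data

    def read_path(self, path):
        return self._store.get(path)

    def delete_path(self, path):
        self._store.pop(path, None)
```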

Hope this helps,

-Dana

On Mon, Sep 19, 2016 at 6:31 AM, Martin Gainty  wrote:
> Jens/Kant
> not aware of any shortfall with zookeeper, so perhaps you can suggest 
> advantages for Consul vs Zookeeper?
> Maven (I am building, testing and running kafka internally with maven) 
> implements wagon-providers for URLConnection vs HttpURLConnection wagons: 
> https://maven.apache.org/guides/mini/guide-wagon-providers.html
> Thinking a network_provider should work for integrating an external network 
> provider. How would you architect this integration?
>
> Would a configurable network-provider such as maven-wagon-provider work for 
> kafka?
> Martin
>
>> From: kanth...@gmail.com
>> To: us...@kafka.apache.org
>> Subject: Re: any update on this?
>> Date: Mon, 19 Sep 2016 09:41:10 +
>>
>> Yes, of course the goal shouldn't be moving towards consul. It should just be
>> flexible enough for users to pick any distributed coordinated system.
>>
>>
>>
>>
>>
>>
>> On Mon, Sep 19, 2016 2:23 AM, Jens Rantil jens.ran...@tink.se
>> wrote:
>> I think I read somewhere that the long-term goal is to make Kafka
>>
>> independent of Zookeeper altogether. Maybe not worth spending time on
>>
>> migrating to Consul in that case.
>>
>>
>>
>>
>> Cheers,
>>
>> Jens
>>
>>
>>
>>
>> On Sat, Sep 17, 2016 at 10:38 PM Jennifer Fountain 
>>
>> wrote:
>>
>>
>>
>>
>> > +2 watching.
>>
>> >
>>
>> > On Sat, Sep 17, 2016 at 2:45 AM, kant kodali  wrote:
>>
>> >
>>
>> > > https://issues.apache.org/jira/browse/KAFKA-1793
>>
>> > > It would be great to use Consul instead of Zookeeper for Kafka and I
>>
>> > think
>>
>> > > it
>>
>> > > would benefit Kafka a lot from the exponentially growing consul
>>
>> > community.
>>
>> >
>>
>> >
>>
>> >
>>
>> >
>>
>> > --
>>
>> >
>>
>> >
>>
>> > Jennifer Fountain
>>
>> > DevOPS
>>
>> >
>>
>> --
>>
>>
>>
>>
>> Jens Rantil
>>
>> Backend Developer @ Tink
>>
>>
>>
>>
>> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
>>
>> For urgent matters you can reach me at +46-708-84 18 32.
>


Re: [VOTE] 0.10.1 Release Plan

2016-09-13 Thread Dana Powers
+1


Re: Review request for KAFKA-3600

2016-08-12 Thread Dana Powers
I think one of the hard parts is that while we agree that a call to
ApiVersions is required, I don't think there is agreement on how to
use the response.

In kafka-python, for example, to support backwards compatibility we
set a single "api" configuration value during the client constructor.
This value is either set by the user explicitly via configuration or,
if unset by the user, inferred from the first bootstrap server that
responds. In either case we are choosing 1 "api" to apply to all
broker connections. We then use this configuration throughout the
client code to choose different feature paths (Zookeeper v. Kafka
offsets, for example; or whether to use group coordination apis v.
assign all partitions locally on 'subscribe'; etc).

We do not short-circuit requests that appear unsupported by a
particular broker based on this api configuration. We send all
requests normally. If the broker doesn't understand a request, we
expect it to fail normally. Of course, "normally" here means no
response and an abruptly closed socket, which I think could be
improved.
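[The kafka-python approach described above, inferring one broker version and gating feature paths on it, can be sketched as follows; the function and key names are illustrative, not kafka-python's actual API:]

```python
def pick_feature_paths(api_version):
    """Given one (major, minor, patch) broker version -- e.g. set by the
    user or inferred from the first bootstrap broker that responds --
    choose which client code paths to enable. Version cutoffs follow the
    pattern described above."""
    return {
        # Kafka-based offset storage became usable with 0.8.2 brokers
        "kafka_offsets": api_version >= (0, 8, 2),
        # server-side group coordination apis arrived with 0.9 brokers
        "group_coordination": api_version >= (0, 9),
        # only 0.10+ brokers answer an ApiVersions request at all
        "api_versions_request": api_version >= (0, 10),
    }
```

Python tuple comparison gives the right ordering here, which is one reason kafka-python keeps versions as tuples rather than strings.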

So KAFKA-3600. At its core, this revision implements a check on each
connection guaranteeing support for a configured api vector. If any
connection does not support that vector, the client raises a
KafkaException in the response handler.

I think this is definitely an improvement. Compatibility errors are
completely opaque right now and can be very difficult to diagnose for
users that do not have direct access to their kafka cluster / brokers,
and/or do not have a firm grasp of the internals of the kafka client
and kafka protocol. But I also don't think this error handling is core
to compatibility. While supporting backwards compatibility, we
continue to rely on standard error handling (indeed, we have no choice
since we support backwards compatibility to 0.8.0).

So I think the revision is attempting to add a few things that are
assumed to be building blocks for a compatibility layer:
(1) an ApiVersions request / response handler;
(2) a new connection state; and
(3) a hashmap of nodeApiVersions {node_id -> [api_versions]}.

I'm confident that #1 is necessary. Is a new connection state
necessary? Is a full map of all nodes -> supported api version
necessary? These aren't used in the kafka-python implementation, so
from my perspective the answer is no. But maybe the java community has
some interesting ideas for improvement where these are key
ingredients. In any case, I think it would probably help substantially
to get some agreement on where the goalposts are for compatibility to
help push this forward.

-Dana

On Fri, Aug 12, 2016 at 7:37 AM, Ashish Singh  wrote:
> Most of the work in the patch is to enhance NetworkClient to maintain api
> versions supported by brokers it has connections to. When a broker
> disconnects, its api versions info is removed and when it reconnects that
> info is fetched again. In short, with these changes Network Client at any
> given point knows what are the various brokers it is connected to and what
> are the various apis they support. This information is required to enable/
> select features based on brokers the client is talking to. I want to do the
> following in the order mentioned.
>
>- Enable clients to detect, and so inform applications, that they are
>not compatible with the brokers they are trying to talk to, rather than a
>dropped connection or similarly unclear response. This has been ready for
>some time, so we can definitely work to include in next release.
>- Use the api versions info from NetworkClient to enable/ disable
>feature sets, something similar to what librdkafka already has. This will
>need much broader testing.
>
> Note that both of the aforementioned steps have been discussed as part of
> KIP-35 discussions, but only first step was voted in. We will have to go
> through another KIP for second step.
>
> Any suggestions on how to make progress here will be helpful. Dana has left
> some comments and I will address them soon. However, I would appreciate if
> I can get a committer willing to help merging this in.
>
>
> On Thu, Aug 11, 2016 at 4:15 PM, Gwen Shapira  wrote:
>
>> I am 100% pro smart Java clients that support KIP-35 and can use it to
>> work with newer brokers. If this JIRA makes sense as a step in that
>> direction, I think its great and remove my objection.  I didn't see
>> anything that looked like a plan toward full forward-backward
>> compatibility, which is why I responded as I did...
>>
>> Verification is good, but it looked like there was much complexity
>> added toward very little benefits.
>>
>> On Thu, Aug 11, 2016 at 3:37 PM, Ashish Singh  wrote:
>> > Hey Gwen,
>> >
>> > I think this was more than a verification step, it was a building step
>> > towards a backwards compatible clients or for clients that can select
>> > feature based on brokers it is talking to. Are we now against the 

Re: [VOTE] KIP-70: Revise Partition Assignment Semantics on New Consumer's Subscription Change

2016-08-09 Thread Dana Powers
+1 (still confused about bindings)

On Tue, Aug 9, 2016 at 6:24 AM, Ismael Juma  wrote:
> Thanks for the KIP. +1 (binding)
>
> Ismael
>
> On Mon, Aug 8, 2016 at 7:53 PM, Vahid S Hashemian > wrote:
>
>> I would like to initiate the voting process for KIP-70 (
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 70%3A+Revise+Partition+Assignment+Semantics+on+New+
>> Consumer%27s+Subscription+Change
>> ).
>>
>> The only issue that was discussed in the discussion thread is
>> compatibility, but because it applies to an edge case, it is not expected
>> to impact existing users.
>> The proposal was shared with Spark and Storm users and no issue was raised
>> by those communities.
>>
>> Thanks.
>>
>> Regards,
>> --Vahid
>>
>>


Re: [VOTE] 0.10.0.1 RC2

2016-08-05 Thread Dana Powers
passed kafka-python integration tests, +1

-Dana


On Fri, Aug 5, 2016 at 9:35 AM, Tom Crayford  wrote:
> Heroku has tested this using the same performance testing setup we used to
> evaluate the impact of 0.9 -> 0.10 (see https://engineering.
> heroku.com/blogs/2016-05-27-apache-kafka-010-evaluating-
> performance-in-distributed-systems/).
>
> We see no issues at all with them, so +1 (non-binding) from here.
>
> On Fri, Aug 5, 2016 at 12:58 PM, Jim Jagielski  wrote:
>
>> Looks good here: +1
>>
>> > On Aug 4, 2016, at 9:54 AM, Ismael Juma  wrote:
>> >
>> > Hello Kafka users, developers and client-developers,
>> >
>> > This is the third candidate for the release of Apache Kafka 0.10.0.1.
>> This
>> > is a bug fix release and it includes fixes and improvements from 53 JIRAs
>> > (including a few critical bugs). See the release notes for more details:
>> >
>> > http://home.apache.org/~ijuma/kafka-0.10.0.1-rc2/RELEASE_NOTES.html
>> >
>> > When compared to RC1, RC2 contains a fix for a regression where an older
>> > version of slf4j-log4j12 was also being included in the libs folder of
>> the
>> > binary tarball (KAFKA-4008). Thanks to Manikumar Reddy for reporting the
>> > issue.
>> >
>> > *** Please download, test and vote by Monday, 8 August, 8am PT ***
>> >
>> > Kafka's KEYS file containing PGP keys we use to sign the release:
>> > http://kafka.apache.org/KEYS
>> >
>> > * Release artifacts to be voted upon (source and binary):
>> > http://home.apache.org/~ijuma/kafka-0.10.0.1-rc2/
>> >
>> > * Maven artifacts to be voted upon:
>> > https://repository.apache.org/content/groups/staging
>> >
>> > * Javadoc:
>> > http://home.apache.org/~ijuma/kafka-0.10.0.1-rc2/javadoc/
>> >
>> > * Tag to be voted upon (off 0.10.0 branch) is the 0.10.0.1-rc2 tag:
>> > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
>> f8f56751744ba8e55f90f5c4f3aed8c3459447b2
>> >
>> > * Documentation:
>> > http://kafka.apache.org/0100/documentation.html
>> >
>> > * Protocol:
>> > http://kafka.apache.org/0100/protocol.html
>> >
>> > * Successful Jenkins builds for the 0.10.0 branch:
>> > Unit/integration tests: *https://builds.apache.org/job
>> /kafka-0.10.0-jdk7/182/
>> > *
>> > System tests: *https://jenkins.confluent.io/
>> job/system-test-kafka-0.10.0/138/
>> > *
>> >
>> > Thanks,
>> > Ismael
>>
>>


Re: [kafka-clients] [VOTE] 0.10.0.1 RC0

2016-07-29 Thread Dana Powers
+1

tested against kafka-python integration test suite = pass.

Aside: as the scope of kafka gets bigger, it may be useful to organize
release notes into functional groups like core, brokers, clients,
kafka-streams, etc. I've found this useful when organizing
kafka-python release notes.

-Dana

On Fri, Jul 29, 2016 at 7:46 AM, Ismael Juma  wrote:
> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for the release of Apache Kafka 0.10.0.1. This
> is a bug fix release and it includes fixes and improvements from 50 JIRAs
> (including a few critical bugs). See the release notes for more details:
>
> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/RELEASE_NOTES.html
>
> *** Please download, test and vote by Monday, 1 August, 8am PT ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging
>
> * Javadoc:
> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/javadoc/
>
> * Tag to be voted upon (off 0.10.0 branch) is the 0.10.0.1-rc0 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=0c2322c2cf7ab7909cfd8b834d1d2fffc34db109
>
> * Documentation:
> http://kafka.apache.org/0100/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0100/protocol.html
>
> * Successful Jenkins builds for the 0.10.0 branch:
> Unit/integration tests: https://builds.apache.org/job/kafka-0.10.0-jdk7/170/
> System tests: https://jenkins.confluent.io/job/system-test-kafka-0.10.0/130/
>
> Thanks,
> Ismael
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at https://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kafka-clients/CAD5tkZYz8fbLAodpqKg5eRiCsm4ze9QK3ufTz3Q4U%3DGs0CRb1A%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.


Re: [DISCUSS] KIP-70: Revise Partition Assignment Semantics on New Consumer's Subscription Change

2016-07-22 Thread Dana Powers
This is a nice change. Great KIP write up.

-Dana

On Fri, Jul 22, 2016 at 10:07 AM, Vahid S Hashemian
 wrote:
> Thanks Ismael.
>
> What do you think is the best way to check with Storm / Spark users? Their
> mailing list?
>
> Thanks.
>
> Regards,
> --Vahid
>
>
>
>
> From:   Ismael Juma 
> To: dev@kafka.apache.org
> Date:   07/22/2016 01:44 AM
> Subject:Re: [DISCUSS] KIP-70: Revise Partition Assignment
> Semantics on New Consumer's Subscription Change
> Sent by:isma...@gmail.com
>
>
>
> Thanks for the KIP Vahid. The change makes sense. On the compatibility
> front, could we check some of the advanced Kafka users like Storm and
> Spark
> in order to verify if they would be affected?
>
> Ismael
>
> On Wed, Jul 20, 2016 at 1:55 AM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
>
>> Hi all,
>>
>> We have started a new KIP under
>>
>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-70%3A+Revise+Partition+Assignment+Semantics+on+New+Consumer%27s+Subscription+Change
>
>>
>> Your feedback is much appreciated.
>>
>> Regards,
>> Vahid Hashemian
>>
>>
>
>
>
>


Re: [DISCUSS] KIP-53 Add custom policies for reconnect attempts to NetworkdClient

2016-07-08 Thread Dana Powers
Thanks for the confirmation, Ismael. I will write something up for further
discussion.

-Dana
On Jul 5, 2016 4:43 PM, "Ismael Juma" <ism...@juma.me.uk> wrote:

> Hi Dana,
>
> Thanks for the PR. Technically, a (simple) KIP is required when proposing
> new configs.
>
> Ismael
>
> On Sun, Jun 19, 2016 at 7:42 PM, Dana Powers <dana.pow...@gmail.com>
> wrote:
>
> > I took a stab at implementing a simplified exponential + randomized
> > backoff policy here: https://github.com/apache/kafka/pull/1523
> >
> > Rather than changing public interfaces / using plugins, it just adds a
> > new client configuration "reconnect.backoff.max" that can be used to
> > enable increasing backoff when node failures repeat. If this
> > configuration is not set higher than reconnect.backoff.ms then the
> > current constant backoff policy is retained. The default is to
> > continue w/ current 50ms constant backoff.
> >
> > Thoughts? Would a change like this require a KIP?
> >
> > -Dana
> >
> >
> > On Mon, May 2, 2016 at 10:29 AM, Guozhang Wang <wangg...@gmail.com>
> wrote:
> > > For the specific problem of connection storm, randomized with normal
> > > distribution at specified mean as "reconnect.backoff.ms" has been
> proved
> > > pretty well. The most recent usage of it in my mind is RAFT, and it
> turns
> > > out pretty effective in eliminating leader-election storms.
> > >
> > >
> > > Guozhang
> > >
> > > On Fri, Apr 29, 2016 at 8:57 PM, Ewen Cheslack-Postava <
> > e...@confluent.io>
> > > wrote:
> > >
> > >> I'll agree w/ Jay and point out that the implementations of
> > >> ReconnectionPolicy provided built-in with that driver are Constant,
> > >> Exponential, and Counting. Constant and exponential can be combined
> with
> > >> the right set of policy config parameters. I'm curious if there's a
> real
> > >> need for something else, or if you're just looking for something
> > >> exponential instead of non-constant? I think a fixed exponential
> backoff
> > >> policy that defaults parameters to the current constant backoff policy
> > >> would probably satisfy our needs.
> > >>
> > >> On Mon, Apr 4, 2016 at 1:25 PM, Jay Kreps <j...@confluent.io> wrote:
> > >>
> > >> > If I understand the problem we are fixing is a connection storm
> where
> > >> when
> > >> > a new broker comes online it is overwhelmed with connections.
> > >> >
> > >> > In general we try hard to avoid plugins where possible. Maybe
> instead
> > of
> > >> > adding another plugin interface we could just directly solve this
> > problem
> > >> > by doing some randomization in the backoff to space out the
> > >> reconnections?
> > >> > This seems like it would be good for anyone with a large client
> > >> > environment?
> > >> >
> > >> > -Jay
> > >> >
> > >> > On Mon, Apr 4, 2016 at 12:54 PM, Florian Hussonnois <
> > >> fhussonn...@gmail.com
> > >> > >
> > >> > wrote:
> > >> >
> > >> > > Hi Kafka Team,
> > >> > >
> > >> > > I have made a new Kafka Improvement Proposal.
> > >> > >
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-53+-+Add+custom+policies+for+reconnect+attempts+to+NetworkdClient
> > >> > >
> > >> > > This is my first proposal so I don't know if I have given enough
> > >> > > information.
> > >> > > Also I have already proposed an implementation :
> > >> > > https://github.com/apache/kafka/pull/1179
> > >> > >
> > >> > > Thanks
> > >> > >
> > >> > > --
> > >> > > Florian HUSSONNOIS
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Thanks,
> > >> Ewen
> > >>
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> >
>


Re: [VOTE] KIP-4 Delete Topics Schema

2016-06-24 Thread Dana Powers
+1


Re: [VOTE] KIP-4 Create Topics Schema

2016-06-20 Thread Dana Powers
+1 -- thanks for the update

On Mon, Jun 20, 2016 at 10:49 AM, Grant Henke  wrote:
> I have update the patch and wiki based on the feedback in the discussion
> thread. The only change is that instead of logging and disconnecting in the
> case of invalid messages (duplicate topics or both arguments) we now return
> and InvalidRequest error back to the client for that topic.
>
> I would like to restart the vote now including that change. If you have
> already voted, please revote in this thread.
>
> Thank you,
> Grant
>
> On Sun, Jun 19, 2016 at 8:57 PM, Ewen Cheslack-Postava 
> wrote:
>
>> Don't necessarily want to add noise here, but I'm -1 based on the
>> disconnect part. See discussion in other thread. (I'm +1 otherwise, and
>> happy to have my vote applied assuming we clean up that one issue.)
>>
>> -Ewen
>>
>> On Thu, Jun 16, 2016 at 6:05 PM, Harsha  wrote:
>>
>> > +1 (binding)
>> > Thanks,
>> > Harsha
>> >
>> > On Thu, Jun 16, 2016, at 04:15 PM, Guozhang Wang wrote:
>> > > +1.
>> > >
>> > > On Thu, Jun 16, 2016 at 3:47 PM, Ismael Juma 
>> wrote:
>> > >
>> > > > +1 (binding)
>> > > >
>> > > > On Thu, Jun 16, 2016 at 11:50 PM, Grant Henke 
>> > wrote:
>> > > >
>> > > > > I would like to initiate the voting process for the "KIP-4 Create
>> > Topics
>> > > > > Schema changes". This is not a vote for all of KIP-4, but
>> > specifically
>> > > > for
>> > > > > the create topics changes. I have included the exact changes below
>> > for
>> > > > > clarity:
>> > > > > >
>> > > > > > Create Topics Request (KAFKA-2945
>> > > > > > )
>> > > > > >
>> > > > > > CreateTopics Request (Version: 0) => [create_topic_requests]
>> > timeout
>> > > > > >   create_topic_requests => topic num_partitions
>> replication_factor
>> > > > > [replica_assignment] [configs]
>> > > > > > topic => STRING
>> > > > > > num_partitions => INT32
>> > > > > > replication_factor => INT16
>> > > > > > replica_assignment => partition_id [replicas]
>> > > > > >   partition_id => INT32
>> > > > > >   replicas => INT32
>> > > > > > configs => config_key config_value
>> > > > > >   config_key => STRING
>> > > > > >   config_value => STRING
>> > > > > >   timeout => INT32
>> > > > > >
>> > > > > > CreateTopicsRequest is a batch request to initiate topic creation
>> > with
>> > > > > > either predefined or automatic replica assignment and optionally
>> > topic
>> > > > > > configuration.
>> > > > > >
>> > > > > > Request semantics:
>> > > > > >
>> > > > > >1. Must be sent to the controller broker
>> > > > > >2. If there are multiple instructions for the same topic in
>> one
>> > > > > >request an InvalidRequestException will be logged on the
>> broker
>> > and
>> > > > > the
>> > > > > >client will be disconnected.
>> > > > > >   - This is because the list of topics is modeled server side
>> > as a
>> > > > > >   map with TopicName as the key
>> > > > > >3. The principal must be authorized to the "Create" Operation
>> > on the
>> > > > > >"Cluster" resource to create topics.
>> > > > > >   - Unauthorized requests will receive a
>> > > > > ClusterAuthorizationException
>> > > > > >4.
>> > > > > >
>> > > > > >Only one from ReplicaAssignment or (num_partitions +
>> > > > > replication_factor
>> > > > > >), can be defined in one instruction.
>> > > > > >- If both parameters are specified an InvalidRequestException
>> > will
>> > > > be
>> > > > > >   logged on the broker and the client will be disconnected.
>> > > > > >   - In the case ReplicaAssignment is defined number of
>> > partitions
>> > > > and
>> > > > > >   replicas will be calculated from the supplied
>> > replica_assignment.
>> > > > > >   - In the case of defined (num_partitions +
>> > replication_factor)
>> > > > > >   replica assignment will be automatically generated by the
>> > server.
>> > > > > >   - One or the other must be defined. The existing broker
>> side
>> > auto
>> > > > > >   create defaults will not be used
>> > > > > >   (default.replication.factor, num.partitions). The client
>> > > > > implementation can
>> > > > > >   have defaults for these options when generating the
>> messages.
>> > > > > >   - The first replica in [replicas] is assumed to be the
>> > preferred
>> > > > > >   leader. This matches current behavior elsewhere.
>> > > > > >5. Setting a timeout > 0 will allow the request to block until
>> > the
>> > > > > >topic metadata is "complete" on the controller node.
>> > > > > >   - Complete means the local topic metadata cache been
>> > completely
>> > > > > >   populated and all partitions have leaders
>> > > > > >  - The topic metadata is updated when the controller
>> sends
>> > out
>> > > > > >  update metadata requests to the brokers
>> > > > > > 

[jira] [Commented] (KAFKA-3878) Support exponential backoff for broker reconnect attempts

2016-06-20 Thread Dana Powers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15339941#comment-15339941
 ] 

Dana Powers commented on KAFKA-3878:


Proposed implementation at https://github.com/apache/kafka/pull/1523

> Support exponential backoff for broker reconnect attempts
> -
>
> Key: KAFKA-3878
> URL: https://issues.apache.org/jira/browse/KAFKA-3878
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, network
>    Reporter: Dana Powers
>
> The client currently uses a constant backoff policy, configured via 
> 'reconnect.backoff.ms'. To reduce network load during longer broker outages, 
> it would be useful to support an optional exponentially increasing backoff 
> policy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3878) Support exponential backoff for broker reconnect attempts

2016-06-20 Thread Dana Powers (JIRA)
Dana Powers created KAFKA-3878:
--

 Summary: Support exponential backoff for broker reconnect attempts
 Key: KAFKA-3878
 URL: https://issues.apache.org/jira/browse/KAFKA-3878
 Project: Kafka
  Issue Type: Improvement
  Components: clients, network
Reporter: Dana Powers


The client currently uses a constant backoff policy, configured via 
'reconnect.backoff.ms'. To reduce network load during longer broker outages, 
it would be useful to support an optional exponentially increasing backoff 
policy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-53 Add custom policies for reconnect attempts to NetworkdClient

2016-06-19 Thread Dana Powers
I took a stab at implementing a simplified exponential + randomized
backoff policy here: https://github.com/apache/kafka/pull/1523

Rather than changing public interfaces / using plugins, it just adds a
new client configuration "reconnect.backoff.max" that can be used to
enable increasing backoff when node failures repeat. If this
configuration is not set higher than reconnect.backoff.ms then the
current constant backoff policy is retained. The default is to
continue w/ current 50ms constant backoff.
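[The proposed policy can be sketched as follows; parameter names are illustrative and do not match the final config keys:]

```python
import random

def reconnect_backoff_ms(failures, backoff_ms=50, backoff_max_ms=1000):
    """Exponential backoff with jitter for broker reconnect attempts.

    Doubles the base backoff per consecutive failure, caps it at
    backoff_max_ms, then applies +/-20% jitter to space out reconnect
    storms. With backoff_max_ms == backoff_ms this degenerates to the
    current constant policy."""
    if failures <= 0:
        return 0
    backoff = min(backoff_ms * 2 ** (failures - 1), backoff_max_ms)
    return backoff * random.uniform(0.8, 1.2)
```

For example, with the defaults the first retry waits roughly 50ms, the fifth roughly 800ms, and further failures stay near the 1000ms cap, jitter aside.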

Thoughts? Would a change like this require a KIP?

-Dana


On Mon, May 2, 2016 at 10:29 AM, Guozhang Wang  wrote:
> For the specific problem of connection storms, randomizing with a normal
> distribution around the specified mean "reconnect.backoff.ms" has proved
> to work pretty well. The most recent usage of it in my mind is Raft, and it
> turns out pretty effective at eliminating leader-election storms.
>
>
> Guozhang
>
> On Fri, Apr 29, 2016 at 8:57 PM, Ewen Cheslack-Postava 
> wrote:
>
>> I'll agree w/ Jay and point out that the implementations of
>> ReconnectionPolicy provided built-in with that driver are Constant,
>> Exponential, and Counting. Constant and exponential can be combined with
>> the right set of policy config parameters. I'm curious if there's a real
>> need for something else, or if you're just looking for something
>> exponential instead of non-constant? I think a fixed exponential backoff
>> policy that defaults parameters to the current constant backoff policy
>> would probably satisfy our needs.
>>
>> On Mon, Apr 4, 2016 at 1:25 PM, Jay Kreps  wrote:
>>
>> > If I understand the problem we are fixing is a connection storm where
>> when
>> > a new broker comes online it is overwhelmed with connections.
>> >
>> > In general we try hard to avoid plugins where possible. Maybe instead of
>> > adding another plugin interface we could just directly solve this problem
>> > by doing some randomization in the backoff to space out the
>> reconnections?
>> > This seems like it would be good for anyone with a large client
>> > environment?
>> >
>> > -Jay
>> >
>> > On Mon, Apr 4, 2016 at 12:54 PM, Florian Hussonnois <
>> fhussonn...@gmail.com
>> > >
>> > wrote:
>> >
>> > > Hi Kafka Team,
>> > >
>> > > I have made a new Kafka Improvement Proposal.
>> > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-53+-+Add+custom+policies+for+reconnect+attempts+to+NetworkdClient
>> > >
>> > > This is my first proposal so I don't know if I have given enough
>> > > information.
>> > > Also I have already proposed an implementation :
>> > > https://github.com/apache/kafka/pull/1179
>> > >
>> > > Thanks
>> > >
>> > > --
>> > > Florian HUSSONNOIS
>> > >
>> >
>>
>>
>>
>> --
>> Thanks,
>> Ewen
>>
>
>
>
> --
> -- Guozhang


Consumer autocommit interference via unintentional/internal poll() calls

2016-06-19 Thread Dana Powers
I searched through jira and the mailing list for prior discussion of
this and could not find any. Forgive me if I missed it, and if so
please send a link!

It was raised in the kafka-python issue list by an astute reader that
the KafkaConsumer autocommit semantics can be accidentally broken by
consumer methods that themselves call poll(), triggering background
tasks like AutoCommitTask inadvertently.

Normally, the autocommit semantics say that message offsets will not
be committed (ack) until after the consumer has processed them. Common
pattern in pseudocode would be:

```
while True:
    batch = consumer.poll()
    for message in batch:
        process(message)
        # a failure here should block acks for all messages since last poll()
```

This is a good at-least-once-delivery model.

But so the problem raised is that if during message processing the
user were to call a consumer method that does network requests via
poll(), then it is possible that the AutoCommitTask could be called
prematurely and messages returned in the last batch could be
committed/acked before processing completes. Such methods appear to
include: consumer.listTopics, consumer.position,
consumer.partitionsFor. The problem then is that if there is a failure
after one of these methods but before message processing completes,
those messages will have been auto-committed and will not be
reprocessed.

Has this issue been discussed before? Any thoughts on how to address?

-Dana
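[For reference, one way to sidestep the hazard described above is to disable autocommit and commit explicitly only after processing completes. A sketch of that loop, written against a kafka-python-style consumer -- it works with any object exposing poll()/commit(), e.g. KafkaConsumer(..., enable_auto_commit=False):]

```python
def consume_at_least_once(consumer, process, timeout_ms=1000):
    """Poll one batch and commit only after every message in it has been
    processed. If process() raises, the commit below never runs, so the
    batch is redelivered -- at-least-once, regardless of any internal
    poll() calls made while processing."""
    batch = consumer.poll(timeout_ms=timeout_ms)  # {TopicPartition: [msgs]}
    for tp, messages in batch.items():
        for message in messages:
            process(message)  # an exception here skips the commit
    if batch:
        consumer.commit()  # ack the whole batch only after processing
    return batch
```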


Re: [DISCUSS] KIP-4 Create Topic Schema

2016-06-17 Thread Dana Powers
I'm unconvinced (crazy, right?). Comments below:

On Fri, Jun 17, 2016 at 7:27 AM, Grant Henke <ghe...@cloudera.com> wrote:
> Hi Dana,
>
> You mentioned one of the reasons I error and disconnect: because I can't
> return an error for every request, the cardinality between request and
> response would be different. Beyond that though, I am handling this
> protocol rule/parsing error the same way all other messages do.

But you can return an error for every topic, and isn't that the level
of error required here?

> CreateTopic Response (Version: 0) => [topic_error_codes]
>   topic_error_codes => topic error_code
> topic => STRING
> error_code => INT16

If I submit duplicate requests for a topic, it's an error isolated to
that topic. If I mess up the partition / replication / etc semantics
for a topic, that's an error isolated to that topic. Is there a
cardinality problem at this level?
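[A per-topic validation along these lines might look like the following sketch; error-code numbers and field names are illustrative, not the actual broker implementation:]

```python
def validate_create_topics(requests):
    """Map each topic in a CreateTopics request to an error code instead
    of disconnecting the client. Flags the two cases discussed above:
    duplicate topic entries, and replica_assignment supplied together
    with (or instead of neither) num_partitions/replication_factor."""
    NONE, INVALID_REQUEST = 0, 42  # illustrative codes
    seen, errors = set(), {}
    for req in requests:
        topic = req["topic"]
        if topic in seen:
            errors[topic] = INVALID_REQUEST  # duplicate topic entry
            continue
        seen.add(topic)
        has_assignment = bool(req.get("replica_assignment"))
        has_counts = "num_partitions" in req or "replication_factor" in req
        if has_assignment == has_counts:  # both supplied, or neither
            errors[topic] = INVALID_REQUEST
        else:
            errors[topic] = NONE
    return errors
```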


>
> Parsing is handled in the RequestChannel and any exception that occurs
> during this phase is caught, converted into an InvalidRequestException and
> results in a disconnect:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/RequestChannel.scala#L92-L95
>
> Since this is an error that could only occur (and would always occur) due
> to incorrect client implementations, and not because of any cluster state
> or unusual situation, I felt this behavior was okay and made sense. For
> client developers the broker logging should make it obvious what the issue
> is. My patch also clearly documents the protocol rules in the Protocol
> definition.

Documentation is great and definitely a must. But requiring client
developers to dig through server logs is not ideal. Client developers
don't always have direct access to those logs. And even if they do,
the brokers may have other traffic, which makes it difficult to track
down the exact point in the logs where the error occurred.

As discussed above, I don't think you need to or should model this as
a request-level parsing error. It may be easier for the current broker
implementation to do that and just crash the connection, but I don't
think it makes that much sense from a raw api perspective.

> In the future having a response header with an error code (and optimally
> error message) for every response would help solve this problem (for all
> message types).

That will definitely help solve the more general invalid request error
problem. But I think given the current state of error handling /
feedback from brokers on request-level errors, you should treat
connection crash as a last resort. I think there is a good opportunity
to avoid it in this case, and I think the api would be better if done
that way.

-Dana

> On Fri, Jun 17, 2016 at 12:04 AM, Dana Powers <dana.pow...@gmail.com> wrote:
>
>> Why disconnect the client on a InvalidRequestException? The 2 errors
>> you are catching are both topic-level: (1) multiple requests for the
>> same topic, and (2) ReplicaAssignment and num_partitions /
>> replication_factor both set. Wouldn't it be better to just error the
>> offending create_topic_request, not the entire connection? The
>> CreateTopicsResponse returns a map of topics to error codes. You could
>> just return the topic that caused the error and an
>> InvalidRequestException error code.
>>
>> -Dana
>>
>> On Thu, Jun 16, 2016 at 8:37 AM, Grant Henke <ghe...@cloudera.com> wrote:
>> > I have updated the wiki and pull request based on the feedback. If there
>> > are no objections I will start a vote at the end of the day.
>> >
>> > Details for this implementation can be read here:
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
>> >
>> > The updated pull request can be found here (feel free to review):
>> > https://github.com/apache/kafka/pull/1489
>> >
>> > Below is the exact content for clarity:
>> >
>> >> Create Topics Request (KAFKA-2945
>> >> <https://issues.apache.org/jira/browse/KAFKA-2945>)
>> >>
>> >>
>> >>
>> >> CreateTopics Request (Version: 0) => [create_topic_requests] timeout
>> >>   create_topic_requests => topic num_partitions replication_factor
>> [replica_assignment] [configs]
>> >> topic => STRING
>> >> num_partitions => INT32
>> >> replication_factor => INT16
>> >> replica_assignment => partition_id [replicas]
>> >>   partition_id => INT32
>> >>   replicas => INT32

Re: [DISCUSS] KIP-4 Create Topic Schema

2016-06-16 Thread Dana Powers
as
>>   valid and the topic creation was triggered.
>>7. The request is not transactional.
>>   1. If an error occurs on one topic, the others could still be
>>   created.
>>   2. Errors are reported independently.
>>
>> QA:
>>
>>- Why is CreateTopicsRequest a batch request?
>>   - Scenarios where tools or admins want to create many topics should
>>   be able to with fewer requests
>>   - Example: MirrorMaker may want to create the topics downstream
>>- What happens if some topics error immediately? Will it
>>return immediately?
>>   - The request will block until all topics have either been created,
>>   errored, or the timeout has been hit
>>   - There is no "short circuiting" where 1 error stops the other
>>   topics from being created
>>- Why implement "partial blocking" instead of fully async or fully
>>consistent?
>>   - See Cluster Consistent Blocking
>>   
>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-cluster-consistent-blocking>
>>below
>>- Why require the request to go to the controller?
>>   - The controller is responsible for the cluster metadata and
>>   its propagation
>>   - See Request Forwarding
>>   
>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-request>
>>below
>>
>> Create Topics Response
>>
>>
>>
>> CreateTopics Response (Version: 0) => [topic_error_codes]
>>   topic_error_codes => topic error_code
>> topic => STRING
>> error_code => INT16
>>
>> CreateTopicsResponse contains a map between topic and topic creation
>> result error code (see New Protocol Errors
>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-NewProtocolErrors>
>> ).
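For client authors, a minimal sketch of encoding/decoding that response
body using Kafka's standard wire primitives (ARRAY = INT32 count, STRING =
INT16 length + UTF-8 bytes, INT16 error code, all big-endian). The
response header is omitted here.

```python
import struct

def encode_create_topics_response_v0(topic_errors):
    """Encode the body per the quoted schema: an ARRAY of
    (STRING topic, INT16 error_code) pairs."""
    out = [struct.pack('>i', len(topic_errors))]
    for topic, code in topic_errors.items():
        name = topic.encode('utf-8')
        out.append(struct.pack('>h', len(name)) + name + struct.pack('>h', code))
    return b''.join(out)

def decode_create_topics_response_v0(buf):
    """Decode the body back into a topic -> error_code map."""
    (count,) = struct.unpack_from('>i', buf, 0)
    pos, result = 4, {}
    for _ in range(count):
        (n,) = struct.unpack_from('>h', buf, pos)
        pos += 2
        topic = buf[pos:pos + n].decode('utf-8')
        pos += n
        (code,) = struct.unpack_from('>h', buf, pos)
        pos += 2
        result[topic] = code
    return result
```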
>>
>>
> Thank you,
> Grant
>
>
> On Wed, Jun 15, 2016 at 4:11 PM, Grant Henke <ghe...@cloudera.com> wrote:
>
>> Turns out we already have an InvalidRequestException:
>> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/RequestChannel.scala#L75-L98
>>
>> We just don't map it in Errors.java so it results in an UNKNOWN error when
>> sent back to the client.
>>
>> I will migrate the InvalidRequestException to the client package, add it
>> to Errors and use that to signify any protocol parsing/rule errors.
>>
>>
>>
>> On Wed, Jun 15, 2016 at 2:55 PM, Dana Powers <dana.pow...@gmail.com>
>> wrote:
>>
>>> On Wed, Jun 15, 2016 at 12:19 PM, Ismael Juma <ism...@juma.me.uk> wrote:
>>> > Hi Grant,
>>> >
>>> > Comments below.
>>> >
>>> > On Wed, Jun 15, 2016 at 6:52 PM, Grant Henke <ghe...@cloudera.com>
>>> wrote:
>>> >>
>>> >> The one thing I want to avoid is too many super specific error codes. I
>>> am
>>> >> not sure how much of a problem it really is but in the case of wire
>>> >> protocol errors like multiple instances of the same topic, do you have
>>> any
>>> >> thoughts on the error? Should we make a generic InvalidRequest error
>>> and
>>> >> log the detailed message on the broker for client authors to debug?
>>> >>
>>> >
>>> > That is a good question. It would be good to get input from client
>>> > developers like Dana on this.
>>>
>>> I think generic error codes are fine if the wire protocol requirements
>>> are documented [i.e., no duplicate topics and partitions/replicas are
>>> either/or not both]. If I get a broker error at the protocol level
>>> that I don't understand, the first place I look is the protocol docs.
>>> It may cause a few more emails to the mailing lists asking for
>>> clarification, but I think those will be easier to triage than
>>> confused emails like "I said create topic with 10 partitions, but I
>>> only got 5???"
>>>
>>> -Dana
>>>
>>
>>
>>
>> --
>> Grant Henke
>> Software Engineer | Cloudera
>> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>>
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [DISCUSS] KIP-4 Create Topic Schema

2016-06-15 Thread Dana Powers
On Wed, Jun 15, 2016 at 12:19 PM, Ismael Juma  wrote:
> Hi Grant,
>
> Comments below.
>
> On Wed, Jun 15, 2016 at 6:52 PM, Grant Henke  wrote:
>>
>> The one thing I want to avoid is too many super specific error codes. I am
>> not sure how much of a problem it really is but in the case of wire
>> protocol errors like multiple instances of the same topic, do you have any
>> thoughts on the error? Should we make a generic InvalidRequest error and
>> log the detailed message on the broker for client authors to debug?
>>
>
> That is a good question. It would be good to get input from client
> developers like Dana on this.

I think generic error codes are fine if the wire protocol requirements
are documented [i.e., no duplicate topics and partitions/replicas are
either/or not both]. If I get a broker error at the protocol level
that I don't understand, the first place I look is the protocol docs.
It may cause a few more emails to the mailing lists asking for
clarification, but I think those will be easier to triage than
confused emails like "I said create topic with 10 partitions, but I
only got 5???"

-Dana


Re: [DISCUSS] KIP-4 Create Topic Schema

2016-06-15 Thread Dana Powers
> Ewen's reply sums up my thoughts on the error handling points. It doesn't
> seem ideal to justify the wire protocol behaviour based on the Java
> implementation. If we need map-like semantics in the protocol, then maybe
> we need a `Map` type to complement `Array`? Otherwise, I still think we
> should consider throwing the appropriate errors instead of silently picking
> a behaviour. It would be good to know what others think.

Agree. I would prefer to get an error code back from the broker if I
accidentally submitted duplicate topics, or if I set both partitions +
replicas. I think putting the burden on the client/driver developer to
get this right at the wire protocol level is much easier if there's
immediate error feedback.

-Dana


Re: [kafka-clients] Adding new client to wiki

2016-05-26 Thread Dana Powers
I believe wiki requests usually go to the kafka dev mailing list (cc'd)
On May 26, 2016 6:51 AM, "Vijay Jadhav"  wrote:

>
>
> Hi,
>
> Can someone point out what the procedure is for adding "libasynckafkaclient"
> (a C++ based single-threaded asynchronous library) to the client wiki
> page?
>
> Thanks
> Vijay
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at https://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kafka-clients/c47b01c1-4b6f-42b1-887f-172e9f5c4403%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>


Re: [VOTE] 0.10.0.0 RC6

2016-05-20 Thread Dana Powers
+1 -- passed kafka-python integration tests

-Dana

On Fri, May 20, 2016 at 11:16 AM, Joe Stein  wrote:
> +1 ran quick start from source and binary release
>
> On Fri, May 20, 2016 at 1:07 PM, Ewen Cheslack-Postava 
> wrote:
>
>> +1 validated connect with a couple of simple connectors and console
>> producer/consumer.
>>
>> -Ewen
>>
>> On Fri, May 20, 2016 at 9:53 AM, Guozhang Wang  wrote:
>>
>> > +1. Validated maven (should be
>> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
>> > btw)
>> > and binary libraries, quick start.
>> >
>> > On Fri, May 20, 2016 at 9:36 AM, Harsha  wrote:
>> >
>> > > +1 . Ran a 3-node cluster with few system tests on our side. Looks
>> good.
>> > >
>> > > -Harsha
>> > >
>> > > On Thu, May 19, 2016, at 07:47 PM, Jun Rao wrote:
>> > > > Thanks for running the release. +1 from me. Verified the quickstart.
>> > > >
>> > > > Jun
>> > > >
>> > > > On Tue, May 17, 2016 at 10:00 PM, Gwen Shapira 
>> > > wrote:
>> > > >
>> > > > > Hello Kafka users, developers and client-developers,
>> > > > >
>> > > > > This is the seventh (!) candidate for release of Apache Kafka
>> > > > > 0.10.0.0. This is a major release that includes: (1) New message
>> > > > > format including timestamps (2) client interceptor API (3) Kafka
>> > > > > Streams.
>> > > > >
>> > > > > This RC was rolled out to fix an issue with our packaging that
>> caused
>> > > > > dependencies to leak in ways that broke our licensing, and an issue
>> > > > > with protocol versions that broke upgrade for LinkedIn and others
>> who
>> > > > > may run from trunk. Thanks to Ewen, Ismael, Becket and Jun for the
>> > > > > finding and fixing of issues.
>> > > > >
>> > > > > Release notes for the 0.10.0.0 release:
>> > > > > http://home.apache.org/~gwenshap/0.10.0.0-rc6/RELEASE_NOTES.html
>> > > > >
>> > > > > Lets try to vote within the 72h release vote window and get this
>> baby
>> > > > > out already!
>> > > > >
>> > > > > *** Please download, test and vote by Friday, May 20, 23:59 PT
>> > > > >
>> > > > > Kafka's KEYS file containing PGP keys we use to sign the release:
>> > > > > http://kafka.apache.org/KEYS
>> > > > >
>> > > > > * Release artifacts to be voted upon (source and binary):
>> > > > > http://home.apache.org/~gwenshap/0.10.0.0-rc6/
>> > > > >
>> > > > > * Maven artifacts to be voted upon:
>> > > > > https://repository.apache.org/content/groups/staging/
>> > > > >
>> > > > > * java-doc
>> > > > > http://home.apache.org/~gwenshap/0.10.0.0-rc6/javadoc/
>> > > > >
>> > > > > * tag to be voted upon (off 0.10.0 branch) is the 0.10.0.0 tag:
>> > > > >
>> > > > >
>> > >
>> >
>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=065899a3bc330618e420673acf9504d123b800f3
>> > > > >
>> > > > > * Documentation:
>> > > > > http://kafka.apache.org/0100/documentation.html
>> > > > >
>> > > > > * Protocol:
>> > > > > http://kafka.apache.org/0100/protocol.html
>> > > > >
>> > > > > /**
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Gwen
>> > > > >
>> > >
>> >
>> >
>> >
>> > --
>> > -- Guozhang
>> >
>>
>>
>>
>> --
>> Thanks,
>> Ewen
>>


Re: [VOTE] KIP-57: Interoperable LZ4 Framing

2016-05-07 Thread Dana Powers
Vote Passed! I will update the wiki.

-Dana
On May 7, 2016 3:48 AM, "Ismael Juma" <ism...@juma.me.uk> wrote:

> Dana, a long time has passed since the vote started and there are enough
> binding votes, so maybe it's time to declare that the vote has passed?
> Please mark the KIP as adopted in the KIP page and move it to the adopted
> table in the KIPs page once you do that.
>
> Ismael
> On 6 May 2016 22:16, "Ismael Juma" <ism...@juma.me.uk> wrote:
>
> +1 (assuming the changes I mentioned in the discuss thread are
> incorporated)
>
> Ismael
>
> On Thu, May 5, 2016 at 1:13 AM, Jun Rao <j...@confluent.io> wrote:
>
> > Thanks for the response. +1 on the KIP.
> >
> > Jun
> >
> > On Thu, Apr 28, 2016 at 9:01 AM, Dana Powers <dana.pow...@gmail.com>
> > wrote:
> >
> > > Sure thing. Yes, the substantive change is fixing the HC checksum.
> > >
> > > But to further improve interoperability, the kafka LZ4 class would no
> > > longer reject messages that have these optional header flags set. The
> > > flags might get set if the client/user chooses to use a non-java lz4
> > > compression library that includes them. In practice, naive support for
> > > the flags just means reading a few extra bytes in the header and/or
> > > footer of the payload. The KIP does not intend to use or validate this
> > > extra data.
> > >
> > > ContentSize is described as: "This field has no impact on decoding, it
> > > just informs the decoder how much data the frame holds (for example,
> > > to display it during decoding process, or for verification purpose).
> > > It can be safely skipped by a conformant decoder." We skip it.
> > >
> > > ContentChecksum is "Content Checksum validates the result, that all
> > > blocks were fully transmitted in the correct order and without error,
> > > and also that the encoding/decoding process itself generated no
> > > distortion." We skip it.
> > >
> > > -Dana
> > >
> > >
> > > On Thu, Apr 28, 2016 at 7:43 AM, Jun Rao <j...@confluent.io> wrote:
> > > > Hi, Dana,
> > > >
> > > > Could you explain the following from the KIP a bit more? The KIP is
> > > > intended to just fix the HC checksum, but the following seems to
> > suggest
> > > > there are other format changes?
> > > >
> > > > KafkaLZ4* code:
> > > >
> > > >- add naive support for optional header flags (ContentSize,
> > > >ContentChecksum) to enable interoperability with off-the-shelf lz4
> > > libraries
> > > >- the only flag left unsupported is dependent-block compression,
> > which
> > > >our implementation does not currently support.
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Mon, Apr 25, 2016 at 2:26 PM, Dana Powers <dana.pow...@gmail.com>
> > > wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> Initiating a vote thread because the KIP-57 proposal is specific to
> > > >> the 0.10 release.
> > > >>
> > > >> KIP-57 can be accessed here:
> > > >> <
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-57+-+Interoperable+LZ4+Framing
> > > >> >.
> > > >>
> > > >> The related JIRA is
> https://issues.apache.org/jira/browse/KAFKA-3160
> > > >> and working github PR at https://github.com/apache/kafka/pull/1212
> > > >>
> > > >> The vote will run for 72 hours.
> > > >>
> > > >> +1 (non-binding)
> > > >>
> > >
> >
>


Re: KIP-57: Interoperable LZ4 Framing

2016-05-06 Thread Dana Powers
Ok -- removed Public Interfaces discussion. It should be up to date w/
PR review comments now.

-Dana

On Fri, May 6, 2016 at 2:15 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> One more suggestion Dana, I would remove the "Public interfaces" section as
> those classes are not actually public (only the classes with Javadoc are
> public: https://kafka.apache.org/090/javadoc/index.html) and the
> information in the KIP is a bit stale when compared to the PR.
>
> Ismael
>
> On Fri, May 6, 2016 at 10:12 PM, Ismael Juma <ism...@juma.me.uk> wrote:
>
>> On Sun, May 1, 2016 at 3:57 AM, Dana Powers <dana.pow...@gmail.com> wrote:
>>>
>>> > 2. We're completely disabling checksumming of the compressed payload on
>>> > consumption. Normally you'd want to validate each level of framing for
>>> > correct end-to-end validation. You could still do this (albeit more
>>> weakly)
>>> > by validating the checksum is one of the two potentially valid values
>>> > (correct checksum or old, incorrect checksum). This obviously has
>>> > computational cost. Are we sure the tradeoff we're going with makes
>>> sense?
>>>
>>> Yes, to be honest, not validating on consumption is mostly because I just
>>> haven't dug into the bowels of the java client compressor / memory records
>>> call chains. It seems non-trivial to switch validation based on the
>>> message
>>> version in the consumer code. I did not opt for the weak validation that
>>> you
>>> suggest because I think the broker should always validate v1 messages on
>>> produce, and that piece shares the same code path within the lz4 java
>>> classes.
>>> I suppose we could make the default to raise an error on checksums that
>>> fail
>>> weak validation, and then switch to strong validation in the broker.
>>> Alternately,
>>> if you have suggestions on how to wire up the consumer code to switch lz4
>>> behavior based on message version, I would be happy to run with that.
>>
>>
>> The lack of checksum validation on consumption was a concern I had as well
>> (and Jun too, when I checked with him) so I helped Dana with this and the
>> PR now includes consumer validation for V1 messages. Dana, can you please
>> update the KIP?
>>
>> Ismael
>>


Re: KIP-57: Interoperable LZ4 Framing

2016-05-06 Thread Dana Powers
Updated: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-57+-+Interoperable+LZ4+Framing

Should I restart the vote?

-Dana

On Fri, May 6, 2016 at 2:12 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> On Sun, May 1, 2016 at 3:57 AM, Dana Powers <dana.pow...@gmail.com> wrote:
>>
>> > 2. We're completely disabling checksumming of the compressed payload on
>> > consumption. Normally you'd want to validate each level of framing for
>> > correct end-to-end validation. You could still do this (albeit more
>> weakly)
>> > by validating the checksum is one of the two potentially valid values
>> > (correct checksum or old, incorrect checksum). This obviously has
>> > computational cost. Are we sure the tradeoff we're going with makes
>> sense?
>>
>> Yes, to be honest, not validating on consumption is mostly because I just
>> haven't dug into the bowels of the java client compressor / memory records
>> call chains. It seems non-trivial to switch validation based on the message
>> version in the consumer code. I did not opt for the weak validation that
>> you
>> suggest because I think the broker should always validate v1 messages on
>> produce, and that piece shares the same code path within the lz4 java
>> classes.
>> I suppose we could make the default to raise an error on checksums that
>> fail
>> weak validation, and then switch to strong validation in the broker.
>> Alternately,
>> if you have suggestions on how to wire up the consumer code to switch lz4
>> behavior based on message version, I would be happy to run with that.
>
>
> The lack of checksum validation on consumption was a concern I had as well
> (and Jun too, when I checked with him) so I helped Dana with this and the
> PR now includes consumer validation for V1 messages. Dana, can you please
> update the KIP?
>
> Ismael


Re: [jira] [Updated] (KAFKA-3160) Kafka LZ4 framing code miscalculates header checksum

2016-05-04 Thread Dana Powers
You can assign to me also


Re: [VOTE] KIP-57: Interoperable LZ4 Framing

2016-05-03 Thread Dana Powers
Yes, great point. The intent of adding "naive" support for the
remaining LZ4 header flags (contentsize and contentchecksum) is to
avoid rejecting any lz4 messages framed according to the interoperable
spec (v1.5.1). If we can accept all such messages (which this KIP
does), we should never have to make similar 'breaking' changes in the
future.

That said, there is one feature that is left unsupported:
block-dependent (de)compression, aka the lz4 "streaming" api. If a
library *only* supported lz4 streaming, it would be incompatible with
this kafka implementation. I haven't done a survey of libraries, but I
would be shocked to find an lz4 implementation that only supported
streaming -- or even that used streaming as the default. This is
because it is an optimization on the default block api (this is what
kafka has used and continues to use -- no change here). There's more
detail on this here:
https://github.com/Cyan4973/lz4/wiki/LZ4-Streaming-API-Basics

I should also note that we do not support the old dictionary-id flag,
which was available in earlier lz4f spec versions but has since been
removed. I can't find any evidence of it ever being used, which is
probably why it was dropped from the spec.
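A rough sketch of what naive flag support amounts to on the parsing side,
per my reading of the v1.5.1 frame spec (this is a toy header walk, not
the actual KafkaLZ4 code; the HC byte is skipped rather than validated
since the stdlib has no xxhash):

```python
import struct

LZ4F_MAGIC = 0x184D2204  # little-endian on the wire: 04 22 4D 18

def parse_lz4f_header(buf):
    """Naively parse an LZ4 frame header: accept the optional
    ContentSize/ContentChecksum flags but do not use or validate the
    extra data. Returns (header_length, flags)."""
    (magic,) = struct.unpack_from('<I', buf, 0)
    if magic != LZ4F_MAGIC:
        raise ValueError('not an LZ4 frame')
    flg = buf[4]
    if (flg >> 6) & 0x3 != 1:
        raise ValueError('unsupported frame version')
    flags = {
        'block_independence': bool(flg & 0x20),
        'block_checksum': bool(flg & 0x10),
        'content_size': bool(flg & 0x08),      # adds 8 bytes to the header
        'content_checksum': bool(flg & 0x04),  # adds 4 bytes after EndMark
    }
    pos = 6  # magic (4) + FLG (1) + BD (1)
    if flags['content_size']:
        pos += 8  # read past the declared content size: "We skip it."
    pos += 1  # HC byte: skipped, not validated here
    return pos, flags
```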

Re testing, kafka-python uses https://github.com/darkdragn/lz4tools
which implements 1.5.0 of the lz4f spec and is listed as one of the
interoperable libraries. I'm not sure which library Magnus uses in
librdkafka, but it would be great to get independent verification from
him that this patch works there as well. You'll note that there is
currently no interoperable java library :( There is some desire to
merge the kafka classes back into jpountz/lz4-java, which I think
would be reasonable after these compatibility fixes. See
https://github.com/jpountz/lz4-java/issues/21 .

-Dana

On Tue, May 3, 2016 at 10:38 AM, Ewen Cheslack-Postava
<e...@confluent.io> wrote:
> +1
>
> One caveat on the vote though -- I don't know the details of LZ4 (format,
> libraries, etc) well enough to have a handle on whether the changes under
> "KafkaLZ4* code" are going to be sufficient to get broad support from other
> LZ4 libraries. Are we going to have multiple implementations we can test a
> PR with before we merge? Or would any subsequent fixes also have to come
> with bumping the produce request version to indicate varying feature
> support if we had to further change that code? What I want to avoid is
> clients in some languages having to work with broker version numbers
> instead of protocol version numbers due to further incompatibilities we
> might find.
>
> -Ewen
>
> On Thu, Apr 28, 2016 at 9:01 AM, Dana Powers <dana.pow...@gmail.com> wrote:
>
>> Sure thing. Yes, the substantive change is fixing the HC checksum.
>>
>> But to further improve interoperability, the kafka LZ4 class would no
>> longer reject messages that have these optional header flags set. The
>> flags might get set if the client/user chooses to use a non-java lz4
>> compression library that includes them. In practice, naive support for
>> the flags just means reading a few extra bytes in the header and/or
>> footer of the payload. The KIP does not intend to use or validate this
>> extra data.
>>
>> ContentSize is described as: "This field has no impact on decoding, it
>> just informs the decoder how much data the frame holds (for example,
>> to display it during decoding process, or for verification purpose).
>> It can be safely skipped by a conformant decoder." We skip it.
>>
>> ContentChecksum is "Content Checksum validates the result, that all
>> blocks were fully transmitted in the correct order and without error,
>> and also that the encoding/decoding process itself generated no
>> distortion." We skip it.
>>
>> -Dana
>>
>>
>> On Thu, Apr 28, 2016 at 7:43 AM, Jun Rao <j...@confluent.io> wrote:
>> > Hi, Dana,
>> >
>> > Could you explain the following from the KIP a bit more? The KIP is
>> > intended to just fix the HC checksum, but the following seems to suggest
>> > there are other format changes?
>> >
>> > KafkaLZ4* code:
>> >
>> >- add naive support for optional header flags (ContentSize,
>> >ContentChecksum) to enable interoperability with off-the-shelf lz4
>> libraries
>> >- the only flag left unsupported is dependent-block compression, which
>> >our implementation does not currently support.
>> >
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Mon, Apr 25, 2016 at 2:26 PM, Dana Powers <dana.pow...@gmail.com>
>> wrote:
>> >
>> >> Hi all,
>> >>
>> >> In

Re: [VOTE] 0.10.0.0 RC2

2016-05-02 Thread Dana Powers
I was unable to use/test the rc2 binary artifact without manually
patching kafka-run-class.sh. I'd vote for a new release candidate.

-Dana

On Mon, May 2, 2016 at 5:44 AM, Ismael Juma  wrote:
> On second thought, will people be able to test the binary packages if they
> can't start the broker?
>
> Ismael
>
> On Mon, May 2, 2016 at 5:43 AM, Ismael Juma  wrote:
>
>> +1
>> On 2 May 2016 07:18, "Gwen Shapira"  wrote:
>>
>>> Thanks for the catches :)
>>>
>>> Is it ok if we delay rolling out the new RC to the end of the week, to
>>> allow more time for testing?
>>>
>>> Gwen
>>>
>>> On Sun, May 1, 2016 at 11:49 AM, Ewen Cheslack-Postava
>>>  wrote:
>>> > We had a blocking issue on the release (
>>> > https://github.com/apache/kafka/pull/1302 unfortunately only easily
>>> noticed
>>> > after packaging). So we'll need another RC.
>>> >
>>> > https://issues.apache.org/jira/browse/KAFKA-3627 was also raised,
>>> which I
>>> > think should probably also be a blocker as it affects liveness on the
>>> > consumer and leaves us in effectively the same state as pre KIP-41 (i.e.
>>> > the implementation for KIP-41 is currently incorrect).
>>> >
>>> > -Ewen
>>> >
>>> > On Sat, Apr 30, 2016 at 6:16 PM, Gwen Shapira 
>>> wrote:
>>> >
>>> >> Thanks for the correction :)
>>> >>
>>> >> On Sat, Apr 30, 2016 at 2:30 AM, Ben Davison >> >
>>> >> wrote:
>>> >> > Hi Gwen,
>>> >> >
>>> >> > The release notes lead to a 404, this is the correct url:
>>> >> > http://home.apache.org/~gwenshap/0.10.0.0-rc2/RELEASE_NOTES.html
>>> >> >
>>> >> > Thanks for leading the RC effort.
>>> >> >
>>> >> > Regards,
>>> >> >
>>> >> > Ben
>>> >> >
>>> >> > On Sat, Apr 30, 2016 at 1:01 AM, Gwen Shapira 
>>> wrote:
>>> >> >
>>> >> >> Hello Kafka users, developers and client-developers,
>>> >> >>
>>> >> >> This is the first candidate for release of Apache Kafka 0.10.0.0.
>>> This
>>> >> >> is a major release that includes: (1) New message format including
>>> >> >> timestamps (2) client interceptor API (3) Kafka Streams. (4)
>>> >> >> Configurable SASL authentication mechanisms (5) API for retrieving
>>> >> >> protocol versions supported by the broker.
>>> >> >>
>>> >> >> Since this is a major release, we will give people more time to try
>>> it
>>> >> >> out and give feedback.
>>> >> >>
>>> >> >> Contributions that are especially welcome are:
>>> >> >> * Critical bugs found while testing
>>> >> >> * Especially testing related to the new functionality
>>> >> >> * More tests
>>> >> >> * Better docs
>>> >> >> * Doc reviews related to new functionality and upgrade
>>> >> >>
>>> >> >> Release notes for the 0.10.0.0 release:
>>> >> >> http://home.apache.org/~gwenshap/0.10.0.0-rc2/RELEASE_NOTES.HTML
>>> >> >>
>>> >> >> Release plan:
>>> >> >>
>>> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+0.10.0
>>> >> >>
>>> >> >> *** Please download, test and vote by Monday, May 9, 9am PT
>>> >> >>
>>> >> >> Kafka's KEYS file containing PGP keys we use to sign the release:
>>> >> >> http://kafka.apache.org/KEYS
>>> >> >>
>>> >> >> * Release artifacts to be voted upon (source and binary):
>>> >> >> http://home.apache.org/~gwenshap/0.10.0.0-rc2/
>>> >> >>
>>> >> >> * Maven artifacts to be voted upon:
>>> >> >> https://repository.apache.org/content/groups/staging/
>>> >> >>
>>> >> >> * scala-doc
>>> >> >> http://home.apache.org/~gwenshap/0.10.0.0-rc2/scaladoc
>>> >> >>
>>> >> >> * java-doc
>>> >> >> http://home.apache.org/~gwenshap/0.10.0.0-rc2/javadoc/
>>> >> >>
>>> >> >> * tag to be voted upon (off 0.10.0 branch) is the 0.10.0.0-rc2 tag:
>>> >> >>
>>> >> >>
>>> >>
>>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=da2745e104ba31fc980265ad835d9233652c
>>> >> >>
>>> >> >> * Documentation:
>>> >> >> http://kafka.apache.org/0100/documentation.html
>>> >> >>
>>> >> >> * Protocol:
>>> >> >> http://kafka.apache.org/0100/protocol.html
>>> >> >>
>>> >> >> /**
>>> >> >>
>>> >> >> Thanks,
>>> >> >>
>>> >> >> Gwen
>>> >> >>
>>> >> >
>>> >> > --
>>> >> >
>>> >> >
>>> >> > This email, including attachments, is private and confidential. If
>>> you
>>> >> have
>>> >> > received this email in error please notify the sender and delete it
>>> from
>>> >> > your system. Emails are not secure and may contain viruses. No
>>> liability
>>> >> > can be accepted for viruses that might be transferred by this email
>>> or
>>> >> any
>>> >> > attachment. Any unauthorised copying of this message or unauthorised
>>> >> > distribution and publication of the information contained herein are
>>> >> > prohibited.
>>> >> >
>>> >> > 7digital Limited. Registered office: 69 Wilson Street, London EC2A
>>> 2BB.
>>> >> > Registered in England and Wales. Registered No. 04843573.
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Thanks,
>>> > Ewen
>>>
>>


Re: KIP-57: Interoperable LZ4 Framing

2016-04-30 Thread Dana Powers
On Fri, Apr 29, 2016 at 6:29 PM, Ewen Cheslack-Postava
 wrote:
> Two questions:
>
> 1. My understanding based on KIP-35 is that this won't be a problem for
> clients that want to support older broker versions since they will use v0
> produce requests with broken checksum to send to those, and any broker
> advertising support for v1 produce requests will also support valid
> checksums? In other words, the KIP is structured in terms of Java client
> versions, but I'd like to make sure we have the compatibility path for
> non-Java clients cleanly mapped out. (And think we do, especially given
> that Dana is proposing, but still would like an ack on that.)

Yes, I'm treating these as the same:

broker/client <= 0.9
messages == v0
Fetch api version <= 1
Produce api version <= 1

broker/client >= 0.10
messages >= v1
Fetch api version >= 2
Produce api version >= 2

I dont think there will be any problem for clients that want to
support both encodings.
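As a toy illustration of that mapping (an assumption about how a
KIP-35-aware client might key message format off the broker's advertised
max Produce api version -- not kafka-python's actual code):

```python
def message_format_for(max_produce_api_version):
    """Pick the message format version from the broker's advertised
    max Produce api version, per the table above."""
    # Produce api <= 1 -> 0.9-era broker -> v0 messages (broken LZ4 HC)
    # Produce api >= 2 -> 0.10+ broker   -> v1 messages (fixed LZ4 HC)
    return 0 if max_produce_api_version <= 1 else 1
```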

> 2. We're completely disabling checksumming of the compressed payload on
> consumption. Normally you'd want to validate each level of framing for
> correct end-to-end validation. You could still do this (albeit more weakly)
> by validating the checksum is one of the two potentially valid values
> (correct checksum or old, incorrect checksum). This obviously has
> computational cost. Are we sure the tradeoff we're going with makes sense?

Yes, to be honest, not validating on consumption is mostly because I just
haven't dug into the bowels of the java client compressor / memory records
call chains. It seems non-trivial to switch validation based on the message
version in the consumer code. I did not opt for the weak validation that you
suggest because I think the broker should always validate v1 messages on
produce, and that piece shares the same code path within the lz4 java classes.
I suppose we could make the default raise an error on checksums that fail
weak validation, and then switch to strong validation in the broker.
Alternatively, if you have suggestions on how to wire up the consumer code
to switch lz4 behavior based on message version, I would be happy to run
with that.

-Dana


[jira] [Commented] (KAFKA-3615) Exclude test jars in CLASSPATH of kafka-run-class.sh

2016-04-30 Thread Dana Powers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265561#comment-15265561
 ] 

Dana Powers commented on KAFKA-3615:


This PR has a bug that breaks the classpath setup for bin scripts in the rc2 
release. Should we reopen this and follow up, or open a new issue?

> Exclude test jars in CLASSPATH of kafka-run-class.sh
> 
>
> Key: KAFKA-3615
> URL: https://issues.apache.org/jira/browse/KAFKA-3615
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin, build
>Affects Versions: 0.10.0.0
>Reporter: Liquan Pei
>Assignee: Liquan Pei
>  Labels: newbie
> Fix For: 0.10.1.0, 0.10.0.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-57: Interoperable LZ4 Framing

2016-04-28 Thread Dana Powers
Sure thing. Yes, the substantive change is fixing the HC checksum.

But to further improve interoperability, the kafka LZ4 class would no
longer reject messages that have these optional header flags set. The
flags might get set if the client/user chooses to use a non-java lz4
compression library that includes them. In practice, naive support for
the flags just means reading a few extra bytes in the header and/or
footer of the payload. The KIP does not intend to use or validate this
extra data.

ContentSize is described as: "This field has no impact on decoding, it
just informs the decoder how much data the frame holds (for example,
to display it during decoding process, or for verification purpose).
It can be safely skipped by a conformant decoder." We skip it.

ContentChecksum is "Content Checksum validates the result, that all
blocks were fully transmitted in the correct order and without error,
and also that the encoding/decoding process itself generated no
distortion." We skip it.
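As a concrete illustration of what "naive support" amounts to, here is a rough sketch (bit positions taken from the LZ4 frame specification; the function names are hypothetical, not the KafkaLZ4* API):

```python
# FLG byte bit positions, per the LZ4 frame format specification.
FLG_BLOCK_INDEP      = 1 << 5  # block-independence flag
FLG_CONTENT_SIZE     = 1 << 3  # optional 8-byte ContentSize field present
FLG_CONTENT_CHECKSUM = 1 << 2  # optional 4-byte ContentChecksum trailer present

def extra_header_bytes(flg):
    """Optional descriptor bytes to skip between the BD byte and the HC byte."""
    if not flg & FLG_BLOCK_INDEP:
        # Dependent-block compression is the one flag left unsupported.
        raise ValueError("dependent-block compression not supported")
    return 8 if flg & FLG_CONTENT_SIZE else 0

def trailer_bytes(flg):
    """Optional trailer bytes to skip after the EndMark; read but not validated."""
    return 4 if flg & FLG_CONTENT_CHECKSUM else 0
```

Reading (and discarding) those extra bytes is all the KIP proposes; neither optional field is used or verified.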

-Dana


On Thu, Apr 28, 2016 at 7:43 AM, Jun Rao <j...@confluent.io> wrote:
> Hi, Dana,
>
> Could you explain the following from the KIP a bit more? The KIP is
> intended to just fix the HC checksum, but the following seems to suggest
> there are other format changes?
>
> KafkaLZ4* code:
>
>- add naive support for optional header flags (ContentSize,
>ContentChecksum) to enable interoperability with off-the-shelf lz4 
> libraries
>- the only flag left unsupported is dependent-block compression, which
>our implementation does not currently support.
>
>
> Thanks,
>
> Jun
>
> On Mon, Apr 25, 2016 at 2:26 PM, Dana Powers <dana.pow...@gmail.com> wrote:
>
>> Hi all,
>>
>> Initiating a vote thread because the KIP-57 proposal is specific to
>> the 0.10 release.
>>
>> KIP-57 can be accessed here:
>> <
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-57+-+Interoperable+LZ4+Framing
>> >.
>>
>> The related JIRA is https://issues.apache.org/jira/browse/KAFKA-3160
>> and working github PR at https://github.com/apache/kafka/pull/1212
>>
>> The vote will run for 72 hours.
>>
>> +1 (non-binding)
>>


[VOTE] KIP-57: Interoperable LZ4 Framing

2016-04-25 Thread Dana Powers
Hi all,

Initiating a vote thread because the KIP-57 proposal is specific to
the 0.10 release.

KIP-57 can be accessed here:
.

The related JIRA is https://issues.apache.org/jira/browse/KAFKA-3160
and working github PR at https://github.com/apache/kafka/pull/1212

The vote will run for 72 hours.

+1 (non-binding)


KIP-57: Interoperable LZ4 Framing

2016-04-25 Thread Dana Powers
Hi all,

I've written up a new KIP based on KAFKA-3160 / fixing LZ4 framing.
The write-up is here:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-57+-+Interoperable+LZ4+Framing

Please take a look if you are using LZ4 compression or are interested
in framing specs.

One question: does anyone out there fall into this category: (1)
deploy from trunk, (2) upgraded to v1 message format, (3) using
LZ4-compressed messages ?


Feedback welcome,

-Dana


wiki privs

2016-04-25 Thread Dana Powers
I'd like to create a new KIP page on the wiki, but I can't figure out
how to create a new page or edit an existing page. Do I need certain
privileges on my user account to be able to do this?

Thanks,

-Dana


[jira] [Created] (KAFKA-3550) Broker does not honor MetadataRequest api version; always returns v0 MetadataResponse

2016-04-12 Thread Dana Powers (JIRA)
Dana Powers created KAFKA-3550:
--

 Summary: Broker does not honor MetadataRequest api version; always 
returns v0 MetadataResponse
 Key: KAFKA-3550
 URL: https://issues.apache.org/jira/browse/KAFKA-3550
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.9.0.1, 0.8.2.2, 0.8.1.1, 0.8.0
Reporter: Dana Powers


To reproduce:

Send a MetadataRequest (api key 3) with incorrect api version (e.g., 1234).
The expected behavior is for the broker to reject the request as unrecognized.
Broker (incorrectly) responds with MetadataResponse v0.

The problem here is that any request for a "new" MetadataRequest (i.e., KIP-4) 
sent to an old broker will generate an incorrect response.
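For anyone reproducing this from a non-Java client, the request bytes can be built by hand (field layout from the Kafka wire protocol; the helper below is a hypothetical sketch):

```python
import struct

def encode_metadata_request(api_version, correlation_id=1, client_id="repro"):
    # Request header: api_key INT16 (3 = Metadata), api_version INT16,
    # correlation_id INT32, client_id as an INT16-length-prefixed STRING.
    cid = client_id.encode("utf-8")
    header = struct.pack(">hhih", 3, api_version, correlation_id, len(cid)) + cid
    body = struct.pack(">i", 0)  # empty topic array: "all topics" in v0
    payload = header + body
    # 4-byte size prefix covers everything after itself.
    return struct.pack(">i", len(payload)) + payload
```

Sending `encode_metadata_request(1234)` to an affected broker yields a normal v0 MetadataResponse rather than the expected error or disconnect.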



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-4 Metadata Schema (Round 2)

2016-04-12 Thread Dana Powers
+1
On Apr 11, 2016 21:55, "Gwen Shapira"  wrote:

> +1
>
> On Mon, Apr 11, 2016 at 10:42 AM, Grant Henke  wrote:
> > Based on the discussion in the previous vote thread
> > <
> http://search-hadoop.com/m/uyzND1xlaiU10QlYX=+VOTE+KIP+4+Metadata+Schema
> >
> > I also would like to include a behavior change to the MetadataResponse. I
> > have update the wiki
> > <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-MetadataSchema
> >
> > and pull request  to include
> > this change.
> >
> > The change as described on the wiki is:
> >
> >> The behavior of the replicas and isr arrays will be changed in order to
> >> support the admin tools, and better represent the state of the cluster:
> >>
> >>- In version 0, if a broker is down the replicas and isr array will
> >>omit the broker's entry and add a REPLICA_NOT_AVAILABLE error code.
> >>- In version 1, no error code will be set and the broker id will be
> >>included in the replicas and isr array.
> >>   - Note: A user can still detect if the replica is not available,
> by
> >>   checking if the broker is in the returned broker list.
> >>
> >>
> >
> > > Being optimistic that this doesn't require too much discussion, I would
> like
> > to re-start the voting process on this thread. If more discussion is
> > needed, please don't hesitate to bring it up here.
> >
> > Ismael, Gwen, Guozhang could you please review and revote based on the
> > changes.
> >
> > Thank you,
> > Grant
> >
> > On Sat, Apr 9, 2016 at 1:03 PM, Guozhang Wang 
> wrote:
> >
> >> +1
> >>
> >> On Fri, Apr 8, 2016 at 4:36 PM, Gwen Shapira  wrote:
> >>
> >> > +1
> >> >
> >> > On Fri, Apr 8, 2016 at 2:41 PM, Grant Henke 
> wrote:
> >> >
> >> > > I would like to re-initiate the voting process for the "KIP-4
> Metadata
> >> > > Schema changes". This is not a vote for all of KIP-4, but
> specifically
> >> > for
> >> > > the metadata changes. I have included the exact changes below for
> >> > clarity:
> >> > > >
> >> > > > Metadata Request (version 1)
> >> > > >
> >> > > >
> >> > > >
> >> > > > MetadataRequest => [topics]
> >> > > >
> >> > > > Stays the same as version 0 however behavior changes.
> >> > > > In version 0 there was no way to request no topics, and an empty
> >> list
> >> > > > signified all topics.
> >> > > > In version 1 a null topics list (size -1 on the wire) will
> indicate
> >> > that
> >> > > a
> >> > > > user wants *ALL* topic metadata. Compared to an empty list (size
> 0)
> >> > which
> >> > > > indicates metadata for *NO* topics should be returned.
> >> > > > Metadata Response (version 1)
> >> > > >
> >> > > >
> >> > > >
> >> > > > MetadataResponse => [brokers] controllerId [topic_metadata]
> >> > > >   brokers => node_id host port rack
> >> > > > node_id => INT32
> >> > > > host => STRING
> >> > > > port => INT32
> >> > > > rack => NULLABLE_STRING
> >> > > >   controllerId => INT32
> >> > > >   topic_metadata => topic_error_code topic is_internal
> >> > > [partition_metadata]
> >> > > > topic_error_code => INT16
> >> > > > topic => STRING
> >> > > > is_internal => BOOLEAN
> >> > > > partition_metadata => partition_error_code partition_id leader
> >> > > [replicas] [isr]
> >> > > >   partition_error_code => INT16
> >> > > >   partition_id => INT32
> >> > > >   leader => INT32
> >> > > >   replicas => INT32
> >> > > >   isr => INT32
> >> > > >
> >> > > > Adds rack, controller_id, and is_internal to the version 0
> response.
> >> > > >
> >> > >
> >> > > The KIP is available here for reference (linked to the Metadata
> schema
> >> > > section):
> >> > > *
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-MetadataSchema
> >> > > <
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-MetadataSchema
> >> > > >*
> >> > >
> >> > > A pull request is available implementing the proposed changes here:
> >> > > https://github.com/apache/kafka/pull/1095
> >> > >
> >> > > Here are some links to past discussions on the mailing list:
> >> > >
> >> http://search-hadoop.com/m/uyzND1pd4T52H1m0u1=Re+KIP+4+Wiki+Update
> >> > >
> >> > >
> >> >
> >>
> http://search-hadoop.com/m/uyzND1J2IXeSNXAT=Metadata+and+ACLs+wire+protocol+review+KIP+4+
> >> > >
> >> > > Here is the previous vote discussion (please take a look and discuss
> >> > > there):
> >> > >
> >> > >
> >> >
> >>
> http://search-hadoop.com/m/uyzND1xlaiU10QlYX=+VOTE+KIP+4+Metadata+Schema
> >> > >
> >> > > Thank you,
> >> > > Grant
> >> > > 

[jira] [Commented] (KAFKA-3160) Kafka LZ4 framing code miscalculates header checksum

2016-04-10 Thread Dana Powers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234221#comment-15234221
 ] 

Dana Powers commented on KAFKA-3160:


Magnus: have you made any progress on this? The more I think about it, the more 
I think this needs to get included w/ KIP-31. If the goal of KIP-31 is to avoid 
recompression, and the goal of this JIRA is to fix the compression format, and 
in all cases we need to maintain compatibility with old clients, then I think 
the only way to solve all conditions is to make the pre-KIP-31 FetchRequest / 
ProduceRequest versions use the broken LZ4 format, and require the fixed format 
in the new FetchRequest / ProduceRequest version:

Old 0.8/0.9 clients (current behavior): produce messages w/ broken checksum; 
consume messages w/ incorrect checksum only
New 0.10 clients (proposed behavior): produce messages in "new KIP-31 format" 
w/ correct checksum; consume messages in "new KIP-31 format" w/ correct 
checksum only

Proposed behavior for 0.10 broker:
 - convert all "old format" messages to "new KIP-31 format" + fix checksum to 
correct value
 - require incoming "new KIP-31 format" messages to have correct checksum, 
otherwise throw error
 - when serving requests for "old format", fixup checksum to be incorrect when 
converting "new KIP-31 format" messages to old format

Thoughts?

> Kafka LZ4 framing code miscalculates header checksum
> 
>
> Key: KAFKA-3160
> URL: https://issues.apache.org/jira/browse/KAFKA-3160
> Project: Kafka
>  Issue Type: Bug
>  Components: compression
>    Affects Versions: 0.8.2.0, 0.8.2.1, 0.9.0.0, 0.8.2.2, 0.9.0.1
>Reporter: Dana Powers
>Assignee: Magnus Edenhill
>  Labels: compatibility, compression, lz4
>
> KAFKA-1493 partially implements the LZ4 framing specification, but it 
> incorrectly calculates the header checksum. This causes 
> KafkaLZ4BlockInputStream to raise an error 
> [IOException(DESCRIPTOR_HASH_MISMATCH)] if a client sends *correctly* framed 
> LZ4 data. It also causes KafkaLZ4BlockOutputStream to generate incorrectly 
> framed LZ4 data, which means clients decoding LZ4 messages from kafka will 
> always receive incorrectly framed data.
> Specifically, the current implementation includes the 4-byte MagicNumber in 
> the checksum, which is incorrect.
> http://cyan4973.github.io/lz4/lz4_Frame_format.html
> Third-party clients that attempt to use off-the-shelf lz4 framing find that 
> brokers reject messages as having a corrupt checksum. So currently non-java 
> clients must 'fixup' lz4 packets to deal with the broken checksum.
> Magnus first identified this issue in librdkafka; kafka-python has the same 
> problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3160) Kafka LZ4 framing code miscalculates header checksum

2016-04-10 Thread Dana Powers (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dana Powers updated KAFKA-3160:
---
Description: 
KAFKA-1493 partially implements the LZ4 framing specification, but it 
incorrectly calculates the header checksum. This causes 
KafkaLZ4BlockInputStream to raise an error 
[IOException(DESCRIPTOR_HASH_MISMATCH)] if a client sends *correctly* framed 
LZ4 data. It also causes KafkaLZ4BlockOutputStream to generate incorrectly 
framed LZ4 data, which means clients decoding LZ4 messages from kafka will 
always receive incorrectly framed data.

Specifically, the current implementation includes the 4-byte MagicNumber in the 
checksum, which is incorrect.
http://cyan4973.github.io/lz4/lz4_Frame_format.html

Third-party clients that attempt to use off-the-shelf lz4 framing find that 
brokers reject messages as having a corrupt checksum. So currently non-java 
clients must 'fixup' lz4 packets to deal with the broken checksum.

Magnus first identified this issue in librdkafka; kafka-python has the same 
problem.

  was:
KAFKA-1493 partially implements the LZ4 framing specification, but it 
incorrectly calculates the header checksum. This causes 
KafkaLZ4BlockInputStream to raise an error 
[IOException(DESCRIPTOR_HASH_MISMATCH)] if a client sends *correctly* framed 
LZ4 data. It also causes the kafka broker to always return incorrectly framed 
LZ4 data to clients.

Specifically, the current implementation includes the 4-byte MagicNumber in the 
checksum, which is incorrect.
http://cyan4973.github.io/lz4/lz4_Frame_format.html

Third-party clients that attempt to use off-the-shelf lz4 framing find that 
brokers reject messages as having a corrupt checksum. So currently non-java 
clients must 'fixup' lz4 packets to deal with the broken checksum.

Magnus first identified this issue in librdkafka; kafka-python has the same 
problem.


> Kafka LZ4 framing code miscalculates header checksum
> 
>
> Key: KAFKA-3160
> URL: https://issues.apache.org/jira/browse/KAFKA-3160
> Project: Kafka
>  Issue Type: Bug
>  Components: compression
>Affects Versions: 0.8.2.0, 0.8.2.1, 0.9.0.0, 0.8.2.2, 0.9.0.1
>    Reporter: Dana Powers
>Assignee: Magnus Edenhill
>  Labels: compatibility, compression, lz4
>
> KAFKA-1493 partially implements the LZ4 framing specification, but it 
> incorrectly calculates the header checksum. This causes 
> KafkaLZ4BlockInputStream to raise an error 
> [IOException(DESCRIPTOR_HASH_MISMATCH)] if a client sends *correctly* framed 
> LZ4 data. It also causes KafkaLZ4BlockOutputStream to generate incorrectly 
> framed LZ4 data, which means clients decoding LZ4 messages from kafka will 
> always receive incorrectly framed data.
> Specifically, the current implementation includes the 4-byte MagicNumber in 
> the checksum, which is incorrect.
> http://cyan4973.github.io/lz4/lz4_Frame_format.html
> Third-party clients that attempt to use off-the-shelf lz4 framing find that 
> brokers reject messages as having a corrupt checksum. So currently non-java 
> clients must 'fixup' lz4 packets to deal with the broken checksum.
> Magnus first identified this issue in librdkafka; kafka-python has the same 
> problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: How will KIP-35 and KIP-43 play together?

2016-04-04 Thread Dana Powers
I don't have anything specific to say wrt SASL features, but I think
this circumstance makes it clear that there are only 2 ways forward:

(1) official java client continues releasing w/ broker versioning as
an implicit compatibility test ("java client X.Y requires broker X.Y")
AND support is added to brokers so that all clients can query broker
version ("0.10.0.0") via API, enabling similar implicit compatibility
tests in non-java clients, or

(2) java client versioning is decoupled from broker versioning,
breaking reliance on implicit compatibility tests, AND all clients
forced to rely on explicit protocol compatibility tests exposed via
API (such as via KIP-35)

Is there any other way to avoid this continuing to be an issue?

-Dana


On Mon, Apr 4, 2016 at 7:14 PM, Gwen Shapira  wrote:
> Magnus,
>
> It sounds like KIP-43 will need to change in order to support the KIP-35
> protocol. Can you add more details on what you had in mind?
>
> On Mon, Apr 4, 2016 at 12:41 PM, Magnus Edenhill  wrote:
>
>> As Jun says the SASL (and SSL) handshake is not done using the Kafka
>> protocol
>> and is performed before any Kafka protocol requests pass between client and
>> server.
>>
>> It might make sense to move the SASL handshake from its custom protocol
>> format
>> into the Kafka protocol and make it use the proper Kafka protocol framing.
>>
>> (For SSL this isn't needed since TLS has its own _standardised_
>> hand-shake format and existing SSL implementations take care of it.)
>>
>> 2016-04-04 21:20 GMT+02:00 Ismael Juma :
>>
>> > An option would be to add a version for the handshake in the KIP-35
>> > response.
>> >
>> > Ismael
>> > On 4 Apr 2016 20:09, "Gwen Shapira"  wrote:
>> >
>> > > I think the challenge here is that even after KIP-35 clients will not
>> > know
>> > > whether the server supports new sasl mechanisms or not, so non-Java
>> > clients
>> > > will have to assume it is not supported (and will therefore lag behind
>> on
>> > > features).
>> > >
>> > > I think this highlights a short-coming of KIP-35, and I'm wondering if
>> > > there are good ways to address this.
>> > >
>> > > Gwen
>> > >
>> > >
>> > > On Mon, Apr 4, 2016 at 12:05 PM, Jun Rao  wrote:
>> > >
>> > > > I think with KIP-43, the existing way of sasl handshake during
>> > connection
>> > > > still works. It's just that if you want to support non-GSSAPI, you
>> will
>> > > > need a new sasl handshake implementation in the client. It's
>> > unfortunate
>> > > > that Protocol currently only covers the communication after the
>> > > connection
>> > > > is ready to use, but not during handshake. For now, we can probably
>> > just
>> > > > document this change during handshake since changing the
>> implementation
>> > > is
>> > > > optional.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jun
>> > > >
>> > > > On Mon, Apr 4, 2016 at 11:28 AM, Gwen Shapira 
>> > wrote:
>> > > >
>> > > > > Hi Kafka Team,
>> > > > >
>> > > > > As a practical test-case of KIP-35, I'd like to turn your attention
>> > to
>> > > > > KIP-43:
>> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-43
>> > > > >
>> > > > > KIP-43 makes an interesting modification to the protocol, but only
>> > > under
>> > > > > specific conditions:
>> > > > >
>> > > > > "*Client flow:*
>> > > > >
>> > > > >1. If sasl.mechanism is not GSSAPI, send a packet with the
>> > mechanism
>> > > > >name to the server. Otherwise go to Step 3.
>> > > > >   - Packet Format: | Version (Int16) | Mechanism (String) |
>> > > > >2. Wait for response from the server. If the error code in the
>> > > > response
>> > > > >is non-zero, indicating failure, report the error and fail
>> > > > > authentication.
>> > > > >3. Perform SASL authentication with the configured client
>> > mechanism
>> > > > >
>> > > > > *Server flow:*
>> > > > >
>> > > > >1. Wait for first authentication packet from client
>> > > > >2. If this packet is not a valid mechanism request, go to Step 4
>> > and
>> > > > >process this packet as the first GSSAPI client token
>> > > > >3. If the client mechanism received in Step 2 is enabled in the
>> > > > broker,
>> > > > >send a response with error code zero and start authentication
>> > using
>> > > > the
>> > > > >specified mechanism. Otherwise, send an error response including
>> > the
>> > > > > list
>> > > > >of enabled mechanisms and fail authentication.
>> > > > >- Packet Format: | ErrorCode (Int16) | EnabledMechanisms
>> > > > > (ArrayOf(String))
>> > > > >   |
>> > > > >4. Perform SASL authentication with the selected mechanism. If
>> > > > mechanism
>> > > > >exchange was skipped, process the initial packet that was
>> received
>> > > > from
>> > > > > the
>> > > > >client first."
>> > > > >
>> > > > >
>> > > > > I'd love to know how this will be communicated to clients via
>> > > > > 

Re: KIP-4 Wiki Update

2016-03-30 Thread Dana Powers
Perhaps python has corrupted my brain, but a null arg seems quite clean to
me:

getTopics() -> returns all
getTopics([]) -> returns none
getTopics([foo, bar]) -> returns foo and bar only
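On the wire the distinction is just as clean: a Kafka ARRAY is length-prefixed, and null is simply length -1 (a sketch of the encoding, not the actual client code):

```python
import struct

def encode_topic_array(topics):
    # Kafka ARRAY of STRING: INT32 count, then each string as
    # INT16 length + UTF-8 bytes. A null array is encoded as count = -1.
    if topics is None:
        return struct.pack(">i", -1)          # null  -> ALL topics (v1)
    out = struct.pack(">i", len(topics))      # empty -> NO topics (v1)
    for topic in topics:
        b = topic.encode("utf-8")
        out += struct.pack(">h", len(b)) + b
    return out
```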

-Dana

On Wed, Mar 30, 2016 at 9:10 AM, Jason Gustafson  wrote:

> >
> > Yes, I think empty should be "no topics" too. However, I would suggest
> > using a boolean to indicate "all topics" and null should not be allowed
> (as
> > it is now). I think this is a clearer API and it's similar to
> > how org.apache.kafka.clients.Metadata works today.
>
>
> +1. Having null imply all is almost as weird as using empty, though at
> least it avoids the most common usage problem.
>
> -Jason


Re: KIP-4 Wiki Update

2016-03-30 Thread Dana Powers
Grant - sorry I was unable to attend. Getting API access to admin
functionality has been a big ask for python client users. I like this KIP a
lot.

I reviewed the details quickly. Here are some comments:

MetadataRequest v1: long-term / conceptually, I think a "null" topic list
aligns better with fetching all topics. Empty list aligns better with
fetching no topics. I recognize this means that empty list behaves
differently in v0 versus v1. But hey, what are protocol versions good for
if not changing behavior... :) API design comment. take it or leave it.

Error Codes: I think it would be useful to describe for each new Response
type, which of the new error codes apply under what circumstances. For
example, in CreateTopic, there is a note that "Only one from (Partitions +
ReplicationFactor), ReplicaAssignment can be defined in one instruction."
Will violating this rule generate an error code? If so, which one?

Ignoring Duplicates: "Multiple instructions for the same topic in one
request will be silently ignored, only the last from the list will be
executed." This could get confusing for clients. What are your thoughts on
treating duplicates as an error and not executing any of them w/ error code
returned? This would put the de-duplication logic burden on the client and
also make it explicitly clear what instructions were actually executed.

Request timeouts: "Because we already add a timeout field to the requests
wire protocols..." Where is this timeout specified? Is this a separate KIP
or did I miss it in KIP-4?

-Dana

On Wed, Mar 30, 2016 at 8:03 AM, Grant Henke  wrote:

> I didn't get anyone in attendance for this meeting. If you would like to
> discuss it please let me know.
>
> Thank you,
> Grant
>
> On Mon, Mar 28, 2016 at 9:18 AM, Grant Henke  wrote:
>
> > I am hoping to get more discussion and feedback around the blocking vs
> > async discussion so I can start to get KIP-4 patches reviewed.
> >
> > In order to facilitate a faster discussion I will hold an open discussion
> > on Tuesday March 29th at 12pm PST (right after the usual KIP call, if we
> > have one). Please join via the hangouts link below:
> >
> >- https://plus.google.com/hangouts/_/cloudera.com/discuss-kip-4
> >
> > If you can't make that time, please suggest another time you would like
> to
> > meet and I can hold another meeting too. I will take notes of the
> meetings
> > and update here.
> >
> > Thank you,
> > Grant
> >
> > On Tue, Mar 15, 2016 at 9:49 AM, Grant Henke 
> wrote:
> >
> >> Moving the relevant wiki text here for discussion/tracking:
> >>>
> >>> Server-side Admin Request handlers
> >>>
> >>> At the highest level, admin requests will be handled on the brokers the
> >>> same way that all message types are. However, because admin messages
> modify
> >>> cluster metadata they should be handled by the controller. This allows
> the
> >>> controller to propagate the changes to the rest of the cluster.
> However,
> >>> because the messages need to be handled by the controller does not
> >>> necessarily mean they need to be sent directly to the controller. A
> message
> >>> forwarding mechanism can be used to forward the message from any
> broker to
> >>> the correct broker for handling.
> >>>
> >>> Because supporting all of this is quite the undertaking I will describe
> >>> the "ideal functionality" and then the "intermediate functionality"
> that
> >>> gets us some basic administrative support quickly while working
> towards the
> >>> optimal state.
> >>>
> >>> *Ideal Functionality:*
> >>>
> >>>1. A client sends an admin request to *any* broker
> >>>2. The admin request is forwarded to the required broker (likely the
> >>>controller)
> >>>3. The request is handled and the server blocks until a timeout is
> >>>reached or the requested operation is completed (failure or success)
> >>>   1. An operation is considered complete/successful when *all
> >>>   required nodes have the correct/current state*.
> >>>   2. Immediate follow up requests to *any broker* will succeed.
> >>>   3. Requests that timeout may still be completed after the
> >>>   timeout. The users would need to poll to check the state.
> >>>4. The response is generated and forwarded back to the broker that
> >>>received the request.
> >>>5. A response is sent back to the client.
> >>>
> >>> *Intermediate Functionality*:
> >>>
> >>>1. A client sends an admin request to *the controller* broker
> >>>   1. As a follow up request forwarding can be added transparently.
> >>>   (see below)
> >>>2. The request is handled and the server blocks until a timeout is
> >>>reached or the requested operation is completed (failure or success)
> >>>   1. An operation is considered complete/successful when *the
> >>>   controller node has the correct/current state.*
> >>>   2. Immediate follow up requests to *the controller* will 

Re: [VOTE] 0.10.0.0 RC1

2016-03-28 Thread Dana Powers
+1 -- verified that all kafka-python integration tests now pass

On Mon, Mar 28, 2016 at 2:34 PM, Gwen Shapira  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the second candidate for release of Apache Kafka 0.10.0.0.
>
> This is a major release that includes:
>
> (1) New message format including timestamps
>
> (2) client interceptor API
>
> (3) Kafka Streams.
>
>
> Since this is a major release, we will give people more time to try it
> out and give feedback.
>
> Release notes for the 0.10.0.0
> release:http://home.apache.org/~gwenshap/0.10.0.0-rc1/RELEASE_NOTES.html
>
> *** Please download, test and vote by Monday, April 4, 4pm PT
>
> Kafka's KEYS file containing PGP keys we use to sign the
> release:http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and
> binary):http://home.apache.org/~gwenshap/0.10.0.0-rc1/
>
> * Maven artifacts to be voted
> upon:https://repository.apache.org/content/groups/staging/
>
> * scala-dochttp://home.apache.org/~gwenshap/0.10.0.0-rc1/scaladoc
>
> * java-dochttp://home.apache.org/~gwenshap/0.10.0.0-rc1/javadoc/
>
> * tag to be voted upon (off 0.10.0 branch) is the 0.10.0.0
> tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=759940658d805b1262101dce0ea9a9d562c5f30d
>
> * Documentation:http://kafka.apache.org/0100/documentation.html
>
> * Protocol:http://kafka.apache.org/0100/protocol.html
>
> /**
>
> Thanks,
>
> Gwen
>


Re: [VOTE] KIP-35: Retrieving protocol version

2016-03-27 Thread Dana Powers
Great questions.

But I wonder if we're expanding the scope of this KIP too much? The
questions you've raised relate to java client development, and broker
testing. Issues #2 and #3 are not related to this KIP at all: how does the
server test and validate api protocol changes and compatibility? That is a
fundamental question regardless whether clients can get version metadata or
not. It is the basis for client forwards compatibility (and broker
backwards compatibility).

While these are great questions that need good solutions, I dont think the
KIP was intended to solve them. Rather, it is aimed at kafka clients that
are attempting to be backwards compatible, namely librdkafka and
kafka-python. It also seems that the Java client dev team doesn't have
backwards compatible clients as high on the priority list as we do. That's
ok! But lets not let that delay or prevent the *server* dev team from
adding support for this simple API to help other client teams.

Recall that you were both (Jay and Gwen) in favor of this approach a year
ago:
http://mail-archives.apache.org/mod_mbox/kafka-dev/201501.mbox/%3ccaoejijgnidpnvr-tpcpkzqfkuwdj+rtymvgkkkdwmkczfqm...@mail.gmail.com%3E

KIP-35 is exactly option #2 from that thread. This doesn't seem like a
controversial API at all.

It's a bit frustrating that something this simple, and which is seemingly
unopposed, is taking so long to get approval. If there's anything I can do
to help facilitate, please let me know.

-Dana
We (Jay + me) had some extra information we wanted to see in the KIP before
we are comfortable voting:

* Where does the Java client fits in. Hopefully we can use this KIP to
standardize behavior and guarantees between Java and non-Java clients, so
when we reason about the Java clients, which most Kafka developers are
familiar with, we will make the right decisions for all clients.
* When do we bump the protocol? I think 90% of the issue is not that the
version got bumped but rather that we changed behavior without bumping
versions. For the new VersionRequest to be useful, we need to all know when
to get new versions...
* How do we test / validate - I think our recent experience shows that our
protocol tests and compatibility tests are still inadequate. Having
VersionRequest is useless if we can't validate that Kafka actually
implements the protocol it says it does (and we caught this breaks twice in
the last two weeks)
* Error handling of protocol mismatches

Ashish kindly agreed to think about this and improve the KIP.
We'll resume the vote as soon as he's back :)

Gwen


On Wed, Mar 23, 2016 at 5:55 PM, Dana Powers <dana.pow...@gmail.com> wrote:

> speaking of pending KIPs, what's the status on this one?
>
>
> On Fri, Mar 18, 2016 at 9:47 PM, Ashish Singh <asi...@cloudera.com> wrote:
>
> > Hey Jay,
> >
> > Answers inline.
> >
> > On Fri, Mar 18, 2016 at 10:45 AM, Jay Kreps <j...@confluent.io> wrote:
> >
> > Hey Ashish,
> > >
> > > Couple quick things:
> > >
> > > 1. You list as a rejected alternative "making the documentation the
> > > source of truth for the protocol", but I think what you actually
> > > describe in that section is global versioning, which of those two
> > > things are we voting to reject? I think this is a philosophical point
> > > but an important one...
> > >
One of the major differences between Option 3 and the other options
> > discussed on the KIP is that Option 3 is documentation-oriented, and that
> > is what I wanted to capture in the title. I am happy to change it to
> > global versioning.
> >
> >
> > > 2. Can you describe the changes necessary and classes we'd have to
> > > update in the java clients to make use of this feature? What would
> > > that look like? One concern I have is just the complexity necessary to
> > > do the per-connection protocol version check and really handle all the
> > > cases. I assume you've thought through what that looks like, can you
> > > sketch that out for people?
> > >
> > I would imagine any client, even the Java client, would follow the steps
> > mentioned here
> > <
> >
>
https://cwiki.apache.org/confluence/display/KAFKA/KIP-35+-+Retrieving+protocol+version#KIP-35-Retrievingprotocolversion-Aclientdeveloperwantstoaddsupportforanewfeature.1
> > >.
> > Below are my thoughts on how the Java client can maintain the API versions
> > supported by the various brokers in a cluster.
> >
> >1. ClusterConnectionStates can provide info on whether api versions
> have
> >been retrieved for a connection or not.
> >2. NetworkClient.handleConnections can send ApiVersionQueryRequest to
> >newly connected nodes.
> >3. NetworkClient can be enhanced to handle

Re: [VOTE] KIP-35: Retrieving protocol version

2016-03-23 Thread Dana Powers
speaking of pending KIPs, what's the status on this one?


On Fri, Mar 18, 2016 at 9:47 PM, Ashish Singh  wrote:

> Hey Jay,
>
> Answers inline.
>
> On Fri, Mar 18, 2016 at 10:45 AM, Jay Kreps  wrote:
>
> Hey Ashish,
> >
> > Couple quick things:
> >
> > 1. You list as a rejected alternative "making the documentation the
> > source of truth for the protocol", but I think what you actually
> > describe in that section is global versioning, which of those two
> > things are we voting to reject? I think this is a philosophical point
> > but an important one...
> >
> One of the major differences between Option 3 and the other options
> discussed on the KIP is that Option 3 is documentation-oriented, and that
> is what I wanted to capture in the title. I am happy to change it to
> global versioning.
>
>
> > 2. Can you describe the changes necessary and classes we'd have to
> > update in the java clients to make use of this feature? What would
> > that look like? One concern I have is just the complexity necessary to
> > do the per-connection protocol version check and really handle all the
> > cases. I assume you've thought through what that looks like, can you
> > sketch that out for people?
> >
> I would imagine any client, even the Java client, would follow the steps
> mentioned here
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-35+-+Retrieving+protocol+version#KIP-35-Retrievingprotocolversion-Aclientdeveloperwantstoaddsupportforanewfeature.1
> >.
> Below are my thoughts on how the Java client can maintain the API versions
> supported by the various brokers in a cluster.
>
>1. ClusterConnectionStates can provide info on whether api versions have
>been retrieved for a connection or not.
>2. NetworkClient.handleConnections can send ApiVersionQueryRequest to
>newly connected nodes.
>3. NetworkClient can be enhanced to handle ApiVersionQueryResponse and
>set ClusterConnectionStates to indicate api versions have been retrieved
>for the node.
>4. NetworkClient maintains mapping Node -> [(api_key, min_ver,
>max_ver)], brokerApiVersions, cached.
>5. NetworkClient.processDisconnection can remove entry for a node from
>brokerApiVersions cache.
>6. NetworkClient.canSendRequest can have an added condition on node to
>have api versions available.
>
> With the above changes, at any given point of time NetworkClient will be
> aware of api versions supported by each of the connected nodes. I am not
> sure if the above changes are the best way to do it, people are welcome to
> pitch in. Does it help?
>
>
> > -Jay
> >
> > On Mon, Mar 14, 2016 at 3:54 PM, Ashish Singh 
> wrote:
> > > Hey Guys,
> > >
> > > I would like to start voting process for *KIP-35: Retrieving protocol
> > > version*. The KIP is available here
> > > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-35+-+Retrieving+protocol+version
> > >.
> > > Here
> > > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-35+-+Retrieving+protocol+version#KIP-35-Retrievingprotocolversion-SummaryofthechangesproposedaspartofthisKIP
> > >
> > > is a brief summary of the KIP.
> > >
> > > The vote will run for 72 hours.
> > >
> > > --
> > >
> > > Regards,
> > > Ashish
> >
> ​
> --
>
> Regards,
> Ashish
>


[jira] [Commented] (KAFKA-3442) FetchResponse size exceeds max.partition.fetch.bytes

2016-03-23 Thread Dana Powers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208721#comment-15208721
 ] 

Dana Powers commented on KAFKA-3442:


+1. verified test passes on trunk commit `7af67ce` 

> FetchResponse size exceeds max.partition.fetch.bytes
> 
>
> Key: KAFKA-3442
> URL: https://issues.apache.org/jira/browse/KAFKA-3442
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>    Reporter: Dana Powers
>Assignee: Jiangjie Qin
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> Produce 1 byte message to topic foobar
> Fetch foobar w/ max.partition.fetch.bytes=1024
> Test expects to receive a truncated message (~1024 bytes). 0.8 and 0.9 pass 
> this test, but 0.10 FetchResponse has full message, exceeding the max 
> specified in the FetchRequest.
> I tested with v0 and v1 apis, both fail. Have not tested w/ v2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3442) FetchResponse size exceeds max.partition.fetch.bytes

2016-03-22 Thread Dana Powers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207142#comment-15207142
 ] 

Dana Powers commented on KAFKA-3442:


sounds good to me!

> FetchResponse size exceeds max.partition.fetch.bytes
> 
>
> Key: KAFKA-3442
> URL: https://issues.apache.org/jira/browse/KAFKA-3442
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>    Reporter: Dana Powers
>Assignee: Jiangjie Qin
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> Produce 1 byte message to topic foobar
> Fetch foobar w/ max.partition.fetch.bytes=1024
> Test expects to receive a truncated message (~1024 bytes). 0.8 and 0.9 pass 
> this test, but 0.10 FetchResponse has full message, exceeding the max 
> specified in the FetchRequest.
> I tested with v0 and v1 apis, both fail. Have not tested w/ v2





[jira] [Commented] (KAFKA-3442) FetchResponse size exceeds max.partition.fetch.bytes

2016-03-22 Thread Dana Powers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206876#comment-15206876
 ] 

Dana Powers commented on KAFKA-3442:


[~junrao] Actually I meant adding the error code to v0 and v1 responses. I 
think that would be helpful if the response behavior changes to no longer 
include partial messages. But if the behavior doesn't change for v0/v1 then I
agree it is probably better to defer to a future release.

> FetchResponse size exceeds max.partition.fetch.bytes
> 
>
> Key: KAFKA-3442
> URL: https://issues.apache.org/jira/browse/KAFKA-3442
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>    Reporter: Dana Powers
>Assignee: Jiangjie Qin
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> Produce 1 byte message to topic foobar
> Fetch foobar w/ max.partition.fetch.bytes=1024
> Test expects to receive a truncated message (~1024 bytes). 0.8 and 0.9 pass 
> this test, but 0.10 FetchResponse has full message, exceeding the max 
> specified in the FetchRequest.
> I tested with v0 and v1 apis, both fail. Have not tested w/ v2





[jira] [Commented] (KAFKA-3442) FetchResponse size exceeds max.partition.fetch.bytes

2016-03-22 Thread Dana Powers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206790#comment-15206790
 ] 

Dana Powers commented on KAFKA-3442:


yes, kafka-python uses the same check for partial messages as the java client 
and will raise a RecordTooLargeException to user if there is only a partial 
message. I think it would be best if the 0.10 broker continued to return 
partial messages, but since the protocol spec refers to this behavior as an
optimization, I think it would be acceptable to change the behavior and
return an empty payload. Would it be possible to include an error_code with
the empty payload?

> FetchResponse size exceeds max.partition.fetch.bytes
> 
>
> Key: KAFKA-3442
> URL: https://issues.apache.org/jira/browse/KAFKA-3442
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>    Reporter: Dana Powers
>Assignee: Jiangjie Qin
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> Produce 1 byte message to topic foobar
> Fetch foobar w/ max.partition.fetch.bytes=1024
> Test expects to receive a truncated message (~1024 bytes). 0.8 and 0.9 pass 
> this test, but 0.10 FetchResponse has full message, exceeding the max 
> specified in the FetchRequest.
> I tested with v0 and v1 apis, both fail. Have not tested w/ v2





[jira] [Commented] (KAFKA-3442) FetchResponse size exceeds max.partition.fetch.bytes

2016-03-21 Thread Dana Powers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205655#comment-15205655
 ] 

Dana Powers commented on KAFKA-3442:


In all prior broker releases clients check the response for a "partial" message 
-- i.e. the MessageSetSize is less than the MessageSize. This follows this 
statement in the protocol wiki:

"As an optimization the server is allowed to return a partial message at the 
end of the message set. Clients should handle this case."

So in this case the client is checking for a partial message, not an error code.
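
The partial-message check described above can be sketched as follows. This
is an illustrative Python sketch assuming the v0 message-set layout (an
8-byte offset and 4-byte size before each message), not a full wire-format
parser; the function name is made up for the example.

```python
# Illustrative partial-message detection for a Kafka v0 message set.
# Each message is preceded by an 8-byte offset and a 4-byte size; if
# fewer bytes remain than the declared size, the trailing message is
# partial and should be discarded by the client.
import struct

def split_message_set(buf):
    """Return (complete_messages, saw_partial) for a raw message-set buffer."""
    messages = []
    pos = 0
    while pos + 12 <= len(buf):
        offset, size = struct.unpack_from(">qi", buf, pos)
        pos += 12
        if pos + size > len(buf):
            return messages, True   # truncated trailing message
        messages.append((offset, buf[pos:pos + size]))
        pos += size
    # leftover bytes smaller than a 12-byte header also count as partial
    return messages, pos < len(buf)

# One complete 5-byte message followed by a truncated second message:
complete = struct.pack(">qi", 0, 5) + b"hello"
partial = struct.pack(">qi", 1, 100) + b"only-a-few-bytes"
msgs, truncated = split_message_set(complete + partial)
print(msgs)       # [(0, b'hello')]
print(truncated)  # True
```

A client that sees `truncated` with zero complete messages is in the
situation the protocol wiki describes: the first message is larger than the
fetch size, and the client raises an error to the user rather than looping.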

> FetchResponse size exceeds max.partition.fetch.bytes
> 
>
> Key: KAFKA-3442
> URL: https://issues.apache.org/jira/browse/KAFKA-3442
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Dana Powers
>Assignee: Jiangjie Qin
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> Produce 1 byte message to topic foobar
> Fetch foobar w/ max.partition.fetch.bytes=1024
> Test expects to receive a truncated message (~1024 bytes). 0.8 and 0.9 pass 
> this test, but 0.10 FetchResponse has full message, exceeding the max 
> specified in the FetchRequest.
> I tested with v0 and v1 apis, both fail. Have not tested w/ v2





[jira] [Comment Edited] (KAFKA-3442) FetchResponse size exceeds max.partition.fetch.bytes

2016-03-21 Thread Dana Powers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205411#comment-15205411
 ] 

Dana Powers edited comment on KAFKA-3442 at 3/22/16 12:13 AM:
--

{code}
# clone kafka-python repo
git clone https://github.com/dpkp/kafka-python.git

# install kafka fixture binaries
tar xzvf kafka_2.10-0.10.0.0.tgz -C servers/0.10.0.0/
mv servers/0.10.0.0/kafka_2.10-0.10.0.0 servers/0.10.0.0/kafka-bin

# you can install other versions of kafka to compare test results
KAFKA_VERSION=0.9.0.1 ./build_integration.sh

# install python test harness
pip install tox

# run just the failing test [replace py27 w/ py## as needed: 
py26,py27,py33,py34,py35]
KAFKA_VERSION=0.10.0.0 tox -e py27 
test/test_consumer_integration.py::TestConsumerIntegration::test_huge_messages
{code}


was (Author: dana.powers):
# clone kafka-python repo
git clone https://github.com/dpkp/kafka-python.git

# install kafka fixture binaries
tar xzvf kafka_2.10-0.10.0.0.tgz -C servers/0.10.0.0/
mv servers/0.10.0.0/kafka_2.10-0.10.0.0 servers/0.10.0.0/kafka-bin

# you can install other versions of kafka to compare test results
KAFKA_VERSION=0.9.0.1 ./build_integration.sh

# install python test harness
pip install tox

# run just the failing test [replace py27 w/ py## as needed: 
py26,py27,py33,py34,py35]
KAFKA_VERSION=0.10.0.0 tox -e py27 
test/test_consumer_integration.py::TestConsumerIntegration::test_huge_messages

> FetchResponse size exceeds max.partition.fetch.bytes
> 
>
> Key: KAFKA-3442
> URL: https://issues.apache.org/jira/browse/KAFKA-3442
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>    Reporter: Dana Powers
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> Produce 1 byte message to topic foobar
> Fetch foobar w/ max.partition.fetch.bytes=1024
> Test expects to receive a truncated message (~1024 bytes). 0.8 and 0.9 pass 
> this test, but 0.10 FetchResponse has full message, exceeding the max 
> specified in the FetchRequest.
> I tested with v0 and v1 apis, both fail. Have not tested w/ v2





[jira] [Commented] (KAFKA-3442) FetchResponse size exceeds max.partition.fetch.bytes

2016-03-21 Thread Dana Powers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205411#comment-15205411
 ] 

Dana Powers commented on KAFKA-3442:


# clone kafka-python repo
git clone https://github.com/dpkp/kafka-python.git

# install kafka fixture binaries
tar xzvf kafka_2.10-0.10.0.0.tgz -C servers/0.10.0.0/
mv servers/0.10.0.0/kafka_2.10-0.10.0.0 servers/0.10.0.0/kafka-bin

# you can install other versions of kafka to compare test results
KAFKA_VERSION=0.9.0.1 ./build_integration.sh

# install python test harness
pip install tox

# run just the failing test [replace py27 w/ py## as needed: 
py26,py27,py33,py34,py35]
KAFKA_VERSION=0.10.0.0 tox -e py27 
test/test_consumer_integration.py::TestConsumerIntegration::test_huge_messages

> FetchResponse size exceeds max.partition.fetch.bytes
> 
>
> Key: KAFKA-3442
> URL: https://issues.apache.org/jira/browse/KAFKA-3442
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>    Reporter: Dana Powers
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> Produce 1 byte message to topic foobar
> Fetch foobar w/ max.partition.fetch.bytes=1024
> Test expects to receive a truncated message (~1024 bytes). 0.8 and 0.9 pass 
> this test, but 0.10 FetchResponse has full message, exceeding the max 
> specified in the FetchRequest.
> I tested with v0 and v1 apis, both fail. Have not tested w/ v2





[jira] [Updated] (KAFKA-3442) FetchResponse size exceeds max.partition.fetch.bytes

2016-03-21 Thread Dana Powers (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dana Powers updated KAFKA-3442:
---
Priority: Blocker  (was: Major)

> FetchResponse size exceeds max.partition.fetch.bytes
> 
>
> Key: KAFKA-3442
> URL: https://issues.apache.org/jira/browse/KAFKA-3442
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>    Reporter: Dana Powers
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> Produce 1 byte message to topic foobar
> Fetch foobar w/ max.partition.fetch.bytes=1024
> Test expects to receive a truncated message (~1024 bytes). 0.8 and 0.9 pass 
> this test, but 0.10 FetchResponse has full message, exceeding the max 
> specified in the FetchRequest.
> I tested with v0 and v1 apis, both fail. Have not tested w/ v2





Re: [VOTE] 0.10.0.0 RC0

2016-03-21 Thread Dana Powers
I filed a bug re max.partition.fetch.bytes here:
https://issues.apache.org/jira/browse/KAFKA-3442

Would it be useful to create a 0.10.0.0-rc0 version for JIRA tickets? Or
should issues just get filed against 0.10.0.0 ?

-Dana


On Mon, Mar 21, 2016 at 1:53 PM, Gwen Shapira  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for release of Apache Kafka 0.10.0.0.
> This is a major release that includes: (1) New message format including
> timestamps (2) client interceptor API (3) Kafka Streams. Since this is a
> major release, we will give people more time to try it out and give
> feedback.
>
> Release notes for the 0.10.0.0 release:
> http://home.apache.org/~gwenshap/0.10.0.0-rc0/RELEASE_NOTES.HTML
>
> *** Please download, test and vote by Monday, March 28, 9am PT
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~gwenshap/0.10.0.0-rc0/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * scala-doc
> http://home.apache.org/~gwenshap/0.10.0.0-rc0/scaladoc
>
> * java-doc
> http://home.apache.org/~gwenshap/0.10.0.0-rc0/javadoc/
>
> * tag to be voted upon (off 0.10.0 branch) is the 0.10.0.0 tag:
>
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=72fd542633a95a8bd5bdc9fdca56042b643cb4b0
>
> * Documentation:
> http://kafka.apache.org/0100/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0100/protocol.html
>
> /**
>
> Thanks,
>
> Gwen
>


[jira] [Created] (KAFKA-3442) FetchResponse size exceeds max.partition.fetch.bytes

2016-03-21 Thread Dana Powers (JIRA)
Dana Powers created KAFKA-3442:
--

 Summary: FetchResponse size exceeds max.partition.fetch.bytes
 Key: KAFKA-3442
 URL: https://issues.apache.org/jira/browse/KAFKA-3442
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.10.0.0
Reporter: Dana Powers


Produce 1 byte message to topic foobar
Fetch foobar w/ max.partition.fetch.bytes=1024

Test expects to receive a truncated message (~1024 bytes). 0.8 and 0.9 pass 
this test, but 0.10 FetchResponse has full message, exceeding the max specified 
in the FetchRequest.

I tested with v0 and v1 apis, both fail. Have not tested w/ v2





Re: [DISCUSS] KIP-35 - Retrieve protocol version

2016-03-20 Thread Dana Powers
> > > > > > > > > >> >> mechanism, discussed below, client gets to know that
> > > > BrokerA
> > > > > > has
> > > > > > > > > >> ApiVersion
> > > > > > > > > >> >> 4 and BrokerB has ApiVersion 5. With that
> information,
> > > and
> > > > > the
> > > > > > > > > available
> > > > > > > > > >> >> protocol documentations for those ApiVersions client
> > can
> > > > > deduce
> > > > > > > > what
> > > > > > > > > >> >> protocol versions does the broker supports. In this
> > case
> > > > > client
> > > > > > > > will
> > > > > > > > > >> deduce
> > > > > > > > > >> >> that it can use v0 and v1 of REQ_A and RESP_A while
> > > talking
> > > > > to
> > > > > > > > > BrokerA,
> > > > > > > > > >> >> while it can use v1 and v2 of REQ_A and RESP_A while
> > > > talking
> > > > > to
> > > > > > > > > BrokerB.
> > > > > > > > > >> >>
> > > > > > > > > >> >> On Mon, Mar 14, 2016 at 10:50 PM, Ewen
> > Cheslack-Postava <
> > > > > > > > > >> e...@confluent.io <javascript:;>
> > > > > > > > > >> >> > wrote:
> > > > > > > > > >> >>
> > > > > > > > > >> >>> Yeah, Gwen's example is a good one. And it doesn't
> > even
> > > > have
> > > > > > to
> > > > > > > be
> > > > > > > > > >> thought
> > > > > > > > > >> >>> of in terms of the implementation -- you can think
> of
> > > the
> > > > > > > protocol
> > > > > > > > > >> itself
> > > > > > > > > >> >>> as effectively being possible to branch and have
> > changes
> > > > > > > > > cherry-picked.
> > > > > > > > > >> >>> Given the way some changes interact and that only
> some
> > > may
> > > > > be
> > > > > > > > > feasible
> > > > > > > > > >> to
> > > > > > > > > >> >>> backport, this may be important.
> > > > > > > > > >> >>>
> > > > > > > > > >> >>> Similarly, it's difficult to make that definition .
> In
> > > > > > practice,
> > > > > > > > we
> > > > > > > > > >> >>> sometimes branch and effectively merge the protocol
> --
> > > > i.e.
> > > > > we
> > > > > > > > > develop
> > > > > > > > > >> 2
> > > > > > > > > >> >>> KIPs with independent changes at the same time. If
> you
> > > > > force a
> > > > > > > > > linear
> > > > > > > > > >> >>> model, you also *force* the ordering of
> > implementation,
> > > > > which
> > > > > > > will
> > > > > > > > > be a
> > > > > > > > > >> >>> pretty serious constraint in a lot of cases. Two
> > > > > > > protocol-changing
> > > > > > > > > KIPs
> > > > > > > > > >> >>> may
> > > > > > > > > >> >>> occur near in time, but one may be a much larger
> > change.
> > > > > > > > > >> >>>
> > > > > > > > > >> >>> Finally, it might be worth noting that from a client
> > > > > > developer's
> > > > > > > > > >> >>> perspective, the linear order may not be all that
> > > > intuitive
> > > > > > when
> > > > > > > > we
> > > > > > > > > >> pile
> > > > > > > > > >> >>> on
> > > > > > >

Re: [DISCUSS] KIP-35 - Retrieve protocol version

2016-03-19 Thread Dana Powers
On Thu, Mar 17, 2016 at 1:42 PM, Gwen Shapira  wrote:

> "I think I would make this approach work by looking at the released server
> version documentation for each version that I am trying to support and test
> against*, manually identify the expected "protocol vectors" each supports,
> store that as a map of vectors to "broker versions", check each vector at
> runtime until I find a match, and write code compatibility checks from
> there."
>
> How is this better than a global version ID?


As a client developer, it seems roughly the same. I think it probably
avoids the server development workflow issues, and possibly the need to
agree on semantics of the global version ID? But others surely are more
qualified than me to comment on that part.

-Dana


Re: [VOTE] KIP-35: Retrieving protocol version

2016-03-18 Thread Dana Powers
+1

On Thu, Mar 17, 2016 at 4:09 PM, Ashish Singh  wrote:

> As the KIP has been modified since we started this vote, the vote is
> restarted from now.
>
> The updated KIP is available at
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-35+-+Retrieving+protocol+version
> .
>
> On Mon, Mar 14, 2016 at 3:54 PM, Ashish Singh  wrote:
>
> > Hey Guys,
> >
> > I would like to start voting process for *KIP-35: Retrieving protocol
> > version*. The KIP is available here
> > <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-35+-+Retrieving+protocol+version
> >.
> > Here
> > <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-35+-+Retrieving+protocol+version#KIP-35-Retrievingprotocolversion-SummaryofthechangesproposedaspartofthisKIP
> >
> > is a brief summary of the KIP.
> >
> > The vote will run for 72 hours.
> > ​
> > --
> >
> > Regards,
> > Ashish
> >
>
>
>
> --
>
> Regards,
> Ashish
>


Re: [DISCUSS] KIP-35 - Retrieve protocol version

2016-03-18 Thread Dana Powers
On Thu, Mar 17, 2016 at 2:07 PM, Jason Gustafson  wrote:

> It would also be nice to discuss the way the current client
> versioning works and why it is inadequate for third-party clients.
>

My understanding of the Java client versioning is that it is not
backwards-compatible. The instructions are to use a Java client version that
is <= your broker version.

The main inadequacy of this model is that some clients want to deliver
backwards-compatible client software. I assume that doesn't need an
explanation, but these are my reasons anyways: it makes users happier,
reduces maintenance burden, and avoids unnecessary client fragmentation.

The other inadequacy, assuming you did want to follow the same practice as
the java client, is that third-party clients are not versioned in lockstep
with the broker, so we cannot use a similar policy of "use a client version
that is <= your broker version". Instead, third-party clients must manage
and communicate the relationships between client versions and broker
versions independently. That is difficult and leads to lots of user
confusion.

-Dana


Re: [DISCUSS] KIP-35 - Retrieve protocol version

2016-03-14 Thread Dana Powers
Is a linear protocol int consistent with the current release model? It
seems like that would break down w/ the multiple release branches that are
all simultaneously maintained? Or is it implicit that no patch release can
ever bump the protocol int? Or maybe the protocol int gets some extra
"wiggle" on minor / major releases to create unallocated version ints that
could be used on future patch releases / backports?

I think the protocol version int does make sense for folks deploying from
trunk.

-Dana

On Mon, Mar 14, 2016 at 6:13 PM, Jay Kreps  wrote:

> Yeah I think that is the point--we have a proposal for a new protocol
> versioning scheme and a vote on it but it doesn't actually describe
> how versioning will work yet! I gave my vague impression based on this
> thread, but I want to make sure that is correct and get it written
> down before we adopt it.
>
> -Jay
>
> On Mon, Mar 14, 2016 at 5:58 PM, Gwen Shapira  wrote:
> > On Mon, Mar 14, 2016 at 5:31 PM, Jay Kreps  wrote:
> >
> >> Couple of missing things:
> >>
> >> This KIP doesn't have a proposal on versioning it just gives different
> >> options, it'd be good to get a concrete proposal in the KIP. Here is my
> >> understanding of what we are proposing (can someone sanity check and if
> >> correct, update the kip):
> >>
> >>1. We will augment the existing api_version field in the header with
> a
> >>protocol_version that will begin at some initial value and increment
> by
> >> 1
> >>every time we make a changes to any of the api_versions (question:
> >>including internal apis?).
> >>
> >
> > Jay, this part was not in the KIP and was never discussed.
> > Are you proposing adding this? Or is it just an assumption you made?
> >
> >
> >
> >>2. The protocol_version will be added to the metadata request
> >>3. We will also add a string that this proposal is calling
> VersionString
> >>which will describe the build of kafka in some way. The clients
> should
> >> not
> >>under any circumstances do anything with this string other than
> print it
> >>out to the user.
> >>
> >> One thing I'm not sure about: I think currently metadata sits in the
> client
> >> for 10 mins by default. Say a client bootstraps and then a server is
> >> downgraded to an earlier version, won't the client's metadata version
> >> indicate that that client handles a version it doesn't actually handle
> any
> >> more? We need to document how clients will handle this.
> >>
> >> Here are some comments on other details:
> >>
> >>1. As a minor thing I think we should avoid naming the fields
> VersionId
> >>and VersionString which sort of implies they are both used for
> >> versioning.
> >>I think we should call them something like ProtocolVersion and
> >>BuildDescription, with BuildDescription being totally unspecified
> other
> >>than that it is some kind of human readable string describing a
> >> particular
> >>Kafka build. We really don't want a client attempting to use this
> >> string in
> >>any way as that would always be the wrong thing to do in the
> versioning
> >>scheme we are proposing, you should always use the protocol version.
> >>2. Does making the topics field in the metadata request nullable
> >>actually make sense? We have a history of wanting to add magical
> values
> >>rather than fields. Currently topics=[a] means give me information
> about
> >>topic a, topics=[] means give me information about all topics, and we
> >> are
> >>proposing topics=null would mean don't give me topics. I don't have a
> >>strong opinion.
> >>3. I prefer Jason's proposal on using a conservative metadata version
> >>versus the empty response hack. However I think that may actually
> >>exacerbate the downgrade scenario I described.
> >>4. I agree with Jason that we should really look at the details of
> the
> >>implementation so we know it works--implementing server support
> without
> >>actually trying it is kind of risky.
> >>
> >> As a meta comment: I'd really like to encourage us to think of the
> protocol
> >> as a document that includes the following things:
> >>
> >>- The binary format, error codes, etc
> >>- The request/response interaction
> >>- The semantics of each request in different cases
> >>- Instructions on how to use this to implement a client
> >>
> >> This document is versioned with the protocol number and is the source of
> >> truth for the protocol.
> >>
> >> Part of any protocol change needs to be an update to the instructions on
> >> how to use that part of the protocol. We should be opinionated. If there
> >> are two options there should be a reason, and then we need to document
> both
> >> and say exactly when to use each.
> >>
> >> I think we also need to get a "how to" document on protocol changes
> just so
> >> people know what they need to do to add a new protocol feature.
> >>

Re: [DISCUSS] KIP-35 - Retrieve protocol version

2016-03-14 Thread Dana Powers
There seems to be a lot of tension between support for release-deploys vs.
trunk-deploys. Is there a better way to handle this?

In my experience the vast majority of third-party client code is written
managing compatibility on the first case (release-deploys). I would prefer
a simple approach that is optimized for third-party clients that rarely
connect to trunk-deploys. As Magnus mentioned, the approach we take in
kafka-python is to attempt to identify the broker version -- 0.9, 0.8.2,
0.8.1, 0.8.0 -- and gate "blocks" of features based on that information.
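
The probe-and-gate approach described above can be sketched roughly as
follows. This is modeled loosely on the idea, not on kafka-python's actual
implementation; the feature names and the version table are illustrative
placeholders.

```python
# Illustrative sketch of broker-version inference by probing, then gating
# features on the inferred version. The feature names and version table
# are made up for the example.

# Distinguishing features introduced in each broker release, newest first.
BROKER_FEATURES = [
    ("0.9", {"group_coordination"}),
    ("0.8.2", {"offset_commit_to_kafka"}),
    ("0.8.1", {"offset_fetch"}),
    ("0.8.0", {"produce"}),
]

def infer_broker_version(probe):
    """probe(feature) -> bool. Try the newest release first and return the
    first version whose distinguishing features all probe successfully."""
    for version, features in BROKER_FEATURES:
        if all(probe(f) for f in features):
            return version
    raise RuntimeError("unrecognized broker")

def supported_features(version):
    """All features available at `version`, cumulative over older releases."""
    found = False
    result = set()
    for v, features in BROKER_FEATURES:
        if v == version:
            found = True
        if found:
            result |= features
    return result

# Simulate a 0.8.2 broker: probes for 0.9-only features fail.
broker_08_2 = {"produce", "offset_fetch", "offset_commit_to_kafka"}
version = infer_broker_version(lambda f: f in broker_08_2)
print(version)                                              # 0.8.2
print("group_coordination" in supported_features(version))  # False
print("offset_fetch" in supported_features(version))        # True
```

The client then consults `supported_features` before attempting anything
version-dependent, which is the "gate blocks of features" behavior described
above.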

Would it be reasonable to put the onus on the user to manage connecting to
trunk-deploys? If the broker always returns 'trunk' and the client is
configured to manually override the "broker version" via configuration,
would that work for people running trunk-deploys? For example, I might run
a trunk-deploy broker and configure my client to assume broker version
'0.10-dev' and write some client code to support that.

To be honest, I do not plan to release any code publicly (i.e., to pypi)
that is intended to support trunk-deploys. That really sounds like a
maintenance nightmare. I would expect anyone running a server pulled from
trunk to also run clients that aren't officially released / are in active
development.

-Dana


On Mon, Mar 14, 2016 at 11:19 AM, Ashish Singh  wrote:

> On Mon, Mar 14, 2016 at 9:37 AM, Gwen Shapira  wrote:
>
> > On Mon, Mar 14, 2016 at 7:05 AM, Ismael Juma  wrote:
> > > Hi Ashish,
> > >
> > > A few comments below.
> > >
> > > On Fri, Mar 11, 2016 at 9:59 PM, Ashish Singh 
> > wrote:
> > >
> > >> Sounds like we are mostly in agreement. Following are the key points.
> > >>
> > >>1. Every time a protocol version changes, for any request/response,
> > >>broker version, ApiVersion, will be bumped up.
> > >>
> > >
> > > Any thoughts on how we will enforce this?
> >
> > Code reviews? :)
> >
> > We are already doing it in ApiVersion (and have been since
> > 0.8.2.0-SNAPSHOT). Enforcing is awesome, but not necessarily part of
> > this KIP.
> >
> > >
> > >
> > >>2. Protocol documentation will be versioned with broker version.
> > Every
> > >>time there is a broker version change, protocol documentation
> version
> > >> needs
> > >>to be updated and linked to main documentation page.
> > >>3. Deprecation of protocol version will be done via marking the
> > version
> > >>as deprecated on the protocol documentation.
> > >>
> > >
> > > I think this is fine for the first version. We may consider doing more
> in
> > > the future (logging a warning perhaps).
> > >
> > >
> > >>4. On getting unknown protocol version, broker will send an empty
> > >>response, instead of simply closing client connection.
> > >>
> > >
> > > I am not sure about this one. It's an unusual pattern and feels like a
> > hack.
> >
> > We discussed this and failed to come up with a better solution that
> > doesn't break compatibility.
> > Improvements can be added in the future - nothing can be worse than the
> > current state (where the broker silently closes the connection).
> >
> > >
> > >>5. Metadata response will be enhanced to also contain broker version,
> > >>VersionInt and VersionString. VersionString will contain internal
> > >>version information.
> > >>
> > >
> > > Even though Magnus suggested that it's OK for clients to parse
> > > `VersionString`, I personally would rather avoid that. Do we really need
> > > 3 separate versions, or could we get away with 2? I think it would be
> > > good to elaborate on this a bit and explain how each of these versions
> > > would be used (both from the broker's and the clients' perspective).
> >
> > Agree! I'm also confused.
> >
> I am working on updating the KIP and hopefully that will be less confusing.
> What I meant was that the metadata response will have a broker-version made
> up of VersionInt and VersionString, for example (4, "0.10.0-IV0"). This
> will be based on the respective ApiVersions:
>
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/api/ApiVersion.scala#L100
> .
>
> >
> > >
> > >>6. Metadata request with a single null topic and size of -1 can be
> > >>used to fetch a metadata response with only broker version information
> > >>and no topic/broker info.
> > >
> > >>7. On receiving a metadata request with a single null topic with size
> > >>of -1, the broker will respond with only the broker version.
> > >>
> > >
> > > As Magnus says, the broker information should be returned. This would
> > > also help us reduce unnecessary data transfer during NetworkClient's
> > > metadata updates (KAFKA-3358). At the moment, we get information for
> > > all topics in situations where we actually want no topics.
> > >
> > > Also, I think it's a bit odd to say a `single null topic with size -1`.
> > > Do we mean an array of topics with size -1 and no 

Re: [DISCUSS] KIP-35 - Retrieve protocol version

2016-03-02 Thread Dana Powers
In kafka-python we've been doing something like:

if version >= (0, 9):
  ...  # do cool new stuff
elif version >= (0, 8, 2):
  ...  # do some older stuff
else:
  raise UnsupportedVersionError

This will break if / when the new 0.9 apis are completely removed from some
future release, but should handle intermediate broker upgrades. Because we
can't add support for future apis a priori, I think the best we could do
here is throw an error that request protocol version X is not supported.
For now that comes through as a broken socket connection, so there is an
error - just not a super helpful one.

For that reason I'm also in favor of a generic error response when a
protocol req is not recognized.

-Dana
On Mar 2, 2016 5:38 PM, "Jay Kreps"  wrote:

> But won't it be the case that what clients end up doing would be something
> like
>if(version != 0.8.1)
>   throw new UnsupportedVersionException()
> which then means the client is broken as soon as we release a new server
> version even though the protocol didn't change. I'm actually not sure how
> you could use that information in a forward compatible way since you can't
> know a priori if you will work with the next release until you know if the
> protocol changed.
>
> -Jay
>
> On Wed, Mar 2, 2016 at 5:28 PM, Jason Gustafson 
> wrote:
>
> > Hey Jay,
> >
> > Yeah, I wasn't suggesting that we eliminate request API versions. They're
> > definitely needed on the broker to support compatibility. I was just
> > saying that if a client wants to support multiple broker versions (e.g.
> > 0.8 and 0.9), then it makes more sense to me to make the kafka release
> > version available in order to determine which version of the request API
> > should be used, rather than adding a new request type which exposes all
> > of the different supported versions for all of the request types. Request
> > API versions all change in lockstep with Kafka releases anyway.
> >
> > -Jason
> >
> > On Wed, Mar 2, 2016 at 5:15 PM, Becket Qin  wrote:
> >
> > > I think using the Kafka release version makes sense. More particularly,
> > > we can use the ApiVersion and this will cover all the internal versions
> > > as well. In KAFKA-3025, we added the ApiVersion to message format
> > > version mapping; we can add the ApiKey-to-version mapping to ApiVersion
> > > as well. We can move the ApiVersion class to the o.a.k.c package and use
> > > it for both server and clients.
> > >
> > > @Jason, if we cache the release info in metadata and don't re-validate
> > > the release on reconnect, would it still work if we do a rolling
> > > downgrade?
> > >
> > > On Wed, Mar 2, 2016 at 3:16 PM, Jason Gustafson 
> > > wrote:
> > >
> > > > I think Dana's suggestion to include the Kafka release version makes
> > > > a lot of sense. I'm actually wondering why you would need the
> > > > individual API versions if you have that? It sounds like keeping track
> > > > of all the api version information would add a lot of complexity to
> > > > clients since they'll have to try to handle different version
> > > > permutations which are not actually possible in practice. Wouldn't it
> > > > be simpler to know that you're talking to an 0.9 broker than that
> > > > you're talking to a broker which supports version 2 of the group
> > > > coordinator request, version 1 of fetch request, etc? Also, the
> > > > release version could be included in the broker information in the
> > > > topic metadata request, which would save the need for the additional
> > > > round trip on every reconnect.
> > > >
> > > > -Jason
> > > >
> > > > On Tue, Mar 1, 2016 at 7:59 PM, Ashish Singh 
> > > wrote:
> > > >
> > > > > On Tue, Mar 1, 2016 at 6:30 PM, Gwen Shapira 
> > > wrote:
> > > > >
> > > > > > One more thing, the KIP actually had 3 parts:
> > > > > > 1. The version protocol
> > > > > > 2. New response on messages of wrong API key or wrong version
> > > > > > 3. Protocol documentation
> > > > > >
> > > > > There is a WIP patch for adding protocol docs,
> > > > > https://github.com/apache/kafka/pull/970 . By protocol
> > documentation,
> > > > you
> > > > > mean updating this, right?
> > > > >
> > > > > >
> > > > > > I understand that you are offering to only implement part 1?
> > > > > > But the KIP discussion and vote should still cover all three
> parts,
> > > > > > they will just be implemented in separate JIRA?
> > > > > >
> > > > > The patch for KAFKA-3307, https://github.com/apache/kafka/pull/986
> ,
> > > > covers
> > > > > 1 and 2. KAFKA-3309 tracks documentation part. Yes, we should
> include
> > > all
> > > > > the three points you mentioned while discussing or voting for
> KIP-35.
> > > > >
> > > > > >
> > > > > > On Tue, Mar 1, 2016 at 5:25 PM, Ashish Singh <
> asi...@cloudera.com>
> > > > > wrote:
> > > > > > > On Tue, Mar 1, 2016 at 5:11 PM, Gwen Shapira <
> 

Re: [DISCUSS] KIP-35 - Retrieve protocol version

2016-02-29 Thread Dana Powers
This is a fantastic and much-needed KIP. All third-party clients have had
to deal with this issue. In my experience most clients are either declaring
they only support broker version X, or are spending a lot of time
hacking around the issue. I think the community of non-java drivers would
see significant benefit from this proposal.

My specific thought is that for kafka-python it has been easier to manage
compatibility using broker release version to gate various features by
api-protocol version. For example, only enable group coordination apis if
>= (0, 9), kafka-backed offsets >= (0, 8, 2), etc. As an example, here are
some backwards compatibility issues that I think are difficult to capture
w/ just the protocol versions:

- LZ4 compression only supported in brokers >= 0.8.2, but no protocol
change.
- kafka-backed offset storage, in addition to requiring new offset
commit/fetch protocol versions, also requires adding support for tracking
the group coordinator.
- 0.8.2-beta OffsetCommit api [different than 0.8.X release]

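A minimal sketch of that tuple-based gating (the feature names and thresholds mirror the examples above; the helper itself is illustrative, not kafka-python's actual code):

```python
# Gate client features on the broker release version, as described above.
# The mapping mirrors the examples in this email; the helper is illustrative.

FEATURE_MIN_VERSIONS = {
    "group_coordination": (0, 9),
    "kafka_backed_offsets": (0, 8, 2),
    "lz4_compression": (0, 8, 2),  # no protocol change, only knowable via release version
}

def supports(broker_version, feature):
    """True if a broker at `broker_version` (a tuple) enables `feature`."""
    return broker_version >= FEATURE_MIN_VERSIONS[feature]

assert supports((0, 9, 0), "group_coordination")
assert not supports((0, 8, 1), "kafka_backed_offsets")
```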

Could release version be added to the api response in this proposal?
Perhaps include a KafkaRelease string in the Response before the array of
api versions?

Thanks for the great KIP, Magnus. And thanks for restarting the discussion,
Ashish. I also would like to see this addressed in 0.10

-Dana


On Mon, Feb 29, 2016 at 3:55 PM, Ashish Singh  wrote:

> I hope it is OK for me to make some progress here. I have made the
> following changes.
>
> 1. Updated KIP-35, to adopt Jay's suggestion on maintaining separate list
> of deprecated versions, instead of using a version of -1.
> 2. Added information on required permissions, Describe action on Cluster
> resource, to be able to retrieve protocol versions from an auth-enabled
> Kafka cluster.
>
> Created https://issues.apache.org/jira/browse/KAFKA-3304. Primary patch is
> available to review, https://github.com/apache/kafka/pull/986
>
> On Thu, Feb 25, 2016 at 1:27 PM, Ashish Singh  wrote:
>
> > Kafka clients in Hadoop ecosystem, Flume, Spark, etc, have found it
> really
> > difficult to cope with Kafka releases as they want to support users on
> > different Kafka versions. Capability to retrieve protocol version will
> go a
> > long way to ease out those pain points. I will be happy to help out with
> > the work on this KIP. @Magnus, thanks for driving this, is it OK if I
> carry
> > forward the work from here. It will be ideal to have this in 0.10.0.0.
> >
> > On Mon, Oct 12, 2015 at 9:29 PM, Jay Kreps  wrote:
> >
> >> I wonder if we need to solve the error problem? I think this KIP gives a
> >> decent workaround.
> >>
> >> Probably we should have included an error in the response header, but we
> >> debated it at the time decided not to and now it is pretty hard to add
> >> because the headers aren't versioned (d'oh).
> >>
> >> It seems like any other solution is going to be kind of a hack, right?
> >> Sending malformed responses back seems like not a clean solution...
> >>
> >> (Not sure if I was pro- having a top-level error or not, but in any case
> >> the rationale for the decision was that so many of the requests were
> >> per-partition or per-topic or whatever and hence fail or succeed at that
> >> level and this makes it hard to know what the right top-level error code
> >> is
> >> and hard for the client to figure out what to do with the top level
> error
> >> if some of the partitions succeed but there is a top-level error).
> >>
> >> I think actually this new API actually gives a way to handle this
> >> gracefully on the client side by just having clients that want to be
> >> graceful check for support for their version. Clients that do that will
> >> have a graceful message.
> >>
> >> At some point if we're ever reworking the headers we should really
> >> consider
> >> (a) versioning them and (b) adding a top-level error code in the
> response.
> >> But given this would be a big breaking change and this is really just to
> >> give a nicer error message seems like it probably isn't worth it to try
> to
> >> do something now.
> >>
> >> -Jay
> >>
> >>
> >>
> >> On Mon, Oct 12, 2015 at 8:11 PM, Jiangjie Qin  >
> >> wrote:
> >>
> >> > I am thinking instead of returning an empty response, it would be
> >> better to
> >> > return an explicit UnsupportedVersionException code.
> >> >
> >> > Today KafkaApis handles the error in the following way:
> >> > 1. For requests/responses using old Scala classes, KafkaApis uses
> >> > RequestOrResponse.handleError() to return an error response.
> >> > 2. For requests/response using Java classes (only JoinGroupRequest and
> >> > Heartbeat now), KafkaApis calls AbstractRequest.getErrorResponse() to
> >> > return an error response.
> >> >
> >> > In KAFKA-2512, I am returning an UnsupportedVersionException for case
> >> [1]
> >> > when see an unsupported version. This will put the error code per
> 

[jira] [Commented] (KAFKA-3200) MessageSet from broker seems invalid

2016-02-03 Thread Dana Powers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131142#comment-15131142
 ] 

Dana Powers commented on KAFKA-3200:


The minimum MessageSet entry size is 26 bytes (8 offset, 4 size, 14 for an 
'empty' message), so generally I would break if fewer than 26 bytes 
remain, and also break if the decoded size is larger than the remaining 
buffer.

for reference, the code I wrote to handle message set decoding in kafka-python 
is here: 
https://github.com/dpkp/kafka-python/blob/master/kafka/protocol/message.py#L123-L158
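A simplified sketch of that boundary handling (wire layout per the protocol guide: 8-byte offset, 4-byte size, then the message bytes):

```python
import struct

MIN_ENTRY_BYTES = 26  # 8 (offset) + 4 (size) + 14 (smallest, 'empty' message)

def iter_messages(buf):
    """Yield (offset, message_bytes), stopping cleanly at a trailing partial message."""
    pos = 0
    while len(buf) - pos >= MIN_ENTRY_BYTES:
        offset, size = struct.unpack_from(">qi", buf, pos)
        if len(buf) - pos - 12 < size:
            break  # decoded size overruns the buffer: partial message, stop
        yield offset, buf[pos + 12 : pos + 12 + size]
        pos += 12 + size

# One complete 14-byte message followed by 10 bytes of partial tail:
buf = struct.pack(">qi", 0, 14) + b"\x00" * 14 + b"\x00" * 10
assert [offset for offset, _ in iter_messages(buf)] == [0]
```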

> MessageSet from broker seems invalid
> 
>
> Key: KAFKA-3200
> URL: https://issues.apache.org/jira/browse/KAFKA-3200
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0
> Environment: Linux,  running Oracle JVM 1.8
>Reporter: Rajiv Kurian
>
> I am writing a java consumer client for Kafka and using the protocol guide at 
> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
>  to parse buffers. I am currently running into a problem parsing certain 
> fetch responses. Many times it works fine but some other times it does not. 
> It might just be a bug with my implementation in which case I apologize.
> My messages are uncompressed and exactly 23 bytes in length and has null 
> keys. So each Message in my MessageSet is exactly size 4 (crc) + 
> 1(magic_bytes) + 1 (attributes) + 4(key_num_bytes) + 0 (key is null) + 
> 4(num_value_bytes) + 23(value_bytes) = 37 bytes.
> So each element of the MessageSet itself is exactly 37 (size of message) + 8 
> (offset) + 4 (message_size) = 49 bytes.
> In comparison an empty message set element should be of size 8 (offset) + 4 
> (message_size) + 4 (crc) + 1(magic_bytes) + 1 (attributes) + 4(key_num_bytes) 
> + 0 (key is null) + 4(num_value_bytes) + 0(value is null)  = 26 bytes
> I occasionally receive a MessageSet which says size is 1000. A size of 1000 
> is not divisible by my MessageSet element size which is 49 bytes. When I 
> parse such a message set I can actually read 20 of message set elements(49 
> bytes) which is 980 bytes. I have 20 extra bytes to parse now which is 
> actually less than even an empty message (26 bytes). At this moment I don't 
> know how to parse the messages any more.
> I will attach a file for a response that can actually cause me to run into 
> this problem as well as the sample ccde.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3177) Kafka consumer can hang when position() is called on a non-existing partition.

2016-02-01 Thread Dana Powers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127701#comment-15127701
 ] 

Dana Powers commented on KAFKA-3177:


A similar infinite loop happens when the partition exists but has no leader b/c 
it is under-replicated. In that case, Fetcher.listOffset infinitely retries on 
the leaderNotAvailableError returned by sendListOffsetRequest.

> Kafka consumer can hang when position() is called on a non-existing partition.
> --
>
> Key: KAFKA-3177
> URL: https://issues.apache.org/jira/browse/KAFKA-3177
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.0
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.9.0.1
>
>
> This can be easily reproduced as following:
> {code}
> {
> ...
> consumer.assign(SomeNonExsitingTopicParition);
> consumer.position();
> ...
> }
> {code}
> It seems when position is called we will try to do the following:
> 1. Fetch committed offsets.
> 2. If there is no committed offsets, try to reset offset using reset 
> strategy. in sendListOffsetRequest(), if the consumer does not know the 
> TopicPartition, it will refresh its metadata and retry. In this case, because 
> the partition does not exist, we fall into the infinite loop of refreshing 
> topic metadata.
> Another orthogonal issue is that if the topic in the above code piece does 
> not exist, position() call will actually create the topic due to the fact 
> that currently topic metadata request could automatically create the topic. 
> This is a known separate issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option

2016-01-27 Thread Dana Powers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119703#comment-15119703
 ] 

Dana Powers commented on KAFKA-1493:


Hi all - it appears that the header checksum (HC) byte is incorrect. The Kafka 
implementation hashes the magic bytes + header, but the spec is to only hash 
header (don't include magic).

We are having some trouble encoding/decoding from non-java clients because the 
framing must be munged before reading / writing to kafka. Is this known? I 
don't see another JIRA for it. Should I file separately or should this be 
reopened?

> Use a well-documented LZ4 compression format and remove redundant LZ4HC option
> --
>
> Key: KAFKA-1493
> URL: https://issues.apache.org/jira/browse/KAFKA-1493
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.2.0
>Reporter: James Oliver
>Assignee: James Oliver
>Priority: Blocker
> Fix For: 0.8.2.0
>
> Attachments: KAFKA-1493.patch, KAFKA-1493.patch, 
> KAFKA-1493_2014-10-16_13:49:34.patch, KAFKA-1493_2014-10-16_21:25:23.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option

2016-01-27 Thread Dana Powers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15120072#comment-15120072
 ] 

Dana Powers commented on KAFKA-1493:


filed KAFKA-3160

> Use a well-documented LZ4 compression format and remove redundant LZ4HC option
> --
>
> Key: KAFKA-1493
> URL: https://issues.apache.org/jira/browse/KAFKA-1493
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.2.0
>Reporter: James Oliver
>Assignee: James Oliver
>Priority: Blocker
> Fix For: 0.8.2.0
>
> Attachments: KAFKA-1493.patch, KAFKA-1493.patch, 
> KAFKA-1493_2014-10-16_13:49:34.patch, KAFKA-1493_2014-10-16_21:25:23.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3160) Kafka LZ4 framing code miscalculates header checksum

2016-01-27 Thread Dana Powers (JIRA)
Dana Powers created KAFKA-3160:
--

 Summary: Kafka LZ4 framing code miscalculates header checksum
 Key: KAFKA-3160
 URL: https://issues.apache.org/jira/browse/KAFKA-3160
 Project: Kafka
  Issue Type: Bug
  Components: compression
Affects Versions: 0.9.0.0, 0.8.2.1, 0.8.2.0, 0.8.2.2
Reporter: Dana Powers


KAFKA-1493 implements the LZ4 framing specification, but it incorrectly 
calculates the header checksum. Specifically, the current implementation 
includes the 4-byte MagicNumber in the checksum, which is incorrect.
http://cyan4973.github.io/lz4/lz4_Frame_format.html

Third-party clients that attempt to use off-the-shelf lz4 framing find that 
brokers reject messages as having a corrupt checksum. So currently non-java 
clients must 'fixup' lz4 packets to deal with the broken checksum.

Magnus first identified this issue in librdkafka; kafka-python has the same 
problem.
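For illustration, the discrepancy can be sketched as follows. Note that crc32 stands in for the spec's real XXH32 (seed 0) hash purely to show *which* bytes are hashed, and the descriptor bytes are placeholders:

```python
import zlib

LZ4_MAGIC = b"\x04\x22\x4d\x18"

def _hash32(data: bytes) -> int:
    # Placeholder: the LZ4 frame spec uses XXH32 with seed 0; crc32 stands in
    # here only to illustrate which bytes are hashed, not the real digest.
    return zlib.crc32(data) & 0xFFFFFFFF

def spec_hc(frame_descriptor: bytes) -> int:
    # Per the spec: the HC byte is the second byte of the hash of the
    # frame descriptor alone -- the 4-byte magic number is NOT included.
    return (_hash32(frame_descriptor) >> 8) & 0xFF

def broken_hc(frame_descriptor: bytes) -> int:
    # What the affected Kafka versions did: include the magic number, so the
    # HC byte differs from what off-the-shelf lz4 framing code produces.
    return (_hash32(LZ4_MAGIC + frame_descriptor) >> 8) & 0xFF
```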



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3088) 0.9.0.0 broker crash on receipt of produce request with empty client ID

2016-01-21 Thread Dana Powers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15111020#comment-15111020
 ] 

Dana Powers commented on KAFKA-3088:


But client ids are not globally unique, even in the java implementation, right? 
Start 2 consumers in different processes / different servers, and you'll get 
two identical client-ids (consumer-1) as I understand that code. Also note that 
a user-supplied client-id does not get any incremented value added, so all 
metrics get blended in that case. And most third-party clients that set a 
client-id dont attempt to add a unique incremented number at all. So I don't 
think option-2 adds much value.

Another alternative to consider is skipping client metrics if there is no 
client-id.

> 0.9.0.0 broker crash on receipt of produce request with empty client ID
> ---
>
> Key: KAFKA-3088
> URL: https://issues.apache.org/jira/browse/KAFKA-3088
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.9.0.0
>Reporter: Dave Peterson
>Assignee: Jun Rao
>
> Sending a produce request with an empty client ID to a 0.9.0.0 broker causes 
> the broker to crash as shown below.  More details can be found in the 
> following email thread:
> http://mail-archives.apache.org/mod_mbox/kafka-users/201601.mbox/%3c5693ecd9.4050...@dspeterson.com%3e
>[2016-01-10 23:03:44,957] ERROR [KafkaApi-3] error when handling request 
> Name: ProducerRequest; Version: 0; CorrelationId: 1; ClientId: null; 
> RequiredAcks: 1; AckTimeoutMs: 1 ms; TopicAndPartition: [topic_1,3] -> 37 
> (kafka.server.KafkaApis)
>java.lang.NullPointerException
>   at 
> org.apache.kafka.common.metrics.JmxReporter.getMBeanName(JmxReporter.java:127)
>   at 
> org.apache.kafka.common.metrics.JmxReporter.addAttribute(JmxReporter.java:106)
>   at 
> org.apache.kafka.common.metrics.JmxReporter.metricChange(JmxReporter.java:76)
>   at 
> org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:288)
>   at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:177)
>   at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:162)
>   at 
> kafka.server.ClientQuotaManager.getOrCreateQuotaSensors(ClientQuotaManager.scala:209)
>   at 
> kafka.server.ClientQuotaManager.recordAndMaybeThrottle(ClientQuotaManager.scala:111)
>   at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$sendResponseCallback$2(KafkaApis.scala:353)
>   at 
> kafka.server.KafkaApis$$anonfun$handleProducerRequest$1.apply(KafkaApis.scala:371)
>   at 
> kafka.server.KafkaApis$$anonfun$handleProducerRequest$1.apply(KafkaApis.scala:371)
>   at 
> kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:348)
>   at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:366)
>   at kafka.server.KafkaApis.handle(KafkaApis.scala:68)
>   at 
> kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


The JIRA Awakens [KAFKA-1841]

2015-12-23 Thread Dana Powers
Hi all,

I've been helping debug an issue filed against kafka-python related to
compatibility w/ Hortonworks 2.3.0.0 kafka release. As I understand it, HDP
is currently based on snapshots of apache/kafka trunk, merged with some
custom patches from HDP itself.

In this case, HDP's 2.3.0.0 kafka release missed a compatibility patch that
I believe is critical for third-party library support. Unfortunately the
patch -- KAFKA-1841 -- was initially only applied to the 0.8.2 branch (it
was merged to trunk several months later in KAFKA-2068). Because it wasn't
on trunk, it didn't get included in the HDP kafka releases.

If you recall, KAFKA-1841 was needed to maintain backwards and forwards
compatibility wrt the change from zookeeper to kafka-backed offset storage.
Not having this patch is fine if you only ever use the clients / libraries
distributed in that release -- and I imagine that is probably most
folks that are using it. But if you remember the thread on this issue back
in the 0.8.2-beta review, the API incompatibility made third-party clients
hard to develop and maintain if the goal is to support multiple broker
versions w/ the same client code [this is the goal of kafka-python].
Anyways, I'm really glad that the fix made it into the apache release, but
now I'm sad that it didn't make it into HDP's release.

Anyways, I think there's a couple takeaways here:

(1) I'd recommend anyone using HDP who intends to use third-party kafka
consumers should upgrade to 2.3.4.0 or later. That version appears to
include the compatibility patch (KAFKA-2068). Of course if anyone is on
list from HDP, they may be able to provide better help on this.

(2) I think more care should probably be taken to help vendors or anyone
tracking changes on trunk wrt released versions. Is there a list of all
KAFKA- patches that are released but not merged into trunk? KAFKA-1841
is obviously near and dear to my heart, but I wonder if there are other
patches like it?

Happy holidays to all, and may the force be with you

-Dana


Re: group protocol/metadata documentation

2015-11-25 Thread Dana Powers
Thanks, Jason. I see the range and roundrobin assignment strategies
documented in the source. I don't see userdata used by either -- is that
correct (I may be misreading)? The notes suggest userdata for something
more detailed in the future, like rack-aware placements?

One other question: in what circumstances would consumer processes in a
single group want to use different topic subscriptions rather than
configure a new group?

Thanks again,

-Dana
On Nov 25, 2015 8:59 AM, "Jason Gustafson" <ja...@confluent.io> wrote:

> Hey Dana,
>
> Have a look at this wiki, which has more detail on the consumer's embedded
> protocol:
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Client-side+Assignment+Proposal
> .
>
> At the moment, the group protocol supports consumer groups and kafka
> connect groups. Kafka tooling currently depends on the structure for these
> protocol types, so reuse of the same names might cause problems. I will
> look into updating the protocol documentation to standardize the protocol
> formats that are in use and to provide guidance for client implementations.
> My own view is that unless there's a good reason not to, all consumer
> implementations should use the same consumer protocol format so that tooling
> will work correctly.
>
> Thanks,
> Jason
>
>
>
> On Tue, Nov 24, 2015 at 4:16 PM, Dana Powers <dana.pow...@gmail.com>
> wrote:
>
> > Hi all - I've been reading through the wiki docs and mailing list threads
> > for the new JoinGroup/SyncGroup/Heartbeat APIs, hoping to add
> functionality
> > to the python driver. It appears that there is a shared notion of group
> > "protocols" (client publishes supported protocols, coordinator picks
> > protocol for group to use), and their associated metadata. Is there any
> > documentation available for existing protocols? Will there be an official
> > place to document that supporting protocol X means foo? I think I can
> > probably construct a simple working protocol, but if I pick a protocol
> name
> > that already exists, will things break?
> >
> > -Dana
> >
>


Re: Writing a client: Connection pooling

2015-05-18 Thread Dana Powers
cc: kafka-clients mailing list
On May 18, 2015 4:24 PM, Warren Falk war...@warrenfalk.com wrote:

 Thanks Guozhang,

 Actually your previous email was clear and I understood it.  Current broker
 design means that parallel requests require parallel connections.

 But I think you misunderstood me.  I am not asking if that is how the
 broker works, now, I am proposing that the broker should not work that way
 as it makes writing optimal client libraries impossible.  And so I am
 asking if a change to that could be considered (if someone submitted a
 patch).

 What I propose is that the request ordering guarantees of the broker be
 limited only to those guarantees which are useful (e.g. the order of
 produce requests on a single topic-partition, and whatever else might be
 useful).

 TCP connections have a non-zero overhead, and in fact, the underlying
 sockets, the multi-roundtrip handshake, (and especially the TLS handshake
 if supported in future versions), and congestion control/re-ordering
 algorithms are all quite expensive.  Using TCP connections as a way to get
 serialization for free is in fact not free at all.  This is the flaw that
 HTTP1.1 suffers from and what made HTTP2.0 necessary.

 The problem my original email mentioned is also known as Head of Line
 Blocking.  (http://en.wikipedia.org/wiki/Head-of-line_blocking).
 HTTP1.1's solution to head-of-line-blocking is likewise use multiple
 connections, but this is not a real solution, which is why HTTP2.0 is
 necessary and why I'm sending this long email, pleading that Kafka not
 repeat the mistake.

 Luckily, Kafka's network protocol appears to have been designed with the
 foresight of a future multiplexing feature in mind.  It already has a
 correlation id and that statement in the wiki that only one connection
 should be necessary suggests that this was a plan at one time (?).  (I.e.
 if every response can be paired with its request by virtue of the order it
 appears, then why waste bandwidth sending a useless correlation id?)

 So I'm not suggesting a change to the protocol, only to the behavior of the
 broker.  What I want to know is whether 'head of line blocking' has become
 a fundamental *requirement* of Kafka, or is it just a behavior that could
 be corrected (i.e. if I submitted a patch to remove it, could it be
 accepted?)

 Thanks for your thoughtful responses.  I really appreciate the time and
 effort you and the others put into Kafka.

 Warren
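 The correlation-id multiplexing described above can be sketched like this (illustrative only; not Kafka's actual client code or wire format):

```python
import itertools

class Multiplexer:
    """Pair pipelined responses with their requests via correlation ids,
    so responses can complete in any order over a single connection."""

    def __init__(self):
        self._ids = itertools.count()
        self._pending = {}  # correlation id -> in-flight request

    def send(self, request):
        corr_id = next(self._ids)  # would be encoded into the request header
        self._pending[corr_id] = request
        return corr_id

    def receive(self, corr_id, response):
        # Match the response to its request regardless of arrival order.
        request = self._pending.pop(corr_id)
        return request, response

mux = Multiplexer()
produce_id = mux.send("produce")
fetch_id = mux.send("fetch")
# The fetch response may arrive first without blocking the produce request:
req, resp = mux.receive(fetch_id, "fetch-response")
assert req == "fetch"
```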





 On Mon, May 18, 2015 at 1:37 PM, Guozhang Wang wangg...@gmail.com wrote:

  Hi Warren,
 
  Maybe I was a little misleading in my previous email:
 
  1. The statement that The server guarantees that on a single TCP
  connection, requests will be processed in the order is still valid. On a
  single connection, all types of requests (produce/fetch/metadata) are
  handled in order.
 
  2. As for the clients, it is just a matter of whether or not you want to
  have order preserving across different requests, which is orthogonal to
 the
  protocol/server-side guarantees. The statement that it should not
  generally be necessary to maintain multiple connections to a single broker
  from a single client instance is a general suggestion, not a restriction
  on client implementation. In fact, if your client acts as both a producer
  and a consumer, and you do not need to make sure the produce and fetch
  requests' ordering needs to be preserved, but rather only the ordering
  within producer requests and the ordering within fetch requests to be
  honored, then you'd better use separate channels for these two types of
  requests, otherwise as you noticed a long pooling request can block
  subsequent produce requests. As an example, the current Java consumer
  client used a single channel for each broker, but people are discussing
  about using another separate channel to the broker acting as the consumer
  coordinator, etc. Again, this is just the client implementation details,
 of
  how you would like to make use of the protocol guarantees while
 considering
  performance, usability, etc.
 
  Guozhang
 
 
  On Thu, May 14, 2015 at 3:34 PM, Warren Falk war...@warrenfalk.com
  wrote:
 
   The C# client (kafka-net) isn't written by me; I'm just working on it.
  It
   has a separate producer and consumer, but the client is designed to
  connect
   to each broker exactly once and then reuse (multiplex over) that one
   connection across all consumers and producers.
  
   It's a bit disappointing to see such a great feature of the Kafka
  protocol
   be abandoned.  It seems such a shame to implement request/response
   correlation and turn around and incur the latency overhead of
 additional
   TCP handshakes anyway.  If requests didn't block (if the server
  guaranteed
   ordering only per partition etc.) then there would seem to be no reason
  to
   use separate channels.  Are we definitely giving up on that feature?
  
   I fear both C# clients will have to be mostly redesigned in light of
  this,
   

Re: 0.8.3 release plan

2015-03-12 Thread Dana Powers
Re last beta, I think it was extremely useful.  We identified big issues
wrt API versioning and third-party client compatibility.  Even if the
server code doesn't get deployed widely, I think the beta period is still a
good signal to third-party client devs to rev up tests and make any updates
necessary to support the new version in advance of a full release.

Dana
On Mar 12, 2015 2:37 PM, Jun Rao j...@confluent.io wrote:

 Since we have decided to only support security on the new clients, the
 security feature will only be available after the new consumer is done. So,
 for the next release, we probably should just target finishing the new
 consumer as the main feature. If security can be done in the same release,
 that's just a bonus.

 Thanks,

 Jun

 On Thu, Mar 12, 2015 at 6:37 AM, Harsha ka...@harsha.io wrote:

  I would like to include ssl/sasl
  1) Kafka-1684 (Patch posted for a review)
  2) Kafka-1686 (Patch depends on kafka-1684)
  3) Kafka-1688 (work is in progress)
  Thanks,
  Harsha
  On Thu, Mar 12, 2015, at 04:35 AM, Guozhang Wang wrote:
   Gwen,
  
   Just for clarification, you were suggesting we should or should not
   include
   MM improvement in 0.8.3? I personally would prefer it (KAFKA-1650 and
   KAFKA-1997) to go into 0.8.3.
  
    I see Joe has made a pass over the tickets and marked them as 0.8.3. We
    can probably do another pass and consider adding:
  
   1) Purgatory improvement (KAFKA-1989).
   2) Compression improvement (KAFKA-527).
   3) Some unit test failures (KAFKA-1501, I think we are pretty close in
   getting it fixed).
   4) any other tickets?
  
   Guozhang
  
   On Wed, Mar 11, 2015 at 9:40 PM, Jay Kreps jay.kr...@gmail.com
 wrote:
  
With regard to mm, I was kind of assuming just based on the amount of
  work
that that would go in for sure, but yeah I agree it is important.
   
-Jay
   
On Wed, Mar 11, 2015 at 9:39 PM, Jay Kreps jay.kr...@gmail.com
  wrote:
   
 What I was trying to say was let's do a real release whenever
 either
 consumer or authn is done whichever happens first (or both if they
  can
 happen close together)--not sure which is more likely to slip.

 WRT the beta thing I think the question for people is whether the
  beta
 period was helpful or not in getting a more stable release? We
 could
either
 do a beta release again or we could just do a normal release and
  call the
 consumer feature experimental or whatever...basically something
 to
  get
it
 in people's hands before it is supposed to work perfectly and never
  change
 again.

 -Jay


 On Wed, Mar 11, 2015 at 9:27 PM, Gwen Shapira 
 gshap...@cloudera.com
  
 wrote:

 So basically you are suggesting - lets do a beta release whenever
 we
 feel the new consumer is done?

 This can definitely work.

  I'd prefer holding for MM improvements too. IMO, it's not just more
  improvements like flush() and compression optimization.
 Current MirrorMaker can lose data, which makes it pretty useless
 for
 its job. We hear lots of requests for robust MM from our
 customers,
  so
  I can imagine it's pretty important to the Kafka community (unless
 I
 have a completely skewed sample).

 Gwen



 On Wed, Mar 11, 2015 at 9:18 PM, Jay Kreps jay.kr...@gmail.com
  wrote:
  Yeah the real question is always what will we block on?
 
  I don't think we should try to hold back smaller changes. In
 this
 bucket I
  would include most things you described: mm improvements,
 replica
  assignment tool improvements, flush, purgatory improvements,
compression
  optimization, etc. Likely these will all get done in time as
 well
  as
 many
  things that kind of pop up from users but probably aren't worth
  doing
a
  release for on their own. If one of them slips that fine. I also
  don't
  think we should try to hold back work that is done if it isn't
 on
  a
 list.
 
  I would consider either SSL+SASL or the consumer worthy of a
  release
on
 its
  own. If they finish close to the same time that is great. We can
  maybe
 just
  assess as these evolve where the other one is at and make a call
 whether it
  will be one or both?
 
  -Jay
 
  On Wed, Mar 11, 2015 at 8:51 PM, Gwen Shapira 
  gshap...@cloudera.com
 wrote:
 
  If we are going in terms of features, I can see the following
features
  getting in in the next month or two:
 
  * New consumer
  * Improved Mirror Maker (I've seen tons of interest)
  * Centralized admin requests (aka KIP-4)
  * Nicer replica-reassignment tool
  * SSL (and perhaps also SASL)?
 
  I think this collection will make a nice release. Perhaps we
 can
  cap
  it there and focus (as a community) on getting these in, we can
  have
a
  release without too much scope creep in the
  

Re: Cannot stop Kafka server if zookeeper is shutdown first

2015-02-04 Thread Dana Powers
While on the subject of zkclient, also consider KAFKA-1793.  A more
abstract interface to the distributed coordination service that could be
configured to use alternatives like consul or etcd would be very useful
imho.

Dana
FWIW - the ZkClient project team have merged the pull request that I had
submitted to allow for timeouts to operations https://github.com/sgroschupf/
zkclient/pull/29. I heard from Johannes (from the ZkClient project team)
that they don't have any specific release date in mind but are willing to
release a new version if/when we need one.

-Jaikiran

On Wednesday 04 February 2015 12:33 AM, Gwen Shapira wrote:

 So I think the current plan is:
 1. Add timeout in zkclient
 2. Ask zkclient to release new version (we need it for few other things
 too)
 3. Rebase on new zkclient
 4. Fix this jira and the few others than were waiting for the new zkclient

 Does that make sense?

 Gwen

 On Mon, Feb 2, 2015 at 8:33 PM, Jaikiran Pai jai.forums2...@gmail.com
 wrote:

 I just heard back from Stefan, who manages the ZkClient repo and he seems
 to
 be open to have these changes be part of ZkClient project. I'll be
 creating
 a pull request for that project to have it reviewed and merged. Although I
 haven't heard of exact release plans, Stefan's reply did indicate that the
 project could be released after this change is merged.

 -Jaikiran

 On Tuesday 03 February 2015 09:03 AM, Jaikiran Pai wrote:

 Thanks for pointing to that repo!

  I just had a look at it and it appears that the project isn't very active.
  The latest contribution is from Gwen and that was around 3 months back. I
  haven't found release plans for that
 project or a place to ask about it (filing an issue doesn't seem right to
 ask this question). So I'll get in touch with the repo owner and see what
 his plans for the project are.

 -Jaikiran

 On Monday 02 February 2015 11:33 PM, Gwen Shapira wrote:

 I did!

 Thanks for clarifying :)

 The client that is part of Zookeeper itself actually does support
 timeouts.

 On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang wangg...@gmail.com
 wrote:

 Hi Jaikiran,

 I think Gwen was talking about contributing to ZkClient project:

 https://github.com/sgroschupf/zkclient

 Guozhang


 On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai jai.forums2...@gmail.com
 
 wrote:

  Hi Gwen,

 Yes, the KafkaZkClient is a wrapper around ZkClient and not a complete
 replacement.

  As for contributing to Zookeeper, yes that is indeed on my mind, but I
 haven't yet had a chance to really look deeper into Zookeeper or get
 in
 touch with their dev team to try and explain this potential
 improvement
 to
 them. I have no objection to contributing this or something similar to
 Zookeeper directly. I think I should be able to bring this up in the
 Zookeeper dev forum, sometime soon in the next few weekends.

 -Jaikiran


 On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:

  It looks like the new KafkaZkClient is a wrapper around ZkClient, but
 not a replacement. Did I get it right?

 I think a wrapper for ZkClient can be useful - for example KAFKA-1664
 can also use one.

 However, I'm wondering why not contribute the fix directly to
 ZKClient
 project and ask for a release that contains the fix?
 This will benefit other users of the project who may also need a
  timeout (that's pretty basic...)

 As an alternative, if we don't want to collaborate with ZKClient for
 some reason, forking the project into Kafka will probably give us
 more
 control than wrappers and without much downside.

 Just a thought.

 Gwen





 On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai
 jai.forums2...@gmail.com
 wrote:

  Neha, Ewen (and others), my initial attempt to solve this is
 uploaded
 here
 https://reviews.apache.org/r/30477/. It solves the shutdown problem
 and
 now
 the server shuts down even when Zookeeper has gone down before the
 Kafka
 server.

 I went with the approach of introducing a custom (enhanced) ZkClient
 which
 for now allows time outs to be optionally specified for certain
 operations.
 I intentionally haven't forced the use of this new KafkaZkClient all
 over
 the code and instead for now have just used it in the KafkaServer.

 Does this patch look like something worth using?

 -Jaikiran


 On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:

  Ewen is right. ZkClient APIs are blocking and the right fix for
 this
 seems
 to be patching ZkClient. At some point, if we find ourselves
 fiddling
 too
 much with ZkClient, it wouldn't hurt to write our own little
 zookeeper
 client wrapper.

 On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava
 e...@confluent.io
 wrote:

Looks like a bug to me -- the underlying ZK library wraps a lot
 of

 blocking
 method implementations with waitUntilConnected() calls without any
 timeouts. Ideally we could just add a version of
 ZkUtils.getController()
 with a timeout, but I don't see an easy way to accomplish that
 with
 ZkClient.
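
A language-neutral sketch of the timeout idea discussed in this thread: run the blocking call on a worker and give up after a deadline. The real fix belongs in ZkClient (Java); the function name here is illustrative, and note the honest caveat in the comments.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def call_with_timeout(blocking_fn, timeout_secs):
    """Invoke blocking_fn, raising TimeoutError if it exceeds the deadline."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Note: the pool still waits for the worker thread on shutdown; a
        # real fix inside ZkClient must interrupt the blocked call itself.
        return pool.submit(blocking_fn).result(timeout=timeout_secs)

# A call that returns in time succeeds as usual.
assert call_with_timeout(lambda: "controller-0", 1.0) == "controller-0"

# A call that blocks too long raises instead of hanging forever,
# which is what the shutdown path needs when Zookeeper is gone.
try:
    call_with_timeout(lambda: time.sleep(0.5), 0.05)
    timed_out = False
except TimeoutError:
    timed_out = True
assert timed_out
```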

 

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack1 on the broker

2015-01-15 Thread Dana Powers
 clients don't break on unknown errors

maybe true for the official java clients, but I don't think the assumption
holds true for community-maintained clients and users of those clients.
 kafka-python generally follows the fail-fast philosophy and raises an
exception on any unrecognized error code in any server response.  in this
case, kafka-python allows users to set their own required-acks policy when
creating a producer instance.  It is possible that users of kafka-python
have deployed producer code that uses acks > 1 -- perhaps in production
environments -- and for those users the new error code will crash their
producer code.  I would not be surprised if the same were true of other
community clients.

*one reason for the fail-fast approach is that there isn't great
documentation on what errors to expect for each request / response -- so we
use failures to alert that some error case is not handled properly.  and
because of that, introducing new error cases without bumping the api
version is likely to cause those errors to get raised/thrown all the way
back up to the user.  of course we (client maintainers) can fix the issues
in the client libraries and suggest users upgrade, but it's not the ideal
situation.


long-winded way of saying: I agree w/ Joe.

-Dana


On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira gshap...@cloudera.com
wrote:

 Is the protocol bump caused by the behavior change or the new error code?

 1) IMO, error_codes are data, and clients can expect to receive errors
 that they don't understand (i.e. unknown errors). AFAIK, clients don't
 break on unknown errors, they are simply more challenging to debug. If
 we document the new behavior, then its definitely debuggable and
 fixable.

 2) The behavior change is basically a deprecation - i.e. acks > 1 were
 never documented, and are not supported by Kafka clients starting with
 version 0.8.2. I'm not sure this requires a protocol bump either,
 although its a better case than new error codes.

 Thanks,
 Gwen

 On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein joe.st...@stealth.ly wrote:
  Looping in the mailing list that the client developers live on because
 they
  are all not on dev (though they should be if they want to be helping to
  build the best client libraries they can).
 
  I wholeheartedly believe that we need to not break existing functionality
  of the client protocol, ever.
 
  There are many reasons for this and we have other threads on the mailing
  list where we are discussing that topic (no pun intended) that I don't
 want
  to re-hash here.
 
  If we change wire protocol functionality OR the binary format (either) we
  must bump version AND treat version as a feature flag with backward
  compatibility support until it is deprecated for some time for folks to
 deal
  with it.
 
  version match {
    case 0 => keepDoingWhatWeWereDoing()
    case 1 => doNewStuff()
    case 2 => doEvenMoreNewStuff()
  }
 
  has to be a practice we adopt imho ... I know feature flags can be
 construed
  as messy code but I am eager to hear another (better? different?)
 solution
  to this.
 
  If we don't do a feature flag like this specifically with this change
 then
  what happens is that someone upgrades their brokers with a rolling
 restart
  in 0.8.3 and every single one of their producer requests start to fail
 and
  they have a major production outage.
 
  I do 100% agree that acks > 1 makes no sense and we *REALLY* need people to
  start using 0, 1, -1 but we need to do that in a way that is going to work
  for
  everyone.
 
  Old producers and consumers must keep working with new brokers and if we
 are
  not going to support that then I am unclear what the use of version is,
  based on our original intentions of having it because of the 0.7 => 0.8
  break. We said no more breaking changes when we did that.
 
  - Joe Stein
 
  On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava 
 e...@confluent.io
  wrote:
 
  Right, so this looks like it could create an issue similar to what's
  currently being discussed in
  https://issues.apache.org/jira/browse/KAFKA-1649 where users now get
  errors
  under conditions when they previously wouldn't. Old clients won't even
  know
  about the error code, so besides failing they won't even be able to log
  any
  meaningful error messages.
 
  I think there are two options for compatibility:
 
  1. An alternative change is to remove the acks > 1 code, but silently
  upgrade requests with acks > 1 to acks = -1. This isn't the same as
  other
  changes to behavior since the interaction between the client and server
  remains the same, no error codes change, etc. The client might just see
  some increased latency since the message might need to be replicated to
  more brokers than they requested.
  2. Split this into two patches, one that bumps the protocol version on
  that
  message to include the new error code but maintains both old (now
  deprecated) and new behavior, then a second that would be applied in a
  later release 
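
Option 1 above can be sketched as a one-line server-side normalization (the function name is hypothetical; the real change would live in the broker's produce-request handling):

```python
# Sketch of Ewen's option 1: requests with acks > 1 are silently normalized
# to acks = -1, so old clients see no new error code, only the possibility
# of slightly higher latency while more replicas acknowledge.

def normalize_acks(required_acks: int) -> int:
    # 0 = no ack, 1 = leader only, -1 = full ISR; anything above 1 is
    # treated as a request for full ISR acknowledgement.
    if required_acks > 1:
        return -1
    return required_acks

assert normalize_acks(0) == 0
assert normalize_acks(1) == 1
assert normalize_acks(-1) == -1
assert normalize_acks(3) == -1   # legacy acks > 1 quietly upgraded
```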

0.8.2.0 behavior change: ReplicaNotAvailableError

2015-01-14 Thread Dana Powers
Overall the 0.8.2.0 release candidate looks really good.

All of the kafka-python integration tests pass as they do w/ prior servers,
except one... When testing recovery from a broker failure / leader switch,
we now see a ReplicaNotAvailableError in broker metadata /
PartitionMetadata, which we do not see in the same test against previous
servers.  I understand from discussion around KAFKA-1609 and KAFKA-1649
that this behavior is expected and that clients should ignore the error (or
at least treat it as non-critical).  But strictly speaking this is a
behavior change and could cause client issues.  Indeed, anyone using older
versions of kafka-python against this release candidate will get bad
failures on leader switch (exactly when you don't want bad client
failures!).  It may be that it is our fault for not handling this in
kafka-python, but at the least I think this needs to be flagged as a
possible issue for 3rd party clients.  Also KAFKA-1649 doesn't look like it
was ever actually resolved... The protocol document does not mention
anything about clients ignoring this error code.
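
The client-side handling being asked for can be sketched as follows; the dict layout stands in for kafka-python's PartitionMetadata and is illustrative only:

```python
# Hedged sketch: when walking TopicMetadataResponse partitions, error code 9
# (ReplicaNotAvailable) is informational, per KAFKA-1609/KAFKA-1649, and
# should not fail the request as long as a leader is present.

REPLICA_NOT_AVAILABLE = 9

def usable_leaders(partitions):
    """Map partition id -> leader, treating ReplicaNotAvailable as benign."""
    leaders = {}
    for p in partitions:
        if p["error"] not in (0, REPLICA_NOT_AVAILABLE):
            raise RuntimeError(
                "partition %d error %d" % (p["partition"], p["error"]))
        leaders[p["partition"]] = p["leader"]
    return leaders

meta = [
    {"partition": 0, "leader": 1, "error": 9},   # a replica is offline, leader is fine
    {"partition": 1, "leader": 1, "error": 0},
]
assert usable_leaders(meta) == {0: 1, 1: 1}
```

A fail-fast client without the error-9 carve-out would raise on exactly the broker-failure scenario described above.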

Dana Powers
Rdio, Inc.
dana.pow...@rd.io
rdio.com/people/dpkp/


[jira] [Commented] (KAFKA-1649) Protocol documentation does not indicate that ReplicaNotAvailable can be ignored

2015-01-14 Thread Dana Powers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277690#comment-14277690
 ] 

Dana Powers commented on KAFKA-1649:


I think this could be a problem -- it is not updated in the wiki and there 
appears to be a server-side change between 0.8.1.1 and 0.8.2.0 that could 
expose third-party clients to uncaught errors, assuming that the client 
authors/maintainers were not aware that the ReplicaNotAvailable Error should be 
ignored.

 Protocol documentation does not indicate that ReplicaNotAvailable can be 
 ignored
 

 Key: KAFKA-1649
 URL: https://issues.apache.org/jira/browse/KAFKA-1649
 Project: Kafka
  Issue Type: Improvement
  Components: website
Affects Versions: 0.8.1.1
Reporter: Hernan Rivas Inaka
Priority: Minor
  Labels: protocol-documentation
   Original Estimate: 10m
  Remaining Estimate: 10m

 The protocol documentation here 
 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-ErrorCodes
  should indicate that error 9 (ReplicaNotAvailable) can be safely ignored on 
 producers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1649) Protocol documentation does not indicate that ReplicaNotAvailable can be ignored

2015-01-14 Thread Dana Powers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277807#comment-14277807
 ] 

Dana Powers commented on KAFKA-1649:


I am only testing from the wire-protocol level.  Running a broker failure test 
with 2 brokers, 1 topic w/ num.partitions=2 and default.replication.factor=2 .  
Send 100 random messages directly to partition 0, kill the leader for partition 
0, attempt to write messages to partition 0, with retries and metadata reloads.

Running the test against 0.8.2.0 returns ReplicaNotAvailable error code in the 
PartitionMetadata, whereas 0.8.1.1 does not.

This is the metadata w/ both brokers up:
Topic metadata: [TopicMetadata(topic='test_switch_leader-qkUBJTZGLA', error=0, 
partitions=[PartitionMetadata(topic='test_switch_leader-qkUBJTZGLA', 
partition=1, leader=1, replicas=(1, 0), isr=(1, 0), error=0), 
PartitionMetadata(topic='test_switch_leader-qkUBJTZGLA', partition=0, leader=0, 
replicas=(0, 1), isr=(0, 1), error=0)])]

And this is the metadata after killing one broker (0.8.2.0):
Topic metadata: [TopicMetadata(topic='test_switch_leader-qkUBJTZGLA', error=0, 
partitions=[PartitionMetadata(topic='test_switch_leader-qkUBJTZGLA', 
partition=1, leader=1, replicas=(1,), isr=(1,), error=9), 
PartitionMetadata(topic='test_switch_leader-qkUBJTZGLA', partition=0, leader=1, 
replicas=(1,), isr=(1,), error=9)])]

The 0.8.1.1 output is slightly different -- and significantly no error in 
PartitionMetadata
Before killing partition 0 leader (broker 1):
Topic metadata: [TopicMetadata(topic='test_switch_leader-eMbCMlVrOC', error=0, 
partitions=[PartitionMetadata(topic='test_switch_leader-eMbCMlVrOC', 
partition=0, leader=1, replicas=(1, 0), isr=(1, 0), error=0), 
PartitionMetadata(topic='test_switch_leader-eMbCMlVrOC', partition=1, leader=0, 
replicas=(0, 1), isr=(0, 1), error=0)])]

After killing partition 0 leader:
Topic metadata: [TopicMetadata(topic='test_switch_leader-eMbCMlVrOC', error=0, 
partitions=[PartitionMetadata(topic='test_switch_leader-eMbCMlVrOC', 
partition=0, leader=0, replicas=(1, 0), isr=(0,), error=0), 
PartitionMetadata(topic='test_switch_leader-eMbCMlVrOC', partition=1, leader=0, 
replicas=(0, 1), isr=(0, 1), error=0)])]

 Protocol documentation does not indicate that ReplicaNotAvailable can be 
 ignored
 

 Key: KAFKA-1649
 URL: https://issues.apache.org/jira/browse/KAFKA-1649
 Project: Kafka
  Issue Type: Improvement
  Components: website
Affects Versions: 0.8.1.1
Reporter: Hernan Rivas Inaka
Priority: Minor
  Labels: protocol-documentation
   Original Estimate: 10m
  Remaining Estimate: 10m

 The protocol documentation here 
 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-ErrorCodes
  should indicate that error 9 (ReplicaNotAvailable) can be safely ignored on 
 producers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: 0.8.2.0 behavior change: ReplicaNotAvailableError

2015-01-14 Thread Dana Powers
Thanks -- I see that this was more of a bug in 0.8.1 than a regression in
0.8.2.  But I do think the 0.8.2 bug fix to the metadata cache means that
the very common scenario of a single broker failure (and subsequent
partition leadership change) will now return error codes in the
MetadataResponse -- different from 0.8.1 -- and those errors may cause pain
to some users if the client doesn't know how to handle them.  The fix for
users is to upgrade client code (or verify that existing client code
handles this well) before upgrading to 0.8.2 in a production environment.

What would be really useful for the non-java community is a list or
specification of what error codes should be expected for each API response
(here the MetadataResponse) along with perhaps even some context-related
notes on what they mean.  As it stands, the protocol document leaves all of
the ErrorCode documentation until the end and doesn't give any context
about *when* to handle each error:
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-ErrorCodes

I would volunteer to go in to the wiki and help with that effort, but I
also feel like perhaps protocol document changes deserve a more strict
review process.  Maybe the KIP process mentioned separately on the
dev-list.  Maybe the protocol document itself should be versioned and
released with core

Nonetheless, right now the error-handling part of managing clients is
fairly ad-hoc and I think we should work to tighten that process up.

Dana Powers
Rdio, Inc.
dana.pow...@rd.io
rdio.com/people/dpkp/


On Wed, Jan 14, 2015 at 5:43 PM, Jun Rao j...@confluent.io wrote:

 Hi, Dana,

 Thanks for reporting this. I investigated this a bit more. What you
 observed is the following: a client getting a partition level error
 code of ReplicaNotAvailableError
 in a TopicMetadataResponse when one of replicas is offline. The short story
 is that that behavior can already happen in 0.8.1, although the probability
 of it showing up in 0.8.1 is less than that in 0.8.2.

 Currently, when sending a topic metadata response, the broker only includes
 replicas (in either isr or assigned replica set) that are alive. To
 indicate that a replica is missing, we set the partition level error code
 to ReplicaNotAvailableError. In most cases, the client probably just cares
 about the leader in the response. However, this error code could be useful
 for some other clients (e.g., building admin tools). Since our java/scala
 producer/consumer clients (both 0.8.1 and 0.8.2) only care about the
 leader, they are ignoring the error code. That's why they are not affected
 by this behavior. The reason why this behavior doesn't show up as often in
 0.8.1 as in 0.8.2 is that in 0.8.1, we had a bug such that dead brokers are
 never removed from the metadata cache on the broker. That bug has since
 been fixed in 0.8.2. To reproduce that behavior in 0.8.1, you can do the
 following: (1) start 2 brokers, (2) create a topic with 1 partition and 2
 replicas, (3) bring down both brokers, (4) restart only 1 broker, (5) issue
 a TopicMetadataRequest on that topic, (6) you should see the
 ReplicaNotAvailableError
 code.

 So, technically, this is not a regression from 0.8.1. I agree that we
 should have documented this behavior more clearly. Really sorry about that.

 Thanks,

 Jun

 On Wed, Jan 14, 2015 at 1:14 PM, Dana Powers dana.pow...@rd.io wrote:

  Overall the 0.8.2.0 release candidate looks really good.
 
  All of the kafka-python integration tests pass as they do w/ prior
 servers,
  except one... When testing recovery from a broker failure / leader
 switch,
  we now see a ReplicaNotAvailableError in broker metadata /
  PartitionMetadata, which we do not see in the same test against previous
  servers.  I understand from discussion around KAFKA-1609 and KAFKA-1649
  that this behavior is expected and that clients should ignore the error
 (or
  at least treat it as non-critical).  But strictly speaking this is a
  behavior change and could cause client issues.  Indeed, anyone using
 older
  versions of kafka-python against this release candidate will get bad
  failures on leader switch (exactly when you don't want bad client
  failures!).  It may be that it is our fault for not handling this in
  kafka-python, but at the least I think this needs to be flagged as a
  possible issue for 3rd party clients.  Also KAFKA-1649 doesn't look like
 it
  was ever actually resolved... The protocol document does not mention
  anything about clients ignoring this error code.
 
  Dana Powers
  Rdio, Inc.
  dana.pow...@rd.io
  rdio.com/people/dpkp/
 



Compatibility + Unknown APIs

2015-01-12 Thread Dana Powers
Hi all -- continuing on the compatibility discussion:

I've found that it is very difficult to identify when a server does not
recognize an api
(I'm using kafka-python to submit wire-protocol requests).  For example,
when I
send a ConsumerMetadataRequest to an 0.8.1.1 server, I get a closed socket
*[stacktrace below].  The server raises an error internally, but does not
send any
meaningful response.  I'm not sure whether this is the intended behavior,
but when maintaining clients in an ecosystem of multiple server versions
with different API support, it would be great to have a way to determine
what the server supports and what it does not.

Some suggestions:

(1) An UnknownAPIResponse that is returned for any API or API Version
request
that is unsupported.

(2) A server metadata API to get the list of supported APIs and/or API
versions supported.

(3) A server metadata API to get the published version of the server (0.8.2
v. 0.8.1.1, etc).


Thoughts?


Dana Powers
Rdio, Inc.
dana.pow...@rd.io
rdio.com/people/dpkp/

*stacktrace:
```
[2015-01-12 13:03:55,719] ERROR Closing socket for /127.0.0.1 because of
error (kafka.network.Processor)
kafka.common.KafkaException: Wrong request type 10
   at kafka.api.RequestKeys$.deserializerForKey(RequestKeys.scala:57)
   at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:53)
   at kafka.network.Processor.read(SocketServer.scala:353)
   at kafka.network.Processor.run(SocketServer.scala:245)
   at java.lang.Thread.run(Thread.java:722)
```
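
The ambiguity described above can be reduced to a tiny sketch: from the client's side, an unsupported API on 0.8.1.1 is indistinguishable from a network failure (function names are hypothetical; no real kafka-python API is implied):

```python
# Sketch of the probing problem: an unsupported API just closes the socket,
# so the only "signal" the client receives is a connection error.

def probe_api(send_request):
    """Return 'supported', or a merged bucket when the socket dies."""
    try:
        send_request()
        return "supported"
    except ConnectionError:
        # Could be an unsupported API *or* a flaky network: this ambiguity
        # is exactly why an explicit unknown-API response is being proposed.
        return "unknown-api-or-network-error"

def broker_closes_socket():
    raise ConnectionError("Connection reset by peer")

assert probe_api(lambda: None) == "supported"
assert probe_api(broker_closes_socket) == "unknown-api-or-network-error"
```

Because the two failure causes share one bucket, a client cannot tell whether to stop using the API or to retry, which motivates the suggestions below.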


Re: Compatibility + Unknown APIs

2015-01-12 Thread Dana Powers
Perhaps a bit hacky, but you could also reserve a specific correlationId
(maybe -1)
to represent errors and send back to the client an UnknownAPIResponse like:

Response = -1 UnknownAPIResponse

UnknownAPIResponse = originalCorrelationId errorCode

The benefit here would be that it does not break the current API and current
clients should be able to continue operating as usual as long as they ignore
unknown correlationIds and don't use the reserved Id.  For clients that
want to
catch unknownAPI errors, they can handle -1 correlationIds and dispatch as
needed.
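
The reserved-correlation-id proposal can be sketched as a response dispatcher (assuming -1 as the reserved id, per the suggestion above; the payload layout is illustrative, not a real protocol message):

```python
# Sketch: responses with the reserved correlation id carry an
# UnknownAPIResponse (originalCorrelationId, errorCode); everything else is
# routed to its pending request, and unknown ids are dropped as today.

UNKNOWN_API_CORRELATION_ID = -1

def dispatch(correlation_id, payload, pending):
    """Route a response to its pending request or to unknown-API handling."""
    if correlation_id == UNKNOWN_API_CORRELATION_ID:
        original_id, error_code = payload
        return ("unknown_api", original_id, error_code)
    if correlation_id in pending:
        return ("response", correlation_id, payload)
    return ("ignored", correlation_id, None)  # old clients drop unknown ids

assert dispatch(-1, (42, 35), {42: "req"}) == ("unknown_api", 42, 35)
assert dispatch(42, b"ok", {42: "req"}) == ("response", 42, b"ok")
assert dispatch(7, b"x", {42: "req"}) == ("ignored", 7, None)
```

Existing clients keep working unchanged because they never see the reserved id unless they opt in to handling it.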

Otherwise perhaps bump the Metadata Request / Response API and include
the supported api / versions in the Broker metadata:

Broker = NodeId Host Port [ApiKey ApiVersion] (any number of brokers may
be returned)
NodeId = int32
Host = string
Port = int32
ApiKey = int16
ApiVersion = int16

So that Metadata includes the list of all supported API/Versions for each
broker
in the cluster.
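
A client consuming the proposed extended Broker metadata might probe support like this (the [ApiKey ApiVersion] list is a proposal in this thread, not the released protocol; the dict layout is illustrative):

```python
# Sketch: a broker advertises the (api_key, api_version) pairs it supports,
# and the client checks before issuing a request instead of risking a
# closed socket.

def supports(broker_api_versions, api_key, api_version):
    """True if the broker advertises this exact (api_key, api_version)."""
    return (api_key, api_version) in broker_api_versions

broker = {"node_id": 0, "host": "localhost", "port": 9092,
          # e.g. produce v0/v1 and metadata v0; purely illustrative keys
          "api_versions": {(0, 0), (0, 1), (3, 0)}}

assert supports(broker["api_versions"], 0, 1)
assert not supports(broker["api_versions"], 10, 0)  # e.g. ConsumerMetadata absent
```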

And echo the problem with continuing with the current behavior pointed out
by Jay:
clients cannot know the difference between a network error and an unknown
API
error.  And because network errors require a lot of state resets, that can
be a
big performance hit.  Generally on a network error a client needs to assume
the
worst and reload cluster metadata at least.  And it is difficult to prevent
this happening
every time because the client doesn't know whether to avoid the API in the
future because it is not supported, or keep retrying because the network is
flaking.


Dana Powers
Rdio, Inc.
dana.pow...@rd.io
rdio.com/people/dpkp/

On Mon, Jan 12, 2015 at 3:51 PM, Jay Kreps jay.kr...@gmail.com wrote:

  Yeah I totally agree--using the closed socket to indicate not supported
  doesn't really work since any network error could lead to that.

 Arguably we should have a request level error. We discussed this at the
 time we were defining the protocols for 0.8 and the conclusion was not to
 do that. The reasoning was that since almost all the requests end up having
 errors at either a per-topic or per-partition level this makes correctly
 setting/interpreting the global error a bit confusing. I.e. if you are
 implementing a client and a given partition gets an error but there is no
 global error, what do you do? Likewise in most cases it is a bit ambiguous
 how to set the global error on the server side (i.e. if some partitions are
 unavailable but some are available). The result was that error reporting is
 defined per-request.

 We could change this now, but it would mean bumping compatibility on all
 the apis to add the new field which would be annoying to people, right? I
 actually agree it might have been better to do it this way both for this
 and also to make generic error handling easier but I'm not sure if it is
 worth such a big break now. The other proposal, introducing a
 get_protocol_versions() method seems almost as good for probing for support
 and is much less invasive. That seems better to me because I think
 generally clients shouldn't need to do this, they should just build against
 a minimum Kafka version and trust it will keep working into the future.

 -Jay

 On Mon, Jan 12, 2015 at 2:24 PM, Gwen Shapira gshap...@cloudera.com
 wrote:

    I think #1 may be tricky in practice. The response format is always
    dictated by the request format so how do I know if the bytes I got back
    are a valid response to the given request or are the
    UnknownRequestResponse?
 
  On the other hand, from the client developer perspective, having to
  figure out that you are looking at a closed socket because you tried
  to use an API that wasn't implemented in a specific version can be
  pretty annoying.
 
  Another way to do it is to move error_code field (currently
  implemented in pretty much every single Response schema) to the
  Response Header, and then we could use it for meta errors such as
  UnknownAPI.
 
  Its a much bigger change than adding a new Request type, but possibly
  worth it?
 
  
   #2 would be a good fix for the problem I think. This might be a good
   replacement for the echo api and would probably serve the same purpose
   (checking if the server is alive).
  
   #3 is a little dangerous because we actually want clients to only pay
   attention to the protocol versions which are per-api, not the server
   version. I.e. we actually don't want the client to do something like
  check
    serverVersion.equals("0.8.2") because we want to be able to release the
   server at will and have it keep answering protocols in a backwards
   compatible way. I.e. a client that uses just metadata request and
 produce
   request should only care about the version of these protocols it
  implements
   being supported not about the version of the server or the version of
 any
   other protocol it doesn't use. This is the rationale behind versioning
  the
   apis independently rather than having a single protocol version that we
   would have to bump every time

[jira] [Updated] (KAFKA-1841) OffsetCommitRequest API - timestamp field is not versioned

2015-01-05 Thread Dana Powers (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dana Powers updated KAFKA-1841:
---
Description: 
Timestamp field was added to the OffsetCommitRequest wire protocol api for 
0.8.2 by KAFKA-1012.  The 0.8.1.1 server does not support the timestamp field, 
so I think the api version of OffsetCommitRequest should be incremented and 
checked by the 0.8.2 kafka server before attempting to read a timestamp from 
the network buffer in OffsetCommitRequest.readFrom 
(core/src/main/scala/kafka/api/OffsetCommitRequest.scala)

It looks like a subsequent patch (KAFKA-1462) added another api change to 
support a new constructor w/ params generationId and consumerId, calling that 
version 1, and a pending patch (KAFKA-1634) adds retentionMs as another field, 
while possibly removing timestamp altogether, calling this version 2.  So the 
fix here is not straightforward enough for me to submit a patch.

This could possibly be merged into KAFKA-1634, but opening as a separate Issue 
because I believe the lack of versioning in the current trunk should block 
0.8.2 release.
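
The version-gated read the description argues for could be sketched like this (the field layout is simplified and hypothetical, not the full OffsetCommitRequest schema, and which version gates the timestamp is illustrative only):

```python
import struct

def read_offset_commit_fields(version, buf):
    """Read (offset, timestamp) from a simplified partition entry.

    The timestamp is only read for api versions that define it, so a v0
    request body is never over-read past its end.
    """
    offset, = struct.unpack_from('>q', buf, 0)
    pos = 8
    timestamp = None
    if version >= 1:  # field exists only from the version that added it
        timestamp, = struct.unpack_from('>q', buf, pos)
        pos += 8
    return offset, timestamp

# A v0 body carries no timestamp; a v1 body does.
v0_body = struct.pack('>q', 100)
v1_body = struct.pack('>qq', 100, 1420000000000)
assert read_offset_commit_fields(0, v0_body) == (100, None)
assert read_offset_commit_fields(1, v1_body) == (100, 1420000000000)
```

Without the version check, a 0.8.2 server parsing a v0 request would read past the end of the buffer looking for a timestamp that was never sent, which is exactly the failure mode the JIRA describes.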

  was: (previous revision of the description; identical except for the 
JIRA key capitalization)


 OffsetCommitRequest API - timestamp field is not versioned
 --

 Key: KAFKA-1841
 URL: https://issues.apache.org/jira/browse/KAFKA-1841
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.2
 Environment: wire-protocol
Reporter: Dana Powers
Priority: Blocker

 (Issue description quoted above.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1634) Improve semantics of timestamp in OffsetCommitRequests and update documentation

2015-01-05 Thread Dana Powers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265042#comment-14265042
 ] 

Dana Powers commented on KAFKA-1634:


possibly related to this JIRA: KAFKA-1841.  The timestamp field itself was not 
in the released api version 0, and if it is to be included in 0.8.2 (this JIRA 
suggests it is, but to be removed in 0.8.3?) then I think it will need to be 
versioned.

 Improve semantics of timestamp in OffsetCommitRequests and update 
 documentation
 ---

 Key: KAFKA-1634
 URL: https://issues.apache.org/jira/browse/KAFKA-1634
 Project: Kafka
  Issue Type: Bug
Reporter: Neha Narkhede
Assignee: Guozhang Wang
Priority: Blocker
 Fix For: 0.8.3

 Attachments: KAFKA-1634.patch, KAFKA-1634_2014-11-06_15:35:46.patch, 
 KAFKA-1634_2014-11-07_16:54:33.patch, KAFKA-1634_2014-11-17_17:42:44.patch, 
 KAFKA-1634_2014-11-21_14:00:34.patch, KAFKA-1634_2014-12-01_11:44:35.patch, 
 KAFKA-1634_2014-12-01_18:03:12.patch


 From the mailing list -
 following up on this -- I think the online API docs for OffsetCommitRequest
 still incorrectly refer to client-side timestamps:
 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommitRequest
 Wasn't that removed, and isn't it always handled server-side now?  Would one of
 the devs mind updating the API spec wiki?


