Re: [ANNOUNCE] Apache Kafka 0.11.0.1 Released

2017-09-13 Thread Sriram Subramanian
Thanks for driving the release Damian.

> On Sep 13, 2017, at 1:18 PM, Guozhang Wang  wrote:
> 
> Thanks for driving this Damian!
> 
> 
> Guozhang
> 
>> On Wed, Sep 13, 2017 at 4:36 AM, Damian Guy  wrote:
>> 
>> The Apache Kafka community is pleased to announce the release for Apache
>> Kafka 0.11.0.1. This is a bug fix release that fixes 51 issues in 0.11.0.0.
>> 
>> All of the changes in this release can be found in the release notes:
>> https://archive.apache.org/dist/kafka/0.11.0.1/RELEASE_NOTES.html
>> 
>> Apache Kafka is a distributed streaming platform with four core APIs:
>> 
>> ** The Producer API allows an application to publish a stream of records to
>> one or more Kafka topics.
>> 
>> ** The Consumer API allows an application to subscribe to one or more
>> topics and process the stream of records produced to them.
>> 
>> ** The Streams API allows an application to act as a stream processor,
>> consuming an input stream from one or more topics and producing an output
>> stream to one or more output topics, effectively transforming the input
>> streams to output streams.
>> 
>> ** The Connector API allows building and running reusable producers or
>> consumers that connect Kafka topics to existing applications or data
>> systems. For example, a connector to a relational database might capture
>> every change to a table.
>> 
>> 
>> With these APIs, Kafka can be used for two broad classes of application:
>> 
>> ** Building real-time streaming data pipelines that reliably get data
>> between systems or applications.
>> 
>> ** Building real-time streaming applications that transform or react to the
>> streams of data.
>> 
>> 
>> You can download the source release from
>> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.1/kafka-0.11.0.1-src.tgz
>> 
>> and binary releases from
>> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.1/kafka_2.11-0.11.0.1.tgz
>> 
>> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.1/kafka_2.12-0.11.0.1.tgz
>> 
>> 
>> A big thank you to the following 33 contributors to this release!
>> 
>> Apurva Mehta, Bill Bejeck, Colin P. Mccabe, Damian Guy, Derrick Or, Dong
>> Lin, dongeforever, Eno Thereska, Ewen Cheslack-Postava, Gregor Uhlenheuer,
>> Guozhang Wang, Hooman Broujerdi, huxihx, Ismael Juma, Jan Burkhardt, Jason
>> Gustafson, Jeff Klukas, Jiangjie Qin, Joel Dice, Konstantine Karantasis,
>> Manikumar Reddy, Matthias J. Sax, Max Zheng, Paolo Patierno, ppatierno,
>> radzish, Rajini Sivaram, Randall Hauch, Robin Moffatt, Stephane Roset,
>> umesh chaudhary, Vahid Hashemian, Xavier Léauté
>> 
>> We welcome your help and feedback. For more information on how to
>> report problems, and to get involved, visit the project website at
>> http://kafka.apache.org/
>> 
>> 
>> Thanks,
>> Damian
> 
> 
> 
> -- 
> -- Guozhang


Re: [ANNOUNCE] New committer: Damian Guy

2017-06-09 Thread Sriram Subramanian
Congrats Damian! Thanks for all your contributions.

On Fri, Jun 9, 2017 at 2:52 PM, Martin Gainty  wrote:

> congratulations damian!
>
>
> Martin
>
>
> 
> From: Gwen Shapira 
> Sent: Friday, June 9, 2017 4:55 PM
> To: users@kafka.apache.org
> Cc: d...@kafka.apache.org; priv...@kafka.apache.org
> Subject: Re: [ANNOUNCE] New committer: Damian Guy
>
> Congratulations :)
>
> On Fri, Jun 9, 2017 at 1:49 PM Vahid S Hashemian <
> vahidhashem...@us.ibm.com>
> wrote:
>
> > Great news.
> >
> > Congrats Damian!
> >
> > --Vahid
> >
> >
> >
> > From:   Guozhang Wang 
> > To: "d...@kafka.apache.org" ,
> > "users@kafka.apache.org" ,
> > "priv...@kafka.apache.org" 
> > Date:   06/09/2017 01:34 PM
> > Subject:[ANNOUNCE] New committer: Damian Guy
> >
> >
> >
> > Hello all,
> >
> >
> > The PMC of Apache Kafka is pleased to announce that we have invited
> Damian
> > Guy as a committer to the project.
> >
> > Damian has made tremendous contributions to Kafka. He has not only
> > contributed a lot to the Streams API, but has also been involved in many
> > other areas like the producer and consumer clients and the broker-side
> > coordinators (the group coordinator and the ongoing transaction
> > coordinator). He has contributed more than 100 patches so far, and has
> > been driving 6 KIP contributions.
> >
> > More importantly, Damian has been a very prolific reviewer on open PRs
> > and has been actively participating in community activities such as the
> > mailing lists and Stack Overflow questions. Through his code contributions
> > and reviews, Damian has demonstrated good judgement on system design and
> > code quality, especially on thorough unit test coverage. We believe he
> > will make a great addition to the committers of the community.
> >
> >
> > Thank you for your contributions, Damian!
> >
> >
> > -- Guozhang, on behalf of the Apache Kafka PMC
> >
> >
> >
> >
> >
>


Re: [VOTE] KIP-156 Add option "dry run" to Streams application reset tool

2017-05-10 Thread Sriram Subramanian
+1

On Wed, May 10, 2017 at 9:45 AM, Neha Narkhede  wrote:

> +1
>
> On Wed, May 10, 2017 at 12:32 PM Gwen Shapira  wrote:
>
> > +1. Also not sure that adding a parameter to a CLI requires a KIP. It
> seems
> > excessive.
> >
> >
> > On Tue, May 9, 2017 at 7:57 PM Jay Kreps  wrote:
> >
> > > +1
> > > On Tue, May 9, 2017 at 3:41 PM BigData dev 
> > > wrote:
> > >
> > > > Hi, Everyone,
> > > >
> > > > Since this is a relatively simple change, I would like to start the
> > > voting
> > > > process for KIP-156: Add option "dry run" to Streams application
> reset
> > > tool
> > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=69410150
> > > >
> > > >
> > > > The vote will run for a minimum of 72 hours.
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Bharat
> > > >
> > >
> >
> --
> Thanks,
> Neha
>


Re: [ANNOUNCE] New committer: Rajini Sivaram

2017-04-24 Thread Sriram Subramanian
Congrats Rajini!

On Mon, Apr 24, 2017 at 2:21 PM, Onur Karaman 
wrote:

> Congrats!
>
> On Mon, Apr 24, 2017 at 2:20 PM, Guozhang Wang  wrote:
>
> > Congrats Rajini!
> >
> > Guozhang
> >
> > On Mon, Apr 24, 2017 at 2:08 PM, Vahid S Hashemian <
> > vahidhashem...@us.ibm.com> wrote:
> >
> > > Great news.
> > >
> > > Congrats Rajini!
> > >
> > > --Vahid
> > >
> > >
> > >
> > >
> > > From:   Gwen Shapira 
> > > To: d...@kafka.apache.org, Users ,
> > > priv...@kafka.apache.org
> > > Date:   04/24/2017 02:06 PM
> > > Subject:[ANNOUNCE] New committer: Rajini Sivaram
> > >
> > >
> > >
> > > The PMC for Apache Kafka has invited Rajini Sivaram as a committer and
> we
> > > are pleased to announce that she has accepted!
> > >
> > > Rajini contributed 83 patches, 8 KIPs (all security and quota
> > > improvements) and a significant number of reviews. She is also on the
> > > conference committee for Kafka Summit, where she helped select content
> > > for our community event. Through her contributions she's shown good
> > > judgement, good coding skills, willingness to work with the community
> on
> > > finding the best
> > > solutions and very consistent follow through on her work.
> > >
> > > Thank you for your contributions, Rajini! Looking forward to many more
> :)
> > >
> > > Gwen, for the Apache Kafka PMC
> > >
> > >
> > >
> > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-13 Thread Sriram Subramanian
StreamsBuilder would be my vote.
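
As an aside on Jay's point (2) below about a simple callback-style API: a
minimal sketch of a single callback-style Processor, assuming the Processor
interface shape of the 0.10.x Streams API (illustrative only):

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

// All handling logic lives in process(), in the MessageListener style Jay describes.
public class UppercaseProcessor implements Processor<String, String> {
    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) { this.context = context; }

    @Override
    public void process(String key, String value) {
        // A single-message operation; anything spanning messages would use a store.
        context.forward(key, value.toUpperCase());
    }

    @Override
    public void punctuate(long timestamp) { } // unused for pure per-message logic

    @Override
    public void close() { }
}

Wiring it up could look roughly like
builder.stream("input").process(() -> new UppercaseProcessor());
though the exact method names depend on the API version.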

> On Mar 13, 2017, at 9:42 PM, Jay Kreps  wrote:
> 
> Hey Matthias,
> 
> Makes sense, I'm more advocating for removing the word topology than any
> particular new replacement.
> 
> -Jay
> 
> On Mon, Mar 13, 2017 at 12:30 PM, Matthias J. Sax 
> wrote:
> 
>> Jay,
>> 
>> thanks for your feedback
>> 
>>> What if instead we called it KStreamsBuilder?
>> 
>> That's the current name and I personally think it's not the best one.
>> The main reason why I don't like KStreamsBuilder is, that we have the
>> concepts of KStreams and KTables, and the builder creates both. However,
>> the name puts the focus on KStream and devalues KTable.
>> 
>> I understand your argument, and I am personally open to removing the
>> "Topology" part and naming it "StreamsBuilder". Not sure what others
>> think about this.
>> 
>> 
>> About Processor API: I like the idea in general, but I think it's out
>> of scope for this KIP. KIP-120 has the focus on removing leaking
>> internal APIs and do some cleanup how our API reflects some concepts.
>> 
>> However, I added your idea to the API discussion Wiki page and we can
>> take it from there:
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Discussions
>> 
>> 
>> 
>> -Matthias
>> 
>> 
>>> On 3/13/17 11:52 AM, Jay Kreps wrote:
>>> Two things:
>>> 
>>>   1. This is a minor thing but the proposed new name for KStreamBuilder
>>>   is StreamsTopologyBuilder. I actually think we should not put
>> topology in
>>>   the name as topology is not a concept you need to understand at the
>>>   kstreams layer right now. I'd think of three categories of concepts:
>> (1)
>>>   concepts you need to understand to get going even for a simple
>> example, (2)
>>>   concepts you need to understand to operate and debug a real
>> production app,
>>>   (3) concepts we truly abstract and you don't need to ever understand.
>> I
>>>   think in the kstream layer topologies are currently category (2), and
>> this
>>>   is where they belong. By introducing the name in even the simplest
>> example
>>>   it means the user has to go read about topologies to really understand
>> even
>>>   this simple snippet. What if instead we called it KStreamsBuilder?
>>>   2. For the processor api, I think this api is mostly not for end
>> users.
>>>   However there are a couple of cases where it might make sense to expose
>> it. I
>>>   think users coming from Samza, or JMS's MessageListener (
>>>   https://docs.oracle.com/javaee/7/api/javax/jms/MessageListener.html)
>>>   understand a simple callback interface for message processing. In
>> fact,
>>>   people often ask why Kafka's consumer doesn't provide such an
>> interface.
>>>   I'd argue we do, it's KafkaStreams. The only issue is that the
>> processor
>>>   API documentation is a bit scary for a person implementing this type
>> of
>>>   api. My observation is that people using this style of API don't do a
>> lot
>>>   of cross-message operations, then just do single message operations
>> and use
>>>   a database for anything that spans messages. They also don't factor
>> their
>>>   code into many MessageListeners and compose them, they just have one
>>>   listener that has the complete handling logic. Say I am a user who
>> wants to
>>>   implement a single Processor in this style. Do we have an easy way to
>> do
>>>   that today (either with the .transform/.process methods in kstreams
>> or with
>>>   the topology apis)? Is there anything we can do in the way of trivial
>>>   helper code to make this better? Also, how can we explain that
>> pattern to
>>>   people? I think currently we have pretty in-depth docs on our apis
>> but I
>>>   suspect a person trying to figure out how to implement a simple
>> callback
>>>   might get a bit lost trying to figure out how to wire it up. A simple
>> five
>>>   line example in the docs would probably help a lot. Not sure if this
>> is
>>>   best addressed in this KIP or is a side comment.
>>> 
>>> Cheers,
>>> 
>>> -Jay
>>> 
>>> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax 
>>> wrote:
>>> 
 Hi All,
 
 I did prepare a KIP to clean up some of Kafka's Streams API.
 
 Please have a look here:
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API
 
 Looking forward to your feedback!
 
 
 -Matthias
>> 
>> 


Re: [ANNOUNCE] Apache Kafka 0.10.2.0 Released

2017-02-22 Thread Sriram Subramanian
Thanks Ewen for driving this.

On Wed, Feb 22, 2017 at 12:40 AM, Guozhang Wang  wrote:

> Thanks Ewen for driving the release!
>
> Guozhang
>
> On Wed, Feb 22, 2017 at 12:33 AM, Ewen Cheslack-Postava  >
> wrote:
>
> > The Apache Kafka community is pleased to announce the release for Apache
> > Kafka 0.10.2.0. This is a feature release which includes the completion
> > of 15 KIPs, over 200 bug fixes and improvements, and more than 500 pull
> > requests merged.
> >
> > All of the changes in this release can be found in the release notes:
> > https://archive.apache.org/dist/kafka/0.10.2.0/RELEASE_NOTES.html
> >
> > Apache Kafka is a distributed streaming platform with four core
> > APIs:
> >
> > ** The Producer API allows an application to publish a stream of records to
> > one or more Kafka topics.
> >
> > ** The Consumer API allows an application to subscribe to one or more
> > topics and process the stream of records produced to them.
> >
> > ** The Streams API allows an application to act as a stream processor,
> > consuming an input stream from one or more topics and producing an
> > output
> > stream to one or more output topics, effectively transforming the input
> > streams to output streams.
> >
> > ** The Connector API allows building and running reusable producers or
> > consumers that connect Kafka topics to existing applications or data
> > systems. For example, a connector to a relational database might capture
> > every change to a table.
> >
> >
> > With these APIs, Kafka can be used for two broad classes of application:
> >
> > ** Building real-time streaming data pipelines that reliably get data
> > between systems or applications.
> >
> > ** Building real-time streaming applications that transform or react to
> > the
> > streams of data.
> >
> >
> > You can download the source release from
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.0/kafka-0.10.2.0-src.tgz
> >
> > and binary releases from
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.0/kafka_2.11-0.10.2.0.tgz
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.0/kafka_2.10-0.10.2.0.tgz
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.0/kafka_2.12-0.10.2.0.tgz
> > (experimental 2.12 artifact)
> >
> > Thanks to the 101 contributors on this release!
> >
> > Akash Sethi, Alex Loddengaard, Alexey Ozeritsky, amethystic, Andrea
> > Cosentino, Andrew Olson, Andrew Stevenson, Anton Karamanov, Antony
> > Stubbs, Apurva Mehta, Arun Mahadevan, Ashish Singh, Balint Molnar, Ben
> > Stopford, Bernard Leach, Bill Bejeck, Colin P. Mccabe, Damian Guy, Dan
> > Norwood, Dana Powers, dasl, Derrick Or, Dong Lin, Dustin Cote, Edoardo
> > Comar, Edward Ribeiro, Elias Levy, Emanuele Cesena, Eno Thereska, Ewen
> > Cheslack-Postava, Flavio Junqueira, fpj, Geoff Anderson, Guozhang Wang,
> > Gwen Shapira, Hikiko Murakami, Himani Arora, himani1, Hojjat Jafarpour,
> > huxi, Ishita Mandhan, Ismael Juma, Jakub Dziworski, Jan Lukavsky, Jason
> > Gustafson, Jay Kreps, Jeff Widman, Jeyhun Karimov, Jiangjie Qin, Joel
> > Koshy, Jon Freedman, Joshi, Jozef Koval, Json Tu, Jun He, Jun Rao,
> > Kamal, Kamal C, Kamil Szymanski, Kim Christensen, Kiran Pillarisetty,
> > Konstantine Karantasis, Lihua Xin, LoneRifle, Magnus Edenhill, Magnus
> > Reftel, Manikumar Reddy O, Mark Rose, Mathieu Fenniak, Matthias J. Sax,
> > Mayuresh Gharat, MayureshGharat, Michael Schiff, Mickael Maison,
> > MURAKAMI Masahiko, Nikki Thean, Olivier Girardot, pengwei-li, pilo,
> > Prabhat Kashyap, Qian Zheng, Radai Rosenblatt, radai-rosenblatt, Raghav
> > Kumar Gautam, Rajini Sivaram, Rekha Joshi, rnpridgeon, Ryan Pridgeon,
> > Sandesh K, Scott Ferguson, Shikhar Bhushan, steve, Stig Rohde Døssing,
> > Sumant Tambe, Sumit Arrawatia, Theo, Tim Carey-Smith, Tu Yang, Vahid
> > Hashemian, wangzzu, Will Marshall, Xavier Léauté, Xavier Léauté, Xi Hu,
> > Yang Wei, yaojuncn, Yuto Kawamura
> >
> > We welcome your help and feedback. For more information on how to
> > report problems, and to get involved, visit the project website at
> > http://kafka.apache.org/
> >
> > Thanks,
> > Ewen
> >
>
>
>
> --
> -- Guozhang
>


Re: [ANNOUNCE] New committer: Grant Henke

2017-01-11 Thread Sriram Subramanian
Congratulations Grant!

On Wed, Jan 11, 2017 at 11:51 AM, Gwen Shapira  wrote:

> The PMC for Apache Kafka has invited Grant Henke to join as a
> committer and we are pleased to announce that he has accepted!
>
> Grant contributed 88 patches, 90 code reviews, countless great
> comments on discussions, a much-needed cleanup to our protocol and the
> on-going and critical work on the Admin protocol. Throughout this, he
> displayed great technical judgment, high-quality work and willingness
> to contribute where needed to make Apache Kafka awesome.
>
> Thank you for your contributions, Grant :)
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>


Re: [VOTE] KIP-106 - Default unclean.leader.election.enabled True => False

2017-01-11 Thread Sriram Subramanian
+1

On Wed, Jan 11, 2017 at 11:10 AM, Ismael Juma  wrote:

> Thanks for raising this, +1.
>
> Ismael
>
> On Wed, Jan 11, 2017 at 6:56 PM, Ben Stopford  wrote:
>
> > Looks like there was a good consensus on the discuss thread for KIP-106
> so
> > let's move to a vote.
> >
> > Please chime in if you would like to change the default for
> > unclean.leader.election.enabled from true to false.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/%5BWIP%5D+KIP-106+-+Change+Default+unclean.leader.election.enabled+from+True+to+False
> >
> > B
> >
>


Re: [ANNOUNCE] New committer: Jiangjie (Becket) Qin

2016-10-31 Thread Sriram Subramanian
Congratulations!

On Mon, Oct 31, 2016 at 12:23 PM, Ismael Juma  wrote:

> Congratulations Becket. :)
>
> Ismael
>
> On 31 Oct 2016 1:44 pm, "Joel Koshy"  wrote:
>
> > The PMC for Apache Kafka has invited Jiangjie (Becket) Qin to join as a
> > committer and we are pleased to announce that he has accepted!
> >
> > Becket has made significant contributions to Kafka over the last two
> years.
> > He has been deeply involved in a broad range of KIP discussions and has
> > contributed several major features to the project. He recently completed
> > the implementation of a series of improvements (KIP-31, KIP-32, KIP-33)
> to
> > Kafka’s message format that address a number of long-standing issues such
> > as avoiding server-side re-compression, better accuracy for time-based
> log
> > retention, log roll and time-based indexing of messages.
> >
> > Congratulations Becket! Thank you for your many contributions. We are
> > excited to have you on board as a committer and look forward to your
> > continued participation!
> >
> > Joel
> >
>


Re: [VOTE] Add REST Server to Apache Kafka

2016-10-25 Thread Sriram Subramanian
-1 for all the reasons that have been described before. This does not need
to be part of the core project.

On Tue, Oct 25, 2016 at 3:25 PM, Suresh Srinivas 
wrote:

> +1.
>
> This is an http access to core Kafka. This is very much needed as part of
> Apache Kafka under ASF governance model.  This would be great for the
> community instead of duplicated and splintered efforts that may spring up.
>
> Get Outlook for iOS
>
> _
> From: Harsha Chintalapani <ka...@harsha.io>
> Sent: Tuesday, October 25, 2016 2:20 PM
> Subject: [VOTE] Add REST Server to Apache Kafka
> To: <d...@kafka.apache.org>, <users@kafka.apache.org>
>
>
> Hi All,
>    We are proposing to have a REST Server as part of Apache Kafka
> to provide producer/consumer/admin APIs. We strongly believe having
> REST server functionality with Apache Kafka will help a lot of users.
> Here is the KIP that Mani Kumar wrote:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-80:+Kafka+Rest+Server.
> There is a discussion thread in the dev list that had differing opinions on
> whether to include a REST server in Apache Kafka or not. You can read more
> about that in this thread:
> http://mail-archives.apache.org/mod_mbox/kafka-dev/201610.mbox/%3CCAMVt_aymqeudm39znsxgktpdde46sowmqhsxop-+jmbcuv7...@mail.gmail.com%3E
>
>   This is a VOTE thread to check interest in the community for
> adding REST Server implementation in Apache Kafka.
>
> Thanks,
> Harsha
>
>
>


Re: [ANNOUNCE] New committer: Jason Gustafson

2016-09-06 Thread Sriram Subramanian
Congratulations Jason!

On Tue, Sep 6, 2016 at 3:40 PM, Vahid S Hashemian  wrote:

> Congratulations Jason on this very well deserved recognition.
>
> --Vahid
>
>
>
> From:   Neha Narkhede 
> To: "d...@kafka.apache.org" ,
> "users@kafka.apache.org" 
> Cc: "priv...@kafka.apache.org" 
> Date:   09/06/2016 03:26 PM
> Subject:[ANNOUNCE] New committer: Jason Gustafson
>
>
>
> The PMC for Apache Kafka has invited Jason Gustafson to join as a
> committer and
> we are pleased to announce that he has accepted!
>
> Jason has contributed numerous patches to a wide range of areas, notably
> within the new consumer and the Kafka Connect layers. He has displayed
> great taste and judgement which has been apparent through his involvement
> across the board from mailing lists, JIRA, code reviews to contributing
> features, bug fixes and code and documentation improvements.
>
> Thank you for your contribution and welcome to Apache Kafka, Jason!
> --
> Thanks,
> Neha
>
>
>
>
>


Re: [ANNOUNCE] New committer: Ismael Juma

2016-04-26 Thread Sriram Subramanian
Congratulations!

On Tue, Apr 26, 2016 at 6:57 AM, Gwen Shapira  wrote:

> Congratulations, very well deserved.
> On Apr 25, 2016 10:53 PM, "Neha Narkhede"  wrote:
>
> > The PMC for Apache Kafka has invited Ismael Juma to join as a committer
> and
> > we are pleased to announce that he has accepted!
> >
> > Ismael has contributed 121 commits to a wide range of
> > areas, notably within the security and the network layer. His involvement
> > has been phenomenal across the board from mailing lists, JIRA, code
> reviews
> > and helping us move to GitHub pull requests to contributing features, bug
> > fixes and code and documentation improvements.
> >
> > Thank you for your contribution and welcome to Apache Kafka, Ismael!
> >
> > --
> > Thanks,
> > Neha
> >
>


Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-08 Thread Sriram Subramanian
Thank you Jay. I agree with the issue that you point out w.r.t. paired
serializers. I also think having mixed serialization types is rare. To get
the current behavior, one can simply use a ByteArraySerializer. This is
best understood by talking with many customers, and you seem to have done
that. I am convinced about the change.

For the rest who gave -1 or 0 for this proposal, do the answers for the
three points (updated) below seem reasonable? Are these explanations
convincing?


1. Can we keep the serialization semantics outside the Producer interface
and have simple bytes in / bytes out for the interface (This is what we
have today).

The points for this is to keep the interface simple and usage easy to
understand. The points against this is that it gets hard to share common
usage patterns around serialization/message validations for the future.

2. Can we create a wrapper producer that does the serialization and have
different variants of it for different data formats?

The points for this is again to keep the main API clean. The points
against this is that it duplicates the API, increases the surface area and
creates redundancy for a minor addition.

3. Do we need to support different data types per record? The current
interface (bytes in/bytes out) lets you instantiate one producer and use
it to send multiple data formats. There seems to be some valid use cases
for this.


Mixed serialization types are rare based on interactions with customers.
To get the current behavior, one can simply use a ByteArraySerializer.
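
For illustration, a minimal sketch of that bytes-in/bytes-out setup with the
new Java producer (config keys and class names as in the released clients;
the broker address and topic are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BytesInBytesOut {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // ByteArraySerializer gives back the raw bytes-in/bytes-out behavior.
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        Producer<byte[], byte[]> producer = new KafkaProducer<>(props);
        // The application does its own serialization before handing bytes over.
        producer.send(new ProducerRecord<byte[], byte[]>("my-topic",
                "key".getBytes(), "value".getBytes()));
        producer.close();
    }
}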

On 12/5/14 5:00 PM, "Jay Kreps"  wrote:

>Hey Sriram,
>
>Thanks! I think this is a very helpful summary.
>
>Let me try to address your point about passing in the serde at send time.
>
>I think the first objection is really to the paired key/value serializer
>interfaces. This leads to kind of a weird combinatorial thing where you
>would have an avro/avro serializer a string/avro serializer, a pb/pb
>serializer, and a string/pb serializer, and so on. But your proposal would
>work as well with separate serializers for key and value.
>
>I think the downside is just the one you call out--that this is a corner
>case and you end up with two versions of all the apis to support it. This
>also makes the serializer api more annoying to implement. I think the
>alternative solution to this case and any other we can give people is just
>configuring ByteArraySerializer which gives you basically the api that you
>have now with byte arrays. If this is incredibly common then this would be
>a silly solution, but I guess the belief is that these cases are rare and
>a
>really well implemented avro or json serializer should be 100% of what
>most
>people need.
>
>In practice the cases that actually mix serialization types in a single
>stream are pretty rare I think just because the consumer then has the
>problem of guessing how to deserialize, so most of these will end up with
>at least some marker or schema id or whatever that tells you how to read
>the data. Arguable this mixed serialization with marker is itself a
>serializer type and should have a serializer of its own...
>
>-Jay
>
>On Fri, Dec 5, 2014 at 3:48 PM, Sriram Subramanian <
>srsubraman...@linkedin.com.invalid> wrote:
>
>> This thread has diverged multiple times now and it would be worth
>> summarizing them.
>>
>> There seems to be the following points of discussion -
>>
>> 1. Can we keep the serialization semantics outside the Producer
>>interface
>> and have simple bytes in / bytes out for the interface (This is what we
>> have today).
>>
>> The points for this is to keep the interface simple and usage easy to
>> understand. The points against this is that it gets hard to share common
>> usage patterns around serialization/message validations for the future.
>>
>> 2. Can we create a wrapper producer that does the serialization and have
>> different variants of it for different data formats?
>>
>> The points for this is again to keep the main API clean. The points
>> against this is that it duplicates the API, increases the surface area
>>and
>> creates redundancy for a minor addition.
>>
>> 3. Do we need to support different data types per record? The current
>> interface (bytes in/bytes out) lets you instantiate one producer and use
>> it to send multiple data formats. There seems to be some valid use cases
>> for this.
>>
>> I have still not seen a strong argument against not having this
>> functionality. Can someone provide their views on why we don't need this
>> support that is possible with the current API?
>>
>> One possible approach for the per record serialization would be to
>>define
>>
>> publ

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-05 Thread Sriram Subramanian
This thread has diverged multiple times now and it would be worth
summarizing them. 

There seems to be the following points of discussion -

1. Can we keep the serialization semantics outside the Producer interface
and have simple bytes in / bytes out for the interface (This is what we
have today).

The points for this is to keep the interface simple and usage easy to
understand. The points against this is that it gets hard to share common
usage patterns around serialization/message validations for the future.

2. Can we create a wrapper producer that does the serialization and have
different variants of it for different data formats?

The points for this is again to keep the main API clean. The points
against this is that it duplicates the API, increases the surface area and
creates redundancy for a minor addition.

3. Do we need to support different data types per record? The current
interface (bytes in/bytes out) lets you instantiate one producer and use
it to send multiple data formats. There seems to be some valid use cases
for this.

I have still not seen a strong argument against not having this
functionality. Can someone provide their views on why we don't need this
support that is possible with the current API?

One possible approach for the per record serialization would be to define

public interface SerDe<K, V> {
  public byte[] serializeKey(K key);

  public K deserializeKey(byte[] key);

  public byte[] serializeValue(V value);

  public V deserializeValue(byte[] value);
}

This would be used by both the Producer and the Consumer.

The send APIs can then be

public Future<RecordMetadata> send(ProducerRecord<K, V> record);
public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback
callback);


public Future<RecordMetadata> send(ProducerRecord<K, V> record, SerDe<K, V>
serde);

public Future<RecordMetadata> send(ProducerRecord<K, V> record, SerDe<K, V>
serde, Callback callback);


A default SerDe can be set in the config. The producer would use the
default from the config if the non-serde send APIs are used. The downside
to this approach is that we would need to have four variants of Send API
for the Producer. 
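
To make the shape of the proposal concrete, here is a self-contained sketch:
the SerDe interface as reconstructed above plus a trivial String/String
implementation (the interface is the proposal, not an existing Kafka API):

import java.nio.charset.StandardCharsets;

interface SerDe<K, V> {
    byte[] serializeKey(K key);
    K deserializeKey(byte[] key);
    byte[] serializeValue(V value);
    V deserializeValue(byte[] value);
}

// A minimal implementation, purely to show the contract both the producer
// and the consumer would share.
class StringSerDe implements SerDe<String, String> {
    public byte[] serializeKey(String key) { return key.getBytes(StandardCharsets.UTF_8); }
    public String deserializeKey(byte[] key) { return new String(key, StandardCharsets.UTF_8); }
    public byte[] serializeValue(String value) { return value.getBytes(StandardCharsets.UTF_8); }
    public String deserializeValue(byte[] value) { return new String(value, StandardCharsets.UTF_8); }
}

public class SerDeRoundTrip {
    public static void main(String[] args) {
        SerDe<String, String> serde = new StringSerDe();
        byte[] bytes = serde.serializeValue("hello");
        System.out.println(serde.deserializeValue(bytes)); // prints "hello"
    }
}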






On 12/5/14 3:16 PM, "Jun Rao"  wrote:

>Jiangjie,
>
>The issue with adding the serializer in ProducerRecord is that you need to
>implement all combinations of serializers for key and value. So, instead
>of
>just implementing int and string serializers, you will have to implement
>all 4 combinations.
>
>Adding a new producer constructor like Producer(KeySerializer,
>ValueSerializer, Properties properties) can be useful.
>
>Thanks,
>
>Jun
>
>On Thu, Dec 4, 2014 at 10:33 AM, Jiangjie Qin 
>wrote:
>
>>
>> I'm just thinking instead of binding serialization with producer,
>>another
>> option is to bind serializer/deserializer with
>> ProducerRecord/ConsumerRecord (please see the detailed proposal below.)
>> The arguments for this option are:
>> A. A single producer could send different message types. There
>>are
>> several use cases in LinkedIn for per record serializer
>> - In Samza, there are some in-stream order-sensitive control
>> messages
>> having different deserializer from other messages.
>> - There are use cases which need support for sending both Avro
>> messages
>> and raw bytes.
>> - Some use cases needs to deserialize some Avro messages into
>> generic
>> record and some other messages into specific record.
>> B. In current proposal, the serializer/deserilizer is
>>instantiated
>> according to config. Compared with that, binding serializer with
>> ProducerRecord and ConsumerRecord is less error prone.
>>
>>
>> This option includes the following changes:
>> A. Add serializer and deserializer interfaces to replace
>>serializer
>> instance from config.
>> public interface Serializer<K, V> {
>>     public byte[] serializeKey(K key);
>>     public byte[] serializeValue(V value);
>> }
>> public interface Deserializer<K, V> {
>>     public K deserializeKey(byte[] key);
>>     public V deserializeValue(byte[] value);
>> }
>>
>> B. Make ProducerRecord and ConsumerRecord abstract classes
>> implementing Serializer<K, V> and Deserializer<K, V> respectively.
>>     public abstract class ProducerRecord<K, V> implements
>>     Serializer<K, V> {...}
>>     public abstract class ConsumerRecord<K, V> implements
>>     Deserializer<K, V> {...}
>>
>> C. Instead of instantiating the serializer/deserializer from config, let
>> concrete ProducerRecord/ConsumerRecord extend the abstract class and
>> override the serialize/deserialize methods.
>>
>>     public class AvroProducerRecord extends ProducerRecord<String,
>>     GenericRecord> {
>>         ...
>>         @Override
>>         public byte[] serializeKey(String key) {...}
>>         @Override
>>         public byte[] serializeValue(GenericRecord value) {...}
>>     }

Re: 0.8.1 stability

2014-03-18 Thread Sriram Subramanian
Auto rebalance is already turned off in 0.8.1.

On 3/18/14 5:56 PM, "Neha Narkhede"  wrote:

>Thanks for giving the 0.8.1 release a spin! A few people have reported
>bugs with delete topic and also the automatic leader rebalancing feature
>that we released as
>beta and were not widely tested at LinkedIn, unlike most of the stuff that
>we have released in the past. As a result, more bugs are being reported on
>0.8.1 that are of a rather serious nature. It is unclear how many of these
>bugs might show up on 0.8.1 as more people adopt it. To mitigate the
>impact
>and also to continue releasing stable code to our users, I'm proposing we
>do a point release (0.8.1.1?) to turn off these features for the time
>being. Stabilize and test these and then release these as part of 0.8.2.
>
>Do people have any thoughts on this?
>
>Thanks,
>Neha



Re: Config for new clients (and server)

2014-02-10 Thread Sriram Subramanian
+1 on Jun's suggestion.
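
For illustration, a rough sketch of the kind of startup logging Jun describes
below; the helper and its inputs are hypothetical, not Kafka code:

import java.util.Map;
import java.util.Properties;
import java.util.Set;

class ConfigLogger {
    // Log final values that override defaults, and warn on supplied keys the
    // client never read (these are usually mis-spellings).
    static void logConfig(Properties supplied, Map<String, Object> defaults,
                          Set<String> usedKeys) {
        for (String name : supplied.stringPropertyNames()) {
            if (!usedKeys.contains(name)) {
                System.out.println("WARN unused config key (mis-spelled?): " + name);
            } else if (defaults.containsKey(name)
                       && !defaults.get(name).equals(supplied.get(name))) {
                System.out.println("INFO overriding " + name + " = " + supplied.get(name));
            }
        }
    }
}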

On 2/10/14 2:01 PM, "Jun Rao"  wrote:

>I actually prefer to see those at INFO level. The reason is that the
>config
>system in an application can be complex. Some configs can be overridden in
>different layers and it may not be easy to determine what the final
>binding
>value is. The logging in Kafka will serve as the source of truth.
>
>For reference, ZK client logs all overridden values during initialization.
>It's a one time thing during starting up, so shouldn't add much noise.
>It's
>very useful for debugging subtle config issues.
>
>Exposing final configs programmatically is potentially useful. If we don't
>want to log overridden values out of box, an app can achieve the same
>thing
>using the programming api. The only missing thing is that we won't know
>those unused property keys, which is probably less important than seeing
>the overridden values.
>
>Thanks,
>
>Jun
>
>
>On Mon, Feb 10, 2014 at 10:15 AM, Jay Kreps  wrote:
>
>> Hey Jun,
>>
>> I think that is reasonable but would object to having it be debug
>>logging?
>> I think logging out a bunch of noise during normal operation in a client
>> library is pretty ugly. Also, is there value in exposing the final
>>configs
>> programmatically?
>>
>> -Jay
>>
>>
>>
>> On Sun, Feb 9, 2014 at 9:23 PM, Jun Rao  wrote:
>>
>> > +1 on the new config. Just one comment. Currently, when initiating a
>> config
>> > (e.g. ProducerConfig), we log those overridden property values and
>>unused
>> > property keys (likely due to mis-spelling). This has been very useful
>>for
>> > config verification. It would be good to add similar support in the
>>new
>> > config.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> >
>> > On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps  wrote:
>> >
>> > > We touched on this a bit in previous discussions, but I wanted to
>>draw
>> > out
>> > > the approach to config specifically as an item of discussion.
>> > >
>> > > The new producer and consumer use a similar key-value config
>>approach
>> as
>> > > the existing scala clients but have different implementation code to
>> help
>> > > define these configs. The plan is to use the same approach on the
>> server,
>> > > once the new clients are complete; so if we agree on this approach
>>it
>> > will
>> > > be the new default across the board.
>> > >
>> > > Let me split this into two parts. First I will try to motivate the
>>use
>> of
>> > > key-value pairs as a configuration api. Then let me discuss the
>> mechanics
>> > > of specifying and parsing these. If we agree on the public api then
>> > > the implementation details are interesting as this
>>will
>> > be
>> > > shared across producer, consumer, and broker and potentially some
>> tools;
>> > > but if we disagree about the api then there is no point in
>>discussing
>> the
>> > > implementation.
>> > >
>> > > Let me explain the rationale for this. In a sense a key-value map of
>> > > configs is the worst possible API to the programmer using the
>>clients.
>> > Let
>> > > me contrast the pros and cons versus a POJO and motivate why I
>>think it
>> > is
>> > > still superior overall.
>> > >
>> > > Pro: An application can externalize the configuration of its kafka
>> > clients
>> > > into its own configuration. Whatever config management system the
>> client
>> > > application is using will likely support key-value pairs, so the
>>client
>> > > should be able to directly pull whatever configurations are present
>>and
>> > use
>> > > them in its client. This means that any configuration the client
>> supports
>> > > can be added to any application at runtime. With the pojo approach
>>the
>> > > client application has to expose each pojo getter as some config
>> > parameter.
>> > > The result of many applications doing this is that the config is
>> > different
>> > > for each and it is very hard to have a standard client config shared
>> > > across. Moving config into config files allows the usual tooling
>> (version
>> > > control, review, audit, config deployments separate from code
>>pushes,
>> > > etc.).
>> > >
>> > > Pro: Backwards and forwards compatibility. Provided we stick to our
>> java
>> > > api many internals can evolve and expose new configs. The
>>application
>> can
>> > > support both the new and old client by just specifying a config that
>> will
>> > > be unused in the older version (and of course the reverse--we can
>> remove
>> > > obsolete configs).
>> > >
>> > > Pro: We can use a similar mechanism for both the client and the
>>server.
>> > > Since most people run the server as a stand-alone process it needs a
>> > config
>> > > file.
>> > >
>> > > Pro: Systems like Samza that need to ship configs across the network
>> can
>> > > easily do so as configs have a natural serialized form. This can be
>> done
>> > > with pojos using java serialization but it is ugly and has bizare
>> failure
>> > > cases.
>> > >
>> > > Con: The IDE gives nice auto-completion for pojos.
>> > >
>> > > Con: There 

Re: unclean shutdown

2013-10-14 Thread Sriram Subramanian
Is this a bug? Can an unclean shutdown leave the index in a corrupt state
from which the broker cannot start (is that the expected behavior)?

On 10/14/13 4:52 PM, "Neha Narkhede"  wrote:

> It is possible that the unclean broker shutdown left the index in an
>inconsistent state. You can delete the corrupted index files; that will
>make the Kafka server rebuild the index on startup.
>
>Thanks,
>Neha
>
>
>On Mon, Oct 14, 2013 at 3:05 PM, Kane Kane  wrote:
>
>> After unclean shutdown kafka reports this error on startup:
>> [2013-10-14 16:44:24,898] FATAL Fatal error during KafkaServerStable
>> startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
>> java.lang.IllegalArgumentException: requirement failed: Corrupt index
>> found, index file
>>(/disk1/kafka-logs/perf2-22/.index)
>> has non-zero size but the last offset is 0 and the base offset is 0
>>
>> How it's supposed to bring broker back after unclean shutdown?
>>
>> Thanks.
>>



Re: Question about auto-rebalancing

2013-10-11 Thread Sriram Subramanian
We already have a JIRA for auto rebalance. I will be working on this soon.

KAFKA-930 



On 10/11/13 5:39 PM, "Guozhang Wang"  wrote:

>Hello Siyuan,
>
>For the automatic leader re-election, yes, we are considering making it
>work. Could you file a JIRA for this issue?
>
>For the high-level consumer's rebalancing logic, you can find it at
>
>https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-CanIpredicttheresultsoftheconsumerrebabalance%3F
>
>Guozhang
>
>
>On Fri, Oct 11, 2013 at 11:06 AM, hsy...@gmail.com 
>wrote:
>
>> Hi Jun,
>>
>> Thanks for your reply, but in a real cluster, one broker could serve
>> different topics and different partitions; the simple consumer only has
>> knowledge of the brokers that are available, but it has no knowledge to
>> decide which broker is best to consume messages from.  If you don't
>> choose carefully, multiple simple consumers might end up reading from the
>> same node, which is definitely not good for performance.
>> The interesting thing is I found out there is a
>> command, kafka-preferred-replica-election.sh, which will try to equally
>> distribute the leadership among different brokers. This is good in that I
>> can always let my simple consumer read from the leader broker (even if it
>> fails, the replica will be picked up as leader, which is fine).  But why
>> doesn't the Kafka cluster run this command automatically when there is a
>> broker change (up/down) in the cluster, so that the leadership can always
>> be equally distributed among different brokers ASAP?  I think it's very
>> helpful for a simple consumer deciding which broker is good to read from.
>>
>> Another question: I'm also curious how the high-level consumer is
>> balanced. I assume each high-level consumer knows which broker the other
>> consumers (in the same group) read messages from, and it can try to avoid
>> those brokers and pick a free one?  Is there a document for the balancing
>> rules among high-level consumers? Does it always guarantee that after
>> several leadership changes/temporary broker failures, it can always
>> equally distribute the reads among the brokers? Basically I think it
>> would be nice to have an API to let devs know which consumer reads from
>> which broker; otherwise I don't know
>> anything behind the high-level consumer
>>
>> Thanks!
>>
>> Best,
>> Siyuan
>>
>
>
>
>-- 
>-- Guozhang



Re: Strategies for auto generating broker ID

2013-10-02 Thread Sriram Subramanian
Jason - You should be able to solve that with Jay's proposal below. If you
just persist the id in a meta file, you can copy the meta file over to the
new broker, and the broker will not re-generate another id.
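
For illustration, a rough sketch of that meta-file approach (the file name and
helper are hypothetical):

import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.UUID;

public class BrokerIdFile {
    // Generate an id once, persist it next to the log data, and reuse it on
    // every later startup; copying the file carries the id to a new machine.
    public static String getOrCreateBrokerId(File logDir) throws Exception {
        File metaFile = new File(logDir, "meta.properties");
        if (metaFile.exists()) {
            return new String(Files.readAllBytes(metaFile.toPath()),
                              StandardCharsets.UTF_8).trim();
        }
        String id = UUID.randomUUID().toString(); // regenerate if ZK registration collides
        Files.write(metaFile.toPath(), id.getBytes(StandardCharsets.UTF_8));
        return id;
    }
}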

On 10/2/13 11:10 AM, "Jason Rosenberg"  wrote:

>I recently moved away from generating a unique brokerId for each node, in
>favor of assigning ids in configuration.  The reason for this, is that in
>0.8, there isn't a convenient way yet to reassign partitions to a new
>brokerid, should one broker have a failure.  So, it seems the only
>work-around at the moment is to bring up a replacement broker, assign it
>the same brokerId as one that has failed and is no longer running.  The
>cluster will then automatically replicate all the partitions that were
>assigned to the failed broker to the new broker.
>
>This appears the only operational way to deal with failed brokers, at the
>moment.
>
>Longer term, it would be great if the cluster were self-healing, and if a
>broker went down, we could mark it as no longer available somehow, and the
>cluster would then reassign and re-replicate partitions to new brokers,
>that were previously assigned to the failed broker.  I expect something
>like this will be available in future versions, but that doesn't appear
>the
>case at present.
>
>And related, it would be nice, in the interests of horizontal scalability,
>to have an easy way for the cluster to dynamically rebalance load, if new
>nodes are added to the cluster (or to at least prefer assigning new
>partitions to brokers which have more space available).  I expect this
>will
>be something to prioritize in the future versions as well.
>
>Jason
>
>
>On Wed, Oct 2, 2013 at 1:00 PM, Sriram Subramanian <
>srsubraman...@linkedin.com> wrote:
>
>> I agree that we need a unique id that is independent of the
>> machine. I am not sure you want a dependency on ZK to generate the
>> unique id though. There are other ways to generate a unique id (for
>> example, UUID). In case there was a collision (highly unlikely), the node
>> creation in ZK would anyway fail and the broker could regenerate another
>> id.
>>
>> On 10/2/13 9:52 AM, "Jay Kreps"  wrote:
>>
>> >There are scenarios in which you want a hostname to change or you want
>>to
>> >move the stored data off one machine onto another. This is the
>>motivation
>> >systems have for having a layer of indirection between the location and
>> >the
>> >identity of the nodes.
>> >
>> >-Jay
>> >
>> >
>> >On Wed, Oct 2, 2013 at 9:23 AM, Guozhang Wang 
>>wrote:
>> >
>> >> Wondering what is the reason behind decoupling the node id with its
>> >> physical host(port)? If we found that for example, node 1 is not
>>owning
>> >>any
>> >> partitions, how would we know which physical machine is this node
>>then?
>> >>
>> >> Guozhang
>> >>
>> >>
>> >> On Wed, Oct 2, 2013 at 9:07 AM, Jay Kreps 
>>wrote:
>> >>
>> >> > I'm in favor of doing this if someone is willing to work on it! I
>> >>agree
>> >> it
>> >> > would really help with easy provisioning.
>> >> >
>> >> > I filed a bug to discuss and track:
>> >> > https://issues.apache.org/jira/browse/KAFKA-1070
>> >> >
>> >> > Some comments:
>> >> > 1. I'm not in favor of having a pluggable strategy, unless we are
>> >>really
>> >> > really sure this is an area where people are going to get a lot of
>> >>value
>> >> by
>> >> > writing lots of plugins. I am not at all sure why you would want to
>> >> retain
>> >> > the current behavior if you had a good strategy for automatically
>> >> > generating ids. Basically plugins are an evil we only want to
>>accept
>> >>when
>> >> > either we don't understand the problem or the solutions have such
>> >>extreme
>> >> > tradeoffs that there is no single "good approach". Plugins cause
>> >>problems
>> >> > for upgrades, testing, documentation, user understandability, code
>> >> > understandability, etc.
>> >> > 2. The node id can't be the host or port or anything tied to the
>> >>physical
>> >> > machine or its location on the network because you need to be able
>>to
>> >> > change these things. I recommend we just k

Re: Strategies for auto generating broker ID

2013-10-02 Thread Sriram Subramanian
I agree that we need a unique id that is independent of the
machine. I am not sure you want a dependency on ZK to generate the unique
id though. There are other ways to generate a unique id (for example, UUID).
In case there was a collision (highly unlikely), the node creation in ZK
would anyway fail and the broker could regenerate another id.

On 10/2/13 9:52 AM, "Jay Kreps"  wrote:

>There are scenarios in which you want a hostname to change or you want to
>move the stored data off one machine onto another. This is the motivation
>systems have for having a layer of indirection between the location and
>the
>identity of the nodes.
>
>-Jay
>
>
>On Wed, Oct 2, 2013 at 9:23 AM, Guozhang Wang  wrote:
>
>> Wondering what is the reason behind decoupling the node id with its
>> physical host(port)? If we found that for example, node 1 is not owning
>>any
>> partitions, how would we know which physical machine is this node then?
>>
>> Guozhang
>>
>>
>> On Wed, Oct 2, 2013 at 9:07 AM, Jay Kreps  wrote:
>>
>> > I'm in favor of doing this if someone is willing to work on it! I
>>agree
>> it
>> > would really help with easy provisioning.
>> >
>> > I filed a bug to discuss and track:
>> > https://issues.apache.org/jira/browse/KAFKA-1070
>> >
>> > Some comments:
>> > 1. I'm not in favor of having a pluggable strategy, unless we are
>>really
>> > really sure this is an area where people are going to get a lot of
>>value
>> by
>> > writing lots of plugins. I am not at all sure why you would want to
>> retain
>> > the current behavior if you had a good strategy for automatically
>> > generating ids. Basically plugins are an evil we only want to accept
>>when
>> > either we don't understand the problem or the solutions have such
>>extreme
>> > tradeoffs that there is no single "good approach". Plugins cause
>>problems
>> > for upgrades, testing, documentation, user understandability, code
>> > understandability, etc.
>> > 2. The node id can't be the host or port or anything tied to the
>>physical
>> > machine or its location on the network because you need to be able to
>> > change these things. I recommend we just keep an integer.
>> >
>> > -Jay
>> >
>> >
>> > On Tue, Oct 1, 2013 at 7:08 AM, Aniket Bhatnagar <
>> > aniket.bhatna...@gmail.com
>> > > wrote:
>> >
>> > > Right. It is currently java integer. However, as per previous
>>thread,
>> it
>> > > seems possible to change it to a string. In that case, we can use
>> > instance
>> > > IDs, IP addresses, custom ID generators, etc.
>> > > How are you currently generating broker IDs from IP address? Chef
>> script
>> > or
>> > > custom shell script?
>> > >
>> > >
>> > > On 1 October 2013 18:34, Maxime Brugidou 
>> > > wrote:
>> > >
>> > > > I think it currently is a java (signed) integer or maybe this was
>> > > > zookeeper?
>> > > > We are generating the id from IP address for now but this is not
>> ideal
>> > > (and
>> > > > can cause integer overflow with java signed ints)
>> > > > On Oct 1, 2013 12:52 PM, "Aniket Bhatnagar" <
>> > aniket.bhatna...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > I would like to revive an older thread around auto generating
>> broker
>> > > ID.
>> > > > As
>> > > > > a AWS user, I would like Kafka to just use the instance's ID or
>> > > > instance's
>> > > > > IP or instance's internal domain (whichever is easier). This
>>would
>> > > mean I
>> > > > > can easily clone from a AMI to launch kafka instances without
>> having
>> > to
>> > > > > worry about setting a unique broker ID. This also alows me to
>>setup
>> > > auto
>> > > > > scaling.
>> > > > >
>> > > > > I realize 1 size may not fit all in this case. Other strategies
>> that
>> > > may
>> > > > > work for other cloud providers are generate the UUID and
>>persist it
>> > on
>> > > a
>> > > > > disk, etc.
>> > > > >
>> > > > > What I propose is a way to define a a broker ID generation
>>strategy
>> > in
>> > > > the
>> > > > > configuration file which points to a class file that is
>>responsible
>> > for
>> > > > > generating the ID. Is this something being already worked upon?
>> > > > >
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> -- Guozhang
>>



Re: Patch for mmap + windows

2013-09-09 Thread Sriram Subramanian
I did take a look at KAFKA-1008 a while back and added some comments.

On 9/9/13 3:52 PM, "Jay Kreps"  wrote:

>Cool can we get a reviewer for KAFKA-1008 then? I can take on the other
>issue for the checkpoint files.
>
>-Jay
>
>
>On Mon, Sep 9, 2013 at 3:16 PM, Neha Narkhede
>wrote:
>
>> +1 for windows support on 0.8
>>
>> Thanks,
>> Neha
>>
>>
>> On Mon, Sep 9, 2013 at 10:48 AM, Jay Kreps  wrote:
>>
>> > So guys, do we want to do these in 0.8? The first patch was a little
>> > involved but I think it would be good to have windows support in 0.8
>>and
>> it
>> > sounds like Tim is able to get things working after these changes.
>> >
>> > -Jay
>> >
>> >
>> > On Mon, Sep 9, 2013 at 10:19 AM, Timothy Chen 
>>wrote:
>> >
>> > > Btw, I've been running this patch in our cloud env and it's been
>> working
>> > > fine so far.
>> > >
>> > > I actually filed another bug as I saw another problem on windows
>> locally
>> > (
>> > > https://issues.apache.org/jira/browse/KAFKA-1036).
>> > >
>> > > Tim
>> > >
>> > >
>> > > On Wed, Aug 21, 2013 at 4:29 PM, Jay Kreps 
>> wrote:
>> > >
>> > > > That would be great!
>> > > >
>> > > > -Jay
>> > > >
>> > > >
>> > > > On Wed, Aug 21, 2013 at 3:13 PM, Timothy Chen 
>> > wrote:
>> > > >
>> > > > > Hi Jay,
>> > > > >
>> > > > > I'm planning to test run Kafka on Windows in our test
>>environments
>> > > > > evaluating if it's suitable for production usage.
>> > > > >
>> > > > > I can provide feedback with the patch how well it works and if
>>we
>> > > > encounter
>> > > > > any functional or perf problems.
>> > > > >
>> > > > > Tim
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Wed, Aug 21, 2013 at 2:54 PM, Jay Kreps 
>> > > wrote:
>> > > > >
>> > > > > > Elizabeth and I have a patch to support our memory mapped
>>offset
>> > > index
>> > > > > > files properly on Windows:
>> > > > > > https://issues.apache.org/jira/browse/KAFKA-1008
>> > > > > >
>> > > > > > Question: Do we want this on 0.8 or trunk? I would feel more
>> > > > comfortable
>> > > > > > with it in trunk, but that means windows support in 0.8 is
>>known
>> to
>> > > be
>> > > > > > broken (as opposed to "not known to be broken but not known
>>to be
>> > > > working
>> > > > > > either" since we are not doing aggressive system testing on
>> > windows).
>> > > > > >
>> > > > > > I would feel more comfortable doing the patch on 0.8 if there
>>was
>> > > > someone
>> > > > > > who would be willing to take on real load testing and/or
>> production
>> > > > > > operation on Windows so we could have some confidence that
>>Kafka
>> on
>> > > > > Windows
>> > > > > > actually works, otherwise this could just be the tip of the
>> > iceberg.
>> > > > > >
>> > > > > > Also it would be great to get review on that patch regardless
>>of
>> > the
>> > > > > > destination.
>> > > > > >
>> > > > > > -Jay
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>



Re: problems reassigning partitions

2013-09-04 Thread Sriram Subramanian
I have a few questions before I can help you with your issue -

1. Which version/build of Kafka are you using?
2. Is the current status that you provided below before or after issuing
the reassign replica command?
3. If the answer to 2 above is "after issuing the command", how long was
it after which you checked zk state?

On 9/3/13 9:08 AM, "Jun Rao"  wrote:

>What you saw is not normal. In sync replicas should always be a subset of
>assigned replicas. Did that anomaly happen before or after reassigning
>partitions? Is that reproducible?
>
>Thanks,
>
>Jun
>
>
>
>On Tue, Sep 3, 2013 at 12:16 AM, Pablo Nebrera <
>pablonebr...@eneotecnologia.com> wrote:
>
>> Hello
>>
>> I have a cluster with:
>> * three nodes: rb01, rb02 and rb03
>> * three topics: rb_event, rb_monitor, rb_flow
>> * three partitions per topic
>> * replicas 2
>>
>>
>> The current status is:
>>
>> Topic       Partition  Leader  Replicas  In-Sync replicas
>> ---------------------------------------------------------
>> rb_event    0          2       2,1       2,0
>>             1          2       1,2       2,0
>>             2          2       1,2       2
>> ---------------------------------------------------------
>> rb_flow     0          2       1,2       2,0
>>             1          2       2,0       2,0
>>             2          0       0,1       0,1
>> ---------------------------------------------------------
>> rb_monitor  0          2       1,2       2
>>             1          2       2,1       2,0
>>             2          0       0,2       2,0
>> ---------------------------------------------------------
>>
>>
>> I see something strange. For example:
>>
>> * rb_event -> partition 0 has replicas 2,1 but in-sync replicas are 2,0
>>   -> is that normal ???
>>
>> And the worst is when I try to reassign the partitions properly:
>>
>> The kafka-reassign-partitions.sh  tries to write
>> to /admin/reassign_partitions the following:
>>
>>
>> 
>> {"partitions":[
>>   {"topic":"rb_event","partition":0,"replicas":[0,1]},
>>   {"topic":"rb_event","partition":1,"replicas":[1,2]},
>>   {"topic":"rb_event","partition":2,"replicas":[2,0]},
>>   {"topic":"rb_monitor","partition":0,"replicas":[2,0]},
>>   {"topic":"rb_monitor","partition":1,"replicas":[0,1]},
>>   {"topic":"rb_monitor","partition":2,"replicas":[1,2]}]}
>>
>>
>> When I access to zookeeper I do see:
>> [zk: rb02:2181,rb01:2181,rb03:2181(CONNECTED) 3] get
>> /admin/reassign_partitions
>> { "partitions":[ { "partition":2, "replicas":[ 1, 2 ],
>>"topic":"rb_monitor"
>> }, { "partition":0, "replicas":[ 2, 0 ], "topic":"rb_monitor" }, {
>> "partition":1, "replicas":[ 0, 1 ], "topic":"rb_monitor" }, {
>> "partition":0, "replicas":[ 0, 1 ], "topic":"rb_event" }, {
>>"partition":2,
>> "replicas":[ 2, 0 ], "topic":"rb_event" } ], "version":1 }
>>
>>
>> And it stays there forever
>>
>> What can I do ?
>>
>> Thanks
>>
>>
>>
>> Pablo
>>



Re: compression performance

2013-08-15 Thread Sriram Subramanian
We need to first decide on the right behavior before optimizing the
implementation.

Few key goals that I would put forward are -

1. Decoupling compression codec of the producer and the log
2. Ensuring message validity by the server on receiving bytes. This is
done by the iterator today and this is important to ensure bad data does
not creep in
3. Simple consumer implementation
4. Implementation that has good performance and efficiency

With the above points in mind, I suggest we switch to Snappy as the
default compression, optimize the code on the server end to avoid
unnecessary copies and remove producer side compression completely except
for cross DC sends.

On 8/15/13 11:28 AM, "Jay Kreps"  wrote:

>Here is a comment from Guozhang on this issue. He posted it on the
>compression byte-copying issue, but it is really about not needing to do
>compression. His suggestion is interesting though it ends up pushing more
>complexity into consumers.
>
>Guozhang Wang commented on KAFKA-527:
>-
>
>One alternative approach would be like this:
>
>Currently in the compression code (ByteBufferMessageSet.create), for each
>message we write 1) the incrementing logical offset in LONG, 2) the
>message
>byte size in INT, and 3) the message payload.
>
>The idea is that since the logical offset is just incrementing, with
>a compressed message, as long as we know the offset of the first message,
>we would know the offsets of the rest of the messages without even reading
>the offset field.
>
>So we can skip reading the offset of each message inside the compressed
>message and read only the offset of the wrapper message, which is the
>offset of the last message + 1; then in assignOffsets we just modify the
>offset of the wrapper message. Another change would be on the consumer
>side: the iterator would need to be smart about interpreting the offsets
>of messages while deep-iterating the compressed message.
>
>As Jay pointed out, this method would not work with log compaction since
>it would break the assumption that offsets increment continuously. Two
>workarounds for this issue:
>
>1) In log compaction, instead of deleting the to-be-deleted message, just
>set its payload to null but keep its header, hence keeping its slot in
>the incrementing offset sequence.
>2) During the compression process, instead of writing the absolute value
>of the logical offset of messages, write the deltas of their offsets
>compared with the offset of the wrapper message. So -1 would mean
>continuously decrementing from the wrapper message offset, and -2/3/...
>would mean skipping holes inside the compressed message.
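
A minimal sketch of the bookkeeping this implies, assuming (as described
above) that the wrapper carries the offset of the last inner message + 1
and that inner offsets increment by one; this is illustrative only, not
Kafka code:

    class InnerOffsets {
        // Recover the inner messages' offsets from the wrapper offset
        // without reading each message's offset field.
        static long[] of(long wrapperOffset, int messageCount) {
            long[] offsets = new long[messageCount];
            long first = wrapperOffset - messageCount; // wrapper = last + 1
            for (int i = 0; i < messageCount; i++) {
                offsets[i] = first + i;
            }
            return offsets;
        }
    }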
>
>
>On Fri, Aug 2, 2013 at 10:19 PM, Jay Kreps  wrote:
>
>> Chris commented in another thread about the poor compression performance
>> in 0.8, even with snappy.
>>
>> Indeed if I run the linear log write throughput test on my laptop I see
>> 75MB/sec with no compression and 17MB/sec with snappy.
>>
>> This is a little surprising as snappy claims 200MB/sec round-trip
>> performance (compress + uncompress) from java. So what is going on?
>>
>> Well you may remember I actually filed a bug a while back on all the
>> inefficient byte copying in the compression path (KAFKA-527). I didn't
>> think too much of it, other than it is a bit sloppy, since after all
>> computers are good at copying bytes, right?
>>
>> Turns out not so much. If you look at a profile of the standalone log
>> test you see that with no compression 80% of the time goes to
>> FileChannel.write, which is reasonable since that is what a log does.
>>
>> But with snappy enabled only 5% goes to writing data, 50% of the time
>> goes to byte copying and allocation, and only about 22% goes to actual
>> compression and decompression (with lots of misc stuff in there I
>> haven't bothered to tally).
>>
>> If someone were to optimize this code path I think we could take a
>> patch in 0.8.1. It shouldn't be too hard: just use the backing array on
>> the byte buffer and avoid all the input streams, output streams, byte
>> array output streams, and intermediate message blobs.
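
For illustration, a hedged sketch of the array-based direction suggested
here, using snappy-java directly on a heap buffer's backing array; it
assumes the buffer is exactly full, and a real patch would handle offsets
and limits:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import org.xerial.snappy.Snappy;

    class NoStreamCompression {
        // Compress a heap ByteBuffer's contents without stream wrappers or
        // intermediate ByteArrayOutputStream copies.
        static ByteBuffer compress(ByteBuffer messages) throws IOException {
            byte[] compressed = Snappy.compress(messages.array());
            return ByteBuffer.wrap(compressed);
        }
    }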
>>
>> I summarized this along with how to reproduce the test results here:
>> https://issues.apache.org/jira/browse/KAFKA-527
>>
>> -Jay
>>



Re: Stale TopicMetadata

2013-07-11 Thread Sriram Subramanian
We need to improve how metadata caching works in Kafka. Currently, we
have multiple places where we send the updated metadata to the individual
brokers from the controller when the state of the metadata changes. This
is hard to track. What we need is to let the metadata structure in the
controller automatically send updates when a given list of fields in the
metadata structure changes. This structure would be initialized on
startup with the list of fields that need to trigger metadata update
requests to the individual brokers. Once that is done, the structure
would take care of reporting state changes to the individual brokers. I
will follow up on this with a JIRA for the next release.
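
For anyone who wants to compare what a broker reports against zookeeper, a
rough sketch using the 0.8 javaapi (host, port, and topic are
placeholders):

    import java.util.Collections;
    import kafka.javaapi.PartitionMetadata;
    import kafka.javaapi.TopicMetadata;
    import kafka.javaapi.TopicMetadataRequest;
    import kafka.javaapi.TopicMetadataResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    public class MetadataCheck {
        public static void main(String[] args) {
            // Point this at any live broker.
            SimpleConsumer consumer =
                new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024, "metadata-check");
            try {
                TopicMetadataResponse resp =
                    consumer.send(new TopicMetadataRequest(Collections.singletonList("foozbar")));
                for (TopicMetadata tm : resp.topicsMetadata()) {
                    for (PartitionMetadata pm : tm.partitionsMetadata()) {
                        System.out.println("partition " + pm.partitionId()
                            + " leader=" + pm.leader() + " isr=" + pm.isr());
                    }
                }
            } finally {
                consumer.close();
            }
        }
    }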

On 7/11/13 10:53 AM, "Colin Blower"  wrote:

>Hm... the cache may explain some odd behavior I was seeing in our
>cluster yesterday.
>
>The zookeeper information about which nodes were in-sync replicas was
>different from the data I received in a metadata request response.
>Zookeeper said two nodes were in the ISR and the metadata response said
>only the leader was. This was after one of our two nodes went down, but
>when both were back up.
>
>I don't have the time to properly try to reproduce it, but it may be
>something to think about if you are looking at caching issues.
>
>
>On 07/10/2013 10:16 PM, Jun Rao wrote:
>> That's actually not expected. We should only return live brokers to the
>> client. It seems that we never clear the live broker cache in the
>> brokers. This is a bug. Could you file a jira?
>>
>> Thanks,
>>
>> Jun
>>
>>
>> On Wed, Jul 10, 2013 at 8:52 AM, Vinicius Carvalho <
>> viniciusccarva...@gmail.com> wrote:
>>
>>> Hi there. Once again, I don't think I could get the docs on another
>>> topic.
>>>
>>> So my nodejs client connects to the broker and the first thing it does
>>>is
>>> store the topic metadata:
>>>
>>> data received
>>> {
>>> "brokers": [
>>> {
>>> "nodeId": 0,
>>> "host": "10.139.245.106",
>>> "port": 9092,
>>> "byteLength": 24
>>> },
>>> {
>>> "nodeId": 1,
>>> "host": "localhost",
>>> "port": 9093,
>>> "byteLength": 19
>>> }
>>> ],
>>> "topicMetadata": [
>>> {
>>> "topicErrorCode": 0,
>>> "topicName": "foozbar",
>>> "partitions": [
>>> {
>>> "replicas": [
>>> 0
>>> ],
>>> "isr": [
>>> 0
>>> ],
>>> "partitionErrorCode": 0,
>>> "partitionId": 0,
>>> "leader": 0,
>>> "byteLength": 26
>>> },
>>> {
>>> "replicas": [
>>> 1
>>> ],
>>> "isr": [
>>> 1
>>> ],
>>> "partitionErrorCode": 0,
>>> "partitionId": 1,
>>> "leader": 1,
>>> "byteLength": 26
>>> },
>>> {
>>> "replicas": [
>>> 0
>>> ],
>>> "isr": [
>>> 0
>>> ],
>>> "partitionErrorCode": 0,
>>> "partitionId": 2,
>>> "leader": 0,
>>> "byteLength": 26
>>> },
>>> {
>>> "replicas": [
>>> 1
>>> ],
>>> "isr": [
>>> 1
>>> ],
>>> "partitionErrorCode": 0,
>>> "partitionId": 3,
>>> "leader": 1,
>>> "byteLength": 26
>>> },
>>> {
>>> "replicas": [
>>> 0
>>> ],
>>> "isr": [
>>> 0
>>> ],
>>> "partitionErrorCode": 0,
>>> "partitionId": 4,
>>> "leader": 0,
>>> "byteLength": 26
>>> }
>>> ],
>>> "byteLength": 145
>>> }
>>> ],
>>> "responseSize": 200,
>>> "correlationId": -1000
>>> }
>>>
>>> Ok, so far so good. So I kill node 0 on purpose, trying to simulate a
>>> broker failure, and then I fetch metadata again:
>>>
>>> data received
>>> {
>>> "brokers": [
>>> {
>>> "nodeId": 0,
>>> "host": "10.139.245.106",
>>> "port": 9092,
>>> "byteLength": 24
>>> },
>>> {
>>> "nodeId": 1,
>>> "host": "localhost",
>>> "port": 9093,
>>> "byteLe

Re: Does ShutdownBroker returns meaningful return code

2013-07-10 Thread Sriram Subramanian
One thing to note is that we do support controlled shutdown as part of the
regular shutdown hook in the broker. The wiki was not very clear w.r.t.
this, and I have updated it to convey that. You can turn on controlled
shutdown by setting "controlled.shutdown.enable" to true in the kafka
config. This will ensure that on shutdown the broker moves leadership over
and then gracefully shuts down. Two configs control the number of retries
and the backoff during shutdown: "controlled.shutdown.max.retries" and
"controlled.shutdown.retry.backoff.ms". Going forward we will be using
this approach to automate our shutdown.
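
A minimal server.properties sketch using the configs named above (the
retry values shown are illustrative, not recommendations):

    controlled.shutdown.enable=true
    controlled.shutdown.max.retries=3
    controlled.shutdown.retry.backoff.ms=5000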

On 7/10/13 11:05 PM, "Joel Koshy"  wrote:

>It's not ideal - right now we use the JMX operation (which returns an
>empty set on a successful controlled shutdown). If not, it returns a
>set containing the partitions still led by the broker. We retry
>(with appropriate intervals) until it succeeds. After that we do a
>regular broker shutdown (SIGTERM). i.e., it is currently not automated
>and it does take a while (an hour or so) to do a rolling bounce across
>a 16 node cluster with a few hundred topics. We could also use the
>inbuilt controlled shutdown feature on the broker to do the same thing
>- which is also better because the JMX port is not always open to
>remote access in production environments.
>
>It is possible to automate it to some degree - and if controlled
>shutdown fails after 'n' retries the automation policy could be to
>proceed with the unclean shutdown or abort and wait for manual
>intervention. Another issue is that when a broker is taken down there
>will be underreplicated partitions in the cluster. When the broker
>comes back up we should (ideally) wait until the underreplicated
>partition count goes back down to zero before proceeding to the next
>broker - otherwise that broker could take longer to do its controlled
>shutdown (since it needs to move leadership of partitions it leads to
>other replicas which would not be possible if the other replica is the
>broker that just came up). We currently don't have an easy way to
>integrate this seamlessly with the deployment system at Linkedin.
>
>Joel
>
>On Wed, Jul 10, 2013 at 9:48 PM, Vadim Keylis 
>wrote:
>> Joel. How do you guys do kafka service shutdown and startup?
>>
>>
>> On Wed, Jul 10, 2013 at 5:32 PM, Joel Koshy  wrote:
>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools
>>> has more details. The ShutdownBroker tool does not return anything.
>>> i.e., it does not exit with a System.exit code to pass back to a
>>> shell; it only logs whether controlled shutdown completed or not. You
>>> will need to configure the number of shutdown retries and the retry
>>> interval suitably for the controlled shutdown to complete. E.g., if
>>> you have a very large number of partitions led by a broker it may take
>>> more time to do a controlled shutdown on that broker (because leaders
>>> need to be moved). (This is yet another reason why we recommend that
>>> you maintain an even distribution of leaders - this can be
>>> accomplished by running the PreferredReplicaLeaderElection command.)
>>>
>>> On Wed, Jul 10, 2013 at 4:06 PM, Vadim Keylis 
>>> wrote:
>>> > We have deployed kafka 0.8 beta1. It was my understanding that the
>>> > ShutdownBroker program needs to be used to initiate a proper
>>> > shutdown of the server. We are going to use this script in an
>>> > automated fashion. Does the script return a meaningful error code
>>> > that can be captured by the calling script and acted upon? What are
>>> > other proper ways to shut down kafka 0.8?
>>> >
>>> > Thanks,
>>> > Vadim
>>>



Re: Error when processing messages in Windows

2013-07-09 Thread Sriram Subramanian
As far as I am aware it is not possible to resize a mapped buffer without
unmapping it on Windows. W.r.t. Java, the bug below gives more context on
why it does not support a synchronous unmap function.

http://bugs.sun.com/view_bug.do?bug_id=4724038
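
For what it's worth, a hedged sketch of the reflection-based workaround
sometimes used on Sun/Oracle JVMs of this era: it reaches into the
non-public sun.misc.Cleaner attached to the buffer, so it is unsupported
and may break on other JVMs.

    import java.lang.reflect.Method;
    import java.nio.MappedByteBuffer;

    class Unmapper {
        // Force-unmap a MappedByteBuffer instead of waiting for the GC to
        // collect it, so the underlying file can be resized.
        static void forceUnmap(MappedByteBuffer buffer) throws Exception {
            Method cleanerMethod = buffer.getClass().getMethod("cleaner");
            cleanerMethod.setAccessible(true);
            Object cleaner = cleanerMethod.invoke(buffer);
            Method cleanMethod = cleaner.getClass().getMethod("clean");
            cleanMethod.setAccessible(true);
            cleanMethod.invoke(cleaner);
        }
    }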



On 7/9/13 9:54 PM, "Jay Kreps"  wrote:

>The problem appears to be that we are resizing a memory-mapped file,
>which it looks like Windows does not allow (which is kind of sucky).
>
>The offending method is OffsetIndex.resize().
>
>The most obvious fix would be to first unmap the file, then resize, then
>remap it. We can't do this though because Java actually doesn't support
>unmapping files (it does this lazily with garbage collection, which really
>sucks). In fact, as far as I know there is NO way to guarantee an unmap
>occurs at a particular time, so if this is correct and Windows doesn't
>allow resizing, then this combination of suckiness means that there is no
>way to resize a file that has ever been mapped, short of closing the
>process.
>
>I actually don't have access to a Windows machine so it is a little hard
>for me to test this. The question is whether there is any workaround. I
>am happy to change that method, but we do need to be able to resize
>memory-mapped files.
>
>
>
>
>
>
>On Tue, Jul 9, 2013 at 9:04 PM, Jun Rao  wrote:
>
>> Hmm, not sure what the issue is. Any windows user wants to chime in?
>>
>> Thanks,
>>
>> Jun
>>
>>
>> On Tue, Jul 9, 2013 at 9:00 AM, Denny Lee  wrote:
>>
>> > Hey Jun,
>> >
>> > We've been running into this issue when running perf.Performance as
>> > per http://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/.
>> > When running it with 100K messages, it works fine on Windows at about
>> > 20-30K msg/s.  But when running it with 1M messages, the broker fails
>> > with the message below.  Neither modifying the JVM memory
>> > configuration nor running on SSDs appears to have any effect.  As for
>> > JVMs - no plug-ins, and we've tried both 1.6 and OpenJDK 1.7.
>> >
>> > This looks like a JVM memory-map issue on Windows - perhaps running
>> > some System.gc() could prevent the roll?
>> >
>> > Any thoughts?
>> >
>> > Thanks!
>> > Denny
>> >
>> >
>> >
>> >
>> > On 7/9/13 7:55 AM, "Jun Rao"  wrote:
>> >
>> > >A couple of users seem to be able to get 0.8 working on Windows.
>> > >Anything special about your Windows environment? Are you using any
>> > >JVM plugins?
>> > >
>> > >Thanks,
>> > >
>> > >Jun
>> > >
>> > >
>> > >On Tue, Jul 9, 2013 at 12:59 AM, Timothy Chen 
>> wrote:
>> > >
>> > >> Hi all,
>> > >>
>> > >> I've tried pushing a large number of messages into Kafka on
>> > >> Windows, and got the following error:
>> > >>
>> > >> Caused by: java.io.IOException: The requested operation cannot be
>> > >> performed on a file with a user-mapped section open
>> > >> at java.io.RandomAccessFile.setLength(Native Method)
>> > >> at kafka.log.OffsetIndex.liftedTree2$1(OffsetIndex.scala:263)
>> > >> at kafka.log.OffsetIndex.resize(OffsetIndex.scala:262)
>> > >> at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:247)
>> > >> at kafka.log.Log.rollToOffset(Log.scala:518)
>> > >> at kafka.log.Log.roll(Log.scala:502)
>> > >> at kafka.log.Log.maybeRoll(Log.scala:484)
>> > >> at kafka.log.Log.append(Log.scala:297)
>> > >> ... 19 more
>> > >>
>> > >> I suspect that Windows is not releasing memory-mapped file
>> > >> references soon enough.
>> > >>
>> > >> I wonder if there is any good workaround or solutions for this?
>> > >>
>> > >> Thanks!
>> > >>
>> > >> Tim
>> > >>
>> >
>> >
>> >
>>



Re: site updates

2013-07-01 Thread Sriram Subramanian
>> > ; documentation="The port used by the kafka
>> > broker to handle requests.")
>> > If we did it this way we could actually have a dumpConfigs method that
>> > would print out the up-to-date table of configs so we could more
>> > easily keep the docs in sync.
>> >
>> > For the time being, though, we should just keep the docs updated.
>> >
>> > -Jay
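
A minimal sketch of the idea above (the names are hypothetical, not an
actual Kafka API): each config definition carries its documentation
string, so dumpConfigs can generate the table in the docs instead of it
being maintained by hand:

    import java.util.LinkedHashMap;
    import java.util.Map;

    class ConfigDefs {
        static final Map<String, String> DOCS = new LinkedHashMap<>();

        static int defineInt(String name, int defaultValue, String documentation) {
            DOCS.put(name, documentation);
            return defaultValue;
        }

        static final int PORT = defineInt("port", 9092,
            "The port used by the kafka broker to handle requests.");

        // Print an up-to-date table of configs for the docs.
        static void dumpConfigs() {
            for (Map.Entry<String, String> e : DOCS.entrySet()) {
                System.out.println(e.getKey() + "\t" + e.getValue());
            }
        }
    }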
>> >
>> > On Fri, Jun 28, 2013 at 5:55 PM, Joel Koshy 
>>wrote:
>> > > Looks good overall - thanks a lot for the improvements.
>> > >
>> > > Couple of comments: clicking on the 0.7 link goes to the migration
>> > > page (which should probably be on the 0.8 link).
>> > > Also, for the configuration.html file, I used to find the old scala
>> > > docs pointing to the actual *Config classes more current and
>> > > informative. The site can drift over time.
>> > >
>> > > Joel
>> > >
>> > > On Fri, Jun 28, 2013 at 2:49 PM, Sriram Subramanian
>> > >  wrote:
>> > >>
>> > >>
>> > >> On 6/28/13 2:48 PM, "Sriram Subramanian"
>>
>> > >> wrote:
>> > >>
>> > >>>1. I have moved the FAQ to a wiki. I have separated the sections
>>into
>> > >>>producer, consumers and broker related questions. I would still
>>need
>> to
>> > >>>add replication FAQ. The main FAQ will now link to this. Let me
>>know
>> if
>> > >>>you guys have better ways of representing the FAQ.
>> > >>>
>> > >>>https://cwiki.apache.org/confluence/display/KAFKA/FAQ
>> > >>>
>> > >>>
>> > >>>2. I yanked the implementation part of the design doc and added it
>>as
>> a
>> > >>>separate section for 0.7. We need to add similar section for 0.8.
>> > >>>
>> > >>>3. I also made the migration link directly point to the wiki. It
>>might
>> > >>>also make sense to convert the wiki to an html page.
>> > >>>
>> > >>>
>> > >>>
>> > >>>On 6/27/13 7:17 PM, "Sriram Subramanian"
>>
>> > >>>wrote:
>> > >>>
>> > >>>>Looks much better.
>> > >>>>
>> > >>>>1. We need to update FAQ for 0.8
>> > >>>>2. We should probably have a separate section for implementation.
>> > >>>>3. The migration tool explanation seems to be hard to get to.
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>On 6/27/13 5:40 PM, "Jay Kreps"  wrote:
>> > >>>>
>> > >>>>>Hey Folks,
>> > >>>>>
>> > >>>>>I did a pass on the website. Changes:
>> > >>>>>1. Rewrote the 0.8 quickstart and included a section on running
>>in
>> > >>>>>distributed mode.
>> > >>>>>2. Fixed up the styles a bit.
>> > >>>>>3. Fixed the bizarre menu thing with 0.7 and 0.8 specific docs.
>> > >>>>>4. Re-wrote the copy on the front page.
>> > >>>>>
>> > >>>>>I would love to get any feedback on how we could improve the
>>site,
>> > >>>>>documentation, etc.
>> > >>>>>
>> > >>>>>I would like to do at least the following:
>> > >>>>>1. Generate scaladoc just for the client classes.
>> > >>>>>2. See if there isn't some way to generate javadoc for the java
>>api
>> > >>>>>3. Rewrite the design document for 0.8
>> > >>>>>4. Update the operations guide to cover 0.8
>> > >>>>>
>> > >>>>>-Jay
>> > >>>>
>> > >>>
>> > >>
>> >
>>



Re: site updates

2013-06-28 Thread Sriram Subramanian


On 6/28/13 2:48 PM, "Sriram Subramanian" 
wrote:

>1. I have moved the FAQ to a wiki. I have separated the sections into
>producer, consumers and broker related questions. I would still need to
>add replication FAQ. The main FAQ will now link to this. Let me know if
>you guys have better ways of representing the FAQ.
>
>https://cwiki.apache.org/confluence/display/KAFKA/FAQ
>
>
>2. I yanked the implementation part of the design doc and added it as a
>separate section for 0.7. We need to add similar section for 0.8.
>
>3. I also made the migration link directly point to the wiki. It might
>also make sense to convert the wiki to an html page.
>
>
>
>On 6/27/13 7:17 PM, "Sriram Subramanian" 
>wrote:
>
>>Looks much better.
>>
>>1. We need to update FAQ for 0.8
>>2. We should probably have a separate section for implementation.
>>3. The migration tool explanation seems to be hard to get to.
>>
>>
>>
>>On 6/27/13 5:40 PM, "Jay Kreps"  wrote:
>>
>>>Hey Folks,
>>>
>>>I did a pass on the website. Changes:
>>>1. Rewrote the 0.8 quickstart and included a section on running in
>>>distributed mode.
>>>2. Fixed up the styles a bit.
>>>3. Fixed the bizarre menu thing with 0.7 and 0.8 specific docs.
>>>4. Re-wrote the copy on the front page.
>>>
>>>I would love to get any feedback on how we could improve the site,
>>>documentation, etc.
>>>
>>>I would like to do at least the following:
>>>1. Generate scaladoc just for the client classes.
>>>2. See if there isn't some way to generate javadoc for the java api
>>>3. Rewrite the design document for 0.8
>>>4. Update the operations guide to cover 0.8
>>>
>>>-Jay
>>
>



Re: site updates

2013-06-28 Thread Sriram Subramanian
1. I have moved the FAQ to a wiki. I have separated the sections into
producer, consumers and broker related questions. I would still need to
add replication FAQ. The main FAQ will now link to this. Let me know if
you guys have better ways of representing the FAQ.

https://cwiki.apache.org/confluence/display/KAFKA/FAQ


2. I yanked the implementation part of the design doc and added it as a
separate section for 0.7. We need to add similar section for 0.8.

3. I also made the migration link directly point to the wiki. It might
also make sense to convert the wiki to an html page.



On 6/27/13 7:17 PM, "Sriram Subramanian" 
wrote:

>Looks much better.
>
>1. We need to update FAQ for 0.8
>2. We should probably have a separate section for implementation.
>3. The migration tool explanation seems to be hard to get to.
>
>
>
>On 6/27/13 5:40 PM, "Jay Kreps"  wrote:
>
>>Hey Folks,
>>
>>I did a pass on the website. Changes:
>>1. Rewrote the 0.8 quickstart and included a section on running in
>>distributed mode.
>>2. Fixed up the styles a bit.
>>3. Fixed the bizarre menu thing with 0.7 and 0.8 specific docs.
>>4. Re-wrote the copy on the front page.
>>
>>I would love to get any feedback on how we could improve the site,
>>documentation, etc.
>>
>>I would like to do at least the following:
>>1. Generate scaladoc just for the client classes.
>>2. See if there isn't some way to generate javadoc for the java api
>>3. Rewrite the design document for 0.8
>>4. Update the operations guide to cover 0.8
>>
>>-Jay
>



Re: site updates

2013-06-27 Thread Sriram Subramanian
Looks much better.

1. We need to update FAQ for 0.8
2. We should probably have a separate section for implementation.
3. The migration tool explanation seems to be hard to get to.



On 6/27/13 5:40 PM, "Jay Kreps"  wrote:

>Hey Folks,
>
>I did a pass on the website. Changes:
>1. Rewrote the 0.8 quickstart and included a section on running in
>distributed mode.
>2. Fixed up the styles a bit.
>3. Fixed the bizarre menu thing with 0.7 and 0.8 specific docs.
>4. Re-wrote the copy on the front page.
>
>I would love to get any feedback on how we could improve the site,
>documentation, etc.
>
>I would like to do at least the following:
>1. Generate scaladoc just for the client classes.
>2. See if there isn't some way to generate javadoc for the java api
>3. Rewrite the design document for 0.8
>4. Update the operations guide to cover 0.8
>
>-Jay



Re: batch sending in sync mode

2013-06-24 Thread Sriram Subramanian
The messages will be grouped by their destination broker and further
grouped by topic/partition. The send then happens to each broker with a
list of topic/partitions and the messages for them, and waits for an
acknowledgement from each broker. This happens sequentially, so the
messages are acknowledged in batches per broker. Going forward, we are
looking into providing some kind of callback mechanism for the async mode
after a successful send that would help one perform further processing.
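
A rough sketch of that grouping (the types here are simplified stand-ins,
not the actual producer internals):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class Msg {
        final int leaderBrokerId;    // broker that leads the target partition
        final String topicPartition; // e.g. "mytopic-0"
        final byte[] payload;
        Msg(int b, String tp, byte[] p) {
            leaderBrokerId = b; topicPartition = tp; payload = p;
        }
    }

    class Grouping {
        // Group messages by destination broker, then by topic/partition;
        // each broker then gets one request, acknowledged as a batch.
        static Map<Integer, Map<String, List<Msg>>> byBroker(List<Msg> msgs) {
            Map<Integer, Map<String, List<Msg>>> grouped = new HashMap<>();
            for (Msg m : msgs) {
                grouped.computeIfAbsent(m.leaderBrokerId, b -> new HashMap<>())
                       .computeIfAbsent(m.topicPartition, tp -> new ArrayList<>())
                       .add(m);
            }
            return grouped; // one synchronous send (and ack) per broker entry
        }
    }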

On 6/24/13 1:29 AM, "Jason Rosenberg"  wrote:

>I have been using async mode with 0.7.2, but I'm wondering if I should
>switch to sync mode, so I can use the new request.required.acks mode in a
>sensible way.
>
>I am already managing an async queue that then dispatches to the samsa
>producer.
>
>I'm wondering how the acknowledgement mode works when sending lists of
>messages with a synchronous producer.  Will each message in the list be
>individually acknowledged, or is there a chance for it to be acknowledged
>as a single batch (which would appear to be more desirable for efficiency
>reasons)?
>
>Thanks,
>
>Jason



Re: preferred leader election...

2013-06-23 Thread Sriram Subramanian
Hi Jason,

A rolling bounce will create an imbalance in the distribution of leaders
across the brokers, and this is not ideal. We do plan to integrate the
preferred leader election tool into kafka so that it periodically
balances the leader count across the brokers in the cluster. For now you
are left with the standalone tool, which you should be able to script to
be invoked whenever you need it.
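
Until then, a rough sketch of scripting the tool (hosts are placeholders,
assuming the 0.8 tool layout); it could be run from cron, e.g. hourly:

    # triggers preferred leader election for all partitions
    bin/kafka-preferred-replica-election.sh --zookeeper zk1:2181,zk2:2181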

On 6/23/13 2:10 AM, "Jason Rosenberg"  wrote:

>So, I'm running into the case where after issuing a rolling restart, with
>controlled shutdown enabled, the last server restarted ends up without any
>partitions that it's the leader of.  This is more pronounced of course if
>I
>have only 2 servers in the cluster (during testing).  I presume it's kind
>of a problem to have imbalance in the number of partitions each server is
>leader of?  (Or maybe not such a big deal, since the servers still need to
>forward messages between themselves anyway?).
>
>So, I was looking at the 'preferred leader election' tool here:
>
>https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools
>
>Is there a way to programmatically invoke this? I was thinking it makes
>sense to do it periodically (e.g. once an hour), and also after a restart.
> But I'd like it to happen from within my app (which contains kafka),
>rather than invoking an external tool.
>
>Thanks,
>
>Jason



Re: produce request failed: due to Leader not local for partition

2013-06-23 Thread Sriram Subramanian
Hey Jason,

The producer on failure initiates a metadata request to refresh its state
and should issue subsequent requests to the new leader. The errors that
you see should only happen once per topic partition per producer. Let me
know if this is not what you see. On the producer end you should see the
following info logging -

"Back off for x ms before retrying send. Remaining retries = y"

If all the producer's retries fail, you should see the error message
below -

"Failed to send requests for topics"



On 6/23/13 1:45 AM, "Jason Rosenberg"  wrote:

>I'm working on having seamless rolling restarts for my kafka
>servers, running 0.8.  I have it so that each server will be restarted
>sequentially.  Each server takes itself out of the load balancer (e.g.
>sets
>a status that the lb will recognize, and then waits more than long enough
>for the lb to stop sending meta-data requests to that server).  Then I
>initiate the shutdown (with controlled.shutdown.enable=true).  This seems
>to work well, however, I occasionally see warnings like this in the log
>from the server, after restart:
>
>2013-06-23 08:28:46,770  WARN [kafka-request-handler-2] server.KafkaApis -
>[KafkaApi-508818741] Produce request with correlation id 7136261 from
>client  on partition [mytopic,0] failed due to Leader not local for
>partition [mytopic,0] on broker 508818741
>
>This WARN seems to persistently repeat, until the producer client
>initiates
>a new meta-data request (e.g. every 10 minutes, by default).  However, the
>producer doesn't log any errors/exceptions when the server is logging this
>WARN.
>
>What's happening here?  Is the message silently being forwarded on to the
>correct leader for the partition?  Is the message dropped?  Are these
>WARNs particularly useful?
>
>Thanks,
>
>Jason