Re: [ANNOUNCE] New Kafka PMC member: Matthias J. Sax

2019-04-22 Thread Becket Qin
Congrats, Matthias!

On Sat, Apr 20, 2019 at 10:28 AM Matthias J. Sax 
wrote:

> Thank you all!
>
> -Matthias
>
>
> On 4/19/19 3:58 PM, Lei Chen wrote:
> > Congratulations Matthias! Well deserved!
> >
> > -Lei
> >
> > On Fri, Apr 19, 2019 at 2:55 PM James Cheng wrote:
> >
> > Congrats!!
> >
> > -James
> >
> > Sent from my iPhone
> >
> > > On Apr 18, 2019, at 2:35 PM, Guozhang Wang wrote:
> > >
> > > Hello Everyone,
> > >
> > > I'm glad to announce that Matthias J. Sax is now a member of the Kafka
> > > PMC.
> > >
> > > Matthias has been a committer since Jan. 2018, and since then he has
> > > continued to be active in the community and made significant
> > > contributions to the project.
> > >
> > >
> > > Congratulations to Matthias!
> > >
> > > -- Guozhang
> >
>
>


Re: [ANNOUNCE] New Committer: Randall Hauch

2019-02-17 Thread Becket Qin
Congratulations, Randall!

On Sat, Feb 16, 2019 at 2:44 AM Matthias J. Sax 
wrote:

> Congrats Randall!
>
>
> -Matthias
>
> On 2/14/19 6:16 PM, Guozhang Wang wrote:
> > Hello all,
> >
> > The PMC of Apache Kafka is happy to announce another new committer
> joining
> > the project today: we have invited Randall Hauch as a project committer
> and
> > he has accepted.
> >
> > Randall has been participating in the Kafka community for the past 3
> years,
> > and is well known as the founder of the Debezium project, a popular
> project
> > for database change-capture streams using Kafka (https://debezium.io).
> More
> > recently he has become the main person keeping Kafka Connect moving
> > forward, participated in nearly all KIP discussions and QAs on the
> mailing
> > list. He's authored 6 KIPs and authored 50 pull requests and conducted
> over
> > a hundred reviews around Kafka Connect, and has also been evangelizing
> > Kafka Connect at several Kafka Summit venues.
> >
> >
> > Thank you very much for your contributions to the Connect community
> Randall
> > ! And looking forward to many more :)
> >
> >
> > Guozhang, on behalf of the Apache Kafka PMC
> >
>
>


[ANNOUNCE] New Committer: Dong Lin

2018-03-28 Thread Becket Qin
Hello everyone,

The PMC of Apache Kafka is pleased to announce that Dong Lin has accepted
our invitation to be a new Kafka committer.

Dong started working on Kafka about four years ago, and since then he has
contributed numerous features and patches. His work on Kafka core has been
consistent and important. Among his contributions, most notably, Dong
developed JBOD support (KIP-112, KIP-113) to handle disk failures and reduce
overall cost, and added the deleteDataBefore() API (KIP-107) to allow users to
actively remove old messages. Dong has also been active in the community,
participating in KIP discussions and doing code reviews.

Congratulations, and looking forward to your future contributions, Dong!

Jiangjie (Becket) Qin, on behalf of Apache Kafka PMC


Re: Streams meetup@LinkedIn (Mar 21)

2018-03-15 Thread Becket Qin
Bump. We are going to have the following talks:

6:35-7:10 PM: Apache Pulsar - The next generation messaging system (Karthik
Ramasamy, Co-Founder at Streamlio)
7:15-7:50 PM: Conquering the Lambda architecture in LinkedIn metrics
platform with Apache Calcite and Apache Samza (Khai Tran, Staff Software
Engineer, LinkedIn)
7:55-8:30 PM: Building Venice with Apache Kafka & Samza (Gaojie Liu, Senior
Software Engineer, LinkedIn)

Please use the following link to RSVP if you are interested.
https://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/248309045/

Thanks,

Jiangjie (Becket) Qin

On Sat, Mar 10, 2018 at 2:08 PM, Becket Qin <becket@gmail.com> wrote:

> Hi Kafka users and developers,
>
> We are going to host our quarterly Stream Processing Meetup@LinkedIn on
> Mar 21. There will be three talks about Apache Pulsar, Apache Calcite and
> Apache Samza, as well as LinkedIn's latest K-V store Venice built on top of
> Apache Kafka and Apache Samza.
>
> Please check the details below if you are interested.
>
> https://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/248309045/
>
> Thanks,
>
> Jiangjie (Becket) Qin
>


Streams meetup@LinkedIn (Mar 21)

2018-03-10 Thread Becket Qin
Hi Kafka users and developers,

We are going to host our quarterly Stream Processing Meetup@LinkedIn on Mar
21. There will be three talks about Apache Pulsar, Apache Calcite and
Apache Samza, as well as LinkedIn's latest K-V store Venice built on top of
Apache Kafka and Apache Samza.

Please check the details below if you are interested.

https://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/248309045/

Thanks,

Jiangjie (Becket) Qin


Re: [ANNOUNCE] New Kafka PMC Member: Rajini Sivaram

2018-01-17 Thread Becket Qin
Congratulations, Rajini!

On Wed, Jan 17, 2018 at 1:52 PM, Ismael Juma  wrote:

> Congratulations Rajini!
>
> On 17 Jan 2018 10:49 am, "Gwen Shapira"  wrote:
>
> Dear Kafka Developers, Users and Fans,
>
> Rajini Sivaram became a committer in April 2017.  Since then, she remained
> active in the community and contributed major patches, reviews and KIP
> discussions. I am glad to announce that Rajini is now a member of the
> Apache Kafka PMC.
>
> Congratulations, Rajini and looking forward to your future contributions.
>
> Gwen, on behalf of Apache Kafka PMC
>


Re: Stream Processing Meetup@LinkedIn (Dec 4th)

2017-11-17 Thread Becket Qin
Hi Paolo,

Yes, we will stream the meetup. Usually the link will be posted to the
meetup website a couple of hours before the meetup. Feel free to ping me if
you don't see it.

Thanks,

Jiangjie (Becket) Qin

On Fri, Nov 17, 2017 at 11:59 AM, Paolo Patierno <ppatie...@live.com> wrote:

> Hi Becket,
> I watched some of these meetups on the related YouTube channel in the past.
> Will be it available in streaming or just recorded for watching it later ?
>
> Thanks
> Paolo
> ________
> From: Becket Qin <becket@gmail.com>
> Sent: Friday, November 17, 2017 8:33:04 PM
> To: d...@kafka.apache.org; users@kafka.apache.org
> Subject: Stream Processing Meetup@LinkedIn (Dec 4th)
>
> Hi Kafka users and developers,
>
> We are going to host our quarterly Stream Processing Meetup@LinkedIn on
> Dec
> 4. There will be three speakers from Slack, Uber and LinkedIn. Please check
> the details below if you are interested.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> *Stream Processing with Apache Kafka & Apache Samza*
>
>    - Meetup Link:
> https://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/244889719/
>    - When: Dec 4th 2017 @ 6:00pm
>    - Where: LinkedIn Building F, 605 West Maude Avenue, Sunnyvale, CA
>
>
> *Abstract*
>
>1. Stream processing using Samza-SQL @ LinkedIn
>
> *Speaker: Srinivasulu Punuru, LinkedIn*
> Imagine if you can develop and run a stream processing job in few minutes
> and Imagine if a vast majority of your organization (business analysts,
> Product manager, Data scientists) can do this on their own without a need
> for a development team.
> Need for real time insights into the big data is increasing at a rapid
> pace. The traditional Java based development model of developing, deploying
> and managing the stream processing application is becoming a huge
> constraint.
> With Samza SQL we can simplify application development by enabling users to
> create stream processing applications and get real time insights into their
> business using SQL statements.
>
> In this talk we try to answer the following questions
>
>1. How SQL language can be used to perform stream processing?
>2. How is Samza SQL implemented - Architecture?
>3. How can you deploy Samza SQL in your company?
>
>
> 2.   Streaming data pipeline @ Slack
> *Speaker:- Ananth Packkildurai, Slack*
> *Abstract:  *Slack is a communication and collaboration platform for teams.
> Our millions of users spend 10+ hrs connected to the service on a typical
> working day. They expect reliability, low latency, and extraordinarily rich
> client experiences across a wide variety of devices and network conditions.
> It is crucial for the developers to get the realtime insights on Slack
> operational metrics.
> In this talk, I will talk about how our data platform evolves from the
> batch system to near realtime. I will also touch base on how Samza helps us
> to build low latency data pipelines & Experimentation framework.
>
> 3.   Improving Kafka at-least-once performance
> *Speaker: Ying Zheng, Uber*
> *Abstract:*
> At Uber, we are seeing an increased demand for Kafka at-least-once
> delivery. So far, we are running a dedicated at-least-once Kafka cluster
> with special settings. With a very low workload, the dedicated
> at-least-once cluster has been working well for more than a year. Now, when
> we want to turn on at-least-once producing on all the Kafka clusters, the
> at-least-once producing performance is one of the concerns. I have worked a
> couple of months to investigate the Kafka performance issues. With Kafka
> code changes and Kafka / Java configuration changes, I have reduced
> at-least-once producing latency by about 60% to 70%. Some of those
> improvements can also improve the general Kafka throughput or reducing
> end-to-end Kafka latency, when ack = 0 or ack = 1.
>


Stream Processing Meetup@LinkedIn (Dec 4th)

2017-11-17 Thread Becket Qin
Hi Kafka users and developers,

We are going to host our quarterly Stream Processing Meetup@LinkedIn on Dec
4. There will be three speakers from Slack, Uber and LinkedIn. Please check
the details below if you are interested.

Thanks,

Jiangjie (Becket) Qin

*Stream Processing with Apache Kafka & Apache Samza*

   - Meetup Link:
   https://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/244889719/
   - When: Dec 4th 2017 @ 6:00pm
   - Where: LinkedIn Building F, 605 West Maude Avenue, Sunnyvale, CA


*Abstract*

   1. Stream processing using Samza-SQL @ LinkedIn

*Speaker: Srinivasulu Punuru, LinkedIn*
Imagine if you could develop and run a stream processing job in a few
minutes, and imagine if the vast majority of your organization (business
analysts, product managers, data scientists) could do this on their own
without needing a development team.
The need for real-time insights into big data is increasing at a rapid
pace. The traditional Java-based development model of developing, deploying
and managing stream processing applications is becoming a huge
constraint.
With Samza SQL we can simplify application development by enabling users to
create stream processing applications and get real-time insights into their
business using SQL statements.

In this talk we try to answer the following questions:

   1. How can SQL be used to perform stream processing?
   2. How is Samza SQL implemented (architecture)?
   3. How can you deploy Samza SQL in your company?


2.   Streaming data pipeline @ Slack
*Speaker: Ananth Packkildurai, Slack*
*Abstract:* Slack is a communication and collaboration platform for teams.
Our millions of users spend 10+ hours connected to the service on a typical
working day. They expect reliability, low latency, and extraordinarily rich
client experiences across a wide variety of devices and network conditions.
It is crucial for developers to get real-time insights into Slack
operational metrics.
In this talk, I will describe how our data platform evolved from a batch
system to near real time. I will also touch on how Samza helps us build
low-latency data pipelines and an experimentation framework.

3.   Improving Kafka at-least-once performance
*Speaker: Ying Zheng, Uber*
*Abstract:*
At Uber, we are seeing an increased demand for Kafka at-least-once
delivery. So far, we have been running a dedicated at-least-once Kafka
cluster with special settings. With a very low workload, the dedicated
at-least-once cluster has been working well for more than a year. Now that
we want to turn on at-least-once producing on all the Kafka clusters,
at-least-once producing performance is one of the concerns. I have spent a
couple of months investigating Kafka performance issues. With Kafka code
changes and Kafka / Java configuration changes, I have reduced
at-least-once producing latency by about 60% to 70%. Some of those
improvements can also improve general Kafka throughput or reduce
end-to-end Kafka latency when acks = 0 or acks = 1.


Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread Becket Qin
Congrats, Onur!

On Mon, Nov 6, 2017 at 2:45 PM, James Cheng  wrote:

> Congrats Onur! Well deserved!
>
> -James
>
> > On Nov 6, 2017, at 9:24 AM, Jun Rao  wrote:
> >
> > Hi, everyone,
> >
> > The PMC of Apache Kafka is pleased to announce a new Kafka committer Onur
> > Karaman.
> >
> > Onur's most significant work is the improvement of Kafka controller,
> which
> > is the brain of a Kafka cluster. Over time, we have accumulated quite a
> few
> > correctness and performance issues in the controller. There have been
> > attempts to fix controller issues in isolation, which would make the code
> > base more complicated without a clear path of solving all problems. Onur
> is
> > the one who took a holistic approach, by first documenting all known
> > issues, writing down a new design, coming up with a plan to deliver the
> > changes in phases and executing on it. At this point, Onur has completed
> > the two most important phases: making the controller single threaded and
> > changing the controller to use the async ZK api. The former fixed
> multiple
> > deadlocks and race conditions. The latter significantly improved the
> > performance when there are many partitions. Experimental results show
> that
> > Onur's work reduced the controlled shutdown time by a factor of 100 times
> > and the controller failover time by a factor of 3 times.
> >
> > Congratulations, Onur!
> >
> > Thanks,
> >
> > Jun (on behalf of the Apache Kafka PMC)
>
>


Streams Processing Meetup @LinkedIn

2017-05-08 Thread Becket Qin
Hello!

LinkedIn will be hosting a Streams Processing meetup on Wednesday, May 24,
6-9 PM in our Sunnyvale HQ.

We'll have 3 exciting talks planned:


   - Streaming Data Pipelines with Brooklin
   - Kafka at Half the Price
   - Managed or Standalone, Streaming or Batch; Unified Processing with
   Samza Fluent API

Please find additional details (food, streaming, location, etc.) and RSVP
via our invitation on meetup.com. Hope to see you there!

Thank you,
Streams Infra @ LinkedIn


Re: [ANNOUNCE] New committer: Rajini Sivaram

2017-04-24 Thread Becket Qin
Congratulations, Rajini! Great work!

On Mon, Apr 24, 2017 at 3:33 PM, Jason Gustafson  wrote:

> Woohoo! Great work, Rajini!
>
> On Mon, Apr 24, 2017 at 3:27 PM, Jun Rao  wrote:
>
> > Congratulations, Rajini ! Thanks for all your contributions.
> >
> > Jun
> >
> > On Mon, Apr 24, 2017 at 2:06 PM, Gwen Shapira  wrote:
> >
> > > The PMC for Apache Kafka has invited Rajini Sivaram as a committer and
> we
> > > are pleased to announce that she has accepted!
> > >
> > > Rajini contributed 83 patches, 8 KIPs (all security and quota
> > > improvements) and a significant number of reviews. She is also on the
> > > conference committee for Kafka Summit, where she helped select content
> > > for our community event. Through her contributions she's shown good
> > > judgement, good coding skills, willingness to work with the community
> on
> > > finding the best
> > > solutions and very consistent follow through on her work.
> > >
> > > Thank you for your contributions, Rajini! Looking forward to many more
> :)
> > >
> > > Gwen, for the Apache Kafka PMC
> > >
> >
>


Re: [ANNOUNCE] Apache Kafka 0.10.2.0 Released

2017-02-22 Thread Becket Qin
Thanks Ewen :)

On Wed, Feb 22, 2017 at 5:15 AM, Kenny Gorman  wrote:

> We are excited about this release! Excellent work!
>
> Thanks
> Kenny Gorman
> www.eventador.io
>
> > On Feb 22, 2017, at 2:33 AM, Ewen Cheslack-Postava 
> wrote:
> >
> > The Apache Kafka community is pleased to announce the release for Apache
> > Kafka 0.10.2.0. This is a feature release which includes the completion
> > of 15 KIPs, over 200 bug fixes and improvements, and more than 500 pull
> > requests merged.
> >
> > All of the changes in this release can be found in the release notes:
> > https://archive.apache.org/dist/kafka/0.10.2.0/RELEASE_NOTES.html
> >
> > Apache Kafka is a distributed streaming platform with four core
> > APIs:
> >
> > ** The Producer API allows an application to publish a stream of records to
> > one or more Kafka topics.
> >
> > ** The Consumer API allows an application to subscribe to one or more
> > topics and process the stream of records produced to them.
> >
> > ** The Streams API allows an application to act as a stream processor,
> > consuming an input stream from one or more topics and producing an
> > output
> > stream to one or more output topics, effectively transforming the input
> > streams to output streams.
> >
> > ** The Connector API allows building and running reusable producers or
> > consumers that connect Kafka topics to existing applications or data
> > systems. For example, a connector to a relational database might capture
> > every change to a table.
> >
> >
> > With these APIs, Kafka can be used for two broad classes of application:
> >
> > ** Building real-time streaming data pipelines that reliably get data
> > between systems or applications.
> >
> > ** Building real-time streaming applications that transform or react to
> > the
> > streams of data.
> >
> >
> > You can download the source release from
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.0/kafka-0.10.2.0-src.tgz
> >
> > and binary releases from
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.0/kafka_2.11-0.10.2.0.tgz
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.0/kafka_2.10-0.10.2.0.tgz
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.0/kafka_2.12-0.10.2.0.tgz
> > (experimental 2.12 artifact)
> >
> > Thanks to the 101 contributors on this release!
> >
> > Akash Sethi, Alex Loddengaard, Alexey Ozeritsky, amethystic, Andrea
> > Cosentino, Andrew Olson, Andrew Stevenson, Anton Karamanov, Antony
> > Stubbs, Apurva Mehta, Arun Mahadevan, Ashish Singh, Balint Molnar, Ben
> > Stopford, Bernard Leach, Bill Bejeck, Colin P. Mccabe, Damian Guy, Dan
> > Norwood, Dana Powers, dasl, Derrick Or, Dong Lin, Dustin Cote, Edoardo
> > Comar, Edward Ribeiro, Elias Levy, Emanuele Cesena, Eno Thereska, Ewen
> > Cheslack-Postava, Flavio Junqueira, fpj, Geoff Anderson, Guozhang Wang,
> > Gwen Shapira, Hikiko Murakami, Himani Arora, himani1, Hojjat Jafarpour,
> > huxi, Ishita Mandhan, Ismael Juma, Jakub Dziworski, Jan Lukavsky, Jason
> > Gustafson, Jay Kreps, Jeff Widman, Jeyhun Karimov, Jiangjie Qin, Joel
> > Koshy, Jon Freedman, Joshi, Jozef Koval, Json Tu, Jun He, Jun Rao,
> > Kamal, Kamal C, Kamil Szymanski, Kim Christensen, Kiran Pillarisetty,
> > Konstantine Karantasis, Lihua Xin, LoneRifle, Magnus Edenhill, Magnus
> > Reftel, Manikumar Reddy O, Mark Rose, Mathieu Fenniak, Matthias J. Sax,
> > Mayuresh Gharat, MayureshGharat, Michael Schiff, Mickael Maison,
> > MURAKAMI Masahiko, Nikki Thean, Olivier Girardot, pengwei-li, pilo,
> > Prabhat Kashyap, Qian Zheng, Radai Rosenblatt, radai-rosenblatt, Raghav
> > Kumar Gautam, Rajini Sivaram, Rekha Joshi, rnpridgeon, Ryan Pridgeon,
> > Sandesh K, Scott Ferguson, Shikhar Bhushan, steve, Stig Rohde Døssing,
> > Sumant Tambe, Sumit Arrawatia, Theo, Tim Carey-Smith, Tu Yang, Vahid
> > Hashemian, wangzzu, Will Marshall, Xavier Léauté, Xavier Léauté, Xi Hu,
> > Yang Wei, yaojuncn, Yuto Kawamura
> >
> > We welcome your help and feedback. For more information on how to
> > report problems, and to get involved, visit the project website at
> > http://kafka.apache.org/
> >
> > Thanks,
> > Ewen
>


Re: [ANNOUNCE] New committer: Grant Henke

2017-01-11 Thread Becket Qin
Congrats Grant!

On Wed, Jan 11, 2017 at 2:17 PM, Kaufman Ng  wrote:

> Congrats Grant!
>
> On Wed, Jan 11, 2017 at 4:28 PM, Jay Kreps  wrote:
>
> > Congrats Grant!
> >
> > -Jay
> >
> > On Wed, Jan 11, 2017 at 11:51 AM, Gwen Shapira 
> wrote:
> >
> > > The PMC for Apache Kafka has invited Grant Henke to join as a
> > > committer and we are pleased to announce that he has accepted!
> > >
> > > Grant contributed 88 patches, 90 code reviews, countless great
> > > comments on discussions, a much-needed cleanup to our protocol and the
> > > on-going and critical work on the Admin protocol. Throughout this, he
> > > displayed great technical judgment, high-quality work and willingness
> > > to contribute where needed to make Apache Kafka awesome.
> > >
> > > Thank you for your contributions, Grant :)
> > >
> > > --
> > > Gwen Shapira
> > > Product Manager | Confluent
> > > 650.450.2760 | @gwenshap
> > > Follow us: Twitter | blog
> > >
> >
>
>
>
> --
> Kaufman Ng
> +1 646 961 8063
> Solutions Architect | Confluent | www.confluent.io
>


Re: Deadlock using latest 0.10.1 Kafka release

2016-11-11 Thread Becket Qin
Hi Marcos,

Thanks for the update. It looks like the deadlock you saw was a different one.
Do you mind sending us a full stack trace after it happens?

Regarding the downgrade, the steps would be the following:
1. Set inter.broker.protocol.version to 0.10.0
2. Rolling bounce the cluster
3. Deploy the 0.10.0.1 code

There might be a bunch of .timeindex files left over, but that should be fine.

Thanks,

Jiangjie (Becket) Qin


On Fri, Nov 11, 2016 at 9:51 AM, Marcos Juarez <mjua...@gmail.com> wrote:

> Becket/Jason,
>
> We deployed a jar with the base 0.10.1.0 release plus the KAFKA-3994 patch,
> but we're seeing the same exact issue.  It doesnt' seem like the patch
> fixes the problem we're seeing.
>
> At this point, we're considering downgrading our prod clusters back to
> 0.10.0.1.  Is there any concern/issues we should be aware of when
> downgrading the cluster like that?
>
> Thanks,
>
> Marcos Juarez
>
>
> On Mon, Nov 7, 2016 at 5:47 PM, Marcos Juarez <mjua...@gmail.com> wrote:
>
> > Thanks Becket.
> >
> > I was working on that today.  I have a working jar, created from the
> > 0.10.1.0 branch, and that specific KAFKA-3994 patch applied to it.  I've
> > left it running in one test broker today, will try tomorrow to trigger
> the
> > issue, and try it with both the patched and un-patched versions.
> >
> > I'll let you know what we find.
> >
> > Thanks,
> >
> > Marcos
> >
> > On Mon, Nov 7, 2016 at 11:25 AM, Becket Qin <becket@gmail.com>
> wrote:
> >
> >> Hi Marcos,
> >>
> >> Is it possible for you to apply the patch of KAFKA-3994 and see if the
> >> issue is still there. The current patch of KAFKA-3994 should work, the
> >> only
> >> reason we haven't checked that in was because when we ran stress test it
> >> shows noticeable performance impact when producers are producing with
> >> acks=all. So if you are blocking on this issue maybe you can pick up the
> >> patch as a short term solution. Meanwhile we will prioritize the ticket.
> >>
> >> Thanks,
> >>
> >> Jiangjie (Becket) Qin
> >>
> >> On Mon, Nov 7, 2016 at 9:47 AM, Marcos Juarez <mjua...@gmail.com>
> wrote:
> >>
> >> > We ran into this issue several more times over the weekend.
> Basically,
> >> > FDs are exhausted so fast now, we can't even get to the server in
> time,
> >> the
> >> > JVM goes down in less than 5 minutes.
> >> >
> >> > I can send the whole thread dumps if needed, but for brevity's sake, I
> >> > just copied over the relevant deadlock segment, and concatenated them
> >> all
> >> > together in the attached text file.
> >> >
> >> > Do you think this is something I should add to KAFKA-3994 ticket?  Or
> is
> >> > the information in that ticket enough for now?
> >> >
> >> > Thanks,
> >> >
> >> > Marcos
> >> >
> >> > On Fri, Nov 4, 2016 at 2:05 PM, Marcos Juarez <mjua...@gmail.com>
> >> wrote:
> >> >
> >> >> That's great, thanks Jason.
> >> >>
> >> >> We'll try and apply the patch in the meantime, and wait for the
> >> official
> >> >> release for 0.10.1.1.
> >> >>
> >> >> Please let us know if you need more details about the deadlocks on
> our
> >> >> side.
> >> >>
> >> >> Thanks again!
> >> >>
> >> >> Marcos
> >> >>
> >> >> On Fri, Nov 4, 2016 at 1:02 PM, Jason Gustafson <ja...@confluent.io>
> >> >> wrote:
> >> >>
> >> >>> Hi Marcos,
> >> >>>
> >> >>> I think we'll try to get this into 0.10.1.1 (I updated the JIRA).
> >> Since
> >> >>> we're now seeing users hit this in practice, I'll definitely bump up
> >> the
> >> >>> priority on a fix. I can't say for sure when the release will be,
> but
> >> >>> we'll
> >> >>> merge the fix into the 0.10.1 branch and you can build from there if
> >> you
> >> >>> need something in a hurry.
> >> >>>
> >> >>> Thanks,
> >> >>> Jason
> >> >>>
> >> >>> On Fri, Nov 4, 2016 at 9:48 AM, Marcos Juarez <mjua...@gmail.com>
> >> wrote:
> >> >>>
> >> >>> > Jason,
> >> >>> >
> &g

Re: Deadlock using latest 0.10.1 Kafka release

2016-11-07 Thread Becket Qin
Hi Marcos,

Is it possible for you to apply the patch of KAFKA-3994 and see if the
issue is still there? The current patch of KAFKA-3994 should work; the only
reason we haven't checked it in is that, when we ran a stress test, it
showed a noticeable performance impact when producers produce with
acks=all. So if you are blocked on this issue, maybe you can pick up the
patch as a short-term solution. Meanwhile we will prioritize the ticket.

Thanks,

Jiangjie (Becket) Qin

On Mon, Nov 7, 2016 at 9:47 AM, Marcos Juarez <mjua...@gmail.com> wrote:

> We ran into this issue several more times over the weekend.  Basically,
> FDs are exhausted so fast now, we can't even get to the server in time, the
> JVM goes down in less than 5 minutes.
>
> I can send the whole thread dumps if needed, but for brevity's sake, I
> just copied over the relevant deadlock segment, and concatenated them all
> together in the attached text file.
>
> Do you think this is something I should add to KAFKA-3994 ticket?  Or is
> the information in that ticket enough for now?
>
> Thanks,
>
> Marcos
>
> On Fri, Nov 4, 2016 at 2:05 PM, Marcos Juarez <mjua...@gmail.com> wrote:
>
>> That's great, thanks Jason.
>>
>> We'll try and apply the patch in the meantime, and wait for the official
>> release for 0.10.1.1.
>>
>> Please let us know if you need more details about the deadlocks on our
>> side.
>>
>> Thanks again!
>>
>> Marcos
>>
>> On Fri, Nov 4, 2016 at 1:02 PM, Jason Gustafson <ja...@confluent.io>
>> wrote:
>>
>>> Hi Marcos,
>>>
>>> I think we'll try to get this into 0.10.1.1 (I updated the JIRA). Since
>>> we're now seeing users hit this in practice, I'll definitely bump up the
>>> priority on a fix. I can't say for sure when the release will be, but
>>> we'll
>>> merge the fix into the 0.10.1 branch and you can build from there if you
>>> need something in a hurry.
>>>
>>> Thanks,
>>> Jason
>>>
>>> On Fri, Nov 4, 2016 at 9:48 AM, Marcos Juarez <mjua...@gmail.com> wrote:
>>>
>>> > Jason,
>>> >
>>> > Thanks for that link.  It does appear to be a very similar issue, if
>>> not
>>> > identical.  In our case, the deadlock is reported as across 3 threads,
>>> one
>>> > of them being a group_metadata_manager thread. Otherwise, it looks the
>>> > same.
>>> >
>>> > On your questions:
>>> >
>>> > - We did not see this in prior releases, but we are ramping up usage
>>> of our
>>> > kafka clusters lately, so maybe we didn't have the needed volume
>>> before to
>>> > trigger it.
>>> >
>>> > - Across our multiple staging and production clusters, we're seeing the
>>> > problem roughly once or twice a day.
>>> >
>>> > - Our clusters are small at the moment.  The two that are experiencing
>>> the
>>> > issue are 5 and 8 brokers, respectively.  The number of consumers is
>>> small,
>>> > I'd say less than 20 at the moment.  The amount of data being produced
>>> is
>>> > small also, in the tens of megabytes per second range, but the number
>>> of
>>> > connects/disconnects is really high, because they're usually
>>> short-lived
>>> > processes.  Our guess at the moment is that this is triggering the
>>> bug.  We
>>> > have a separate cluster where we don't have short-lived producers, and
>>> that
>>> > one has been rock solid.
>>> >
>>> >
>>> > We'll look into applying the patch, which could help reduce the
>>> problem.
>>> > The ticket says it's being targeted for the 0.10.2 release.  Any rough
>>> > estimate of a timeline for that to come out?
>>> >
>>> > Thanks!
>>> >
>>> > Marcos
>>> >
>>> >
>>> > On Thu, Nov 3, 2016 at 5:19 PM, Jason Gustafson <ja...@confluent.io>
>>> > wrote:
>>> >
>>> > > Hey Marcos,
>>> > >
>>> > > Thanks for the report. Can you check out
>>> > > https://issues.apache.org/jira/browse/KAFKA-3994 and see if it
>>> matches?
>>> > At
>>> > > a glance, it looks like the same problem. We tried pretty hard to
>>> get the
>>> > > fix into the release, but it didn't quite make it. A few questions:
>>> > >
>>> > > 1. Did you not see this in prio

Re: Checking the consumer lag when using manual partition assignment with the KafkaConsumer

2016-11-07 Thread Becket Qin
Matthias,

> kafka-consumer-offet-checker works for older Kakfa versions for which
> offsets are stored in ZK. It does not work for v0.9+ when offsets are
> stored in topic __consumer_offsets

That is not true. The offset checker works with both ZK-based and broker-based
offset commits. It queries the broker first to see if there is a committed
offset for the partition; if there isn't, it then falls back to looking at
ZK.
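
For reference, here is a minimal sketch of checking the lag of a manually
assigned partition programmatically with the Java consumer. This is just an
illustration, assuming a 0.10+ client, offsets committed to Kafka, and
placeholder bootstrap servers, group id, topic and partition:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class LagChecker {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // placeholder, adjust for your cluster
    props.put("group.id", "my-group");                // the group whose lag you want to inspect
    props.put("enable.auto.commit", "false");
    props.put("key.deserializer", ByteArrayDeserializer.class.getName());
    props.put("value.deserializer", ByteArrayDeserializer.class.getName());

    TopicPartition tp = new TopicPartition("my-topic", 0); // placeholder topic/partition
    try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
      consumer.assign(Collections.singletonList(tp));
      // Committed offset of this group for the partition (null if nothing committed yet).
      OffsetAndMetadata committed = consumer.committed(tp);
      // Seek to the log end and read the position to get the latest offset.
      consumer.seekToEnd(Collections.singletonList(tp));
      long logEnd = consumer.position(tp);
      long lag = committed == null ? logEnd : logEnd - committed.offset();
      System.out.println("partition=" + tp + " logEnd=" + logEnd + " lag=" + lag);
    }
  }
}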

Thanks,

Jiangjie (Becket) Qin

On Sun, Nov 6, 2016 at 6:36 PM, Matthias J. Sax <matth...@confluent.io>
wrote:

> kafka-consumer-offet-checker works for older Kakfa versions for which
> offsets are stored in ZK. It does not work for v0.9+ when offsets are
> stored in topic __consumer_offsets
>
> Robert, you are right that kafka-consumer-groups only show information
> of active consumer groups...
>
> @Vincent, I am not sure if this is a bug or not. If yes, we should make
> sure that there is a ticket... I'll double check.
>
> -Matthias
>
> On 11/06/2016 05:39 PM, Becket Qin wrote:
> > Hi Robert,
> >
> > Have you tried Kafka-consumer-offset-checker.sh come with Kafka? It
> should
> > work regardless whether Kafka based group management is used or not.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Sun, Nov 6, 2016 at 4:00 PM, Vincent Dautremont <
> > vincent.dautrem...@olamobile.com> wrote:
> >
> >> By the way I remember having read somewhere on this list that this
> utility
> >> not showing info for consumer groups that do not have current active
> >> consumers was a bug .
> >>
> >> That would be a thing to fix, is there an expected fix date / fix
> release
> >> for this?
> >>
> >>
> >>> Le 6 nov. 2016 à 13:23, Robert Metzger <rmetz...@apache.org> a écrit :
> >>>
> >>> Hi Matthias,
> >>>
> >>> the bin/kafka-consumer-groups.sh utility is exactly what the user in
> >>> http://grokbase.com/t/kafka/users/163rrq9ne8/new-consumer-
> >> group-not-showing-up
> >>> has used.
> >>> It seems that the broker is not returning consumer groups that don't
> >>> participate in the consumer group balancing mechanism.
> >>>
> >>>
> >>> On Fri, Nov 4, 2016 at 6:07 PM, Matthias J. Sax <matth...@confluent.io
> >
> >>> wrote:
> >>>
> > Robert,
> >
> > you can use bin/kafka-consumer-groups.sh instead.
> >
> >>>>>> bin/kafka-consumer-offset-checker.sh [2016-11-04 10:05:07,852] WARN
> >>>>>> WARNING: ConsumerOffsetChecker is deprecated and will be dropped in
> >>>>>> releases following 0.9.0. Use ConsumerGroupCommand instead.
> >>>>>> (kafka.tools.ConsumerOffsetChecker$)
> >
> >
> > -Matthias
> >
> >>>>>> On 11/3/16 4:07 AM, Robert Metzger wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> some Flink users recently noticed that they can not check the
> >>>>>> consumer lag when using Flink's kafka consumer [1]. According to
> >>>>>> this discussion on the Kafka user list [2] the
> >>>>>> kafka-consumer-groups.sh utility doesn't work with KafkaConsumers
> >>>>>> with manual partition assignment.
> >>>>>>
> >>>>>> Is there a way to get the consumer lag for a consumer that
> >>>>>> regularly commits its offsets to the broker?
> >>>>>>
> >>>>>> Regards, Robert
> >>>>>>
> >>>>>>
> >>>>>> [1] https://issues.apache.org/jira/browse/FLINK-5001 [2]
> >>>>>> http://grokbase.com/t/kafka/users/163rrq9ne8/new-consumer-
> > group-not-showing-up
> >>>>>>
> >>>>
> >>
> >>
> >
>
>


Re: Checking the consumer lag when using manual partition assignment with the KafkaConsumer

2016-11-06 Thread Becket Qin
Hi Robert,

Have you tried the kafka-consumer-offset-checker.sh tool that comes with Kafka?
It should work regardless of whether Kafka-based group management is used or
not.

Thanks,

Jiangjie (Becket) Qin

On Sun, Nov 6, 2016 at 4:00 PM, Vincent Dautremont <
vincent.dautrem...@olamobile.com> wrote:

> By the way I remember having read somewhere on this list that this utility
> not showing info for consumer groups that do not have current active
> consumers was a bug .
>
> That would be a thing to fix, is there an expected fix date / fix release
> for this?
>
>
> > Le 6 nov. 2016 à 13:23, Robert Metzger <rmetz...@apache.org> a écrit :
> >
> > Hi Matthias,
> >
> > the bin/kafka-consumer-groups.sh utility is exactly what the user in
> > http://grokbase.com/t/kafka/users/163rrq9ne8/new-consumer-
> group-not-showing-up
> > has used.
> > It seems that the broker is not returning consumer groups that don't
> > participate in the consumer group balancing mechanism.
> >
> >
> > On Fri, Nov 4, 2016 at 6:07 PM, Matthias J. Sax <matth...@confluent.io>
> > wrote:
> >
> >>
> >> Robert,
> >>
> >> you can use bin/kafka-consumer-groups.sh instead.
> >>
> >>> bin/kafka-consumer-offset-checker.sh [2016-11-04 10:05:07,852] WARN
> >>> WARNING: ConsumerOffsetChecker is deprecated and will be dropped in
> >>> releases following 0.9.0. Use ConsumerGroupCommand instead.
> >>> (kafka.tools.ConsumerOffsetChecker$)
> >>
> >>
> >> -Matthias
> >>
> >>> On 11/3/16 4:07 AM, Robert Metzger wrote:
> >>> Hi,
> >>>
> >>> some Flink users recently noticed that they can not check the
> >>> consumer lag when using Flink's kafka consumer [1]. According to
> >>> this discussion on the Kafka user list [2] the
> >>> kafka-consumer-groups.sh utility doesn't work with KafkaConsumers
> >>> with manual partition assignment.
> >>>
> >>> Is there a way to get the consumer lag for a consumer that
> >>> regularly commits its offsets to the broker?
> >>>
> >>> Regards, Robert
> >>>
> >>>
> >>> [1] https://issues.apache.org/jira/browse/FLINK-5001 [2]
> >>> http://grokbase.com/t/kafka/users/163rrq9ne8/new-consumer-
> >> group-not-showing-up
> >>>
> >>
>
>


Re: Re-consume messages

2016-11-06 Thread Becket Qin
Hi Amit,

In Kafka 0.9, the closest approach would be to use
SimpleConsumer.getOffsetsBefore() to search for an offset by timestamp, and
then consume from the largest offset returned. Note that the search is at the
log segment level, and the result may not be accurate if the partition has
been moved. In the worst case you may consume many more messages than you
want, but you should not miss any messages.
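
A minimal sketch of that offset lookup with the old kafka.javaapi
SimpleConsumer (broker host, topic, partition and client id below are
placeholders, and the request has to go to the leader of the partition):

import java.util.HashMap;
import java.util.Map;
import kafka.api.PartitionOffsetRequestInfo;
import kafka.common.TopicAndPartition;
import kafka.javaapi.OffsetRequest;
import kafka.javaapi.OffsetResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class OffsetByTimestamp {
  public static void main(String[] args) {
    SimpleConsumer consumer =
        new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024, "offset-lookup");
    TopicAndPartition tp = new TopicAndPartition("my-topic", 0);

    // Target timestamp, e.g. 5 days ago; the lookup is segment-granular, so the
    // returned offset may be earlier than strictly necessary.
    long targetTimestamp = System.currentTimeMillis() - 5L * 24 * 60 * 60 * 1000;
    Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo = new HashMap<>();
    requestInfo.put(tp, new PartitionOffsetRequestInfo(targetTimestamp, 1));

    OffsetRequest request = new OffsetRequest(
        requestInfo, kafka.api.OffsetRequest.CurrentVersion(), "offset-lookup");
    OffsetResponse response = consumer.getOffsetsBefore(request);

    long[] offsets = response.offsets("my-topic", 0);
    long startOffset = offsets.length > 0 ? offsets[0] : 0L;
    System.out.println("Re-consume from offset " + startOffset);
    consumer.close();
  }
}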

Jiangjie (Becket) Qin

On Sun, Nov 6, 2016 at 1:30 AM, Amit K <amitk@gmail.com> wrote:

> Hi All,
>
> I am using kafka 0.9.0.1 with high level java producer and consumer.
> I need to handle a case wherein I need to re-consume the already consumed
> (and processed) messages say for last 5 days (configurable).
>
> Is there any way of achieving the same apart from identifying the offsets
> for the partition (for given topic) and process the same?
>
> For the records, I am not storing the offsets processed.
>
> Thanks
>


Re: Mysterious timeout

2016-11-04 Thread Becket Qin
Hi Mike,

From what you described, it seems the socket on the producer side was still
connected even after the broker instance went down. This is possible if the
broker instance went down suddenly without closing the TCP connection (e.g.
it lost power). Otherwise the producer should be able to detect the
disconnect and would not wait until the request timeout.

Another possibility is that the connection between the producer and the
broker goes through some sort of proxy. When the broker goes down, the socket
from the producer to the proxy stays alive, so the producer will wait until
the request timeout in that case.

Could you check whether the cause of the issue you saw matches one of the
above?

Thanks,

Jiangjie (Becket) Qin

On Fri, Nov 4, 2016 at 11:52 AM, Jeff Widman <j...@netskope.com> wrote:

> Mike,
> Did you ever figure this out?
>
> We're considering using Kafka on Kubernetes and very interested in how it's
> going for you.
>
> On Thu, Oct 27, 2016 at 8:34 AM, Martin Gainty <mgai...@hotmail.com>
> wrote:
>
> > MG>can u write simpleConsumer to determine when lead broker times-out..
> > then you'll need to tweak connection settings
> > https://cwiki.apache.org/confluence/display/KAFKA/0.8.
> > 0+SimpleConsumer+Example
> >
> > MG>to debug the response determine the leadBroker and the reason for
> fetch
> > failure as seen here:
> > if (fetchResponse.hasError()) {
> >  numErrors++;
> >  // Something went wrong!
> >  short code = fetchResponse.errorCode(a_topic, a_partition);
> >  System.out.println("Error fetching data from the Broker:" +
> > leadBroker + " Reason: " + code);
> > 
> > From: Mike Kaplinskiy <m...@ladderlife.com>
> > Sent: Thursday, October 27, 2016 3:11:14 AM
> > To: users@kafka.apache.org
> > Subject: Mysterious timeout
> >
> > Hey folks,
> >
> > We're observing a very peculiar behavior on our Kafka cluster. When one
> of
> > the Kafka broker instances goes down, we're seeing the producer block (at
> > .flush) for right about `request.timeout.ms` before returning success
> (or
> > at least not throwing an exception) and moving on.
> >
> > We're running Kafka on Kubernetes, so this may be related. Kafka is a
> > Kubernetes PetSet with a global Service (like a load balancer) for
> > consumers/producers to use for the bootstrap list. Our Kafka brokers are
> > configured to come up with a predetermined set of broker ids (kafka-0,
> > kafka-1 & kafka-2), but the IP likely changes every time it's restarted.
> >
> > Our Kafka settings are as follows:
> > Producer:
> > "acks" "all"
> > "batch.size" "16384"
> > "linger.ms" "1"
> > "request.timeout.ms" "3000"
> > "max.in.flight.requests.per.connection" "1"
> > "retries" "2"
> > "max.block.ms" "1"
> > "buffer.memory" "33554432"
> >
> > Broker:
> > min.insync.replicas=1
> >
> > I'm having a bit of a hard time debugging why this happens, mostly
> because
> > I'm not seeing any logs from the producer. Is there a guide somewhere for
> > turning up the logging information from the kafka java client? I'm using
> > logback if that helps.
> >
> > Thanks,
> > Mike.
> >
> > Ladder <http://bit.ly/1VRtWfS>. The smart, modern way to insure your
> life.
> >
>


Re: log compaction

2016-11-03 Thread Becket Qin
Hi Francesco,

There are a few things to think about before turning on log compaction for
a topic.

1. Does the topic have non-keyed messages? Log compaction only works if all
the messages have a key (see the sketch below).
2. The log cleaner needs some memory to build the offset map for log
compaction, so the memory consumption may be higher.

Given you are still running on 0.8, there are a few additional things to be
aware of:
3. Log compaction doesn't work with compressed topics until Kafka 0.9.0.
4. Some other potential issues may be caused by log compaction (e.g.
KAFKA-2024).
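
Regarding point 1, a quick sketch of what "all messages have a key" means on
the producer side (the new Java producer, available since 0.8.2, is shown
just for illustration; the topic, key and value names are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedSendExample {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // placeholder, adjust for your cluster
    props.put("key.serializer", StringSerializer.class.getName());
    props.put("value.serializer", StringSerializer.class.getName());

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      // Compaction keeps the latest value per key, so every record needs a key.
      producer.send(new ProducerRecord<>("compacted-topic", "user-42", "state-v2"));
      // A record without a key, like the commented line below, cannot be compacted:
      // producer.send(new ProducerRecord<>("compacted-topic", "state-v2"));
    }
  }
}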

Jiangjie (Becket) Qin



On Wed, Nov 2, 2016 at 5:55 AM, Francesco laTorre <
francesco.lato...@openbet.com> wrote:

> Hi,
>
> We want to enable log compaction on an existing topic (in production).
> Is it a safe operation or there are things to take into consideration ?
>
> Kafka version 0.8
>
> Cheers,
> Francesco
>
> --
> <http://www.openbet.com/> Francesco laTorre
> Senior Developer
> T: +44 208 742 1600
> +44 203 249 8394
>
> E: francesco.lato...@openbet.com
> W: www.openbet.com
> OpenBet Ltd
> Chiswick Park Building 9
> 566 Chiswick High Rd
> London
> W4 5XT
>


Re: [ANNOUNCE] New committer: Jiangjie (Becket) Qin

2016-10-31 Thread Becket Qin
Thanks everyone! It is really awesome to be working with you on Kafka!!

On Mon, Oct 31, 2016 at 11:26 AM, Jun Rao <j...@confluent.io> wrote:

> Congratulations, Jiangjie. Thanks for all your contributions to Kafka.
>
> Jun
>
> On Mon, Oct 31, 2016 at 10:35 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
>
> > The PMC for Apache Kafka has invited Jiangjie (Becket) Qin to join as a
> > committer and we are pleased to announce that he has accepted!
> >
> > Becket has made significant contributions to Kafka over the last two
> years.
> > He has been deeply involved in a broad range of KIP discussions and has
> > contributed several major features to the project. He recently completed
> > the implementation of a series of improvements (KIP-31, KIP-32, KIP-33)
> to
> > Kafka’s message format that address a number of long-standing issues such
> > as avoiding server-side re-compression, better accuracy for time-based
> log
> > retention, log roll and time-based indexing of messages.
> >
> > Congratulations Becket! Thank you for your many contributions. We are
> > excited to have you on board as a committer and look forward to your
> > continued participation!
> >
> > Joel
> >
>


Re: client use high cpu which caused by delayedFetch operation immediately return

2016-10-18 Thread Becket Qin
Glad to know :)

On Tue, Oct 18, 2016 at 1:24 AM, Json Tu <kafka...@126.com> wrote:

> Thanks. I patch it, and everything goes ok.
> > On Oct 9, 2016, at 12:39 PM, Becket Qin <becket@gmail.com> wrote:
> >
> > Can you check if you have KAFKA-3003 when you run the code?
> >
> > On Sat, Oct 8, 2016 at 12:52 AM, Kafka <kafka...@126.com> wrote:
> >
> >> Hi all,
> >>we found our consumer have high cpu load in our product
> >> we found our consumers have a high CPU load in our production
> >> environment. As we know, fetch.min.bytes and fetch.wait.max.ms will affect
> >> the frequency of the consumer's return,
> >>then we found the problem is not be solved,then we check the
> >> kafka’s code,we check delayedFetch’s tryComplete() function has these
> codes,
> >>
> >> if (endOffset.messageOffset != fetchOffset.messageOffset) {
> >>  if (endOffset.onOlderSegment(fetchOffset)) {
> >>// Case C, this can happen when the new fetch operation
> is
> >> on a truncated leader
> >>debug("Satisfying fetch %s since it is fetching later
> >> segments of partition %s.".format(fetchMetadata, topicAndPartition))
> >>return forceComplete()
> >>  } else if (fetchOffset.onOlderSegment(endOffset)) {
> >>// Case C, this can happen when the fetch operation is
> >> falling behind the current segment
> >>// or the partition has just rolled a new segment
> >>debug("Satisfying fetch %s immediately since it is
> >> fetching older segments.".format(fetchMetadata))
> >>return forceComplete()
> >>  } else if (fetchOffset.messageOffset <
> >> endOffset.messageOffset) {
> >>// we need take the partition fetch size as upper bound
> >> when accumulating the bytes
> >>accumulatedSize += math.min(endOffset.
> positionDiff(fetchOffset),
> >> fetchStatus.fetchInfo.fetchSize)
> >>  }
> >>}
> >>
> >> so we can ensure that our fetchOffset’s segmentBaseOffset is not the
> same
> >> as endOffset’s segmentBaseOffset,then we check our topic-partition’s
> >> segment, we found the data in the segment is all cleaned by the kafka
> for
> >> log.retention.
> >> and we guess that the  fetchOffset’s segmentBaseOffset is smaller than
> >> endOffset’s segmentBaseOffset leads this problem.
> >>
> >> but my point is: should we use the following code to make the client use
> >> less CPU,
> >>   if (endOffset.messageOffset != fetchOffset.messageOffset) {
> >>  if (endOffset.onOlderSegment(fetchOffset)) {
> >>return false
> >>  } else if (fetchOffset.onOlderSegment(endOffset)) {
> >>return false
> >>  }
> >>}
> >>
> >> and then it will respond after fetch.wait.max.ms
> >> in this scenario instead of returning immediately.
> >>
> >> Feedback is greatly appreciated. Thanks.
> >>
> >>
> >>
> >>
>
>
>


Re: client use high cpu which caused by delayedFetch operation immediately return

2016-10-08 Thread Becket Qin
Can you check whether the code you are running includes the fix for KAFKA-3003?

On Sat, Oct 8, 2016 at 12:52 AM, Kafka  wrote:

> Hi all,
> we found our consumers have a high CPU load in our production
> environment. As we know, fetch.min.bytes and fetch.wait.max.ms will affect
> the frequency of the consumer's return,
> so we adjust them to very big so that broker is very hard to satisfy it.
> then we found the problem is not be solved,then we check the
> kafka’s code,we check delayedFetch’s tryComplete() function has these codes,
>
>  if (endOffset.messageOffset != fetchOffset.messageOffset) {
>   if (endOffset.onOlderSegment(fetchOffset)) {
> // Case C, this can happen when the new fetch operation is
> on a truncated leader
> debug("Satisfying fetch %s since it is fetching later
> segments of partition %s.".format(fetchMetadata, topicAndPartition))
> return forceComplete()
>   } else if (fetchOffset.onOlderSegment(endOffset)) {
> // Case C, this can happen when the fetch operation is
> falling behind the current segment
> // or the partition has just rolled a new segment
> debug("Satisfying fetch %s immediately since it is
> fetching older segments.".format(fetchMetadata))
> return forceComplete()
>   } else if (fetchOffset.messageOffset <
> endOffset.messageOffset) {
> // we need take the partition fetch size as upper bound
> when accumulating the bytes
> accumulatedSize += 
> math.min(endOffset.positionDiff(fetchOffset),
> fetchStatus.fetchInfo.fetchSize)
>   }
> }
>
> so we can ensure that our fetchOffset’s segmentBaseOffset is not the same
> as endOffset’s segmentBaseOffset,then we check our topic-partition’s
> segment, we found the data in the segment is all cleaned by the kafka for
> log.retention.
> and we guess that the  fetchOffset’s segmentBaseOffset is smaller than
> endOffset’s segmentBaseOffset leads this problem.
>
> but my point is: should we use the following code to make the client use
> less CPU,
>if (endOffset.messageOffset != fetchOffset.messageOffset) {
>   if (endOffset.onOlderSegment(fetchOffset)) {
> return false
>   } else if (fetchOffset.onOlderSegment(endOffset)) {
> return false
>   }
> }
>
> and then it will respond after fetch.wait.max.ms
> in this scenario instead of returning immediately.
>
> Feedback is greatly appreciated. Thanks.
>
>
>
>


Re: Does Kafka 0.9 can guarantee not loss data

2016-09-22 Thread Becket Qin
In order to complete a produce request with acks=-1, two conditions must hold:
A. The leader's high watermark must reach the requiredOffset (the max offset
of that partition in the produce request).
B. The number of in-sync replicas is at least min.isr.

The ultimate goal here is to make sure at least min.isr replicas have caught
up to requiredOffset. So the check is not only whether we have enough replicas
in the ISR, but also whether those replicas in the ISR have caught up to the
required offset.

In your example, if numAcks is 0 and curInSyncReplicas.size >= minIsr, the
produce response won't be returned if min.isr > 0, because
leaderReplica.highWatermark must be less than requiredOffset given that
numAcks is 0, i.e. condition A is not met.

We are actually doing a stronger-than-necessary check here. Theoretically, as
long as min.isr replicas have caught up to requiredOffset, we should be able
to return the response, but we also require those replicas to be in the ISR.
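
To illustrate the client-facing side of this, here is a rough sketch (not the
exact broker logic above) of a producer configured with acks=all; if fewer
than min.insync.replicas replicas are in sync, the send fails with a
NotEnoughReplicas* error instead of being acknowledged. The bootstrap servers
and topic below are placeholders:

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class AcksAllExample {
  public static void main(String[] args) throws InterruptedException {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // placeholder
    props.put("acks", "all");                          // equivalent to acks=-1
    props.put("key.serializer", StringSerializer.class.getName());
    props.put("value.serializer", StringSerializer.class.getName());

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      ProducerRecord<String, String> record =
          new ProducerRecord<>("my-topic", "key", "value"); // placeholder topic
      try {
        // Completes only when the leader's high watermark reaches this record's
        // offset and the ISR still satisfies min.insync.replicas (conditions A and B).
        RecordMetadata metadata = producer.send(record).get();
        System.out.println("Acked at offset " + metadata.offset());
      } catch (ExecutionException e) {
        // e.g. NotEnoughReplicasException or NotEnoughReplicasAfterAppendException
        // when fewer than min.insync.replicas replicas are in sync.
        System.err.println("Send failed: " + e.getCause());
      }
    }
  }
}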

On Thu, Sep 22, 2016 at 8:15 PM, Kafka  wrote:

> @wangguozhang,could you give me some advices.
>
> > On Sep 22, 2016, at 6:56 PM, Kafka wrote:
> >
> > Hi all,
> >   in terms of topic, we create a topic with 6 partition,and each
> with 3 replicas.
> >in terms of producer,when we send message with ack -1 using sync
> interface.
> >   in terms of brokers,we set min.insync.replicas to 2.
> >
> > after we review the kafka broker’s code,we know that we send a message
> to broker with ack -1, then we can get response if ISR of this partition is
> great than or equal to min.insync.replicas,but what confused
> > me is replicas in ISR is not strongly consistent,in kafka 0.9 we use
> replica.lag.time.max.ms param to judge whether to shrink ISR, and the
> defaults is 1 ms, so replicas’ data in isr can lag 1ms at most,
> > we we restart broker which own this partitions’ leader, then controller
> will start a new leader election, which will choose the first replica in
> ISR that not equals to current leader as new leader, then this will loss
> data.
> >
> >
> > The main produce handle code shows below:
> > val numAcks = curInSyncReplicas.count(r => {
> >  if (!r.isLocal)
> >if (r.logEndOffset.messageOffset >= requiredOffset) {
> >  trace("Replica %d of %s-%d received offset
> %d".format(r.brokerId, topic, partitionId, requiredOffset))
> >  true
> >}
> >else
> >  false
> >  else
> >true /* also count the local (leader) replica */
> >})
> >
> >trace("%d acks satisfied for %s-%d with acks =
> -1".format(numAcks, topic, partitionId))
> >
> >val minIsr = leaderReplica.log.get.config.minInSyncReplicas
> >
> >if (leaderReplica.highWatermark.messageOffset >= requiredOffset
> ) {
> >  /*
> >  * The topic may be configured not to accept messages if there
> are not enough replicas in ISR
> >  * in this scenario the request was already appended locally and
> then added to the purgatory before the ISR was shrunk
> >  */
> >  if (minIsr <= curInSyncReplicas.size) {
> >(true, ErrorMapping.NoError)
> >  } else {
> >(true, ErrorMapping.NotEnoughReplicasAfterAppendCode)
> >  }
> >} else
> >  (false, ErrorMapping.NoError)
> >
> >
> > why only logging unAcks and not use numAcks to compare with minIsr, if
> numAcks is 0, but curInSyncReplicas.size >= minIsr, then this will return,
> as ISR shrink procedure is not real time, does this will loss data after
> leader election?
> >
> > Feedback is greatly appreciated. Thanks.
> > meituan.inf
> >
> >
> >
>
>
>


Re: [ANNOUNCE] New committer: Jason Gustafson

2016-09-06 Thread Becket Qin
Congrats, Jason!

On Tue, Sep 6, 2016 at 5:09 PM, Onur Karaman 
wrote:

> congrats jason!
>
> On Tue, Sep 6, 2016 at 4:12 PM, Sriram Subramanian 
> wrote:
>
> > Congratulations Jason!
> >
> > On Tue, Sep 6, 2016 at 3:40 PM, Vahid S Hashemian <
> > vahidhashem...@us.ibm.com
> > > wrote:
> >
> > > Congratulations Jason on this very well deserved recognition.
> > >
> > > --Vahid
> > >
> > >
> > >
> > > From:   Neha Narkhede 
> > > To: "d...@kafka.apache.org" ,
> > > "users@kafka.apache.org" 
> > > Cc: "priv...@kafka.apache.org" 
> > > Date:   09/06/2016 03:26 PM
> > > Subject:[ANNOUNCE] New committer: Jason Gustafson
> > >
> > >
> > >
> > > The PMC for Apache Kafka has invited Jason Gustafson to join as a
> > > committer and
> > > we are pleased to announce that he has accepted!
> > >
> > > Jason has contributed numerous patches to a wide range of areas,
> notably
> > > within the new consumer and the Kafka Connect layers. He has displayed
> > > great taste and judgement which has been apparent through his
> involvement
> > > across the board from mailing lists, JIRA, code reviews to contributing
> > > features, bug fixes and code and documentation improvements.
> > >
> > > Thank you for your contribution and welcome to Apache Kafka, Jason!
> > > --
> > > Thanks,
> > > Neha
> > >
> > >
> > >
> > >
> > >
> >
>


Re: KIP-33 Opt out from Time Based indexing

2016-08-29 Thread Becket Qin
Hi Jun,

I just created KAFKA-4099 and will submit a patch soon.

Thanks,

Jiangjie (Becket) Qin

On Mon, Aug 29, 2016 at 11:55 AM, Jun Rao <j...@confluent.io> wrote:

> Jiangjie,
>
> Good point on the time index format related to uncompressed messages. It
> does seem that indexing based on file position requires a bit more
> complexity. Since the time index is going to be used infrequently, having a
> level of indirection doesn't seem a big concern. So, we can leave the logic
> as it is.
>
> Do you plan to submit a patch to fix the time-based rolling issue?
>
> Thanks,
>
> Jun
>
> On Fri, Aug 26, 2016 at 3:23 PM, Becket Qin <becket@gmail.com> wrote:
>
> > Jun,
> >
> > Good point about new log rolling behavior issue when move replicas.
> Keeping
> > the old behavior sounds reasonable to me.
> >
> > Currently the time index entry points to the exact shallow message with
> the
> > indexed timestamp, are you suggesting we change it to point to the
> starting
> > offset of the appended batch(message set)? That doesn't seem to work for
> > truncation. For example, imagine an uncompressed message set [(m1,
> > offset=100, timestamp=100, position=1000), (m2, offset=101,timestamp=105,
> > position=1100)], if we build the time index based on the starting offset
> of
> > this message set, the index entry would be (105, 1000), later on when log
> > is truncated, and m2 is truncated but m1 is not (this is possible for a
> > uncompressed message set) in this case, we will not delete the time index
> > entry because it is technically pointing to m1. Pointing to the end of
> the
> > batch will not work either because then search by timestamp would miss
> m2.
> >
> > I am not sure if it is worth doing, but if we are willing to change the
> > semantic to let the time index entry not point to the exact shallow
> > message. I am thinking maybe we should just switch the semantic to the
> very
> > original one, i.e. time index only means "Max timestamp up util this
> > offset", which is also aligned with offset index entry.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Fri, Aug 26, 2016 at 10:29 AM, Jun Rao <j...@confluent.io> wrote:
> >
> > > Jiangjie,
> > >
> > > I am not sure about changing the default to LogAppendTime since
> > CreateTime
> > > is probably what most people want. It also doesn't solve the problem
> > > completely. For example, if you do partition reassignment and need to
> > copy
> > > a bunch of old log segments to a new broker, this may cause log rolling
> > on
> > > every message.
> > >
> > > Another alternative is to just keep the old time rolling behavior,
> which
> > is
> > > rolling based on the create time of the log segment. I had two use
> cases
> > of
> > > time-based rolling in mind. The first one is for users who don't want
> to
> > > retain a message (say sensitive data) in the log for too long. For
> this,
> > > one can set a time-based retention. If the log can roll periodically
> > based
> > > on create time, it will freeze the largest timestamp in the rolled
> > segment
> > > and cause it to be deleted when the time limit has been reached.
> Rolling
> > > based on the timestamp of the first message doesn't help much here
> since
> > > the retention is always based on the largest timestamp. The second one
> is
> > > for log cleaner to happen quicker. Rolling logs periodically based on
> > > create time will also work. So, it seems that if we preserve the old
> time
> > > rolling behavior, we won't lose much functionality, but will avoid the
> > > corner case where the logs could be rolled on every message. What do
> you
> > > think?
> > >
> > > About storing file position in the time index, I don't think it needs
> to
> > > incur overhead during append. At the beginning of append, we are
> already
> > > getting the end position of the log (for maintaining the offset index).
> > We
> > > can just keep track of that together with the last seen offset. Storing
> > the
> > > position has the slight benefit that it avoids another indirection and
> > > seems more consistent with the offset index. It's worth thinking
> through
> > > whether this is better. If we want to change it, it's better to change
> it
> > > now than later.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, 

Re: KIP-33 Opt out from Time Based indexing

2016-08-28 Thread Becket Qin
Jan,

Thanks for the example of reprocessing the messages. I think in any case,
reconsuming all the messages will definitely work. What we want to do here
is to see if we can avoid doing that by only reconsuming necessary
messages.

In the scenario you mentioned, can you store an "offset-of-last-update" for
each window? It is essentially the offset of the last message that went
into the window. During reprocessing, if an earlier message has an offset
less than or equal to the "offset-of-last-update" for the corresponding
window, the processor knows that this message has already been added to the
aggregated window and can be skipped. This may need some work, but it seems
cheaper than reconsuming everything.
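
To make the idea concrete, here is a rough sketch of the bookkeeping in
Java (illustrative only; the class and method names are made up and this is
not Kafka or Kafka Streams code):

import java.util.HashMap;
import java.util.Map;

// Minimal sketch: skip records that were already folded into a window by
// comparing their offset against the stored "offset-of-last-update".
public class WindowReprocessor {

    // windowStart -> offset of the last record applied to that window
    private final Map<Long, Long> offsetOfLastUpdate = new HashMap<>();

    /** Returns true if the record still needs to be applied to its window. */
    public boolean shouldApply(long windowStart, long recordOffset) {
        Long lastApplied = offsetOfLastUpdate.get(windowStart);
        return lastApplied == null || recordOffset > lastApplied;
    }

    /** Remember that the window has been updated up to this offset. */
    public void markApplied(long windowStart, long recordOffset) {
        offsetOfLastUpdate.merge(windowStart, recordOffset, Math::max);
    }
}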

Admittedly, if the messages use CreateTime and the timestamps are spread
over a wide range, users may not be able to gain much from searching by
timestamp.

Regarding OffsetRequest: I think what Jun meant was that we are going to
deprecate the current OffsetRequest v0, which returns a list of segment base
offsets. I am working on a new KIP for OffsetRequest v1, which returns the
accurate message offset based on timestamp. I can hardly imagine we will
let users deal with a physical position directly :)
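
For illustration, once an API like that is exposed on the client side,
rewinding to a point in time could look roughly like the sketch below. Note
that the offsetsForTimes() call shown here only exists in later Java
consumer releases (0.10.1+, added by KIP-79), and the broker address, topic
and timing values are made up:

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class RewindByTime {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "reprocessing-group");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        // Rewind to 6 hours ago and reprocess from there.
        long rewindTo = System.currentTimeMillis() - 6 * 60 * 60 * 1000L;
        TopicPartition tp = new TopicPartition("my-topic", 0);

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp));
            // Resolve the timestamp to the earliest offset at or after it.
            Map<TopicPartition, OffsetAndTimestamp> result =
                consumer.offsetsForTimes(Collections.singletonMap(tp, rewindTo));
            OffsetAndTimestamp found = result.get(tp);
            if (found != null) {
                consumer.seek(tp, found.offset());
            }
        }
    }
}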

Thanks,

Jiangjie (Becket) Qin


On Fri, Aug 26, 2016 at 3:41 PM, Jan Filipiak <jan.filip...@trivago.com>
wrote:

> Hi Jun,
>
> thanks for taking the time to answer on such a detailed level. You are
> right Log.fetchOffsetByTimestamp works, the comment is just confusing
> "// Get all the segments whose largest timestamp is smaller than target
> timestamp" wich is apparently is not what takeWhile does (I am more on the
> Java end of things, so I relied on the comment).
>
> Regarding the frequent file rolling i didn't think of Logcompaction but
> that indeed is a place where things can hit the fan pretty easily, especially
> if you don't have many updates in there and you pass the timestamp along in
> a kafka-streams application. Bootstrapping a new application then indeed
> could produce quite a few old messages kicking this logrolling of until a
> recent message appears. I guess that makes it a practical issue again even
> with the 7 days. Thanks for pointing out! Id like to see the appendTime as
> default, I am very happy that I have it in the backpocket for purpose of
> tighter sleep and not to worry to much about someone accidentally doing
> something dodgy on a weekend with our clusters
>
> Regarding the usefulness, you will not be able to sell it for me. I don't
> know how people build applications with this ¯\_(ツ)_/¯ but I don't want to
> see them.
> Look at the error recovery with timestamp seek:
> For fixing a bug, a user needs to stop the SP, truncate all his downstream
> data perfectly based on their time window.Then restart and do the first
> fetch based
> again on the perfect window timeout. From then on, he still has NO clue
> whatsoever if messages that come later now with an earlier timestamp need
> to go into the
> previous window or not. (Note that there is >>>absolutely no<<< way to
> determine this in aggregated downstream windowed stores). So the user is in
> trouble: even though he can seek, he
> can't rule out his error. IMO it helps them to build the wrong thing, that
> will just be operational pain *somewhere*
>
> Look at the error recovery without timestamp seek:
> start your application from beginning with a different output
> (version,key,partition) wait for it to fully catch up. drop the timewindows
> the error happend + confidence interval (if your data isnt there anymore,
> no seek will help) in from the old version. Stop the stream processor,
> merge the data it created, switch back to the original
> (version,key,partition) and start the SP again.
> Done. As bigger you choose the confidence interval, the more correct, the
> less the index helps. usually you want maximum confidence => no index
> usage, get everything that is still there. (Maybe even redump from hadoop
> in extreme cases) ironically causing the log to roll all the time (as you
> probably publish to a new topic and have the streams application use both)
> :(
>
> As you can see, even though the users can seek, if they want to create
> proper numbers, Billing information eg. They are in trouble, and giving
> them this index will just make them implement the wrong solution! It boils
> down to: this index is not the kafka way of doing things. The index can
> help the second approach but usually one chooses the confidence interval =
> as much as one can get.
>
> Then the last thing. "OffsetRequest is a legacy request. It's awkward to
> use and we plan to deprecate it over time". You got to be kidding me. It
> was wired to get the byteposition back then, but getting the offsets is
> perfectly reasonable 

Re: KIP-33 Opt out from Time Based indexing

2016-08-26 Thread Becket Qin
Jun,

Good point about the new log rolling behavior issue when moving replicas.
Keeping the old behavior sounds reasonable to me.

Currently the time index entry points to the exact shallow message with the
indexed timestamp. Are you suggesting we change it to point to the starting
offset of the appended batch (message set)? That doesn't seem to work for
truncation. For example, imagine an uncompressed message set [(m1,
offset=100, timestamp=100, position=1000), (m2, offset=101, timestamp=105,
position=1100)]. If we build the time index based on the starting offset of
this message set, the index entry would be (105, 1000). Later on, when the
log is truncated and m2 is truncated but m1 is not (which is possible for an
uncompressed message set), we will not delete the time index entry because
it is technically pointing to m1. Pointing to the end of the batch will not
work either, because then search by timestamp would miss m2.

I am not sure if it is worth doing, but if we are willing to change the
semantics so that the time index entry does not point to the exact shallow
message, I am thinking maybe we should just switch back to the very
original semantics, i.e. a time index entry only means "max timestamp up
until this offset", which is also aligned with the offset index entries.

Thanks,

Jiangjie (Becket) Qin

On Fri, Aug 26, 2016 at 10:29 AM, Jun Rao <j...@confluent.io> wrote:

> Jiangjie,
>
> I am not sure about changing the default to LogAppendTime since CreateTime
> is probably what most people want. It also doesn't solve the problem
> completely. For example, if you do partition reassignment and need to copy
> a bunch of old log segments to a new broker, this may cause log rolling on
> every message.
>
> Another alternative is to just keep the old time rolling behavior, which is
> rolling based on the create time of the log segment. I had two use cases of
> time-based rolling in mind. The first one is for users who don't want to
> retain a message (say sensitive data) in the log for too long. For this,
> one can set a time-based retention. If the log can roll periodically based
> on create time, it will freeze the largest timestamp in the rolled segment
> and cause it to be deleted when the time limit has been reached. Rolling
> based on the timestamp of the first message doesn't help much here since
> the retention is always based on the largest timestamp. The second one is
> for log cleaner to happen quicker. Rolling logs periodically based on
> create time will also work. So, it seems that if we preserve the old time
> rolling behavior, we won't lose much functionality, but will avoid the
> corner case where the logs could be rolled on every message. What do you
> think?
>
> About storing file position in the time index, I don't think it needs to
> incur overhead during append. At the beginning of append, we are already
> getting the end position of the log (for maintaining the offset index). We
> can just keep track of that together with the last seen offset. Storing the
> position has the slight benefit that it avoids another indirection and
> seems more consistent with the offset index. It's worth thinking through
> whether this is better. If we want to change it, it's better to change it
> now than later.
>
> Thanks,
>
> Jun
>
> On Thu, Aug 25, 2016 at 6:30 PM, Becket Qin <becket@gmail.com> wrote:
>
> > Hi Jan,
> >
> > It seems your main concern is for the changed behavior of time based log
> > rolling and time based retention. That is actually why we have two
> > timestamp types. If user set the log.message.timestamp.type to
> > LogAppendTime, the broker will behave exactly the same as they were,
> except
> > the rolling and retention would be more accurate and independent to the
> > replica movements.
> >
> > The log.message.timestam.max.difference.ms is only useful when users are
> > using CreateTime. It is kind of a protection on the broker because an
> > insanely large timestamp could ruin the time index due to the way we
> handle
> > out-of-order timestamps when using CreateTime. But the users who are
> using
> > LogAppendTime do not need to worry about this at all.
> >
> > The first odd thing is a valid concern. In your case, because you have
> the
> > timestamp in the message value, it is probably fine to just use
> > LogAppendTime on the broker, so the timestamp will only be used to
> provide
> > accurate log retention and log rolling based on when the message was
> > produced to the broker regardless when the message was created. This
> should
> > provide the exact same behavior on the broker side as before. (Apologies
> > for the stale WIKI statement on the lines you quoted, as Jun said, the
> log
> > segmen

Re: KIP-33 Opt out from Time Based indexing

2016-08-25 Thread Becket Qin
Hi Jan,

It seems your main concern is the changed behavior of time-based log
rolling and time-based retention. That is actually why we have two
timestamp types. If users set log.message.timestamp.type to
LogAppendTime, the broker will behave exactly the same as before, except
that rolling and retention will be more accurate and independent of
replica movements.

The log.message.timestamp.max.difference.ms setting is only useful when
users are using CreateTime. It is kind of a protection for the broker,
because an insanely large timestamp could ruin the time index due to the
way we handle out-of-order timestamps when using CreateTime. Users who are
using LogAppendTime do not need to worry about this at all.

The first odd thing is a valid concern. In your case, because you have the
timestamp in the message value, it is probably fine to just use
LogAppendTime on the broker, so the timestamp will only be used to provide
accurate log retention and log rolling based on when the message was
produced to the broker, regardless of when the message was created. This
should provide exactly the same behavior on the broker side as before.
(Apologies for the stale wiki statement on the lines you quoted; as Jun
said, log segment rolling is based on the timestamp of the first message
instead of the largest timestamp in the log segment. I sent a change
notification to the mailing list but forgot to update the wiki page. I have
just updated it.)

Regarding the second odd thing: as Jun mentioned, by design we do not keep
a global timestamp order. During the search, we start from the oldest
segment and scan over the segments until we find the first segment that
contains a timestamp larger than the target timestamp. This guarantees that
no message with a larger timestamp will be missed. For example, if we have
3 segments whose largest timestamps are 100, 300 and 200, and we are
looking for timestamp 250, we start scanning at the first segment, stop at
the second segment, and search inside that segment for the first timestamp
greater than or equal to 250. So reordered largest timestamps across
segments should not be an issue.
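
A rough sketch of that segment scan (illustrative only, not the broker's
actual implementation):

import java.util.List;

// Walk segments from oldest to newest and stop at the first one whose
// largest timestamp reaches the target. Largest timestamps may not be
// monotonic across segments (e.g. 100, 300, 200), which is why the scan
// starts from the oldest segment.
class SegmentInfo {
    final long baseOffset;
    final long largestTimestamp;

    SegmentInfo(long baseOffset, long largestTimestamp) {
        this.baseOffset = baseOffset;
        this.largestTimestamp = largestTimestamp;
    }
}

class TimestampSearch {
    static SegmentInfo firstCandidate(List<SegmentInfo> segmentsOldestFirst,
                                      long targetTimestamp) {
        for (SegmentInfo segment : segmentsOldestFirst) {
            if (segment.largestTimestamp >= targetTimestamp) {
                // Search inside this segment's time index next.
                return segment;
            }
        }
        return null; // no segment contains a timestamp at or after the target
    }
}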

The third odd thing is a good point. There are a few reasons we chose to
store offsets instead of physical positions in the time index. Easier
truncation is one of the reasons, but that may not be a big issue. Another
reason is that in the early implementation, the time index and offset index
were actually aligned, i.e. each offset in the time index had a
corresponding entry in the offset index (the reverse is not true), so the
physical position was already stored in the offset index. Later on we
switched to the current implementation, which has the time index pointing
to the exact shallow message in the log segment. With this implementation,
if the message with the largest timestamp appears in the middle of an
uncompressed message set, we may need to calculate the physical position
for that message. This is doable but could potentially add overhead to each
append as well as some complexity. Given that OffsetRequest is supposed to
be a pretty infrequent request, it is probably OK to do the secondary
lookup and save the work on each append.
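
The secondary lookup itself is cheap: the time index maps a timestamp to an
offset, and the offset index maps that offset to a file position, so
resolving a timestamp costs one extra indirection. A rough sketch of the
shape of that lookup (not the broker's code; both indexes are sparse, so
the result is only a position to start scanning from):

import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class TwoLevelIndexLookup {
    // timestamp -> message offset (time index)
    private final NavigableMap<Long, Long> timeIndex = new TreeMap<>();
    // message offset -> file position (offset index)
    private final NavigableMap<Long, Long> offsetIndex = new TreeMap<>();

    /** File position from which to start scanning for a target timestamp. */
    long startPositionFor(long targetTimestamp) {
        // First lookup: largest indexed timestamp <= target, giving an offset.
        Map.Entry<Long, Long> timeEntry = timeIndex.floorEntry(targetTimestamp);
        long offset = timeEntry == null ? 0L : timeEntry.getValue();
        // Second lookup: largest indexed offset <= that offset, giving a position.
        Map.Entry<Long, Long> offsetEntry = offsetIndex.floorEntry(offset);
        return offsetEntry == null ? 0L : offsetEntry.getValue();
    }
}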

Jun has already mentioned a few use cases for searching by timestamp. At
LinkedIn we also have several such use cases where people want to rewind
the offsets to a certain time and reprocess the streams.

@Jun, currently we are using CreateTime as the default value for
log.message.timestamp.type. I am wondering whether it would be less
surprising if we changed the default value to LogAppendTime so that the
previous behavior is maintained, because it would be bad for users if
upgrading caused their messages to be deleted due to the change in
behavior. What do you think?

Thanks,

Jiangjie (Becket) Qin





On Thu, Aug 25, 2016 at 2:36 PM, Jun Rao <j...@confluent.io> wrote:

> Jan,
>
> Thanks a lot for the feedback. Now I understood your concern better. The
> following are my comments.
>
> The first odd thing that you pointed out could be a real concern.
> Basically, if a producer publishes messages with really old timestamp, our
> default log.roll.hours (7 days) will indeed cause the broker to roll a log
> on ever message, which would be bad. Time-based rolling is actually used
> infrequently. The only use case that I am aware of is that for compacted
> topics, rolling logs based on time could allow the compaction to happen
> sooner (since the active segment is never cleaned). One option is to change
> the default log.roll.hours to infinite and also document the impact on
> changing log.roll.hours. Jiangjie, what do you think?
>
> For the second odd thing, the OffsetRequest is a legacy request. It's
> awkward to use and we plan to deprecate it over time. That's why we haven't
> change the logic in serving OffsetRequest after KIP-33. The plan is to
> introduce a new OffsetRequest that will be exploiting the 

Re: [ANNOUCE] Apache Kafka 0.10.0.0 Released

2016-05-24 Thread Becket Qin
Awesome!

On Tue, May 24, 2016 at 9:41 AM, Jay Kreps  wrote:

> Woohoo!!! :-)
>
> -Jay
>
> On Tue, May 24, 2016 at 9:24 AM, Gwen Shapira  wrote:
>
> > The Apache Kafka community is pleased to announce the release for Apache
> > Kafka 0.10.0.0.
> > This is a major release with exciting new features, including first
> > release of KafkaStreams and many other improvements.
> >
> > All of the changes in this release can be found:
> >
> >
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/RELEASE_NOTES.html
> >
> > Apache Kafka is high-throughput, publish-subscribe messaging system
> > rethought of as a distributed commit log.
> >
> > ** Fast => A single Kafka broker can handle hundreds of megabytes of
> reads
> > and
> > writes per second from thousands of clients.
> >
> > ** Scalable => Kafka is designed to allow a single cluster to serve as
> the
> > central data backbone
> > for a large organization. It can be elastically and transparently
> expanded
> > without downtime.
> > Data streams are partitioned and spread over a cluster of machines to
> allow
> > data streams
> > larger than the capability of any single machine and to allow clusters of
> > co-ordinated consumers.
> >
> > ** Durable => Messages are persisted on disk and replicated within the
> > cluster to prevent
> > data loss. Each broker can handle terabytes of messages without
> performance
> > impact.
> >
> > ** Distributed by Design => Kafka has a modern cluster-centric design
> that
> > offers
> > strong durability and fault-tolerance guarantees.
> >
> > You can download the source release from
> >
> >
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka-0.10.0.0-src.tgz
> >
> > and binary releases from
> >
> >
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka_2.10-0.10.0.0.tgz
> >
> >
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz
> >
> > A big thank you for the following people who have contributed to the
> > 0.10.0.0 release.
> >
> > Adam Kunicki, Aditya Auradkar, Alex Loddengaard, Alex Sherwin, Allen
> > Wang, Andrea Cosentino, Anna Povzner, Ashish Singh, Atul Soman, Ben
> > Stopford, Bill Bejeck, BINLEI XUE, Chen Shangan, Chen Zhu, Christian
> > Posta, Cory Kolbeck, Damian Guy, dan norwood, Dana Powers, David
> > Jacot, Denise Fernandez, Dionysis Grigoropoulos, Dmitry Stratiychuk,
> > Dong Lin, Dongjoon Hyun, Drausin Wulsin, Duncan Sands, Dustin Cote,
> > Eamon Zhang, edoardo, Edward Ribeiro, Eno Thereska, Ewen
> > Cheslack-Postava, Flavio Junqueira, Francois Visconte, Frank Scholten,
> > Gabriel Zhang, gaob13, Geoff Anderson, glikson, Grant Henke, Greg
> > Fodor, Guozhang Wang, Gwen Shapira, Igor Stepanov, Ishita Mandhan,
> > Ismael Juma, Jaikiran Pai, Jakub Nowak, James Cheng, Jason Gustafson,
> > Jay Kreps, Jeff Klukas, Jeremy Custenborder, Jesse Anderson, jholoman,
> > Jiangjie Qin, Jin Xing, jinxing, Jonathan Bond, Jun Rao, Ján Koščo,
> > Kaufman Ng, kenji yoshida, Kim Christensen, Kishore Senji, Konrad,
> > Liquan Pei, Luciano Afranllie, Magnus Edenhill, Maksim Logvinenko,
> > manasvigupta, Manikumar reddy O, Mark Grover, Matt Fluet, Matt
> > McClure, Matthias J. Sax, Mayuresh Gharat, Micah Zoltu, Michael Blume,
> > Michael G. Noll, Mickael Maison, Onur Karaman, ouyangliduo, Parth
> > Brahmbhatt, Paul Cavallaro, Pierre-Yves Ritschard, Piotr Szwed,
> > Praveen Devarao, Rafael Winterhalter, Rajini Sivaram, Randall Hauch,
> > Richard Whaling, Ryan P, Samuel Julius Hecht, Sasaki Toru, Som Sahu,
> > Sriharsha Chintalapani, Stig Rohde Døssing, Tao Xiao, Tom Crayford,
> > Tom Dearman, Tom Graves, Tom Lee, Tomasz Nurkiewicz, Vahid Hashemian,
> > William Thurston, Xin Wang, Yasuhiro Matsuda, Yifan Ying, Yuto
> > Kawamura, zhuchen1018
> >
> > We welcome your help and feedback. For more information on how to
> > report problems, and to get involved, visit the project website at
> > http://kafka.apache.org/
> >
> > Thanks,
> >
> > Gwen
> >
>


Re: Messages corrupted in kafka

2016-03-24 Thread Becket Qin
You mentioned that you saw a few corrupted messages (< 0.1%). If so, are you
able to see some corrupted messages if you produce, say, 10M messages?

On Wed, Mar 23, 2016 at 9:40 PM, sunil kalva <kalva.ka...@gmail.com> wrote:

>  I am using java client and kafka 0.8.2, since events are corrupted in
> kafka broker i cant read and replay them again.
>
> On Thu, Mar 24, 2016 at 9:42 AM, Becket Qin <becket@gmail.com> wrote:
>
> > Hi Sunil,
> >
> > The messages in Kafka has a CRC stored with each of them. When consumer
> > receives a message, it will compute the CRC from the message bytes and
> > compare it to the stored CRC. If the computed CRC and stored CRC does not
> > match, that indicates the message has corrupted. I am not sure in your
> case
> > why the message is corrupted. Corrupted message seems to  be pretty rare
> > because the broker actually validate the CRC before it stores the
> messages
> > on to the disk.
> >
> > Is this problem reproduceable? If so, can you find out the messages that
> > are corrupted? Also, are you using the Java clients or some other
> clients?
> >
> > Jiangjie (Becket) Qin
> >
> > On Wed, Mar 23, 2016 at 8:28 PM, sunil kalva <kalva.ka...@gmail.com>
> > wrote:
> >
> > > can some one help me out here.
> > >
> > > On Wed, Mar 23, 2016 at 7:36 PM, sunil kalva <kalva.ka...@gmail.com>
> > > wrote:
> > >
> > > > Hi
> > > > I am seeing few messages getting corrupted in kafka, It is not
> > happening
> > > > frequently and percentage is also very very less (less than 0.1%).
> > > >
> > > > Basically i am publishing thrift events in byte array format to kafka
> > > > topics(with out encoding like base64), and i also see more events
> than
> > i
> > > > publish (i confirm this by looking at the offset for that topic).
> > > > For example if i publish 100 events and i see 110 as offset for that
> > > topic
> > > > (since it is in production i could not get exact messages which
> causing
> > > > this problem, and we will only realize this problem when we consume
> > > because
> > > > our thrift deserialization fails).
> > > >
> > > > So my question is, is there any magic byte which actually determines
> > the
> > > > boundary of the message which is same as the byte i am sending or or
> > for
> > > > any n/w issues messages get chopped and stores as one message to
> > multiple
> > > > messages on server side ?
> > > >
> > > > tx
> > > > SunilKalva
> > > >
> > >
> >
>


Re: Messages corrupted in kafka

2016-03-23 Thread Becket Qin
Hi Sunil,

Messages in Kafka have a CRC stored with each of them. When a consumer
receives a message, it computes the CRC from the message bytes and compares
it to the stored CRC. If the computed CRC and the stored CRC do not match,
that indicates the message has been corrupted. I am not sure why the
messages are corrupted in your case. Corrupted messages should be pretty
rare, because the broker actually validates the CRC before it stores the
messages on disk.
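
To illustrate the principle (this is not Kafka's exact record layout or CRC
coverage, just the general idea of the check):

import java.util.zip.CRC32;

public class CrcCheck {
    /** Returns true if the payload bytes still match the CRC stored with them. */
    public static boolean isValid(long storedCrc, byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue() == storedCrc;
    }

    public static void main(String[] args) {
        byte[] payload = "some message bytes".getBytes();
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        long storedCrc = crc.getValue();

        payload[0] ^= 0x1; // flip a bit to simulate corruption
        System.out.println("valid after corruption? " + isValid(storedCrc, payload));
    }
}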

Is this problem reproducible? If so, can you find out which messages are
corrupted? Also, are you using the Java clients or some other clients?

Jiangjie (Becket) Qin

On Wed, Mar 23, 2016 at 8:28 PM, sunil kalva <kalva.ka...@gmail.com> wrote:

> can some one help me out here.
>
> On Wed, Mar 23, 2016 at 7:36 PM, sunil kalva <kalva.ka...@gmail.com>
> wrote:
>
> > Hi
> > I am seeing few messages getting corrupted in kafka, It is not happening
> > frequently and percentage is also very very less (less than 0.1%).
> >
> > Basically i am publishing thrift events in byte array format to kafka
> > topics(with out encoding like base64), and i also see more events than i
> > publish (i confirm this by looking at the offset for that topic).
> > For example if i publish 100 events and i see 110 as offset for that
> topic
> > (since it is in production i could not get exact messages which causing
> > this problem, and we will only realize this problem when we consume
> because
> > our thrift deserialization fails).
> >
> > So my question is, is there any magic byte which actually determines the
> > boundary of the message which is same as the byte i am sending or or for
> > any n/w issues messages get chopped and stores as one message to multiple
> > messages on server side ?
> >
> > tx
> > SunilKalva
> >
>


Re: [DISCUSS] KIP-45 Standardize all client sequence interaction on j.u.Collection.

2016-03-07 Thread Becket Qin
Hi Jason,

Yes, 0.9 clients should still work with 0.10 brokers.

Thanks,

Jiangjie (Becket) Qin

On Mon, Mar 7, 2016 at 4:10 PM, Jason Gustafson <ja...@confluent.io> wrote:

> +users
>
> On Mon, Mar 7, 2016 at 4:09 PM, Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hey Ismael,
> >
> > Thanks for bringing this up again. Just a quick question: if we do #1,
> > then there's no way that a user binary could work against both 0.9 and
> 0.10
> > of kafka-clients, right? I'm not sure if that is much of a problem, but
> may
> > cause a little pain if a user somehow depends transitively on different
> > versions. Excluding this change, would we otherwise expect
> > kafka-clients-0.9 to work with an 0.10 broker? I thought the changes for
> > KIP-32 continued to support the old message format, but I could be wrong.
> >
> > -Jason
> >
> > On Mon, Mar 7, 2016 at 3:29 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> >> Coming back to this, see below.
> >>
> >> On Wed, Jan 27, 2016 at 9:01 PM, Jason Gustafson <ja...@confluent.io>
> >> wrote:
> >>
> >> >
> >> > 1. For subscribe() and assign(), change the parameter type to
> >> collection as
> >> > planned in the KIP. This is at least source-compatible, so as long as
> >> users
> >> > compile against the updated release, there shouldn't be any problems.
> >> >
> >>
> >> I think this one seems to be the least controversial part of the
> proposal.
> >> And I agree with this suggestion.
> >>
> >> 2. Instead of changing the signatures of the current pause/resume/seek
> >> > APIs, maybe we can overload them. This keeps compatibility and
> supports
> >> the
> >> > more convenient collection usage, but the cost is some API bloat.
> >> >
> >>
> >> It seems like there is no clear winner, so I am OK with this too.
> >>
> >> Given the release plan for 0.10.0.0 that is being voted on, I think we
> >> should make a decision on this one way or another very soon.
> >>
> >> Ismael
> >>
> >
> >
>


Re: Kafka 0.9.0.1 plan

2016-02-05 Thread Becket Qin
Hi Jun,

I am taking KAFKA-3177 off the list because the correct fix might involve
some refactoring of the exception hierarchy in the new consumer. That may
take some time, and 0.9.0.1 probably does not need to block on it.

Please let me know if you think we should have it fixed in 0.9.0.1.

Thanks,

Jiangjie (Becket) Qin

On Fri, Feb 5, 2016 at 10:28 AM, Jun Rao <j...@confluent.io> wrote:

> Hi, Everyone,
>
> We have fixed a few critical bugs since 0.9.0.0 was released and are still
> investigating a few more issues. The current list of issues tracked for
> 0.9.0.1 can be found below. Among them, only KAFKA-3159 seems to be
> critical.
>
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20fixVersion%20%3D%200.9.0.1
>
> Once all critical issues are resolved, we will start the release process of
> 0.9.0.1. Our current plan is to do that next week.
>
> Thanks,
>
> Jun
>


Re: High delay during controlled shutdown and acks=-1

2015-11-02 Thread Becket Qin
Hi Federico,

What is your replica.lag.time.max.ms?

When acks=-1, the ProducerResponse won't return until all the brokers in
the ISR get the message. During controlled shutdown, the shutting-down
broker is doing a lot of leader migration and could slow down on fetching
data. The broker won't be kicked out of the ISR until at least
replica.lag.time.max.ms has passed. Reducing that configuration will let
the shutting-down broker be kicked out of the ISR more quickly if it cannot
catch up. But if you set it too small, there could be more ISR
expansion/shrinking.

That said, controlled shutdown is currently not very efficient. We might
improve it, and hopefully later on it won't slow down replication on the
shutting-down broker.
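
For reference, a minimal client-side probe that makes the ack latency
visible might look like the sketch below (illustrative only; the broker
address and topic name are made up, and acks=-1 matches your setup):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class AcksAllLatencyProbe {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "-1"); // wait for all replicas in the ISR
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            long start = System.currentTimeMillis();
            // future.get() blocks until every in-sync replica has the record.
            RecordMetadata md = producer
                .send(new ProducerRecord<>("test-topic", "hello".getBytes()))
                .get();
            System.out.println("offset=" + md.offset()
                + ", ack latency ms=" + (System.currentTimeMillis() - start));
        }
    }
}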

Thanks,

Jiangjie (Becket) Qin

On Mon, Nov 2, 2015 at 5:52 AM, Federico Giraud <giraud.feder...@gmail.com>
wrote:

> Hi,
>
> I have few java async producers sending data to a 4-node Kafka cluster
> version 0.8.2, containing few thousand topics, all with replication factor
> 2. When i use acks=1 and trigger a controlled shutdown + restart on one
> broker, the producers will send data to the new leader, reporting a very
> low additional delay during the transition (as expected). However if i use
> acks=-1, the producers will report a ~15 seconds delay between the send and
> the future.get. Is this behavior expected? Is there a way to make it
> faster? Or maybe it is a problem with my configuration?
>
> Broker configs:
> broker.id=0
> log.dirs=/var/kafka/md1/kafka-logs
> zookeeper.connect=10.40.27.107,10.40.27.108,10.40.27.109
> auto.create.topics.enable=true
> default.replication.factor=2
> delete.topic.enable=true
> log.retention.hours=24
> num.io.threads=5
>
> Producer configs:
> acks = -1
> retries = 3
> timeout.ms = 3000
> batch.size = 1048576
> linger.ms= 100
> metadata.fetch.timeout.ms = 5000
> metadata.max.age.ms = 6
>
> I tried different configurations, but i wasn't able to reduce the delay
> during broker restart. The logs in the broker indicate that the controlled
> shutdown was successful.
>
> Thank you
>
> regards,
> Federico
>


Re: 0.9.0 release branch

2015-11-02 Thread Becket Qin
Hi Jun,

I added KAFKA-2722 as a blocker for 0.9. It fixes the ISR propagation
scalability issue we saw.

Thanks,

Jiangjie (Becket) Qin

On Mon, Nov 2, 2015 at 9:16 AM, Jun Rao <j...@confluent.io> wrote:

> Hi, everyone,
>
> We are getting close to the 0.9.0 release. The current plan is to have the
> following remaining 0.9.0 blocker issues resolved this week, cut the 0.9.0
> release branch by Nov. 6 (Friday) and start the RC on Nov. 9 (Monday).
>
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20priority%20%3D%20Blocker%20AND%20fixVersion%20%3D%200.9.0.0%20ORDER%20BY%20updated%20DESC
>
> Thanks,
>
> Jun
>


Re: Replica Fetcher Reset Its Offset to beginning

2015-10-29 Thread Becket Qin
It might be related to KAFKA-2477.

On Thu, Oct 29, 2015 at 6:44 AM, Andrew Otto  wrote:

> Hi all,
>
> This morning I woke up to see a very high max replica lag on one of my
> brokers.  I looked at logs, and it seems that one of the replica fetchers
> for a partition just decided that its offset was out of range, so it reset
> its offset to the beginning of the leader’s log and started replicating
> from there.  This broker is currently catching back up, so things will be
> fine.
>
> But, I’m curious.  Has anyone seen this before?  Why would this just
> happen?
>
> The logs show that many segments for this partition were scheduled for
> deletion all at once, right before the fetcher reset its offset:
>
>
> [2015-10-29 09:27:11,899] 5421994218 [ReplicaFetcherThread-5-14] INFO
> kafka.log.Log  - Scheduling log segment 28493996399 for log
> webrequest_upload-0 for deletion.
> …
> (repeats for about 950 segments…)
> …
> [2015-10-29 09:27:12,606] 5421994925 [ReplicaFetcherThread-5-14] WARN
> kafka.server.ReplicaFetcherThread  - [ReplicaFetcherThread-5-14], Replica
> 18 for partition [webrequest_upload,0] reset its fetch offset from
> 28493996399 to current leader 14's start offset 28493996399
> [2015-10-29 09:27:12,606] 5421994925 [ReplicaFetcherThread-5-14] ERROR
> kafka.server.ReplicaFetcherThread  - [ReplicaFetcherThread-5-14], Current
> offset 31062784634 for partition [webrequest_upload,0] out of range; reset
> offset to 28493996399
> …
>
>
> A more complete capture of this log is here:
> https://gist.github.com/ottomata/033ddef8f699ca09cfa8 <
> https://gist.github.com/ottomata/033ddef8f699ca09cfa8>
>
> Thanks!
> -Ao
>
>