Re: [DISCUSS] Kafka 3.0

2020-05-05 Thread Jeff Widman
 I'm onboard with naming it "3.0"
> instead.
> > > > >
> > > > > That said, we should aim to remove legacy MirrorMaker before 3.0 as
> > > well.
> > > > > I'm happy to drive that additional breaking change. Maybe 2.6 can
> be
> > > the
> > > > > "bridge" for MM2 as well.
> > > > >
> > > > > Ryanne
> > > > >
> > > > > On Mon, May 4, 2020, 5:05 PM Colin McCabe 
> > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > We've had a few proposals recently for incompatible changes.  One
> > of
> > > > them
> > > > > > is my KIP-604: Remove ZooKeeper Flags from the Administrative
> > Tools.
> > > > The
> > > > > > other is Boyang's KIP-590: Redirect ZK Mutation Protocols to the
> > > > > > Controller.  I think it's time to start thinking about Kafka 3.0.
> > > > > > Specifically, I think we should move to 3.0 after the 2.6
> release.
> > > > > >
> > > > > > From the perspective of KIP-500, in Kafka 3.x we'd like to make
> > > running
> > > > > in
> > > > > > a ZooKeeper-less mode possible (but not yet the default.)  This
> is
> > > the
> > > > > > motivation behind KIP-590 and KIP-604, as well as some of the
> other
> > > > KIPs
> > > > > > we've done recently.  Since it will take some time to stabilize
> the
> > > new
> > > > > > ZooKeeper-free Kafka code, we will hide it behind an option
> > > initially.
> > > > > > (We'll have a KIP describing this all in detail soon.)
> > > > > >
> > > > > > What does everyone think about having Kafka 3.0 come up next
> after
> > > 2.6?
> > > > > > Are there any other things we should change in the 2.6 -> 3.0
> > > > transition?
> > > > > >
> > > > > > best,
> > > > > > Colin
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Gwen Shapira
> > > > Engineering Manager | Confluent
> > > > 650.450.2760 | @gwenshap
> > > > Follow us: Twitter | blog
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
> >
> > --
> > Gwen Shapira
> > Engineering Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter | blog
> >
>


-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


[VOTE] KIP-592: Kafka MirrorMaker 1 should replicate topics from "earliest"

2020-05-04 Thread Jeff Widman
Hi all,

I'd like to start a vote on KIP-592:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-592%3A+MirrorMaker+1.0+should+replicate+topics+from+earliest

Thanks

-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: [DISCUSS] KIP-592: MirrorMaker should replicate topics from earliest

2020-05-01 Thread Jeff Widman
Thanks for chiming in Ryanne.

Personally, I'd rather fix MM1 to have a default of not-skipping-messages,
as that seems more correct, rather than fix the MM2 "legacy" switch to
match the bad behavior.

Especially since this won't affect currently-running mirrormakers,
there's not that much breaking change... it would only affect
newly-started mirrormakers.

Anyone else want to chime in on this? I'd like to drive to a decision soon,
so will call for a vote in the next day or two.



On Mon, Apr 27, 2020 at 10:49 AM Ryanne Dolan  wrote:

> Conversely, we could consider making MM2 use "latest" in "legacy mode", and
> leave MM1 as it is? (Just thinking out loud.)
>
> Ryanne
>
> On Mon, Apr 27, 2020 at 12:39 PM Jeff Widman  wrote:
>
> > Good questions:
> >
> >
> > *I agree that `auto.offset.reset="earliest"` would be a better default.
> > However, I am a little worried about backward compatibility. *
> >
> > Keep in mind that existing mirrormaker instances will *not* be affected
> for
> > topics they are currently consuming because they will already have saved
> > offsets. This will only affect mirrormakers that start consuming new
> > topics, for which they don't have a saved offset. In those cases, they
> will
> > stop seeing data loss when they first start consuming. My guess is the
> > majority of those new topics are going to be newly-created topics anyway,
> > so most of the time starting from the earliest simply prevents skipping
> the
> > first few seconds/minutes of data written to the topic.
> >
> > *What I am also wondering though is, does this only affect MirrorMaker
> or
> > also MirrorMaker 2? *
> >
> > I checked and MM2 already sets `auto.offset.reset = 'earliest'`
> > <
> >
> https://github.com/apache/kafka/blob/d63e0181bb7b9b4f5ed088abc00d7b32aeb0/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorConnectorConfig.java#L233
> > >
> > .
> >
> > *Also, is it worth changing MirrorMaker now that **MirrorMaker 2 is
> > available?*
> >
> > Given that it's 1-line of code, doesn't affect existing instances, and
> > prevents data loss on new regex subscriptions, I think it's worth
> > setting... I basically view it as a bugfix rather than a feature change.
> >
> > I realize MM1 is deprecated, but there's still a lot of old mirrormakers
> > running, so flipping this now will ease the future transition to MM2
> > because it brings the behavior of MM1 in line with MM2.
> >
> > Thoughts?
> >
> >
> >
> > On Sat, Apr 11, 2020 at 11:59 AM Matthias J. Sax 
> wrote:
> >
> > > Jeff,
> > >
> > > thanks for the KIP. I agree that `auto.offset.reset="earliest"` would
> be
> > > a better default. However, I am a little worried about backward
> > > compatibility. And even if the current default is not ideal, users can
> > > still change it.
> > >
> > > What I am also wondering though is, does this only affect MirrorMaker
> > > or also MirrorMaker 2? Also, is it worth changing MirrorMaker now that
> > > MirrorMaker 2 is available?
> > >
> > >
> > > -Matthias
> > >
> > >
> > > On 4/10/20 9:56 PM, Jeff Widman wrote:
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-592%3A+MirrorMaker+should+replicate+topics+from+earliest
> > > >
> > > > It's a relatively minor change, only one line of code. :-D
> > > >
> > > >
> > > >
> > >
> > >
> >
> > --
> >
> > *Jeff Widman*
> > jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
> > <><
> >
>


-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: [DISCUSS] KIP-592: MirrorMaker should replicate topics from earliest

2020-04-27 Thread Jeff Widman
Good questions:


*I agree that `auto.offset.reset="earliest"` would be a better default.
However, I am a little worried about backward compatibility. *

Keep in mind that existing mirrormaker instances will *not* be affected for
topics they are currently consuming because they will already have saved
offsets. This will only affect mirrormakers that start consuming new
topics, for which they don't have a saved offset. In those cases, they will
stop seeing data loss when they first start consuming. My guess is the
majority of those new topics are going to be newly-created topics anyway,
so most of the time starting from the earliest simply prevents skipping the
first few seconds/minutes of data written to the topic.

*What I am also wondering though is, does this only affect MirrorMaker or
also MirrorMaker 2? *

I checked and MM2 already sets `auto.offset.reset = 'earliest'`
<https://github.com/apache/kafka/blob/d63e0181bb7b9b4f5ed088abc00d7b32aeb0/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorConnectorConfig.java#L233>
.

*Also, is it worth changing MirrorMaker now that **MirrorMaker 2 is
available?*

Given that it's 1-line of code, doesn't affect existing instances, and
prevents data loss on new regex subscriptions, I think it's worth
setting... I basically view it as a bugfix rather than a feature change.
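
For anyone curious, here's roughly what that one line could look like. This is
a hedged Java-style sketch only; MirrorMaker itself is Scala, so the exact
call site and the applyProposedDefault() wrapper below are illustrative
assumptions rather than the real code:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;

    class MirrorMakerDefaultSketch {
        // Hypothetical: apply the proposed default to whatever consumer
        // properties MirrorMaker has already loaded from its config file.
        static void applyProposedDefault(Properties consumerProps) {
            // putIfAbsent keeps any explicit user setting (e.g. "latest") intact.
            consumerProps.putIfAbsent(
                ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        }
    }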

I realize MM1 is deprecated, but there's still a lot of old mirrormakers
running, so flipping this now will ease the future transition to MM2
because it brings the behavior of MM1 in line with MM2.

Thoughts?



On Sat, Apr 11, 2020 at 11:59 AM Matthias J. Sax  wrote:

> Jeff,
>
> thanks for the KIP. I agree that `auto.offset.reset="earliest"` would be
> a better default. However, I am a little worried about backward
> compatibility. And even if the current default is not ideal, users can
> still change it.
>
> What I am also wondering though is, does this only affect MirrorMaker
> or also MirrorMaker 2? Also, is it worth changing MirrorMaker now that
> MirrorMaker 2 is available?
>
>
> -Matthias
>
>
> On 4/10/20 9:56 PM, Jeff Widman wrote:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-592%3A+MirrorMaker+should+replicate+topics+from+earliest
> >
> > It's a relatively minor change, only one line of code. :-D
> >
> >
> >
>
>

-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


[DISCUSS] KIP-592: MirrorMaker should replicate topics from earliest

2020-04-10 Thread Jeff Widman
https://cwiki.apache.org/confluence/display/KAFKA/KIP-592%3A+MirrorMaker+should+replicate+topics+from+earliest

It's a relatively minor change, only one line of code. :-D



-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: [ANNOUNCE] Apache Kafka 2.0.0 Released

2018-07-30 Thread Jeff Widman
ws building and running reusable producers
> > or
> > > > >
> > > > > consumers that connect Kafka topics to existing applications or
> data
> > > > >
> > > > > systems. For example, a connector to a relational database might
> > > > >
> > > > > capture every change to a table.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > With these APIs, Kafka can be used for two broad classes of
> > > application:
> > > > >
> > > > >
> > > > >
> > > > > ** Building real-time streaming data pipelines that reliably get
> data
> > > > >
> > > > > between systems or applications.
> > > > >
> > > > >
> > > > >
> > > > > ** Building real-time streaming applications that transform or
> react
> > > > >
> > > > > to the streams of data.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Apache Kafka is in use at large and small companies worldwide,
> > > including
> > > > >
> > > > > Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest,
> > > Rabobank,
> > > > >
> > > > > Target, The New York Times, Uber, Yelp, and Zalando, among others.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > A big thank you for the following 131 contributors to this release!
> > > > >
> > > > >
> > > > >
> > > > > Adem Efe Gencer, Alex D, Alex Dunayevsky, Allen Wang, Andras Beni,
> > > > >
> > > > > Andy Bryant, Andy Coates, Anna Povzner, Arjun Satish, asutosh936,
> > > > >
> > > > > Attila Sasvari, bartdevylder, Benedict Jin, Bill Bejeck, Blake
> > Miller,
> > > > >
> > > > > Boyang Chen, cburroughs, Chia-Ping Tsai, Chris Egerton, Colin P.
> > > Mccabe,
> > > > >
> > > > > Colin Patrick McCabe, ConcurrencyPractitioner, Damian Guy, dan
> > norwood,
> > > > >
> > > > > Daniel Shuy, Daniel Wojda, Dark, David Glasser, Debasish Ghosh,
> > > Detharon,
> > > > >
> > > > > Dhruvil Shah, Dmitry Minkovsky, Dong Lin, Edoardo Comar, emmanuel
> > > Harel,
> > > > >
> > > > > Eugene Sevastyanov, Ewen Cheslack-Postava, Fedor Bobin,
> > > > fedosov-alexander,
> > > > >
> > > > > Filipe Agapito, Florian Hussonnois, fredfp, Gilles Degols, gitlw,
> > > > Gitomain,
> > > > >
> > > > > Guangxian, Gunju Ko, Gunnar Morling, Guozhang Wang, hmcl, huxi,
> > huxihx,
> > > > >
> > > > > Igor Kostiakov, Ismael Juma, Jacek Laskowski, Jagadesh Adireddi,
> > > > >
> > > > > Jarek Rudzinski, Jason Gustafson, Jeff Klukas, Jeremy Custenborder,
> > > > >
> > > > > Jiangjie (Becket) Qin, Jiangjie Qin, JieFang.He, Jimin Hsieh, Joan
> > > > Goyeau,
> > > > >
> > > > > Joel Hamill, John Roesler, Jon Lee, Jorge Quilcate Otoya, Jun Rao,
> > > > >
> > > > > Kamal C, khairy, Koen De Groote, Konstantine Karantasis, Lee
> Dongjin,
> > > > >
> > > > > Liju John, Liquan Pei, lisa2lisa, Lucas Wang, Magesh Nandakumar,
> > > > >
> > > > > Magnus Edenhill, Magnus Reftel, Manikumar Reddy, Manikumar Reddy O,
> > > > >
> > > > > manjuapu, Mats Julian Olsen, Matthias J. Sax, Max Zheng, maytals,
> > > > >
> > > > > Michael Arndt, Michael G. Noll, Mickael Maison, nafshartous, Nick
> > > > Travers,
> > > > >
> > > > > nixsticks, Paolo Patierno, parafiend, Patrik Erdes, Radai
> Rosenblatt,
> > > > >
> > > > > Rajini Sivaram, Randall Hauch, ro7m, Robert Yokota, Roman Khlebnov,
> > > > >
> > > > > Ron Dagostino, Sandor Murakozi, Sasaki Toru, Sean Glover,
> > > > >
> > > > > Sebastian Bauersfeld, Siva Santhalingam, Stanislav Kozlovski,
> > Stephane
> > > > > Maarek,
> > > > >
> > > > > Stuart Perks, Surabhi Dixit, Sönke Liebau, taekyung, tedyu, Thomas
> > > > Leplus,
> > > > >
> > > > > UVN, Vahid Hashemian, Valentino Proietti, Viktor Somogyi, Vitaly
> > > Pushkar,
> > > > >
> > > > > Wladimir Schmidt, wushujames, Xavier Léauté, xin, yaphet,
> > > > >
> > > > > Yaswanth Kumar, ying-zheng, Yu
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > We welcome your help and feedback. For more information on how to
> > > > >
> > > > > report problems, and to get involved, visit the project website at
> > > > >
> > > > > https://kafka.apache.org/
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Thank you!
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Regards,
> > > > >
> > > > >
> > > > >
> > > > > Rajini
> > > > >
> > > >
> > > > --
> > > > *Dongjin Lee*
> > > >
> > > > *A hitchhiker in the mathematical world.*
> > > >
> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > <http://github.com/dongjinleekr>linkedin:
> > > kr.linkedin.com/in/dongjinleekr
> > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > www.slideshare.net/dongjinleekr
> > > > <http://www.slideshare.net/dongjinleekr>*
> > > >
> > >
> >
> >
> > --
> > Senior Software Engineer, Lightbend, Inc.
> >
> > <http://lightbend.com>
> >
> > @seg1o <https://twitter.com/seg1o>, in/seanaglover
> > <https://www.linkedin.com/in/seanaglover/>
> >
>



-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: Use of a formatter like Scalafmt

2018-05-09 Thread Jeff Widman
It certainly annoys me every time I open the code and my linter starts
highlighting that some code is indented with spaces and some with tabs...
increasing consistency across the codebase would be appreciated.

On Wed, May 9, 2018 at 9:10 PM, Ismael Juma  wrote:

> Sounds good about doing this for Kafka streams scala first. Core is a bit
> more complicated so may require more discussion.
>
> Ismael
>
> On Wed, 9 May 2018, 16:59 Matthias J. Sax,  wrote:
>
> > Joan,
> >
> > thanks for starting this initiative. I am overall +1
> >
> > However, I am worried about applying it to `core` module as the change
> > is massive. For the Kafka Streams Scala module, that is new and was not
> > released yet, I am +1.
> >
> > A personal thing about the style: the 2-space indention for JavaDocs is
> > a little weird.
> >
> > /**
> >  *
> >  */
> >
> > is changed to
> >
> > /**
> >   *
> >   */
> >
> > Not sure if this can be fixed easily in the style file? If not, I am
> > also fine with the change.
> >
> > This change also affects the license headers of many files, exposing
> > that those use the wrong comment format anyway. They should use regular
> > comments
> >
> > /*
> >  *
> >  */
> >
> > but not JavaDoc comments
> >
> > /**
> >  *
> >  */
> >
> > (We fixed this for Java source code in the past already -- maybe it's
> > time to fix it for the Scala code base, too.)
> >
> >
> >
> > -Matthias
> >
> > On 5/9/18 4:45 PM, Joan Goyeau wrote:
> > > Hi Ted,
> > >
> > > As far as I understand this is an issue for PRs and back porting the
> > > changes to other branches.
> > >
> > > Applying the tool to the other branches should also resolve the
> conflicts
> > > as the formattings will match, leaving only the actual changes in the
> > diffs.
> > > That's what we did some time ago at my work and it went quite smoothly.
> > >
> > > If we don't want to do a big bang commit then I'm thinking we might
> want
> > to
> > > make it gradually by applying it module by module?
> > > This is one idea do you have any other?
> > >
> > > I know formatting sounds like the useless thing that doesn't matter
> and I
> > > totally agree with this, that's why I don't want to care about it while
> > > coding.
> > >
> > > Thanks
> > >
> > > On Thu, 10 May 2018 at 00:15 Ted Yu  wrote:
> > >
> > >> Applying the tool across code base would result in massive changes.
> > >> How would this be handled ?
> > >>  Original message From: Joan Goyeau 
> > >> Date: 5/9/18  3:31 PM  (GMT-08:00) To: dev@kafka.apache.org Subject:
> > Use
> > >> of a formatter like Scalafmt
> > >> Hi,
> > >>
> > >> Contributing to Kafka Streams' Scala API, I've been kinda lost on how
> > >> should I format my code.
> > >> I know formatting is the start of religion wars but personally I have
> no
> > >> preference at all. I just want consistency across the codebase, no
> > >> unnecessary formatting diffs in PRs and offload the formatting to a
> tool
> > >> that will do it for me and concentrate on what matters (not
> formatting).
> > >>
> > >> So I opened the following PR where I put arbitrary rules in
> > .scalafmt.conf
> > >> <
> > >>
> > https://github.com/apache/kafka/pull/4965/files#diff-
> 8af3e1355c23c331ee2b848e12c5219f
> > >>>
> > >> :
> > >> https://github.com/apache/kafka/pull/4965
> > >>
> > >> Please let me know what do you think and if we can move this forward
> and
> > >> settle something.
> > >>
> > >> Thanks
> > >>
> > >
> >
> >
>



-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: Requesting wiki access for creating a KIP

2018-05-06 Thread Jeff Widman
My apologies, my username is "jeffwidman".



I forgot Confluence doesn't support looking up users by email address.

On Sun, May 6, 2018 at 7:53 PM, Matthias J. Sax 
wrote:

> What's your user id? Could not find a user `j...@jeffwidman.com`
>
> -Matthias
>
> On 5/6/18 11:12 AM, Jeff Widman wrote:
> > I thought I already had wiki access, but apparently I don't...
> >
> > Can someone give j...@jeffwidman.com proper permissions for creating a
> new
> > KIP?
> >
> > Thanks!
> >
>
>


-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Requesting wiki access for creating a KIP

2018-05-06 Thread Jeff Widman
I thought I already had wiki access, but apparently I don't...

Can someone give j...@jeffwidman.com proper permissions for creating a new
KIP?

Thanks!

-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: Kafka KIP meeting on Apr. 9 at 9:00am PDT

2018-04-05 Thread Jeff Widman
Please add me as well: j...@jeffwidman.com

thanks

On Thu, Apr 5, 2018 at 5:16 PM, James Cheng  wrote:

> Jun,
>
> Can you add me as well? wushuja...@gmail.com <mailto:wushuja...@gmail.com>
>
> Thanks,
> -James
>
> > On Apr 5, 2018, at 1:34 PM, Jun Rao  wrote:
> >
> > Hi, Ted, Vahid,
> >
> > Added you to the invite.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Thu, Apr 5, 2018 at 10:42 AM, Vahid S Hashemian <
> > vahidhashem...@us.ibm.com> wrote:
> >
> >> Hi Jun,
> >>
> >> I used to receive these invites, but didn't get this one.
> >> Please send me an invite. Thanks.
> >>
> >> Regards,
> >> --Vahid
> >>
> >>
> >>
> >> From:   Jun Rao 
> >> To: dev 
> >> Date:   04/05/2018 10:25 AM
> >> Subject:Kafka KIP meeting on Apr. 9 at 9:00am PDT
> >>
> >>
> >>
> >> Hi, Everyone,
> >>
> >> We plan to have a Kafka KIP meeting this coming Monday (Apr. 9)  at
> 9:00am
> >> PDT. If you plan to attend but haven't received an invite, please let me
> >> know. The following is the agenda.
> >>
> >> Agenda:
> >> KIP-253: Partition expansion
> >>
> >> Thanks,
> >>
> >> Jun
> >>
> >>
> >>
> >>
> >>
>
>


-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: [DISCUSS] KIP-272: Add API version tag to broker's RequestsPerSec metric

2018-03-21 Thread Jeff Widman
I agree with Allen.

Go with the intuitive name, even if it means not deprecating. The impact of
breakage here is small since it only breaks monitoring, and the folks who
watch their dashboards closely are the ones likely to read the release
notes carefully and see this change.
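
For context on the per-version metric mechanics Allen describes in the quoted
thread below, here's a minimal, self-contained sketch using plain JDK types
rather than Kafka's actual Yammer metrics classes; the class and method names
are made up for illustration:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.LongAdder;

    class RequestVersionMetricsSketch {
        // One map per request type ("produce", "fetch", ...). It only ever
        // holds an entry per distinct client API version seen, so it stays
        // tiny -- typically a handful of entries during a rolling upgrade.
        private final ConcurrentHashMap<Short, LongAdder> requestsPerVersion =
            new ConcurrentHashMap<>();

        void record(short apiVersion) {
            // This computeIfAbsent is the "additional hash lookup" mentioned
            // in the thread; the cost is one map lookup per request.
            requestsPerVersion
                .computeIfAbsent(apiVersion, v -> new LongAdder())
                .increment();
        }
    }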

On Wed, Mar 21, 2018, 3:24 PM Allen Wang  wrote:

> I understand the impact to jmx based tools. But adding a new metric is
> unnecessary for more advanced monitoring systems that can aggregate with or
> without tags. Duplicating the metric (including the "request" tag) also
> adds performance cost to the broker, especially for the metric that needs
> to be updated for every request.
>
> For KIP-225, the choice makes sense because the deprecated metric's name is
> undesirable anyway and the new metric name is much better than the prefixed
> metric name. Not the case for RequestsPerSec. It is hard for me to come up
> with another intuitive name.
>
> Thanks,
> Allen
>
>
>
> On Wed, Mar 21, 2018 at 2:01 AM, Ted Yu  wrote:
>
> > Creating new metric and deprecating existing one seems better from
> > compatibility point of view.
> >  Original message From: James Cheng <
> wushuja...@gmail.com>
> > Date: 3/21/18  1:39 AM  (GMT-08:00) To: dev@kafka.apache.org Subject:
> Re:
> > [DISCUSS] KIP-272: Add API version tag to broker's RequestsPerSec metric
> > Manikumar brings up a good point. This is a breaking change to the
> > existing metric. Do we want to break compatibility, or do we want to add
> a
> > new metric and (optionally) deprecate the existing metric?
> >
> > For reference, in KIP-153 [1], we changed an existing metric without
> doing
> > proper deprecation.
> >
> > However, in KIP-225, we noticed that that we maybe shouldn't have done
> > that. For KIP-225 [2], we instead decided to create a new metric, and
> > deprecate (but not remove) the old one.
> >
> > -James
> >
> > [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric
> >
> > [2] https://cwiki.apache.org/confluence/pages/viewpage.
> > action?pageId=74686649
> >
> >
> > > On Mar 21, 2018, at 12:14 AM, Manikumar 
> > wrote:
> > >
> > > Can we retain total RequestsPerSec metric and add new version tag
> metric?
> > > When monitoring with simple jconsole/jmx based tools, It is useful to
> > have
> > > total metric
> > > to monitor request rate.
> > >
> > >
> > > Thanks,
> > >
> > > On Wed, Mar 21, 2018 at 11:01 AM, Gwen Shapira 
> > wrote:
> > >
> > >> I love this. Not much to add - it is an elegant solution, clean
> > >> implementation and it addresses a real need, especially during
> upgrades.
> > >>
> > >> On Tue, Mar 20, 2018 at 2:49 PM, Ted Yu  wrote:
> > >>
> > >>> Thanks for the response.
> > >>>
> > >>> Assuming number of client versions is limited in a cluster, memory
> > >>> consumption is not a concern.
> > >>>
> > >>> Cheers
> > >>>
> > >>> On Tue, Mar 20, 2018 at 10:47 AM, Allen Wang 
> > >> wrote:
> > >>>
> >  Hi Ted,
> > 
> >  The additional hash map is very small, possibly a few KB. Each
> request
> > >>> type
> >  ("produce", "fetch", etc.) will have such a map which have a few
> > >> entries
> >  depending on the client API versions the broker will encounter. So
> if
> >  broker encounters two client versions for "produce", there will be
> two
> >  entries in the map for "produce" requests mapping from version to
> > >> meter.
> > >>> Of
> >  course, hash map always have additional memory overhead.
> > 
> >  Thanks,
> >  Allen
> > 
> > 
> >  On Mon, Mar 19, 2018 at 3:49 PM, Ted Yu 
> wrote:
> > 
> > > bq. *additional hash lookup is needed when updating the metric to
> > >>> locate
> > > the metric *
> > >
> > > *Do you have estimate how much memory is needed for maintaining the
> > >>> hash
> > > map ?*
> > >
> > > *Thanks*
> > >
> > > On Mon, Mar 19, 2018 at 3:19 PM, Allen Wang 
> >  wrote:
> > >
> > >> Hi all,
> > >>
> > >> I have created KIP-272: Add API version tag to broker's
> > >>> RequestsPerSec
> > >> metric.
> > >>
> > >> Here is the link to the KIP:
> > >>
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >> 272%3A+Add+API+version+tag+to+broker%27s+RequestsPerSec+metric
> > >>
> > >> Looking forward to the discussion.
> > >>
> > >> Thanks,
> > >> Allen
> > >>
> > >
> > 
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> *Gwen Shapira*
> > >> Product Manager | Confluent
> > >> 650.450.2760 | @gwenshap
> > >> Follow us: Twitter  | blog
> > >> 
> > >>
> >
> >
>


Re: Do the Jackson security vulnerabilities affect Kafka at all?

2018-02-21 Thread Jeff Widman
My bad, I forgot I had checked out the 1.0.1 source which has Jackson
2.9.1...

I thought the fix required 2.9.3 based on what I'd been told by the
security team at a customer (the original motivation behind my email), but
I dug a bit deeper and it looks like 2.9.1 has the patch
<https://github.com/FasterXML/jackson-databind/issues/1847#issuecomment-348409708>,
so 1.0.1 is already protected against this.

Thanks Ismael, and my apologies for wasting everyone's time.



On Tue, Feb 20, 2018 at 11:49 PM, Ismael Juma  wrote:

> Hi Jeff,
>
> Have you checked trunk and 1.1? They should be using the latest version.
>
> Ismael
>
> On Tue, Feb 20, 2018 at 10:38 PM, Jeff Widman  wrote:
>
> > The Jackson JSON parser library had a couple of CVE's announced:
> > 1. CVE-2017-7525
> > 2. CVE 2017-15095
> >
> > Here's a skimmable summary:
> > https://adamcaudill.com/2017/10/04/exploiting-jackson-rce-cve-2017-7525/
> >
> > Looking at the source, it appears Kafka uses an older version of Jackson
> > which has the vulnerabilities.
> >
> > However, these vulnerabilities only happen when Jackson is used in
> specific
> > ways. I'm not familiar enough with all the places that Kafka uses Jackson
> > to understand whether Kafka is susceptible, and I come from a non-Java
> > background so it's difficult for me to parse the Java source with 100%
> > confidence that I understand what's happening.
> >
> > I know Kafka primarily uses JSON for intra-cluster communication through
> > Zookeeper, so if an attacker could access Zookeeper, could they update the
> > znode payloads to exploit this? Additionally, I think there are some util
> > scripts that (de)serialize JSON files, for example the
> > partition-reassignment scripts...
> >
> > So do these CVE's apply to Kafka?
> >
> > If so, it seems the patch is fairly trivial: just upgrade to a newer
> > version of Jackson...
> > should this also be backported to the 1.0.1 release?
> >
> >
> >
> > --
> >
> > *Jeff Widman*
> > jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
> > <><
> >
>



-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Do the Jackson security vulnerabilities affect Kafka at all?

2018-02-20 Thread Jeff Widman
The Jackson JSON parser library had a couple of CVE's announced:
1. CVE-2017-7525
2. CVE 2017-15095

Here's a skimmable summary:
https://adamcaudill.com/2017/10/04/exploiting-jackson-rce-cve-2017-7525/

Looking at the source, it appears Kafka uses an older version of Jackson
which has the vulnerabilities.

However, these vulnerabilities only happen when Jackson is used in specific
ways. I'm not familiar enough with all the places that Kafka uses Jackson
to understand whether Kafka is susceptible, and I come from a non-Java
background so it's difficult for me to parse the Java source with 100%
confidence that I understand what's happening.
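
From what I've read about these CVEs, they only bite when an application
enables Jackson's polymorphic "default typing" and then deserializes
untrusted JSON into a broad target type like Object. A hedged sketch of that
risky pattern (generic Jackson usage, not code from the Kafka codebase):

    import com.fasterxml.jackson.databind.ObjectMapper;

    class JacksonRiskSketch {
        static Object parseUntrusted(String attackerControlledJson) throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            // The dangerous part: default typing lets the JSON document itself
            // name arbitrary classes to instantiate ("gadget" classes), which
            // is what CVE-2017-7525 / CVE-2017-15095 exploit.
            mapper.enableDefaultTyping();
            return mapper.readValue(attackerControlledJson, Object.class);
        }
    }

If Kafka only binds its ZooKeeper JSON to fixed, known classes and never
enables default typing, it presumably isn't exposed, but I'd appreciate
confirmation from someone who knows the code.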

I know Kafka primarily uses JSON for intra-cluster communication through
Zookeeper, so if an attacker could access Zookeeper, could they update the
znode payloads to exploit this? Additionally, I think there are some util
scripts that (de)serialize JSON files, for example the
partition-reassignment scripts...

So do these CVE's apply to Kafka?

If so, it seems the patch is fairly trivial: just upgrade to a newer
version of Jackson...
should this also be backported to the 1.0.1 release?



-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: [VOTE] KIP-222 - Add "describe consumer group" to KafkaAdminClient

2018-02-01 Thread Jeff Widman
; Bill
> > > > > >>
> > > > > >> On Thu, Jan 18, 2018 at 4:24 AM, Rajini Sivaram <
> > > > > rajinisiva...@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > +1 (binding)
> > > > > >> >
> > > > > >> > Thanks for the KIP, Jorge.
> > > > > >> >
> > > > > >> > Regards,
> > > > > >> >
> > > > > >> > Rajini
> > > > > >> >
> > > > > >> > On Wed, Jan 17, 2018 at 9:04 PM, Guozhang Wang <
> > > wangg...@gmail.com>
> > > > > wrote:
> > > > > >> >
> > > > > >> > > +1 (binding). Thanks Jorge.
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > Guozhang
> > > > > >> > >
> > > > > >> > > On Wed, Jan 17, 2018 at 11:29 AM, Gwen Shapira <
> > > g...@confluent.io
> > > > >
> > > > > >> > wrote:
> > > > > >> > >
> > > > > >> > > > Hey, since there were no additional comments in the
> > > discussion,
> > > > > I'd
> > > > > >> > like
> > > > > >> > > to
> > > > > >> > > > resume the voting.
> > > > > >> > > >
> > > > > >> > > > +1 (binding)
> > > > > >> > > >
> > > > > >> > > > On Fri, Nov 17, 2017 at 9:15 AM Guozhang Wang <
> > > > wangg...@gmail.com
> > > > > >
> > > > > >> > > wrote:
> > > > > >> > > >
> > > > > >> > > > > Hello Jorge,
> > > > > >> > > > >
> > > > > >> > > > > I left some comments on the discuss thread. The wiki
> page
> > > > itself
> > > > > >> > looks
> > > > > >> > > > good
> > > > > >> > > > > overall.
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > Guozhang
> > > > > >> > > > >
> > > > > >> > > > > On Tue, Nov 14, 2017 at 10:02 AM, Jorge Esteban Quilcate
> > > > Otoya <
> > > > > >> > > > > quilcate.jo...@gmail.com> wrote:
> > > > > >> > > > >
> > > > > >> > > > > > Added.
> > > > > >> > > > > >
> > > > > >> > > > > > El mar., 14 nov. 2017 a las 19:00, Ted Yu (<
> > > > > yuzhih...@gmail.com>)
> > > > > >> > > > > > escribió:
> > > > > >> > > > > >
> > > > > >> > > > > > > Please fill in JIRA number in Status section.
> > > > > >> > > > > > >
> > > > > >> > > > > > > On Tue, Nov 14, 2017 at 9:57 AM, Jorge Esteban
> > Quilcate
> > > > > Otoya <
> > > > > >> > > > > > > quilcate.jo...@gmail.com> wrote:
> > > > > >> > > > > > >
> > > > > >> > > > > > > > JIRA issue title updated.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > El mar., 14 nov. 2017 a las 18:45, Ted Yu (<
> > > > > >> > yuzhih...@gmail.com
> > > > > >> > > >)
> > > > > >> > > > > > > > escribió:
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > > Can you fill in JIRA number (KAFKA-6058
> > > > > >> > > > > > > > > <https://issues.apache.org/
> jira/browse/KAFKA-6058
> > >)
> > > ?
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > If one JIRA is used for the two additions,
> > consider
> > > > > updating
> > > > > >> > > the
> > > > > >> > > > > JIRA
> > > > > >> > > > > > > > > title.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > On Tue, Nov 14, 2017 at 9:04 AM, Jorge Esteban
> > > > Quilcate
> > > > > >> > Otoya <
> > > > > >> > > > > > > > > quilcate.jo...@gmail.com> wrote:
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > Hi all,
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > As I didn't see any further discussion around
> > this
> > > > > KIP, I'd
> > > > > >> > > > like
> > > > > >> > > > > to
> > > > > >> > > > > > > > start
> > > > > >> > > > > > > > > > voting.
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > KIP documentation:
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > >
> > https://cwiki.apache.org/confluence/pages/viewpage.
> > > > > >> > > > > > > > action?pageId=74686265
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > Cheers,
> > > > > >> > > > > > > > > > Jorge.
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > --
> > > > > >> > > > > -- Guozhang
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > --
> > > > > >> > > -- Guozhang
> > > > > >> > >
> > > > > >> >
> > > > >
> > > >
> > >
> >
>



-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: [DISCUSS] KIP-211: Revise Expiration Semantics of Consumer Group Offsets

2018-01-23 Thread Jeff Widman
Bumping this as I'd like to see it land...

It's one of the "features" that tends to catch Kafka n00bs unawares and
typically results in message skippage/loss, whereas the proposed behavior
is much more intuitive.

Plus it's more wire-efficient because consumers no longer need to commit
offsets for partitions that have no new messages just to keep those offsets
alive.

On Fri, Jan 12, 2018 at 10:21 AM, Vahid S Hashemian <
vahidhashem...@us.ibm.com> wrote:

> There has been no further discussion on this KIP for about two months.
> So I thought I'd provide the scoop hoping it would spark additional
> feedback and move the KIP forward.
>
> The KIP proposes a method to preserve group offsets as long as the group
> is not in Empty state (even when offsets are committed very rarely), and
> start the offset expiration of the group as soon as the group becomes
> Empty.
> It suggests dropping the `retention_time` field from the `OffsetCommit`
> request and, instead, enforcing it via the broker config
> `offsets.retention.minutes` for all groups. In other words, all groups
> will have the same retention time.
> The KIP presumes that this global retention config would suffice common
> use cases and does not lead to, e.g., unmanageable offset cache size (for
> groups that don't need to stay around that long). It suggests opening
> another KIP if this global retention setting proves to be problematic in
> the future. It was suggested earlier in the discussion thread that the KIP
> should propose a per-group retention config to circumvent this risk.
>
> I look forward to hearing your thoughts. Thanks!
>
> --Vahid
>
>
>
>
> From:   "Vahid S Hashemian" 
> To: dev 
> Date:   10/18/2017 04:45 PM
> Subject:[DISCUSS] KIP-211: Revise Expiration Semantics of Consumer
> Group Offsets
>
>
>
> Hi all,
>
> I created a KIP to address the group offset expiration issue reported in
> KAFKA-4682:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.
> apache.org_confluence_display_KAFKA_KIP-2D211-253A-2BRevise-
> 2BExpiration-2BSemantics-2Bof-2BConsumer-2BGroup-2BOffsets&
> d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=Q_itwloTQj3_xUKl7Nzswo6KE4Nj-
> kjJc7uSVcviKUc&m=JkzH_2jfSMhCUPMk3rUasrjDAId6xbAEmX7_shSYdU4&s=
> UBu7D2Obulg0fterYxL5m8xrDWkF_O2kGlygTCWsfFc&e=
>
>
> Your feedback is welcome!
>
> Thanks.
> --Vahid
>
>
>
>
>
>


-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: KIP-250 Add Support for Quorum-based Producer Acknowledgment

2018-01-23 Thread Jeff Widman
I agree with Dong, we should see if it's possible to change the default
behavior so that, as soon as min.insync.replicas brokers respond, the
broker acknowledges the message back to the client without waiting for
additional brokers who are in the in-sync replica list to respond. (I
actually thought it already worked this way).
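
For anyone following along, here's a small sketch of the settings involved
(values are only examples): with the configs below, today's behavior is that
the leader waits for every replica currently in the ISR before acknowledging,
not merely for min.insync.replicas of them.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;

    class AcksConfigSketch {
        static Properties producerProps() {
            Properties props = new Properties();
            props.put(ProducerConfig.ACKS_CONFIG, "all");  // same as acks=-1
            // Broker/topic side (not producer config), shown here as a comment:
            //   replication.factor=3
            //   min.insync.replicas=2   <- the durability floor being discussed
            return props;
        }
    }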

As you implied in the KIP though, changing this default introduces a weird
state where an in-sync follower broker is not guaranteed to have a
message...

So at a minimum, the leadership failover algorithm would need to be sure to
pick the most up-to-date follower... I thought it already did this?

But if multiple brokers fail in quick succession, then a broker that was in
the ISR could become a leader without ever receiving the message...
violating the current promises of unclean.leader.election.enable=False...
so changing the default might not be a tenable solution.

What also jumped out at me in the KIP was the goal of reducing p999 when
setting replica lag time at 10 seconds(!!)... I understand the desire to
minimize frequent ISR shrink/expansion, as I face this same issue at my day
job. But what you're essentially trying to do here is create an additional
replication state that is in-between acks=1 and acks = ISR to paper over a
root problem of ISR shrink/expansion...

I'm just wary of shipping more features (and more operational confusion) if
it's only addressing the symptom rather than the root cause. For example,
my day job's problem is we run a very high number of low-traffic
partitions-per-broker, so the fetch requests hit many partitions before
they fill. Solving that requires changing our architecture + making the
replication protocol more efficient (KIP-227).





On Tue, Jan 23, 2018 at 10:31 PM, Dong Lin  wrote:

> Hey Litao,
>
> Thanks for the KIP. I have one quick comment before you provide more detail
> on how to select the leader with the largest LEO.
>
> Do you think it would make sense to change the default behavior of acks=-1,
> such that broker will acknowledge the message once the message has been
> replicated to min.insync.replicas brokers? This would allow us to keep the
> same durability guarantee, improve produce request latency without having a
> new config.
>
> Thanks,
> Dong
>
> On Tue, Jan 23, 2018 at 8:38 PM, Litao Deng 
> wrote:
>
> > Hey folks. I would like to add a feature to support the quorum-based
> > acknowledgment for the producer request. We have been running a
> > modified version of Kafka on our testing cluster for weeks, the
> > improvement of P999 is significant with very stable latency.
> > Additionally, I have a proposal to achieve a similar data durability
> > as with the insync.replicas-based acknowledgment through LEO-based
> > leader election.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 250+Add+Support+for+Quorum-based+Producer+Acknowledge
> >
>



-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: [DISCUSS] Release Plan for 1.0.1

2018-01-22 Thread Jeff Widman
Any update on this release?

Haven't seen anything other than Guozhang's email...

Waiting for it to drop so we can upgrade to 1.0 series...

On Tue, Jan 16, 2018 at 1:32 PM, Guozhang Wang  wrote:

> Hello Ewen,
>
> Could you include one more notable changes in 1.0.1:
> https://issues.apache.org/jira/browse/KAFKA-6398 ?
>
> My PR is ready for reviews and should be mergable at any time.
>
>
> Guozhang
>
> On Tue, Jan 16, 2018 at 10:54 AM, Ewen Cheslack-Postava  >
> wrote:
>
> > Hi all,
> >
> > I'd like to start the process for doing a 1.0.1 bug fix release. 1.0.0
> was
> > released Nov 1, 2017, and about 2.5 mos have passed and 32 bug fixes have
> > accumulated so far. A few of the more notable fixes that we've merged so
> > far:
> >
> > https://issues.apache.org/jira/browse/KAFKA-6269 - KTable restore fails
> > after rebalance
> > https://issues.apache.org/jira/browse/KAFKA-6185 - Selector memory leak
> > with high likelihood of OOM in case of down conversion
> > https://issues.apache.org/jira/browse/KAFKA-6167 - Timestamp on streams
> > directory contains a colon, which is an illegal character
> > https://issues.apache.org/jira/browse/KAFKA-6190 - GlobalKTable never
> > finishes restoring when consuming transactional messages
> > https://issues.apache.org/jira/browse/KAFKA-6252 - A fix for cleaning up
> > new Connect metrics when a connector does not shut down properly
> >
> > These represent important fixes across all components -- core, Connect
> and
> > Streams.
> >
> > I've prepared
> > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+1.0.1 to
> > help track the fixes for 1.0.1. As you can see we have only 1 item
> > currently marked as a blocker for 1.0.1. If folks with outstanding bugs
> > could take a pass and a) make sure anything they think should go into
> 1.0.1
> > is marked as such, with the appropriate priority, and b) make sure
> anything
> > marked for 1.0.1 should really be there.
> >
> > Once people have taken a pass we can work on a VOTE thread and getting
> any
> > outstanding PRs reviewed.
> >
> > -Ewen
> >
>
>
>
> --
> -- Guozhang
>



-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: [VOTE] KIP-229: DeleteGroups API

2018-01-15 Thread Jeff Widman
+1 (non-binding)

On Jan 15, 2018 10:23 AM, "Vahid S Hashemian" 
wrote:

> Happy Monday,
>
> I believe the concerns on this KIP have been addressed in the current
> version of the KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 229%3A+DeleteGroups+API
> So I'd like to start a vote.
>
> Thanks.
> --Vahid
>
>


Re: [DISCUSS] KIP-211: Revise Expiration Semantics of Consumer Group Offsets

2017-11-15 Thread Jeff Widman
I thought about this scenario as well.

However, my conclusion was that because __consumer_offsets is a compacted
topic, this extra clutter from short-lived consumer groups is negligible.

The disk size is the product of the number of consumer groups and the
number of partitions in the group's subscription. Typically I'd expect that
for short-lived consumer groups, that number < 100K.

The one area I wasn't sure of was how the group coordinator's in-memory
cache of offsets works. Is it a pull-through cache of unbounded size or
does it contain all offsets of all groups that use that broker as their
coordinator? If the latter, possibly there's an OOM risk there. If so,
might be worth investigating changing the cache design to a bounded size.

Also, switching to this design means that consumer groups no longer need to
commit all offsets; they only need to commit the ones that changed. I
expect in certain cases there will be broker-side performance gains due to
parsing smaller OffsetCommit requests. For example, due to some bad design
decisions we have a couple of topics that have 1500 partitions of
which ~10% are regularly used. So 90% of the OffsetCommit request
processing is unnecessary.
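
To make that concrete: once offsets stop expiring while the group is alive, a
consumer could safely commit only the partitions whose position actually
advanced. A hedged sketch (the helper below is illustrative, not an API
proposed by the KIP):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    class CommitOnlyChangedSketch {
        // lastCommitted tracks what we committed previously, per partition.
        static void commitChanged(KafkaConsumer<?, ?> consumer,
                                  Map<TopicPartition, Long> lastCommitted) {
            Map<TopicPartition, OffsetAndMetadata> toCommit = new HashMap<>();
            for (TopicPartition tp : consumer.assignment()) {
                long position = consumer.position(tp);
                Long previous = lastCommitted.get(tp);
                if (previous == null || position > previous) {
                    toCommit.put(tp, new OffsetAndMetadata(position));
                }
            }
            if (!toCommit.isEmpty()) {
                consumer.commitSync(toCommit);  // only the partitions that moved
                toCommit.forEach((tp, om) -> lastCommitted.put(tp, om.offset()));
            }
        }
    }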



On Wed, Nov 15, 2017 at 11:27 AM, Vahid S Hashemian <
vahidhashem...@us.ibm.com> wrote:

> I'm forwarding this feedback from John to the mailing list, and responding
> at the same time:
>
> John, thanks for the feedback. I agree that the scenario you described
> could lead to unnecessary long offset retention for other consumer groups.
> If we want to address that in this KIP we could either keep the
> 'retention_time' field in the protocol, or propose a per group retention
> configuration.
>
> I'd like to ask for feedback from the community on whether we should
> design and implement a per-group retention configuration as part of this
> KIP; or keep it simple at this stage and go with one broker level setting
> only.
> Thanks in advance for sharing your opinion.
>
> --Vahid
>
>
>
>
> From:   John Crowley 
> To: vahidhashem...@us.ibm.com
> Date:   11/15/2017 10:16 AM
> Subject:[DISCUSS] KIP-211: Revise Expiration Semantics of Consumer
> Group Offsets
>
>
>
> Sorry for the clutter, first found KAFKA-3806, then -4682, and finally
> this KIP - they have more detail which I’ll avoid duplicating here.
>
> Think that not starting the expiration until all consumers have ceased,
> and clearing all offsets at the same time, does clean things up and solves
> 99% of the original issues - and 100% of my particular concern.
>
> A valid use-case may still have a periodic application - say production
> applications posting to Topics all week, and then a weekend batch job
> which consumes all new messages.
>
> Setting offsets.retention.minutes = 10 days does cover this but at the
> cost of extra clutter if there are other consumer groups which are truly
> created/used/abandoned on a frequent basis. Being able to set
> offsets.retention.minutes on a per groupId basis allows this to also be
> covered cleanly, and makes it visible that these groupIds are a special
> case.
>
> But relatively minor, and should not delay the original KIP.
>
> Thanks,
>
> John Crowley
>
>
>
>
>
>
>
>


-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: [DISCUSS] KIP-211: Revise Expiration Semantics of Consumer Group Offsets

2017-11-14 Thread Jeff Widman
Any other input on this?

Otherwise Vahid what do you think about moving this to a vote?

On Tue, Nov 7, 2017 at 2:34 PM, Jeff Widman  wrote:

> Any other feedback from folks on KIP-211?
>
> A prime benefit of this KIP is that it removes the need for the consumer
> to commit offsets for partitions where the offset hasn't changed. Right
> now, if the consumer doesn't commit those offsets, they will be deleted, so
> the consumer keeps blindly (re)committing duplicate offsets, wasting
> network/disk I/O.
>
> On Mon, Oct 30, 2017 at 3:47 PM, Jeff Widman  wrote:
>
>> I support this as the proposed change seems both more intuitive and
>> safer.
>>
>> Right now we've essentially hacked this at my day job by bumping the
>> offset retention period really high, but this is a much cleaner solution.
>>
>> I don't have any use-cases that require custom retention periods on a
>> per-group basis.
>>
>> On Mon, Oct 30, 2017 at 10:15 AM, Vahid S Hashemian <
>> vahidhashem...@us.ibm.com> wrote:
>>
>>> Bump!
>>>
>>>
>>>
>>> From:   Vahid S Hashemian/Silicon Valley/IBM
>>> To: dev 
>>> Date:   10/18/2017 04:45 PM
>>> Subject:[DISCUSS] KIP-211: Revise Expiration Semantics of
>>> Consumer
>>> Group Offsets
>>>
>>>
>>> Hi all,
>>>
>>> I created a KIP to address the group offset expiration issue reported in
>>> KAFKA-4682:
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A
>>> +Revise+Expiration+Semantics+of+Consumer+Group+Offsets
>>>
>>> Your feedback is welcome!
>>>
>>> Thanks.
>>> --Vahid
>>>
>>>
>>>
>>>
>>
>>
>> --
>>
>> *Jeff Widman*
>> jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
>> <><
>>
>
>
>
> --
>
> *Jeff Widman*
> jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
> <><
>



-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: [DISCUSS] Kafka 2.0.0 in June 2018

2017-11-10 Thread Jeff Widman
 > releases?
> > > > >
> > > > > By the way, why June 2018? :)
> > > > >
> > > > > -Jaikiran
> > > > >
> > > > >
> > > > >
> > > > > On 09/11/17 3:14 PM, Ismael Juma wrote:
> > > > >
> > > > >> Hi all,
> > > > >>
> > > > >> I'm starting this discussion early because of the potential
> impact.
> > > > >>
> > > > >> Kafka 1.0.0 was just released and the focus was on achieving the
> > > > original
> > > > >> project vision in terms of features provided while maintaining
> > > > >> compatibility for the most part (i.e. we did not remove deprecated
> > > > >> components like the Scala clients).
> > > > >>
> > > > >> This was the right decision, in my opinion, but it's time to start
> > > > >> thinking
> > > > >> about 2.0.0, which is an opportunity for us to remove major
> > deprecated
> > > > >> components and to benefit from Java 8 language enhancements (so
> that
> > > we
> > > > >> can
> > > > >> move faster). So, I propose the following for Kafka 2.0.0:
> > > > >>
> > > > >> 1. It should be released in June 2018
> > > > >> 2. The Scala clients (Consumer, SimpleConsumer, Producer,
> > > SyncProducer)
> > > > >> will be removed
> > > > >> 3. Java 8 or higher will be required, i.e. support for Java 7 will
> > be
> > > > >> dropped.
> > > > >>
> > > > >> Thoughts?
> > > > >>
> > > > >> Ismael
> > > > >>
> > > > >>
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>



-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Can someone update this KIP's status to "accepted"?

2017-11-08 Thread Jeff Widman
https://cwiki.apache.org/confluence/display/KAFKA/KIP-164-+Add+UnderMinIsrPartitionCount+and+per-partition+UnderMinIsr+metrics

the JIRA shows it was shipped in 1.0, but the wiki page lists it as
"Discussion"

-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: [DISCUSS] KIP-211: Revise Expiration Semantics of Consumer Group Offsets

2017-11-07 Thread Jeff Widman
Any other feedback from folks on KIP-211?

A prime benefit of this KIP is that it removes the need for the consumer to
commit offsets for partitions where the offset hasn't changed. Right now,
if the consumer doesn't commit those offsets, they will be deleted, so the
consumer keeps blindly (re)committing duplicate offsets, wasting
network/disk I/O.

On Mon, Oct 30, 2017 at 3:47 PM, Jeff Widman  wrote:

> I support this as the proposed change seems both more intuitive and safer.
>
> Right now we've essentially hacked this at my day job by bumping the
> offset retention period really high, but this is a much cleaner solution.
>
> I don't have any use-cases that require custom retention periods on a
> per-group basis.
>
> On Mon, Oct 30, 2017 at 10:15 AM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
>
>> Bump!
>>
>>
>>
>> From:   Vahid S Hashemian/Silicon Valley/IBM
>> To: dev 
>> Date:   10/18/2017 04:45 PM
>> Subject:[DISCUSS] KIP-211: Revise Expiration Semantics of Consumer
>> Group Offsets
>>
>>
>> Hi all,
>>
>> I created a KIP to address the group offset expiration issue reported in
>> KAFKA-4682:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%
>> 3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets
>>
>> Your feedback is welcome!
>>
>> Thanks.
>> --Vahid
>>
>>
>>
>>
>
>
> --
>
> *Jeff Widman*
> jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
> <><
>



-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: [DISCUSS] KIP-217: Expose a timeout to allow an expired ZK session to be re-created

2017-11-02 Thread Jeff Widman
's
> > > question:
> > > > > if the default timeout is infinite, then it won't change anything
> to
> > > how
> > > > > Kafka works from today, does it? (unless I'm missing something
> > sorry).
> > > If
> > > > > not set to infinite, then we introduce the risk of a whole cluster
> > > > shutting
> > > > > down at once?
> > > > >
> > > > > Thanks,
> > > > > Stephane
> > > > >
> > > > > On 31/10/17, 1:00 pm, "Jun Rao"  wrote:
> > > > >
> > > > > Hi, Stephane,
> > > > >
> > > > > Thanks for the reply.
> > > > >
> > > > > 1) Fixing the issue in ZK will be ideal. Not sure when it will
> > > happen
> > > > > though. Once it's fixed, we can probably deprecate this config.
> > > > >
> > > > > 2) That could be useful. Is there a java api to do that at
> > runtime?
> > > > > Also,
> > > > > invalidating DNS cache doesn't always fix the issue of
> unresolved
> > > > > host. In
> > > > > some of the cases, human intervention is needed.
> > > > >
> > > > > 3) The default timeout is infinite though.
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Sat, Oct 28, 2017 at 11:48 PM, Stephane Maarek <
> > > > > steph...@simplemachines.com.au> wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > I think this is very helpful. Restarting Kafka brokers in
> case
> > of
> > > > > zookeeper
> > > > > > host change is not a well known operation.
> > > > > >
> > > > > > Few questions:
> > > > > > 1) would it not be worth fixing the problem at the source ?
> > This
> > > > has
> > > > > been
> > > > > > stuck for a while though, maybe a little push would help :
> > > > > > https://issues.apache.org/jira/plugins/servlet/mobile#
> > > > > issue/ZOOKEEPER-2184
> > > > > >
> > > > > > 2) upon recreating the zookeeper object , is it not possible
> to
> > > > > invalidate
> > > > > > the DNS cache so that it resolves the new hostname?
> > > > > >
> > > > > > 3) could the cluster be down in this situation: one migrates
> an
> > > > > entire
> > > > > > zookeeper cluster to new machines (one by one). The quorum is
> > > still
> > > > > alive
> > > > > > without downtime, but now every broker in a cluster can't
> > resolve
> > > > > zookeeper
> > > > > > at the same time. They all shut down at the same time after
> the
> > > new
> > > > > > time-out setting.
> > > > > >
> > > > > > Thanks !
> > > > > > Stéphane
> > > > > >
> > > > > > On 28 Oct. 2017 9:42 am, "Jun Rao"  wrote:
> > > > > >
> > > > > > > Hi, Everyone,
> > > > > > >
> > > > > > > We created "KIP-217: Expose a timeout to allow an expired
> ZK
> > > > > session to
> > > > > > be
> > > > > > > re-created".
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > 217%3A+Expose+a+timeout+to+allow+an+expired+ZK+session+
> > > > > to+be+re-created
> > > > > > >
> > > > > > > Please take a look and provide your feedback.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>



-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: [DISCUSS] KIP-217: Expose a timeout to allow an expired ZK session to be re-created

2017-10-31 Thread Jeff Widman
Agree with Stephane that it's worth at least taking a shot at trying to get
ZOOKEEPER-2184 fixed rather than adding a config that will be deprecated in
the not-too-distant future.

I know Zookeeper development feels more like the turtle than the hare these
days, but Kafka is a high-visibility project, so there's a decent chance
you'll be able to get the attention of the zookeeper maintainers to get a
patch merged and possibly even a new release cut incorporating this fix.

On Tue, Oct 31, 2017 at 3:28 PM, Stephane Maarek <
steph...@simplemachines.com.au> wrote:

> Hi Jun,
>
> Thanks for the reply.
>
> 1) The reason I'm asking about it is I wonder if it's not worth focusing
> the development efforts on taking ownership of the existing PR (
> https://github.com/apache/zookeeper/pull/150)  to fix ZOOKEEPER-2184,
> rebase it and have it merged into the ZK codebase shortly.  I feel this KIP
> might introduce a setting that could be deprecated shortly and confuse the
> end user a bit further with one more knob to turn.
>
> 3) I'm not sure if I fully understand, sorry for the beginner's question:
> if the default timeout is infinite, then it won't change anything to how
> Kafka works from today, does it? (unless I'm missing something sorry). If
> not set to infinite, then we introduce the risk of a whole cluster shutting
> down at once?
>
> Thanks,
> Stephane
>
> On 31/10/17, 1:00 pm, "Jun Rao"  wrote:
>
> Hi, Stephane,
>
> Thanks for the reply.
>
> 1) Fixing the issue in ZK will be ideal. Not sure when it will happen
> though. Once it's fixed, we can probably deprecate this config.
>
> 2) That could be useful. Is there a java api to do that at runtime?
> Also,
> invalidating DNS cache doesn't always fix the issue of unresolved
> host. In
> some of the cases, human intervention is needed.
>
> 3) The default timeout is infinite though.
>
> Jun
>
>
> On Sat, Oct 28, 2017 at 11:48 PM, Stephane Maarek <
> steph...@simplemachines.com.au> wrote:
>
> > Hi Jun,
> >
> > I think this is very helpful. Restarting Kafka brokers in case of
> zookeeper
> > host change is not a well known operation.
> >
> > Few questions:
> > 1) would it not be worth fixing the problem at the source ? This has
> been
> > stuck for a while though, maybe a little push would help :
> > https://issues.apache.org/jira/plugins/servlet/mobile#
> issue/ZOOKEEPER-2184
> >
> > 2) upon recreating the zookeeper object , is it not possible to
> invalidate
> > the DNS cache so that it resolves the new hostname?
> >
> > 3) could the cluster be down in this situation: one migrates an
> entire
> > zookeeper cluster to new machines (one by one). The quorum is still
> alive
> > without downtime, but now every broker in a cluster can't resolve
> zookeeper
> > at the same time. They all shut down at the same time after the new
> > time-out setting.
> >
> > Thanks !
> > Stéphane
> >
> > On 28 Oct. 2017 9:42 am, "Jun Rao"  wrote:
> >
> > > Hi, Everyone,
> > >
> > > We created "KIP-217: Expose a timeout to allow an expired ZK
> session to
> > be
> > > re-created".
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 217%3A+Expose+a+timeout+to+allow+an+expired+ZK+session+
> to+be+re-created
> > >
> > > Please take a look and provide your feedback.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> >
>
>
>
>


-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: [DISCUSS] KIP-211: Revise Expiration Semantics of Consumer Group Offsets

2017-10-30 Thread Jeff Widman
I support this as the proposed change seems both more intuitive and safer.

Right now we've essentially hacked this at my day job by bumping the offset
retention period really high, but this is a much cleaner solution.

I don't have any use-cases that require custom retention periods on a
per-group basis.

On Mon, Oct 30, 2017 at 10:15 AM, Vahid S Hashemian <
vahidhashem...@us.ibm.com> wrote:

> Bump!
>
>
>
> From:   Vahid S Hashemian/Silicon Valley/IBM
> To: dev 
> Date:   10/18/2017 04:45 PM
> Subject:[DISCUSS] KIP-211: Revise Expiration Semantics of Consumer
> Group Offsets
>
>
> Hi all,
>
> I created a KIP to address the group offset expiration issue reported in
> KAFKA-4682:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets
>
> Your feedback is welcome!
>
> Thanks.
> --Vahid
>
>
>
>


-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: [VOTE] KIP-214: Add zookeeper.max.in.flight.requests config to the broker

2017-10-30 Thread Jeff Widman
+1 (non-binding)

Thanks for putting the work in to benchmark various defaults.

On Mon, Oct 30, 2017 at 3:05 PM, Ismael Juma  wrote:

> Thanks for the KIP, +1 (binding).
>
> On 27 Oct 2017 6:15 pm, "Onur Karaman" 
> wrote:
>
> > I'd like to start the vote for KIP-214: Add
> > zookeeper.max.in.flight.requests config to the broker
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 214%3A+Add+zookeeper.max.in.flight.requests+config+to+the+broker
> >
> > - Onur
> >
>



-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><


Re: [VOTE] KIP-175: Additional '--describe' views for ConsumerGroupCommand

2017-09-28 Thread Jeff Widman
+1 (non-binding)

On Sep 28, 2017 12:13 PM, "Vahid S Hashemian" 
wrote:

> I'm bumping this up as it's awaiting one more binding +1, but I'd like to
> also mention a recent change to the KIP.
>
> Since the current DescribeGroup response protocol does not include
> member-specific information such as preferred assignment strategies, or
> topic subscriptions, I've removed the corresponding ASSIGNMENT-STRATEGY
> and SUBSCRIPTION columns from --members and --members --verbose options,
> respectively. These columns will be added back once KIP-181 (that aims to
> enhance DescribeGroup response) is in place. I hope this small
> modification is reasonable. If needed, we can continue the discussion on
> the discussion thread.
>
> And I'm not sure if this change requires a re-vote.
>
> Thanks.
> --Vahid
>
>
>
> From:   "Vahid S Hashemian" 
> To: dev 
> Date:   07/27/2017 02:04 PM
> Subject:[VOTE] KIP-175: Additional '--describe' views for
> ConsumerGroupCommand
>
>
>
> Hi all,
>
> Thanks to everyone who participated in the discussion on KIP-175, and
> provided feedback.
> The KIP can be found at
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 175%3A+Additional+%27--describe%27+views+for+ConsumerGroupCommand
>
> .
> I believe the concerns have been addressed in the recent version of the
> KIP; so I'd like to start a vote.
>
> Thanks.
> --Vahid
>
>
>
>
>
>


Re: Comments on JIRAs

2017-07-14 Thread Jeff Widman
Thanks for bringing this up. I don't know what the solution is, but I agree
it would be good to have something.

On Jul 13, 2017 6:46 AM, "Tom Bentley"  wrote:

> The project recently switched from all JIRA events being sent to the dev
> mailing list, to just issue creations. This seems like a good thing
> because the dev mailing list was very noisy before, and if you want to see
> all the JIRA comments etc. you can subscribe to the JIRA list. If you don't
> subscribe to the JIRA list you need to take the time to become a watcher on
> each issue that interests you.
>
> However, the flip-side of this is that when you comment on a JIRA you have
> no idea who's going to get notified (apart from the watchers). In
> particular, commenters don't know whether any of the committers will see
> their comment, unless they mention them individually by name. But for an
> issue in which no committer has thus far taken an interest, who is the
> commenter to @mention? There is no @kafka_commiters that you can use to
> bring the comment to the attention of the whole group of committers.
>
> There is also the fact that there are an awful lot of historical issues
> which interested people won't be watching because they assumed at the time
> that they'd get notified via the dev list.
>
> I can well imagine that people who aren't working a lot of Kafka won't
> realise that there's a good chance that their comments on JIRAs won't reach
> relevant people.
>
> I'm mentioning this mainly to highlight to people that this is what's
> happening, because it wasn't obvious to me that commenting on a JIRA might
> not reach (all of) the committers/interested parties.
>
> Cheers,
>
> Tom
>


Re: [DISCUSS] KIP-175: Additional '--describe' views for ConsumerGroupCommand

2017-07-06 Thread Jeff Widman
Thanks for the KIP Vahid. I think it'd be useful to have these filters.

That said, I also agree with Edo.

We don't currently rely on the output, but more than once while debugging an
issue I've noticed something amiss only because I saw all the data at once;
if it hadn't been present in the default view, I probably would have missed
it, since I wouldn't have thought to apply that particular filter.

This would also be more consistent with the kafka-topics.sh API, where
"--describe" gives everything and the output can then be filtered down.



On Tue, Jul 4, 2017 at 10:42 AM, Edoardo Comar  wrote:

> Hi Vahid,
> no we are not relying on parsing the current output.
>
> I just thought that keeping the full output isn't necessarily that bad as
> it shows some sort of history of how a group was used.
>
> ciao
> Edo
> --
>
> Edoardo Comar
>
> IBM Message Hub
>
> IBM UK Ltd, Hursley Park, SO21 2JN
>
> "Vahid S Hashemian"  wrote on 04/07/2017
> 17:11:43:
>
> > From: "Vahid S Hashemian" 
> > To: dev@kafka.apache.org
> > Cc: "Kafka User" 
> > Date: 04/07/2017 17:12
> > Subject: Re: [DISCUSS] KIP-175: Additional '--describe' views for
> > ConsumerGroupCommand
> >
> > Hi Edo,
> >
> > Thanks for reviewing the KIP.
> >
> > Modifying the default behavior of `--describe` was suggested in the
> > related JIRA.
> > We could poll the community to see whether they go for that option, or,
> as
> > you suggested, introducing a new `--only-xxx` ( can't also think of a
> > proper name right now :) ) option instead.
> >
> > Are you making use of the current `--describe` output and relying on the
>
> > full data set?
> >
> > Thanks.
> > --Vahid
> >
> >
> >
> >
> > From:   Edoardo Comar 
> > To: dev@kafka.apache.org
> > Cc: "Kafka User" 
> > Date:   07/04/2017 03:17 AM
> > Subject:Re: [DISCUSS] KIP-175: Additional '--describe' views for
>
> > ConsumerGroupCommand
> >
> >
> >
> > Thanks Vahid, I like the KIP.
> >
> > One question - could we keep the current "--describe" behavior unchanged
>
> > and introduce "--only-xxx" options to filter down the full output as you
>
> > proposed ?
> >
> > ciao,
> > Edo
> > --
> >
> > Edoardo Comar
> >
> > IBM Message Hub
> >
> > IBM UK Ltd, Hursley Park, SO21 2JN
> >
> >
> >
> > From:   "Vahid S Hashemian" 
> > To: dev , "Kafka User"
> 
> > Date:   04/07/2017 00:06
> > Subject:[DISCUSS] KIP-175: Additional '--describe' views for
> > ConsumerGroupCommand
> >
> >
> >
> > Hi,
> >
> > I created KIP-175 to make some improvements to the ConsumerGroupCommand
> > tool.
> > The KIP can be found here:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-175%3A
> > +Additional+%27--describe%27+views+for+ConsumerGroupCommand
> >
> >
> >
> > Your review and feedback is welcome!
> >
> > Thanks.
> > --Vahid
> >
> >
> >
> >
> >
> > Unless stated otherwise above:
> > IBM United Kingdom Limited - Registered in England and Wales with number
>
> > 741598.
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
> 3AU
> >
> >
> >
> >
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>


[jira] [Resolved] (KAFKA-5299) MirrorMaker with New.consumer doesn't consume message from multiple topics whitelisted

2017-06-29 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman resolved KAFKA-5299.

Resolution: Cannot Reproduce

> MirrorMaker with New.consumer doesn't consume message from multiple topics 
> whitelisted 
> ---
>
> Key: KAFKA-5299
> URL: https://issues.apache.org/jira/browse/KAFKA-5299
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Jyoti
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Pull-Requests not worked on

2017-06-19 Thread Jeff Widman
On Thu, Jun 8, 2017 at 5:00 PM, Ismael Juma  wrote:

> One of the reasons why some PRs remain open even if they should be
> closed is that we have no way to close PRs ourselves without referencing
> them in a commit or asking Apache Infra for help. So, abandoned PRs tend to
> remain open. This is very frustrating, but there is no solution in sight.
>


Why is this?

Don't those with commit rights to the project have collaborator permissions
and thus the ability to close PRs (as on most GitHub-based projects)?


Re: Changing JIRA notifications to remove Single Email Address (dev@kafka.apache.org)

2017-06-09 Thread Jeff Widman
+1

Thanks for driving this

On Jun 9, 2017 1:22 PM, "Guozhang Wang"  wrote:

> I have just confirmed with Jun that he cannot make the notification edits
> himself either. So I'll talk to Apache Infra anyway in order to make that
> change.
>
> So I'd update my proposal to:
>
> 1. create an "iss...@kafka.apache.org" which will receive all the JIRA
> events.
> 2. remove "dev@kafka.apache.org" from all the JIRA event notifications
> except "Issue Created", "Issue Resolved" and "Issue Reopened".
>
>
> Everyone's feedback is more than welcome! I will wait for a few days and
> if there is an agreement create a ticket to Apache JIRA.
>
> Guozhang
>
>
> On Wed, Jun 7, 2017 at 2:43 PM, Guozhang Wang  wrote:
>
> > Yeah I have thought about that but that would involve Apache Infra. The
> > current proposal is based on the assumption that removing the
> notification
> > can be done by ourselves.
> >
> >
> > Guozhang
> >
> > On Wed, Jun 7, 2017 at 1:24 PM, Matthias J. Sax 
> > wrote:
> >
> >> Have you considered starting a new mailing list "issues@" that gets the
> >> full load of all emails?
> >>
> >> -Matthias
> >>
> >> On 6/6/17 7:26 PM, Jeff Widman wrote:
> >> > What about also adding "Issue Resolved" to the events that hit dev?
> >> >
> >> > While I don't generally care about updates on issue progress, I do
> want
> >> to
> >> > know when issues are resolved in case undocumented behavior that I may
> >> have
> >> > accidentally been relying on may have changed.
> >> >
> >> > With or without this suggested tweak, I am +1 (non-binding) on this.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Tue, Jun 6, 2017 at 3:49 PM, Guozhang Wang 
> >> wrote:
> >> >
> >> >> (Changing the subject title to reflect on the proposal)
> >> >>
> >> >> Hey guys,
> >> >>
> >> >> In order to not drop the ball on the floor I'd like to kick off some
> >> >> proposals according to people's feedbacks on this issue. Feel free to
> >> >> brainstorm on different ideas:
> >> >>
> >> >> We change JIRA notifications to remove `Single Email Address (
> >> >> dev@kafka.apache.org)` for all events EXCEPT "Issue Created" (5 - 8
> >> >> events per day). Currently `dev@kafka.apache.org` is notified on all
> >> >> events (~180 events per day).
> >> >>
> >> >>
> >> >> Though as a PMC member I can view these notification schemes on the
> >> JIRA
> >> >> admin page, I cannot edit on them. Maybe Jun can check if he has the
> >> >> privilege to do so after we have agreed on some proposal; otherwise
> we
> >> need
> >> >> to talk to Apache INFRA for it.
> >> >>
> >> >>
> >> >> Guozhang
> >> >>
> >> >>
> >> >>
> >> >> On Tue, May 30, 2017 at 11:57 PM, Michal Borowiecki <
> >> >> michal.borowie...@openbet.com> wrote:
> >> >>
> >> >>> +1 agree with Jeff,
> >> >>>
> >> >>> Michał
> >> >>>
> >> >>> On 31/05/17 06:25, Jeff Widman wrote:
> >> >>>
> >> >>> I'm hugely in favor of this change as well...
> >> >>>
> >> >>> Although I actually find the Github pull request emails less useful
> >> than
> >> >>> the jirabot ones since Jira typically has more info when I'm trying
> to
> >> >>> figure out if the issue is relevant to me or not...
> >> >>>
> >> >>> On Tue, May 30, 2017 at 2:28 PM, Guozhang Wang 
> <
> >> wangg...@gmail.com> wrote:
> >> >>>
> >> >>>
> >> >>> I actually do not know.. Maybe Jun knows better than me?
> >> >>>
> >> >>>
> >> >>> Guozhang
> >> >>>
> >> >>> On Mon, May 29, 2017 at 12:58 PM, Gwen Shapira 
> <
> >> g...@confluent.io> wrote:
> >> >>>
> >> >>>
> >> >>> I agree.
> >> >>>
> >> >>> Guozhang, do yo

Re: Changing JIRA notifications to remove Single Email Address (dev@kafka.apache.org)

2017-06-06 Thread Jeff Widman
What about also adding "Issue Resolved" to the events that hit dev?

While I don't generally care about updates on issue progress, I do want to
know when issues are resolved in case undocumented behavior that I may have
accidentally been relying on may have changed.

With or without this suggested tweak, I am +1 (non-binding) on this.





On Tue, Jun 6, 2017 at 3:49 PM, Guozhang Wang  wrote:

> (Changing the subject title to reflect on the proposal)
>
> Hey guys,
>
> In order to not drop the ball on the floor I'd like to kick off some
> proposals according to people's feedbacks on this issue. Feel free to
> brainstorm on different ideas:
>
> We change JIRA notifications to remove `Single Email Address (
> dev@kafka.apache.org)` for all events EXCEPT "Issue Created" (5 - 8
> events per day). Currently `dev@kafka.apache.org` is notified on all
> events (~180 events per day).
>
>
> Though as a PMC member I can view these notification schemes on the JIRA
> admin page, I cannot edit on them. Maybe Jun can check if he has the
> privilege to do so after we have agreed on some proposal; otherwise we need
> to talk to Apache INFRA for it.
>
>
> Guozhang
>
>
>
> On Tue, May 30, 2017 at 11:57 PM, Michal Borowiecki <
> michal.borowie...@openbet.com> wrote:
>
>> +1 agree with Jeff,
>>
>> Michał
>>
>> On 31/05/17 06:25, Jeff Widman wrote:
>>
>> I'm hugely in favor of this change as well...
>>
>> Although I actually find the Github pull request emails less useful than
>> the jirabot ones since Jira typically has more info when I'm trying to
>> figure out if the issue is relevant to me or not...
>>
>> On Tue, May 30, 2017 at 2:28 PM, Guozhang Wang  
>>  wrote:
>>
>>
>> I actually do not know.. Maybe Jun knows better than me?
>>
>>
>> Guozhang
>>
>> On Mon, May 29, 2017 at 12:58 PM, Gwen Shapira  
>>  wrote:
>>
>>
>> I agree.
>>
>> Guozhang, do you know how to implement the suggestion? JIRA to Apache
>> Infra? Or is this something we can do ourselves somehow?
>>
>> On Mon, May 29, 2017 at 9:33 PM Guozhang Wang  
>> 
>>
>> wrote:
>>
>> I share your pains. Right now I use filters on my email accounts and it
>>
>> has
>>
>> been down to about 25 per day.
>>
>> I think setup a separate mailing list for jirabot and jenkins auto
>> generated emails is a good idea.
>>
>>
>> Guozhang
>>
>>
>> On Mon, May 29, 2017 at 12:58 AM,  
>>  wrote:
>>
>>
>> Hello everyone
>>
>> I find it hard to follow this mailinglist due to all the mails
>>
>> generated
>>
>> by Jira. Just over this weekend there are 240 new mails.
>> Would it be possible to setup something like j...@kafka.apache.org
>>
>> where
>>
>> everyone can subscribe interested in those Jira mails?
>>
>> Right now I am going to setup a filter which just deletes the
>>
>> jira-tagged
>>
>> mails, but I think the current setup also makes it hard to read
>>
>> through
>>
>> the archives.
>>
>> regards
>> Marc
>>
>> --
>> -- Guozhang
>>
>>
>> --
>> -- Guozhang
>>
>>
>>
>> --
>> <http://www.openbet.com/> Michal Borowiecki
>> Senior Software Engineer L4
>> T: +44 208 742 1600 <+44%2020%208742%201600>
>>
>>
>> +44 203 249 8448 <+44%2020%203249%208448>
>>
>>
>>
>> E: michal.borowie...@openbet.com
>> W: www.openbet.com
>> OpenBet Ltd
>>
>> Chiswick Park Building 9
>>
>> 566 Chiswick High Rd
>>
>> London
>>
>> W4 5XT
>>
>> UK
>> <https://www.openbet.com/email_promo>
>> This message is confidential and intended only for the addressee. If you
>> have received this message in error, please immediately notify the
>> postmas...@openbet.com and delete it from your system as well as any
>> copies. The content of e-mails as well as traffic data may be monitored by
>> OpenBet for employment and security purposes. To protect the environment
>> please do not print this e-mail unless necessary. OpenBet Ltd. Registered
>> Office: Chiswick Park Building 9, 566 Chiswick High Road, London, W4 5XT,
>> United Kingdom. A company registered in England and Wales. Registered no.
>> 3134634. VAT no. GB927523612
>>
>
>
>
> --
> -- Guozhang
>


[jira] [Updated] (KAFKA-5381) ERROR Uncaught exception in scheduled task 'delete-expired-consumer-offsets' (kafka.utils.KafkaScheduler)

2017-06-06 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-5381:
---
Description: 
We have a 6 node cluster of 0.10.0.1 brokers. Broker 4 had a hardware problem, 
so we re-assigned all its partitions to other brokers. We immediately started 
observing the error described in KAFKA-4362 from several of our consumers.

However, on broker 6, we also started seeing the following exceptions in 
{{KafkaScheduler}} which have a somewhat similar-looking traceback:
{code}
[2017-06-03 17:23:57,926] ERROR Uncaught exception in scheduled task 
'delete-expired-consumer-offsets' (kafka.utils.KafkaScheduler)
java.lang.IllegalArgumentException: Message format version for partition 50 not 
found
at 
kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633)
at scala.Option.getOrElse(Option.scala:121)
at 
kafka.coordinator.GroupMetadataManager.kafka$coordinator$GroupMetadataManager$$getMessageFormatVersionAndTimestamp(GroupMetadataManager.scala:632)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2$$anonfun$10.apply(GroupMetadataManager.scala:560)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2$$anonfun$10.apply(GroupMetadataManager.scala:551)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2.apply$mcI$sp(GroupMetadataManager.scala:551)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2.apply(GroupMetadataManager.scala:543)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2.apply(GroupMetadataManager.scala:543)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:239)
at 
kafka.coordinator.GroupMetadataManager.kafka$coordinator$GroupMetadataManager$$deleteExpiredOffsets(GroupMetadataManager.scala:543)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$1.apply$mcV$sp(GroupMetadataManager.scala:87)
at 
kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:56)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
{code}

Unsurprisingly, the error disappeared once {{offsets.retention.minutes}} passed.

This appears to be similar root cause to KAFKA-4362 where 
{{GroupMetadataManager.getMessageFormatVersionAndTimestamp}} is throwing the 
error due to the offset partition being moved, but I'm unclear whether the fix 
for that version also fixed the {{KafkaScheduler}} or if more work needs to be 
done here.

We did the partition re-assignment by using the 
{{kafka-reassign-partitions.sh}} script and giving it the five healthy brokers. 
From my understanding, this would have randomly re-assigned all partitions (I 
don't think its sticky), so probably at least one partition from the 
{{__consumer_offsets}} topic was removed from broker 6. However, if that was 
the case, I would have expected all brokers to have had these partitions 
removed and be throwing this error. But our logging infrastructure shows that 
this error was only happening on broker 6, not on the other brokers. Not sure 
why that is.

  was:
We have a 6 node cluster of 0.10.0.1 brokers. Broker 4 had a hardware problem, 
so we re-assigned all its partitions to other brokers. We immediately started 
observing the error described in KAFKA-4362 from several of our consumers.

However, on broker 6, we also started seeing the following exceptions in 
{{KafkaScheduler}} which have a somewhat similar-looking traceback:
{code}
[2017-06-03 17:23:57,926] ERROR Uncaught exception in scheduled task 
'delete-expired-consumer-offsets' (kafka.utils.KafkaScheduler)
java.lang.IllegalArgumentException: Message format version for partition 50 not 
found
at 
kafka.co

[jira] [Updated] (KAFKA-5381) ERROR Uncaught exception in scheduled task 'delete-expired-consumer-offsets' (kafka.utils.KafkaScheduler)

2017-06-05 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-5381:
---
Description: 
We have a 6 node cluster of 0.10.0.1 brokers. Broker 4 had a hardware problem, 
so we re-assigned all its partitions to other brokers. We immediately started 
observing the error described in KAFKA-4362 from several of our consumers.

However, on broker 6, we also started seeing the following exceptions in 
{{KafkaScheduler}} which have a somewhat similar-looking traceback:
{code}
[2017-06-03 17:23:57,926] ERROR Uncaught exception in scheduled task 
'delete-expired-consumer-offsets' (kafka.utils.KafkaScheduler)
java.lang.IllegalArgumentException: Message format version for partition 50 not 
found
at 
kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633)
at scala.Option.getOrElse(Option.scala:121)
at 
kafka.coordinator.GroupMetadataManager.kafka$coordinator$GroupMetadataManager$$getMessageFormatVersionAndTimestamp(GroupMetadataManager.scala:632)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2$$anonfun$10.apply(GroupMetadataManager.scala:560)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2$$anonfun$10.apply(GroupMetadataManager.scala:551)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2.apply$mcI$sp(GroupMetadataManager.scala:551)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2.apply(GroupMetadataManager.scala:543)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2.apply(GroupMetadataManager.scala:543)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:239)
at 
kafka.coordinator.GroupMetadataManager.kafka$coordinator$GroupMetadataManager$$deleteExpiredOffsets(GroupMetadataManager.scala:543)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$1.apply$mcV$sp(GroupMetadataManager.scala:87)
at 
kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:56)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
{code}

Unsurprisingly, the error disappeared once {{offsets.retention.minutes}} passed.

This appears to be similar root cause to KAFKA-4362 where 
{{GroupMetadataManager.getMessageFormatVersionAndTimestamp}} is throwing the 
error due to the offset partition being moved, but I'm unclear whether the fix 
for that version also fixed the {{KafkaScheduler}} or if more work needs to be 
done here.

When the partition re-assignment was done, I do not know how this was executed 
by our Ops team. I suspect they used the {{kafka-reassign-partitions.sh}} 
script and just gave it five brokers rather than six. From my understanding, 
this would have randomly re-assigned all partitions (I don't think its sticky), 
so probably at least one partition from the {{__consumer_offsets}} topic was 
removed from broker 6. However, if that was the case, I would have expected all 
brokers to have had these partitions removed and be throwing this error. But 
our logging infrastructure shows that this error was only happening on broker 
6, not on the other brokers. Not sure why that is.

  was:
We have a 6 node cluster of 0.10.0.1 brokers. Broker 4 had a hardware problem, 
so we re-assigned all its partitions to other brokers. We immediately started 
observing the error described in KAFKA-4362 from several of our consumers.

However, on broker 6, we also started seeing the following exceptions in 
{{KafkaScheduler}} which have a somewhat similar-looking traceback:
{code}
[2017-06-03 17:23:57,926] ERROR Uncaught exception in scheduled task 
'delete-expired-consumer-offsets' (kafka.utils.KafkaScheduler)
java.lang.IllegalArgumentExc

[jira] [Updated] (KAFKA-5381) ERROR Uncaught exception in scheduled task 'delete-expired-consumer-offsets' (kafka.utils.KafkaScheduler)

2017-06-05 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-5381:
---
Description: 
We have a 6 node cluster of 0.10.0.1 brokers. Broker 4 had a hardware problem, 
so we re-assigned all its partitions to other brokers. We immediately started 
observing the error described in KAFKA-4362 from several of our consumers.

However, on broker 6, we also started seeing the following exceptions in 
{{KafkaScheduler}} which have a somewhat similar-looking traceback:
{code}
[2017-06-03 17:23:57,926] ERROR Uncaught exception in scheduled task 
'delete-expired-consumer-offsets' (kafka.utils.KafkaScheduler)
java.lang.IllegalArgumentException: Message format version for partition 50 not 
found
at 
kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633)
at scala.Option.getOrElse(Option.scala:121)
at 
kafka.coordinator.GroupMetadataManager.kafka$coordinator$GroupMetadataManager$$getMessageFormatVersionAndTimestamp(GroupMetadataManager.scala:632)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2$$anonfun$10.apply(GroupMetadataManager.scala:560)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2$$anonfun$10.apply(GroupMetadataManager.scala:551)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2.apply$mcI$sp(GroupMetadataManager.scala:551)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2.apply(GroupMetadataManager.scala:543)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2.apply(GroupMetadataManager.scala:543)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:239)
at 
kafka.coordinator.GroupMetadataManager.kafka$coordinator$GroupMetadataManager$$deleteExpiredOffsets(GroupMetadataManager.scala:543)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$1.apply$mcV$sp(GroupMetadataManager.scala:87)
at 
kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:56)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
{code}

This appears to be similar root cause to KAFKA-4362 where 
{{GroupMetadataManager.getMessageFormatVersionAndTimestamp}} is throwing the 
error due to the offset partition being moved, but I'm unclear whether the fix 
for that version also fixed the {{KafkaScheduler}} or if more work needs to be 
done here.

When the partition re-assignment was done, I do not know how this was executed 
by our Ops team. I suspect they used the {{kafka-reassign-partitions.sh}} 
script and just gave it five brokers rather than six. From my understanding, 
this would have randomly re-assigned all partitions (I don't think its sticky), 
so probably at least one partition from the {{__consumer_offsets}} topic was 
removed from broker 6. However, if that was the case, I would have expected all 
brokers to have had these partitions removed and be throwing this error. But 
our logging infrastructure shows that this error was only happening on broker 
6, not on the other brokers. Not sure why that is.

  was:
We have a 6 node cluster of 0.10.0.1 brokers. Broker 4 had a hardware problem, 
so we re-assigned all its partitions to other brokers. We immediately started 
observing the error described in KAFKA-4362 from several of our consumers.

However, on broker 6, we also started seeing the following exceptions in 
{{KafkaScheduler}} which have a somewhat similar-looking traceback:
{code}
[2017-06-03 17:23:57,926] ERROR Uncaught exception in scheduled task 
'delete-expired-consumer-offsets' (kafka.utils.KafkaScheduler)
java.lang.IllegalArgumentException: Message format version for partition 50 not 
found
at 
kafka.co

[jira] [Created] (KAFKA-5381) ERROR Uncaught exception in scheduled task 'delete-expired-consumer-offsets' (kafka.utils.KafkaScheduler)

2017-06-05 Thread Jeff Widman (JIRA)
Jeff Widman created KAFKA-5381:
--

 Summary: ERROR Uncaught exception in scheduled task 
'delete-expired-consumer-offsets' (kafka.utils.KafkaScheduler)
 Key: KAFKA-5381
 URL: https://issues.apache.org/jira/browse/KAFKA-5381
 Project: Kafka
  Issue Type: Bug
Reporter: Jeff Widman


We have a 6 node cluster of 0.10.0.1 brokers. Broker 4 had a hardware problem, 
so we re-assigned all its partitions to other brokers. We immediately started 
observing the error described in KAFKA-4362 from several of our consumers.

However, on broker 6, we also started seeing the following exceptions in 
{{KafkaScheduler}} which have a somewhat similar-looking traceback:
{code}
[2017-06-03 17:23:57,926] ERROR Uncaught exception in scheduled task 
'delete-expired-consumer-offsets' (kafka.utils.KafkaScheduler)
java.lang.IllegalArgumentException: Message format version for partition 50 not 
found
at 
kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633)
at scala.Option.getOrElse(Option.scala:121)
at 
kafka.coordinator.GroupMetadataManager.kafka$coordinator$GroupMetadataManager$$getMessageFormatVersionAndTimestamp(GroupMetadataManager.scala:632)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2$$anonfun$10.apply(GroupMetadataManager.scala:560)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2$$anonfun$10.apply(GroupMetadataManager.scala:551)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2.apply$mcI$sp(GroupMetadataManager.scala:551)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2.apply(GroupMetadataManager.scala:543)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$2.apply(GroupMetadataManager.scala:543)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:239)
at 
kafka.coordinator.GroupMetadataManager.kafka$coordinator$GroupMetadataManager$$deleteExpiredOffsets(GroupMetadataManager.scala:543)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$1.apply$mcV$sp(GroupMetadataManager.scala:87)
at 
kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:56)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
{code}

This appears to be similar root cause to KAFKA-4362 where 
{{GroupMetadataManager.getMessageFormatVersionAndTimestamp}} is throwing the 
error due to the offset partition being moved, but I'm unclear whether the fix 
for that version also fixed the {{KafkaScheduler}} or if more work needs to be 
done here.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-3356) Remove ConsumerOffsetChecker, deprecated in 0.9, in 0.11

2017-05-31 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030804#comment-16030804
 ] 

Jeff Widman commented on KAFKA-3356:


Is there any way to get this to land today?

This is literally just deleting a shell script and a Scala file, and there's 
already a PR ready to go.

Normally I wouldn't care if this were an internal cleanup, but since it exposes 
the shell script, I routinely get questions from folks who don't realize they 
shouldn't be using it. 

So I'd rather it get cleaned up as it improves the new user experience.

> Remove ConsumerOffsetChecker, deprecated in 0.9, in 0.11
> 
>
> Key: KAFKA-3356
> URL: https://issues.apache.org/jira/browse/KAFKA-3356
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.10.2.0
>Reporter: Ashish Singh
>Assignee: Mickael Maison
>Priority: Blocker
> Fix For: 0.12.0.0
>
>
> ConsumerOffsetChecker is marked deprecated as of 0.9, should be removed in 
> 0.11.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Jira-Spam on Dev-Mailinglist

2017-05-30 Thread Jeff Widman
I'm hugely in favor of this change as well...

Although I actually find the Github pull request emails less useful than
the jirabot ones since Jira typically has more info when I'm trying to
figure out if the issue is relevant to me or not...

On Tue, May 30, 2017 at 2:28 PM, Guozhang Wang  wrote:

> I actually do not know.. Maybe Jun knows better than me?
>
>
> Guozhang
>
> On Mon, May 29, 2017 at 12:58 PM, Gwen Shapira  wrote:
>
> > I agree.
> >
> > Guozhang, do you know how to implement the suggestion? JIRA to Apache
> > Infra? Or is this something we can do ourselves somehow?
> >
> > On Mon, May 29, 2017 at 9:33 PM Guozhang Wang 
> wrote:
> >
> > > I share your pains. Right now I use filters on my email accounts and it
> > has
> > > been down to about 25 per day.
> > >
> > > I think setup a separate mailing list for jirabot and jenkins auto
> > > generated emails is a good idea.
> > >
> > >
> > > Guozhang
> > >
> > >
> > > On Mon, May 29, 2017 at 12:58 AM,  wrote:
> > >
> > > > Hello everyone
> > > >
> > > > I find it hard to follow this mailinglist due to all the mails
> > generated
> > > > by Jira. Just over this weekend there are 240 new mails.
> > > > Would it be possible to setup something like j...@kafka.apache.org
> > where
> > > > everyone can subscribe interested in those Jira mails?
> > > >
> > > > Right now I am going to setup a filter which just deletes the
> > jira-tagged
> > > > mails, but I think the current setup also makes it hard to read
> through
> > > > the archives.
> > > >
> > > > regards
> > > > Marc
> > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
>
> --
> -- Guozhang
>


[jira] [Commented] (KAFKA-4362) Consumer can fail after reassignment of the offsets topic partition

2017-05-25 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024997#comment-16024997
 ] 

Jeff Widman commented on KAFKA-4362:


Updated the applicable version as we just encountered this on a {{0.10.0.1}} 
cluster.

> Consumer can fail after reassignment of the offsets topic partition
> ---
>
> Key: KAFKA-4362
> URL: https://issues.apache.org/jira/browse/KAFKA-4362
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.1, 0.10.1.0
>Reporter: Joel Koshy
>Assignee: Mayuresh Gharat
> Fix For: 0.10.1.1
>
>
> When a consumer offsets topic partition reassignment completes, an offset 
> commit shows this:
> {code}
> java.lang.IllegalArgumentException: Message format version for partition 100 
> not found
> at 
> kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633)
>  ~[kafka_2.10.jar:?]
> at 
> kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633)
>  ~[kafka_2.10.jar:?]
> at scala.Option.getOrElse(Option.scala:120) ~[scala-library-2.10.4.jar:?]
> at 
> kafka.coordinator.GroupMetadataManager.kafka$coordinator$GroupMetadataManager$$getMessageFormatVersionAndTimestamp(GroupMetadataManager.scala:632)
>  ~[kafka_2.10.jar:?]
> at 
> ...
> {code}
> The issue is that the replica has been deleted so the 
> {{GroupMetadataManager.getMessageFormatVersionAndTimestamp}} throws this 
> exception instead which propagates as an unknown error.
> Unfortunately consumers don't respond to this and will fail their offset 
> commits.
> One workaround in the above situation is to bounce the cluster - the consumer 
> will be forced to rediscover the group coordinator.
> (Incidentally, the message incorrectly prints the number of partitions 
> instead of the actual partition.)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4362) Consumer can fail after reassignment of the offsets topic partition

2017-05-25 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-4362:
---
Affects Version/s: 0.10.0.1

> Consumer can fail after reassignment of the offsets topic partition
> ---
>
> Key: KAFKA-4362
> URL: https://issues.apache.org/jira/browse/KAFKA-4362
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.1, 0.10.1.0
>Reporter: Joel Koshy
>Assignee: Mayuresh Gharat
> Fix For: 0.10.1.1
>
>
> When a consumer offsets topic partition reassignment completes, an offset 
> commit shows this:
> {code}
> java.lang.IllegalArgumentException: Message format version for partition 100 
> not found
> at 
> kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633)
>  ~[kafka_2.10.jar:?]
> at 
> kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633)
>  ~[kafka_2.10.jar:?]
> at scala.Option.getOrElse(Option.scala:120) ~[scala-library-2.10.4.jar:?]
> at 
> kafka.coordinator.GroupMetadataManager.kafka$coordinator$GroupMetadataManager$$getMessageFormatVersionAndTimestamp(GroupMetadataManager.scala:632)
>  ~[kafka_2.10.jar:?]
> at 
> ...
> {code}
> The issue is that the replica has been deleted so the 
> {{GroupMetadataManager.getMessageFormatVersionAndTimestamp}} throws this 
> exception instead which propagates as an unknown error.
> Unfortunately consumers don't respond to this and will fail their offset 
> commits.
> One workaround in the above situation is to bounce the cluster - the consumer 
> will be forced to rediscover the group coordinator.
> (Incidentally, the message incorrectly prints the number of partitions 
> instead of the actual partition.)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Pull-Requests not worked on

2017-05-23 Thread Jeff Widman
I agree with this. As a new contributor, it was a bit demoralizing to look
at all the open PRs and wonder whether, when I sent a patch, it would just
be left to sit in the ether.

In other projects I'm involved with, the maintainers more typically go
through periodically and close old PRs that will never be merged. I know
at this point it's an intimidating amount of work, but I still think it'd
be useful to cut down this backlog.

Maybe at the SF Kafka Summit sprint we could have a group that does this?
It's a decent task for n00bs who want to help but don't know where to
start: ask them to help identify PRs that are ancient and should be closed
because they will never be merged.

On Tue, May 23, 2017 at 4:59 AM,  wrote:

> Hello everyone
>
> I am wondering how pull-requests are handled for Kafka? There is currently
> a huge amount of PRs on Github and most of them are not getting any
> attention.
>
> If the maintainers only look at PRs which have passed CI (which makes
> sense given the volume), then there is a problem, because the CI pipeline
> is not stable. I've submitted a PR myself which adds OSGi metadata to the
> kafka-clients artifact (see 2882). The pipeline fails randomly even though
> the change only adds some entries to the manifest.
> The next issue is that people submitting PRs cannot look at
> the failing CI job. So with regards to my PR, I don't have a clue what went
> wrong.
> If I am missing something in the process, please let me know.
> Regarding PR 2882, please consider merging it because it would save the
> OSGi community the effort of wrapping the kafka artifact and deploying it
> with different coordinates on Maven Central (which can confuse users)
> regards
> Marc
>


[jira] [Commented] (KAFKA-3356) Remove ConsumerOffsetChecker, deprecated in 0.9, in 0.11

2017-05-15 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010801#comment-16010801
 ] 

Jeff Widman commented on KAFKA-3356:


Personally, I don't think we need feature parity for the old-consumers since 
those are in the process of being deprecated/removed.

> Remove ConsumerOffsetChecker, deprecated in 0.9, in 0.11
> 
>
> Key: KAFKA-3356
> URL: https://issues.apache.org/jira/browse/KAFKA-3356
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.10.2.0
>Reporter: Ashish Singh
>Assignee: Mickael Maison
>Priority: Blocker
> Fix For: 0.11.0.0
>
>
> ConsumerOffsetChecker is marked deprecated as of 0.9, should be removed in 
> 0.11.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP-142: Add ListTopicsRequest to efficiently list all the topics in a cluster

2017-04-20 Thread Jeff Widman
In theory this is nice.

However, one possibly silly question... personally I've found that most
times when I wanted to list the topics, I also wanted to know additional
metadata such as the leader of the topic.

So I just want to double-check that there are enough common use cases that
only need the list of topics to merit creating a new request/response type?



On Thu, Apr 20, 2017 at 2:17 PM, Colin McCabe  wrote:

> Hi all,
>
> While working on the AdminClient, we determined that we didn't have a
> way to efficiently list all the topics in the cluster.  I wrote up a KIP
> to add a ListTopicsRequest to make this more efficient in larger
> clusters with more brokers and topics.  Please take a look at the KIP
> here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 142%3A+Add+ListTopicsRequest+to+efficiently+list+all+the+
> topics+in+a+cluster
>
> cheers,
> Colin
>


[jira] [Commented] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2017-04-19 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15975603#comment-15975603
 ] 

Jeff Widman commented on KAFKA-3042:


[~lindong] thanks for the offer to help and sorry for the slow response. 

I'm not exactly sure how to repro, but below I've copied a sanitized version of 
our internal wiki page documenting our findings as we tried to figure out what 
was happening and how we ended up with a mismatched controller epoch between the 
controller and a random partition. It's not the most polished; it's more a train 
of thought put to paper as we debugged.

Reading through it, it appeared that broker 3 lost its connection to zookeeper, 
then when it came back, it elected itself controller, but somehow ended up in a 
state where the broker 3 controller had a completely empty list of brokers. This 
doesn't make logical sense, because if a broker is the controller, it should 
list itself among the active brokers. But somehow it happened. Following that, 
the active epoch for the controller is 134, but the active epoch listed by a 
random partition in zookeeper is 133. That created the version mismatch.

More details below, and I also have access to the detailed Kafka logs (but not 
ZK logs) beyond just the snippets if you need anything else. They will get 
rotated out of elasticsearch within a few months and disappear, so hopefully we 
can get to the bottom of this before that.


{code}
3 node cluster. 
Broker 1 is controller.
Zookeeper GC pause meant that broker 3 lost connection. 
When it came back, broker 3 thought it was controller, but thought there were 
no alive brokers--see the empty set referenced in the logs below. This alone 
seems incorrect because if a broker is a controller, you'd think it would 
include itself in the set.


See the following in the logs:


[2017-03-17 21:32:15,812] ERROR Controller 3 epoch 134 initiated state change 
for partition [topic_name,626] from OfflinePartition to OnlinePartition failed 
(s
tate.change.logger)
kafka.common.NoReplicaOnlineException: No replica for partition 
[topic_name,626] is alive. Live brokers are: [Set()], Assigned replicas are: 
[List(1, 3)]
at 
kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
at 
kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:345)
at 
kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:205)
at 
kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
at 
kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at 
kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
at 
kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:70)
at 
kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:335)
at 
kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:166)
at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:84)
at 
kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:146)
at 
kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141)
at 
kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
at 
kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:141)
at org.I0Itec.zkclient.ZkClient$9.run(ZkClient.java:824)
at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

Looking at the code + error message, the controller is unaware of active 
brokers. However, there are assigned replicas. We checked the log files under 
/data/kafka and they had m_times greater than the exception timestamp, plus our 
producers and consumers seemed to be working, so the cluster is successfully 
passing data around. The controller situation is just screwed up.


[2017-03-17 21:32:43,976] ERROR Controller 3 epoch 134 ini

[jira] [Comment Edited] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2017-04-06 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959479#comment-15959479
 ] 

Jeff Widman edited comment on KAFKA-3042 at 4/6/17 6:16 PM:


We hit this on 0.10.0.1. 

Root cause was a really long zookeeper GC pause that caused the brokers to lose 
their connection. Producers / Consumers were working successfully as they'd 
established their connections to the brokers before the zk issue, so they kept 
happily working. But the broker logs were throwing these warnings about cached 
zkVersion not matching. And anything that required controller was broken, for 
example any newly created partitions didn't have leaders. 

I don't know if this error message is thrown whenever two znodes don't match, 
but in our case the ZK GC pause had resulted in a race condition sequence where 
somehow the epoch of /controller znode did not match the partition controller 
epoch under /brokers znode. I'm not sure if it's possible to fix this, perhaps 
with the ZK multi-command where updates are transactional.

It took us a while to realize that was what the log message meant, so the log 
message could be made more specific to report exactly which znode paths don't 
match in zookeeper.

For us, forcing a controller re-election by deleting the /controller znode 
immediately fixed the issue without having to restart brokers. 


was (Author: jeffwidman):
We hit this on 0.10.0.1. 

Root cause was a really long zookeeper GC pause that caused the brokers to lose 
their connection. Producers / Consumers were working successfully as they'd 
established their connections to the brokers before the zk issue, so they kept 
happily working. But the broker logs were throwing these warnings about cached 
zkVersion not matching. And anything that required controller was broken, for 
example any newly created partitions didn't have leaders. 

I think this log message could be made more specific to show which znodes don't 
match.

I don't know if this error message is thrown whenever two znodes don't match, 
but in our case the ZK GC pause resulted in a race condition sequence where 
somehow the epoch of /controller znode did not match the partition controller 
epoch under /brokers znode. I'm not sure if it's possible to fix this, perhaps 
with the ZK multi-command where updates are transactional.

It took us a while to realize that was what the log message meant, so the log 
message could be made more specific to report exactly which znode paths don't 
match in zookeeper.

For us, forcing a controller re-election by deleting the /controller znode 
immediately fixed the issue without having to restart brokers. 

> updateIsr should stop after failed several times due to zkVersion issue
> ---
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.1
> Environment: jdk 1.7
> centos 6.4
>Reporter: Jiahongchao
>Assignee: Dong Lin
>  Labels: reliability
> Fix For: 0.11.0.0
>
> Attachments: controller.log, server.log.2016-03-23-01, 
> state-change.log
>
>
> sometimes one broker may repeatly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker consider itself as the leader in fact it's 
> a follower.
> So after several failed tries, it need to find out who is the leader



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2017-04-06 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959479#comment-15959479
 ] 

Jeff Widman commented on KAFKA-3042:


We hit this on 0.10.0.1. 

Root cause was a really long zookeeper GC pause that caused the brokers to lose 
their connection. Producers / Consumers were working successfully as they'd 
established their connections to the brokers before the zk issue, so they kept 
happily working. But the broker logs were throwing these warnings about cached 
zkVersion not matching. And anything that required controller was broken, for 
example any newly created partitions didn't have leaders. 

I think this log message could be made more specific to show which znodes don't 
match.

I don't know if this error message is thrown whenever two znodes don't match, 
but in our case the ZK GC pause resulted in a race condition sequence where 
somehow the epoch of /controller znode did not match the partition controller 
epoch under /brokers znode. I'm not sure if it's possible to fix this, perhaps 
with the ZK multi-command where updates are transactional.

It took us a while to realize that was what the log message meant, so the log 
message could be made more specific to report exactly which znode paths don't 
match in zookeeper.

For us, forcing a controller re-election by deleting the /controller znode 
immediately fixed the issue without having to restart brokers. 
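For illustration only, a rough sketch of what that manual controller
re-election looks like when done programmatically against ZooKeeper (the
connection string and session timeout below are assumptions; adjust the path
for any chroot your cluster uses):

{code}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.ZooKeeper;

public class ForceControllerReelection {
    public static void main(String[] args) throws Exception {
        // Connect to the same ensemble the brokers use.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000,
                (WatchedEvent event) -> { /* no-op watcher */ });
        try {
            // Deleting the ephemeral /controller znode forces the brokers to
            // run a new controller election; version -1 means "any version".
            zk.delete("/controller", -1);
            System.out.println("Deleted /controller; a new election will follow.");
        } catch (KeeperException.NoNodeException e) {
            System.out.println("/controller does not exist; no controller is currently registered.");
        } finally {
            zk.close();
        }
    }
}
{code}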

> updateIsr should stop after failed several times due to zkVersion issue
> ---
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.1
> Environment: jdk 1.7
> centos 6.4
>Reporter: Jiahongchao
>Assignee: Dong Lin
>  Labels: reliability
> Fix For: 0.11.0.0
>
> Attachments: controller.log, server.log.2016-03-23-01, 
> state-change.log
>
>
> sometimes one broker may repeatly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker consider itself as the leader in fact it's 
> a follower.
> So after several failed tries, it need to find out who is the leader



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: any production deployment of kafka 0.10.2.0

2017-03-26 Thread Jeff Widman
+ Users list


On Mar 26, 2017 8:17 AM, "Jianbin Wei"  wrote:

We are thinking about upgrading our system to 0.10.2.0.  Has anybody
upgraded his/her system to 0.10.2.0 and any issues?

Regards,

-- Jianbin


[jira] [Commented] (KAFKA-4858) Long topic names created using old kafka-topics.sh can prevent newer brokers from joining any ISRs

2017-03-16 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929421#comment-15929421
 ] 

Jeff Widman commented on KAFKA-4858:


I'd say if you patch it so that replication doesn't break, that's good enough 
for me. I don't see the point in spending a lot of time fixing support for old 
versions beyond that.

The better fix will be to update shell scripts etc to use the broker-side API 
so all broker-side checks are hit before anything else happens or anything is 
created in Zookeeper. In that case, so long as the CreateTopic API throws a 
descriptive error, then I think the user experience is fine.

> Long topic names created using old kafka-topics.sh can prevent newer brokers 
> from joining any ISRs
> --
>
> Key: KAFKA-4858
> URL: https://issues.apache.org/jira/browse/KAFKA-4858
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1, 0.10.2.0
>Reporter: James Cheng
>Assignee: Vahid Hashemian
>
> I ran into a variant of KAFKA-3219 that resulted in a broker being unable to 
> join any ISRs in the cluster.
> Prior to 0.10.0.0, the maximum topic length was 255.
> With 0.10.0.0 and beyond, the maximum topic length is 249.
> The check on topic name length is done by kafka-topics.sh prior to topic 
> creation. Thus, it is possible to use a 0.9.0.1 kafka-topics.sh script to 
> create a 255 character topic on a 0.10.1.1 broker.
> When this happens, you will get the following stack trace (the same one seen 
> in KAFKA-3219)
> {code}
> $ TOPIC=$(printf 'd%.0s' {1..255} ) ; bin/kafka-topics.sh --zookeeper 
> 127.0.0.1 --create --topic $TOPIC --partitions 1 --replication-factor 2
> Created topic 
> "ddd".
> {code}
> {code}
> [2017-03-06 22:01:19,011] ERROR [KafkaApi-2] Error when handling request 
> {controller_id=1,controller_epoch=1,partition_states=[{topic=ddd,partition=0,controller_epoch=1,leader=2,leader_epoch=0,isr=[2,1],zk_version=0,replicas=[2,1]}],live_leaders=[{id=2,host=jchengmbpro15,port=9093}]}
>  (kafka.server.KafkaApis)
> java.lang.NullPointerException
>   at 
> scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:192)
>   at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:192)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:32)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
>   at kafka.log.Log.loadSegments(Log.scala:155)
>   at kafka.log.Log.(Log.scala:108)
>   at kafka.log.LogManager.createLog(LogManager.scala:362)
>   at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:94)
>   at 
> kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:174)
>   at 
> kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:174)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
>   at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:174)
>   at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:168)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:242)
>   at kafka.cluster.Partition.makeLeader(Partition.scala:168)
>   at 
> kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:758)
>   at 
> kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:757)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
>   at kafka.server.ReplicaManager.makeLeaders(ReplicaManager.scala:757

[jira] [Commented] (KAFKA-4858) Long topic names created using old kafka-topics.sh can prevent newer brokers from joining any ISRs

2017-03-10 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905917#comment-15905917
 ] 

Jeff Widman commented on KAFKA-4858:


I saw the PR ignores those topics... is this a silent failure?

Like will users be confused why their topic creation appears to succeed in the 
shell script but fail in the broker?

Not sure there's a good solution if someone is using 0.9 shell script, but for 
the CreateTopic API I hope it throws a clear error message if topic creation 
fails due to this new broker-side check.

> Long topic names created using old kafka-topics.sh can prevent newer brokers 
> from joining any ISRs
> --
>
> Key: KAFKA-4858
> URL: https://issues.apache.org/jira/browse/KAFKA-4858
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1, 0.10.2.0
>Reporter: James Cheng
>Assignee: Vahid Hashemian
>
> I ran into a variant of KAFKA-3219 that resulted in a broker being unable to 
> join any ISRs in the cluster.
> Prior to 0.10.0.0, the maximum topic length was 255.
> With 0.10.0.0 and beyond, the maximum topic length is 249.
> The check on topic name length is done by kafka-topics.sh prior to topic 
> creation. Thus, it is possible to use a 0.9.0.1 kafka-topics.sh script to 
> create a 255 character topic on a 0.10.1.1 broker.
> When this happens, you will get the following stack trace (the same one seen 
> in KAFKA-3219)
> {code}
> $ TOPIC=$(printf 'd%.0s' {1..255} ) ; bin/kafka-topics.sh --zookeeper 
> 127.0.0.1 --create --topic $TOPIC --partitions 1 --replication-factor 2
> Created topic 
> "ddd".
> {code}
> {code}
> [2017-03-06 22:01:19,011] ERROR [KafkaApi-2] Error when handling request 
> {controller_id=1,controller_epoch=1,partition_states=[{topic=ddd,partition=0,controller_epoch=1,leader=2,leader_epoch=0,isr=[2,1],zk_version=0,replicas=[2,1]}],live_leaders=[{id=2,host=jchengmbpro15,port=9093}]}
>  (kafka.server.KafkaApis)
> java.lang.NullPointerException
>   at 
> scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:192)
>   at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:192)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:32)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
>   at kafka.log.Log.loadSegments(Log.scala:155)
>   at kafka.log.Log.(Log.scala:108)
>   at kafka.log.LogManager.createLog(LogManager.scala:362)
>   at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:94)
>   at 
> kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:174)
>   at 
> kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:174)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
>   at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:174)
>   at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:168)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:242)
>   at kafka.cluster.Partition.makeLeader(Partition.scala:168)
>   at 
> kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:758)
>   at 
> kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:757)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
>   at kafka.server.ReplicaManager.makeLeaders(ReplicaManager.scala:757)
>   at 
> kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:703)
>   at k

[jira] [Commented] (KAFKA-4858) Long topic names created using old kafka-topics.sh can prevent newer brokers from joining any ISRs

2017-03-09 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903928#comment-15903928
 ] 

Jeff Widman commented on KAFKA-4858:


My understanding was that the general expectation was that you should only use 
Kafka shell scripts whose version matches the broker version, because many (if not 
all) of them use internal Java/Scala code that may break between releases. So 
normally I'd say this isn't worth worrying about fixing.

However, adding a check for this on the broker would be useful, because couldn't 
this be hit when calling the new CreateTopic API? 

I could easily see this happening in a third-party/non-java client, if they 
didn't realize they needed to add a check for this before sending the API call.
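
For illustration, a rough sketch of that kind of pre-flight check done client-side 
in shell. The 249-character limit is the one described below; the legal character 
set is from memory, so treat it as an assumption:

  TOPIC="my.topic-name_1"
  if [ ${#TOPIC} -gt 249 ]; then
    echo "topic name longer than 249 characters: $TOPIC" >&2
  elif ! printf '%s' "$TOPIC" | grep -Eq '^[a-zA-Z0-9._-]+$'; then
    echo "topic name contains illegal characters: $TOPIC" >&2
  fi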

> Long topic names created using old kafka-topics.sh can prevent newer brokers 
> from joining any ISRs
> --
>
> Key: KAFKA-4858
> URL: https://issues.apache.org/jira/browse/KAFKA-4858
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1, 0.10.2.0
>Reporter: James Cheng
>Assignee: Vahid Hashemian
>
> I ran into a variant of KAFKA-3219 that resulted in a broker being unable to 
> join any ISRs in the cluster.
> Prior to 0.10.0.0, the maximum topic length was 255.
> With 0.10.0.0 and beyond, the maximum topic length is 249.
> The check on topic name length is done by kafka-topics.sh prior to topic 
> creation. Thus, it is possible to use a 0.9.0.1 kafka-topics.sh script to 
> create a 255 character topic on a 0.10.1.1 broker.
> When this happens, you will get the following stack trace (the same one seen 
> in KAFKA-3219)
> {code}
> $ TOPIC=$(printf 'd%.0s' {1..255} ) ; bin/kafka-topics.sh --zookeeper 
> 127.0.0.1 --create --topic $TOPIC --partitions 1 --replication-factor 2
> Created topic 
> "ddd".
> {code}
> {code}
> [2017-03-06 22:01:19,011] ERROR [KafkaApi-2] Error when handling request 
> {controller_id=1,controller_epoch=1,partition_states=[{topic=ddd,partition=0,controller_epoch=1,leader=2,leader_epoch=0,isr=[2,1],zk_version=0,replicas=[2,1]}],live_leaders=[{id=2,host=jchengmbpro15,port=9093}]}
>  (kafka.server.KafkaApis)
> java.lang.NullPointerException
>   at 
> scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:192)
>   at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:192)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:32)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
>   at kafka.log.Log.loadSegments(Log.scala:155)
>   at kafka.log.Log.(Log.scala:108)
>   at kafka.log.LogManager.createLog(LogManager.scala:362)
>   at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:94)
>   at 
> kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:174)
>   at 
> kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:174)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
>   at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:174)
>   at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:168)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:242)
>   at kafka.cluster.Partition.makeLeader(Partition.scala:168)
>   at 
> kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:758)
>   at 
> kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:757)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
> 

[jira] [Updated] (KAFKA-2273) KIP-54: Add rebalance with a minimal number of reassignments to server-defined strategy list

2017-03-06 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-2273:
---
Fix Version/s: 0.11.0.0

> KIP-54: Add rebalance with a minimal number of reassignments to 
> server-defined strategy list
> 
>
> Key: KAFKA-2273
> URL: https://issues.apache.org/jira/browse/KAFKA-2273
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Olof Johansson
>Assignee: Vahid Hashemian
>  Labels: kip
> Fix For: 0.11.0.0
>
>
> Add a new partitions.assignment.strategy to the server-defined list that will 
> do reassignments based on moving as few partitions as possible. This should 
> be a quite common reassignment strategy especially for the cases where the 
> consumer has to maintain state, either in memory, or on disk.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-2273) KIP-54: Add rebalance with a minimal number of reassignments to server-defined strategy list

2017-03-06 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898678#comment-15898678
 ] 

Jeff Widman commented on KAFKA-2273:


This KIP was accepted, so I added it to the 0.11 milestone

> KIP-54: Add rebalance with a minimal number of reassignments to 
> server-defined strategy list
> 
>
> Key: KAFKA-2273
> URL: https://issues.apache.org/jira/browse/KAFKA-2273
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Olof Johansson
>Assignee: Vahid Hashemian
>  Labels: kip
> Fix For: 0.11.0.0
>
>
> Add a new partitions.assignment.strategy to the server-defined list that will 
> do reassignments based on moving as few partitions as possible. This should 
> be a quite common reassignment strategy especially for the cases where the 
> consumer has to maintain state, either in memory, or on disk.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4790) Kafka cannot recover after a disk full

2017-03-03 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-4790:
---
Fix Version/s: (was: 0.10.2.1)

> Kafka cannot recover after a disk full
> --
>
> Key: KAFKA-4790
> URL: https://issues.apache.org/jira/browse/KAFKA-4790
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.1.1
>Reporter: Pengwei
>  Labels: reliability
>
> [2017-02-23 18:43:57,736] INFO zookeeper state changed (SyncConnected) 
> (org.I0Itec.zkclient.ZkClient)
> [2017-02-23 18:43:57,887] INFO Loading logs. (kafka.log.LogManager)
> [2017-02-23 18:43:57,935] INFO Recovering unflushed segment 0 in log test1-0. 
> (kafka.log.Log)
> [2017-02-23 18:43:59,297] ERROR There was an error in one of the threads 
> during logs loading: java.lang.IllegalArgumentException: requirement failed: 
> Attempt to append to a full index (size = 128000). (kafka.log.LogManager)
> [2017-02-23 18:43:59,299] FATAL Fatal error during KafkaServer startup. 
> Prepare to shutdown (kafka.server.KafkaServer)
> java.lang.IllegalArgumentException: requirement failed: Attempt to append to 
> a full index (size = 128000).
>   at scala.Predef$.require(Predef.scala:219)
>   at 
> kafka.log.OffsetIndex$$anonfun$append$1.apply$mcV$sp(OffsetIndex.scala:200)
>   at kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:199)
>   at kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:199)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>   at kafka.log.OffsetIndex.append(OffsetIndex.scala:199)
>   at kafka.log.LogSegment.recover(LogSegment.scala:191)
>   at kafka.log.Log.recoverLog(Log.scala:259)
>   at kafka.log.Log.loadSegments(Log.scala:234)
>   at kafka.log.Log.(Log.scala:92)
>   at 
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$4$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:201)
>   at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:60)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] KIP-107: Add purgeDataBefore() API in AdminClient

2017-03-03 Thread Jeff Widman
re: Purge vs PurgeMessages/Records - I also prefer that it be more explicit
about what is being purged. Despite the inconsistency with Fetch/Produce, being
explicit about what's being purged means there shouldn't be additional
confusion. Who knows what in the future might need purging?
Adding the extra word provides optionality in the future with very little
cost.

re: Records vs Messages... it would be nice if it were consistent across
Kafka. Messages has a sense of expiring once received, which seems like a
better fit for a system like Kafka that has things flowing through it and
then deleted, whereas records has a connotation of being kept for an
unspecified period of time, such as in a more typical database scenario. But
that's a minor point; really I just prefer that it's consistent.

On Fri, Mar 3, 2017 at 6:34 AM, Ismael Juma  wrote:

> First of all, sorry to arrive late on this.
>
> Jun, do you have a reference that states that "purge" means to remove a
> portion? If I do "define: purge" on Google, one of the definitions is
> "physically remove (something) completely."
>
> In the PR, I was asking about the reasoning more than suggesting a change.
> But let me clarify my thoughts. There are 2 separate things to think about:
>
> 1. The protocol change.
>
> It's currently called Purge with no mention of what it's purging. This is
> consistent with Fetch and Produce and it makes sense if we reserve the word
> "Purge" for dealing with records/messages. Having said that, I don't think
> this is particularly intuitive for people who are not familiar with Kafka
> and its history. The number of APIs in the protocol keeps growing and it
> would be better to be explicit about what is being purged/deleted, in my
> opinion. If we are explicit, then we need to decide what to call it, since
> there is no precedent. A few options: PurgeRecords, PurgeMessages,
> PurgeData, DeleteRecords, DeleteMessages, DeleteData (I personally don't
> like the Data suffix as it's not used anywhere else).
>
> 2. The AdminClient change.
>
> Regarding the name of the method, I'd prefer to avoid the `Data` suffix
> because I don't think we use that anywhere else (please correct me if I'm
> wrong). In the Producer, we have `send(ProduceRecord)` and in the consumer
> we have `ConsumerRecords poll(...)`. So maybe, the suffix should be
> `Records`? Like in the protocol, we still need to decide if we want to use
> `purge` or `delete`. We seem to use `delete` for all the other methods in
> the AdminClient, so unless we have a reason to use a different name, it
> seems like we should be consistent.
>
> The proposed method signature is `Future<Map<TopicPartition,
> PurgeDataResult>> purgeDataBefore(Map<TopicPartition, Long>
> offsetForPartition)`. In the AdminClient KIP (KIP-117), we are using
> classes to encapsulate the parameters and result. We should probably do the
> same in this KIP for consistency. Once we do that, we should also consider
> if `Before` should be in the method name or should be in the parameter
> class. Just an example to describe what I mean, one could say
> `deleteRecords(DeleteRecordsParams.before(offsetsForPartition))`. That way,
> we could provide a different way of deleting by simply updating the
> parameters class.
>
> Some food for thought. :)
>
> Ismael
>
>
>
> On Thu, Mar 2, 2017 at 5:46 PM, Dong Lin  wrote:
>
> > Hey Jun,
> >
> > Thanks for this information. I am not aware of this difference between
> the
> > purge and delete. Given this difference, I will prefer to the existing
> name
> > of the purge.
> >
> > Ismael, please let me if you are strong about using delete.
> >
> > Thanks,
> > Dong
> >
> >
> > On Thu, Mar 2, 2017 at 9:40 AM, Jun Rao  wrote:
> >
> > > Hi, Dong,
> > >
> > > It seems that delete means removing everything while purge means
> > removing a
> > > portion. So, it seems that it's better to be able to distinguish the
> two?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Wed, Mar 1, 2017 at 1:57 PM, Dong Lin  wrote:
> > >
> > > > Hi all,
> > > >
> > > > I have updated the KIP to include a script that allows user to purge
> > data
> > > > by providing a map from partition to offset. I think this script may
> be
> > > > convenience and useful, e.g., if user simply wants to purge all data
> of
> > > > given partitions from command line. I am wondering if anyone object
> > this
> > > > script or has suggestions on the interface.
> > > >
> > > > Besides, Ismael commented in the pull request that it may be better
> to
> > > > rename PurgeDataBefore() to DeleteDataBefore() and rename
> PurgeRequest
> > to
> > > > DeleteRequest. I think it may be a good idea because kafka-topics.sh
> > > > already use "delete" as an option. Personally I don't have strong
> > > > preference between "purge" and "delete". I am wondering if anyone
> > object
> > > to
> > > > this change.
> > > >
> > > > Thanks,
> > > > Dong
> > > >
> > > >
> > > >
> > > > On Wed, Mar 1, 2017 at 9:46 AM, Dong Lin 
> wrote:
> > > >
> > > > > Hi Ismael,
> > > > >
> > > > > I actually me

Re: Is there already a KIP or JIRA to change auto.create.topics.enable to default to false?

2017-03-02 Thread Jeff Widman
Thanks, that's the ticket I was thinking of.

On Thu, Mar 2, 2017 at 11:19 AM, Grant Henke  wrote:

> I think the idea was that once clients have the ability to create topics,
> we would move "auto topic creation" client side and deprecate and
> eventually remove the support for server side auto create. This simplifies
> error handling, authorization, and puts the client in control of details
> like partition counts and replication factor.
>
> There is a jira (KAFKA-2410
> <https://issues.apache.org/jira/browse/KAFKA-2410>) tracking the move to
> client side auto topic creation and a discussion about some of the details
> here:
> http://search-hadoop.com/m/Kafka/uyzND1yAwWoCt1yc?subj=+
> DISCUSS+Client+Side+Auto+Topic+Creation
>
> Since a change in the default behavior of auto.create.topics.enable would
> be considered "breaking" I think it would be best to consider this change
> as a part of the deprecation and eventual removal of the configuration.
> Perhaps 0.11 would be a good timeframe to consider doing that, but it
> depends on if the supporting features are complete.
>
> Thanks,
> Grant
>
>
>
> On Thu, Mar 2, 2017 at 12:56 PM, Jeff Widman  wrote:
>
> > I thought I saw mention somewhere of changing the default of
> > auto.create.topics.enable to false.
> >
> > I searched, but couldn't find anything in JIRA... am I imagining things?
> >
> > Now that there's API support for creating topics, the version bump to
> > 0.11.0 seems like a good time to re-evaluate whether this default should
> be
> > flipped to false.
> >
> > I'm happy to create a KIP if needed, just didn't want to duplicate
> effort.
> >
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>


Is there already a KIP or JIRA to change auto.create.topics.enable to default to false?

2017-03-02 Thread Jeff Widman
I thought I saw mention somewhere of changing the default of
auto.create.topics.enable to false.

I searched, but couldn't find anything in JIRA... am I imagining things?

Now that there's API support for creating topics, the version bump to
0.11.0 seems like a good time to re-evaluate whether this default should be
flipped to false.

I'm happy to create a KIP if needed, just didn't want to duplicate effort.
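
For anyone who wants that behavior today, a sketch of the relevant knobs; the 
values and names below are illustrative, not recommendations:

  # server.properties: disable implicit topic creation
  auto.create.topics.enable=false

  # then create topics explicitly, e.g.
  bin/kafka-topics.sh --zookeeper zk1:2181 --create --topic my-topic \
    --partitions 6 --replication-factor 3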


Add me to jira/confluence?

2017-03-02 Thread Jeff Widman
Can someone add me to Jira/confluence for Kafka?

Don't use this work email, use j...@jeffwidman.com

Thanks!


[jira] [Commented] (KAFKA-4677) Avoid unnecessary task movement across threads during rebalance

2017-03-01 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890935#comment-15890935
 ] 

Jeff Widman commented on KAFKA-4677:


Is KIP-54 related since it also implements a sticky partition assigner, albeit 
within the Kafka consumer? 

I haven't worked directly with streams, but I thought they piggy-backed on top 
of the Kafka consumer. If so, wouldn't it be better to just inherit the KIP-54 
implementation?

Again, I don't really understand internals, so not sure.
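
For context, opting a plain consumer into the KIP-54 sticky assignor (once it 
ships) should look roughly like the following; the class name is assumed from 
the KIP:

  # consumer config
  partition.assignment.strategy=org.apache.kafka.clients.consumer.StickyAssignor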

> Avoid unnecessary task movement across threads during rebalance
> ---
>
> Key: KAFKA-4677
> URL: https://issues.apache.org/jira/browse/KAFKA-4677
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.3.0
>
>
> StreamPartitionAssigner tries to follow a sticky assignment policy to avoid 
> expensive task migration. Currently, it does this in a best-effort approach.
> We could observe a case, for which tasks did migrate for no good reason, thus 
> we assume that the current implementation could be improved to be more sticky.
> The concrete scenario is as follows:
> assume we have topology with 3 tasks, A, B, C
> assume we have 3 threads, each executing one task: 1-A, 2-B, 3-C
> for some reason, thread 1 goes down and a rebalance gets triggered
> thread 2 and 3 get their partitions revoked
> sometimes (not sure what the exact condition for this is), the new assignment 
> flips the assignment for task B and C (task A is newly assigned to either 
> thread 2 or 3)
> > possible new assignment 2(A,C) and 3-B
> There is no obvious reason (like load-balancing) why the task assignment for 
> B and C does change to the other thread resulting in unnecessary task 
> migration.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4095) When a topic is deleted and then created with the same name, 'committed' offsets are not reset

2017-02-28 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889093#comment-15889093
 ] 

Jeff Widman commented on KAFKA-4095:


Wasn't this solved by KAFKA-2000?

At least for the new consumers, which store their offsets in Kafka... if they 
store offsets in zookeeper, those aren't updated, but zookeeper offset storage has 
been deprecated since Kafka 0.8.2, so it's probably not worth fixing at this point.

> When a topic is deleted and then created with the same name, 'committed' 
> offsets are not reset
> --
>
> Key: KAFKA-4095
> URL: https://issues.apache.org/jira/browse/KAFKA-4095
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Alex Glikson
>Assignee: Vahid Hashemian
>
> I encountered a very strange behavior of Kafka, which seems to be a bug.
> After deleting a topic and re-creating it with the same name, I produced 
> certain amount of new messages, and then opened a consumer with the same ID 
> that I used before re-creating the topic (with auto.commit=false, 
> auto.offset.reset=earliest). While the latest offsets seemed up to date, the 
> *committed* offset (returned by committed() method) was an *old* offset, from 
> the time before the topic has been deleted and created.
> I would have assumed that when a topic is deleted, all the associated 
> topic-partitions and consumer groups are recycled too.
> I am using the Java client version 0.9, with Kafka server 0.10.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] 0.10.3.0/0.11.0.0 release planning

2017-02-27 Thread Jeff Widman
+1 for major version bump.

There's a good bit of deprecated code I would like to see removed, especially on
the old consumer side, plus a few settings defaults changed, such as the
mirrormaker options briefly discussed a few months back. It would be good to
keep making the new user experience a lot more streamlined so they're not
wondering about all these variations on consumers, CLI scripts, etc.

On Feb 27, 2017 8:14 PM, "Becket Qin"  wrote:

> Hi Ismael,
>
> Thanks for volunteering on the new release.
>
> I think 0.11.0.0 makes a lot of sense given the new big features we are
> intended to include.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Feb 27, 2017 at 7:47 PM, Ismael Juma  wrote:
>
> > Hi all,
> >
> > With 0.10.2.0 out of the way, I would like to volunteer to be the release
> > manager for our next time-based release. See https://cwiki.apache.org/c
> > onfluence/display/KAFKA/Time+Based+Release+Plan if you missed previous
> > communication on time-based releases or need a reminder.
> >
> > I put together a draft release plan with June 2017 as the release month
> (as
> > previously agreed) and a list of KIPs that have already been voted:
> >
> > *https://cwiki.apache.org/confluence/pages/viewpage.
> action?pageId=68716876
> >  action?pageId=68716876
> > >*
> >
> > I haven't set exact dates for the various stages (feature freeze, code
> > freeze, etc.) for now as Ewen is going to send out an email with some
> > suggested tweaks based on his experience as release manager for 0.10.2.0.
> > We can set the exact dates after that discussion.
> >
> > As we are starting the process early this time, we should expect the
> number
> > of KIPs in the plan to grow (so don't worry if your KIP is not there
> yet),
> > but it's good to see that we already have 10 (including 2 merged and 2
> with
> > PR reviews in progress).
> >
> > Out of the KIPs listed, KIP-98 (Exactly-once and Transactions) and
> KIP-101
> > (Leader Generation in Replication) require message format changes, which
> > typically imply a major version bump (i.e. 0.11.0.0). If we do that, then
> > it makes sense to also include KIP-106 (Unclean leader election should be
> > false by default) and KIP-118 (Drop support for Java 7). We would also
> take
> > the chance to remove deprecated code, in that case.
> >
> > Given the above, how do people feel about 0.11.0.0 as the next Kafka
> > version? Please share your thoughts.
> >
> > Thanks,
> > Ismael
> >
>


Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

2017-02-06 Thread Jeff Widman
For individual consumer groups, it would be nice if the admin client made
it possible to fetch consumer offsets for the entire consumer group. Then
we don't have to manually assemble this outside of the admin client
interface.
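
In the meantime, the closest thing is the consumer-groups CLI, which dumps
per-partition offsets and lag for a whole group. A sketch with placeholder host
and group names; older versions may also need the --new-consumer flag:

  bin/kafka-consumer-groups.sh --bootstrap-server broker1:9092 \
    --describe --group my-consumer-group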

On Feb 6, 2017 11:41 AM, "Colin McCabe"  wrote:

> On Fri, Feb 3, 2017, at 16:57, Dong Lin wrote:
> > Thanks for the reply, Colin. I have some comments inline.
>
> Hi Dong L.,
>
> >
> > In addition, I also have some comments regarding the Future() in response
> > to your latest email. As Ismael mentioned, we have added
> > purgeDataBefore()
> > API in AdminClient. This API returns Future() that allows user to purge
> > data in either sync or async manner. And we have presented use-case for
> > both
> > sync and async usage of this API in the discussion thread of KIP-107. I
> > think we should at least return a Future() object in this case, right?
>
> Hmm.  I didn't see any discussion in the KIP-107 [DISCUSS] thread of why
> a Future<> based API was proposed.  It seemed like the discussion
> focused on other issues (or maybe I missed an email thread?)
>
> >
> > As you mentioned, we can transform a blocking API into a Futures-based
> > API
> > by using a thread pool. But thread pool seems inconvenient as compared to
> > using future().get() which transform a Future-based API into a blocking
> > API. Would it be a good reason to return Future() for all those API where
> > we need both sync and async modes?
>
> That's a good point.  It's easier to build a sync API on top of Futures
> than the reverse.
>
> >
> >
> > On Fri, Feb 3, 2017 at 10:20 AM, Colin McCabe 
> wrote:
> >
> > > On Thu, Feb 2, 2017, at 17:54, Dong Lin wrote:
> > > > Hey Colin,
> > > >
> > > > Thanks for the KIP. I have a few comments below:
> > > >
> > > > - I share similar view with Ismael that a Future-based API is better.
> > > > PurgeDataFrom() is an example API that user may want to do it
> > > > asynchronously even though there is only one request in flight at a
> time.
> > > > In the future we may also have some admin operation that allows user
> to
> > > > move replica from one broker to another, which also needs to work in
> both
> > > > sync and async style. It seems more generic to return Future for
> any
> > > > API
> > > > that requires both mode.
> > > >
> > > > - I am not sure if it is the cleanest way to have enum classes
> > > > CreateTopicsFlags and DeleteTopicsFlags. Are we going to create such
> > > > class
> > > > for every future API that requires configuration? It may be more
> generic
> > > > to
> > > > provide Map to any admin API that operates on multiple
> > > > entries.
> > > > For example, deleteTopic(Map). And it can be
> Map > > > Properties> for those API that requires multiple configs per entry.
> And
> > > > we
> > > > can provide default value, doc, config name for those API as we do
> > > > with AbstractConfig.
> > >
> > > Thanks for the comments, Dong L.
> > >
> > > EnumSet<CreateTopicsFlags>, EnumSet<DeleteTopicsFlags>, and so forth
> are
> > > type-safe ways for the user to pass a set of boolean flags to the
> > > function.  It is basically a replacement for having an api like
> > > createTopics(..., boolean nonblocking, boolean validateOnly).  It is
> > > preferable to having the boolean flags because it's immediately clear
> > > when reading the code what createTopics(...,
> > > CreateTopicsFlags.VALIDATE_ONLY) means, whereas it is not immediately
> > > clear what createTopics(..., false, true) means.  It also prevents
> > > having lots of function overloads over time, which becomes confusing
> for
> > > users.  The EnumSet is not intended as a replacement for all possible
> > > future arguments we might add, but just an easy way to add more boolean
> > > arguments later without adding messy function overloads or type
> > > signatures with many booleans.
> > >
> > > Map is not type-safe, and I don't think we should use it in
> > > our APIs.  Instead, we should just add function overloads if necessary.
> > >
> >
> > I agree that using EnumSet is type safe.
> >
> >
> > >
> > > >
> > > > - I not sure if "Try" is very intuitive to Java developer. Is there
> any
> > > > counterpart of scala's "Try" in java
> > >
> > > Unfortunately, there is no equivalent to Try in the standard Java
> > > library.  That is the first place I checked, and I spent a while
> > > searching.  The closest is probably Java 8's Optional.  However,
> > > Optional just allows us to express Some(thing) or None, so callers
> would
> > > not be able to determine what the error was.
> >
> >
> > > > We actually have quite a few
> > > > existing
> > > > classes in Kafka that address the same problem, such as
> > > > ProduceRequestResult, LogAppendResult etc. Maybe we can follow the
> same
> > > > conversion and use *Result as this class name.
> > >
> > > Hmm.  ProduceRequestResult and LogAppendResult just store an exception
> > > alongside the data, and make the caller responsible for checking
> whether
> > > the exception is null (or None) before look

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression

2017-02-03 Thread Jeff Widman
*My concern is that this minor improvement (based on the benchmark) over
LZ4 does not warrant the work of adding support for a new compression codec
to the broker, official clients and horde of 3rd party clients, including
upgrade paths, transformations, tests, additional dependencies, etc.*


As someone who primarily works with non-Java clients, I couldn't +1 this
enough.

If there's truly a benefit (and probably there is...the Facebook post
announcing ZStandard had some fairly compelling benchmarks) then yes, let's
do this. But otherwise, please no.

On Tue, Jan 31, 2017 at 1:47 AM, Magnus Edenhill  wrote:

> Hi Dongjin and good work on the KIP,
>
> I understand that ZStandard is generally considered an improvement over
> LZ4, but the
> benchmark you provided on the KIP-110 wiki doesn't really reflect that, and
> even
> makes a note that they are comparable:
> *> As you can see above, ZStandard shows outstanding performance in both of
> compression rate and speed, especially working with the speed-first setting
> (level 1). To the extent that only LZ4 can be compared to ZStandard.*
>
> My concern is that this minor improvement (based on the benchmark) over LZ4
> does not warrant the work
> of adding support for a new compression codec to the broker, official
> clients and horde of 3rd party clients, including
> upgrade paths, transformations, tests, additional dependencies, etc.
>
> Is it possible to produce more convincing comparisons?
>
> Thanks,
> Magnus
>
>
>
>
>
> 2017-01-31 10:28 GMT+01:00 Dongjin Lee :
>
> > Ismael & All,
> >
> > After Inspecting the related code & commits, I concluded following:
> >
> > 1. We have to update the masking value which is used to retrieve the used
> > codec id from the messages, to enable the retrieval of the 3rd bit of
> > compression type field of the message.
> > 2. The above task is already done; so, we need nothing.
> >
> > Here is why.
> >
> > Let's start with the first one, with the scenario Juma proposed. In the
> > case of receiving the message of unsupported compression type, the
> receiver
> > (= broker or consumer) raises IllegalArgumentException[^1][^2]. The key
> > element in this operation is Record#COMPRESSION_CODEC_MASK, which is used
> > to extract the codec id. We have to update this value from 2-bits
> extractor
> > (=0x03) to 3-bits extractor (=0x07).
> >
> > But in fact, this task is already done, so its current value is 0x07. We
> > don't have to update it.
> >
> > The reason why this task is already done has some story; From the first
> > time Record.java file was added to the project[^3], the
> > COMPRESSION_CODEC_MASK was already 2-bits extractor, that is, 0x03. At
> that
> > time, Kafka supported only two compression types - GZipCompression and
> > SnappyCompression.[^4] After that, KAFKA-1456 introduced two additional
> > codecs of LZ4 and LZ4C[^5]. This update modified COMPRESSION_CODEC_MASK
> > into 3 bits extractor, 0x07, in the aim of supporting four compression
> > codecs.
> >
> > Although its following work, KAFKA-1493, removed the support of LZ4C
> > codec[^6], this mask was not reverted into 2-bits extractor - by this
> > reason, we don't need to care about the message format.
> >
> > Attached screenshot is my test on Juma's scenario. I created topic & sent
> > some messages using the snapshot version with ZStandard compression and
> > received the message with the latest version. As you can see, it works
> > perfectly as expected.
> >
> > If you have more opinion to this issue, don't hesitate to send me as a
> > message.
> >
> > Best,
> > Dongjin
> >
> > [^1]: see: Record#compressionType.
> > [^2]: There is similar routine in Message.scala. But after KAFKA-4390,
> > that routine is not being used anymore - more precisely, Message class is
> > now used in ConsoleConsumer only. I think this class should be replaced
> but
> > since it is a separated topic, I will send another message for this
> issue.
> > [^3]: commit 642da2f (2011.8.2).
> > [^4]: commit c51b940.
> > [^5]: commit 547cced.
> > [^6]: commit 37356bf.
> >
> > On Thu, Jan 26, 2017 at 12:35 AM, Ismael Juma  wrote:
> >
> >> So far the discussion was around the performance characteristics of the
> >> new
> >> compression algorithm. Another area that is important and is not covered
> >> in
> >> the KIP is the compatibility implications. For example, what happens if
> a
> >> consumer that doesn't support zstd tries to consume a topic compressed
> >> with
> >> it? Or if a broker that doesn't support receives data compressed with
> it?
> >> If we go through that exercise, then more changes may be required (like
> >> bumping the version of produce/fetch protocols).
> >>
> >> Ismael
> >>
> >> On Wed, Jan 25, 2017 at 3:22 PM, Ben Stopford  wrote:
> >>
> >> > Is there more discussion to be had on this KIP, or should it be taken
> >> to a
> >> > vote?
> >> >
> >> > On Mon, Jan 16, 2017 at 6:37 AM Dongjin Lee 
> wrote:
> >> >
> >> > > I updated KIP-110 with JMH-measured benchmark resul

KIP-54 voting status?

2017-01-30 Thread Jeff Widman
I joined the dev list after KIP-54 voting started, so unfortunately don't
have the old thread to bump.

But wanted to check if there was any news on this?

From KAFKA-2273, it sounds like there are no outstanding objections to the
design, but there also aren't yet enough +1's, so is this just languishing
in purgatory?


[jira] [Commented] (KAFKA-3806) Adjust default values of log.retention.hours and offsets.retention.minutes

2017-01-25 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838918#comment-15838918
 ] 

Jeff Widman commented on KAFKA-3806:


Too often people not super familiar with Kafka don't realize the default is so 
low... 

So I would rather see this default to 4 days. If all the processes in a consumer 
group blow out late Friday afternoon and it's a low-priority consumer group, the 
ops/dev team may decide to wait until Monday to fix the problem, thinking that the 
consumer will just catch back up once it's fixed, and then they'll be surprised to 
learn their offsets were hosed. 

For those companies big enough to have a performance hit from maintaining 
offsets that long, they will generally have the in-house resources to realize 
they should reduce this value.
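
In the meantime, the safe workaround is to set both retentions explicitly and keep 
the offsets retention comfortably longer than the log retention; a sketch with 
illustrative values, not recommendations:

  # server.properties
  log.retention.hours=168          # 7 days (the current default)
  offsets.retention.minutes=20160  # 14 days, longer than the log retention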

> Adjust default values of log.retention.hours and offsets.retention.minutes
> --
>
> Key: KAFKA-3806
> URL: https://issues.apache.org/jira/browse/KAFKA-3806
> Project: Kafka
>  Issue Type: Improvement
>  Components: config
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Michal Turek
>Priority: Minor
>
> Combination of default values of log.retention.hours (168 hours = 7 days) and 
> offsets.retention.minutes (1440 minutes = 1 day) may be dangerous in special 
> cases. Offset retention should be always greater than log retention.
> We have observed the following scenario and issue:
> - Producing of data to a topic was disabled two days ago by producer update, 
> topic wasn't deleted.
> - Consumer consumed all data and properly committed offsets to Kafka.
> - Consumer made no more offset commits for that topic because there was no 
> more incoming data and there was nothing to confirm. (We have auto-commit 
> disabled, I'm not sure how behaves enabled auto-commit.)
> - After one day: Kafka cleared too old offsets according to 
> offsets.retention.minutes.
> - After two days: Long-term running consumer was restarted after update, it 
> didn't find any committed offsets for that topic since they were deleted by 
> offsets.retention.minutes so it started consuming from the beginning.
> - The messages were still in Kafka due to larger log.retention.hours, about 5 
> days of messages were read again.
> Known workaround to solve this issue:
> - Explicitly configure log.retention.hours and offsets.retention.minutes, 
> don't use defaults.
> Proposals:
> - Prolong default value of offsets.retention.minutes to be at least twice 
> larger than log.retention.hours.
> - Check these values during Kafka startup and log a warning if 
> offsets.retention.minutes is smaller than log.retention.hours.
> - Add a note to migration guide about differences between storing of offsets 
> in ZooKeeper and Kafka (http://kafka.apache.org/documentation.html#upgrade).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-115: Enforce offsets.topic.replication.factor

2017-01-25 Thread Jeff Widman
As a user, I'd personally rather see this land sooner rather than waiting
for a major release...

As long as it fails noisily if I don't have enough brokers when it tries to
create the topic, then it's easy enough to solve...
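
For reference, the setting being enforced; a sketch, and the value only makes sense
if the cluster actually has at least that many brokers:

  # server.properties
  offsets.topic.replication.factor=3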

On Wed, Jan 25, 2017 at 4:22 PM, Ismael Juma  wrote:

> An important question is if this needs to wait for a major release or not.
>
> Ismael
>
> On Thu, Jan 26, 2017 at 12:19 AM, Ismael Juma  wrote:
>
> > +1 from me too.
> >
> > Ismael
> >
> > On Thu, Jan 26, 2017 at 12:07 AM, Ewen Cheslack-Postava <
> e...@confluent.io
> > > wrote:
> >
> >> +1
> >>
> >> Since this is an unusual one, I think it's worth pointing out that the
> KIP
> >> notes it is really a bug fix, but since it has compatibility
> implications
> >> the KIP was worth it. It was a sort of intentional bug, but confusing
> and
> >> dangerous.
> >>
> >> Seems important to fix this ASAP since people are hitting this in
> practice
> >> and would have to go out of their way to set up monitoring to catch the
> >> issue.
> >>
> >> -Ewen
> >>
> >> On Wed, Jan 25, 2017 at 4:02 PM, Jason Gustafson 
> >> wrote:
> >>
> >> > +1 from me. The current behavior seems both surprising and dangerous.
> >> >
> >> > -Jason
> >> >
> >> > On Wed, Jan 25, 2017 at 3:58 PM, Onur Karaman <
> >> > onurkaraman.apa...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hey everyone.
> >> > >
> >> > > I made a bug-fix KIP-115 to enforce offsets.topic.replication.
> factor:
> >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> > > 115%3A+Enforce+offsets.topic.replication.factor
> >> > >
> >> > > Comments are welcome.
> >> > >
> >> > > - Onur
> >> > >
> >> >
> >>
> >
> >
>


[jira] [Commented] (KAFKA-4517) Remove kafka-consumer-offset-checker.sh script since already deprecated in Kafka 9

2017-01-24 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836818#comment-15836818
 ] 

Jeff Widman commented on KAFKA-4517:


So what are the next actions here since I've got an outstanding PR for this and 
it's the only PR against any of these issues...?

Should I retitle the PR / commit to the proper ticket? 

And also remove the tool itself in the same commit or a second commit?

> Remove kafka-consumer-offset-checker.sh script since already deprecated in 
> Kafka 9
> --
>
> Key: KAFKA-4517
> URL: https://issues.apache.org/jira/browse/KAFKA-4517
> Project: Kafka
>  Issue Type: Task
>Affects Versions: 0.10.1.0, 0.10.0.0, 0.10.0.1
>Reporter: Jeff Widman
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
>
> Kafka 9 deprecated kafka-consumer-offset-checker.sh 
> (kafka.tools.ConsumerOffsetChecker) in favor of kafka-consumer-groups.sh 
> (kafka.admin.ConsumerGroupCommand). 
> Since this was deprecated in 9, and the full functionality of the old script 
> appears to be available in the new script, can we remove the old shell script 
> in 10? 
> From an Ops perspective, it's confusing when I'm trying to check consumer 
> offsets that I open the bin directory, and see a script that seems to do 
> exactly what I want, only to later discover that I'm not supposed to use it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-4682) Committed offsets should not be deleted if a consumer is still active

2017-01-20 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832697#comment-15832697
 ] 

Jeff Widman edited comment on KAFKA-4682 at 1/21/17 1:08 AM:
-

Now that consumers have background heartbeat thread, it should be much easier 
to identify when consumer dies vs alive. So this makes sense to me. However, 
this would make KAFKA-2000 more important because you can't count on offsets 
expiring.

We also had a production problem where a couple of topics log files were 
totally cleared, but the offsets weren't cleared, so we had negative lag where 
consumer offset was higher than broker highwater. This was with zookeeper 
offset storage, but regardless I could envision something getting screwed up or 
someone resetting a cluster w/o understanding what they're doing and making 
offsets screwed up. If this was implemented those old offsets would never go 
away unless manually cleared up also. So I'd want to make sure that's protected 
against somehow... like if a broker ever encounters consumer offset that's 
higher than highwater mark, either an exception is thrown or those consumer 
offsets get reset to the broker highwater mark. Probably safest to just throw 
an exception in case something else funky is going on.


was (Author: jeffwidman):
Now that consumers have background heartbeat thread, it should be much easier 
to identify when consumer dies vs alive. So this makes sense to me. However, 
this would make KAFKA-2000 more important because you can't count on offsets 
expiring.

We also had a production problem where a couple of topics log files were 
totally cleared, but the offsets weren't cleared, so we had negative lag where 
consumer offset was higher than broker highwater. This was with zookeeper 
offset storage, but regardless I could envision something getting screwed up or 
someone resetting a cluster w/o understanding what they're doing and making 
offsets screwed up. If this was implemented those old offsets would never go 
away unless manually cleared up also. So I'd want to make sure that's protected 
against somehow... like if a broker ever encounters consumer offset that's 
higher than highwater mark, that gets removed from the topic.

> Committed offsets should not be deleted if a consumer is still active
> -
>
> Key: KAFKA-4682
> URL: https://issues.apache.org/jira/browse/KAFKA-4682
> Project: Kafka
>  Issue Type: Bug
>Reporter: James Cheng
>
> Kafka will delete committed offsets that are older than 
> offsets.retention.minutes
> If there is an active consumer on a low traffic partition, it is possible 
> that Kafka will delete the committed offset for that consumer. Once the 
> offset is deleted, a restart or a rebalance of that consumer will cause the 
> consumer to not find any committed offset and start consuming from 
> earliest/latest (depending on auto.offset.reset). I'm not sure, but a broker 
> failover might also cause you to start reading from auto.offset.reset (due to 
> broker restart, or coordinator failover).
> I think that Kafka should only delete offsets for inactive consumers. The 
> timer should only start after a consumer group goes inactive. For example, if 
> a consumer group goes inactive, then after 1 week, delete the offsets for 
> that consumer group. This is a solution that [~junrao] mentioned in 
> https://issues.apache.org/jira/browse/KAFKA-3806?focusedCommentId=15323521&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15323521
> The current workarounds are to:
> # Commit an offset on every partition you own on a regular basis, making sure 
> that it is more frequent than offsets.retention.minutes (a broker-side 
> setting that a consumer might not be aware of)
> or
> # Turn the value of offsets.retention.minutes up really really high. You have 
> to make sure it is higher than any valid low-traffic rate that you want to 
> support. For example, if you want to support a topic where someone produces 
> once a month, you would have to set offsetes.retention.mintues to 1 month. 
> or
> # Turn on enable.auto.commit (this is essentially #1, but easier to 
> implement).
> None of these are ideal. 
> #1 can be spammy. It requires your consumers know something about how the 
> brokers are configured. Sometimes it is out of your control. Mirrormaker, for 
> example, only commits offsets on partitions where it receives data. And it is 
> duplication that you need to put into all of your consumers.
> #2 has disk-space impact on the broker (in __consumer_offsets) as well as 

[jira] [Commented] (KAFKA-4682) Committed offsets should not be deleted if a consumer is still active

2017-01-20 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832697#comment-15832697
 ] 

Jeff Widman commented on KAFKA-4682:


Now that consumers have background heartbeat thread, it should be much easier 
to identify when consumer dies vs alive. So this makes sense to me. However, 
this would make KAFKA-2000 more important because you can't count on offsets 
expiring.

We also had a production problem where a couple of topics log files were 
totally cleared, but the offsets weren't cleared, so we had negative lag where 
consumer offset was higher than broker highwater. This was with zookeeper 
offset storage, but regardless I could envision something getting screwed up or 
someone resetting a cluster w/o understanding what they're doing and making 
offsets screwed up. If this was implemented those old offsets would never go 
away unless manually cleared up also. So I'd want to make sure that's protected 
against somehow... like if a broker ever encounters consumer offset that's 
higher than highwater mark, that gets removed from the topic.

> Committed offsets should not be deleted if a consumer is still active
> -
>
> Key: KAFKA-4682
> URL: https://issues.apache.org/jira/browse/KAFKA-4682
> Project: Kafka
>  Issue Type: Bug
>Reporter: James Cheng
>
> Kafka will delete committed offsets that are older than 
> offsets.retention.minutes
> If there is an active consumer on a low traffic partition, it is possible 
> that Kafka will delete the committed offset for that consumer. Once the 
> offset is deleted, a restart or a rebalance of that consumer will cause the 
> consumer to not find any committed offset and start consuming from 
> earliest/latest (depending on auto.offset.reset). I'm not sure, but a broker 
> failover might also cause you to start reading from auto.offset.reset (due to 
> broker restart, or coordinator failover).
> I think that Kafka should only delete offsets for inactive consumers. The 
> timer should only start after a consumer group goes inactive. For example, if 
> a consumer group goes inactive, then after 1 week, delete the offsets for 
> that consumer group. This is a solution that [~junrao] mentioned in 
> https://issues.apache.org/jira/browse/KAFKA-3806?focusedCommentId=15323521&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15323521
> The current workarounds are to:
> # Commit an offset on every partition you own on a regular basis, making sure 
> that it is more frequent than offsets.retention.minutes (a broker-side 
> setting that a consumer might not be aware of)
> or
> # Turn the value of offsets.retention.minutes up really really high. You have 
> to make sure it is higher than any valid low-traffic rate that you want to 
> support. For example, if you want to support a topic where someone produces 
> once a month, you would have to set offsets.retention.minutes to 1 month. 
> or
> # Turn on enable.auto.commit (this is essentially #1, but easier to 
> implement).
> None of these are ideal. 
> #1 can be spammy. It requires your consumers know something about how the 
> brokers are configured. Sometimes it is out of your control. Mirrormaker, for 
> example, only commits offsets on partitions where it receives data. And it is 
> duplication that you need to put into all of your consumers.
> #2 has disk-space impact on the broker (in __consumer_offsets) as well as 
> memory-size on the broker (to answer OffsetFetch).
> #3 I think has the potential for message loss (the consumer might commit on 
> messages that are not yet fully processed)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2000) Delete consumer offsets from kafka once the topic is deleted

2017-01-20 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832378#comment-15832378
 ] 

Jeff Widman commented on KAFKA-2000:


If neither of them is interested, I'm happy to clean up the existing patch to 
get it merged into 0.10.2. The test suite at my work would benefit from this.

> Delete consumer offsets from kafka once the topic is deleted
> 
>
> Key: KAFKA-2000
> URL: https://issues.apache.org/jira/browse/KAFKA-2000
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sriharsha Chintalapani
>Assignee: Manikumar Reddy
>  Labels: newbie++
> Fix For: 0.10.2.0
>
> Attachments: KAFKA-2000_2015-05-03_10:39:11.patch, KAFKA-2000.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-2000) Delete consumer offsets from kafka once the topic is deleted

2017-01-20 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832378#comment-15832378
 ] 

Jeff Widman edited comment on KAFKA-2000 at 1/20/17 8:45 PM:
-

If neither of them is interested, I'm happy to cleanup the existing patch to 
get it merged into 0.10.2. The test suite at my work would benefit from this. 
Just let me know.


was (Author: jeffwidman):
If neither of them is interested, I'm happy to cleanup the existing patch to 
get it merged into 0.10.2. The test suite at my work would benefit from this.

> Delete consumer offsets from kafka once the topic is deleted
> 
>
> Key: KAFKA-2000
> URL: https://issues.apache.org/jira/browse/KAFKA-2000
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sriharsha Chintalapani
>Assignee: Manikumar Reddy
>  Labels: newbie++
> Fix For: 0.10.2.0
>
> Attachments: KAFKA-2000_2015-05-03_10:39:11.patch, KAFKA-2000.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3853) Report offsets for empty groups in ConsumerGroupCommand

2017-01-18 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828620#comment-15828620
 ] 

Jeff Widman commented on KAFKA-3853:


Thanks again for this! Really looking forward to it.

> Report offsets for empty groups in ConsumerGroupCommand
> ---
>
> Key: KAFKA-3853
> URL: https://issues.apache.org/jira/browse/KAFKA-3853
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, tools
>Reporter: Jason Gustafson
>Assignee: Vahid Hashemian
>  Labels: kip
> Fix For: 0.10.2.0
>
>
> We ought to be able to display offsets for groups which either have no active 
> members or which are not using group management. The owner column can be left 
> empty or set to "N/A". If a group is active, I'm not sure it would make sense 
> to report all offsets, in particular when partitions are unassigned, but if 
> it seems problematic to do so, we could enable the behavior with a flag (e.g. 
> --include-unassigned).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4668) Mirrormaker should default to auto.offset.reset=earliest

2017-01-18 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827769#comment-15827769
 ] 

Jeff Widman commented on KAFKA-4668:


Also KAFKA-3848

> Mirrormaker should default to auto.offset.reset=earliest
> 
>
> Key: KAFKA-4668
> URL: https://issues.apache.org/jira/browse/KAFKA-4668
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Jeff Widman
> Fix For: 0.10.3.0
>
>
> Mirrormaker currently inherits the default value for auto.offset.reset, which 
> is latest (new consumer) / largest (old consumer). 
> While for most consumers this is a sensible default, mirrormakers are 
> specifically designed for replication, so they should default to replicating 
> topics from the beginning.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Fwd: [jira] [Commented] (KAFKA-1207) Launch Kafka from within Apache Mesos

2017-01-17 Thread Jeff Widman
I don't understand why the whole dev list needs to be notified of failed
email deliveries.

Is there a way to shut these off?




-- Forwarded message --
From: postmas...@inn.ru (JIRA) 
Date: Tue, Jan 17, 2017 at 9:50 PM
Subject: [jira] [Commented] (KAFKA-1207) Launch Kafka from within Apache
Mesos
To: dev@kafka.apache.org



[ https://issues.apache.org/jira/browse/KAFKA-1207?page=
com.atlassian.jira.plugin.system.issuetabpanels:comment-
tabpanel&focusedCommentId=15827464#comment-15827464 ]

postmas...@inn.ru commented on KAFKA-1207:
--

Delivery is delayed to these recipients or groups:

e...@inn.ru

Subject: [jira] [Commented] (KAFKA-1207) Launch Kafka from within Apache
Mesos

This message hasn't been delivered yet. Delivery will continue to be
attempted.

The server will keep trying to deliver this message for the next 1 days, 19
hours and 51 minutes. You'll be notified if the message can't be delivered
by that time.







Diagnostic information for administrators:

Generating server: lc-exch-02.inn.local
Receiving server: inn.ru (109.105.153.25)

e...@inn.ru
Server at inn.ru (109.105.153.25) returned '400 4.4.7 Message delayed'
1/18/2017 5:38:42 AM - Server at inn.ru (109.105.153.25) returned '441
4.4.1 Error communicating with target host: "Failed to connect. Winsock
error code: 10060, Win32 error code: 10060." Last endpoint attempted was
109.105.153.25:25'




> Launch Kafka from within Apache Mesos
> -
>
> Key: KAFKA-1207
> URL: https://issues.apache.org/jira/browse/KAFKA-1207
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joe Stein
>  Labels: mesos
> Attachments: KAFKA-1207_2014-01-19_00:04:58.patch,
KAFKA-1

Re: Is this a bug or just unintuitive behavior?

2017-01-17 Thread Jeff Widman
Agreed on documenting that auto.offset.reset should generally be switched to
"none" once the mirrormaker is up and running.

There is an edge case where you want to keep auto.offset.reset at earliest:
when you've got a mirrormaker consumer subscribed to a regex pattern and have
auto topic creation enabled on your cluster.

If you start producing to a non-existent topic that matches the regex, then
there will be a period of time where the producer is producing before the
new topic's partitions have been picked up by the mirrormaker. Those
messages will never be consumed by the mirrormaker because it will start
from latest, ignoring those just-produced messages.

Also agreed on increasing the default offsets.retention.minutes. I actually
didn't realize the default was so low.
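
To make that edge case concrete, here's a minimal sketch of a regex-subscribed
consumer with the earliest reset (the pattern, cluster address, and group id
are made up; the real mirrormaker would set these through its consumer config
file rather than in code):

import java.util.Collection;
import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class RegexMirrorSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "source-cluster:9092");  // made-up source cluster
        props.put("group.id", "mirror-group");                   // made-up group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        // With "latest", anything produced to a freshly auto-created topic before the
        // next rebalance picks that topic up is silently skipped; "earliest" replays it.
        props.put("auto.offset.reset", "earliest");

        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Pattern.compile("mirror-.*"), new ConsumerRebalanceListener() {
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) { }
        });

        while (true) {
            // New topics matching the pattern only show up after a metadata refresh
            // triggers a rebalance, which is exactly the window discussed above.
            consumer.poll(1000);
        }
    }
}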

On Tue, Jan 17, 2017 at 8:16 PM, Grant Henke  wrote:

> I agree that setting the default auto.offset.reset to earliest makes sense
> (This was actually a default choice Flume made for its Kafka channel to
> avoid missing the first messages). However I think, at a minimum, we should
> also document a recommendation to consider changing the value to none after
> mirror maker has run to commit its initial offsets.
>
> Setting the value to none ensures you don't replicate the entire topic
> from scratch in case offsets are lost or purged due to prolonged
> downtime or other unforeseen circumstances. Having an auto.offset.reset of
> none also allows you to ensure you don't miss data. Missing data can occur
> when auto.offset.reset is set to latest and the offset state was lost
> before mirrormaker was caught up or data was produced while it was down.
>
> I would also suggest considering increasing the default
> offsets.retention.minutes
> from 1 day (1440) to 7 days (10080)...or something similar. I have seen a
> handful of scenarios where an outage lasts longer than a day, the offsets
> get purged causing the auto.offset.reset to kick in and in the case of
> earliest, re-replicating billions of messages.
>
> On Tue, Jan 17, 2017 at 8:09 PM, Jeff Widman  wrote:
>
> > [Moving discussion from users list to dev list]
> >
> > I agree with Ewen that it's more sensible for mirrormaker to default to
> > replicating topics from the earliest offset available, rather than just
> > replicating from the current offset onward.
> >
> > I filed a JIRA ticket https://issues.apache.org/jira/browse/KAFKA-4668
> > As well as a PR: https://github.com/apache/kafka/pull/2394
> >
> > Does this need a KIP?
> >
> > The main side effect of this change is if you start mirroring a new topic
> > you can hammer your network until it catches up or until you realize
> what's
> > happening and throttle the mirrormaker client.
> >
> > Cheers,
> > Jeff
> >
> >
> > On Thu, Jan 5, 2017 at 7:55 PM, Ewen Cheslack-Postava  >
> > wrote:
> >
> > > The basic issue here is just that the auto.offset.reset defaults to
> > latest,
> > > right? That's not a very good setting for a mirroring tool and this
> seems
> > > like something we might just want to change the default for. It's
> > debatable
> > > whether it would even need a KIP.
> > >
> > > We have other settings in MM where we override them if they aren't set
> > > explicitly but we don't want the normal defaults. Most are producer
> > > properties to avoid duplicates (the acks, retries, max.block.ms, and
> > > max.in.flight.requests.per.connection settings), but there are a
> couple
> > of
> > > consumer ones too (auto.commit.enable and consumer.timeout.ms).
> > >
> > > This is probably something like a 1-line MM patch if someone wants to
> > > tackle it -- the question of whether it needs a KIP or not is,
> > > unfortunately, the more complicated question :(
> > >
> > > -Ewen
> > >
> > > On Thu, Jan 5, 2017 at 1:10 PM, James Cheng 
> > wrote:
> > >
> > > >
> > > > > On Jan 5, 2017, at 12:57 PM, Jeff Widman 
> wrote:
> > > > >
> > > > > Thanks James and Hans.
> > > > >
> > > > > Will this also happen when we expand the number of partitions in a
> > > topic?
> > > > >
> > > > > That also will trigger a rebalance, the consumer won't subscribe to
> > the
> > > > > partition until the rebalance finishes, etc.
> > > > >
> > > > > So it'd seem that any messages published to the new partition in
> > > between
> > > > > the partition creation and the rebalance finish

Re: Is this a bug or just unintuitive behavior?

2017-01-17 Thread Jeff Widman
[Moving discussion from users list to dev list]

I agree with Ewen that it's more sensible for mirrormaker to default to
replicating topics from the earliest offset available, rather than just
replicating from the current offset onward.

I filed a JIRA ticket https://issues.apache.org/jira/browse/KAFKA-4668
As well as a PR: https://github.com/apache/kafka/pull/2394

Does this need a KIP?

The main side effect of this change is that if you start mirroring a new
topic, you can hammer your network until the mirror catches up, or until you
realize what's happening and throttle the mirrormaker client.
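
(For what it's worth, the override itself is the easy part; it amounts to
something like the sketch below, applied wherever MirrorMaker builds its
consumer config. This is only an illustration, not the actual patch, and the
helper name is made up.)

import java.util.Properties;

// Sketch only: MirrorMaker's real consumer setup lives in Scala and the exact
// plumbing differs. The idea is just "use earliest unless the user set it explicitly".
final class MirrorMakerConsumerDefaults {
    static Properties withMirrorDefaults(Properties userConsumerProps) {
        Properties props = new Properties();
        props.putAll(userConsumerProps);
        if (!props.containsKey("auto.offset.reset")) {
            props.put("auto.offset.reset", "earliest");
        }
        return props;
    }
}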

Cheers,
Jeff


On Thu, Jan 5, 2017 at 7:55 PM, Ewen Cheslack-Postava 
wrote:

> The basic issue here is just that the auto.offset.reset defaults to latest,
> right? That's not a very good setting for a mirroring tool and this seems
> like something we might just want to change the default for. It's debatable
> whether it would even need a KIP.
>
> We have other settings in MM where we override them if they aren't set
> explicitly but we don't want the normal defaults. Most are producer
> properties to avoid duplicates (the acks, retries, max.block.ms, and
> max.in.flight.requests.per.connection settings), but there are a couple of
> consumer ones too (auto.commit.enable and consumer.timeout.ms).
>
> This is probably something like a 1-line MM patch if someone wants to
> tackle it -- the question of whether it needs a KIP or not is,
> unfortunately, the more complicated question :(
>
> -Ewen
>
> On Thu, Jan 5, 2017 at 1:10 PM, James Cheng  wrote:
>
> >
> > > On Jan 5, 2017, at 12:57 PM, Jeff Widman  wrote:
> > >
> > > Thanks James and Hans.
> > >
> > > Will this also happen when we expand the number of partitions in a
> topic?
> > >
> > > That also will trigger a rebalance, the consumer won't subscribe to the
> > > partition until the rebalance finishes, etc.
> > >
> > > So it'd seem that any messages published to the new partition in
> between
> > > the partition creation and the rebalance finishing won't be consumed by
> > any
> > > consumers that have offset=latest
> > >
> >
> > It hadn't occured to me until you mentioned it, but yes, I think it'd
> also
> > happen in those cases.
> >
> > In the kafka consumer javadocs, they provide a list of things that would
> > cause a rebalance:
> > http://kafka.apache.org/0101/javadoc/org/apache/kafka/clients/consumer/
> > KafkaConsumer.html#subscribe(java.util.Collection,%20org.
> > apache.kafka.clients.consumer.ConsumerRebalanceListener) <
> > http://kafka.apache.org/0101/javadoc/org/apache/kafka/clients/consumer/
> > KafkaConsumer.html#subscribe(java.util.Collection,
> > org.apache.kafka.clients.consumer.ConsumerRebalanceListener)>
> >
> > "As part of group management, the consumer will keep track of the list of
> > consumers that belong to a particular group and will trigger a rebalance
> > operation if one of the following events trigger -
> >
> > Number of partitions change for any of the subscribed list of topics
> > Topic is created or deleted
> > An existing member of the consumer group dies
> > A new member is added to an existing consumer group via the join API
> > "
> >
> > I'm guessing that this would affect any of those scenarios.
> >
> > -James
> >
> >
> > >
> > >
> > >
> > > On Thu, Jan 5, 2017 at 12:40 AM, James Cheng 
> > wrote:
> > >
> > >> Jeff,
> > >>
> > >> Your analysis is correct. I would say that it is known but unintuitive
> > >> behavior.
> > >>
> > >> As an example of a problem caused by this behavior, it's possible for
> > >> mirrormaker to miss messages on newly created topics, even thought it
> > was
> > >> subscribed to them before topics were creted.
> > >>
> > >> See the following JIRAs:
> > >> https://issues.apache.org/jira/browse/KAFKA-3848 <
> > >> https://issues.apache.org/jira/browse/KAFKA-3848>
> > >> https://issues.apache.org/jira/browse/KAFKA-3370 <
> > >> https://issues.apache.org/jira/browse/KAFKA-3370>
> > >>
> > >> -James
> > >>
> > >>> On Jan 4, 2017, at 4:37 PM, h...@confluent.io wrote:
> > >>>
> > >>> This sounds exactly as I would expect things to behave. If you
> consume
> > >> from the beginning I would think you would get all the messages but
> not
> >

[jira] [Created] (KAFKA-4668) Mirrormaker should default to auto.offset.reset=earliest

2017-01-17 Thread Jeff Widman (JIRA)
Jeff Widman created KAFKA-4668:
--

 Summary: Mirrormaker should default to auto.offset.reset=earliest
 Key: KAFKA-4668
 URL: https://issues.apache.org/jira/browse/KAFKA-4668
 Project: Kafka
  Issue Type: Improvement
Reporter: Jeff Widman
 Fix For: 0.10.3.0


Mirrormaker currently inherits the default value for auto.offset.reset, which 
is latest (new consumer) / largest (old consumer). 

While for most consumers this is a sensible default, mirrormakers are 
specifically designed for replication, so they should default to replicating 
topics from the beginning.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Okay to remove the deprecated kafka-consumer-offset-checker.sh?

2017-01-17 Thread Jeff Widman
Kafka 9 deprecated kafka-consumer-offset-checker.sh
(kafka.tools.ConsumerOffsetChecker) in favor of kafka-consumer-groups.sh
(kafka.admin.ConsumerGroupCommand).

Since this was deprecated in 9, and the full functionality of the old
script appears to be available in the new script, can we completely remove
the old shell script in K11?

From an Ops perspective, it's confusing to open the bin directory while trying
to check consumer offsets, see a script that seems to do exactly what I want,
and only later discover that I'm not supposed to use it.

I submitted a Jira ticket (https://issues.apache.org/jira/browse/KAFKA-4517)
and GitHub PR (https://github.com/apache/kafka/pull/2236) and was told it'd
be best to double-check this with the mailing list first.

Opinions?

Also, is there any accompanying Scala/Java code that's already deprecated
and should also be removed?


[jira] [Commented] (KAFKA-2410) Implement "Auto Topic Creation" client side and remove support from Broker side

2017-01-16 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824382#comment-15824382
 ] 

Jeff Widman commented on KAFKA-2410:


I strongly support removing it from the Broker.

But does it need to be added to the Consumer? Why not only add it to the 
Producer? 

For the consumer, as long as it can subscribe to a non-existent topic name and 
then start receiving messages once the topic is created by a producer, there's 
no need to actually create a topic just by consuming from it. I think something 
similar to this behavior already exists with the regex pattern subscription.  
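
Roughly the consumer-side behavior I have in mind, as a sketch (made-up broker,
group, and topic names; assumes broker-side auto topic creation is disabled so
that merely subscribing doesn't create anything):

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class WaitForTopicSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // made-up broker
        props.put("group.id", "waiting-group");             // made-up group id
        props.put("auto.offset.reset", "earliest");         // don't miss the first records
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // Topic does not exist yet; nothing should be created just because we subscribe.
        consumer.subscribe(Collections.singletonList("not-created-yet"));

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                // Starts printing once a producer creates the topic and writes to it.
                System.out.println(record.value());
            }
        }
    }
}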

> Implement "Auto Topic Creation" client side and remove support from Broker 
> side
> ---
>
> Key: KAFKA-2410
> URL: https://issues.apache.org/jira/browse/KAFKA-2410
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, core
>Affects Versions: 0.8.2.1
>Reporter: Grant Henke
>Assignee: Grant Henke
>
> Auto topic creation on the broker has caused pain in the past; And today it 
> still causes unusual error handling requirements on the client side, added 
> complexity in the broker, mixed responsibility of the TopicMetadataRequest, 
> and limits configuration of the option to be cluster wide. In the future 
> having it broker side will also make features such as authorization very 
> difficult. 
> There have been discussions in the past of implementing this feature client 
> side. 
> [example|https://issues.apache.org/jira/browse/KAFKA-689?focusedCommentId=13548746&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13548746]
> This Jira is to track that discussion and implementation once the necessary 
> protocol support exists: KAFKA-2229



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4621) Fetch Request V3 docs list max_bytes twice

2017-01-11 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819844#comment-15819844
 ] 

Jeff Widman commented on KAFKA-4621:


Yeah, I saw that... perhaps the descriptions could also be cleaned up slightly 
to make it more clear what they're referring to... I'll try to put together a 
PR at some point here... 

> Fetch Request V3 docs list max_bytes twice
> --
>
> Key: KAFKA-4621
> URL: https://issues.apache.org/jira/browse/KAFKA-4621
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.10.2.0
>Reporter: Jeff Widman
>Priority: Minor
>
> http://kafka.apache.org/protocol.html
> Fetch Request (Version: 3)  lists "max_bytes" twice, but with different 
> descriptions. This is confusing, as it's not apparent if this is an 
> accidental mistake or a purposeful inclusion... if purposeful, it's not clear 
> why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4621) Fetch Request V3 docs list max_bytes twice

2017-01-11 Thread Jeff Widman (JIRA)
Jeff Widman created KAFKA-4621:
--

 Summary: Fetch Request V3 docs list max_bytes twice
 Key: KAFKA-4621
 URL: https://issues.apache.org/jira/browse/KAFKA-4621
 Project: Kafka
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.10.2.0
Reporter: Jeff Widman
Priority: Minor


http://kafka.apache.org/protocol.html

Fetch Request (Version: 3)  lists "max_bytes" twice, but with different 
descriptions. This is confusing, as it's not apparent if this is an accidental 
mistake or a purposeful inclusion... if purposeful, it's not clear why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-106 - Default unclean.leader.election.enabled True => False

2017-01-11 Thread Jeff Widman
+1 (non-binding). We were bitten by this in a production environment.

On Wed, Jan 11, 2017 at 11:42 AM, Ian Wrigley  wrote:

> +1 (non-binding)
>
> > On Jan 11, 2017, at 11:33 AM, Jay Kreps  wrote:
> >
> > +1
> >
> > On Wed, Jan 11, 2017 at 10:56 AM, Ben Stopford  wrote:
> >
> >> Looks like there was a good consensus on the discuss thread for KIP-106
> so
> >> lets move to a vote.
> >>
> >> Please chime in if you would like to change the default for
> >> unclean.leader.election.enabled from true to false.
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/%
> >> 5BWIP%5D+KIP-106+-+Change+Default+unclean.leader.
> >> election.enabled+from+True+to+False
> >>
> >> B
> >>
>
>


[jira] [Commented] (KAFKA-1817) AdminUtils.createTopic vs kafka-topics.sh --create with partitions

2017-01-05 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803174#comment-15803174
 ] 

Jeff Widman commented on KAFKA-1817:


So can this be closed?

> AdminUtils.createTopic vs kafka-topics.sh --create with partitions
> --
>
> Key: KAFKA-1817
> URL: https://issues.apache.org/jira/browse/KAFKA-1817
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.8.2.0
> Environment: debian linux current version  up to date
>Reporter: Jason Kania
>
> When topics are created using AdminUtils.createTopic in code, no partitions 
> folder is created The zookeeper shell shows this.
> ls /brokers/topics/foshizzle
> []
> However, when kafka-topics.sh --create is run, the partitions folder is 
> created:
> ls /brokers/topics/foshizzle
> [partitions]
> The unfortunately useless error message "KeeperErrorCode = NoNode for 
> /brokers/topics/periodicReading/partitions" makes it unclear what to do. When 
> the topics are listed via kafka-topics.sh, they appear to have been created 
> fine. It would be good if the exception were wrapped by Kafka to suggest 
> looking in the zookeeper shell so a person didn't have to dig around to 
> understand what this path means...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3135) Unexpected delay before fetch response transmission

2016-12-30 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788686#comment-15788686
 ] 

Jeff Widman commented on KAFKA-3135:


It's not currently a critical issue for my company. Typically when we're 
considering upgrading we look at outstanding bugs to evaluate whether to 
upgrade or wait, so I just wanted the tags to be corrected. Thanks [~ewencp] 
for handling.


> Unexpected delay before fetch response transmission
> ---
>
> Key: KAFKA-3135
> URL: https://issues.apache.org/jira/browse/KAFKA-3135
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0, 0.10.1.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.1
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.10.2.0
>
>
> From the user list, Krzysztof Ciesielski reports the following:
> {quote}
> Scenario description:
> First, a producer writes 50 elements into a topic
> Then, a consumer starts to read, polling in a loop.
> When "max.partition.fetch.bytes" is set to a relatively small value, each
> "consumer.poll()" returns a batch of messages.
> If this value is left as default, the output tends to look like this:
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> As we can see, there are weird "gaps" when poll returns 0 elements for some
> time. What is the reason for that? Maybe there are some good practices
> about setting "max.partition.fetch.bytes" which I don't follow?
> {quote}
> The gist to reproduce this problem is here: 
> https://gist.github.com/kciesielski/054bb4359a318aa17561.
> After some initial investigation, the delay appears to be in the server's 
> networking layer. Basically I see a delay of 5 seconds from the time that 
> Selector.send() is invoked in SocketServer.Processor with the fetch response 
> to the time that the send is completed. Using netstat in the middle of the 
> delay shows the following output:
> {code}
> tcp4   0  0  10.191.0.30.55455  10.191.0.30.9092   ESTABLISHED
> tcp4   0 102400  10.191.0.30.9092   10.191.0.30.55454  ESTABLISHED
> {code}
> From this, it looks like the data reaches the send buffer, but needs to be 
> flushed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3297) More optimally balanced partition assignment strategy (new consumer)

2016-12-29 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15786986#comment-15786986
 ] 

Jeff Widman commented on KAFKA-3297:


[~ewencp] KIP-54 tackles both sticky re-assignments and a more "fair" initial 
assignment. Search the wiki page for "fair yet sticky" for a section that 
provides a bit more context.
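
For anyone skimming, the assignment loop sketched in the issue description
quoted below boils down to something like this toy version (not the real
assignor interface; inputs are simplified and names are made up):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FairAssignmentSketch {

    // topicPartitions: topic -> number of partitions
    // subscriptions:   consumer -> topics it subscribes to
    static Map<String, List<String>> assign(Map<String, Integer> topicPartitions,
                                            Map<String, List<String>> subscriptions) {
        Map<String, List<String>> assignment = new HashMap<>();
        for (String consumer : subscriptions.keySet()) {
            assignment.put(consumer, new ArrayList<>());
        }

        // Assign the most constrained topics first: fewer interested consumers first,
        // ties broken by more partitions first.
        List<String> topics = new ArrayList<>(topicPartitions.keySet());
        topics.sort((a, b) -> {
            long ca = interested(a, subscriptions), cb = interested(b, subscriptions);
            if (ca != cb) return Long.compare(ca, cb);
            return Integer.compare(topicPartitions.get(b), topicPartitions.get(a));
        });

        for (String topic : topics) {
            for (int p = 0; p < topicPartitions.get(topic); p++) {
                // Hand the partition to the interested consumer that currently owns the fewest.
                String target = null;
                for (Map.Entry<String, List<String>> e : subscriptions.entrySet()) {
                    if (!e.getValue().contains(topic)) continue;
                    if (target == null
                            || assignment.get(e.getKey()).size() < assignment.get(target).size()) {
                        target = e.getKey();
                    }
                }
                if (target != null) {
                    assignment.get(target).add(topic + "-" + p);
                }
            }
        }
        return assignment;
    }

    private static long interested(String topic, Map<String, List<String>> subscriptions) {
        return subscriptions.values().stream().filter(s -> s.contains(topic)).count();
    }
}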

> More optimally balanced partition assignment strategy (new consumer)
> 
>
> Key: KAFKA-3297
> URL: https://issues.apache.org/jira/browse/KAFKA-3297
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Andrew Olson
>Assignee: Andrew Olson
> Fix For: 0.10.2.0
>
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.
> This JIRA addresses the new consumer. For the original high-level consumer, 
> see KAFKA-2435.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2273) Add rebalance with a minimal number of reassignments to server-defined strategy list

2016-12-21 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768710#comment-15768710
 ] 

Jeff Widman commented on KAFKA-2273:


What was the outcome of the vote?

> Add rebalance with a minimal number of reassignments to server-defined 
> strategy list
> 
>
> Key: KAFKA-2273
> URL: https://issues.apache.org/jira/browse/KAFKA-2273
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Olof Johansson
>Assignee: Vahid Hashemian
>  Labels: newbie++, newbiee
> Fix For: 0.10.2.0
>
>
> Add a new partitions.assignment.strategy to the server-defined list that will 
> do reassignments based on moving as few partitions as possible. This should 
> be a quite common reassignment strategy especially for the cases where the 
> consumer has to maintain state, either in memory, or on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2331) Kafka does not spread partitions in a topic among all consumers evenly

2016-12-21 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766509#comment-15766509
 ] 

Jeff Widman commented on KAFKA-2331:


Isn't this what round robin partitioning strategy was trying to solve?

If so, this issue should be closed.

> Kafka does not spread partitions in a topic among all consumers evenly
> --
>
> Key: KAFKA-2331
> URL: https://issues.apache.org/jira/browse/KAFKA-2331
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.1.1
>Reporter: Stefan Miklosovic
>
> I want to have 1 topic with 10 partitions. I am using default configuration 
> of Kafka. I create 1 topic with 10 partitions by that helper script and now I 
> am about to produce messages to it.
> The thing is that even though all partitions are indeed consumed, some 
> consumers have more than one partition assigned even though I have a number 
> of consumer threads equal to the partitions in the topic, so some threads are 
> idle.
> Let's describe it in more detail.
> I know the common guidance that you need one consumer thread per partition. I 
> want to be able to commit offsets per partition, and this is possible only 
> when I have one thread per consumer connector per partition (I am using the 
> high-level consumer).
> So I create 10 threads, and in each thread I call 
> Consumer.createJavaConsumerConnector() where I do this
> topicCountMap.put("mytopic", 1);
> and in the end I have one iterator which consumes messages from one partition.
> When I do this 10 times, I have 10 consumers, one consumer per thread per 
> partition, where I can commit offsets independently per partition. If I put a 
> number other than 1 in the topic map, I would end up with more than one 
> consumer thread for that topic for a given consumer instance, so if I were to 
> commit offsets with that consumer instance, it would commit them for all 
> threads, which is not desired.
> But the thing is that when I use the consumers, only 7 consumers are involved, 
> and it seems that the other consumer threads are idle, but I do not know why.
> I am creating these consumer threads in a loop. So I start the first thread 
> (submit to an executor service), then another, then another and so on.
> So the scenario is that the first consumer gets all 10 partitions, then the 
> 2nd connects so it is split between these two as 5 and 5 (or something 
> similar), then other threads connect.
> I understand this as partition rebalancing among all consumers, and it behaves 
> well in the sense that as more consumers are created, partition rebalancing 
> occurs between them so every consumer should have some partitions to operate 
> on.
> But from the results I see that there are only 7 consumers, and according to 
> the consumed messages it seems they are split like 3,2,1,1,1,1,1 
> partition-wise. Yes, these 7 consumers covered all 10 partitions, but why do 
> consumers with more than one partition not split and give partitions to the 
> remaining 3 consumers?
> I am pretty much wondering what is happening with the remaining 3 threads and 
> why they do not "grab" partitions from consumers which have more than one 
> partition assigned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2019) RoundRobinAssignor clusters by consumer

2016-12-21 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766498#comment-15766498
 ] 

Jeff Widman commented on KAFKA-2019:


As noted in KIP-54 
(https://cwiki.apache.org/confluence/display/pages/viewpage.action?pageId=62692483)
 this is not relevant to the new consumer.

Could someone update the issue title to make it clear it only applies to the 
old consumer?

I also suspect this may never get merged as the new consumer is the future, in 
which case it'd be nice if this were closed as "wontfix"


> RoundRobinAssignor clusters by consumer
> ---
>
> Key: KAFKA-2019
> URL: https://issues.apache.org/jira/browse/KAFKA-2019
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Joseph Holsten
>Assignee: Neha Narkhede
>Priority: Minor
> Attachments: 0001-sort-consumer-thread-ids-by-hashcode.patch, 
> KAFKA-2019.patch
>
>
> When rolling out a change today, I noticed that some of my consumers are 
> "greedy", taking far more partitions than others.
> The cause is that the RoundRobinAssignor is using a list of ConsumerThreadIds 
> sorted by toString, which is {{ "%s-%d".format(consumer, threadId)}}. This 
> causes each consumer's threads to be adjacent to each other.
> One possible fix would be to define ConsumerThreadId.hashCode, and sort by 
> that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-3297) More optimally balanced partition assignment strategy (new consumer)

2016-12-21 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766465#comment-15766465
 ] 

Jeff Widman edited comment on KAFKA-3297 at 12/21/16 8:38 AM:
--

Is this being superceded by KIP-54? 
https://cwiki.apache.org/confluence/display/pages/viewpage.action?pageId=62692483


was (Author: jeffwidman):
Was this KIP ever voted on? I see there's only a handful of messages about it, 
one of which mentions patching the round robin implementation to avoid 
"clumping" partitions from the same topic onto the same consumer. 

> More optimally balanced partition assignment strategy (new consumer)
> 
>
> Key: KAFKA-3297
> URL: https://issues.apache.org/jira/browse/KAFKA-3297
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Andrew Olson
>Assignee: Andrew Olson
> Fix For: 0.10.2.0
>
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.
> This JIRA addresses the new consumer. For the original high-level consumer, 
> see KAFKA-2435.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3297) More optimally balanced partition assignment strategy (new consumer)

2016-12-21 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766465#comment-15766465
 ] 

Jeff Widman commented on KAFKA-3297:


Was this KIP ever voted on? I see there's only a handful of messages about it, 
one of which mentions patching the round robin implementation to avoid 
"clumping" partitions from the same topic onto the same consumer. 

> More optimally balanced partition assignment strategy (new consumer)
> 
>
> Key: KAFKA-3297
> URL: https://issues.apache.org/jira/browse/KAFKA-3297
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Andrew Olson
>Assignee: Andrew Olson
> Fix For: 0.10.2.0
>
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.
> This JIRA addresses the new consumer. For the original high-level consumer, 
> see KAFKA-2435.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2172) Round-robin partition assignment strategy too restrictive

2016-12-21 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766445#comment-15766445
 ] 

Jeff Widman commented on KAFKA-2172:


Should this be marked as resolved?

KAFKA-2196 added this to the new consumer quite a while ago. 

KAFKA-2434 has a patch available but appears dead in the water, so if it's not 
getting merged then there's nothing more to do on this ticket.

> Round-robin partition assignment strategy too restrictive
> -
>
> Key: KAFKA-2172
> URL: https://issues.apache.org/jira/browse/KAFKA-2172
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Rosenberg
>
> The round-robin partition assignment strategy was introduced for the 
> high-level consumer, starting with 0.8.2.1.  This appears to be a very 
> attractive feature, but it has an unfortunate restriction, which prevents it 
> from being easily utilized.  That is that it requires all consumers in the 
> consumer group have identical topic regex selectors, and that they have the 
> same number of consumer threads.
> It turns out this is not always the case for our deployments.  It's not 
> unusual to run multiple consumers within a single process (with different 
> topic selectors), or we might have multiple processes dedicated for different 
> topic subsets.  Agreed, we could change these to have separate group ids for 
> each sub topic selector (but unfortunately, that's easier said than done).  
> In several cases, we do at least have separate client.ids set for each 
> sub-consumer, so it would be incrementally better if we could at least loosen 
> the requirement such that each set of topics selected by a groupid/clientid 
> pair are the same.
> But, if we want to do a rolling restart for a new version of a consumer 
> config, the cluster will likely be in a state where it's not possible to have 
> a single config until the full rolling restart completes across all nodes.  
> This results in a consumer outage while the rolling restart is happening.
> Finally, it's especially problematic if we want to canary a new version for a 
> period before rolling to the whole cluster.
> I'm not sure why this restriction should exist (as it obviously does not 
> exist for the 'range' assignment strategy).  It seems it could be made to 
> work reasonably well with heterogenous topic selection and heterogenous 
> thread counts.  The documentation states that "The round-robin partition 
> assignor lays out all the available partitions and all the available consumer 
> threads. It then proceeds to do a round-robin assignment from partition to 
> consumer thread."
> If the assignor can "lay out all the available partitions and all the 
> available consumer threads", it should be able to uniformly assign partitions 
> to the available threads.  In each case, if a thread belongs to a consumer 
> that doesn't have that partition selected, just move to the next available 
> thread that does have the selection, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2196) remove roundrobin identical topic constraint in consumer coordinator

2016-12-21 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766440#comment-15766440
 ] 

Jeff Widman commented on KAFKA-2196:


What version did this land in?

> remove roundrobin identical topic constraint in consumer coordinator
> 
>
> Key: KAFKA-2196
> URL: https://issues.apache.org/jira/browse/KAFKA-2196
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Onur Karaman
>Assignee: Onur Karaman
> Attachments: KAFKA-2196.patch
>
>
> roundrobin doesn't need to make all consumers have identical topic 
> subscriptions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-242) Subsequent calls of ConsumerConnector.createMessageStreams cause Consumer offset to be incorrect

2016-12-20 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765835#comment-15765835
 ] 

Jeff Widman commented on KAFKA-242:
---

Does this bug still exist in 0.10?

> Subsequent calls of ConsumerConnector.createMessageStreams cause Consumer 
> offset to be incorrect
> 
>
> Key: KAFKA-242
> URL: https://issues.apache.org/jira/browse/KAFKA-242
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.7
>Reporter: David Arthur
> Attachments: kafka.log
>
>
> When calling ConsumerConnector.createMessageStreams in rapid succession, the 
> Consumer offset is incorrectly advanced causing the consumer to lose 
> messages. This seems to happen when createMessageStreams is called before the 
> rebalancing triggered by the previous call to createMessageStreams has 
> completed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3853) Report offsets for empty groups in ConsumerGroupCommand

2016-12-18 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15759588#comment-15759588
 ] 

Jeff Widman commented on KAFKA-3853:


Did the KIP get the missing vote required to pass?

> Report offsets for empty groups in ConsumerGroupCommand
> ---
>
> Key: KAFKA-3853
> URL: https://issues.apache.org/jira/browse/KAFKA-3853
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Vahid Hashemian
> Fix For: 0.10.2.0
>
>
> We ought to be able to display offsets for groups which either have no active 
> members or which are not using group management. The owner column can be left 
> empty or set to "N/A". If a group is active, I'm not sure it would make sense 
> to report all offsets, in particular when partitions are unassigned, but if 
> it seems problematic to do so, we could enable the behavior with a flag (e.g. 
> --include-unassigned).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3135) Unexpected delay before fetch response transmission

2016-12-14 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750082#comment-15750082
 ] 

Jeff Widman commented on KAFKA-3135:


Can the tags on this issue be updated to note that it applies to 0.10.x 
versions as well? 

At least, that's what it seems from reading through the issue description.

> Unexpected delay before fetch response transmission
> ---
>
> Key: KAFKA-3135
> URL: https://issues.apache.org/jira/browse/KAFKA-3135
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.10.2.0
>
>
> From the user list, Krzysztof Ciesielski reports the following:
> {quote}
> Scenario description:
> First, a producer writes 50 elements into a topic
> Then, a consumer starts to read, polling in a loop.
> When "max.partition.fetch.bytes" is set to a relatively small value, each
> "consumer.poll()" returns a batch of messages.
> If this value is left as default, the output tends to look like this:
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> As we can see, there are weird "gaps" when poll returns 0 elements for some
> time. What is the reason for that? Maybe there are some good practices
> about setting "max.partition.fetch.bytes" which I don't follow?
> {quote}
> The gist to reproduce this problem is here: 
> https://gist.github.com/kciesielski/054bb4359a318aa17561.
> After some initial investigation, the delay appears to be in the server's 
> networking layer. Basically I see a delay of 5 seconds from the time that 
> Selector.send() is invoked in SocketServer.Processor with the fetch response 
> to the time that the send is completed. Using netstat in the middle of the 
> delay shows the following output:
> {code}
> tcp4   0  0  10.191.0.30.55455  10.191.0.30.9092   ESTABLISHED
> tcp4   0 102400  10.191.0.30.9092   10.191.0.30.55454  ESTABLISHED
> {code}
> From this, it looks like the data reaches the send buffer, but needs to be 
> flushed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3853) Report offsets for empty groups in ConsumerGroupCommand

2016-12-09 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736740#comment-15736740
 ] 

Jeff Widman commented on KAFKA-3853:


Any idea if this will land in 0.10.2?

> Report offsets for empty groups in ConsumerGroupCommand
> ---
>
> Key: KAFKA-3853
> URL: https://issues.apache.org/jira/browse/KAFKA-3853
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Vahid Hashemian
>
> We ought to be able to display offsets for groups which either have no active 
> members or which are not using group management. The owner column can be left 
> empty or set to "N/A". If a group is active, I'm not sure it would make sense 
> to report all offsets, in particular when partitions are unassigned, but if 
> it seems problematic to do so, we could enable the behavior with a flag (e.g. 
> --include-unassigned).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4519) Delete old unused branches in git repo

2016-12-09 Thread Jeff Widman (JIRA)
Jeff Widman created KAFKA-4519:
--

 Summary: Delete old unused branches in git repo
 Key: KAFKA-4519
 URL: https://issues.apache.org/jira/browse/KAFKA-4519
 Project: Kafka
  Issue Type: Task
Reporter: Jeff Widman
Priority: Trivial


Delete these old git branches, as they're quite outdated and not relevant for 
various version branches:
* consumer_redesign
* transactional_messaging
* 0.8.0-beta1-candidate1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4517) Remove kafka-consumer-offset-checker.sh script since already deprecated in Kafka 9

2016-12-09 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-4517:
---
Summary: Remove kafka-consumer-offset-checker.sh script since already 
deprecated in Kafka 9  (was: Remove kafka-consumer-offset-checker.sh script 
since deprecated in Kafka 9)

> Remove kafka-consumer-offset-checker.sh script since already deprecated in 
> Kafka 9
> --
>
> Key: KAFKA-4517
> URL: https://issues.apache.org/jira/browse/KAFKA-4517
> Project: Kafka
>  Issue Type: Task
>Affects Versions: 0.10.1.0, 0.10.0.0, 0.10.0.1
>    Reporter: Jeff Widman
>Priority: Minor
>
> Kafka 9 deprecated kafka-consumer-offset-checker.sh 
> (kafka.tools.ConsumerOffsetChecker) in favor of kafka-consumer-groups.sh 
> (kafka.admin.ConsumerGroupCommand). 
> Since this was deprecated in 9, and the full functionality of the old script 
> appears to be available in the new script, can we remove the old shell script 
> in 10? 
> From an Ops perspective, it's confusing when I'm trying to check consumer 
> offsets that I open the bin directory, and see a script that seems to do 
> exactly what I want, only to later discover that I'm not supposed to use it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4517) Remove kafka-consumer-offset-checker.sh script since deprecated in Kafka 9

2016-12-09 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-4517:
---
Summary: Remove kafka-consumer-offset-checker.sh script since deprecated in 
Kafka 9  (was: Remove shell scripts deprecated in Kafka 9)

> Remove kafka-consumer-offset-checker.sh script since deprecated in Kafka 9
> --
>
> Key: KAFKA-4517
> URL: https://issues.apache.org/jira/browse/KAFKA-4517
> Project: Kafka
>  Issue Type: Task
>Affects Versions: 0.10.1.0, 0.10.0.0, 0.10.0.1
>    Reporter: Jeff Widman
>Priority: Minor
>
> Kafka 9 deprecated kafka-consumer-offset-checker.sh 
> (kafka.tools.ConsumerOffsetChecker) in favor of kafka-consumer-groups.sh 
> (kafka.admin.ConsumerGroupCommand). 
> Since this was deprecated in 9, and the full functionality of the old script 
> appears to be available in the new script, can we remove the old shell script 
> in 10? 
> From an Ops perspective, it's confusing when I'm trying to check consumer 
> offsets that I open the bin directory, and see a script that seems to do 
> exactly what I want, only to later discover that I'm not supposed to use it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

