Newbie JIRAs

2016-08-30 Thread Srabasti Banerjee
Hi,I want to contribute to Apache Kafka.Can you please let me know any newbie 
JIRAs I could pick up?ThanksSrabasti


Re: [DISCUSS] Remove beta label from the new Java consumer

2016-08-30 Thread Jaikiran Pai
We have been using the (new) Java consumer API in 0.9.0.1 for a while 
now. We have some well known issues with it - like heart beats being 
part of the same thread causing the consumer to sometimes be considered 
dead. I understand that this has been fixed in 0.10.0.1 but we haven't 
yet had a chance to migrate to it. We plan to do that in the next month 
or so.


Personally, I would be OK if the beta label is removed from it if the 
dev team is sure the API isn't going to change. I don't know if that's 
true or not post 0.10.0.1. For me the major thing that I think needs to 
be addressed is these JIRAs which actually expose some API 
implementation level issues. Not sure if solving those issues will 
involve changes to API itself:


https://issues.apache.org/jira/browse/KAFKA-1894
https://issues.apache.org/jira/browse/KAFKA-3540
https://issues.apache.org/jira/browse/KAFKA-3539

If solving issues like these will not involve changes to the API, I 
think it's safe to move it out of beta label.


-Jaikiran

On Tuesday 30 August 2016 05:09 PM, Ismael Juma wrote:

Thanks for the feedback everyone. Since Harsha said that he is OK either
way and everyone else is in favour, I think we should go ahead with this.
Since we committed to API stability for the new Java consumer in 0.10.0.0
via KIP-45, this is simply a documentation change and I don't think we need
an official vote thread (we didn't have one for the equivalent producer
change).

Ismael

On Mon, Aug 29, 2016 at 7:37 PM, Jay Kreps  wrote:


+1 I talk to a lot of kafka users, and I would say > 75% of people doing
new things are on the new consumer despite our warnings :-)

-Jay

On Thu, Aug 25, 2016 at 2:05 PM, Jason Gustafson 
wrote:


I'm +1 also. I feel a lot more confident about this with all of the

system

testing we now have in place (including the tests covering Streams and
Connect).

-Jason

On Thu, Aug 25, 2016 at 9:57 AM, Gwen Shapira  wrote:


Makes sense :)

On Thu, Aug 25, 2016 at 9:40 AM, Neha Narkhede 

wrote:

Yeah, I'm supportive of this.

On Thu, Aug 25, 2016 at 9:26 AM Ismael Juma 

wrote:

Hi Gwen,

We have a few recent stories of people using Connect and Streams in
production. That means the new Java Consumer too. :)

Ismael

On Thu, Aug 25, 2016 at 5:09 PM, Gwen Shapira 

wrote:

Originally, we suggested keeping the beta label until we know

someone

successfully uses the new consumer in production.

We can consider the recent KIPs enough, but IMO it will be better

if

someone with production deployment hanging out on our mailing list
will confirm good experience with the new consumer.

Gwen

On Wed, Aug 24, 2016 at 8:45 PM, Ismael Juma 

wrote:

Hi all,

We currently say the following in our documentation:

"As of the 0.9.0 release we have added a new Java consumer to

replace

our

existing high-level ZooKeeper-based consumer and low-level

consumer

APIs.

This client is considered beta quality."[1]

Since then, Jason and the community have done a lot of work to

improve

it

(including KIP-41 and KIP-62), we declared it API stable in

0.10.0.0

and

it's the only option for those that need security support. Yes,

it

still

has bugs, but so does the old consumer and all development is

currently

focused on the new consumer.

As such, I propose we remove the beta label for the next release

and

switch

our tools to use the new consumer by default unless the

zookeeper

command-line option is present (for compatibility). This is

similar

to

what

we did it for the new producer in 0.9.0.0, but backwards

compatible.

Thoughts?

Ismael

[1] http://kafka.apache.org/documentation.html#consumerapi



--
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


--
Thanks,
Neha



--
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog





[jira] [Assigned] (KAFKA-4103) DumpLogSegments cannot print data from offsets topic

2016-08-30 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson reassigned KAFKA-4103:
--

Assignee: Jason Gustafson

> DumpLogSegments cannot print data from offsets topic
> 
>
> Key: KAFKA-4103
> URL: https://issues.apache.org/jira/browse/KAFKA-4103
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Reporter: Ewen Cheslack-Postava
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 0.10.1.0
>
>
> It looks like there's been a regression in the DumpLogSegments tool. I'm 
> marking it a blocker since it appears we can no longer dump offset 
> information from this tool, which makes it really hard to debug anything 
> related to __consumer_offsets.
> The 0.10.0 branch seems to work fine, but even with offsets log files 
> generated using only old formats (0.10.0 branch), the DumpLogSegments tool 
> from trunk (i.e. 0.10.1.0-SNAPSHOT with latest githash 
> b91eeac9438b8718c410045b0e9191296ebb536d as of reporting this) will cause the 
> exception below. This was found while doing some basic testing of KAFKA-4062.
> {quote}
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> offset: 0 position: 0 CreateTime: 1472615183913 isvalid: true payloadsize: 
> 199 magic: 1 compresscodec: NoCompressionCodec crc: 2036280914 keysize: 
> 26Exception in thread "main" java.util.IllegalFormatConversionException: x != 
> scala.math.BigInt
>   at 
> java.util.Formatter$FormatSpecifier.failConversion(Formatter.java:4045)
>   at 
> java.util.Formatter$FormatSpecifier.printInteger(Formatter.java:2748)
>   at 
> java.util.Formatter$FormatSpecifier.print(Formatter.java:2702)
>   at java.util.Formatter.format(Formatter.java:2488)
>   at java.util.Formatter.format(Formatter.java:2423)
>   at java.lang.String.format(String.java:2792)
>   at 
> kafka.tools.DumpLogSegments$OffsetsMessageParser.kafka$tools$DumpLogSegments$OffsetsMessageParser$$hex(DumpLogSegments.scala:240)
>   at 
> kafka.tools.DumpLogSegments$OffsetsMessageParser$$anonfun$3.apply(DumpLogSegments.scala:272)
>   at 
> kafka.tools.DumpLogSegments$OffsetsMessageParser$$anonfun$3.apply(DumpLogSegments.scala:262)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at 
> scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> kafka.tools.DumpLogSegments$OffsetsMessageParser.parseGroupMetadata(DumpLogSegments.scala:262)
>   at 
> kafka.tools.DumpLogSegments$OffsetsMessageParser.parse(DumpLogSegments.scala:290)
>   at 
> kafka.tools.DumpLogSegments$$anonfun$kafka$tools$DumpLogSegments$$dumpLog$1$$anonfun$apply$3.apply(DumpLogSegments.scala:332)
>   at 
> kafka.tools.DumpLogSegments$$anonfun$kafka$tools$DumpLogSegments$$dumpLog$1$$anonfun$apply$3.apply(DumpLogSegments.scala:312)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at 
> kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:30)
>   at 
> kafka.tools.DumpLogSegments$$anonfun$kafka$tools$DumpLogSegments$$dumpLog$1.apply(DumpLogSegments.scala:312)
>   at 
> kafka.tools.DumpLogSegments$$anonfun$kafka$tools$DumpLogSegments$$dumpLog$1.apply(DumpLogSegments.scala:310)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at 
> kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:30)
>   at 
> kafka.tools.DumpLogSegments$.kafka$tools$DumpLogSegments$$dumpLog(DumpLogSegments.scala:310)
>   at 
> kafka.tools.DumpLogSegments$$anonfun$main$1.apply(DumpLogSegments.scala:96)
>   at 
> kafka.tools.DumpLogSegments$$anonfun$main$1.apply(DumpLogSegments.scala:92)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at 
> scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>   at kafka.tools.DumpLogSegments$.main(DumpLogSegments.scala:92)
>   at kafka.tools.DumpLogSegments.main(DumpLogSegments.scala)
> {quote}
> I haven't really dug in, but the source of the error is confusing since the 
> relevant string formatting code doesn't seem to have changed anytime 
> recently. It seems it might be related to changes in the group metadata code. 
> I did the git bisect and this

[jira] [Resolved] (KAFKA-4062) Require --print-data-log if --offsets-decoder is enabled for DumpLogOffsets

2016-08-30 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava resolved KAFKA-4062.
--
   Resolution: Fixed
Fix Version/s: 0.10.1.0

Issue resolved by pull request 1797
[https://github.com/apache/kafka/pull/1797]

> Require --print-data-log if --offsets-decoder is enabled for DumpLogOffsets
> ---
>
> Key: KAFKA-4062
> URL: https://issues.apache.org/jira/browse/KAFKA-4062
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin
>Reporter: Dustin Cote
>Assignee: Dustin Cote
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> When using the DumpLogOffsets tool, if you want to print out contents of 
> __consumer_offsets, you would typically use --offsets-decoder as an option.  
> This option doesn't actually do much without --print-data-log enabled, so we 
> should just require it to prevent user errors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4062) Require --print-data-log if --offsets-decoder is enabled for DumpLogOffsets

2016-08-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451086#comment-15451086
 ] 

ASF GitHub Bot commented on KAFKA-4062:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1797


> Require --print-data-log if --offsets-decoder is enabled for DumpLogOffsets
> ---
>
> Key: KAFKA-4062
> URL: https://issues.apache.org/jira/browse/KAFKA-4062
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin
>Reporter: Dustin Cote
>Assignee: Dustin Cote
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> When using the DumpLogOffsets tool, if you want to print out contents of 
> __consumer_offsets, you would typically use --offsets-decoder as an option.  
> This option doesn't actually do much without --print-data-log enabled, so we 
> should just require it to prevent user errors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1797: KAFKA-4062: Require --print-data-log if --offsets-...

2016-08-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1797


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4103) DumpLogSegments cannot print data from offsets topic

2016-08-30 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451076#comment-15451076
 ] 

Ewen Cheslack-Postava commented on KAFKA-4103:
--

cc [~hachikuji] since it looks like you authored the commit that introduced the 
regression.

> DumpLogSegments cannot print data from offsets topic
> 
>
> Key: KAFKA-4103
> URL: https://issues.apache.org/jira/browse/KAFKA-4103
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Reporter: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.10.1.0
>
>
> It looks like there's been a regression in the DumpLogSegments tool. I'm 
> marking it a blocker since it appears we can no longer dump offset 
> information from this tool, which makes it really hard to debug anything 
> related to __consumer_offsets.
> The 0.10.0 branch seems to work fine, but even with offsets log files 
> generated using only old formats (0.10.0 branch), the DumpLogSegments tool 
> from trunk (i.e. 0.10.1.0-SNAPSHOT with latest githash 
> b91eeac9438b8718c410045b0e9191296ebb536d as of reporting this) will cause the 
> exception below. This was found while doing some basic testing of KAFKA-4062.
> {quote}
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> offset: 0 position: 0 CreateTime: 1472615183913 isvalid: true payloadsize: 
> 199 magic: 1 compresscodec: NoCompressionCodec crc: 2036280914 keysize: 
> 26Exception in thread "main" java.util.IllegalFormatConversionException: x != 
> scala.math.BigInt
>   at 
> java.util.Formatter$FormatSpecifier.failConversion(Formatter.java:4045)
>   at 
> java.util.Formatter$FormatSpecifier.printInteger(Formatter.java:2748)
>   at 
> java.util.Formatter$FormatSpecifier.print(Formatter.java:2702)
>   at java.util.Formatter.format(Formatter.java:2488)
>   at java.util.Formatter.format(Formatter.java:2423)
>   at java.lang.String.format(String.java:2792)
>   at 
> kafka.tools.DumpLogSegments$OffsetsMessageParser.kafka$tools$DumpLogSegments$OffsetsMessageParser$$hex(DumpLogSegments.scala:240)
>   at 
> kafka.tools.DumpLogSegments$OffsetsMessageParser$$anonfun$3.apply(DumpLogSegments.scala:272)
>   at 
> kafka.tools.DumpLogSegments$OffsetsMessageParser$$anonfun$3.apply(DumpLogSegments.scala:262)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at 
> scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> kafka.tools.DumpLogSegments$OffsetsMessageParser.parseGroupMetadata(DumpLogSegments.scala:262)
>   at 
> kafka.tools.DumpLogSegments$OffsetsMessageParser.parse(DumpLogSegments.scala:290)
>   at 
> kafka.tools.DumpLogSegments$$anonfun$kafka$tools$DumpLogSegments$$dumpLog$1$$anonfun$apply$3.apply(DumpLogSegments.scala:332)
>   at 
> kafka.tools.DumpLogSegments$$anonfun$kafka$tools$DumpLogSegments$$dumpLog$1$$anonfun$apply$3.apply(DumpLogSegments.scala:312)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at 
> kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:30)
>   at 
> kafka.tools.DumpLogSegments$$anonfun$kafka$tools$DumpLogSegments$$dumpLog$1.apply(DumpLogSegments.scala:312)
>   at 
> kafka.tools.DumpLogSegments$$anonfun$kafka$tools$DumpLogSegments$$dumpLog$1.apply(DumpLogSegments.scala:310)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at 
> kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:30)
>   at 
> kafka.tools.DumpLogSegments$.kafka$tools$DumpLogSegments$$dumpLog(DumpLogSegments.scala:310)
>   at 
> kafka.tools.DumpLogSegments$$anonfun$main$1.apply(DumpLogSegments.scala:96)
>   at 
> kafka.tools.DumpLogSegments$$anonfun$main$1.apply(DumpLogSegments.scala:92)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at 
> scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>   at kafka.tools.DumpLogSegments$.main(DumpLogSegments.scala:92)
>   at kafka.tools.DumpLogSegments.main(DumpLogSegments.scala)
> {quote}
> I haven't really dug in, but the source of the error is confusing since the 
> relevant string formatting code doesn't seem to have changed anytime 
> recently. It seems it m

[jira] [Created] (KAFKA-4103) DumpLogSegments cannot print data from offsets topic

2016-08-30 Thread Ewen Cheslack-Postava (JIRA)
Ewen Cheslack-Postava created KAFKA-4103:


 Summary: DumpLogSegments cannot print data from offsets topic
 Key: KAFKA-4103
 URL: https://issues.apache.org/jira/browse/KAFKA-4103
 Project: Kafka
  Issue Type: Bug
  Components: tools
Reporter: Ewen Cheslack-Postava
Priority: Blocker
 Fix For: 0.10.1.0


It looks like there's been a regression in the DumpLogSegments tool. I'm 
marking it a blocker since it appears we can no longer dump offset information 
from this tool, which makes it really hard to debug anything related to 
__consumer_offsets.

The 0.10.0 branch seems to work fine, but even with offsets log files generated 
using only old formats (0.10.0 branch), the DumpLogSegments tool from trunk 
(i.e. 0.10.1.0-SNAPSHOT with latest githash 
b91eeac9438b8718c410045b0e9191296ebb536d as of reporting this) will cause the 
exception below. This was found while doing some basic testing of KAFKA-4062.

{quote}
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
offset: 0 position: 0 CreateTime: 1472615183913 isvalid: true payloadsize: 199 
magic: 1 compresscodec: NoCompressionCodec crc: 2036280914 keysize: 26Exception 
in thread "main" java.util.IllegalFormatConversionException: x != 
scala.math.BigInt
at 
java.util.Formatter$FormatSpecifier.failConversion(Formatter.java:4045)
at java.util.Formatter$FormatSpecifier.printInteger(Formatter.java:2748)
at java.util.Formatter$FormatSpecifier.print(Formatter.java:2702)
at java.util.Formatter.format(Formatter.java:2488)
at java.util.Formatter.format(Formatter.java:2423)
at java.lang.String.format(String.java:2792)
at 
kafka.tools.DumpLogSegments$OffsetsMessageParser.kafka$tools$DumpLogSegments$OffsetsMessageParser$$hex(DumpLogSegments.scala:240)
at 
kafka.tools.DumpLogSegments$OffsetsMessageParser$$anonfun$3.apply(DumpLogSegments.scala:272)
at 
kafka.tools.DumpLogSegments$OffsetsMessageParser$$anonfun$3.apply(DumpLogSegments.scala:262)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at 
kafka.tools.DumpLogSegments$OffsetsMessageParser.parseGroupMetadata(DumpLogSegments.scala:262)
at 
kafka.tools.DumpLogSegments$OffsetsMessageParser.parse(DumpLogSegments.scala:290)
at 
kafka.tools.DumpLogSegments$$anonfun$kafka$tools$DumpLogSegments$$dumpLog$1$$anonfun$apply$3.apply(DumpLogSegments.scala:332)
at 
kafka.tools.DumpLogSegments$$anonfun$kafka$tools$DumpLogSegments$$dumpLog$1$$anonfun$apply$3.apply(DumpLogSegments.scala:312)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:30)
at 
kafka.tools.DumpLogSegments$$anonfun$kafka$tools$DumpLogSegments$$dumpLog$1.apply(DumpLogSegments.scala:312)
at 
kafka.tools.DumpLogSegments$$anonfun$kafka$tools$DumpLogSegments$$dumpLog$1.apply(DumpLogSegments.scala:310)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:30)
at 
kafka.tools.DumpLogSegments$.kafka$tools$DumpLogSegments$$dumpLog(DumpLogSegments.scala:310)
at 
kafka.tools.DumpLogSegments$$anonfun$main$1.apply(DumpLogSegments.scala:96)
at 
kafka.tools.DumpLogSegments$$anonfun$main$1.apply(DumpLogSegments.scala:92)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at kafka.tools.DumpLogSegments$.main(DumpLogSegments.scala:92)
at kafka.tools.DumpLogSegments.main(DumpLogSegments.scala)
{quote}

I haven't really dug in, but the source of the error is confusing since the 
relevant string formatting code doesn't seem to have changed anytime recently. 
It seems it might be related to changes in the group metadata code. I did the 
git bisect and this seems to be the bad commit:

{quote}
8c551675adb11947e9f27b20a9195c9c4a20b432 is the first bad commit
commit 8c551675adb11947e9f27b20a9195c9c4a20b432
Author: Jason Gustafson 
Date:   Wed Jun 15 19:46:42 2016 -0700

KAFKA-2720: expire group metadata when all offsets have expired

Author: Jason Gustafson 

Reviewers: Liquan Pei, Onur Karaman, Guozhang Wang

Closes #1427 from hachikuji/KAFKA-2720

:04 04 0da885a8896f0894940cc1b002160ca4e7176905 
eb5a672ae09159264993bc61b6a18a2f19de804e M  core
{quote}



--
This message was sent by Atlassian 

Re: [VOTE] KIP-77: Improve Kafka Streams Join Semantics

2016-08-30 Thread Neha Narkhede
+1 (binding)

On Tue, Aug 30, 2016 at 5:33 PM Ewen Cheslack-Postava 
wrote:

> +1 (binding)
>
> I think the major gap I notice in the PR is a lack of docs updates to
> notify people of the change. Given these are improvements and streams is
> still new, I wouldn't necessarily call them out as anything critical, but
> the changes should be noted somewhere.
>
> -Ewen
>
> On Tue, Aug 30, 2016 at 3:53 PM, Guozhang Wang  wrote:
>
> > +1 (binding)
> >
> > Thanks!
> >
> > On Tue, Aug 30, 2016 at 2:42 AM, Damian Guy 
> wrote:
> >
> > > +1
> > >
> > > On Mon, 29 Aug 2016 at 18:07 Eno Thereska 
> > wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > > On 29 Aug 2016, at 12:22, Bill Bejeck  wrote:
> > > > >
> > > > > +1
> > > > >
> > > > > On Mon, Aug 29, 2016 at 5:50 AM, Matthias J. Sax <
> > > matth...@confluent.io>
> > > > > wrote:
> > > > >
> > > > >> I’d like to initiate the voting process for KIP-77:
> > > > >>
> > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > >> 77%3A+Improve+Kafka+Streams+Join+Semantics
> > > > >>
> > > > >> -Matthias
> > > > >>
> > > > >>
> > > > >>
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>
>
>
> --
> Thanks,
> Ewen
>
-- 
Thanks,
Neha


[DISCUSS] KIP-79 - ListOffsetRequest v1 and offsetForTime() method in new consumer.

2016-08-30 Thread Becket Qin
Hi Kafka devs,

I created KIP-79 to allow consumer to precisely query the offsets based on
timestamp.

In short we propose to :
1. add a ListOffsetRequest/ListOffsetResponse v1, and
2. add an offsetForTime() method in new consumer.

The KIP wiki is the following:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868090

Comments are welcome.

Thanks,

Jiangjie (Becket) Qin


[jira] [Commented] (KAFKA-3410) Unclean leader election and "Halting because log truncation is not allowed"

2016-08-30 Thread Maysam Yabandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450881#comment-15450881
 ] 

Maysam Yabandeh commented on KAFKA-3410:


[~wushujames] I do not think KAFKA-3924 intended to fix this issue, it rather 
tries to alleviate the impact by avoid data loss caused by concurrent halts. 

Unfortunately the patch committed in KAFKA-3924 could result in 
[deadlock|https://issues.apache.org/jira/browse/KAFKA-3924?focusedCommentId=15419873&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15419873]
 as invoking System.exit in any thread known to shutdown hook would cause a 
circular dependency. The patch prepared in KAFKA-4039 would address this issue 
by changing the way we invoke System.exit. Even after KAFKA-4039, the issue in 
KAFKA-3410 or KAFKA-3861 will still remains and requires a separate solution.


> Unclean leader election and "Halting because log truncation is not allowed"
> ---
>
> Key: KAFKA-3410
> URL: https://issues.apache.org/jira/browse/KAFKA-3410
> Project: Kafka
>  Issue Type: Bug
>Reporter: James Cheng
>
> I ran into a scenario where one of my brokers would continually shutdown, 
> with the error message:
> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because 
> log truncation is not allowed for topic test, Current leader 1's latest 
> offset 0 is less than replica 2's latest offset 151 
> (kafka.server.ReplicaFetcherThread)
> I managed to reproduce it with the following scenario:
> 1. Start broker1, with unclean.leader.election.enable=false
> 2. Start broker2, with unclean.leader.election.enable=false
> 3. Create topic, single partition, with replication-factor 2.
> 4. Write data to the topic.
> 5. At this point, both brokers are in the ISR. Broker1 is the partition 
> leader.
> 6. Ctrl-Z on broker2. (Simulates a GC pause or a slow network) Broker2 gets 
> dropped out of ISR. Broker1 is still the leader. I can still write data to 
> the partition.
> 7. Shutdown Broker1. Hard or controlled, doesn't matter.
> 8. rm -rf the log directory of broker1. (This simulates a disk replacement or 
> full hardware replacement)
> 9. Resume broker2. It attempts to connect to broker1, but doesn't succeed 
> because broker1 is down. At this point, the partition is offline. Can't write 
> to it.
> 10. Resume broker1. Broker1 resumes leadership of the topic. Broker2 attempts 
> to join ISR, and immediately halts with the error message:
> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because 
> log truncation is not allowed for topic test, Current leader 1's latest 
> offset 0 is less than replica 2's latest offset 151 
> (kafka.server.ReplicaFetcherThread)
> I am able to recover by setting unclean.leader.election.enable=true on my 
> brokers.
> I'm trying to understand a couple things:
> * In step 10, why is broker1 allowed to resume leadership even though it has 
> no data?
> * In step 10, why is it necessary to stop the entire broker due to one 
> partition that is in this state? Wouldn't it be possible for the broker to 
> continue to serve traffic for all the other topics, and just mark this one as 
> unavailable?
> * Would it make sense to allow an operator to manually specify which broker 
> they want to become the new master? This would give me more control over how 
> much data loss I am willing to handle. In this case, I would want broker2 to 
> become the new master. Or, is that possible and I just don't know how to do 
> it?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4023) Add thread id as prefix in Kafka Streams thread logging

2016-08-30 Thread Bill Bejeck (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck updated KAFKA-4023:
---
Status: Patch Available  (was: In Progress)

> Add thread id as prefix in Kafka Streams thread logging
> ---
>
> Key: KAFKA-4023
> URL: https://issues.apache.org/jira/browse/KAFKA-4023
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
>  Labels: newbie++
>
> A single Kafka Streams instance can include multiple stream threads, and 
> hence without logging prefix it is difficult to determine which thread's 
> producing which log entries.
> We should
> 1) add the log-prefix as thread id in StreamThread logger, as well as its 
> contained StreamPartitionAssignor.
> 2) add the log-prefix as task id in StreamTask / StandbyTask, as well as its 
> contained RecordCollector and ProcessorStateManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4023) Add thread id as prefix in Kafka Streams thread logging

2016-08-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450744#comment-15450744
 ] 

ASF GitHub Bot commented on KAFKA-4023:
---

GitHub user bbejeck opened a pull request:

https://github.com/apache/kafka/pull/1803

KAFKA-4023: add thread/task id for logging prefix



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bbejeck/kafka 
KAFKA-4023_add_thread_id_prefix_for_logging

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1803.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1803


commit 63eab5f7ba150880fb73226b3099bd4385e574ad
Author: bbejeck 
Date:   2016-08-31T01:46:20Z

KAFKA-4023: add thread/task id for logging prefix




> Add thread id as prefix in Kafka Streams thread logging
> ---
>
> Key: KAFKA-4023
> URL: https://issues.apache.org/jira/browse/KAFKA-4023
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
>  Labels: newbie++
>
> A single Kafka Streams instance can include multiple stream threads, and 
> hence without logging prefix it is difficult to determine which thread's 
> producing which log entries.
> We should
> 1) add the log-prefix as thread id in StreamThread logger, as well as its 
> contained StreamPartitionAssignor.
> 2) add the log-prefix as task id in StreamTask / StandbyTask, as well as its 
> contained RecordCollector and ProcessorStateManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1803: KAFKA-4023: add thread/task id for logging prefix

2016-08-30 Thread bbejeck
GitHub user bbejeck opened a pull request:

https://github.com/apache/kafka/pull/1803

KAFKA-4023: add thread/task id for logging prefix



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bbejeck/kafka 
KAFKA-4023_add_thread_id_prefix_for_logging

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1803.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1803


commit 63eab5f7ba150880fb73226b3099bd4385e574ad
Author: bbejeck 
Date:   2016-08-31T01:46:20Z

KAFKA-4023: add thread/task id for logging prefix




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3144) report members with no assigned partitions in ConsumerGroupCommand

2016-08-30 Thread Alex Loddengaard (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450732#comment-15450732
 ] 

Alex Loddengaard commented on KAFKA-3144:
-

I'll pile on another feature request: it'd be nice to have a big warning shown 
when using the old consumer, saying "This will only show consumers using the 
old consumer API that depended on ZooKeeper." Same goes with using 
"--new-consumer" -- "This will only show consumers using the new consumer API, 
not consumers using ZooKeeper."

> report members with no assigned partitions in ConsumerGroupCommand
> --
>
> Key: KAFKA-3144
> URL: https://issues.apache.org/jira/browse/KAFKA-3144
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Vahid Hashemian
>  Labels: newbie
>
> A couple of suggestions on improving ConsumerGroupCommand. 
> 1. It would be useful to list members with no assigned partitions when doing 
> describe in ConsumerGroupCommand.
> 2. Currently, we show the client.id of each member when doing describe in 
> ConsumerGroupCommand. Since client.id is supposed to be the logical 
> application id, all members in the same group are supposed to set the same 
> client.id. So, it would be clearer if we show the client id as well as the 
> member id.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3144) report members with no assigned partitions in ConsumerGroupCommand

2016-08-30 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450713#comment-15450713
 ] 

Jun Rao commented on KAFKA-3144:


Related to this. It will also be useful to optionally show the consumer group 
coordinator when running the tool.

> report members with no assigned partitions in ConsumerGroupCommand
> --
>
> Key: KAFKA-3144
> URL: https://issues.apache.org/jira/browse/KAFKA-3144
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Vahid Hashemian
>  Labels: newbie
>
> A couple of suggestions on improving ConsumerGroupCommand. 
> 1. It would be useful to list members with no assigned partitions when doing 
> describe in ConsumerGroupCommand.
> 2. Currently, we show the client.id of each member when doing describe in 
> ConsumerGroupCommand. Since client.id is supposed to be the logical 
> application id, all members in the same group are supposed to set the same 
> client.id. So, it would be clearer if we show the client id as well as the 
> member id.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-78: Cluster Id

2016-08-30 Thread Guozhang Wang
The KIP doc is well written, thanks Ismael!

About logging / debugging with the cluster id: I think the random UUID
itself may not be very helpful for human-readable debugging information,
and we'd better use the cluster name mentioned in future work in logging.

Guozhang


On Tue, Aug 30, 2016 at 6:05 AM, Ismael Juma  wrote:

> Hi Harsha,
>
> It's a good question. If your broker connects to a different zookeeper root
> (whether on the same server or not), the outcome depends on whether a
> cluster id already exists there. If it does, then that cluster id will be
> used. If not, a new cluster id will be generated.
>
> The existing KIP doesn't try to solve that problem although we listed the
> following under "Future Improvements":
>
> 3. Use the cluster id to ensure that brokers are connected to the right
> > cluster: it's useful, but something that can be done later via a separate
> > KIP. One of the discussion points is how the broker knows its cluster id
> > (e.g. via a config or by storing it after the first connection to the
> > cluster).
>
>
> One of the options matches your suggestion to store the cluster id in
> `meta.properties`. We were thinking that it would make sense to reject the
> connection if the cluster id did not match. In that case, migrating Kafka
> to a different ZooKeeper root would require setting the cluster id on the
> new root before migrating. Another option is to do it automatically if the
> cluster id is not set in ZooKeeper (i.e. if there's a cluster id in
> `meta.properties` and there isn't one in ZooKeeper, set the cluster id
> in ZooKeeper). This is perhaps a bit too much magic as we don't
> auto-migrate anything else like ACLs when you change the ZooKeeper root.
>
> In any case, we think the above can be tackled in a subsequent KIP while
> the existing one is valuable in its current form. Does that make sense?
>
> Thanks,
> Ismael
>
> On Mon, Aug 29, 2016 at 6:51 PM, Harsha Chintalapani 
> wrote:
>
> > Ismael,
> >What happens when the cluster.id changes from initial value.
> > Ex,
> > Users changed their zookeeper.root and now new cluster.id generated. Do
> > you
> > think it would be useful to store this in meta.properties along with
> > broker.id. So that we only generate it once and store it in disk.
> >
> > Thanks,
> > Harsha
> >
> > On Sat, Aug 27, 2016 at 4:47 PM Gwen Shapira  wrote:
> >
> > > Thanks Ismael, this looks great.
> > >
> > > One of the things you mentioned is that cluster ID will be useful in
> > > log aggregation. Perhaps it makes sense to include cluster ID in the
> > > log? For example, as one of the things a broker logs after startup?
> > > And ideally clients would log that as well after successful parsing of
> > > MetadataResponse?
> > >
> > > Gwen
> > >
> > >
> > > On Sat, Aug 27, 2016 at 4:39 AM, Ismael Juma 
> wrote:
> > > > Hi all,
> > > >
> > > > We've posted "KIP-78: Cluster Id" for discussion:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 78%3A+Cluster+Id
> > > >
> > > > Please take a look. Your feedback is appreciated.
> > > >
> > > > Thanks,
> > > > Ismael
> > >
> > >
> > >
> > > --
> > > Gwen Shapira
> > > Product Manager | Confluent
> > > 650.450.2760 | @gwenshap
> > > Follow us: Twitter | blog
> > >
> >
>



-- 
-- Guozhang


Re: [VOTE] KIP-77: Improve Kafka Streams Join Semantics

2016-08-30 Thread Ewen Cheslack-Postava
+1 (binding)

I think the major gap I notice in the PR is a lack of docs updates to
notify people of the change. Given these are improvements and streams is
still new, I wouldn't necessarily call them out as anything critical, but
the changes should be noted somewhere.

-Ewen

On Tue, Aug 30, 2016 at 3:53 PM, Guozhang Wang  wrote:

> +1 (binding)
>
> Thanks!
>
> On Tue, Aug 30, 2016 at 2:42 AM, Damian Guy  wrote:
>
> > +1
> >
> > On Mon, 29 Aug 2016 at 18:07 Eno Thereska 
> wrote:
> >
> > > +1 (non-binding)
> > >
> > > > On 29 Aug 2016, at 12:22, Bill Bejeck  wrote:
> > > >
> > > > +1
> > > >
> > > > On Mon, Aug 29, 2016 at 5:50 AM, Matthias J. Sax <
> > matth...@confluent.io>
> > > > wrote:
> > > >
> > > >> I’d like to initiate the voting process for KIP-77:
> > > >>
> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > >> 77%3A+Improve+Kafka+Streams+Join+Semantics
> > > >>
> > > >> -Matthias
> > > >>
> > > >>
> > > >>
> > >
> > >
> >
>
>
>
> --
> -- Guozhang
>



-- 
Thanks,
Ewen


[jira] [Commented] (KAFKA-3410) Unclean leader election and "Halting because log truncation is not allowed"

2016-08-30 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450532#comment-15450532
 ] 

James Cheng commented on KAFKA-3410:


KAFKA-3924 has not fixed this issue.

I followed steps 1-10 from my original bug report, using Kafka 0.10.0.1 (which 
contains the fix for KAFKA-3924). At Step 10, instead of broker2 exiting, it 
instead does a controlled shutdown.

{noformat}
[2016-08-30 16:51:03,374] FATAL [ReplicaFetcherThread-0-1], Exiting because log 
truncation is not allowed for topic test, Current leader 1's latest offset 0 is 
less than replica 2's latest offset 1 (kafka.server.ReplicaFetcherThread)
[2016-08-30 16:51:03,374] INFO [Kafka Server 2], shutting down 
(kafka.server.KafkaServer)
[2016-08-30 16:51:03,375] INFO [Kafka Server 2], Starting controlled shutdown 
(kafka.server.KafkaServer)
[2016-08-30 16:51:03,397] INFO [Kafka Server 2], Controlled shutdown succeeded 
(kafka.server.KafkaServer)
[2016-08-30 16:51:03,399] INFO [Socket Server on Broker 2], Shutting down 
(kafka.network.SocketServer)
[2016-08-30 16:51:03,403] INFO [Socket Server on Broker 2], Shutdown completed 
(kafka.network.SocketServer)
[2016-08-30 16:51:03,404] INFO [Kafka Request Handler on Broker 2], shutting 
down (kafka.server.KafkaRequestHandlerPool)
{noformat}

So the broker still takes itself completely offline, in response to a problem 
with a single partition.

One thing I noticed, the broker output all those lines and did a controlled 
shutdown. However, the java process did not exit. It still stays alive. And 
there is still an entry for that broker in zookeeper at /brokers/ids/2.

So the controlled shutdown didn't successfully complete.

> Unclean leader election and "Halting because log truncation is not allowed"
> ---
>
> Key: KAFKA-3410
> URL: https://issues.apache.org/jira/browse/KAFKA-3410
> Project: Kafka
>  Issue Type: Bug
>Reporter: James Cheng
>
> I ran into a scenario where one of my brokers would continually shutdown, 
> with the error message:
> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because 
> log truncation is not allowed for topic test, Current leader 1's latest 
> offset 0 is less than replica 2's latest offset 151 
> (kafka.server.ReplicaFetcherThread)
> I managed to reproduce it with the following scenario:
> 1. Start broker1, with unclean.leader.election.enable=false
> 2. Start broker2, with unclean.leader.election.enable=false
> 3. Create topic, single partition, with replication-factor 2.
> 4. Write data to the topic.
> 5. At this point, both brokers are in the ISR. Broker1 is the partition 
> leader.
> 6. Ctrl-Z on broker2. (Simulates a GC pause or a slow network) Broker2 gets 
> dropped out of ISR. Broker1 is still the leader. I can still write data to 
> the partition.
> 7. Shutdown Broker1. Hard or controlled, doesn't matter.
> 8. rm -rf the log directory of broker1. (This simulates a disk replacement or 
> full hardware replacement)
> 9. Resume broker2. It attempts to connect to broker1, but doesn't succeed 
> because broker1 is down. At this point, the partition is offline. Can't write 
> to it.
> 10. Resume broker1. Broker1 resumes leadership of the topic. Broker2 attempts 
> to join ISR, and immediately halts with the error message:
> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because 
> log truncation is not allowed for topic test, Current leader 1's latest 
> offset 0 is less than replica 2's latest offset 151 
> (kafka.server.ReplicaFetcherThread)
> I am able to recover by setting unclean.leader.election.enable=true on my 
> brokers.
> I'm trying to understand a couple things:
> * In step 10, why is broker1 allowed to resume leadership even though it has 
> no data?
> * In step 10, why is it necessary to stop the entire broker due to one 
> partition that is in this state? Wouldn't it be possible for the broker to 
> continue to serve traffic for all the other topics, and just mark this one as 
> unavailable?
> * Would it make sense to allow an operator to manually specify which broker 
> they want to become the new master? This would give me more control over how 
> much data loss I am willing to handle. In this case, I would want broker2 to 
> become the new master. Or, is that possible and I just don't know how to do 
> it?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-33 - Add a time based log index

2016-08-30 Thread Becket Qin
Hi folks,

Here is another update on the change of time based log rolling.

After the latest implementation, we encountered KAFKA-4099. The issue is
that if users move replicas, for the messages in the old segments, the new
replica will create one log segment for each message. The root cause of
this is we are comparing the wall clock time with the message timestamp. A
solution to that is also described in KAFKA-4099, which is to change the
log rolling purely based on the timestamp in the messages. More
specifically, we roll out the log segment if the timestamp in the current
message is greater than the timestamp of the first message in the segment
by more than log.roll.ms. This approach is wall clock independent and
should solve the problem. With message.timestamp.difference.max.ms
configuration, we can achieve 1) the log segment will be rolled out in a
bounded time, 2) no excessively large timestamp will be accepted and cause
frequent log rolling.

Any concern regarding this change?

Thanks,

Jiangjie (Becket) Qin

On Mon, Jun 13, 2016 at 2:30 PM, Guozhang Wang  wrote:

> Thanks Jiangjie,
>
> I see the need for sensitive data purging, the above proposed change LGTM.
> One minor concern is that a wrongly marked timestamp on the first record
> could cause the segment to roll much later / earlier, though it may be
> rare.
>
> Guozhang
>
> On Fri, Jun 10, 2016 at 10:07 AM, Becket Qin  wrote:
>
> > Hi,
> >
> > During the implementation of KIP-33, we found it might be useful to have
> a
> > more deterministic time based log rolling than what proposed in the KIP.
> >
> > The current KIP proposal uses the largest timestamp in the segment for
> time
> > based rolling. The active log segment only rolls when there is no message
> > appended in max.roll.ms since the largest timestamp in the segment. i.e.
> > the rolling time may change if user keeping appending messages into the
> > segment. This may not be a desirable behavior for people who have
> sensitive
> > data and want to make sure they are removed after some time.
> >
> > To solve the above issue, we want to modify the KIP proposal regarding
> the
> > time based rolling to the following behavior. The time based log rolling
> > will be based on the first message with a timestamp in the log segment if
> > there is such a message. If no message in the segment has a timestamp,
> the
> > time based log rolling will still be based on log segment create time,
> > which is the same as we are doing now. The reasons we don't want to
> always
> > roll based on file create time are because 1) the message timestamp may
> be
> > assigned by clients which can be different from the create time of the
> log
> > segment file. 2) On some Linux, the file create time is not available, so
> > using segment file create time may not always work.
> >
> > Do people have any concern for this change? I will update the KIP if
> people
> > think the change is OK.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Tue, Apr 19, 2016 at 6:27 PM, Becket Qin 
> wrote:
> >
> > > Thanks Joel and Ismael. I just updated the KIP based on your feedback.
> > >
> > > KIP-33 has passed with +4 (binding) and +2 (non-binding)
> > >
> > > Thanks everyone for the reading, feedback and voting!
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Tue, Apr 19, 2016 at 5:25 PM, Ismael Juma 
> wrote:
> > >
> > >> Thanks Becket. I think it would be nice to update the KIP with regards
> > to
> > >> point 3 and 4.
> > >>
> > >> In any case, +1 (non-binding)
> > >>
> > >> Ismael
> > >>
> > >> On Tue, Apr 19, 2016 at 2:03 AM, Becket Qin 
> > wrote:
> > >>
> > >> > Thanks for the comments Ismael. Please see the replies inline.
> > >> >
> > >> > On Mon, Apr 18, 2016 at 6:50 AM, Ismael Juma 
> > wrote:
> > >> >
> > >> > > Hi Jiangjie,
> > >> > >
> > >> > > Thanks for the KIP, it's a nice improvement. Since it seems like
> we
> > >> have
> > >> > > been using the voting thread for discussion, I'll do the same.
> > >> > >
> > >> > > A few minor comments/questions:
> > >> > >
> > >> > > 1. The proposed name for the time index file
> > >> > `SegmentBaseOffset.timeindex`.
> > >> > > Would `SegmentBaseOffset.time-index` be a little better? It would
> > >> clearly
> > >> > > separate the type of index in case we add additional index types
> in
> > >> the
> > >> > > future.
> > >> > >
> > >> > I have no strong opinion on this, I am not adding any thing
> separator
> > >> > because it is more regex friendly.
> > >> > I am not sure about the other indexes, time and space seems to be
> two
> > >> most
> > >> > common dimensions.
> > >> >
> > >> > 2. When describing the time index entry, we say "Offset - the next
> > >> offset
> > >> > > when the time index entry is inserted". I found the mention of
> > `next`
> > >> a
> > >> > bit
> > >> > > confusing as it looks to me like the time index entry has the
> first
> > >> > offset
> > >> > > in the message set.
> > >> >
> > >> > This semantic meaning is a little different from the 

Re: [VOTE] KIP-77: Improve Kafka Streams Join Semantics

2016-08-30 Thread Guozhang Wang
+1 (binding)

Thanks!

On Tue, Aug 30, 2016 at 2:42 AM, Damian Guy  wrote:

> +1
>
> On Mon, 29 Aug 2016 at 18:07 Eno Thereska  wrote:
>
> > +1 (non-binding)
> >
> > > On 29 Aug 2016, at 12:22, Bill Bejeck  wrote:
> > >
> > > +1
> > >
> > > On Mon, Aug 29, 2016 at 5:50 AM, Matthias J. Sax <
> matth...@confluent.io>
> > > wrote:
> > >
> > >> I’d like to initiate the voting process for KIP-77:
> > >>
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >> 77%3A+Improve+Kafka+Streams+Join+Semantics
> > >>
> > >> -Matthias
> > >>
> > >>
> > >>
> >
> >
>



-- 
-- Guozhang


[jira] [Commented] (KAFKA-1981) Make log compaction point configurable

2016-08-30 Thread Eric Wasserman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450387#comment-15450387
 ] 

Eric Wasserman commented on KAFKA-1981:
---

Opened new pull request at:
https://github.com/apache/kafka/pull/1794

@junrao could you please take a look at this new PR (I closed the prior one). I 
switched to using the time index to determine the last message time in the 
LogSegments as we discussed.

> Make log compaction point configurable
> --
>
> Key: KAFKA-1981
> URL: https://issues.apache.org/jira/browse/KAFKA-1981
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.2.0
>Reporter: Jay Kreps
>  Labels: newbie++
> Attachments: KIP for Kafka Compaction Patch.md
>
>
> Currently if you enable log compaction the compactor will kick in whenever 
> you hit a certain "dirty ratio", i.e. when 50% of your data is uncompacted. 
> Other than this we don't give you fine-grained control over when compaction 
> occurs. In addition we never compact the active segment (since it is still 
> being written to).
> Other than this we don't really give you much control over when compaction 
> will happen. The result is that you can't really guarantee that a consumer 
> will get every update to a compacted topic--if the consumer falls behind a 
> bit it might just get the compacted version.
> This is usually fine, but it would be nice to make this more configurable so 
> you could set either a # messages, size, or time bound for compaction.
> This would let you say, for example, "any consumer that is no more than 1 
> hour behind will get every message."
> This should be relatively easy to implement since it just impacts the 
> end-point the compactor considers available for compaction. I think we 
> already have that concept, so this would just be some other overrides to add 
> in when calculating that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4099) Change the time based log rolling to base on the file create time instead of timestamp of the first message.

2016-08-30 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450309#comment-15450309
 ] 

Jun Rao commented on KAFKA-4099:


[~becket_qin], thanks for the proposal. Your suggestion sounds reasonable to 
me. It should cover the common usage of log based rolling. One case that this 
doesn't quite cover is that if the timestamp of the first message is large and 
a message with a much smaller timestamp shows up later. In this case, the 
rolling of the segment may not be quick enough. However, that should be rare. 
So, could you email the KIP mailing thread about this change and provide a 
patch?

> Change the time based log rolling to base on the file create time instead of 
> timestamp of the first message.
> 
>
> Key: KAFKA-4099
> URL: https://issues.apache.org/jira/browse/KAFKA-4099
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.10.1.0
>
>
> This is an issue introduced in KAFKA-3163. When partition relocation occurs, 
> the newly created replica may have messages with old timestamp and cause the 
> log segment rolling for each message. The fix is to change the log rolling 
> behavior back to based on segment create time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Kafka unable to process message

2016-08-30 Thread Ghosh, Achintya (Contractor)
Hi there,

What does the below error mean and  how to avoid this? I see this error one of 
the kafkaServer.out file when other broker is down.

And not able to process any message as we see o.a.k.c.c.i.AbstractCoordinator - 
Issuing group metadata request to broker 5  from application log

[2016-08-30 20:40:28,621] WARN [ReplicaFetcherThread-0-3], Error in fetch 
kafka.server.ReplicaFetcherThread$FetchRequest@8b198c3 
(kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to psaq3-wc.sys.comcast.net:61616 (id: 3 rack: 
null) failed
   at 
kafka.utils.NetworkClientBlockingOps$$anonfun$blockingReady$extension$2.apply(NetworkClientBlockingOps.scala:63)
   at 
kafka.utils.NetworkClientBlockingOps$$anonfun$blockingReady$extension$2.apply(NetworkClientBlockingOps.scala:59)
   at 
kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:112)
   at 
kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntil$extension(NetworkClientBlockingOps.scala:120)
   at 
kafka.utils.NetworkClientBlockingOps$.blockingReady$extension(NetworkClientBlockingOps.scala:59)
   at 
kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:239)
   at 
kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:229)
   at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
   at 
kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:107)
   at 
kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:98)
   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)


[jira] [Commented] (KAFKA-4099) Change the time based log rolling to base on the file create time instead of timestamp of the first message.

2016-08-30 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450166#comment-15450166
 ] 

Jiangjie Qin commented on KAFKA-4099:
-

[~junrao] Any thoughts? Thanks.

> Change the time based log rolling to base on the file create time instead of 
> timestamp of the first message.
> 
>
> Key: KAFKA-4099
> URL: https://issues.apache.org/jira/browse/KAFKA-4099
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.10.1.0
>
>
> This is an issue introduced in KAFKA-3163. When partition relocation occurs, 
> the newly created replica may have messages with old timestamp and cause the 
> log segment rolling for each message. The fix is to change the log rolling 
> behavior back to based on segment create time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1756: KAFKA-4058: Failure in org.apache.kafka.streams.in...

2016-08-30 Thread mjsax
Github user mjsax closed the pull request at:

https://github.com/apache/kafka/pull/1756


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4058) Failure in org.apache.kafka.streams.integration.ResetIntegrationTest.testReprocessingFromScratchAfterReset

2016-08-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450063#comment-15450063
 ] 

ASF GitHub Bot commented on KAFKA-4058:
---

Github user mjsax closed the pull request at:

https://github.com/apache/kafka/pull/1756


> Failure in 
> org.apache.kafka.streams.integration.ResetIntegrationTest.testReprocessingFromScratchAfterReset
> --
>
> Key: KAFKA-4058
> URL: https://issues.apache.org/jira/browse/KAFKA-4058
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Matthias J. Sax
>  Labels: test
>
> {code}
> java.lang.AssertionError: expected:<0> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> org.apache.kafka.streams.integration.ResetIntegrationTest.cleanGlobal(ResetIntegrationTest.java:225)
>   at 
> org.apache.kafka.streams.integration.ResetIntegrationTest.testReprocessingFromScratchAfterReset(ResetIntegrationTest.java:103)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
>   at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:109)
>   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:377)
>   at 
> org.gradle.internal.concurr

Build failed in Jenkins: kafka-0.10.0-jdk7 #197

2016-08-30 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-4058: Failure in

--
[...truncated 5728 lines...]

org.apache.kafka.streams.kstream.UnlimitedWindowsTest > 
shouldIncludeRecordsThatHappenedAfterWindowStart PASSED

org.apache.kafka.streams.kstream.UnlimitedWindowsTest > 
shouldExcludeRecordsThatHappenedBeforeWindowStart PASSED

org.apache.kafka.streams.kstream.JoinWindowsTest > afterBelowLower PASSED

org.apache.kafka.streams.kstream.JoinWindowsTest > nameMustNotBeEmpty PASSED

org.apache.kafka.streams.kstream.JoinWindowsTest > beforeOverUpper PASSED

org.apache.kafka.streams.kstream.JoinWindowsTest > nameMustNotBeNull PASSED

org.apache.kafka.streams.kstream.JoinWindowsTest > 
shouldHaveSaneEqualsAndHashCode PASSED

org.apache.kafka.streams.kstream.JoinWindowsTest > validWindows PASSED

org.apache.kafka.streams.kstream.JoinWindowsTest > 
timeDifferenceMustNotBeNegative PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > nameMustNotBeEmpty PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > nameMustNotBeNull PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > 
shouldHaveSaneEqualsAndHashCode PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > windowSizeMustNotBeZero 
PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > advanceIntervalMustNotBeZero 
PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > windowSizeMustNotBeNegative 
PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > 
advanceIntervalMustNotBeNegative PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > 
advanceIntervalMustNotBeLargerThanWindowSize PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > windowsForTumblingWindows 
PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > windowsForHoppingWindows 
PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > 
windowsForBarelyOverlappingHoppingWindows PASSED

org.apache.kafka.streams.kstream.KStreamBuilderTest > testMerge PASSED

org.apache.kafka.streams.kstream.KStreamBuilderTest > testFrom PASSED

org.apache.kafka.streams.kstream.KStreamBuilderTest > testNewName PASSED

org.apache.kafka.streams.KeyValueTest > shouldHaveSaneEqualsAndHashCode PASSED

org.apache.kafka.streams.KafkaStreamsTest > classMethod FAILED
java.lang.OutOfMemoryError: Java heap space

org.apache.kafka.streams.state.internals.OffsetCheckpointTest > testReadWrite 
PASSED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > testEvict 
PASSED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > 
testPutIfAbsent PASSED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > 
testRestoreWithDefaultSerdes PASSED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > 
testRestore PASSED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > 
testPutGetRange PASSED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > 
testPutGetRangeWithDefaultSerdes PASSED

org.apache.kafka.streams.state.internals.InMemoryKeyValueStoreTest > 
testPutIfAbsent PASSED

org.apache.kafka.streams.state.internals.InMemoryKeyValueStoreTest > 
testRestoreWithDefaultSerdes PASSED

org.apache.kafka.streams.state.internals.InMemoryKeyValueStoreTest > 
testRestore PASSED

org.apache.kafka.streams.state.internals.InMemoryKeyValueStoreTest > 
testPutGetRange PASSED

org.apache.kafka.streams.state.internals.InMemoryKeyValueStoreTest > 
testPutGetRangeWithDefaultSerdes PASSED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > 
testPutIfAbsent PASSED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > 
testRestoreWithDefaultSerdes PASSED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > testRestore 
PASSED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > 
testPutGetRange PASSED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > 
testPutGetRangeWithDefaultSerdes PASSED

org.apache.kafka.streams.state.internals.RocksDBWindowStoreTest > 
testPutAndFetch PASSED

org.apache.kafka.streams.state.internals.RocksDBWindowStoreTest > 
testPutAndFetchBefore PASSED

org.apache.kafka.streams.state.internals.RocksDBWindowStoreTest > 
testInitialLoading PASSED

org.apache.kafka.streams.state.internals.RocksDBWindowStoreTest > testRestore 
PASSED

org.apache.kafka.streams.state.internals.RocksDBWindowStoreTest > testRolling 
PASSED

org.apache.kafka.streams.state.internals.RocksDBWindowStoreTest > 
testSegmentMaintenance PASSED

org.apache.kafka.streams.state.internals.RocksDBWindowStoreTest > 
testPutSameKeyTimestamp PASSED

org.apache.kafka.streams.state.internals.RocksDBWindowStoreTest > 
testPutAndFetchAfter PASSED

org.apache.kafka.streams.state.internals.StoreChangeLoggerTest > testAddRemove 
PASSED

org.apache.kafka.streams.processor.DefaultPartitionGrouperTest > testGrouping 
PASSED

org.apache.kafka.streams.processor.internals.Processor

[jira] [Commented] (KAFKA-3993) Console producer drops data

2016-08-30 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450054#comment-15450054
 ] 

Roger Hoover commented on KAFKA-3993:
-

Thanks, [~cotedm].  I tried to set acks=all but apparently that doesn't work. 

> Console producer drops data
> ---
>
> Key: KAFKA-3993
> URL: https://issues.apache.org/jira/browse/KAFKA-3993
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>
> The console producer drops data when if the process exits too quickly.  I 
> suspect that the shutdown hook does not call close() or something goes wrong 
> during that close().
> Here's a simple to illustrate the issue:
> {noformat}
> export BOOTSTRAP_SERVERS=localhost:9092
> export TOPIC=bar
> export MESSAGES=1
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 
> --replication-factor 1 --topic "$TOPIC" \
> && echo "acks=all" > /tmp/producer.config \
> && echo "linger.ms=0" >> /tmp/producer.config \
> && seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list 
> "$BOOTSTRAP_SERVERS" --topic "$TOPIC" --producer-config /tmp/producer.config \
> && ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" 
> --new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC"
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-4095) When a topic is deleted and then created with the same name, 'committed' offsets are not reset

2016-08-30 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1544#comment-1544
 ] 

Vahid Hashemian edited comment on KAFKA-4095 at 8/30/16 8:13 PM:
-

[~glikson] Consumer group information is not removed if there is offset 
information stored in it. There are some related open JIRAs that aim at 
improving the information provided by the consumer group command (part of which 
is showing stored offset information even if a consumer group has no active 
member): 
[KAFKA-3144|https://issues.apache.org/jira/browse/KAFKA-3144?focusedCommentId=15115594&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15115594],
 [KAFKA-3853|https://issues.apache.org/jira/browse/KAFKA-3853].

I agree that if a topic is deleted its stored offset info should be removed 
from all consumer groups. This is something that currently doesn't seem to be 
handled, and that's why you see this behavior. I'll look further into this.


was (Author: vahid):
[~glikson] I believe what you are observing is the intended behavior. Consumer 
group information is not removed if there is offset information stored in it. 
There are some related open JIRAs that aim at improving the information 
provided by the consumer group command (part of which is showing stored offset 
information even if a consumer group has no active member): 
[KAFKA-3144|https://issues.apache.org/jira/browse/KAFKA-3144?focusedCommentId=15115594&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15115594],
 [KAFKA-3853|https://issues.apache.org/jira/browse/KAFKA-3853].

> When a topic is deleted and then created with the same name, 'committed' 
> offsets are not reset
> --
>
> Key: KAFKA-4095
> URL: https://issues.apache.org/jira/browse/KAFKA-4095
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Alex Glikson
>Assignee: Vahid Hashemian
>
> I encountered a very strange behavior of Kafka, which seems to be a bug.
> After deleting a topic and re-creating it with the same name, I produced 
> certain amount of new messages, and then opened a consumer with the same ID 
> that I used before re-creating the topic (with auto.commit=false, 
> auto.offset.reset=earliest). While the latest offsets seemed up to date, the 
> *committed* offset (returned by committed() method) was an *old* offset, from 
> the time before the topic has been deleted and created.
> I would have assumed that when a topic is deleted, all the associated 
> topic-partitions and consumer groups are recycled too.
> I am using the Java client version 0.9, with Kafka server 0.10.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4095) When a topic is deleted and then created with the same name, 'committed' offsets are not reset

2016-08-30 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1544#comment-1544
 ] 

Vahid Hashemian commented on KAFKA-4095:


[~glikson] I believe what you are observing is the intended behavior. Consumer 
group information is not removed if there is offset information stored in it. 
There are some related open JIRAs that aim at improving the information 
provided by the consumer group command (part of which is showing stored offset 
information even if a consumer group has no active member): 
[KAFKA-3144|https://issues.apache.org/jira/browse/KAFKA-3144?focusedCommentId=15115594&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15115594],
 [KAFKA-3853|https://issues.apache.org/jira/browse/KAFKA-3853].

> When a topic is deleted and then created with the same name, 'committed' 
> offsets are not reset
> --
>
> Key: KAFKA-4095
> URL: https://issues.apache.org/jira/browse/KAFKA-4095
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Alex Glikson
>Assignee: Vahid Hashemian
>
> I encountered a very strange behavior of Kafka, which seems to be a bug.
> After deleting a topic and re-creating it with the same name, I produced 
> certain amount of new messages, and then opened a consumer with the same ID 
> that I used before re-creating the topic (with auto.commit=false, 
> auto.offset.reset=earliest). While the latest offsets seemed up to date, the 
> *committed* offset (returned by committed() method) was an *old* offset, from 
> the time before the topic has been deleted and created.
> I would have assumed that when a topic is deleted, all the associated 
> topic-partitions and consumer groups are recycled too.
> I am using the Java client version 0.9, with Kafka server 0.10.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Kafka KIP meeting Aug 30 at 11:00am PST

2016-08-30 Thread Jun Rao
The following are the notes from today's KIP discussion.

   - KIP48 (delegation tokens): Harsha will update the wiki with more
   details on how to use delegation tokens and how to configure it.
   - KIP-78 (cluster id): There was discussion on adding human readable
   tags later. No major concerns.

The video will be uploaded soon in https://cwiki.apache.org/co
nfluence/display/KAFKA/Kafka+Improvement+Proposals .

Thanks,


On Fri, Aug 26, 2016 at 10:40 AM, Jun Rao  wrote:

> Hi, Everyone.,
>
> We plan to have a Kafka KIP meeting this coming Tuesday at 11:00am PST. If
> you plan to attend but haven't received an invite, please let me know.
> The following is the tentative agenda.
>
> Agenda:
> KIP-48: delegation tokens
>
> Thanks,
>
> Jun
>


[jira] [Commented] (KAFKA-3984) Broker doesn't retry reconnecting to an expired Zookeeper connection

2016-08-30 Thread Steve Niemitz (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449702#comment-15449702
 ] 

Steve Niemitz commented on KAFKA-3984:
--

A related issue here is ANY exception occurring in handleNewSession causes the 
broker to become stuck in an unknown state, where it thinks it's running but is 
not registered in zookeeper.

I'm disappointed to see that the JVM termination was removed, as that was what 
I'd recommend doing in that situation.  As it is now, its impossible to recover 
from this situation without manually restarting the broker.

> Broker doesn't retry reconnecting to an expired Zookeeper connection
> 
>
> Key: KAFKA-3984
> URL: https://issues.apache.org/jira/browse/KAFKA-3984
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Braedon Vickers
>
> We've been having issues with the network connectivity of our Kafka cluster, 
> and this seems to be triggering an issue where the brokers stop trying to 
> reconnect to Zookeeper, leaving us with a broken cluster even when the 
> network has recovered.
> When network issues begin we see {{java.net.NoRouteToHostException}} 
> exceptions from {{org.apache.zookeeper.ClientCnxn}} as it attempts to 
> re-establish the connection. If the network issue resolves itself while we 
> are only getting these errors the broker seems to reconnect fine.
> However, a lot of the time we end up with a message like this:
> {code}[2016-07-22 00:21:44,181] FATAL Could not establish session with 
> zookeeper (kafka.server.KafkaHealthcheck)
> org.I0Itec.zkclient.exception.ZkException: Unable to connect to  hosts>
>   at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:71)
>   at org.I0Itec.zkclient.ZkClient.reconnect(ZkClient.java:1279)
> ...
> Caused by: java.net.UnknownHostException: 
>   at java.net.InetAddress.getAllByName(InetAddress.java:1126)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1192)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.(StaticHostProvider.java:61)
>   at org.apache.zookeeper.ZooKeeper.(ZooKeeper.java:445)
> ...
> {code}
> (apologies for the partial stack traces - I'm having to try and reconstruct 
> them from a less than ideal centralised logging setup.)
> If this happens, the broker stops trying to reconnect to Zookeeper, and we 
> have to restart it.
> It looks like while the {{org.apache.zookeeper.Zookeeper}} client's state 
> isn't {{Expired}} it will keep retrying the connection, and will recover OK 
> when the network is back. However, once it changes to {{Expired}} (not 
> entirely sure how that happens - based on the session timeout perhaps?) 
> zkclient closes the existing client and attempts to create a new one. If the 
> network is still down, the client constructor throws a 
> {{java.net.UnknownHostException}}, zkclient calls 
> {{handleSessionEstablishmentError()}} on {{KafkaHealthcheck}}, 
> {{KafkaHealthcheck.handleSessionEstablishmentError()}} logs a "Fatal" error 
> and does nothing else.
> It seems like some form of retry needs to happen here, or the broker is stuck 
> with no Zookeeper connection 
> indefinitely.{{KafkaHealthcheck.handleSessionEstablishmentError()}} used to 
> kill the JVM, but that was removed in 
> https://issues.apache.org/jira/browse/KAFKA-2405. Killing the JVM would be 
> better than doing nothing, as then your init system could restart it, 
> allowing it to recover once the network was back.
> Our cluster is running 0.9.0.1, so not sure if it affects 0.10.0.0 as well. 
> However, it seems likely, as there doesn't seem to be any code changes in 
> kafka or zkclient that would affect this behaviour.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-72 Allow Sizing Incoming Request Queue in Bytes

2016-08-30 Thread radai
My apologies for the delay in response.

I agree with the concerns about OOM reading from the actual sockets and
blocking the network threads - messing with the request queue itself would
not do.

I propose instead a memory pool approach - the broker would have a non
blocking memory pool. upon reading the first 4 bytes out of a socket an
attempt would be made to acquire enough memory and if that attempt fails
the processing thread will move on to try and make progress with other
tasks.

I think Its simpler than mute/unmute because using mute/unmute would
require differentiating between sockets muted due to a request in progress
(normal current operation) and sockets muted due to lack of memory. sockets
of the 1st kind would be unmuted at the end of request processing (as it
happens right now) but the 2nd kind would require some sort of "unmute
watchdog" which is (i claim) more complicated than a memory pool. also a
memory pool is a more generic solution.

I've updated the KIP page (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-72%3A+Allow+putting+a+bound+on+memory+consumed+by+Incoming+requests)
to reflect the new proposed implementation, and i've also put up an inital
implementation proposal on github -
https://github.com/radai-rosenblatt/kafka/commits/broker-memory-pool. the
proposed code is not complete and tested yet (so probably buggy) but does
include the main points of modification.

the specific implementation of the pool on that branch also has a built in
safety net where memory that is acquired but not released (which is a bug)
is discovered when the garbage collector frees it and the capacity is
reclaimed.

On Tue, Aug 9, 2016 at 8:14 AM, Jun Rao  wrote:

> Radi,
>
> Yes, I got the benefit of bounding the request queue by bytes. My concern
> is the following if we don't change the behavior of processor blocking on
> queue full.
>
> If the broker truly doesn't have enough memory for buffering outstanding
> requests from all connections, we have to either hit OOM or block the
> processor. Both will be bad. I am not sure if one is clearly better than
> the other. In this case, the solution is probably to expand the cluster to
> reduce the per broker request load.
>
> If the broker actually has enough memory, we want to be able to configure
> the request queue in such a way that it never blocks. You can tell people
> to just set the request queue to be unbounded, which may scare them. If we
> do want to put a bound, it seems it's easier to configure the queue size
> based on # requests. Basically, we can tell people to set the queue size
> based on number of connections. If the queue is based on bytes, it's not
> clear how people should set it w/o causing the processor to block.
>
> Finally, Rajini has a good point. The ByteBuffer in the request object is
> allocated as soon as we see the first 4 bytes from the socket. So, I am not
> sure if just bounding the request queue itself is enough to bound the
> memory related to requests.
>
> Thanks,
>
> Jun
>
>
>
> On Mon, Aug 8, 2016 at 4:46 PM, radai  wrote:
>
> > I agree that filling up the request queue can cause clients to time out
> > (and presumably retry?). However, for the workloads where we expect this
> > configuration to be useful the alternative is currently an OOM crash.
> > In my opinion an initial implementation of this feature could be
> > constrained to a simple drop-in replacement of ArrayBlockingQueue
> > (conditional, opt-in) and further study of behavior patterns under load
> can
> > drive future changes to the API later when those behaviors are better
> > understood (like back-pressure, nop filler responses to avoid client
> > timeouts or whatever).
> >
> > On Mon, Aug 8, 2016 at 2:23 PM, Mayuresh Gharat <
> > gharatmayures...@gmail.com>
> > wrote:
> >
> > > Nice write up Radai.
> > > I think what Jun said is a valid concern.
> > > If I am not wrong as per the proposal, we are depending on the entire
> > > pipeline to flow smoothly from accepting requests to handling it,
> calling
> > > KafkaApis and handing back the responses.
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > >
> > > On Mon, Aug 8, 2016 at 12:22 PM, Joel Koshy 
> wrote:
> > >
> > > > >
> > > > > .
> > > > >>
> > > > >>
> > > > > Hi Becket,
> > > > >
> > > > > I don't think progress can be made in the processor's run loop if
> the
> > > > > queue fills up. i.e., I think Jun's point is that if the queue is
> > full
> > > > > (either due to the proposed max.bytes or today due to max.requests
> > > > hitting
> > > > > the limit) then processCompletedReceives will block and no further
> > > > progress
> > > > > can be made.
> > > > >
> > > >
> > > > I'm sorry - this isn't right. There will be progress as long as the
> API
> > > > handlers are able to pick requests off the request queue and add the
> > > > responses to the response queues (which are effectively unbounded).
> > > > However, the point is valid that blocking in the request channel's
> put
> > > has

[GitHub] kafka pull request #1802: wrong property was mentioned in doc

2016-08-30 Thread yourspraveen
GitHub user yourspraveen opened a pull request:

https://github.com/apache/kafka/pull/1802

wrong property was mentioned in doc

max.fetch.wait is mentioned in document where it should have been 
fetch.wait.max.ms

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yourspraveen/kafka patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1802.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1802


commit 14efa02a5e4119f611764cb29821f3516eccb6eb
Author: Praveen K Palaniswamy 
Date:   2016-08-30T16:50:30Z

wrong property was mentioned in doc

max.fetch.wait is mentioned in document where it should have been 
fetch.wait.max.ms




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1801: MINOR: Include TopicPartition in warning when log ...

2016-08-30 Thread dpkp
GitHub user dpkp opened a pull request:

https://github.com/apache/kafka/pull/1801

MINOR: Include TopicPartition in warning when log cleaner resets dirty 
offset

Typically this error condition is caused by topic-level configuration 
issues, so it is useful to say include which topic partition was reset for 
operator use when debugging the root cause.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dpkp/kafka 
log_topic_partition_reset_dirty_offset

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1801.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1801


commit aa5de24f3d302fda5e8bd1196d16a3723c81f4f3
Author: Dana Powers 
Date:   2016-08-30T16:09:31Z

MINOR: Include TopicPartition in warning message when log cleaner resets 
dirty offset




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


ApacheCon Seville CFP closes September 9th

2016-08-30 Thread Rich Bowen
It's traditional. We wait for the last minute to get our talk proposals
in for conferences.

Well, the last minute has arrived. The CFP for ApacheCon Seville closes
on September 9th, which is less than 2 weeks away. It's time to get your
talks in, so that we can make this the best ApacheCon yet.

It's also time to discuss with your developer and user community whether
there's a track of talks that you might want to propose, so that you
have more complete coverage of your project than a talk or two.

For Apache Big Data, the relevant URLs are:
Event details:
http://events.linuxfoundation.org/events/apache-big-data-europe
CFP:
http://events.linuxfoundation.org/events/apache-big-data-europe/program/cfp

For ApacheCon Europe, the relevant URLs are:
Event details: http://events.linuxfoundation.org/events/apachecon-europe
CFP: http://events.linuxfoundation.org/events/apachecon-europe/program/cfp

This year, we'll be reviewing papers "blind" - that is, looking at the
abstracts without knowing who the speaker is. This has been shown to
eliminate the "me and my buddies" nature of many tech conferences,
producing more diversity, and more new speakers. So make sure your
abstracts clearly explain what you'll be talking about.

For further updated about ApacheCon, follow us on Twitter, @ApacheCon,
or drop by our IRC channel, #apachecon on the Freenode IRC network.

-- 
Rich Bowen
WWW: http://apachecon.com/
Twitter: @ApacheCon


[jira] [Commented] (KAFKA-4101) java.lang.IllegalStateException in org.apache.kafka.common.network.Selector.channelOrFail

2016-08-30 Thread Andrey Savov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449055#comment-15449055
 ] 

Andrey Savov commented on KAFKA-4101:
-

Yes, there were a lot of closing sockets due to bad message size. 

> java.lang.IllegalStateException in 
> org.apache.kafka.common.network.Selector.channelOrFail
> -
>
> Key: KAFKA-4101
> URL: https://issues.apache.org/jira/browse/KAFKA-4101
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
> Environment: Ubuntu 14.04, AWS deployment, under heavy network load
>Reporter: Andrey Savov
>
> {code}
>  at org.apache.kafka.common.network.Selector.channelOrFail(Selector.java:467)
> at org.apache.kafka.common.network.Selector.mute(Selector.java:347)
> at 
> kafka.network.Processor$$anonfun$run$11.apply(SocketServer.scala:434)
> at 
> kafka.network.Processor$$anonfun$run$11.apply(SocketServer.scala:421)
> at scala.collection.Iterator$class.foreach(Iterator.scala:742)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at kafka.network.Processor.run(SocketServer.scala:421)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2063) Bound fetch response size

2016-08-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449045#comment-15449045
 ] 

ASF GitHub Bot commented on KAFKA-2063:
---

Github user nepal closed the pull request at:

https://github.com/apache/kafka/pull/1683


> Bound fetch response size
> -
>
> Key: KAFKA-2063
> URL: https://issues.apache.org/jira/browse/KAFKA-2063
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>
> Currently the only bound on the fetch response size is 
> max.partition.fetch.bytes * num_partitions. There are two problems:
> 1. First this bound is often large. You may chose 
> max.partition.fetch.bytes=1MB to enable messages of up to 1MB. However if you 
> also need to consume 1k partitions this means you may receive a 1GB response 
> in the worst case!
> 2. The actual memory usage is unpredictable. Partition assignment changes, 
> and you only actually get the full fetch amount when you are behind and there 
> is a full chunk of data ready. This means an application that seems to work 
> fine will suddenly OOM when partitions shift or when the application falls 
> behind.
> We need to decouple the fetch response size from the number of partitions.
> The proposal for doing this would be to add a new field to the fetch request, 
> max_bytes which would control the maximum data bytes we would include in the 
> response.
> The implementation on the server side would grab data from each partition in 
> the fetch request until it hit this limit, then send back just the data for 
> the partitions that fit in the response. The implementation would need to 
> start from a random position in the list of topics included in the fetch 
> request to ensure that in a case of backlog we fairly balance between 
> partitions (to avoid first giving just the first partition until that is 
> exhausted, then the next partition, etc).
> This setting will make the max.partition.fetch.bytes field in the fetch 
> request much less useful and we  should discuss just getting rid of it.
> I believe this also solves the same thing we were trying to address in 
> KAFKA-598. The max_bytes setting now becomes the new limit that would need to 
> be compared to max_message size. This can be much larger--e.g. setting a 50MB 
> max_bytes setting would be okay, whereas now if you set 50MB you may need to 
> allocate 50MB*num_partitions.
> This will require evolving the fetch request protocol version to add the new 
> field and we should do a KIP for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1683: KAFKA-2063: Add possibility to bound fetch respons...

2016-08-30 Thread nepal
Github user nepal closed the pull request at:

https://github.com/apache/kafka/pull/1683


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-78: Cluster Id

2016-08-30 Thread Ismael Juma
Hi Harsha,

It's a good question. If your broker connects to a different zookeeper root
(whether on the same server or not), the outcome depends on whether a
cluster id already exists there. If it does, then that cluster id will be
used. If not, a new cluster id will be generated.

The existing KIP doesn't try to solve that problem although we listed the
following under "Future Improvements":

3. Use the cluster id to ensure that brokers are connected to the right
> cluster: it's useful, but something that can be done later via a separate
> KIP. One of the discussion points is how the broker knows its cluster id
> (e.g. via a config or by storing it after the first connection to the
> cluster).


One of the options matches your suggestion to store the cluster id in
`meta.properties`. We were thinking that it would make sense to reject the
connection if the cluster id did not match. In that case, migrating Kafka
to a different ZooKeeper root would require setting the cluster id on the
new root before migrating. Another option is to do it automatically if the
cluster id is not set in ZooKeeper (i.e. if there's a cluster id in
`meta.properties` and there isn't one in ZooKeeper, set the cluster id
in ZooKeeper). This is perhaps a bit too much magic as we don't
auto-migrate anything else like ACLs when you change the ZooKeeper root.

In any case, we think the above can be tackled in a subsequent KIP while
the existing one is valuable in its current form. Does that make sense?

Thanks,
Ismael

On Mon, Aug 29, 2016 at 6:51 PM, Harsha Chintalapani 
wrote:

> Ismael,
>What happens when the cluster.id changes from initial value.
> Ex,
> Users changed their zookeeper.root and now new cluster.id generated. Do
> you
> think it would be useful to store this in meta.properties along with
> broker.id. So that we only generate it once and store it in disk.
>
> Thanks,
> Harsha
>
> On Sat, Aug 27, 2016 at 4:47 PM Gwen Shapira  wrote:
>
> > Thanks Ismael, this looks great.
> >
> > One of the things you mentioned is that cluster ID will be useful in
> > log aggregation. Perhaps it makes sense to include cluster ID in the
> > log? For example, as one of the things a broker logs after startup?
> > And ideally clients would log that as well after successful parsing of
> > MetadataResponse?
> >
> > Gwen
> >
> >
> > On Sat, Aug 27, 2016 at 4:39 AM, Ismael Juma  wrote:
> > > Hi all,
> > >
> > > We've posted "KIP-78: Cluster Id" for discussion:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-78%3A+Cluster+Id
> > >
> > > Please take a look. Your feedback is appreciated.
> > >
> > > Thanks,
> > > Ismael
> >
> >
> >
> > --
> > Gwen Shapira
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter | blog
> >
>


Re: [DISCUSS] KIP-78: Cluster Id

2016-08-30 Thread Ismael Juma
Hi Gwen,

That's a good point. I updated the KIP to mention that.

Thanks,
Ismael

On Sun, Aug 28, 2016 at 12:47 AM, Gwen Shapira  wrote:

> Thanks Ismael, this looks great.
>
> One of the things you mentioned is that cluster ID will be useful in
> log aggregation. Perhaps it makes sense to include cluster ID in the
> log? For example, as one of the things a broker logs after startup?
> And ideally clients would log that as well after successful parsing of
> MetadataResponse?
>
> Gwen
>
>
> On Sat, Aug 27, 2016 at 4:39 AM, Ismael Juma  wrote:
> > Hi all,
> >
> > We've posted "KIP-78: Cluster Id" for discussion:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-78%3A+Cluster+Id
> >
> > Please take a look. Your feedback is appreciated.
> >
> > Thanks,
> > Ismael
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>


[jira] [Updated] (KAFKA-3282) Change tools to use new consumer if zookeeper is not specified

2016-08-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3282:
---
Description: 
This only applies to tools that support the new consumer and it's similar to 
what we did with the producer for 0.9.0.0, but with a better compatibility 
story.

Part of this JIRA is updating the documentation to remove `--new-consumer` from 
command invocations where appropriate. An example where this will be the case 
is in the security documentation.

  was:
This only applies to tools that support the new consumer and it's similar to 
what we did with the producer for 0.9.0.0.

Part of this JIRA is updating the documentation to remove `--new-consumer` from 
command invocations where appropriate. An example where this will be the case 
is in the security documentation.


> Change tools to use new consumer if zookeeper is not specified
> --
>
> Key: KAFKA-3282
> URL: https://issues.apache.org/jira/browse/KAFKA-3282
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Ismael Juma
>Assignee: Arun Mahadevan
> Fix For: 0.10.1.0
>
>
> This only applies to tools that support the new consumer and it's similar to 
> what we did with the producer for 0.9.0.0, but with a better compatibility 
> story.
> Part of this JIRA is updating the documentation to remove `--new-consumer` 
> from command invocations where appropriate. An example where this will be the 
> case is in the security documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3282) Change tools to use new consumer if zookeeper is not specified

2016-08-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3282:
---
Summary: Change tools to use new consumer if zookeeper is not specified  
(was: Change tools to use --new-consumer by default)

> Change tools to use new consumer if zookeeper is not specified
> --
>
> Key: KAFKA-3282
> URL: https://issues.apache.org/jira/browse/KAFKA-3282
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Ismael Juma
>Assignee: Arun Mahadevan
> Fix For: 0.10.1.0
>
>
> This only applies to tools that support the new consumer and it's similar to 
> what we did with the producer for 0.9.0.0.
> Part of this JIRA is updating the documentation to remove `--new-consumer` 
> from command invocations where appropriate. An example where this will be the 
> case is in the security documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3282) Change tools to use --new-consumer by default and introduce --old-consumer

2016-08-30 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15448837#comment-15448837
 ] 

Ismael Juma commented on KAFKA-3282:


PR link https://github.com/apache/kafka/pull/1376

> Change tools to use --new-consumer by default and introduce --old-consumer
> --
>
> Key: KAFKA-3282
> URL: https://issues.apache.org/jira/browse/KAFKA-3282
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Ismael Juma
>Assignee: Arun Mahadevan
> Fix For: 0.10.1.0
>
>
> This only applies to tools that support the new consumer and it's similar to 
> what we did with the producer for 0.9.0.0.
> Part of this JIRA is updating the documentation to remove `--new-consumer` 
> from command invocations where appropriate. An example where this will be the 
> case is in the security documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3282) Change tools to use --new-consumer by default

2016-08-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3282:
---
Summary: Change tools to use --new-consumer by default  (was: Change tools 
to use --new-consumer by default and introduce --old-consumer)

> Change tools to use --new-consumer by default
> -
>
> Key: KAFKA-3282
> URL: https://issues.apache.org/jira/browse/KAFKA-3282
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Ismael Juma
>Assignee: Arun Mahadevan
> Fix For: 0.10.1.0
>
>
> This only applies to tools that support the new consumer and it's similar to 
> what we did with the producer for 0.9.0.0.
> Part of this JIRA is updating the documentation to remove `--new-consumer` 
> from command invocations where appropriate. An example where this will be the 
> case is in the security documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3283) Consider marking the new consumer as production-ready

2016-08-30 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15448807#comment-15448807
 ] 

Ismael Juma commented on KAFKA-3283:


[~hachikuji], do you think you could do a PR for the documentation change as 
per the mailing list discussion on this?

> Consider marking the new consumer as production-ready
> -
>
> Key: KAFKA-3283
> URL: https://issues.apache.org/jira/browse/KAFKA-3283
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Ismael Juma
>Assignee: Jason Gustafson
> Fix For: 0.10.1.0
>
>
> Ideally, we would:
> * Remove the beta label
> * Filling any critical gaps in functionality
> * Update the documentation on the old consumers to recommend the new consumer 
> (without deprecating the old consumer, however)
> Current target is 0.10.1.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Remove beta label from the new Java consumer

2016-08-30 Thread Ismael Juma
Thanks for the feedback everyone. Since Harsha said that he is OK either
way and everyone else is in favour, I think we should go ahead with this.
Since we committed to API stability for the new Java consumer in 0.10.0.0
via KIP-45, this is simply a documentation change and I don't think we need
an official vote thread (we didn't have one for the equivalent producer
change).

Ismael

On Mon, Aug 29, 2016 at 7:37 PM, Jay Kreps  wrote:

> +1 I talk to a lot of kafka users, and I would say > 75% of people doing
> new things are on the new consumer despite our warnings :-)
>
> -Jay
>
> On Thu, Aug 25, 2016 at 2:05 PM, Jason Gustafson 
> wrote:
>
> > I'm +1 also. I feel a lot more confident about this with all of the
> system
> > testing we now have in place (including the tests covering Streams and
> > Connect).
> >
> > -Jason
> >
> > On Thu, Aug 25, 2016 at 9:57 AM, Gwen Shapira  wrote:
> >
> > > Makes sense :)
> > >
> > > On Thu, Aug 25, 2016 at 9:40 AM, Neha Narkhede 
> > wrote:
> > > > Yeah, I'm supportive of this.
> > > >
> > > > On Thu, Aug 25, 2016 at 9:26 AM Ismael Juma 
> wrote:
> > > >
> > > >> Hi Gwen,
> > > >>
> > > >> We have a few recent stories of people using Connect and Streams in
> > > >> production. That means the new Java Consumer too. :)
> > > >>
> > > >> Ismael
> > > >>
> > > >> On Thu, Aug 25, 2016 at 5:09 PM, Gwen Shapira 
> > > wrote:
> > > >>
> > > >> > Originally, we suggested keeping the beta label until we know
> > someone
> > > >> > successfully uses the new consumer in production.
> > > >> >
> > > >> > We can consider the recent KIPs enough, but IMO it will be better
> if
> > > >> > someone with production deployment hanging out on our mailing list
> > > >> > will confirm good experience with the new consumer.
> > > >> >
> > > >> > Gwen
> > > >> >
> > > >> > On Wed, Aug 24, 2016 at 8:45 PM, Ismael Juma 
> > > wrote:
> > > >> > > Hi all,
> > > >> > >
> > > >> > > We currently say the following in our documentation:
> > > >> > >
> > > >> > > "As of the 0.9.0 release we have added a new Java consumer to
> > > replace
> > > >> our
> > > >> > > existing high-level ZooKeeper-based consumer and low-level
> > consumer
> > > >> APIs.
> > > >> > > This client is considered beta quality."[1]
> > > >> > >
> > > >> > > Since then, Jason and the community have done a lot of work to
> > > improve
> > > >> it
> > > >> > > (including KIP-41 and KIP-62), we declared it API stable in
> > 0.10.0.0
> > > >> and
> > > >> > > it's the only option for those that need security support. Yes,
> it
> > > >> still
> > > >> > > has bugs, but so does the old consumer and all development is
> > > currently
> > > >> > > focused on the new consumer.
> > > >> > >
> > > >> > > As such, I propose we remove the beta label for the next release
> > and
> > > >> > switch
> > > >> > > our tools to use the new consumer by default unless the
> zookeeper
> > > >> > > command-line option is present (for compatibility). This is
> > similar
> > > to
> > > >> > what
> > > >> > > we did it for the new producer in 0.9.0.0, but backwards
> > compatible.
> > > >> > >
> > > >> > > Thoughts?
> > > >> > >
> > > >> > > Ismael
> > > >> > >
> > > >> > > [1] http://kafka.apache.org/documentation.html#consumerapi
> > > >> >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > Gwen Shapira
> > > >> > Product Manager | Confluent
> > > >> > 650.450.2760 | @gwenshap
> > > >> > Follow us: Twitter | blog
> > > >> >
> > > >>
> > > > --
> > > > Thanks,
> > > > Neha
> > >
> > >
> > >
> > > --
> > > Gwen Shapira
> > > Product Manager | Confluent
> > > 650.450.2760 | @gwenshap
> > > Follow us: Twitter | blog
> > >
> >
>


Question regarding Producer and Duplicates

2016-08-30 Thread Florian Hussonnois
Hi all,

I am using kafka_2.11-0.10.0.1, my understanding is that the producer API
batches records per partition to send efficient requests. We can configure
batch.size to increase the throughtput.

However, in case of failure all records within the batch failed ? If that
is true,  does that mean that increasing batch.size can also increase the
number of duplicates in case of retries ?

Thanks,

Florian.


Re: [VOTE] KIP-77: Improve Kafka Streams Join Semantics

2016-08-30 Thread Damian Guy
+1

On Mon, 29 Aug 2016 at 18:07 Eno Thereska  wrote:

> +1 (non-binding)
>
> > On 29 Aug 2016, at 12:22, Bill Bejeck  wrote:
> >
> > +1
> >
> > On Mon, Aug 29, 2016 at 5:50 AM, Matthias J. Sax 
> > wrote:
> >
> >> I’d like to initiate the voting process for KIP-77:
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> 77%3A+Improve+Kafka+Streams+Join+Semantics
> >>
> >> -Matthias
> >>
> >>
> >>
>
>


[jira] [Assigned] (KAFKA-4081) Consumer API consumer new interface commitSyn does not verify the validity of offset

2016-08-30 Thread Mickael Maison (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison reassigned KAFKA-4081:
-

Assignee: Mickael Maison

> Consumer API consumer new interface commitSyn does not verify the validity of 
> offset
> 
>
> Key: KAFKA-4081
> URL: https://issues.apache.org/jira/browse/KAFKA-4081
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: lifeng
>Assignee: Mickael Maison
>
> Consumer API consumer new interface commitSyn synchronization update offset, 
> for the illegal offset successful return, illegal offset<0 or offset>hw



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)