[jira] [Commented] (KAFKA-1646) Improve consumer read performance for Windows

2015-06-22 Thread Honghai Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595450#comment-14595450
 ] 

Honghai Chen commented on KAFKA-1646:
-

See the commit 
https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=commit;h=ca758252c5a524fe6135a585282dd4bf747afef2
 
Many thanks to everyone for your help in making this happen.

> Improve consumer read performance for Windows
> -
>
> Key: KAFKA-1646
> URL: https://issues.apache.org/jira/browse/KAFKA-1646
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.8.1.1
> Environment: Windows
>Reporter: xueqiang wang
>Assignee: Honghai Chen
>  Labels: newbie, patch
> Fix For: 0.8.3
>
> Attachments: Improve consumer read performance for Windows.patch, 
> KAFKA-1646-truncate-off-trailing-zeros-on-broker-restart-if-bro.patch, 
> KAFKA-1646_20141216_163008.patch, KAFKA-1646_20150306_005526.patch, 
> KAFKA-1646_20150511_AddTestcases.patch, 
> KAFKA-1646_20150609_MergeToLatestTrunk.patch, 
> KAFKA-1646_20150616_FixFormat.patch, KAFKA-1646_20150618_235231.patch
>
>
> This patch is for the Windows platform only. On Windows, if more than one 
> replica is writing to disk, the segment log files will not be laid out 
> contiguously on disk, and consumer read performance drops significantly. This 
> fix allocates more disk space when rolling a new segment, which improves 
> consumer read performance on the NTFS file system. The patch doesn't affect 
> file allocation on other filesystems, since it only adds statements like 
> 'if(Os.isWindows)' or methods used only on Windows.
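
For illustration, the core idea can be sketched as follows (a minimal sketch only; the class and parameter names here are hypothetical, not the actual code from the patch):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hedged sketch of the idea only. On Windows/NTFS, preallocating the full
// segment size up front keeps the file contiguous on disk; trailing zeros
// are truncated when the segment is rolled or recovered.
class SegmentPreallocationSketch {
    static final boolean IS_WINDOWS =
            System.getProperty("os.name").toLowerCase().startsWith("windows");

    static RandomAccessFile openSegment(File file, long segmentSizeBytes) throws IOException {
        RandomAccessFile segment = new RandomAccessFile(file, "rw");
        if (IS_WINDOWS) {
            // Reserve the whole segment now instead of growing it append by append.
            segment.setLength(segmentSizeBytes);
        }
        return segment;
    }
}
```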



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-2290) OffsetIndex should open RandomAccessFile consistently

2015-06-22 Thread Chris Black (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Black reassigned KAFKA-2290:
--

Assignee: Chris Black

> OffsetIndex should open RandomAccessFile consistently
> -
>
> Key: KAFKA-2290
> URL: https://issues.apache.org/jira/browse/KAFKA-2290
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0
>Reporter: Jun Rao
>Assignee: Chris Black
>  Labels: newbie
>
> We open RandomAccessFile in "rw" mode in the constructor, but in "rws" mode 
> in resize(). We should use "rw" in both cases since it's more efficient.
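
To illustrate the difference (a hedged sketch, not the actual OffsetIndex code): "rws" forces every write to flush file content and metadata synchronously, which is why plain "rw" is faster.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hedged sketch only. The fix is simply to open in "rw" mode in both places.
class OffsetIndexModeSketch {
    private final File file;

    OffsetIndexModeSketch(File file) throws IOException {
        this.file = file;
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) { // constructor: "rw"
            raf.setLength(8 * 1024);
        }
    }

    void resize(long newSizeBytes) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) { // was "rws" before the patch
            raf.setLength(newSizeBytes);
        }
    }
}
```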



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2290) OffsetIndex should open RandomAccessFile consistently

2015-06-22 Thread Chris Black (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Black updated KAFKA-2290:
---
Attachment: KAFKA-2290.patch

> OffsetIndex should open RandomAccessFile consistently
> -
>
> Key: KAFKA-2290
> URL: https://issues.apache.org/jira/browse/KAFKA-2290
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0
>Reporter: Jun Rao
>Assignee: Chris Black
>  Labels: newbie
> Attachments: KAFKA-2290.patch
>
>
> We open RandomAccessFile in "rw" mode in the constructor, but in "rws" mode 
> in resize(). We should use "rw" in both cases since it's more efficient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-22 Thread Jiangjie Qin
Very useful KIP.
I have no clear opinion yet on where the framework would best live.
I agree with Gwen on the benefits we could get from having a separate project
for Copycat, but I still have a few questions:

1. As far as code is concerned, Copycat would be some data source adapters
+ Kafka clients. My guess is that for most people who want to contribute to
Copycat, the code would be on the data source adapter side, while the Kafka
clients part will rarely be touched. The framework itself probably only
needs to change when Kafka itself changes. If that is the case, it
seems cleaner to make the connectors a separate library project instead of
bundling a static framework along with them.

2. I am not sure whether it matters or not. Say I'm a user who only
wants to use Copycat while the Kafka cluster is maintained by someone else. If
we package Copycat with Kafka, I have to get all of Kafka even if I
only want Copycat. Is that necessary if we want to guarantee compatibility
between Copycat and Kafka?

That said, I think the packaging should depend on:
how tightly coupled Kafka and Copycat are vs. Connectors and Copycat, and
how easy it is for users to use.

Thanks,

Jiangjie (Becket) Qin

On 6/21/15, 9:24 PM, "Gwen Shapira"  wrote:

>Ah, I see this in rejected alternatives now. Sorry :)
>
>I actually prefer the idea of a separate project for framework +
>connectors over having the framework be part of Apache Kafka.
>
>Looking at nearby examples: Hadoop has created a wide ecosystem of
>projects, with Sqoop and Flume supplying connectors. Spark on the
>other hand keeps its subprojects as part of Apache Spark.
>
>When I look at both projects, I see that Flume and Sqoop created
>active communities (that was especially true a few years back when we
>were rapidly growing), with many companies contributing. Spark OTOH
>(and with all respect to my friends at Spark), has tons of
>contributors to its core, but much less activity on its sub-projects
>(for example, SparkStreaming). I strongly believe that SparkStreaming
>is under-served by being a part of Spark, especially when compared to
>Storm which is an independent project with its own community.
>
>The way I see it, connector frameworks are significantly simpler than
>distributed data stores (although they are pretty large in terms of
>code base, especially with copycat having its own distributed
>processing framework). Which means that the barrier to contribution to
>connector frameworks is lower, both for contributing to the framework
>and for contributing connectors. Separate communities can also have
>different rules regarding dependencies and committership.
>Committership is the big one, and IMO what prevents SparkStreaming
>from growing - I can give someone commit bit on Sqoop without giving
>them any power over Hadoop. Not true for Spark and SparkStreaming.
>This means that a CopyCat community (with its own sexy cat logo) will
>be able to attract more volunteers and grow at a faster pace than core
>Kafka, making it more useful to the community.
>
>The other part is that just like Kafka will be more useful with a
>connector framework, a connector framework tends to work better when
>there are lots of connectors. So if we decide to partition the Kafka /
>Connector framework / Connectors triad, I'm not sure which
>partitioning makes more sense. Giving CopyCat (I love the name. You
>can say things like "get the data into MySQL and CC Kafka") its own
>community will allow the CopyCat community to accept connector
>contributions, which is good for CopyCat and for Kafka adoption.
>Oracle and Netezza contributed connectors to Sqoop, they probably
>couldn't contribute it at all if Sqoop was inside Hadoop, and they
>can't really opensource their own stuff through Github, so it was a
>win for our community. This doesn't negate the possibility to create
>connectors for CopyCat and not contribute them to the community (like
>the popular Teradata connector for Sqoop).
>
>Regarding ease of use and adoption: Right now, a lot of people adopt
>Kafka as a stand-alone piece, while Hadoop usually shows up through a
>distribution. I expect that soon people will start adopting Kafka
>through distributions, so the framework and a collection of connectors
>will be part of every distribution. In the same way that no one thinks
>of Sqoop or Flume as stand alone projects. With a bunch of Kafka
>distributions out there, people will get Kafka + Framework +
>Connectors, with a core connection portion being common to multiple
>distributions - this will allow even easier adoption, while allowing
>the Kafka community to focus on core Kafka.
>
>The point about documentation that Ewen has made in the KIP is a good
>one. We definitely want to point people to the right place for export
>/ import tools. However, it sounds solvable with a few links.
>
>Sorry for the lengthy essay - I'm a bit passionate about connectors
>and want to see CopyCat off to a great start in life :)
>
>(BTW. I think Ap

[jira] [Commented] (KAFKA-2092) New partitioning for better load balancing

2015-06-22 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595510#comment-14595510
 ] 

Gianmarco De Francisci Morales commented on KAFKA-2092:
---

Any more thoughts on this?

> New partitioning for better load balancing
> --
>
> Key: KAFKA-2092
> URL: https://issues.apache.org/jira/browse/KAFKA-2092
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Reporter: Gianmarco De Francisci Morales
>Assignee: Jun Rao
> Attachments: KAFKA-2092-v1.patch
>
>
> We have recently studied the problem of load balancing in distributed stream 
> processing systems such as Samza [1].
> In particular, we focused on what happens when the key distribution of the 
> stream is skewed when using key grouping.
> We developed a new stream partitioning scheme (which we call Partial Key 
> Grouping). It achieves better load balancing than hashing while being more 
> scalable than round robin in terms of memory.
> In the paper we show a number of mining algorithms that are easy to implement 
> with partial key grouping, and whose performance can benefit from it. We 
> think that it might also be useful for a larger class of algorithms.
> PKG has already been integrated in Storm [2], and I would like to be able to 
> use it in Samza as well. As far as I understand, Kafka producers are the ones 
> that decide how to partition the stream (or Kafka topic).
> I do not have experience with Kafka, however partial key grouping is very 
> easy to implement: it requires just a few lines of code in Java when 
> implemented as a custom grouping in Storm [3].
> I believe it should be very easy to integrate.
> For all these reasons, I believe it will be a nice addition to Kafka/Samza. 
> If the community thinks it's a good idea, I will be happy to offer support in 
> the porting.
> References:
> [1] 
> https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
> [2] https://issues.apache.org/jira/browse/STORM-632
> [3] https://github.com/gdfm/partial-key-grouping
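
As a rough illustration of the technique (a hypothetical sketch, not the attached patch; the hash functions, the partitioner shape, and the producer-side load counters are assumptions based on the description above):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of Partial Key Grouping on the producer side: hash the
// key with two different functions and send to whichever candidate partition
// has seen less traffic so far, per the producer's local load estimate.
class PartialKeyGroupingSketch {
    private final long[] sent;  // messages sent per partition (local load estimate)

    PartialKeyGroupingSketch(int numPartitions) {
        this.sent = new long[numPartitions];
    }

    int partition(String key) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int candidate1 = Math.floorMod(Arrays.hashCode(bytes), sent.length);
        int candidate2 = Math.floorMod(31 * Arrays.hashCode(bytes) + 17, sent.length); // second hash, illustrative
        // "Power of two choices": pick the less loaded of the two candidates.
        int chosen = sent[candidate1] <= sent[candidate2] ? candidate1 : candidate2;
        sent[chosen]++;
        return chosen;
    }
}
```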



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-2092) New partitioning for better load balancing

2015-06-22 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589634#comment-14589634
 ] 

Gianmarco De Francisci Morales edited comment on KAFKA-2092 at 6/22/15 8:42 AM:


Thanks for your comment [~jkreps].
Indeed, this uses the load estimated at the producer to infer the load at the 
consumer. You might think this does not work but indeed it does in most cases 
(see [1] for details). I am not sure whether the lifecycle of the producer has 
any impact here. The goal is simply to send balanced partitions out of the 
producer.

Regarding the key=>partition mapping, yes this breaks the 1 key to 1 partition 
mapping. That's exactly the point, to offer a new primitive for stream 
partitioning. If you are doing word count you need a final aggregator as you 
say, but the aggregation is O(1) rather than O(W) [where W is the number of 
workers, i.e., parallelism of the operator]. Also, if you imagine building 
views out of these partitions, you can query 2 views rather than 1 to obtain 
the final answer (again, compared to shuffle grouping where you need W queries).

I disagree with your last point (and the results do too). Given that you have 2 
options, the imbalance is reduced by much more than a factor of 2, because you 
create options to offload part of the load on a heavy partition to the second 
choice, thus creating a network of "backup/offload" options to move to when one 
key becomes hot. It's like a network of interconnected pipes into which you 
pump a fluid.

What is true is that if the single heavy key is larger than (2/W)% of the 
stream, then this technique cannot help you achieve perfect load balance.


was (Author: azaroth):
Thanks for your comment [~jkreps].
Indeed, this uses the load estimated at the producer to infer the load at the 
consumer. You might think this does not work but indeed it does in most cases 
(see [1] for details). I am not sure whether the lifecycle of the producer has 
any impact here. The goal is simply to send balanced partitions out of the 
producer.

Regarding the key=>partition mapping, yes this breaks the 1 key to 1 partition 
mapping. That's exactly the point, to offer a new primitive for stream 
partitioning. If you are doing word count you need a final aggregator as you 
say, but the aggregation is O(1) rather than O(W) [where W is the number of 
workers, i.e., parallelism of the operator]. Also, if you imagine building 
views out of these partitions, you can query 2 views rather than 1 to obtain 
the final answer (again, compared to shuffle grouping where you need p queries).

I disagree with your last point (and the results do too). Given that you have 2 
options, the imbalance is reduced by much more than a factor of 2, because you 
create options to offload part of the load on a heavy partition to the second 
choice, thus creating a network of "backup/offload" options to move to when one 
key becomes hot. It's like a network of interconnected pipes into which you 
pump a fluid.

What is true is that if the single heavy key is larger than (2/W)% of the 
stream, then this technique cannot help you achieve perfect load balance.

> New partitioning for better load balancing
> --
>
> Key: KAFKA-2092
> URL: https://issues.apache.org/jira/browse/KAFKA-2092
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Reporter: Gianmarco De Francisci Morales
>Assignee: Jun Rao
> Attachments: KAFKA-2092-v1.patch
>
>
> We have recently studied the problem of load balancing in distributed stream 
> processing systems such as Samza [1].
> In particular, we focused on what happens when the key distribution of the 
> stream is skewed when using key grouping.
> We developed a new stream partitioning scheme (which we call Partial Key 
> Grouping). It achieves better load balancing than hashing while being more 
> scalable than round robin in terms of memory.
> In the paper we show a number of mining algorithms that are easy to implement 
> with partial key grouping, and whose performance can benefit from it. We 
> think that it might also be useful for a larger class of algorithms.
> PKG has already been integrated in Storm [2], and I would like to be able to 
> use it in Samza as well. As far as I understand, Kafka producers are the ones 
> that decide how to partition the stream (or Kafka topic).
> I do not have experience with Kafka, however partial key grouping is very 
> easy to implement: it requires just a few lines of code in Java when 
> implemented as a custom grouping in Storm [3].
> I believe it should be very easy to integrate.
> For all these reasons, I believe it will be a nice addition to Kafka/Samza. 
> If the community thinks it's a good idea, I will be happy to offer

[jira] [Updated] (KAFKA-2235) LogCleaner offset map overflow

2015-06-22 Thread Ivan Simoneko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Simoneko updated KAFKA-2235:
-
Attachment: KAFKA-2235_v2.patch

> LogCleaner offset map overflow
> --
>
> Key: KAFKA-2235
> URL: https://issues.apache.org/jira/browse/KAFKA-2235
> Project: Kafka
>  Issue Type: Bug
>  Components: core, log
>Affects Versions: 0.8.1, 0.8.2.0
>Reporter: Ivan Simoneko
>Assignee: Jay Kreps
> Attachments: KAFKA-2235_v1.patch, KAFKA-2235_v2.patch
>
>
> We've seen log cleaning generate an error for a topic with lots of small 
> messages. It seems that a cleanup map overflow is possible if a log segment 
> contains more unique keys than there are empty slots in the offsetMap. Checking 
> baseOffset and map utilization before processing a segment is not enough, 
> because it doesn't take the segment size (number of unique messages in the 
> segment) into account.
> I suggest estimating the upper bound of keys in a segment as the number of 
> messages in the segment and comparing it with the number of available slots in 
> the map (keeping the desired load factor in mind). This works whenever 
> an empty map is capable of holding all the keys for a single segment. If even a 
> single segment is unable to fit into an empty map, the cleanup process will 
> still fail. Perhaps there should be a limit on the log segment entry count?
> Here is the stack trace for this error:
> 2015-05-19 16:52:48,758 ERROR [kafka-log-cleaner-thread-0] 
> kafka.log.LogCleaner - [kafka-log-cleaner-thread-0], Error due to
> java.lang.IllegalArgumentException: requirement failed: Attempt to add a new 
> entry to a full offset map.
>at scala.Predef$.require(Predef.scala:233)
>at kafka.log.SkimpyOffsetMap.put(OffsetMap.scala:79)
>at 
> kafka.log.Cleaner$$anonfun$kafka$log$Cleaner$$buildOffsetMapForSegment$1.apply(LogCleaner.scala:543)
>at 
> kafka.log.Cleaner$$anonfun$kafka$log$Cleaner$$buildOffsetMapForSegment$1.apply(LogCleaner.scala:538)
>at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:32)
>at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>at kafka.message.MessageSet.foreach(MessageSet.scala:67)
>at 
> kafka.log.Cleaner.kafka$log$Cleaner$$buildOffsetMapForSegment(LogCleaner.scala:538)
>at 
> kafka.log.Cleaner$$anonfun$buildOffsetMap$3.apply(LogCleaner.scala:515)
>at 
> kafka.log.Cleaner$$anonfun$buildOffsetMap$3.apply(LogCleaner.scala:512)
>at scala.collection.immutable.Stream.foreach(Stream.scala:547)
>at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:512)
>at kafka.log.Cleaner.clean(LogCleaner.scala:307)
>at 
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:221)
>at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:199)
>at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2235) LogCleaner offset map overflow

2015-06-22 Thread Ivan Simoneko (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595798#comment-14595798
 ] 

Ivan Simoneko commented on KAFKA-2235:
--

[~junrao] thank you for the review. Please check patch v2. I think in most 
cases mentioning log.cleaner.dedupe.buffer.size should be enough, but since 
log.cleaner.threads is also used in determining the map size, I've added both of 
them. If someone increases the thread count and starts getting this message, 
they can easily understand the cause of the problem.

> LogCleaner offset map overflow
> --
>
> Key: KAFKA-2235
> URL: https://issues.apache.org/jira/browse/KAFKA-2235
> Project: Kafka
>  Issue Type: Bug
>  Components: core, log
>Affects Versions: 0.8.1, 0.8.2.0
>Reporter: Ivan Simoneko
>Assignee: Jay Kreps
> Attachments: KAFKA-2235_v1.patch, KAFKA-2235_v2.patch
>
>
> We've seen log cleaning generate an error for a topic with lots of small 
> messages. It seems that a cleanup map overflow is possible if a log segment 
> contains more unique keys than there are empty slots in the offsetMap. Checking 
> baseOffset and map utilization before processing a segment is not enough, 
> because it doesn't take the segment size (number of unique messages in the 
> segment) into account.
> I suggest estimating the upper bound of keys in a segment as the number of 
> messages in the segment and comparing it with the number of available slots in 
> the map (keeping the desired load factor in mind). This works whenever 
> an empty map is capable of holding all the keys for a single segment. If even a 
> single segment is unable to fit into an empty map, the cleanup process will 
> still fail. Perhaps there should be a limit on the log segment entry count?
> Here is the stack trace for this error:
> 2015-05-19 16:52:48,758 ERROR [kafka-log-cleaner-thread-0] 
> kafka.log.LogCleaner - [kafka-log-cleaner-thread-0], Error due to
> java.lang.IllegalArgumentException: requirement failed: Attempt to add a new 
> entry to a full offset map.
>at scala.Predef$.require(Predef.scala:233)
>at kafka.log.SkimpyOffsetMap.put(OffsetMap.scala:79)
>at 
> kafka.log.Cleaner$$anonfun$kafka$log$Cleaner$$buildOffsetMapForSegment$1.apply(LogCleaner.scala:543)
>at 
> kafka.log.Cleaner$$anonfun$kafka$log$Cleaner$$buildOffsetMapForSegment$1.apply(LogCleaner.scala:538)
>at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:32)
>at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>at kafka.message.MessageSet.foreach(MessageSet.scala:67)
>at 
> kafka.log.Cleaner.kafka$log$Cleaner$$buildOffsetMapForSegment(LogCleaner.scala:538)
>at 
> kafka.log.Cleaner$$anonfun$buildOffsetMap$3.apply(LogCleaner.scala:515)
>at 
> kafka.log.Cleaner$$anonfun$buildOffsetMap$3.apply(LogCleaner.scala:512)
>at scala.collection.immutable.Stream.foreach(Stream.scala:547)
>at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:512)
>at kafka.log.Cleaner.clean(LogCleaner.scala:307)
>at 
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:221)
>at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:199)
>at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-2290) OffsetIndex should open RandomAccessFile consistently

2015-06-22 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-2290.

   Resolution: Fixed
Fix Version/s: 0.8.3

Thanks for the patch. +1 and committed to trunk.

> OffsetIndex should open RandomAccessFile consistently
> -
>
> Key: KAFKA-2290
> URL: https://issues.apache.org/jira/browse/KAFKA-2290
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0
>Reporter: Jun Rao
>Assignee: Chris Black
>  Labels: newbie
> Fix For: 0.8.3
>
> Attachments: KAFKA-2290.patch
>
>
> We open RandomAccessFile in "rw" mode in the constructor, but in "rws" mode 
> in resize(). We should use "rw" in both cases since it's more efficient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 34789: Patch for KAFKA-2168

2015-06-22 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34789/#review88796
---

Ship it!


I did not review it thoroughly but the design looks clean to me. Great work!

I think we can check it in to unblock other JIRAs, and come back to it when 
necessary in the future for any follow-up work.


clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerWakeupException.java
 (lines 15 - 18)


We would like to have a serialVersionUID in any class that is Serializable. 
You can take a look at ConfigException for an example.
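
For instance, the requested change amounts to something like this sketch (the actual superclass and UID value may differ):

```java
// Hedged sketch; in the actual codebase the exception extends a Kafka
// exception type, and the UID value here is illustrative.
public class ConsumerWakeupException extends RuntimeException {
    private static final long serialVersionUID = 1L;
}
```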


- Guozhang Wang


On June 19, 2015, 4:19 p.m., Jason Gustafson wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34789/
> ---
> 
> (Updated June 19, 2015, 4:19 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2168
> https://issues.apache.org/jira/browse/KAFKA-2168
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-2168; refactored callback handling to prevent unnecessary requests
> 
> 
> KAFKA-2168; address review comments
> 
> 
> KAFKA-2168; fix rebase error and checkstyle issue
> 
> 
> KAFKA-2168; address review comments and add docs
> 
> 
> KAFKA-2168; handle polling with timeout 0
> 
> 
> KAFKA-2168; timeout=0 means return immediately
> 
> 
> KAFKA-2168; address review comments
> 
> 
> KAFKA-2168; address more review comments
> 
> 
> KAFKA-2168; updated for review comments
> 
> 
> Diffs
> -
> 
>   clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java 
> 8f587bc0705b65b3ef37c86e0c25bb43ab8803de 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecords.java 
> 1ca75f83d3667f7d01da1ae2fd9488fb79562364 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerWakeupException.java
>  PRE-CREATION 
>   clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java 
> 951c34c92710fc4b38d656e99d2a41255c60aeb7 
>   clients/src/main/java/org/apache/kafka/clients/consumer/MockConsumer.java 
> f50da825756938c193d7f07bee953e000e2627d9 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/OffsetResetStrategy.java
>  PRE-CREATION 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java
>  41cb9458f51875ac9418fce52f264b35adba92f4 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java
>  56281ee15cc33dfc96ff64d5b1e596047c7132a4 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/internals/Heartbeat.java
>  e7cfaaad296fa6e325026a5eee1aaf9b9c0fe1fe 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/internals/RequestFuture.java
>  PRE-CREATION 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/internals/SubscriptionState.java
>  cee75410127dd1b86c1156563003216d93a086b3 
>   clients/src/main/java/org/apache/kafka/common/utils/Utils.java 
> f73eedb030987f018d8446bb1dcd98d19fa97331 
>   
> clients/src/test/java/org/apache/kafka/clients/consumer/MockConsumerTest.java 
> 677edd385f35d4262342b567262c0b874876d25b 
>   
> clients/src/test/java/org/apache/kafka/clients/consumer/internals/CoordinatorTest.java
>  1454ab73df22cce028f41f74b970628829da4e9d 
>   
> clients/src/test/java/org/apache/kafka/clients/consumer/internals/FetcherTest.java
>  419541011d652becf0cda7a5e62ce813cddb1732 
>   
> clients/src/test/java/org/apache/kafka/clients/consumer/internals/HeartbeatTest.java
>  ecc78cedf59a994fcf084fa7a458fe9ed5386b00 
>   
> clients/src/test/java/org/apache/kafka/clients/consumer/internals/SubscriptionStateTest.java
>  e000cf8e10ebfacd6c9ee68d7b88ff8c157f73c6 
>   clients/src/test/java/org/apache/kafka/common/utils/UtilsTest.java 
> 2ebe3c21f611dc133a2dbb8c7dfb0845f8c21498 
> 
> Diff: https://reviews.apache.org/r/34789/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jason Gustafson
> 
>



[jira] [Resolved] (KAFKA-2235) LogCleaner offset map overflow

2015-06-22 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-2235.

   Resolution: Fixed
Fix Version/s: 0.8.3
 Assignee: Ivan Simoneko  (was: Jay Kreps)

Thanks for patch v2. +1. Committed to trunk after changing the following 
statement from testing < to <=.

  if (map.size + segmentSize <= maxDesiredMapSize)
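
For reference, a hedged transliteration of that guard into Java (the real check is Scala code in kafka.log.Cleaner; the names below are illustrative):

```java
// Bound the number of unique keys in a segment by its message count and make
// sure the offset map still has room at the desired load factor.
static boolean segmentFitsInOffsetMap(int currentMapSize, int segmentMessageCount,
                                      int mapSlots, double desiredLoadFactor) {
    int maxDesiredMapSize = (int) (mapSlots * desiredLoadFactor);
    return currentMapSize + segmentMessageCount <= maxDesiredMapSize; // <=, per the commit
}
```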


> LogCleaner offset map overflow
> --
>
> Key: KAFKA-2235
> URL: https://issues.apache.org/jira/browse/KAFKA-2235
> Project: Kafka
>  Issue Type: Bug
>  Components: core, log
>Affects Versions: 0.8.1, 0.8.2.0
>Reporter: Ivan Simoneko
>Assignee: Ivan Simoneko
> Fix For: 0.8.3
>
> Attachments: KAFKA-2235_v1.patch, KAFKA-2235_v2.patch
>
>
> We've seen log cleaning generate an error for a topic with lots of small 
> messages. It seems that a cleanup map overflow is possible if a log segment 
> contains more unique keys than there are empty slots in the offsetMap. Checking 
> baseOffset and map utilization before processing a segment is not enough, 
> because it doesn't take the segment size (number of unique messages in the 
> segment) into account.
> I suggest estimating the upper bound of keys in a segment as the number of 
> messages in the segment and comparing it with the number of available slots in 
> the map (keeping the desired load factor in mind). This works whenever 
> an empty map is capable of holding all the keys for a single segment. If even a 
> single segment is unable to fit into an empty map, the cleanup process will 
> still fail. Perhaps there should be a limit on the log segment entry count?
> Here is the stack trace for this error:
> 2015-05-19 16:52:48,758 ERROR [kafka-log-cleaner-thread-0] 
> kafka.log.LogCleaner - [kafka-log-cleaner-thread-0], Error due to
> java.lang.IllegalArgumentException: requirement failed: Attempt to add a new 
> entry to a full offset map.
>at scala.Predef$.require(Predef.scala:233)
>at kafka.log.SkimpyOffsetMap.put(OffsetMap.scala:79)
>at 
> kafka.log.Cleaner$$anonfun$kafka$log$Cleaner$$buildOffsetMapForSegment$1.apply(LogCleaner.scala:543)
>at 
> kafka.log.Cleaner$$anonfun$kafka$log$Cleaner$$buildOffsetMapForSegment$1.apply(LogCleaner.scala:538)
>at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:32)
>at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>at kafka.message.MessageSet.foreach(MessageSet.scala:67)
>at 
> kafka.log.Cleaner.kafka$log$Cleaner$$buildOffsetMapForSegment(LogCleaner.scala:538)
>at 
> kafka.log.Cleaner$$anonfun$buildOffsetMap$3.apply(LogCleaner.scala:515)
>at 
> kafka.log.Cleaner$$anonfun$buildOffsetMap$3.apply(LogCleaner.scala:512)
>at scala.collection.immutable.Stream.foreach(Stream.scala:547)
>at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:512)
>at kafka.log.Cleaner.clean(LogCleaner.scala:307)
>at 
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:221)
>at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:199)
>at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 35677: Patch for KAFKA-2288

2015-06-22 Thread Gwen Shapira

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35677/
---

(Updated June 22, 2015, 5:02 p.m.)


Review request for kafka.


Bugs: KAFKA-2288
https://issues.apache.org/jira/browse/KAFKA-2288


Repository: kafka


Description (updated)
---

removed logging of topic overrides


Diffs (updated)
-

  clients/src/main/java/org/apache/kafka/common/config/AbstractConfig.java 
bae528d31516679bed88ee61b408f209f185a8cc 
  core/src/main/scala/kafka/log/LogConfig.scala 
fc41132d2bf29439225ec581829eb479f98cc416 
  core/src/test/scala/unit/kafka/log/LogConfigTest.scala 
19dcb47f3f406b8d6c3668297450ab6b534e4471 
  core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
98a5b042a710d3c1064b0379db1d152efc9eabee 
  core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 
2428dbd7197a58cf4cad42ef82b385dab3a2b15e 

Diff: https://reviews.apache.org/r/35677/diff/


Testing
---


Thanks,

Gwen Shapira



[jira] [Updated] (KAFKA-2288) Follow-up to KAFKA-2249 - reduce logging and testing

2015-06-22 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-2288:

Attachment: KAFKA-2288_2015-06-22_10:02:27.patch

> Follow-up to KAFKA-2249 - reduce logging and testing
> 
>
> Key: KAFKA-2288
> URL: https://issues.apache.org/jira/browse/KAFKA-2288
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
> Attachments: KAFKA-2288.patch, KAFKA-2288_2015-06-22_10:02:27.patch
>
>
> As [~junrao] commented on KAFKA-2249, we have a needless test and we are 
> logging configuration for every single partition now. 
> Let's reduce the noise.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2288) Follow-up to KAFKA-2249 - reduce logging and testing

2015-06-22 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596233#comment-14596233
 ] 

Gwen Shapira commented on KAFKA-2288:
-

Updated reviewboard https://reviews.apache.org/r/35677/diff/
 against branch trunk

> Follow-up to KAFKA-2249 - reduce logging and testing
> 
>
> Key: KAFKA-2288
> URL: https://issues.apache.org/jira/browse/KAFKA-2288
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
> Attachments: KAFKA-2288.patch, KAFKA-2288_2015-06-22_10:02:27.patch
>
>
> As [~junrao] commented on KAFKA-2249, we have a needless test and we are 
> logging configuration for every single partition now. 
> Let's reduce the noise.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2120) Add a request timeout to NetworkClient

2015-06-22 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596242#comment-14596242
 ] 

Jason Gustafson commented on KAFKA-2120:


Hey [~mgharat] and [~becket_qin], how is this feature coming along? I'd be 
happy to help out if you don't have time for it.

> Add a request timeout to NetworkClient
> --
>
> Key: KAFKA-2120
> URL: https://issues.apache.org/jira/browse/KAFKA-2120
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Jiangjie Qin
>Assignee: Mayuresh Gharat
>
> Currently NetworkClient does not have a timeout setting for requests. So if 
> no response is received for a request, for reasons such as the broker being 
> down, the request will never be completed.
> The request timeout will also be used as an implicit timeout for some methods 
> such as KafkaProducer.flush() and KafkaProducer.close().
> KIP-19 was created for this public interface change.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-19+-+Add+a+request+timeout+to+NetworkClient
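
To sketch the idea behind KIP-19 (illustrative names only; this is not NetworkClient's actual API):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hedged sketch: give each in-flight request a send timestamp and expire
// requests that have waited longer than the timeout, instead of letting
// them hang forever when the broker never responds.
class RequestTimeoutSketch {
    static class InFlightRequest {
        final long sendTimeMs;
        InFlightRequest(long sendTimeMs) { this.sendTimeMs = sendTimeMs; }
    }

    private final List<InFlightRequest> inFlight = new ArrayList<>();
    private final long requestTimeoutMs;

    RequestTimeoutSketch(long requestTimeoutMs) { this.requestTimeoutMs = requestTimeoutMs; }

    void send(long nowMs) { inFlight.add(new InFlightRequest(nowMs)); }

    // Called periodically from the poll loop: fail expired requests.
    int expireOverdue(long nowMs) {
        int expired = 0;
        for (Iterator<InFlightRequest> it = inFlight.iterator(); it.hasNext(); ) {
            if (nowMs - it.next().sendTimeMs > requestTimeoutMs) {
                it.remove();  // in real code: complete the request's callback exceptionally
                expired++;
            }
        }
        return expired;
    }
}
```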



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2293) IllegalFormatConversionException in Partition.scala

2015-06-22 Thread Aditya Auradkar (JIRA)
Aditya Auradkar created KAFKA-2293:
--

 Summary: IllegalFormatConversionException in Partition.scala
 Key: KAFKA-2293
 URL: https://issues.apache.org/jira/browse/KAFKA-2293
 Project: Kafka
  Issue Type: Bug
Reporter: Aditya Auradkar
Assignee: Aditya Auradkar


ERROR [KafkaApis] [kafka-request-handler-9] [kafka-server] [] [KafkaApi-306] 
error when handling request Name: 
java.util.IllegalFormatConversionException: d != kafka.server.LogOffsetMetadata
at 
java.util.Formatter$FormatSpecifier.failConversion(Formatter.java:4302)
at java.util.Formatter$FormatSpecifier.printInteger(Formatter.java:2793)
at java.util.Formatter$FormatSpecifier.print(Formatter.java:2747)
at java.util.Formatter.format(Formatter.java:2520)
at java.util.Formatter.format(Formatter.java:2455)
at java.lang.String.format(String.java:2925)
at 
scala.collection.immutable.StringLike$class.format(StringLike.scala:266)
at scala.collection.immutable.StringOps.format(StringOps.scala:31)
at 
kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:253)
at 
kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:791)
at 
kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:788)
at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
at 
kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:788)
at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:433)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2245) Add response tests for ConsumerCoordinator

2015-06-22 Thread Joel Koshy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Koshy updated KAFKA-2245:
--
Resolution: Fixed
  Reviewer: Joel Koshy
Status: Resolved  (was: Patch Available)

Thanks for the patch - committed to trunk.

> Add response tests for ConsumerCoordinator
> --
>
> Key: KAFKA-2245
> URL: https://issues.apache.org/jira/browse/KAFKA-2245
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Onur Karaman
>Assignee: Onur Karaman
>Priority: Critical
> Fix For: 0.8.3
>
> Attachments: KAFKA-2245.patch, KAFKA-2245_2015-06-19_22:55:56.patch
>
>
> We can validate error codes from JoinGroupResponses and HeartbeatResponses. 
> Currently this includes:
> - JoinGroupRequest to the wrong coordinator returns 
> NOT_COORDINATOR_FOR_CONSUMER
> - JoinGroupRequest with an unknown partition assignment strategy returns 
> UNKNOWN_PARTITION_ASSIGNMENT_STRATEGY
> - JoinGroupRequest with an out-of-range session timeout returns 
> INVALID_SESSION_TIMEOUT
> - JoinGroupRequest on a brand new group with an unrecognized consumer id 
> produces UNKNOWN_CONSUMER_ID
> - JoinGroupRequest with mismatched partition assignment strategy compared to 
> the rest of the group returns INCONSISTENT_PARTITION_ASSIGNMENT_STRATEGY
> - JoinGroupRequest on an existing group with an unrecognized consumer id 
> produces UNKNOWN_CONSUMER_ID
> - A correct JoinGroupRequest returns NONE
> - HeartbeatRequest to the wrong coordinator returns 
> NOT_COORDINATOR_FOR_CONSUMER
> - HeartbeatRequest with an unknown group returns UNKNOWN_CONSUMER_ID
> - HeartbeatRequest with an unrecognized consumer id returns 
> UNKNOWN_CONSUMER_ID
> - HeartbeatRequest with generation id mismatch returns ILLEGAL_GENERATION
> - A correct HeartbeatRequest returns NONE
> We can validate the generation id increments on rebalance based on the 
> JoinGroupResponse.
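
As a sketch of what one such validation might look like (heavily hedged: the helper method, broker string, and error-code constant below are stand-ins, not the actual test harness):

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Hypothetical sketch of one validation from the list above.
public class CoordinatorResponseSketchTest {
    static final short NOT_COORDINATOR_FOR_CONSUMER = 16;  // assumed code, illustrative

    // Stand-in for "send a JoinGroupRequest to the given broker, return its error code".
    short sendJoinGroupRequest(String broker, String group) {
        return NOT_COORDINATOR_FOR_CONSUMER;  // stubbed for the sketch
    }

    @Test
    public void joinGroupToWrongCoordinatorReturnsError() {
        short errorCode = sendJoinGroupRequest("wrong-broker:9092", "test-group");
        assertEquals(NOT_COORDINATOR_FOR_CONSUMER, errorCode);
    }
}
```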



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2120) Add a request timeout to NetworkClient

2015-06-22 Thread Mayuresh Gharat (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596265#comment-14596265
 ] 

Mayuresh Gharat commented on KAFKA-2120:


Hi [~jasong35], we began work on this today. We will try to get it 
done asap.

Thanks,

Mayuresh

> Add a request timeout to NetworkClient
> --
>
> Key: KAFKA-2120
> URL: https://issues.apache.org/jira/browse/KAFKA-2120
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Jiangjie Qin
>Assignee: Mayuresh Gharat
>
> Currently NetworkClient does not have a timeout setting for requests. So if 
> no response is received for a request, for reasons such as the broker being 
> down, the request will never be completed.
> The request timeout will also be used as an implicit timeout for some methods 
> such as KafkaProducer.flush() and KafkaProducer.close().
> KIP-19 was created for this public interface change.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-19+-+Add+a+request+timeout+to+NetworkClient



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 35734: Patch for KAFKA-2293

2015-06-22 Thread Aditya Auradkar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35734/
---

Review request for kafka.


Bugs: KAFKA-2293
https://issues.apache.org/jira/browse/KAFKA-2293


Repository: kafka


Description
---

Fix for 2293


Diffs
-

  core/src/main/scala/kafka/cluster/Partition.scala 
6cb647711191aee8d36e9ff15bdc2af4f1c95457 

Diff: https://reviews.apache.org/r/35734/diff/


Testing
---


Thanks,

Aditya Auradkar



[jira] [Commented] (KAFKA-2293) IllegalFormatConversionException in Partition.scala

2015-06-22 Thread Aditya A Auradkar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596296#comment-14596296
 ] 

Aditya A Auradkar commented on KAFKA-2293:
--

Created reviewboard https://reviews.apache.org/r/35734/diff/
 against branch origin/trunk

> IllegalFormatConversionException in Partition.scala
> ---
>
> Key: KAFKA-2293
> URL: https://issues.apache.org/jira/browse/KAFKA-2293
> Project: Kafka
>  Issue Type: Bug
>Reporter: Aditya Auradkar
>Assignee: Aditya Auradkar
> Attachments: KAFKA-2293.patch
>
>
> ERROR [KafkaApis] [kafka-request-handler-9] [kafka-server] [] [KafkaApi-306] 
> error when handling request Name: 
> java.util.IllegalFormatConversionException: d != 
> kafka.server.LogOffsetMetadata
> at 
> java.util.Formatter$FormatSpecifier.failConversion(Formatter.java:4302)
> at 
> java.util.Formatter$FormatSpecifier.printInteger(Formatter.java:2793)
> at java.util.Formatter$FormatSpecifier.print(Formatter.java:2747)
> at java.util.Formatter.format(Formatter.java:2520)
> at java.util.Formatter.format(Formatter.java:2455)
> at java.lang.String.format(String.java:2925)
> at 
> scala.collection.immutable.StringLike$class.format(StringLike.scala:266)
> at scala.collection.immutable.StringOps.format(StringOps.scala:31)
> at 
> kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:253)
> at 
> kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:791)
> at 
> kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:788)
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
> at 
> kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:788)
> at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:433)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2293) IllegalFormatConversionException in Partition.scala

2015-06-22 Thread Aditya A Auradkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya A Auradkar updated KAFKA-2293:
-
Attachment: KAFKA-2293.patch

> IllegalFormatConversionException in Partition.scala
> ---
>
> Key: KAFKA-2293
> URL: https://issues.apache.org/jira/browse/KAFKA-2293
> Project: Kafka
>  Issue Type: Bug
>Reporter: Aditya Auradkar
>Assignee: Aditya Auradkar
> Attachments: KAFKA-2293.patch
>
>
> ERROR [KafkaApis] [kafka-request-handler-9] [kafka-server] [] [KafkaApi-306] 
> error when handling request Name: 
> java.util.IllegalFormatConversionException: d != 
> kafka.server.LogOffsetMetadata
> at 
> java.util.Formatter$FormatSpecifier.failConversion(Formatter.java:4302)
> at 
> java.util.Formatter$FormatSpecifier.printInteger(Formatter.java:2793)
> at java.util.Formatter$FormatSpecifier.print(Formatter.java:2747)
> at java.util.Formatter.format(Formatter.java:2520)
> at java.util.Formatter.format(Formatter.java:2455)
> at java.lang.String.format(String.java:2925)
> at 
> scala.collection.immutable.StringLike$class.format(StringLike.scala:266)
> at scala.collection.immutable.StringOps.format(StringOps.scala:31)
> at 
> kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:253)
> at 
> kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:791)
> at 
> kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:788)
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
> at 
> kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:788)
> at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:433)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2293) IllegalFormatConversionException in Partition.scala

2015-06-22 Thread Aditya Auradkar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596298#comment-14596298
 ] 

Aditya Auradkar commented on KAFKA-2293:


[~junrao] Can you take a look at this minor fix?

> IllegalFormatConversionException in Partition.scala
> ---
>
> Key: KAFKA-2293
> URL: https://issues.apache.org/jira/browse/KAFKA-2293
> Project: Kafka
>  Issue Type: Bug
>Reporter: Aditya Auradkar
>Assignee: Aditya Auradkar
> Attachments: KAFKA-2293.patch
>
>
> ERROR [KafkaApis] [kafka-request-handler-9] [kafka-server] [] [KafkaApi-306] 
> error when handling request Name: 
> java.util.IllegalFormatConversionException: d != 
> kafka.server.LogOffsetMetadata
> at 
> java.util.Formatter$FormatSpecifier.failConversion(Formatter.java:4302)
> at 
> java.util.Formatter$FormatSpecifier.printInteger(Formatter.java:2793)
> at java.util.Formatter$FormatSpecifier.print(Formatter.java:2747)
> at java.util.Formatter.format(Formatter.java:2520)
> at java.util.Formatter.format(Formatter.java:2455)
> at java.lang.String.format(String.java:2925)
> at 
> scala.collection.immutable.StringLike$class.format(StringLike.scala:266)
> at scala.collection.immutable.StringOps.format(StringOps.scala:31)
> at 
> kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:253)
> at 
> kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:791)
> at 
> kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:788)
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
> at 
> kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:788)
> at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:433)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2294) javadoc compile error due to illegal <p/>, build failing (jdk 8)

2015-06-22 Thread Jeremy Fields (JIRA)
Jeremy Fields created KAFKA-2294:


 Summary: javadoc compile error due to illegal <p/>, build failing 
(jdk 8)
 Key: KAFKA-2294
 URL: https://issues.apache.org/jira/browse/KAFKA-2294
 Project: Kafka
  Issue Type: Bug
Reporter: Jeremy Fields


Quick one,

kafka/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java:525:
 error: self-closing element not allowed
 * <p/>

This is causing the build to fail under Java 8 due to strict HTML checking.

Replace that <p/> with <p>
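
In other words (an illustrative before/after, not the actual line 525):

```java
// Before (fails under JDK 8 doclint: "self-closing element not allowed"):
//
//   /** First paragraph.
//     * <p/>
//     * Second paragraph. */
//
// After (valid javadoc HTML):
//
//   /** First paragraph.
//     * <p>
//     * Second paragraph. */
```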

Regards,



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2276) Initial patch for KIP-25

2015-06-22 Thread Geoffrey Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596624#comment-14596624
 ] 

Geoffrey Anderson commented on KAFKA-2276:
--

Pull request is here: https://github.com/apache/kafka/pull/70

> Initial patch for KIP-25
> 
>
> Key: KAFKA-2276
> URL: https://issues.apache.org/jira/browse/KAFKA-2276
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoffrey Anderson
>Assignee: Geoffrey Anderson
>
> Submit initial patch for KIP-25 
> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-25+-+System+test+improvements)
> This patch should contain a few Service classes and a few tests which can 
> serve as examples 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [GitHub] kafka pull request: Kafka 2276

2015-06-22 Thread Geoffrey Anderson
Hi,

I'm pinging the dev list regarding KAFKA-2276 (KIP-25 initial patch) again
since it sounds like at least one person I spoke with did not see the
initial pull request.

Pull request: https://github.com/apache/kafka/pull/70/
JIRA: https://issues.apache.org/jira/browse/KAFKA-2276

Thanks!
Geoff


On Tue, Jun 16, 2015 at 2:50 PM, granders  wrote:

> GitHub user granders opened a pull request:
>
> https://github.com/apache/kafka/pull/70
>
> Kafka 2276
>
> Initial patch for KIP-25
>
> Note that to install ducktape, do *not* use pip. Instead:
>
> ```
> $ git clone g...@github.com:confluentinc/ducktape.git
> $ cd ducktape
> $ python setup.py install
> ```
>
>
> You can merge this pull request into a Git repository by running:
>
> $ git pull https://github.com/confluentinc/kafka KAFKA-2276
>
> Alternatively you can review and apply these changes as the patch at:
>
> https://github.com/apache/kafka/pull/70.patch
>
> To close this pull request, make a commit to your master/trunk branch
> with (at least) the following in the commit message:
>
> This closes #70
>
> 
> commit 81e41562f3836e95e89e12f215c82b1b2d505381
> Author: Liquan Pei 
> Date:   2015-04-24T01:32:54Z
>
> Bootstrap Kafka system tests
>
> commit f1914c3ba9b52d0f8db3989c8b031127b42ac59e
> Author: Liquan Pei 
> Date:   2015-04-24T01:33:44Z
>
> Merge pull request #2 from confluentinc/system_tests
>
> Bootstrap Kafka system tests
>
> commit a2789885806f98dcd1fd58edc9a10a30e4bd314c
> Author: Geoff Anderson 
> Date:   2015-05-26T22:21:23Z
>
> fixed typos
>
> commit 07cd1c66a952ee29fc3c8e85464acb43a6981b8a
> Author: Geoff Anderson 
> Date:   2015-05-26T22:22:14Z
>
> Added simple producer which prints status of produced messages to
> stdout.
>
> commit da94b8cbe79e6634cc32fbe8f6deb25388923029
> Author: Geoff Anderson 
> Date:   2015-05-27T21:07:20Z
>
> Added number of messages option.
>
> commit 212b39a2d75027299fbb1b1008d463a82aab
> Author: Geoff Anderson 
> Date:   2015-05-27T22:35:06Z
>
> Added some metadata to producer output.
>
> commit 8b4b1f2aa9681632ef65aa92dfd3066cd7d62851
> Author: Geoff Anderson 
> Date:   2015-05-29T23:38:32Z
>
> Minor updates to VerboseProducer
>
> commit c0526fe44cea739519a0889ebe9ead01b406b365
> Author: Geoff Anderson 
> Date:   2015-06-01T02:27:15Z
>
> Updates per review comments.
>
> commit bc009f218e00241cbdd23931d01b52c442eef6b7
> Author: Geoff Anderson 
> Date:   2015-06-01T02:28:28Z
>
> Got rid of VerboseProducer in core (moved to clients)
>
> commit 475423bb642ac8f816e8080f891867a6362c17fa
> Author: Geoff Anderson 
> Date:   2015-06-01T04:05:09Z
>
> Convert class to string before adding to json object.
>
> commit 0a5de8e0590e3a8dce1a91769ad41497b5e07d17
> Author: Geoff Anderson 
> Date:   2015-06-02T22:46:52Z
>
> Fixed checkstyle errors. Changed name to VerifiableProducer. Added
> synchronization for thread safety on println statements.
>
> commit 9100417ce0717a71c822c5a279fe7858bfe7a7ee
> Author: Geoff Anderson 
> Date:   2015-06-03T19:50:11Z
>
> Updated command-line options for VerifiableProducer. Extracted
> throughput logic to make it reusable.
>
> commit 1228eefc4e52b58c214b3ad45feab36a475d5a66
> Author: Geoff Anderson 
> Date:   2015-06-04T01:09:14Z
>
> Renamed throttler
>
> commit 6842ed1ffad62a84df67a0f0b6a651a6df085d12
> Author: Geoff Anderson 
> Date:   2015-06-04T01:12:11Z
>
> left out a file from last commit
>
> commit d586fb0eb63409807c02f280fae786cec55fb348
> Author: Geoff Anderson 
> Date:   2015-06-04T01:22:34Z
>
> Updated comments to reflect that throttler is not message-specific
>
> commit a80a4282ba9a288edba7cdf409d31f01ebf3d458
> Author: Geoff Anderson 
> Date:   2015-06-04T20:47:21Z
>
> Added shell program for VerifiableProducer.
>
> commit 51a94fd6ece926bcdd864af353efcf4c4d1b8ad8
> Author: Geoff Anderson 
> Date:   2015-06-04T20:55:02Z
>
> Use argparse4j instead of joptsimple. ThroughputThrottler now has more
> intuitive behavior when targetThroughput is 0.
>
> commit 632be12d2384bfd1ed3b057913dfd363cab71726
> Author: Geoff 
> Date:   2015-06-04T22:22:44Z
>
> Merge pull request #3 from confluentinc/verbose-client
>
> Verbose client
>
> commit fc7c81c1f6cce497c19da34f7c452ee44800ab6d
> Author: Geoff Anderson 
> Date:   2015-06-11T01:01:39Z
>
> added setup.py
>
> commit 884b20e3a7ce7a94f22594782322e4366b51f7eb
> Author: Geoff Anderson 
> Date:   2015-06-11T01:02:11Z
>
> Moved a bunch of files to kafkatest directory
>
> commit 25a413d6ae938e9773eb2b20509760bab464
> Author: Geoff 
> Date:   2015-06-11T20:29:21Z
>
> Update aws-example-Vagrantfile.local
>
> commit 96533c3718a9285d78393fb453b951592c72a490
> Author: Geoff 
> Date:   2015-06-11T20:36:33Z
>
> Update aws-access-keys-commands
>
> commit e5edf031aeb99b9176a6ae8375963f2aedaaa6d7
> Author: Geoff Anderson 
> Date:   2015-06-12T00:27:49Z
>
> Updated example aws Vagrant

[jira] [Comment Edited] (KAFKA-2276) Initial patch for KIP-25

2015-06-22 Thread Geoffrey Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596624#comment-14596624
 ] 

Geoffrey Anderson edited comment on KAFKA-2276 at 6/22/15 9:00 PM:
---

Pull request is here: https://github.com/apache/kafka/pull/70

More info:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-25+-+System+test+improvements
https://cwiki.apache.org/confluence/display/KAFKA/Roadmap+-+port+existing+system+tests


was (Author: granders):
Pull request is here: https://github.com/apache/kafka/pull/70

> Initial patch for KIP-25
> 
>
> Key: KAFKA-2276
> URL: https://issues.apache.org/jira/browse/KAFKA-2276
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoffrey Anderson
>Assignee: Geoffrey Anderson
>
> Submit initial patch for KIP-25 
> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-25+-+System+test+improvements)
> This patch should contain a few Service classes and a few tests which can 
> serve as examples 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-22 Thread Jay Kreps
Hey Roshan,

That is definitely the key question in this space--what can we do that
other systems don't?

It's true that there are a number of systems that copy data between things.
At a high enough level of abstraction I suppose they are somewhat the same.
But I think this area is the source of rather a lot of pain for people
running these things so it is hard to imagine that the problem is totally
solved in the current state.

All the systems you mention are good, and a few we have even contributed to
so this is not to disparage anything.

Here are the advantages in what we are proposing:
1. Unlike sqoop and Camus this treats batch load as a special case of
continuous load (where the stream happens to be a bit bursty). I think this
is the right approach and enables real-time integration without giving up
the possibility of periodic dumps.
2. We are trying to make it possible to capture and integrate the metadata
around schema with the data whenever possible. This metadata travels with the
data and is something the connectors themselves have access to. I think this is
a big deal versus just delivering opaque byte[]/String rows, and is really
required for doing this kind of thing well at scale. This allows a lot of
simple filtering, projection, mapping, etc. without custom code, as well as
making it possible to start to have notions of compatibility and schema
evolution. We hope to make the byte[]/String case a special case
of the richer record model where you just have a simple schema.
3. This has a built in notion of parallelism throughout.
4. This maps well to Kafka. For people using Kafka I think basically
sharing a data model makes things a lot simpler (topics, partitions, etc).
This also makes it a lot easier to reason about guarantees.
5. Philosophically we are very committed to the idea of O(1) data loads,
which I think Gwen has more eloquently called the "factory model", and in
other contexts I have heard described as Cattle not Pets. The idea being
that if you accept up front that you are going to have ~1000 data streams
in a company and dozens of sources and sinks, the approach you take towards
this sort of stuff is radically different than if you assume a few inputs,
one output and a dozen data streams. I think this plays out in a bunch of
ways around management, configuration, etc.

Ultimately I think one thing we learned in thinking about the area is that
the system you come up with really comes down to what assumptions you make.

To address a few of your other points:
- We agree running in YARN is a good thing, but requiring YARN is a bad
thing. I think you may be seeing things somewhat from a Hadoop-centric view,
where YARN is much more prevalent. However, the scope of the problem
is not at all specific to Hadoop, and beyond the Hadoop ecosystem we don't
see that heavy use of YARN (Mesos is more prevalent, but neither is
particularly common). Our approach here is that Copycat runs as a
process: if you run it on YARN it should work with Slider; if you run it on
Mesos, with Marathon; and if you run it with old-fashioned ops tools, you
just manage it like any other process.
- Exactly-once: Yes, but when we add that support in Kafka you will get it
end-to-end, which is important.
- I agree that all existing systems have more connectors--we are willing to
do the work to catch up there as we think it is possible to get to an
overall better state. I definitely agree this is significant work.

-Jay




On Fri, Jun 19, 2015 at 7:57 PM, Roshan Naik  wrote:

> My initial thoughts:
>
> Although it is discussed very broadly, I did struggle a bit to
> properly grasp the value this adds over the alternative approaches that
> are available today (or need a little work to accomplish) in specific use
> cases. I feel it's better to take specific common use cases and show why
> this will do better, to make it clear. For example, a data flow starting
> from a pool of web servers that finally ends up in HDFS or Hive while
> providing at-least-once guarantees.
>
> Below are more specific points that occurred to me:
>
> - Import: Today we can create data flows to pick up data from a variety of
> sources and push data into Kafka using Flume. Not clear how this system can
> do better in this specific case.
> - Export: For pulling data out of Kafka there is Camus (which limits the
> destination to HDFS), Flume (which can deliver to many places) and also
> Sqoop (which could be extended to support Kafka). Camus and Sqoop don't
> have the "requires defining many tasks" issue for parallelism.
> - YARN support – Letting YARN manage things is actually a good thing (not a
> bad thing as indicated), since it's easier to scale in/out as needed
> without worrying too much about hardware allocation.
> - Exactly-Once: It is clear that on the import side you won't support
> that for now. Not clear how you will support it on the export side for
> destinations like HDFS or some other. Exactly-once only makes sense when we
Re: Review Request 35734: Patch for KAFKA-2293

2015-06-22 Thread Joel Koshy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35734/#review88866
---

Ship it!


Minor comment: would use `%s` with `TopicAndPartition` instead of `%s,%d`. I'll 
do this on check-in.
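
For illustration only (the actual change is a one-liner in Partition.scala):
the point is to lean on the type's own toString via a single `%s` instead of
formatting the fields separately. A hypothetical Java equivalent:

```
public class FormatSketch {
    record TopicAndPartition(String topic, int partition) {
        @Override
        public String toString() { return "[" + topic + "," + partition + "]"; }
    }

    public static void main(String[] args) {
        TopicAndPartition tp = new TopicAndPartition("my-topic", 3);
        // Before: format the two fields separately.
        System.out.println(String.format("Expanding ISR for [%s,%d]", tp.topic(), tp.partition()));
        // After: a single %s renders the same thing via toString().
        System.out.println(String.format("Expanding ISR for %s", tp));
    }
}
```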

- Joel Koshy


On June 22, 2015, 5:35 p.m., Aditya Auradkar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35734/
> ---
> 
> (Updated June 22, 2015, 5:35 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2293
> https://issues.apache.org/jira/browse/KAFKA-2293
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Fix for 2293
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/cluster/Partition.scala 
> 6cb647711191aee8d36e9ff15bdc2af4f1c95457 
> 
> Diff: https://reviews.apache.org/r/35734/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Aditya Auradkar
> 
>



Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-22 Thread Jay Kreps
Hey Gwen,

That makes a lot of sense. Here was the thinking on our side.

I guess there are two questions, where does Copycat go and where do the
connectors go?

I'm in favor of Copycat being in Kafka and the connectors being federated.

Arguments for federating connectors:
- There will be like >> 100 connectors so if we keep them all in the same
repo it will be a lot.
- These plugin apis are a fantastic area for open source contribution--well
defined, bite sized, immediately useful, etc.
- If I wrote connector A I'm not particularly qualified to review connector
B. These things require basic Kafka knowledge but mostly they're very
system specific. Putting them all in one project ends up being kind of a
mess.
- Many people will have in-house systems that require custom connectors
anyway.
- You can't centrally maintain all the connectors so you in any case need
to solve the whole "app store" experience for connectors (Ewen laughs
at me every time I say "app store for connectors"). Once you do that it
makes sense to just use the mechanism for everything.
- Many vendors we've talked to want to be able to maintain their own
connector and release it with their system, not release it with Kafka or
another third party project.
- There is definitely a role for testing and certification of the
connectors but it's probably not something the main project should take on.

Federation doesn't necessarily mean that there can only be one repository
for each connector. We have a single repo for the connectors we're building
at confluent just for simplicity. It just means that regardless of where
the connector is maintained it integrates as a first-class citizen.

Basically I think really nailing federated connectors is pretty central to
having a healthy connector ecosystem which is the primary thing for making
this work.

Okay now the question of whether the copycat apis/framework should be in
Kafka or be an external project. We debated this a lot internally.

I was on the pro-kafka-inclusion side so let me give that argument. I think
the apis for pulling data into Kafka or pushing into a third party system
are actually really a core thing to what Kafka is. Kafka currently provides
a push producer and pull consumer because those are the harder problems to
solve, but about half the time you need the opposite (a pull producer and
push consumer). It feels weird to include any new thing, but I actually
feel like these apis are super central and natural to include in Kafka (in
fact they are so natural that many other systems only have that style of API).

I think the key question is whether we can do a good job at designing these
apis. If we can then we should really have an official set of apis. Having
official Kafka apis that are documented as part of the main docs and are
part of each release will do a ton to help foster the connector ecosystem
because it will be kind of a default way of doing Kafka integration and all
the people building in-house from-scratch connectors will likely just use
it. If it is a separate project then it is a separate discovery and
adoption decision (this is somewhat irrational but totally true).

I think one assumption we are making is that the copycat framework won't be
huge. It should be a manageable chunk of code.

I agree with your description of the some of the cons of bundling. However
I think there are pros as well and some of them are quite important.

The biggest is that for some reason things that are maintained and
documented together end up feeling and working like a single product. This
is sort of a fuzzy thing. But one complaint I have about the Hadoop
ecosystem (and it is one of the more amazing products of open source in the
history of the world, so forgive the criticism) is that it FEELs like a
loosely affiliated collection of independent things kind of bolted
together. Products that are more centralized can give a much more holistic
feel to usage (configuration, commands, monitoring, etc) and things that
aren't somehow always drift apart (maybe just because the committers are
different).

So I actually totally agree with what you said about Spark. And if we end
up trying to include a machine learning library or anything far afield I
think I would agree we would have exactly that problem.

But I think the argument I would make is that this is actually a gap in our
existing product, not a new product and so having that identity is
important.

-Jay

On Sun, Jun 21, 2015 at 9:24 PM, Gwen Shapira  wrote:

> Ah, I see this in rejected alternatives now. Sorry :)
>
> I actually prefer the idea of a separate project for framework +
> connectors over having the framework be part of Apache Kafka.
>
> Looking at nearby examples: Hadoop has created a wide ecosystem of
> projects, with Sqoop and Flume supplying connectors. Spark on the
> other hand keeps its subprojects as part of Apache Spark.
>
> When I look at both projects, I see that Flume and Sqoop created
> active communities (that was es

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-22 Thread Ewen Cheslack-Postava
I'll respond to specific comments, but at the bottom of this email I've
included some comparisons with other connector frameworks and Kafka
import/export tools. This definitely isn't an exhaustive list, but
hopefully it will clarify how I'm thinking about where Copycat should live
wrt these other systems.

Since Jay replied with 2 essays as I was writing this up, there may be some
duplication. Sorry for the verbosity...

@Roshan - The main gist is that by designing a framework around Kafka, we
don't have to generalize in a way that loses important features. Of the
systems you mentioned, the ones that are fairly general and have lots of
connectors don't offer the parallelism or semantics that could be achieved
(e.g. Flume) and the ones that have these benefits are almost all highly
specific to just one or two systems (e.g. Camus). Since Kafka is
increasingly becoming a central hub for streaming data (and buffer for
batch systems), one *common* system for integrating all these pieces is
pretty compelling.
Import: Flume is just one of many similar systems designed around log
collection. See notes below, but one major point is that they generally
don't provide any sort of guaranteed delivery semantics.
Export: Same deal here, you either get good delivery semantics and
parallelism for one system or a lot of connectors with very limited
guarantees. Copycat is intended to make it very easy to write connectors
for a variety of systems, get good (configurable!) delivery semantics,
parallelism, and work for a wide variety of systems (e.g. both batch and
streaming).
YARN: My point isn't that YARN is bad, it's that tying to any particular
cluster manager severely limits the applicability of the tool. The goal is
to make Copycat agnostic to the cluster manager so it can run under Mesos,
YARN, etc.
Exactly once: You accomplish this in any system by managing offsets in the
destination system atomically with the data or through some kind of
deduplication. Jiangjie actually just gave a great talk about this issue at
a recent Kafka meetup; perhaps he can share some slides about it. When you
see all the details involved, you'll see why I think it might be nice to
have the framework help you manage the complexities of achieving different
delivery semantics ;) A minimal sketch of the atomic-offset approach
follows at the end of this list.
Connector variety: Addressed above.
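
To make the atomic-offset idea above concrete, here's a minimal Java sketch
assuming a JDBC sink; the table names, columns, and helper are hypothetical,
not Copycat code. Because the rows and the consumed offset commit in one
transaction, a crash and redelivery is harmless: replayed records fail the
offset check and are skipped.

```
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class AtomicOffsetSink {
    // Deliver one record and record its offset in the same transaction.
    // Assumes a row per (topic, kafka_partition) was seeded in the offsets table.
    public static void deliver(String jdbcUrl, String topic, int partition,
                               long offset, String payload) throws Exception {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            conn.setAutoCommit(false);
            try (PreparedStatement data = conn.prepareStatement(
                     "INSERT INTO events(payload) VALUES (?)");
                 PreparedStatement off = conn.prepareStatement(
                     "UPDATE offsets SET last_offset = ? WHERE topic = ? " +
                     "AND kafka_partition = ? AND last_offset < ?")) {
                data.setString(1, payload);
                data.executeUpdate();
                off.setLong(1, offset);
                off.setString(2, topic);
                off.setInt(3, partition);
                off.setLong(4, offset);
                if (off.executeUpdate() == 0) {
                    conn.rollback();  // offset already covered: a redelivered duplicate
                    return;
                }
                conn.commit();        // data and offset become visible together
            } catch (Exception e) {
                conn.rollback();
                throw e;
            }
        }
    }
}
```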

@Jiangjie -
1. Yes, the expectation is that most coding is in the connectors. Ideally
the framework doesn't need many changes after we get the basics up and
running. But I'm not sure I understand what you mean about a library vs.
static framework?
2. This depends on packaging. We should at least have a separate jar, just
as we now do with clients. It's true that the tar.gz downloads would
contain both, but that probably makes sense since you need Kafka to do any
local testing with Copycat anyway, which you presumably want to do before
running any production jobs.

@Gwen -
I agree that the community around a project is really important. Some of
the issues you mentioned -- committership and dependencies -- are
definitely important considerations. The community aspect can easily make
or break something like Copycat. I think this is something Kafka needs to
address anyway (committership in particular, since committers are currently
overloaded).

One immediate benefit of including it in the same community is that it
starts out with a great, supportive community. We'd get to leverage all the
great existing Kafka knowledge of the community. It also means Copycat
patches are more likely to be seen by Kafka devs that can give helpful
reviews. I'll definitely agree that there are some drawbacks too -- joining
the mailing lists might be a bit overwhelming if you only wanted help w/
Copycat :)

Another benefit, not to be overlooked, is that it avoids a bunch of extra
overhead. Incubating an entire separate Apache project adds a *lot* of
overhead.

I also want to mention that the KIP specifically mentions that Copycat
should use public Kafka APIs, but I don't think this means development of
both should be decoupled. In particular, the distributed version of Copycat
needs functionality that is very closely related to functionality that
already exists in Kafka, some of which is exposed via public protocols
(worker membership needs to be tracked like consumers, worker assignments
have similar needs to consumer topic-partition assignments, offset commits
in Copycat are similar to consumer offset commits). It's hard to say if any
of that can be directly reused, but if it could, it could pay off in
spades. Even if not, since there are so many similar issues involved, it'd
be worth it just to leverage previous experience. Even though Copycat
should be cleanly separated from the main Kafka code (just as the clients
are now cleanly separated from the broker), I think they can likely benefit
from careful co-evolution that is more difficult to achieve if they really
are separate communities.

On docs, you're right that we could address that issue just by adding a few

Re: Review Request 34789: Patch for KAFKA-2168

2015-06-22 Thread Jason Gustafson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34789/
---

(Updated June 22, 2015, 11:35 p.m.)


Review request for kafka.


Bugs: KAFKA-2168
https://issues.apache.org/jira/browse/KAFKA-2168


Repository: kafka


Description (updated)
---

KAFKA-2168; refactored callback handling to prevent unnecessary requests


KAFKA-2168; address review comments


KAFKA-2168; fix rebase error and checkstyle issue


KAFKA-2168; address review comments and add docs


KAFKA-2168; handle polling with timeout 0


KAFKA-2168; timeout=0 means return immediately


KAFKA-2168; address review comments


KAFKA-2168; address more review comments


KAFKA-2168; updated for review comments


KAFKA-2168; add serialVersionUID to ConsumerWakeupException


Diffs (updated)
-

  clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java 
8f587bc0705b65b3ef37c86e0c25bb43ab8803de 
  clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecords.java 
1ca75f83d3667f7d01da1ae2fd9488fb79562364 
  
clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerWakeupException.java
 PRE-CREATION 
  clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java 
951c34c92710fc4b38d656e99d2a41255c60aeb7 
  clients/src/main/java/org/apache/kafka/clients/consumer/MockConsumer.java 
f50da825756938c193d7f07bee953e000e2627d9 
  
clients/src/main/java/org/apache/kafka/clients/consumer/OffsetResetStrategy.java
 PRE-CREATION 
  
clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java
 41cb9458f51875ac9418fce52f264b35adba92f4 
  
clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java 
56281ee15cc33dfc96ff64d5b1e596047c7132a4 
  
clients/src/main/java/org/apache/kafka/clients/consumer/internals/Heartbeat.java
 e7cfaaad296fa6e325026a5eee1aaf9b9c0fe1fe 
  
clients/src/main/java/org/apache/kafka/clients/consumer/internals/RequestFuture.java
 PRE-CREATION 
  
clients/src/main/java/org/apache/kafka/clients/consumer/internals/SubscriptionState.java
 cee75410127dd1b86c1156563003216d93a086b3 
  clients/src/main/java/org/apache/kafka/common/utils/Utils.java 
f73eedb030987f018d8446bb1dcd98d19fa97331 
  clients/src/test/java/org/apache/kafka/clients/consumer/MockConsumerTest.java 
677edd385f35d4262342b567262c0b874876d25b 
  
clients/src/test/java/org/apache/kafka/clients/consumer/internals/CoordinatorTest.java
 1454ab73df22cce028f41f74b970628829da4e9d 
  
clients/src/test/java/org/apache/kafka/clients/consumer/internals/FetcherTest.java
 419541011d652becf0cda7a5e62ce813cddb1732 
  
clients/src/test/java/org/apache/kafka/clients/consumer/internals/HeartbeatTest.java
 ecc78cedf59a994fcf084fa7a458fe9ed5386b00 
  
clients/src/test/java/org/apache/kafka/clients/consumer/internals/SubscriptionStateTest.java
 e000cf8e10ebfacd6c9ee68d7b88ff8c157f73c6 
  clients/src/test/java/org/apache/kafka/common/utils/UtilsTest.java 
2ebe3c21f611dc133a2dbb8c7dfb0845f8c21498 

Diff: https://reviews.apache.org/r/34789/diff/


Testing
---


Thanks,

Jason Gustafson



[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely

2015-06-22 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596850#comment-14596850
 ] 

Jason Gustafson commented on KAFKA-2168:


Updated reviewboard https://reviews.apache.org/r/34789/diff/
 against branch upstream/trunk

> New consumer poll() can block other calls like position(), commit(), and 
> close() indefinitely
> -
>
> Key: KAFKA-2168
> URL: https://issues.apache.org/jira/browse/KAFKA-2168
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Ewen Cheslack-Postava
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.8.3
>
> Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, 
> KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, 
> KAFKA-2168_2015-06-03_21:06:42.patch, KAFKA-2168_2015-06-04_14:36:04.patch, 
> KAFKA-2168_2015-06-05_12:01:28.patch, KAFKA-2168_2015-06-05_12:44:48.patch, 
> KAFKA-2168_2015-06-11_14:09:59.patch, KAFKA-2168_2015-06-18_14:39:36.patch, 
> KAFKA-2168_2015-06-19_09:19:02.patch, KAFKA-2168_2015-06-22_16:34:37.patch
>
>
> The new consumer is currently using very coarse-grained synchronization. For 
> most methods this isn't a problem since they finish quickly once the lock is 
> acquired, but poll() might run for a long time (and commonly will since 
> polling with long timeouts is a normal use case). This means any operations 
> invoked from another thread may block until the poll() call completes.
> Some example use cases where this can be a problem:
> * A shutdown hook is registered to trigger shutdown and invokes close(). It 
> gets invoked from another thread and blocks indefinitely.
> * User wants to manage offset commit themselves in a background thread. If 
> the commit policy is not purely time-based, it's not currently possible to 
> make sure the call to commit() will be processed promptly.
> Two possible solutions to this:
> 1. Make sure a lock is not held during the actual select call. Since we have 
> multiple layers (KafkaConsumer -> NetworkClient -> Selector -> nio Selector) 
> this is probably hard to make work cleanly since locking is currently only 
> performed at the KafkaConsumer level and we'd want it unlocked around a 
> single line of code in Selector.
> 2. Wake up the selector before synchronizing for certain operations. This 
> would require some additional coordination to make sure the caller of 
> wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() 
> thread being woken up and then promptly reacquiring the lock with a 
> subsequent long poll() call).
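
To make solution 2 concrete, here is a rough sketch of the caller side,
assuming (per the diffs above) a wakeup() method that makes a blocked poll()
throw ConsumerWakeupException. The consumer interface is a stand-in for
KafkaConsumer, not the real API:

```
import java.util.concurrent.atomic.AtomicBoolean;

public class WakeupSketch {
    // Stand-in for org.apache.kafka.clients.consumer.KafkaConsumer.
    interface PollableConsumer {
        void poll(long timeoutMs);  // blocks for up to timeoutMs
        void wakeup();              // makes a blocked poll() throw
        void close();
    }

    private final AtomicBoolean closing = new AtomicBoolean(false);

    public void run(PollableConsumer consumer) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            closing.set(true);
            consumer.wakeup();  // break poll() out of its select promptly
        }));
        try {
            while (!closing.get()) {
                consumer.poll(Long.MAX_VALUE);  // long polls are the common case
            }
        } catch (RuntimeException e) {
            // The wakeup exception lands here; rethrow only if not shutting down.
            if (!closing.get()) throw e;
        } finally {
            consumer.close();
        }
    }
}
```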



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely

2015-06-22 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-2168:
---
Attachment: KAFKA-2168_2015-06-22_16:34:37.patch

> New consumer poll() can block other calls like position(), commit(), and 
> close() indefinitely
> -
>
> Key: KAFKA-2168
> URL: https://issues.apache.org/jira/browse/KAFKA-2168
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Ewen Cheslack-Postava
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.8.3
>
> Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, 
> KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, 
> KAFKA-2168_2015-06-03_21:06:42.patch, KAFKA-2168_2015-06-04_14:36:04.patch, 
> KAFKA-2168_2015-06-05_12:01:28.patch, KAFKA-2168_2015-06-05_12:44:48.patch, 
> KAFKA-2168_2015-06-11_14:09:59.patch, KAFKA-2168_2015-06-18_14:39:36.patch, 
> KAFKA-2168_2015-06-19_09:19:02.patch, KAFKA-2168_2015-06-22_16:34:37.patch
>
>
> The new consumer is currently using very coarse-grained synchronization. For 
> most methods this isn't a problem since they finish quickly once the lock is 
> acquired, but poll() might run for a long time (and commonly will since 
> polling with long timeouts is a normal use case). This means any operations 
> invoked from another thread may block until the poll() call completes.
> Some example use cases where this can be a problem:
> * A shutdown hook is registered to trigger shutdown and invokes close(). It 
> gets invoked from another thread and blocks indefinitely.
> * User wants to manage offset commit themselves in a background thread. If 
> the commit policy is not purely time-based, it's not currently possible to 
> make sure the call to commit() will be processed promptly.
> Two possible solutions to this:
> 1. Make sure a lock is not held during the actual select call. Since we have 
> multiple layers (KafkaConsumer -> NetworkClient -> Selector -> nio Selector) 
> this is probably hard to make work cleanly since locking is currently only 
> performed at the KafkaConsumer level and we'd want it unlocked around a 
> single line of code in Selector.
> 2. Wake up the selector before synchronizing for certain operations. This 
> would require some additional coordination to make sure the caller of 
> wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() 
> thread being woken up and then promptly reacquiring the lock with a 
> subsequent long poll() call).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 35655: Patch for KAFKA-2271

2015-06-22 Thread Ewen Cheslack-Postava

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35655/#review88877
---

Ship it!


LGTM. There were 9 question marks when 10 characters were requested, so the 
problem was probably just that a whitespace character at the start or end would 
get trimmed during AbstractConfig's parsing.
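
A tiny reproduction of the suspected failure mode; String.trim() here stands
in for whatever trimming the config parsing did, which is an assumption about
the old behavior:

```
public class TrimSketch {
    public static void main(String[] args) {
        String configured = "????????? ";   // 10 chars, the last one a space
        String parsed = configured.trim();  // what the config layer ends up with
        System.out.println(configured.length() + " -> " + parsed.length());  // 10 -> 9
    }
}
```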

- Ewen Cheslack-Postava


On June 19, 2015, 4:48 p.m., Jason Gustafson wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35655/
> ---
> 
> (Updated June 19, 2015, 4:48 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2271
> https://issues.apache.org/jira/browse/KAFKA-2271
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-2271; fix minor test bugs
> 
> 
> Diffs
> -
> 
>   core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
> 98a5b042a710d3c1064b0379db1d152efc9eabee 
> 
> Diff: https://reviews.apache.org/r/35655/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jason Gustafson
> 
>



Re: Review Request 35655: Patch for KAFKA-2271

2015-06-22 Thread Jason Gustafson


> On June 22, 2015, 11:43 p.m., Ewen Cheslack-Postava wrote:
> > LGTM. There were 9 question marks when 10 characters were requested, so the 
> > problem was probably just that a whitespace character at the start or end 
> > would get trimmed during AbstractConfig's parsing.

That's what I thought as well, but I was puzzled that I couldn't reproduce it. 
In fact, it looks like the issue was fixed with KAFKA-2249, which preserves the 
original properties that were used to construct the config. In that case, 
however, the assertion basically becomes a tautology, so perhaps we should just 
remove the test case?


- Jason


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35655/#review88877
---


On June 19, 2015, 4:48 p.m., Jason Gustafson wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35655/
> ---
> 
> (Updated June 19, 2015, 4:48 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2271
> https://issues.apache.org/jira/browse/KAFKA-2271
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-2271; fix minor test bugs
> 
> 
> Diffs
> -
> 
>   core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
> 98a5b042a710d3c1064b0379db1d152efc9eabee 
> 
> Diff: https://reviews.apache.org/r/35655/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jason Gustafson
> 
>



Re: Review Request 35655: Patch for KAFKA-2271

2015-06-22 Thread Jason Gustafson


> On June 21, 2015, 5:49 a.m., Guozhang Wang wrote:
> > I think the main reason is that util.Random.nextString() may include other 
> > non-ascii chars and hence its toString may not be well-defined:
> > 
> > http://alvinalexander.com/scala/creating-random-strings-in-scala
> > 
> > So I think bottom-line is that we should not use util.Random.nextString to 
> > generate random strings. There are a couple of other places where it is 
> > used and I suggest we remove them as well.

Since Java uses UTF-16 internally, nextString is virtually guaranteed to be 
non-ascii. But the problem was actually the additional space which was being 
trimmed. Nevertheless, we might want to avoid using non-ascii characters in 
assertions unless we can find a reasonable way to display them in test results. 
The other usages seem legitimate though. One of them explicitly expects 
non-ascii strings (ApiUtilsTest.testShortStringNonASCII), and the others just 
use them as arbitrary data of a certain length 
(OffsetCommitTest.testLargeMetadataPayload).
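
If randomized test data is still wanted, one option (sketched in Java; the
tests in question are Scala, but the idea carries over) is to draw from an
explicit printable alphabet instead of the full UTF-16 range:

```
import java.util.Random;

public class RandomAscii {
    static String nextAsciiString(Random rnd, int len) {
        String alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            sb.append(alphabet.charAt(rnd.nextInt(alphabet.length())));
        }
        return sb.toString();  // always printable, never trimmed by parsing
    }

    public static void main(String[] args) {
        System.out.println(nextAsciiString(new Random(), 10));
    }
}
```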


- Jason


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35655/#review88690
---


On June 19, 2015, 4:48 p.m., Jason Gustafson wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35655/
> ---
> 
> (Updated June 19, 2015, 4:48 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2271
> https://issues.apache.org/jira/browse/KAFKA-2271
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-2271; fix minor test bugs
> 
> 
> Diffs
> -
> 
>   core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
> 98a5b042a710d3c1064b0379db1d152efc9eabee 
> 
> Diff: https://reviews.apache.org/r/35655/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jason Gustafson
> 
>



[jira] [Commented] (KAFKA-2205) Generalize TopicConfigManager to handle multiple entity configs

2015-06-22 Thread Aditya Auradkar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596965#comment-14596965
 ] 

Aditya Auradkar commented on KAFKA-2205:


[~junrao] - Can you review this patch?

> Generalize TopicConfigManager to handle multiple entity configs
> ---
>
> Key: KAFKA-2205
> URL: https://issues.apache.org/jira/browse/KAFKA-2205
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Aditya Auradkar
>Assignee: Aditya Auradkar
>  Labels: quotas
> Attachments: KAFKA-2205.patch
>
>
> Acceptance Criteria:
> - TopicConfigManager should be generalized to handle Topic and Client configs 
> (and any type of config in the future). As described in KIP-21
> - Add a ConfigCommand tool to change topic and client configuration



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 35655: Patch for KAFKA-2271

2015-06-22 Thread Ewen Cheslack-Postava


> On June 22, 2015, 11:43 p.m., Ewen Cheslack-Postava wrote:
> > LGTM. There were 9 question marks when 10 characters were requested, so the 
> > problem was probably just that a whitespace character at the start or end 
> > would get trimmed during AbstractConfig's parsing.
> 
> Jason Gustafson wrote:
> That's what I thought as well, but I was puzzled that I couldn't 
> reproduce it. In fact, it looks like the issue was fixed with KAFKA-2249, 
> which preserves the original properties that were used to construct the 
> config. In that case, however, the assertion basically becomes a tautology, 
> so perhaps we should just remove the test case?

That makes sense. I agree that this test doesn't seem all that useful anymore. 
I think all the old config classes had tests since they were all written 
manually. Since the configs are now defined more declaratively via 
AbstractConfig, these types of tests seem a lot less relevant. It doesn't look 
like we've generated tests for any other uses of AbstractConfig. Even other 
tests in that same class are just sort of redundant, e.g. testFromPropsInvalid. 
It is, I suppose, checking that the type is set correctly in the ConfigDef, but 
mostly it's just re-testing the common parsing functionality repeatedly.
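
For context, the declarative style being referred to looks roughly like this;
ConfigDef and define(...) are the real client API, but the keys below are
made up:

```
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigDef.Importance;
import org.apache.kafka.common.config.ConfigDef.Type;

public class DeclarativeConfig {
    // Type checking, defaults and docs live in the definition itself, so a
    // per-config parsing test mostly re-tests ConfigDef rather than the config.
    static final ConfigDef CONFIG = new ConfigDef()
        .define("example.host", Type.STRING, "localhost",
                Importance.HIGH, "Host to connect to.")
        .define("example.port", Type.INT, 9092,
                Importance.HIGH, "Port to connect to.");
}
```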


- Ewen


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35655/#review88877
---


On June 19, 2015, 4:48 p.m., Jason Gustafson wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35655/
> ---
> 
> (Updated June 19, 2015, 4:48 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2271
> https://issues.apache.org/jira/browse/KAFKA-2271
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-2271; fix minor test bugs
> 
> 
> Diffs
> -
> 
>   core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
> 98a5b042a710d3c1064b0379db1d152efc9eabee 
> 
> Diff: https://reviews.apache.org/r/35655/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jason Gustafson
> 
>



Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-22 Thread Roshan Naik
Thanks Jay and Ewen for the response.


>@Jay
>
> 3. This has a built in notion of parallelism throughout.



It was not obvious how it would look or differ from existing systems,
since all of the existing ones do parallelize data movement.


@Ewen,

>Import: Flume is just one of many similar systems designed around log
>collection. See notes below, but one major point is that they generally
>don't provide any sort of guaranteed delivery semantics.


I think most of them do provide guarantees of some sort (e.g. Flume &
FluentD).


>YARN: My point isn't that YARN is bad, it's that tying to any particular
>cluster manager severely limits the applicability of the tool. The goal is
>to make Copycat agnostic to the cluster manager so it can run under Mesos,
>YARN, etc.

OK, got it. Sounds like there is a plan to do some work here to ensure that,
out of the box, it works with more than one scheduler (as @Jay listed out).
In that case, IMO it would be better to rephrase the KIP to state explicitly
that it will support more than one scheduler.


>Exactly once: You accomplish this in any system by managing offsets in the
>destination system atomically with the data or through some kind of
>deduplication. Jiangjie actually just gave a great talk about this issue
>at
>a recent Kafka meetup, perhaps he can share some slides about it. When you
>see all the details involved, you'll see why I think it might be nice to
>have the framework help you manage the complexities of achieving different
>delivery semantics ;)


Deduplication as a post-processing step is a common recommendation today,
but that is a workaround for the inability of the delivery systems to
provide exactly-once. IMO such post-processing should not be considered
part of the "exactly-once" guarantee of Copycat.


It would be good to know how this guarantee will be possible when delivering
to HDFS. It would be great if someone could share those slides if this is
discussed there.
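
One common pattern for this (a sketch of the general technique, not
necessarily what Copycat will do): encode the offset range in the file name,
write to a temporary path, and atomically rename. On restart the last
committed offset is recovered from the file names, so replayed records are
skipped up front rather than deduplicated afterwards. Local java.nio stands
in for the HDFS API here:

```
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public class OffsetNamedCommit {
    // Write records covering [start, end] and make them visible atomically.
    static void commitChunk(Path dir, long start, long end, byte[] data) throws IOException {
        Path tmp = dir.resolve(String.format(".tmp-%020d-%020d", start, end));
        Path done = dir.resolve(String.format("chunk-%020d-%020d", start, end));
        Files.write(tmp, data);
        // The atomic rename plays the role of HDFS's rename(): the chunk,
        // offsets and all, appears exactly once or not at all.
        Files.move(tmp, done, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sink");
        commitChunk(dir, 0, 99, "first hundred records".getBytes());
        try (Stream<Path> files = Files.list(dir)) {
            files.forEach(p -> System.out.println("committed: " + p.getFileName()));
        }
    }
}
```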




I was looking for clarification on this:
- Export side: is this like a map-reduce kind of job or something else? If
delivering to HDFS, would this be running on the Hadoop cluster or outside?
- Import side: how does this look? Is it a bunch of Flume-like processes, or
maybe just some kind of broker that translates the incoming protocol into
the outgoing Kafka producer API protocol? If delivering to HDFS, will this
run on the cluster or outside?


I still think adding one or two specific end-to-end use cases in the KIP,
showing how Copycat will pan out for them for import/export, would really
clarify things.





Re: Review Request 34789: Patch for KAFKA-2168

2015-06-22 Thread Jun Rao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34789/#review88914
---

Ship it!


Thanks for the latest patch. Looks good overall. To avoid holding on to this 
relatively large patch for too long, I have committed the latest patch to trunk. 
There are a few minor comments below and we can commit any necessary fixes in a 
follow-up patch.


clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java 
(lines 335 - 338)


These seem redundant given the code below.



clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java 
(line 420)


Should this be volatile so that different threads can see the latest value 
of refcount?
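
For illustration, one thread-safe shape for the refcount; a sketch only.
AtomicInteger covers both the visibility concern and the non-atomic
read-modify-write that even a volatile int would leave open:

```
import java.util.concurrent.atomic.AtomicInteger;

public class RefcountSketch {
    private final AtomicInteger refcount = new AtomicInteger();

    public void acquire() {
        refcount.incrementAndGet();
    }

    public void release() {
        if (refcount.decrementAndGet() == 0) {
            System.out.println("last reference released: safe to clean up once");
        }
    }

    public static void main(String[] args) {
        RefcountSketch r = new RefcountSketch();
        r.acquire();
        r.release();  // prints exactly once, no matter how many threads race
    }
}
```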



clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java
 (line 319)


What's the logic to initiate a connection to the coordinator if the 
coordinator is not available during a heartbeat?


- Jun Rao


On June 22, 2015, 11:35 p.m., Jason Gustafson wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34789/
> ---
> 
> (Updated June 22, 2015, 11:35 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2168
> https://issues.apache.org/jira/browse/KAFKA-2168
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-2168; refactored callback handling to prevent unnecessary requests
> 
> 
> KAFKA-2168; address review comments
> 
> 
> KAFKA-2168; fix rebase error and checkstyle issue
> 
> 
> KAFKA-2168; address review comments and add docs
> 
> 
> KAFKA-2168; handle polling with timeout 0
> 
> 
> KAFKA-2168; timeout=0 means return immediately
> 
> 
> KAFKA-2168; address review comments
> 
> 
> KAFKA-2168; address more review comments
> 
> 
> KAFKA-2168; updated for review comments
> 
> 
> KAFKA-2168; add serialVersionUID to ConsumerWakeupException
> 
> 
> Diffs
> -
> 
>   clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java 
> 8f587bc0705b65b3ef37c86e0c25bb43ab8803de 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecords.java 
> 1ca75f83d3667f7d01da1ae2fd9488fb79562364 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerWakeupException.java
>  PRE-CREATION 
>   clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java 
> 951c34c92710fc4b38d656e99d2a41255c60aeb7 
>   clients/src/main/java/org/apache/kafka/clients/consumer/MockConsumer.java 
> f50da825756938c193d7f07bee953e000e2627d9 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/OffsetResetStrategy.java
>  PRE-CREATION 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java
>  41cb9458f51875ac9418fce52f264b35adba92f4 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java
>  56281ee15cc33dfc96ff64d5b1e596047c7132a4 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/internals/Heartbeat.java
>  e7cfaaad296fa6e325026a5eee1aaf9b9c0fe1fe 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/internals/RequestFuture.java
>  PRE-CREATION 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/internals/SubscriptionState.java
>  cee75410127dd1b86c1156563003216d93a086b3 
>   clients/src/main/java/org/apache/kafka/common/utils/Utils.java 
> f73eedb030987f018d8446bb1dcd98d19fa97331 
>   
> clients/src/test/java/org/apache/kafka/clients/consumer/MockConsumerTest.java 
> 677edd385f35d4262342b567262c0b874876d25b 
>   
> clients/src/test/java/org/apache/kafka/clients/consumer/internals/CoordinatorTest.java
>  1454ab73df22cce028f41f74b970628829da4e9d 
>   
> clients/src/test/java/org/apache/kafka/clients/consumer/internals/FetcherTest.java
>  419541011d652becf0cda7a5e62ce813cddb1732 
>   
> clients/src/test/java/org/apache/kafka/clients/consumer/internals/HeartbeatTest.java
>  ecc78cedf59a994fcf084fa7a458fe9ed5386b00 
>   
> clients/src/test/java/org/apache/kafka/clients/consumer/internals/SubscriptionStateTest.java
>  e000cf8e10ebfacd6c9ee68d7b88ff8c157f73c6 
>   clients/src/test/java/org/apache/kafka/common/utils/UtilsTest.java 
> 2ebe3c21f611dc133a2dbb8c7dfb0845f8c21498 
> 
> Diff: https://reviews.apache.org/r/34789/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jason Gustafson
> 
>



Re: [GitHub] kafka pull request: Kafka 2276

2015-06-22 Thread Gwen Shapira
Thanks, I indeed missed the original :)

Is the plan to squash the commits and merge a pull request with a single
commit that matches the JIRA #?
This will be more in line with how commits were organized until now
and will make life much easier when cherry-picking.

Gwen

On Mon, Jun 22, 2015 at 1:58 PM, Geoffrey Anderson  wrote:
> Hi,
>
> I'm pinging the dev list regarding KAFKA-2276 (KIP-25 initial patch) again
> since it sounds like at least one person I spoke with did not see the
> initial pull request.
>
> Pull request: https://github.com/apache/kafka/pull/70/
> JIRA: https://issues.apache.org/jira/browse/KAFKA-2276
>
> Thanks!
> Geoff
>
>
> On Tue, Jun 16, 2015 at 2:50 PM, granders  wrote:
>
>> GitHub user granders opened a pull request:
>>
>> https://github.com/apache/kafka/pull/70
>>
>> Kafka 2276
>>
>> Initial patch for KIP-25
>>
>> Note that to install ducktape, do *not* use pip.
>> Instead:
>>
>> ```
>> $ git clone git@github.com:confluentinc/ducktape.git
>> $ cd ducktape
>> $ python setup.py install
>> ```
>>
>>
>> You can merge this pull request into a Git repository by running:
>>
>> $ git pull https://github.com/confluentinc/kafka KAFKA-2276
>>
>> Alternatively you can review and apply these changes as the patch at:
>>
>> https://github.com/apache/kafka/pull/70.patch
>>
>> To close this pull request, make a commit to your master/trunk branch
>> with (at least) the following in the commit message:
>>
>> This closes #70
>>
>> 
>> commit 81e41562f3836e95e89e12f215c82b1b2d505381
>> Author: Liquan Pei 
>> Date:   2015-04-24T01:32:54Z
>>
>> Bootstrap Kafka system tests
>>
>> commit f1914c3ba9b52d0f8db3989c8b031127b42ac59e
>> Author: Liquan Pei 
>> Date:   2015-04-24T01:33:44Z
>>
>> Merge pull request #2 from confluentinc/system_tests
>>
>> Bootstrap Kafka system tests
>>
>> commit a2789885806f98dcd1fd58edc9a10a30e4bd314c
>> Author: Geoff Anderson 
>> Date:   2015-05-26T22:21:23Z
>>
>> fixed typos
>>
>> commit 07cd1c66a952ee29fc3c8e85464acb43a6981b8a
>> Author: Geoff Anderson 
>> Date:   2015-05-26T22:22:14Z
>>
>> Added simple producer which prints status of produced messages to
>> stdout.
>>
>> commit da94b8cbe79e6634cc32fbe8f6deb25388923029
>> Author: Geoff Anderson 
>> Date:   2015-05-27T21:07:20Z
>>
>> Added number of messages option.
>>
>> commit 212b39a2d75027299fbb1b1008d463a82aab
>> Author: Geoff Anderson 
>> Date:   2015-05-27T22:35:06Z
>>
>> Added some metadata to producer output.
>>
>> commit 8b4b1f2aa9681632ef65aa92dfd3066cd7d62851
>> Author: Geoff Anderson 
>> Date:   2015-05-29T23:38:32Z
>>
>> Minor updates to VerboseProducer
>>
>> commit c0526fe44cea739519a0889ebe9ead01b406b365
>> Author: Geoff Anderson 
>> Date:   2015-06-01T02:27:15Z
>>
>> Updates per review comments.
>>
>> commit bc009f218e00241cbdd23931d01b52c442eef6b7
>> Author: Geoff Anderson 
>> Date:   2015-06-01T02:28:28Z
>>
>> Got rid of VerboseProducer in core (moved to clients)
>>
>> commit 475423bb642ac8f816e8080f891867a6362c17fa
>> Author: Geoff Anderson 
>> Date:   2015-06-01T04:05:09Z
>>
>> Convert class to string before adding to json object.
>>
>> commit 0a5de8e0590e3a8dce1a91769ad41497b5e07d17
>> Author: Geoff Anderson 
>> Date:   2015-06-02T22:46:52Z
>>
>> Fixed checkstyle errors. Changed name to VerifiableProducer. Added
>> synchronization for thread safety on println statements.
>>
>> commit 9100417ce0717a71c822c5a279fe7858bfe7a7ee
>> Author: Geoff Anderson 
>> Date:   2015-06-03T19:50:11Z
>>
>> Updated command-line options for VerifiableProducer. Extracted
>> throughput logic to make it reusable.
>>
>> commit 1228eefc4e52b58c214b3ad45feab36a475d5a66
>> Author: Geoff Anderson 
>> Date:   2015-06-04T01:09:14Z
>>
>> Renamed throttler
>>
>> commit 6842ed1ffad62a84df67a0f0b6a651a6df085d12
>> Author: Geoff Anderson 
>> Date:   2015-06-04T01:12:11Z
>>
>> left out a file from last commit
>>
>> commit d586fb0eb63409807c02f280fae786cec55fb348
>> Author: Geoff Anderson 
>> Date:   2015-06-04T01:22:34Z
>>
>> Updated comments to reflect that throttler is not message-specific
>>
>> commit a80a4282ba9a288edba7cdf409d31f01ebf3d458
>> Author: Geoff Anderson 
>> Date:   2015-06-04T20:47:21Z
>>
>> Added shell program for VerifiableProducer.
>>
>> commit 51a94fd6ece926bcdd864af353efcf4c4d1b8ad8
>> Author: Geoff Anderson 
>> Date:   2015-06-04T20:55:02Z
>>
>> Use argparse4j instead of joptsimple. ThroughputThrottler now has more
>> intuitive behavior when targetThroughput is 0.
>>
>> commit 632be12d2384bfd1ed3b057913dfd363cab71726
>> Author: Geoff 
>> Date:   2015-06-04T22:22:44Z
>>
>> Merge pull request #3 from confluentinc/verbose-client
>>
>> Verbose client
>>
>> commit fc7c81c1f6cce497c19da34f7c452ee44800ab6d
>> Author: Geoff Anderson 
>> Date:   2015-06-11T01:01:39Z
>>
>> added setup.py
>>
>> commit 884b20e3a7ce7a94f22594782322e4366b51f7eb
>> Author: Geoff Anderson 
>> D