Re: librdkafka 0.8.0 released

2013-11-25 Thread Magnus Edenhill
The following tests were using a single producer application
(examples/rdkafka_performance):

* Test1: 2 brokers, 2 partitions, required.acks=2, 100 byte messages:
850,000 messages/second, 85 MB/second

* Test2: 1 broker, 1 partition, required.acks=0, 100 byte messages:
710,000 messages/second, 71 MB/second

* Test3: 2 brokers, 2 partitions, required.acks=2, 100 byte messages,
snappy compression: 300,000 messages/second, 30 MB/second

* Test4: 2 brokers, 2 partitions, required.acks=2, 100 byte messages, gzip
compression: 230,000 messages/second, 23 MB/second


The log.flush broker configuration settings were increased to avoid the disk
being the bottleneck.
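For reference, flush behaviour on a 0.8 broker is governed by the log.flush.* properties; a sketch of the kind of override involved (the exact values used in these tests were not given, so the numbers below are illustrative only):

```properties
# Illustrative values only -- defer fsync so the page cache, not the disk,
# absorbs the write burst. The actual test values were not published.
log.flush.interval.messages=100000
log.flush.interval.ms=10000
```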


/Magnus



2013/11/24 Neha Narkhede neha.narkh...@gmail.com

 So, a single producer's throughput is 80 MB/s? That seems pretty high. What
 was the number of acks setting? Thanks for sharing these numbers.

 On Sunday, November 24, 2013, Magnus Edenhill wrote:

  Hi Neha,
 
  these tests were done using 100 byte messages. More information about the
  producer performance tests can be found here:
 
 
 https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md#performance-numbers
 
  The tests are indicative at best and in no way scientific, but I must say
  that the Kafka broker performance is impressive.
 
  Regards,
  Magnus
 
 
 
  2013/11/22 Neha Narkhede neha.narkh...@gmail.com
 
   Thanks for sharing this! What is the message size for the throughput
   numbers stated below?
  
   Thanks,
   Neha
   On Nov 22, 2013 6:59 AM, Magnus Edenhill mag...@edenhill.se
  wrote:
  
This announces the 0.8.0 release of librdkafka - The Apache Kafka
  client
   C
library - now with 0.8 protocol support.
   
Features:
* Producer (~800K msgs/s)
* Consumer  (~3M msgs/s)
* Compression (Snappy, gzip)
* Proper failover and leader re-election support - no message is ever
   lost.
* Configuration properties compatible with official Apache Kafka.
* Stabilized ABI-safe API
* Mainline Debian package submitted
* Production quality
   
   
Home:
https://github.com/edenhill/librdkafka
   
Introduction and performance numbers:
https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md
   
Have fun.
   
Regards,
Magnus
   
P.S.
Check out Wikimedia Foundation's varnishkafka daemon for a use case -
varnish log forwarding over Kafka:
https://github.com/wikimedia/varnishkafka
   
  
 



[jira] [Created] (KAFKA-1144) commitOffsets can be passed the offsets to commit

2013-11-25 Thread Imran Rashid (JIRA)
Imran Rashid created KAFKA-1144:
---

 Summary: commitOffsets can be passed the offsets to commit
 Key: KAFKA-1144
 URL: https://issues.apache.org/jira/browse/KAFKA-1144
 Project: Kafka
  Issue Type: Improvement
  Components: consumer
Reporter: Imran Rashid
Assignee: Neha Narkhede




This adds another version of commitOffsets that takes the offsets to commit as 
a parameter.

Without this change, getting correct user code is very hard. Despite kafka's 
at-least-once guarantees, most user code doesn't actually have that guarantee, 
and is almost certainly wrong if doing batch processing. Getting it right 
requires some very careful synchronization between all consumer threads, which 
is both:
1) painful to get right
2) slow b/c of the need to stop all workers during a commit.

This small change simplifies a lot of this. This was discussed extensively on 
the user mailing list, in the thread "are kafka consumer apps guaranteed to see 
msgs at least once?"

You can also see an example implementation of a user api which makes use of 
this, to get proper at-least-once guarantees by user code, even for batches:
https://github.com/quantifind/kafka-utils/pull/1

I'm open to any suggestions on how to add unit tests for this.
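The proposed API can be sketched roughly as below (illustrative Java, not Kafka's actual classes: the class name, the string partition key, and the in-memory map standing in for ZooKeeper are all hypothetical). The point is that the caller commits exactly the offsets it has finished processing, so worker threads need not be stopped around a commit.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of an explicit-offset commit, in the spirit of the patch.
public class ExplicitCommit {
    // Simulated offset store; in Kafka 0.8 this would be ZooKeeper.
    static final Map<String, Long> committed = new ConcurrentHashMap<>();

    // Commit exactly the offsets the caller has fully processed,
    // rather than whatever the iterator happens to have consumed.
    static void commitOffsets(Map<String, Long> offsets) {
        committed.putAll(offsets);
    }

    public static void main(String[] args) {
        // A worker finished processing up to offset 41 on topic-partition "t-0".
        commitOffsets(Map.of("t-0", 41L));
        System.out.println(committed.get("t-0")); // prints 41
    }
}
```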




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (KAFKA-1144) commitOffsets can be passed the offsets to commit

2013-11-25 Thread Imran Rashid (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid updated KAFKA-1144:


Attachment: 0002-add-protection-against-backward-commits.patch
0001-allow-committing-of-arbitrary-offsets-to-facilitate-.patch

 commitOffsets can be passed the offsets to commit
 -

 Key: KAFKA-1144
 URL: https://issues.apache.org/jira/browse/KAFKA-1144
 Project: Kafka
  Issue Type: Improvement
  Components: consumer
Affects Versions: 0.8
Reporter: Imran Rashid
Assignee: Neha Narkhede
 Attachments: 
 0001-allow-committing-of-arbitrary-offsets-to-facilitate-.patch, 
 0002-add-protection-against-backward-commits.patch


 This adds another version of commitOffsets that takes the offsets to commit 
 as a parameter.
 Without this change, getting correct user code is very hard. Despite kafka's 
 at-least-once guarantees, most user code doesn't actually have that 
 guarantee, and is almost certainly wrong if doing batch processing. Getting 
 it right requires some very careful synchronization between all consumer 
 threads, which is both:
 1) painful to get right
 2) slow b/c of the need to stop all workers during a commit.
 This small change simplifies a lot of this. This was discussed extensively on 
 the user mailing list, on the thread are kafka consumer apps guaranteed to 
 see msgs at least once?
 You can also see an example implementation of a user api which makes use of 
 this, to get proper at-least-once guarantees by user code, even for batches:
 https://github.com/quantifind/kafka-utils/pull/1
 I'm open to any suggestions on how to add unit tests for this.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (KAFKA-1144) commitOffsets can be passed the offsets to commit

2013-11-25 Thread Imran Rashid (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid updated KAFKA-1144:


Affects Version/s: 0.8
   Status: Patch Available  (was: Open)

 commitOffsets can be passed the offsets to commit
 -

 Key: KAFKA-1144
 URL: https://issues.apache.org/jira/browse/KAFKA-1144
 Project: Kafka
  Issue Type: Improvement
  Components: consumer
Affects Versions: 0.8
Reporter: Imran Rashid
Assignee: Neha Narkhede
 Attachments: 
 0001-allow-committing-of-arbitrary-offsets-to-facilitate-.patch, 
 0002-add-protection-against-backward-commits.patch


 This adds another version of commitOffsets that takes the offsets to commit 
 as a parameter.
 Without this change, getting correct user code is very hard. Despite kafka's 
 at-least-once guarantees, most user code doesn't actually have that 
 guarantee, and is almost certainly wrong if doing batch processing. Getting 
 it right requires some very careful synchronization between all consumer 
 threads, which is both:
 1) painful to get right
 2) slow b/c of the need to stop all workers during a commit.
 This small change simplifies a lot of this. This was discussed extensively on 
 the user mailing list, on the thread are kafka consumer apps guaranteed to 
 see msgs at least once?
 You can also see an example implementation of a user api which makes use of 
 this, to get proper at-least-once guarantees by user code, even for batches:
 https://github.com/quantifind/kafka-utils/pull/1
 I'm open to any suggestions on how to add unit tests for this.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


kafka pull request: commitOffsets can be passed the offsets to commit

2013-11-25 Thread squito
Github user squito closed the pull request at:

https://github.com/apache/kafka/pull/10



Re: librdkafka 0.8.0 released

2013-11-25 Thread Jun Rao
Thanks for sharing the results. Was the topic created with a replication
factor of 2? Could you test acks=-1 as well?

Thanks,

Jun


On Mon, Nov 25, 2013 at 4:30 AM, Magnus Edenhill mag...@edenhill.se wrote:

 The following tests were using a single producer application
 (examples/rdkafka_performance):

 * Test1: 2 brokers, 2 partitions, required.acks=2, 100 byte messages:
 850,000 messages/second, 85 MB/second

 * Test2: 1 broker, 1 partition, required.acks=0, 100 byte messages:
 710,000 messages/second, 71 MB/second

 * Test3: 2 brokers, 2 partitions, required.acks=2, 100 byte messages,
 snappy compression: 300,000 messages/second, 30 MB/second

 * Test4: 2 brokers, 2 partitions, required.acks=2, 100 byte messages, gzip
 compression: 230,000 messages/second, 23 MB/second


 log.flush broker configuration was increased to avoid the disk being the
 bottleneck.


 /Magnus



 2013/11/24 Neha Narkhede neha.narkh...@gmail.com

  So, a single producer's throughput is 80 MB/s? That seems pretty high.
 What
  was the number of acks setting? Thanks for sharing these numbers.
 
  On Sunday, November 24, 2013, Magnus Edenhill wrote:
 
   Hi Neha,
  
   these tests were done using 100 byte messages. More information about
 the
   producer performance tests can be found here:
  
  
 
 https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md#performance-numbers
  
   The tests are indicative at best and in no way scientific, but I must
 say
   that the Kafka broker performance is impressive.
  
   Regards,
   Magnus
  
  
  
   2013/11/22 Neha Narkhede neha.narkh...@gmail.com
  
Thanks for sharing this! What is the message size for the throughput
numbers stated below?
   
Thanks,
Neha
On Nov 22, 2013 6:59 AM, Magnus Edenhill mag...@edenhill.se
   wrote:
   
 This announces the 0.8.0 release of librdkafka - The Apache Kafka
   client
C
 library - now with 0.8 protocol support.

 Features:
 * Producer (~800K msgs/s)
 * Consumer  (~3M msgs/s)
 * Compression (Snappy, gzip)
 * Proper failover and leader re-election support - no message is
 ever
lost.
 * Configuration properties compatible with official Apache Kafka.
 * Stabilized ABI-safe API
 * Mainline Debian package submitted
 * Production quality


 Home:
 https://github.com/edenhill/librdkafka

 Introduction and performance numbers:
 https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md

 Have fun.

 Regards,
 Magnus

 P.S.
 Check out Wikimedia Foundation's varnishkafka daemon for a use
 case -
 varnish log forwarding over Kafka:
 https://github.com/wikimedia/varnishkafka

   
  
 



[jira] [Commented] (KAFKA-1144) commitOffsets can be passed the offsets to commit

2013-11-25 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13831608#comment-13831608
 ] 

Jun Rao commented on KAFKA-1144:


Thanks for the patch. A couple of things.

1. We have a jira that tries to move the storage of offsets off ZK 
(https://issues.apache.org/jira/browse/KAFKA-1000) since ZK is not really 
designed for that. So, we may not be able to do conditional updates for offsets 
in the future.

2. We will be rewriting the consumer client 
(https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite). In the new 
api, we will add a callback during consumer rebalances. Do you think that 
addresses your issue as well?


 commitOffsets can be passed the offsets to commit
 -

 Key: KAFKA-1144
 URL: https://issues.apache.org/jira/browse/KAFKA-1144
 Project: Kafka
  Issue Type: Improvement
  Components: consumer
Affects Versions: 0.8
Reporter: Imran Rashid
Assignee: Neha Narkhede
 Attachments: 
 0001-allow-committing-of-arbitrary-offsets-to-facilitate-.patch, 
 0002-add-protection-against-backward-commits.patch


 This adds another version of commitOffsets that takes the offsets to commit 
 as a parameter.
 Without this change, getting correct user code is very hard. Despite kafka's 
 at-least-once guarantees, most user code doesn't actually have that 
 guarantee, and is almost certainly wrong if doing batch processing. Getting 
 it right requires some very careful synchronization between all consumer 
 threads, which is both:
 1) painful to get right
 2) slow b/c of the need to stop all workers during a commit.
 This small change simplifies a lot of this. This was discussed extensively on 
 the user mailing list, on the thread are kafka consumer apps guaranteed to 
 see msgs at least once?
 You can also see an example implementation of a user api which makes use of 
 this, to get proper at-least-once guarantees by user code, even for batches:
 https://github.com/quantifind/kafka-utils/pull/1
 I'm open to any suggestions on how to add unit tests for this.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (KAFKA-1144) commitOffsets can be passed the offsets to commit

2013-11-25 Thread Imran Rashid (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13831658#comment-13831658
 ] 

Imran Rashid commented on KAFKA-1144:
-

Thanks for quick response, Jun.

(1) is unfortunate, though it doesn't technically break this.  It just 
increases the number of messages that would get processed multiple times on a 
rebalance + crash (I'm pretty sure that's the only way you could end up with a 
lasting backwards commit without the conditional update).  I thought the move 
off of ZK wasn't until 0.9, sorry -- I think this patch can go forward without 
the conditional update.  (Should I update the patches to remove it?  Or just 
leave it in, and then it will go away when there is no more ZK?)

(2) Notification on rebalances does not eliminate the desire for this patch.  
(It would, however, eliminate the need for conditional updates!)  Even without 
rebalances, with the current api, you really need to stop all worker threads 
before doing a commit if you want to guarantee that your app has seen all the 
messages.  This is especially true w/ batch processing.

Again, the patch isn't necessary, but it's a small change that makes it much 
easier to get user code right, not to mention more efficient.

Maybe other changes in the 0.9 API will make this unnecessary, I don't know, 
but I think this is useful for 0.8 in the meantime.  And I'd hope the client 
rewrite would also make it easy to write batch consumers, like the API I put 
together in the other repo.  (I'd happily submit that directly to Kafka, if it 
was desired, though it's very Scala-y, and I guess the user API is going to be 
Java-only?)
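The "protection against backward commits" discussed in (1) can be modeled as a compare-and-set; the sketch below is illustrative plain Java (a single AtomicLong stands in for the patch's per-partition conditional update against ZooKeeper, which is an assumption, not the actual mechanism).

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: never let a committed offset move backwards.
public class ForwardOnlyCommit {
    static final AtomicLong committed = new AtomicLong(0);

    // Returns true only if the proposed offset is ahead of the stored one;
    // the compare-and-set loop rejects stale commits from a consumer that
    // fell behind after a rebalance.
    static boolean commitIfForward(long proposed) {
        while (true) {
            long current = committed.get();
            if (proposed <= current) return false;      // backward: reject
            if (committed.compareAndSet(current, proposed)) return true;
            // else another thread advanced the offset; re-check
        }
    }

    public static void main(String[] args) {
        System.out.println(commitIfForward(100)); // prints true
        System.out.println(commitIfForward(50));  // prints false (backward)
        System.out.println(committed.get());      // prints 100
    }
}
```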

 commitOffsets can be passed the offsets to commit
 -

 Key: KAFKA-1144
 URL: https://issues.apache.org/jira/browse/KAFKA-1144
 Project: Kafka
  Issue Type: Improvement
  Components: consumer
Affects Versions: 0.8
Reporter: Imran Rashid
Assignee: Neha Narkhede
 Attachments: 
 0001-allow-committing-of-arbitrary-offsets-to-facilitate-.patch, 
 0002-add-protection-against-backward-commits.patch


 This adds another version of commitOffsets that takes the offsets to commit 
 as a parameter.
 Without this change, getting correct user code is very hard. Despite kafka's 
 at-least-once guarantees, most user code doesn't actually have that 
 guarantee, and is almost certainly wrong if doing batch processing. Getting 
 it right requires some very careful synchronization between all consumer 
 threads, which is both:
 1) painful to get right
 2) slow b/c of the need to stop all workers during a commit.
 This small change simplifies a lot of this. This was discussed extensively on 
 the user mailing list, on the thread are kafka consumer apps guaranteed to 
 see msgs at least once?
 You can also see an example implementation of a user api which makes use of 
 this, to get proper at-least-once guarantees by user code, even for batches:
 https://github.com/quantifind/kafka-utils/pull/1
 I'm open to any suggestions on how to add unit tests for this.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (KAFKA-394) update site with steps and notes for doing a release under developer

2013-11-25 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein resolved KAFKA-394.
-

Resolution: Fixed

https://cwiki.apache.org/confluence/display/KAFKA/Release+Process

 update site with steps and notes for doing a release under developer
 

 Key: KAFKA-394
 URL: https://issues.apache.org/jira/browse/KAFKA-394
 Project: Kafka
  Issue Type: Bug
Reporter: Joe Stein
Assignee: Joe Stein

 steps in release process including updating the dist directory



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] Subscription: outstanding kafka patches

2013-11-25 Thread jira
Issue Subscription
Filter: outstanding kafka patches (75 issues)
The list of outstanding kafka patches
Subscriber: kafka-mailing-list

Key Summary
KAFKA-1144  commitOffsets can be passed the offsets to commit
https://issues.apache.org/jira/browse/KAFKA-1144
KAFKA-1142  Patch review tool should take diff with origin from last divergent 
point
https://issues.apache.org/jira/browse/KAFKA-1142
KAFKA-1140  Move the decoding logic from ConsumerIterator.makeNext to next
https://issues.apache.org/jira/browse/KAFKA-1140
KAFKA-1130  log.dirs is a confusing property name
https://issues.apache.org/jira/browse/KAFKA-1130
KAFKA-1116  Need to upgrade sbt-assembly to compile on scala 2.10.2
https://issues.apache.org/jira/browse/KAFKA-1116
KAFKA-1110  Unable to produce messages with snappy/gzip compression
https://issues.apache.org/jira/browse/KAFKA-1110
KAFKA-1109  Need to fix GC log configuration code, not able to override 
KAFKA_GC_LOG_OPTS
https://issues.apache.org/jira/browse/KAFKA-1109
KAFKA-1106  HighwaterMarkCheckpoint failure puting broker into a bad state
https://issues.apache.org/jira/browse/KAFKA-1106
KAFKA-1093  Log.getOffsetsBefore(t, …) does not return the last confirmed 
offset before t
https://issues.apache.org/jira/browse/KAFKA-1093
KAFKA-1086  Improve GetOffsetShell to find metadata automatically
https://issues.apache.org/jira/browse/KAFKA-1086
KAFKA-1082  zkclient dies after UnknownHostException in zk reconnect
https://issues.apache.org/jira/browse/KAFKA-1082
KAFKA-1079  Liars in PrimitiveApiTest that promise to test api in compression 
mode, but don't do this actually
https://issues.apache.org/jira/browse/KAFKA-1079
KAFKA-1074  Reassign partitions should delete the old replicas from disk
https://issues.apache.org/jira/browse/KAFKA-1074
KAFKA-1049  Encoder implementations are required to provide an undocumented 
constructor.
https://issues.apache.org/jira/browse/KAFKA-1049
KAFKA-1032  Messages sent to the old leader will be lost on broker GC resulted 
failure
https://issues.apache.org/jira/browse/KAFKA-1032
KAFKA-1020  Remove getAllReplicasOnBroker from KafkaController
https://issues.apache.org/jira/browse/KAFKA-1020
KAFKA-1012  Implement an Offset Manager and hook offset requests to it
https://issues.apache.org/jira/browse/KAFKA-1012
KAFKA-1011  Decompression and re-compression on MirrorMaker could result in 
messages being dropped in the pipeline
https://issues.apache.org/jira/browse/KAFKA-1011
KAFKA-1005  kafka.perf.ConsumerPerformance not shutting down consumer
https://issues.apache.org/jira/browse/KAFKA-1005
KAFKA-1004  Handle topic event for trivial whitelist topic filters
https://issues.apache.org/jira/browse/KAFKA-1004
KAFKA-998   Producer should not retry on non-recoverable error codes
https://issues.apache.org/jira/browse/KAFKA-998
KAFKA-997   Provide a strict verification mode when reading configuration 
properties
https://issues.apache.org/jira/browse/KAFKA-997
KAFKA-996   Capitalize first letter for log entries
https://issues.apache.org/jira/browse/KAFKA-996
KAFKA-984   Avoid a full rebalance in cases when a new topic is discovered but 
container/broker set stay the same
https://issues.apache.org/jira/browse/KAFKA-984
KAFKA-976   Order-Preserving Mirror Maker Testcase
https://issues.apache.org/jira/browse/KAFKA-976
KAFKA-967   Use key range in ProducerPerformance
https://issues.apache.org/jira/browse/KAFKA-967
KAFKA-917   Expose zk.session.timeout.ms in console consumer
https://issues.apache.org/jira/browse/KAFKA-917
KAFKA-885   sbt package builds two kafka jars
https://issues.apache.org/jira/browse/KAFKA-885
KAFKA-881   Kafka broker not respecting log.roll.hours
https://issues.apache.org/jira/browse/KAFKA-881
KAFKA-873   Consider replacing zkclient with curator (with zkclient-bridge)
https://issues.apache.org/jira/browse/KAFKA-873
KAFKA-868   System Test - add test case for rolling controlled shutdown
https://issues.apache.org/jira/browse/KAFKA-868
KAFKA-863   System Test - update 0.7 version of kafka-run-class.sh for 
Migration Tool test cases
https://issues.apache.org/jira/browse/KAFKA-863
KAFKA-859   support basic auth protection of mx4j console
https://issues.apache.org/jira/browse/KAFKA-859
KAFKA-855   Ant+Ivy build for Kafka
https://issues.apache.org/jira/browse/KAFKA-855
KAFKA-854   Upgrade dependencies for 0.8
https://issues.apache.org/jira/browse/KAFKA-854
KAFKA-815   Improve SimpleConsumerShell to take in a max messages config option
https://issues.apache.org/jira/browse/KAFKA-815
KAFKA-745   Remove getShutdownReceive() and other kafka specific 

[VOTE] Apache Kafka Release 0.8.0 - Candidate 4

2013-11-25 Thread Joe Stein
This is the fourth candidate for release of Apache Kafka 0.8.0.   This
release resolves https://issues.apache.org/jira/browse/KAFKA-1131 and
https://issues.apache.org/jira/browse/KAFKA-1133

Release Notes for the 0.8.0 release
http://people.apache.org/~joestein/kafka-0.8.0-candidate4/RELEASE_NOTES.html

*** Please download, test and vote by Monday, December 2nd, 12pm PDT

Kafka's KEYS file containing PGP keys we use to sign the release:
http://svn.apache.org/repos/asf/kafka/KEYS in addition to the md5 and sha1
checksums

* Release artifacts to be voted upon (source and binary):
http://people.apache.org/~joestein/kafka-0.8.0-candidate4/

* Maven artifacts to be voted upon prior to release:
https://repository.apache.org/content/groups/staging/

(i.e. in sbt land this can be added to build.sbt to use Kafka:
resolvers += "Apache Staging" at "https://repository.apache.org/content/groups/staging/"
libraryDependencies += "org.apache.kafka" % "kafka_2.10" % "0.8.0"
)

* The tag to be voted upon (off the 0.8 branch) is the 0.8.0 tag
https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=2c20a71a010659e25af075a024cbd692c87d4c89

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
/


Re: Review Request 15805: KAFKA-1140.v2: addressed Jun's comments

2013-11-25 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15805/
---

(Updated Nov. 25, 2013, 8:53 p.m.)


Review request for kafka.


Summary (updated)
-

KAFKA-1140.v2: addressed Jun's comments


Bugs: KAFKA-1140
https://issues.apache.org/jira/browse/KAFKA-1140


Repository: kafka


Description
---

KAFKA-1140.v1


Dummy


Diffs (updated)
-

  core/src/main/scala/kafka/consumer/ConsumerIterator.scala 
a4227a49684c7de08e07cb1f3a10d2f76ba28da7 

Diff: https://reviews.apache.org/r/15805/diff/


Testing
---


Thanks,

Guozhang Wang



[jira] [Commented] (KAFKA-1140) Move the decoding logic from ConsumerIterator.makeNext to next

2013-11-25 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13831892#comment-13831892
 ] 

Guozhang Wang commented on KAFKA-1140:
--

Updated reviewboard https://reviews.apache.org/r/15805/
 against branch origin/trunk

 Move the decoding logic from ConsumerIterator.makeNext to next
 --

 Key: KAFKA-1140
 URL: https://issues.apache.org/jira/browse/KAFKA-1140
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.8.1

 Attachments: KAFKA-1140.patch, KAFKA-1140_2013-11-25_12:53:17.patch


 Usually people will write code around the consumer like:
 while (iter.hasNext()) {
   try {
     msg = iter.next()
     // do something
   } catch {
     // handle failure
   }
 }
 
 However, the iter.hasNext() call itself can throw exceptions due to decoding 
 failures. It would be better to move the decoding to the next() function call.
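The fix described above can be sketched as an iterator whose hasNext() never decodes, so a corrupt message throws from next(), inside the caller's existing try/catch (illustrative Java; DecodeOnNext and its empty-array "decode failure" are hypothetical stand-ins, not Kafka's ConsumerIterator).

```java
import java.util.Iterator;
import java.util.List;

// Illustrative sketch: defer decoding from hasNext() to next().
public class DecodeOnNext implements Iterator<String> {
    private final Iterator<byte[]> raw;

    DecodeOnNext(Iterator<byte[]> raw) { this.raw = raw; }

    // Only checks availability; never touches message contents.
    @Override public boolean hasNext() { return raw.hasNext(); }

    // "Decoding" happens here, so failures surface inside the caller's try.
    @Override public String next() {
        byte[] bytes = raw.next();
        if (bytes.length == 0) throw new RuntimeException("decode failure");
        return new String(bytes);
    }

    public static void main(String[] args) {
        Iterator<String> it = new DecodeOnNext(
            List.of("ok".getBytes(), new byte[0]).iterator());
        while (it.hasNext()) {
            try {
                System.out.println(it.next());
            } catch (RuntimeException e) {
                System.out.println("skipped bad message");
            }
        }
    }
}
```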



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (KAFKA-1140) Move the decoding logic from ConsumerIterator.makeNext to next

2013-11-25 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1140:
-

Attachment: KAFKA-1140_2013-11-25_12:53:17.patch

 Move the decoding logic from ConsumerIterator.makeNext to next
 --

 Key: KAFKA-1140
 URL: https://issues.apache.org/jira/browse/KAFKA-1140
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.8.1

 Attachments: KAFKA-1140.patch, KAFKA-1140_2013-11-25_12:53:17.patch


 Usually people will write code around consumer like
 while(iter.hasNext()) {
 try {
   msg = iter.next()
   // do something
 }
 catch{
 }
 }
 
 However, the iter.hasNext() call itself can throw exceptions due to decoding 
 failures. It would be better to move the decoding to the next function call.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: Review Request 15805: KAFKA-1140.v2: addressed Jun's comments

2013-11-25 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15805/
---

(Updated Nov. 25, 2013, 8:55 p.m.)


Review request for kafka.


Bugs: KAFKA-1140
https://issues.apache.org/jira/browse/KAFKA-1140


Repository: kafka


Description (updated)
---

KAFKA-1140.v2


KAFKA-1140.v1


Dummy


Diffs (updated)
-

  core/src/main/scala/kafka/consumer/ConsumerIterator.scala 
a4227a49684c7de08e07cb1f3a10d2f76ba28da7 
  core/src/test/scala/unit/kafka/consumer/ConsumerIteratorTest.scala 
ef1de8321c713cd9d27ef937216f5b76a5d8c574 

Diff: https://reviews.apache.org/r/15805/diff/


Testing
---


Thanks,

Guozhang Wang



[jira] [Commented] (KAFKA-1140) Move the decoding logic from ConsumerIterator.makeNext to next

2013-11-25 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13831893#comment-13831893
 ] 

Guozhang Wang commented on KAFKA-1140:
--

Updated reviewboard https://reviews.apache.org/r/15805/
 against branch origin/trunk

 Move the decoding logic from ConsumerIterator.makeNext to next
 --

 Key: KAFKA-1140
 URL: https://issues.apache.org/jira/browse/KAFKA-1140
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.8.1

 Attachments: KAFKA-1140.patch, KAFKA-1140_2013-11-25_12:53:17.patch, 
 KAFKA-1140_2013-11-25_12:55:34.patch


 Usually people will write code around consumer like
 while(iter.hasNext()) {
 try {
   msg = iter.next()
   // do something
 }
 catch{
 }
 }
 
 However, the iter.hasNext() call itself can throw exceptions due to decoding 
 failures. It would be better to move the decoding to the next function call.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (KAFKA-1140) Move the decoding logic from ConsumerIterator.makeNext to next

2013-11-25 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1140:
-

Attachment: KAFKA-1140_2013-11-25_12:55:34.patch

 Move the decoding logic from ConsumerIterator.makeNext to next
 --

 Key: KAFKA-1140
 URL: https://issues.apache.org/jira/browse/KAFKA-1140
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.8.1

 Attachments: KAFKA-1140.patch, KAFKA-1140_2013-11-25_12:53:17.patch, 
 KAFKA-1140_2013-11-25_12:55:34.patch


 Usually people will write code around consumer like
 while(iter.hasNext()) {
 try {
   msg = iter.next()
   // do something
 }
 catch{
 }
 }
 
 However, the iter.hasNext() call itself can throw exceptions due to decoding 
 failures. It would be better to move the decoding to the next function call.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (KAFKA-1144) commitOffsets can be passed the offsets to commit

2013-11-25 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13831993#comment-13831993
 ] 

Guozhang Wang commented on KAFKA-1144:
--

Imran, regarding the client rewrite, your proposal is also discussed as part of 
the project. Basically, the consumer commitOffset function can be something 
like:

void commit(List[String, Int, Long]) // topic, partition-id, offset

The wiki page will be updated soon so that you can take a look by then.

Guozhang



 commitOffsets can be passed the offsets to commit
 -

 Key: KAFKA-1144
 URL: https://issues.apache.org/jira/browse/KAFKA-1144
 Project: Kafka
  Issue Type: Improvement
  Components: consumer
Affects Versions: 0.8
Reporter: Imran Rashid
Assignee: Neha Narkhede
 Attachments: 
 0001-allow-committing-of-arbitrary-offsets-to-facilitate-.patch, 
 0002-add-protection-against-backward-commits.patch


 This adds another version of commitOffsets that takes the offsets to commit 
 as a parameter.
 Without this change, getting correct user code is very hard. Despite kafka's 
 at-least-once guarantees, most user code doesn't actually have that 
 guarantee, and is almost certainly wrong if doing batch processing. Getting 
 it right requires some very careful synchronization between all consumer 
 threads, which is both:
 1) painful to get right
 2) slow b/c of the need to stop all workers during a commit.
 This small change simplifies a lot of this. This was discussed extensively on 
 the user mailing list, on the thread are kafka consumer apps guaranteed to 
 see msgs at least once?
 You can also see an example implementation of a user api which makes use of 
 this, to get proper at-least-once guarantees by user code, even for batches:
 https://github.com/quantifind/kafka-utils/pull/1
 I'm open to any suggestions on how to add unit tests for this.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (KAFKA-1144) commitOffsets can be passed the offsets to commit

2013-11-25 Thread Imran Rashid (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832019#comment-13832019
 ] 

Imran Rashid commented on KAFKA-1144:
-

Yes, that addition to the commit API would do the trick.  (That is the main 
point of this patch; I just happened to choose a different signature.)  With 
rebalance notifications, that removes the need for conditional updates.

Glad to hear this will all be in 0.9 -- but does that mean this patch is off 
the table for 0.8.*?  That's too bad, as this would be a big help now.  I could 
change the signature to match what it will be in 0.9.  If not, I suppose I can 
always just make the ZK updates myself directly, under the covers of the 
consumer API wrapper I'm writing.

 commitOffsets can be passed the offsets to commit
 -

 Key: KAFKA-1144
 URL: https://issues.apache.org/jira/browse/KAFKA-1144
 Project: Kafka
  Issue Type: Improvement
  Components: consumer
Affects Versions: 0.8
Reporter: Imran Rashid
Assignee: Neha Narkhede
 Attachments: 
 0001-allow-committing-of-arbitrary-offsets-to-facilitate-.patch, 
 0002-add-protection-against-backward-commits.patch


 This adds another version of commitOffsets that takes the offsets to commit 
 as a parameter.
 Without this change, getting correct user code is very hard. Despite Kafka's 
 at-least-once guarantees, most user code doesn't actually have that 
 guarantee, and is almost certainly wrong if doing batch processing. Getting 
 it right requires some very careful synchronization between all consumer 
 threads, which is both:
 1) painful to get right
 2) slow, because of the need to stop all workers during a commit.
 This small change simplifies a lot of this. It was discussed extensively on 
 the user mailing list, in the thread "are kafka consumer apps guaranteed to 
 see msgs at least once?"
 You can also see an example implementation of a user api which makes use of 
 this, to get proper at-least-once guarantees by user code, even for batches:
 https://github.com/quantifind/kafka-utils/pull/1
 I'm open to any suggestions on how to add unit tests for this.





[jira] [Commented] (KAFKA-1135) Code cleanup - use Json.encode() to write json data to zk

2013-11-25 Thread David Lao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832026#comment-13832026
 ] 

David Lao commented on KAFKA-1135:
--

Hi. This patch seems to have undone all the KAFKA-1112 changes. Can you verify?

 Code cleanup - use Json.encode() to write json data to zk
 -

 Key: KAFKA-1135
 URL: https://issues.apache.org/jira/browse/KAFKA-1135
 Project: Kafka
  Issue Type: Bug
Reporter: Swapnil Ghike
Assignee: Swapnil Ghike
 Fix For: 0.8.1

 Attachments: KAFKA-1135.patch, KAFKA-1135_2013-11-18_19:17:54.patch, 
 KAFKA-1135_2013-11-18_19:20:58.patch








[jira] [Commented] (KAFKA-1135) Code cleanup - use Json.encode() to write json data to zk

2013-11-25 Thread Swapnil Ghike (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832037#comment-13832037
 ] 

Swapnil Ghike commented on KAFKA-1135:
--

Thanks for catching this, David! Jun, it seems that the diff on the reviewboard 
and the one attached to this JIRA are different. Can you please revert commit 
9b0776d157afd9eacddb84a99f2420fa9c0d505b, download the diff from the 
reviewboard, and commit it?

 Code cleanup - use Json.encode() to write json data to zk
 -

 Key: KAFKA-1135
 URL: https://issues.apache.org/jira/browse/KAFKA-1135
 Project: Kafka
  Issue Type: Bug
Reporter: Swapnil Ghike
Assignee: Swapnil Ghike
 Fix For: 0.8.1

 Attachments: KAFKA-1135.patch, KAFKA-1135_2013-11-18_19:17:54.patch, 
 KAFKA-1135_2013-11-18_19:20:58.patch








[jira] [Commented] (KAFKA-1135) Code cleanup - use Json.encode() to write json data to zk

2013-11-25 Thread Swapnil Ghike (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832040#comment-13832040
 ] 

Swapnil Ghike commented on KAFKA-1135:
--

[~jjkoshy], does the above issue look similar to KAFKA-1142? 

 Code cleanup - use Json.encode() to write json data to zk
 -

 Key: KAFKA-1135
 URL: https://issues.apache.org/jira/browse/KAFKA-1135
 Project: Kafka
  Issue Type: Bug
Reporter: Swapnil Ghike
Assignee: Swapnil Ghike
 Fix For: 0.8.1

 Attachments: KAFKA-1135.patch, KAFKA-1135_2013-11-18_19:17:54.patch, 
 KAFKA-1135_2013-11-18_19:20:58.patch








[jira] [Commented] (KAFKA-1135) Code cleanup - use Json.encode() to write json data to zk

2013-11-25 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832067#comment-13832067
 ] 

Jun Rao commented on KAFKA-1135:


Thanks for pointing this out. Recommitted KAFKA-1112.

 Code cleanup - use Json.encode() to write json data to zk
 -

 Key: KAFKA-1135
 URL: https://issues.apache.org/jira/browse/KAFKA-1135
 Project: Kafka
  Issue Type: Bug
Reporter: Swapnil Ghike
Assignee: Swapnil Ghike
 Fix For: 0.8.1

 Attachments: KAFKA-1135.patch, KAFKA-1135_2013-11-18_19:17:54.patch, 
 KAFKA-1135_2013-11-18_19:20:58.patch








[jira] [Updated] (KAFKA-1144) commitOffsets can be passed the offsets to commit

2013-11-25 Thread Imran Rashid (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid updated KAFKA-1144:


Attachment: 0003-dont-do-conditional-update-check-if-the-path-doesnt-.patch

Just discovered a small bug -- the conditional update doesn't work if the path 
doesn't already exist. Figured I'd update this just in case it's still in 
consideration ...
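For illustration, the shape of the bug patch 0003 addresses: a ZooKeeper-style conditional write (set-with-expected-version) throws when the node does not exist yet, so the first commit for a path must fall back to a create. The store below is a toy in-memory stand-in, not the real ZkClient API; all names are hypothetical.

```python
# Toy versioned store mimicking ZooKeeper's setData(path, data, version).
class NoNodeError(Exception):
    pass

class VersionedStore:
    def __init__(self):
        self.nodes = {}  # path -> (value, version)

    def create(self, path, value):
        self.nodes[path] = (value, 0)

    def conditional_set(self, path, value, expected_version):
        if path not in self.nodes:
            raise NoNodeError(path)       # the case the original code missed
        _, version = self.nodes[path]
        if version != expected_version:
            return False                  # someone else wrote concurrently
        self.nodes[path] = (value, version + 1)
        return True

def write_offset(store, path, value, expected_version):
    """Conditional update that creates the path if it is missing."""
    try:
        return store.conditional_set(path, value, expected_version)
    except NoNodeError:
        store.create(path, value)  # first commit for this group/partition
        return True

store = VersionedStore()
ok = write_offset(store, "/consumers/g1/offsets/t1/0", 42, expected_version=0)
```

Subsequent commits then go through the conditional branch and bump the version, which is what makes the backward-commit check race-free.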

 commitOffsets can be passed the offsets to commit
 -

 Key: KAFKA-1144
 URL: https://issues.apache.org/jira/browse/KAFKA-1144
 Project: Kafka
  Issue Type: Improvement
  Components: consumer
Affects Versions: 0.8
Reporter: Imran Rashid
Assignee: Neha Narkhede
 Attachments: 
 0001-allow-committing-of-arbitrary-offsets-to-facilitate-.patch, 
 0002-add-protection-against-backward-commits.patch, 
 0003-dont-do-conditional-update-check-if-the-path-doesnt-.patch


 This adds another version of commitOffsets that takes the offsets to commit 
 as a parameter.
 Without this change, getting correct user code is very hard. Despite Kafka's 
 at-least-once guarantees, most user code doesn't actually have that 
 guarantee, and is almost certainly wrong if doing batch processing. Getting 
 it right requires some very careful synchronization between all consumer 
 threads, which is both:
 1) painful to get right
 2) slow, because of the need to stop all workers during a commit.
 This small change simplifies a lot of this. It was discussed extensively on 
 the user mailing list, in the thread "are kafka consumer apps guaranteed to 
 see msgs at least once?"
 You can also see an example implementation of a user api which makes use of 
 this, to get proper at-least-once guarantees by user code, even for batches:
 https://github.com/quantifind/kafka-utils/pull/1
 I'm open to any suggestions on how to add unit tests for this.





[jira] [Updated] (KAFKA-1004) Handle topic event for trivial whitelist topic filters

2013-11-25 Thread Joel Koshy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Koshy updated KAFKA-1004:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

This was fixed in KAFKA-1103.


 Handle topic event for trivial whitelist topic filters
 --

 Key: KAFKA-1004
 URL: https://issues.apache.org/jira/browse/KAFKA-1004
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.8.1, 0.7

 Attachments: KAFKA-1004.v1.patch, KAFKA-1004.v2.patch


 Today the consumer's TopicEventWatcher is not subscribed for trivial whitelist 
 topic names. Hence, if a topic is not registered in ZK when the consumer is 
 started, its later creation will not trigger a consumer rebalance, and the 
 topic will not be consumed even if it is in the whitelist. A proposed fix 
 would be to always subscribe the TopicEventWatcher for all whitelist consumers.
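For illustration, a minimal sketch of the proposed behavior: the topic-event watcher fires on every topic-list change, and a topic created after startup still triggers a rebalance when it matches the whitelist. The names below (WhitelistConsumer, on_topic_event) are hypothetical; the real logic lives in the Scala consumer's TopicEventWatcher.

```python
import re

# Hypothetical sketch: always watch topic events and rebalance when a newly
# created topic matches the whitelist regex.
class WhitelistConsumer:
    def __init__(self, whitelist_regex):
        self.pattern = re.compile(whitelist_regex)
        self.subscribed = set()

    def on_topic_event(self, topics):
        """Called by the watcher with the current topic list from ZK."""
        matched = {t for t in topics if self.pattern.match(t)}
        new_topics = matched - self.subscribed
        if new_topics:
            self.subscribed |= matched
            return True   # would trigger a consumer rebalance
        return False

c = WhitelistConsumer(r"metrics\..*")
c.on_topic_event(["logs.app"])                              # no match
rebalanced = c.on_topic_event(["logs.app", "metrics.cpu"])  # new match
```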





[jira] [Closed] (KAFKA-1004) Handle topic event for trivial whitelist topic filters

2013-11-25 Thread Joel Koshy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Koshy closed KAFKA-1004.
-


 Handle topic event for trivial whitelist topic filters
 --

 Key: KAFKA-1004
 URL: https://issues.apache.org/jira/browse/KAFKA-1004
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.7, 0.8.1

 Attachments: KAFKA-1004.v1.patch, KAFKA-1004.v2.patch


 Today the consumer's TopicEventWatcher is not subscribed for trivial whitelist 
 topic names. Hence, if a topic is not registered in ZK when the consumer is 
 started, its later creation will not trigger a consumer rebalance, and the 
 topic will not be consumed even if it is in the whitelist. A proposed fix 
 would be to always subscribe the TopicEventWatcher for all whitelist consumers.


