[jira] [Commented] (KAFKA-1507) Using GetOffsetShell against non-existent topic creates the topic unintentionally
[ https://issues.apache.org/jira/browse/KAFKA-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278929#comment-14278929 ] Jay Kreps commented on KAFKA-1507: -- I think the right way to do this is to have a proper create/delete/alter topic API (which I think is in flight now). We should make the metadata request's auto-creation of topics optional and disable it by default (e.g. add an option like metadata.requests.auto.create=false). We can retain the auto-create functionality in the producer by having it issue this request in response to errors about a non-existent topic. I don't think we should change the Java API of the producer to expose this (i.e. add a producer.createTopic(name, replication, partitions, etc.)). Instead I think we should consider a Java admin client that exposes this functionality. This would be where we would expose other operational APIs as well. The rationale for this is that creating, deleting, and modifying topics is actually not part of normal application usage, so having it directly exposed in the producer is a bit dangerous. We should definitely do a KIP proposal around this and get the design and API worked out first. I think we could do this in 0.8.3 if you are up to working on it. It would likely depend on some of the changes in KAFKA-1760, so we would want to merge that first. 
Using GetOffsetShell against non-existent topic creates the topic unintentionally - Key: KAFKA-1507 URL: https://issues.apache.org/jira/browse/KAFKA-1507 Project: Kafka Issue Type: Bug Affects Versions: 0.8.1.1 Environment: centos Reporter: Luke Forehand Assignee: Sriharsha Chintalapani Priority: Minor Labels: newbie Attachments: KAFKA-1507.patch, KAFKA-1507.patch, KAFKA-1507_2014-07-22_10:27:45.patch, KAFKA-1507_2014-07-23_17:07:20.patch, KAFKA-1507_2014-08-12_18:09:06.patch, KAFKA-1507_2014-08-22_11:06:38.patch, KAFKA-1507_2014-08-22_11:08:51.patch A typo in using the GetOffsetShell command can cause a topic to be created which cannot be deleted (because deletion is still in progress) ./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list kafka10:9092,kafka11:9092,kafka12:9092,kafka13:9092 --topic typo --time 1 ./kafka-topics.sh --zookeeper stormqa1/kafka-prod --describe --topic typo Topic:typo PartitionCount:8 ReplicationFactor:1 Configs: Topic: typo Partition: 0 Leader: 10 Replicas: 10 Isr: 10 ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
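Until the metadata request stops auto-creating topics, the broker-side guard for this bug is the existing auto.create.topics.enable setting (the same broker config the later comments in this thread reference). A minimal sketch of the relevant fragment:

```properties
# server.properties (broker config) — with auto-creation off, a metadata or
# offset request for a misspelled topic returns an unknown-topic error
# instead of silently creating the topic.
auto.create.topics.enable=false
```

The trade-off, as discussed below, is that producers relying on auto-creation then need topics created explicitly.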
[jira] [Commented] (KAFKA-1697) remove code related to ack>1 on the broker
[ https://issues.apache.org/jira/browse/KAFKA-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279037#comment-14279037 ] Joe Stein commented on KAFKA-1697: -- With this patch I think we should change the existing functionality with one update: start to LOG a WARN in the broker (so it gets people's attention to stop using ack>1) but keep everything else the same... the new version of the request (with a match/case) should do the new functionality. remove code related to ack>1 on the broker -- Key: KAFKA-1697 URL: https://issues.apache.org/jira/browse/KAFKA-1697 Project: Kafka Issue Type: Bug Reporter: Jun Rao Assignee: Gwen Shapira Fix For: 0.8.3 Attachments: KAFKA-1697.patch, KAFKA-1697_2015-01-14_15:41:37.patch We removed the ack>1 support from the producer client in KAFKA-1555. We can completely remove the code in the broker that supports ack>1. Also, we probably want to make NotEnoughReplicasAfterAppend a non-retriable exception and let the client decide what to do.
[jira] [Comment Edited] (KAFKA-1697) remove code related to ack>1 on the broker
[ https://issues.apache.org/jira/browse/KAFKA-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279037#comment-14279037 ] Joe Stein edited comment on KAFKA-1697 at 1/15/15 6:21 PM: --- With this patch I think we should change the existing functionality with one update: start to LOG a WARN in the broker (so it gets people's attention to stop using ack>1) but keep everything else the same... the new version of the request (with a match/case) should do the new functionality and we support both. was (Author: joestein): With this patch I think we should change the existing functionality with one update: start to LOG a WARN in the broker (so it gets people's attention to stop using ack>1) but keep everything else the same... the new version of the request (with a match/case) should do the new functionality.
Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker
Right, so this looks like it could create an issue similar to what's currently being discussed in https://issues.apache.org/jira/browse/KAFKA-1649 where users now get errors under conditions when they previously wouldn't. Old clients won't even know about the error code, so besides failing they won't even be able to log any meaningful error messages. I think there are two options for compatibility: 1. An alternative change is to remove the ack>1 code, but silently upgrade requests with acks > 1 to acks = -1. This isn't the same as other changes to behavior since the interaction between the client and server remains the same, no error codes change, etc. The client might just see some increased latency since the message might need to be replicated to more brokers than they requested. 2. Split this into two patches, one that bumps the protocol version on that message to include the new error code but maintains both old (now deprecated) and new behavior, then a second that would be applied in a later release that removes the old protocol + code for handling acks > 1. Option 2 is probably the right thing to do. If we specify the release when we'll remove the deprecated protocol at the time of deprecation, it makes things a lot easier for people writing non-Java clients and could give users better predictability (e.g. if clients are at most 1 major release behind brokers, they'll remain compatible but possibly use deprecated features). On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira gshap...@cloudera.com wrote: Hi Kafka Devs, We are working on KAFKA-1697 - remove code related to ack>1 on the broker. Per Neha's suggestion, I'd like to give everyone a heads up on what these changes mean. Once this patch is included, any produce requests that include request.required.acks > 1 will result in an exception. This will be InvalidRequiredAcks in new versions (0.8.3 and up, I assume) and UnknownException in existing versions (sorry, but I can't add error codes retroactively). 
This behavior is already enforced by 0.8.2 producers (sync and new), but we expect impact on users with older producers that relied on acks > 1 and external clients (e.g. Python, Go, etc.). Users who relied on acks > 1 are expected to switch to using acks = -1 and a min.isr parameter that matches their use case. This change was discussed in the past in the context of KAFKA-1555 (min.isr), but let us know if you have any questions or concerns regarding this change. Gwen -- Thanks, Ewen
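Ewen's option 1 above is essentially a one-line normalization on the broker side. A hypothetical sketch (the class and method names are illustrative, not Kafka's actual broker code):

```java
public class AckNormalizer {
    // Hypothetical helper: silently upgrade legacy acks values (> 1) to -1
    // ("wait for all in-sync replicas"). The wire interaction is unchanged
    // and no new error codes are introduced; the client may just observe
    // slightly higher latency from the extra replication.
    public static short normalizeAcks(short requestedAcks) {
        return requestedAcks > 1 ? (short) -1 : requestedAcks;
    }
}
```

With this approach, old clients sending acks=3 keep working, whereas option 2 would eventually reject them with an explicit error code.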
[DISCUSS] KIPs
The idea of KIPs came up in a previous discussion, but there was no real crisp definition of what they were. Here is an attempt at defining a process: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals The trick here is to have something lightweight enough that it isn't a hassle for small changes, but enough so that changes get the eyeballs of the committers and heavy users. Thoughts? -Jay
[GitHub] kafka pull request: Trunk
GitHub user SylviaVargasCTL opened a pull request: https://github.com/apache/kafka/pull/42 Trunk You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/kafka trunk Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/42.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #42 commit de6066e8e78e62739e3eb6f771d0739bf9b73dfd Author: Stevo Slavic ssla...@apache.org Date: 2014-04-09T15:13:53Z kafka-1370; Gradle startup script for Windows; patched by Stevo Slavic; reviewed by Jun Rao commit 75d5f5bff8519b36d5eb0a904ebbd0d3c0b7c8cc Author: Stevo Slavic ssla...@apache.org Date: 2014-04-09T15:24:13Z kafka-1375; Formatting for in README.md is broken; patched by Stevo Slavic; reviewed by Jun Rao commit 8d15de85114da6012530f0dd837f131bd1e367cd Author: Joel Koshy jjko...@gmail.com Date: 2014-04-08T21:21:46Z KAFKA-1373; Set first dirty (uncompacted) offset to first offset of the log if no checkpoint exists. Reviewed by Neha Narkhede and Timothy Chen. 
commit 911ff524515148421ff58b41f271c21f792ed9de Author: Guozhang Wang guw...@linkedin.com Date: 2014-04-09T21:49:17Z kafka-1364; ReplicaManagerTest hard-codes log dir; patched by Guozhang Wang; reviewed by Jun Rao commit 47019a849e69209c16defa81001055aa9f57674d Author: Jun Rao jun...@gmail.com Date: 2014-04-09T21:53:03Z kafka-1376; transient test failure in UncleanLeaderElectionTest; patched by Jun Rao; reviewed by Joel Koshy commit 44ee6b7c9d9da207bebe6d927b38ed7df1388df3 Author: Guozhang Wang guw...@linkedin.com Date: 2014-04-10T01:52:23Z kafka-1353;report capacity used by request thread pool and network thread pool; patched by Guozhang Wang; reviewed by Jun Rao commit 2d429e19da22416aeb7de68b9e00f33a337e31a0 Author: Guozhang Wang guw...@linkedin.com Date: 2014-04-11T21:39:35Z kafka-1337; follow-up patch to add broker list for new producer in system test overriden function; patched by Guozhang Wang; reviewed by Neha Narkhede, Jun Rao commit a3a2cba842a9945d4ce7b032e311d17956c33249 Author: Timothy Chen tnac...@gmail.com Date: 2014-04-12T03:31:13Z kafka-1363; testTopicConfigChangesDuringDeleteTopic hangs; patched by Timothy Chen; reviewed by Guozhang Wang, Neha Narkhede and Jun Rao commit 6bb616e5ae6a2f45cfa190e245c9217a8bf9771a Author: Jun Rao jun...@gmail.com Date: 2014-04-12T04:29:09Z trivial change to add kafka doap project file commit 05612ac44de775cf3bc8ffdc15c033920d4a1440 Author: Jun Rao jun...@gmail.com Date: 2014-04-12T20:47:13Z kafka-1377; transient unit test failure in LogOffsetTest; patched by Jun Rao; reviewed by Neha Narkhede commit d37ca7f627551c9960e2edb8498784bf2487d53e Author: Jun Rao jun...@gmail.com Date: 2014-04-12T20:48:57Z kafka-1378; transient unit test failure in LogRecoveryTest; patched by Jun Rao; reviewed by Neha Narkhede commit 2bfd49b955831f3455ff486ce4f613d73239a317 Author: Jun Rao jun...@gmail.com Date: 2014-04-12T20:51:29Z kafka-1381; transient unit test failure in AddPartitionsTest; patched by Jun Rao; reviewed by Neha Narkhede 
commit 4bd33e5ba792667913991638a15f0a2c9f20d7b5 Author: Stevo Slavic ssla...@apache.org Date: 2014-04-13T15:43:53Z kafka-1210; Windows Bat files are not working properly; patched by Stevo Slavic; reviewed by Jun Rao commit 9a6f7113ed630d8e6bb50f4a58846d976a2d5f97 Author: Jun Rao jun...@gmail.com Date: 2014-04-15T20:46:54Z kafka-1390; TestUtils.waitUntilLeaderIsElectedOrChanged may wait longer than it needs; patched by Jun Rao; reviewed by Guozhang Wang commit ec075c5a853e4168ff30cf133493588671aa2fac Author: Jay Kreps jay.kr...@gmail.com Date: 2014-04-16T17:19:18Z KAFKA-1359: Ensure all topic/server metrics registered at once. commit 97f13ec73255f3978c1cbb80ea26f446cf756515 Author: Joel Koshy jjko...@gmail.com Date: 2014-04-15T21:08:21Z KAFKA-1323; Fix regression due to KAFKA-1315 (support for relative directories in log.dirs property broke). Patched by Timothy Chen and Guozhang Wang; reviewed by Joel Koshy, Neha Narkhede and Jun Rao. commit b9351e04f0a26f2f9e4abc7ec526ed802495f991 Author: Jay Kreps jay.kr...@gmail.com Date: 2014-04-18T00:07:27Z KAFKA-1398 dynamic config changes are broken. commit 8b052150f55fb9a6f8c4aedc6b3fa0528719671e Author: Jay Kreps jay.kr...@gmail.com Date: 2014-04-17T00:32:43Z KAFKA-1337: Fix incorrect producer configs after config renaming. commit 037c054be260bcc3b470b9724572cb814b704bff Author: Joel Koshy jjko...@gmail.com Date: 2014-04-18T20:10:34Z KAFKA-1362; Publish sources and javadoc jars; (also removed Scala 2.8.2-specific actions). Reviewed by Jun Rao and Joe Stein commit 89f040c8c9807fc4f9e219f0b57279b692b6e45d Author: Jay Kreps jay.kr...@gmail.com Date: 2014-04-18T18:03:37Z
Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker
Looping in the mailing list that the client developers live on because they are all not on dev (though they should be if they want to be helping to build the best client libraries they can). I wholeheartedly believe that we need to not break existing functionality of the client protocol, ever. There are many reasons for this and we have other threads on the mailing list where we are discussing that topic (no pun intended) that I don't want to re-hash here. If we change wire protocol functionality *OR* the binary format (either) we must bump version *AND* treat version as a feature flag with backward compatibility support until it is deprecated for some time for folks to deal with it. match version = { case 0: keepDoingWhatWeWereDoing() case 1: doNewStuff() case 2: doEvenMoreNewStuff() } has to be a practice we adopt imho ... I know feature flags can be construed as messy code but I am eager to hear another (better? different?) solution to this. If we don't do a feature flag like this specifically with this change then what happens is that someone upgrades their brokers with a rolling restart in 0.8.3 and every single one of their producer requests starts to fail and they have a major production outage. Ok, I do 100% agree that >1 makes no sense and we *REALLY* need people to start using 0, 1, -1, but we need to do that in a way that is going to work for everyone. Old producers and consumers must keep working with new brokers, and if we are not going to support that then I am unclear what the use of version is, based on our original intentions of having it because of the 0.7->0.8 transition. We said no more breaking changes when we did that. - Joe Stein On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava e...@confluent.io wrote: Right, so this looks like it could create an issue similar to what's currently being discussed in https://issues.apache.org/jira/browse/KAFKA-1649 where users now get errors under conditions when they previously wouldn't. 
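Joe's match/case sketch amounts to dispatching on the request's protocol version field so old clients keep the old semantics. A minimal illustrative rendering in Java (the handler strings are placeholders, not Kafka's actual broker code):

```java
public class ProduceRequestHandler {
    // Illustrative only: treat the request version as a feature flag.
    // Version 0 keeps the legacy behavior; version 1 enforces the new
    // acks validation; anything newer than we know is rejected.
    public static String handle(int apiVersion) {
        switch (apiVersion) {
            case 0:  return "legacy semantics (acks > 1 accepted, WARN logged)";
            case 1:  return "new semantics (acks > 1 rejected with InvalidRequiredAcks)";
            default: return "unsupported version";
        }
    }
}
```

A rolling broker upgrade then leaves old producers on the version-0 path instead of failing every request.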
Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker
This would sting a whole lot less if there was a programmatic way to get what server version is in use. Also, how will this work in mixed version clusters (during an upgrade, for example)? On Jan 15, 2015, at 10:10, Joe Stein joe.st...@stealth.ly wrote: Looping in the mailing list that the client developers live on because they are all not on dev (though they should be if they want to be helping to build the best client libraries they can). 
- Joe Stein
[jira] [Commented] (KAFKA-1507) Using GetOffsetShell against non-existent topic creates the topic unintentionally
[ https://issues.apache.org/jira/browse/KAFKA-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279012#comment-14279012 ] Sriharsha Chintalapani commented on KAFKA-1507: --- [~jkreps] thanks for the comments. "We should make having the get metadata request auto-creating topics optional and disable it by default (e.g. add an option like metadata.requests.auto.create=false)" Currently the metadata request checks the broker config auto.create.topics.enable. This is disabled in my patch. "We can retain the auto-create functionality in the producer by having it issue this request in response to errors about a non-existent topic." This is what my current patch does: it sets auto.create.topics.enable to false in the broker config, and when the producer makes a TopicMetadataRequest it returns Unknown_Topic_or_Partition. ProducerConfig has new properties like auto.create.topics.enable. If this property is set to true (the default), the producer issues a new CreateTopicRequest upon receiving an unknown_topic_or_partition error, which will then issue a create topic on the broker side with the configured numPartitions and replicationFactor. "I don't think we should change the java api of the producer to expose this (i.e. add a producer.createTopic(name, replication, partitions, etc)." I agree, and my patch doesn't change any API of the producer; it doesn't have producer.createTopic, but it does introduce CreateTopicRequest and CreateTopicResponse. "Instead I think we should consider a Java admin client that exposes this functionality. This would be where we would expose other operational apis as well. The rationale for this is that creating, deleting, and modifying topics is actually not part of normal application usage so having it directly exposed in the producer is a bit dangerous." I am not sure if I understand correctly about the Java admin client. 
We already have AdminUtils; is this about introducing network APIs for create/delete/alter topics? Even in this case I think this patch can be useful, as the producer just makes a CreateTopicRequest, which I think is what you want unless I missed something :).
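The flow described in the patch comment is: the broker stops auto-creating topics on metadata requests, and the producer reacts to the unknown-topic error by issuing an explicit create request. A toy sketch of that control flow, with all names hypothetical (this is not the patch's actual code):

```java
public class AutoCreateFlow {
    // Hypothetical stand-in for the broker's topic store and the
    // CreateTopicRequest handling described in the comment.
    public interface Broker {
        boolean topicExists(String topic);
        void createTopic(String topic, int partitions, int replicationFactor);
    }

    // Producer side: metadata lookup; on unknown topic, optionally issue an
    // explicit create (the CreateTopicRequest analogue) instead of relying
    // on broker-side auto-creation.
    public static String fetchMetadata(Broker broker, String topic,
                                       boolean autoCreateOnError) {
        if (broker.topicExists(topic)) return "OK";
        if (!autoCreateOnError) return "UNKNOWN_TOPIC_OR_PARTITION";
        broker.createTopic(topic, 1, 1);  // configured partitions/replication
        return "CREATED";
    }
}
```

The point of the separation is that a tool like GetOffsetShell would take the "UNKNOWN_TOPIC_OR_PARTITION" branch rather than creating the topic as a side effect.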
Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker
I very much agree with what Joe is saying; let's use the version field as intended and be very strict about not removing or altering existing behaviour without bumping the version. Old API versions could be deprecated (documentation only?) immediately and removed completely in the next minor version bump (0.8->0.9). An API to query supported API versions would be a good addition in the long run, but it doesn't help current clients much, as such a request to an older broker version will kill the connection without any error reporting to the client, thus making it rather useless in the short term. Regards, Magnus
[jira] [Comment Edited] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278895#comment-14278895 ] Sriharsha Chintalapani edited comment on KAFKA-1461 at 1/15/15 4:24 PM: [~guozhang] I had the following code in mind for backoff retries in case of any error. This code will be under ReplicaFetcherThread.handlePartitions. I am thinking of maintaining two maps in ReplicaFetcherThread, one for offset and one for timestamp: private val partitionsWithErrorStandbyMap = new mutable.HashMap[TopicAndPartition, Long] // a (topic, partition) -> offset private val partitionsWithErrorMap = new mutable.HashMap[TopicAndPartition, Long] // a (topic, partition) -> timestamp Remove the partitions from AbstractFetcherThread.partitionsMap and add them back to the map once currentTime > partitionsWithErrorMap.timestamp + replicaFetcherRetryBackoffMs. I am not quite sure about maintaining these two maps. If it looks ok to you, I'll send a patch, or if you have any other approach please let me know. {code} def handlePartitionsWithErrors(partitions: Iterable[TopicAndPartition]) { // add to the partitionsWithErrorMap with currentTime for (partition <- partitions) { if (!partitionsWithErrorMap.contains(partition)) { partitionsWithErrorMap.put(partition, System.currentTimeMillis()) currentOffset(partition) match { case Some(offset: Long) => partitionsWithErrorStandbyMap.put(partition, offset) } } } removePartitions(partitions.toSet) val partitionsToBeAdded = new mutable.HashMap[TopicAndPartition, Long] // process partitionsWithErrorMap and add partitions back if the backoff time elapsed partitionsWithErrorMap.foreach { case ((topicAndPartition, timeMs)) => if (System.currentTimeMillis() > timeMs + brokerConfig.replicaFetcherRetryBackoffMs) { partitionsWithErrorStandbyMap.get(topicAndPartition) match { case Some(offset: Long) => partitionsToBeAdded.put(topicAndPartition, offset) } partitionsWithErrorStandbyMap.remove(topicAndPartition) } } addPartitions(partitionsToBeAdded) } {code} Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 The current replica fetcher thread will retry in a tight loop if any error occurs during the fetch call. For example, we've seen cases where the fetch continuously throws a
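The two-map idea above boils down to: stamp a partition when it errors, skip it while the backoff is pending, and re-admit it once the backoff elapses. A minimal standalone sketch of that bookkeeping (illustrative Java, not the actual patch):

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionBackoff {
    // Illustrative sketch: remember when a partition last failed and only
    // allow a retry once the configured backoff has elapsed, mirroring the
    // timestamp map in the proposed ReplicaFetcherThread change.
    private final Map<String, Long> errorTimestamps = new HashMap<>();
    private final long backoffMs;

    public PartitionBackoff(long backoffMs) { this.backoffMs = backoffMs; }

    public void recordError(String partition, long nowMs) {
        // Keep the first failure time, as the patch does via contains().
        errorTimestamps.putIfAbsent(partition, nowMs);
    }

    public boolean readyToRetry(String partition, long nowMs) {
        Long failedAt = errorTimestamps.get(partition);
        if (failedAt == null) return true;      // never failed: fetch freely
        if (nowMs > failedAt + backoffMs) {     // backoff elapsed: re-admit
            errorTimestamps.remove(partition);
            return true;
        }
        return false;                           // still backing off
    }
}
```

This replaces the tight retry loop described in the issue with at most one retry per backoff window per partition.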
[jira] [Commented] (KAFKA-1333) Add consumer co-ordinator module to the server
[ https://issues.apache.org/jira/browse/KAFKA-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278905#comment-14278905 ] Onur Karaman commented on KAFKA-1333: - Hey everyone. I spoke with Guozhang about this yesterday. He has started this work. Add consumer co-ordinator module to the server -- Key: KAFKA-1333 URL: https://issues.apache.org/jira/browse/KAFKA-1333 Project: Kafka Issue Type: Sub-task Components: consumer Affects Versions: 0.9.0 Reporter: Neha Narkhede Assignee: Guozhang Wang Scope of this JIRA is to just add a consumer co-ordinator module that does the following: 1) coordinator start-up, metadata initialization 2) simple join group handling (just updating metadata, no failure detection / rebalancing): this should be sufficient for consumers doing self offset / partition management. The offset manager will still run side-by-side with the coordinator in this JIRA, and we will merge it in KAFKA-1740.
[jira] [Commented] (KAFKA-1649) Protocol documentation does not indicate that ReplicaNotAvailable can be ignored
[ https://issues.apache.org/jira/browse/KAFKA-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278790#comment-14278790 ] Hernan Rivas Inaka commented on KAFKA-1649: --- I managed to get that error on 0.8.1 (don't remember the specific revision) by killing leaders, but it didn't always happen; my tests always had 3 brokers, so that might have something to do with it. I will try to allocate some time to replicate the issue both on 0.8.1 and 0.8.2 and see what I can find. Protocol documentation does not indicate that ReplicaNotAvailable can be ignored Key: KAFKA-1649 URL: https://issues.apache.org/jira/browse/KAFKA-1649 Project: Kafka Issue Type: Improvement Components: website Affects Versions: 0.8.1.1 Reporter: Hernan Rivas Inaka Priority: Minor Labels: protocol-documentation Original Estimate: 10m Remaining Estimate: 10m The protocol documentation here https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-ErrorCodes should indicate that error 9 (ReplicaNotAvailable) can be safely ignored on producers.
[jira] [Commented] (KAFKA-1577) Exception in ConnectionQuotas while shutting down
[ https://issues.apache.org/jira/browse/KAFKA-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278880#comment-14278880 ] German Borbolla commented on KAFKA-1577: I will try to reproduce with the release candidate for 0.8.2 Exception in ConnectionQuotas while shutting down - Key: KAFKA-1577 URL: https://issues.apache.org/jira/browse/KAFKA-1577 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.2 Reporter: Joel Koshy Assignee: Sriharsha Chintalapani Priority: Blocker Labels: newbie Fix For: 0.8.2 Attachments: KAFKA-1577.patch, KAFKA-1577.patch, KAFKA-1577_2014-08-20_19:57:44.patch, KAFKA-1577_2014-08-26_07:33:13.patch, KAFKA-1577_2014-09-26_19:13:05.patch, KAFKA-1577_check_counter_before_decrementing.patch, kafka-logs.tar.gz {code} [2014-08-07 19:38:08,228] ERROR Uncaught exception in thread 'kafka-network-thread-9092-0': (kafka.utils.Utils$) java.util.NoSuchElementException: None.get at scala.None$.get(Option.scala:185) at scala.None$.get(Option.scala:183) at kafka.network.ConnectionQuotas.dec(SocketServer.scala:471) at kafka.network.AbstractServerThread.close(SocketServer.scala:158) at kafka.network.AbstractServerThread.close(SocketServer.scala:150) at kafka.network.AbstractServerThread.closeAll(SocketServer.scala:171) at kafka.network.Processor.run(SocketServer.scala:338) at java.lang.Thread.run(Thread.java:662) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
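The `None.get` in the stack trace above comes from decrementing a per-address connection counter that has already lost its entry during shutdown. A minimal Java sketch of the defensive pattern hinted at by the `KAFKA-1577_check_counter_before_decrementing.patch` attachment name (class and method names here are hypothetical, not Kafka's actual `ConnectionQuotas`):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ConnectionQuotas: counts open connections per
// address and tolerates a dec() for an address it no longer tracks, which
// can happen in a shutdown race between the acceptor and processor threads.
class ConnectionCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    synchronized void inc(String address) {
        counts.merge(address, 1, Integer::sum);
    }

    // Returns false instead of throwing when the address is unknown,
    // avoiding the NoSuchElementException-style failure seen above.
    synchronized boolean dec(String address) {
        Integer current = counts.get(address);
        if (current == null) {
            return false; // nothing to decrement
        }
        if (current == 1) {
            counts.remove(address);
        } else {
            counts.put(address, current - 1);
        }
        return true;
    }

    synchronized int get(String address) {
        return counts.getOrDefault(address, 0);
    }
}
```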
Jenkins build is back to normal : Kafka-trunk #370
See https://builds.apache.org/job/Kafka-trunk/370/changes
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278895#comment-14278895 ] Sriharsha Chintalapani commented on KAFKA-1461: --- [~guozhang] I had the following code in mind for backing off retries in case of any error. This code would go under ReplicaFetcherThread.handlePartitions. I am thinking of maintaining two maps in ReplicaFetcherThread, one for the offset and one for the timestamp:

```scala
private val partitionsWithErrorStandbyMap = new mutable.HashMap[TopicAndPartition, Long] // (topic, partition) -> offset
private val partitionsWithErrorMap = new mutable.HashMap[TopicAndPartition, Long] // (topic, partition) -> timestamp
```

The idea is to remove the partitions from AbstractFetcherThread.partitionsMap and add them back once currentTime > partitionsWithErrorMap.timestamp + replicaFetcherRetryBackoffMs. I am not quite sure about maintaining these two maps. If it looks ok to you, I'll send a patch, or if you have any other approach please let me know.

```scala
def handlePartitionsWithErrors(partitions: Iterable[TopicAndPartition]) {
  // add to the partitionsWithErrorMap with currentTime
  for (partition <- partitions) {
    if (!partitionsWithErrorMap.contains(partition)) {
      partitionsWithErrorMap.put(partition, System.currentTimeMillis())
      currentOffset(partition) match {
        case Some(offset: Long) => partitionsWithErrorStandbyMap.put(partition, offset)
        case None =>
      }
    }
  }
  removePartitions(partitions.toSet)
  val partitionsToBeAdded = new mutable.HashMap[TopicAndPartition, Long]
  // process partitionsWithErrorMap and add partitions back if the backoff time elapsed
  partitionsWithErrorMap.foreach { case (topicAndPartition, timeMs) =>
    if (System.currentTimeMillis() > timeMs + brokerConfig.replicaFetcherRetryBackoffMs) {
      partitionsWithErrorStandbyMap.get(topicAndPartition) match {
        case Some(offset: Long) => partitionsToBeAdded.put(topicAndPartition, offset)
        case None =>
      }
      partitionsWithErrorStandbyMap.remove(topicAndPartition)
    }
  }
  addPartitions(partitionsToBeAdded)
}
```

Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 The current replica fetcher thread will retry in a tight loop if any error occurs during the fetch call. For example, we've seen cases where the fetch continuously throws a connection refused exception leading to several replica fetcher threads that spin in a pretty tight loop. To a much lesser degree this is also an issue in the consumer fetcher thread, although the fact that erroring partitions are removed so a leader can be re-discovered helps some.
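The two-map bookkeeping described above reduces to: stamp each errored partition with its failure time, park its offset, and re-add it only after the backoff elapses. A self-contained Java sketch of that pattern (names are illustrative; this is not the proposed Scala patch):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed back-off bookkeeping: errored
// partitions are parked with (failure time, last offset) and released
// only after retryBackoffMs has elapsed.
class ErrorBackoffTracker {
    private final long retryBackoffMs;
    private final Map<String, Long> errorTimeMs = new HashMap<>();   // partition -> failure timestamp
    private final Map<String, Long> parkedOffset = new HashMap<>();  // partition -> offset to resume from

    ErrorBackoffTracker(long retryBackoffMs) {
        this.retryBackoffMs = retryBackoffMs;
    }

    // Called when a fetch for a partition fails; first failure wins.
    void markError(String partition, long currentOffset, long nowMs) {
        errorTimeMs.putIfAbsent(partition, nowMs);
        parkedOffset.putIfAbsent(partition, currentOffset);
    }

    // Returns partition -> offset for everything whose backoff has elapsed,
    // removing those entries from both maps so they are fetched again.
    Map<String, Long> releaseExpired(long nowMs) {
        Map<String, Long> ready = new HashMap<>();
        errorTimeMs.entrySet().removeIf(e -> {
            if (nowMs > e.getValue() + retryBackoffMs) {
                Long offset = parkedOffset.remove(e.getKey());
                if (offset != null) ready.put(e.getKey(), offset);
                return true;
            }
            return false;
        });
        return ready;
    }
}
```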
Re: [VOTE] 0.8.2.0 Candidate 1
Thanks for reporting this. I will remove that option in RC2. Jun

On Thu, Jan 15, 2015 at 5:21 AM, Jaikiran Pai jai.forums2...@gmail.com wrote: I just downloaded the Kafka binary and am trying this on my 32-bit JVM (Java 7). Trying to start Zookeeper or Kafka server keeps failing with Unrecognized VM option 'UseCompressedOops':

./zookeeper-server-start.sh ../config/zookeeper.properties
Unrecognized VM option 'UseCompressedOops'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

Same with the Kafka server startup scripts. My Java version is: java version 1.7.0_71 Java(TM) SE Runtime Environment (build 1.7.0_71-b14) Java HotSpot(TM) Server VM (build 24.71-b01, mixed mode). Should there be a check in the script, before adding this option? -Jaikiran

On Wednesday 14 January 2015 10:08 PM, Jun Rao wrote: + users mailing list. It would be great if people can test this out and report any blocker issues. Thanks, Jun

On Tue, Jan 13, 2015 at 7:16 PM, Jun Rao j...@confluent.io wrote: This is the first candidate for release of Apache Kafka 0.8.2.0. There have been some changes since the 0.8.2 beta release, especially in the new java producer api and jmx mbean names. It would be great if people can test this out thoroughly. We are giving people 10 days for testing and voting.

Release Notes for the 0.8.2.0 release: https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/RELEASE_NOTES.html

*** Please download, test and vote by Friday, Jan 23rd, 7pm PT

Kafka's KEYS file containing PGP keys we use to sign the release: https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/KEYS in addition to the md5, sha1 and sha2 (SHA256) checksums.

* Release artifacts to be voted upon (source and binary): https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/

* Maven artifacts to be voted upon prior to release: https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/maven_staging/

* scala-doc: https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/scaladoc/#package

* java-doc: https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/javadoc/

* The tag to be voted upon (off the 0.8.2 branch) is the 0.8.2.0 tag: https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=b0c7d579f8aeb5750573008040a42b7377a651d5

Thanks, Jun
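Jaikiran's question above (should the script check before adding the option?) could be handled by probing the JVM first. A hedged bash sketch, assuming a `$JAVA` variable like the one the startup scripts build up; names are illustrative, not the actual fix shipped in RC2:

```shell
#!/usr/bin/env bash
# Hypothetical guard for kafka-run-class.sh: only append -XX:+UseCompressedOops
# when the running JVM actually accepts it (32-bit JVMs reject the flag).
JAVA="${JAVA:-java}"

jvm_accepts() {
  # Probe: ask the JVM to print its version with the candidate flag applied.
  "$JAVA" "$1" -version >/dev/null 2>&1
}

KAFKA_JVM_PERFORMANCE_OPTS="-server"
if jvm_accepts "-XX:+UseCompressedOops"; then
  KAFKA_JVM_PERFORMANCE_OPTS="$KAFKA_JVM_PERFORMANCE_OPTS -XX:+UseCompressedOops"
fi
```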
[jira] [Commented] (KAFKA-1577) Exception in ConnectionQuotas while shutting down
[ https://issues.apache.org/jira/browse/KAFKA-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278873#comment-14278873 ] Joel Koshy commented on KAFKA-1577: --- [~german.borbolla] can you make sure you don't have a stray kafka jar in your classpath? I have noticed this when switching across git hashes that include changes to our build mechanism. Do a gradlew clean, ensure that find . -name "kafka*.jar" returns nothing, and then rebuild before trying to reproduce this.
[jira] [Resolved] (KAFKA-595) Decouple producer side compression from server-side compression.
[ https://issues.apache.org/jira/browse/KAFKA-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Koshy resolved KAFKA-595. -- Resolution: Implemented Assignee: Manikumar Reddy Yes I think we can close this. Decouple producer side compression from server-side compression. Key: KAFKA-595 URL: https://issues.apache.org/jira/browse/KAFKA-595 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Neha Narkhede Assignee: Manikumar Reddy Labels: feature In 0.7 Kafka always appended messages to the log using whatever compression codec the client used. In 0.8, after the KAFKA-506 patch, the master always recompresses the message before appending to the log to assign ids. Currently the server uses a funky heuristic to choose a compression codec based on the codecs the producer used. This doesn't actually make that much sense. It would be better for the server to have its own compression (a global default and per-topic override) that specified the compression codec, and have the server always recompress with this codec regardless of the original codec. Compression currently happens in kafka.log.Log.assignOffsets (perhaps should be renamed if it takes on compression as an official responsibility instead of a side-effect). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
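For reference, the server-owned codec described in this ticket (a global default plus per-topic override) is the shape the broker-side `compression.type` setting eventually took; the exact property names below come from later Kafka releases and are illustrative in the context of this ticket, not part of the 0.8.0 code under discussion:

```properties
# Broker default: the codec applied when messages are (re)compressed on append.
# "producer" preserves the client's codec; gzip/snappy force server-side recompression.
compression.type=gzip

# Per-topic override, supplied at topic creation:
# bin/kafka-topics.sh --create --topic logs --config compression.type=snappy ...
```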
[jira] [Commented] (KAFKA-1507) Using GetOffsetShell against non-existent topic creates the topic unintentionally
[ https://issues.apache.org/jira/browse/KAFKA-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278907#comment-14278907 ] Sriharsha Chintalapani commented on KAFKA-1507: --- [~junrao] [~nehanarkhede] Is this a JIRA that can be included in 0.8.3 or 0.9.0? If so, I'll do an upmerge and resend the patch. Thanks. Using GetOffsetShell against non-existent topic creates the topic unintentionally - Key: KAFKA-1507 URL: https://issues.apache.org/jira/browse/KAFKA-1507 Project: Kafka Issue Type: Bug Affects Versions: 0.8.1.1 Environment: centos Reporter: Luke Forehand Assignee: Sriharsha Chintalapani Priority: Minor Labels: newbie Attachments: KAFKA-1507.patch, KAFKA-1507.patch, KAFKA-1507_2014-07-22_10:27:45.patch, KAFKA-1507_2014-07-23_17:07:20.patch, KAFKA-1507_2014-08-12_18:09:06.patch, KAFKA-1507_2014-08-22_11:06:38.patch, KAFKA-1507_2014-08-22_11:08:51.patch A typo in using the GetOffsetShell command can cause a topic to be created which cannot be deleted (because deletion is still in progress): ./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list kafka10:9092,kafka11:9092,kafka12:9092,kafka13:9092 --topic typo --time 1 ./kafka-topics.sh --zookeeper stormqa1/kafka-prod --describe --topic typo Topic:typo PartitionCount:8 ReplicationFactor:1 Configs: Topic: typo Partition: 0 Leader: 10 Replicas: 10 Isr: 10 ...
Re: Need some pointers to writing (real) tests
The integration tests for the producer are in with the server (since the dependency is that the server depends on the clients rather than vice versa). Only the mock tests are with the clients. You should be able to add to one of the tests in core/src/test/scala/integration/kafka/api/Producer*

On Wed, Jan 14, 2015 at 11:07 PM, Jaikiran Pai jai.forums2...@gmail.com wrote: I have been looking at some unassigned JIRAs to work on during some spare time and found this one https://issues.apache.org/jira/browse/KAFKA-1837. As I note in that JIRA, I can see why this happens and have a potential fix for it. But to first reproduce the issue and then verify the fix, I have been attempting a testcase (in the clients). Some of the tests that are already present (like SenderTest) use MockProducer, which won't be relevant in testing this issue, from what I see. So I need some inputs or pointers to create a test which will use the real KafkaProducer/Sender/NetworkClient. My initial attempt at this uses the TestUtils to create a (dummy) cluster, and that one fails for obvious reasons (the client not receiving a metadata update over the wire from the server):

```java
@Test
public void testFailedSend() throws Exception {
    final TopicPartition tp = new TopicPartition("test", 0);
    final String producedValue = "foobar";
    final ProducerRecord product = new ProducerRecord(tp.topic(), producedValue);
    final Cluster cluster = TestUtils.singletonCluster("test", 1);
    final Node node = cluster.nodes().get(0);
    final Properties kafkaProducerConfigs = new Properties();
    kafkaProducerConfigs.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, node.host() + ":" + node.port());
    kafkaProducerConfigs.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    kafkaProducerConfigs.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    final Producer producer = new KafkaProducer(kafkaProducerConfigs);
    // This times out waiting for a metadata update from the server for the
    // cluster (because there isn't really any real server around)
    final Future<RecordMetadata> futureAck = producer.send(product);
}
```

Any pointers to existing tests? -Jaikiran
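Independent of where such a test ends up living, a send against a server that never answers should be bounded so the test fails fast instead of hanging. A Kafka-free Java sketch of that pattern, with a never-completing future standing in for `producer.send` (names hypothetical):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Stand-in for producer.send(record): a future that never completes because
// no broker ever answers, mimicking the missing-metadata hang described above.
class FakeSend {
    static Future<String> send() {
        return new CompletableFuture<>(); // never completed
    }

    // Bounded wait: returns true if the send "failed fast" via timeout,
    // which is what the test should assert instead of blocking forever.
    static boolean timesOut(Future<String> ack, long millis) {
        try {
            ack.get(millis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException expected) {
            return true;
        } catch (Exception other) {
            return false;
        }
    }
}
```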
[jira] [Commented] (KAFKA-1333) Add consumer co-ordinator module to the server
[ https://issues.apache.org/jira/browse/KAFKA-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278855#comment-14278855 ] Jay Kreps commented on KAFKA-1333: -- Hey [~abiletskyi] A relatively full-fledged Java consumer has been posted with stubs for the new APIs on KAFKA-1760 but none of the server-side work has been started as far as I know...
[jira] [Commented] (KAFKA-1333) Add consumer co-ordinator module to the server
[ https://issues.apache.org/jira/browse/KAFKA-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278920#comment-14278920 ] Jay Kreps commented on KAFKA-1333: -- Woooh!!! :-)
RE: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack > 1 on the broker
Next time the protocol is evolved and new error codes can be introduced, would it make sense to add a new one called Deprecated (or Deprecation or DeprecatedOperation or whatever sounds best)? This would act as a more precise form of Unknown error. It could help identify what the problem is more easily when debugging clients. Of course, this is the kind of lever one would prefer never pulling, but when you need it, you're better off having it than not, and if you end up having it and never using it, it does not do much harm either. -- Felix GV Data Infrastructure Engineer Distributed Data Systems LinkedIn f...@linkedin.com linkedin.com/in/felixgv

From: kafka-clie...@googlegroups.com [kafka-clie...@googlegroups.com] on behalf of Magnus Edenhill [mag...@edenhill.se] Sent: Thursday, January 15, 2015 10:40 AM To: dev@kafka.apache.org Cc: kafka-clie...@googlegroups.com Subject: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack > 1 on the broker

I very much agree with what Joe is saying: let's use the version field as intended and be very strict about not removing nor altering existing behaviour without bumping the version. Old API versions could be deprecated (documentation only?) immediately and removed completely in the next minor version bump (0.8 -> 0.9). An API to query supported API versions would be a good addition in the long run, but it doesn't help current clients much as such, since a request to an older broker version will kill the connection without any error reporting to the client, thus making it rather useless in the short term. Regards, Magnus
Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack > 1 on the broker
Is the protocol bump caused by the behavior change or the new error code? 1) IMO, error codes are data, and clients can expect to receive errors that they don't understand (i.e. unknown errors). AFAIK, clients don't break on unknown errors; they are simply more challenging to debug. If we document the new behavior, then it's definitely debuggable and fixable. 2) The behavior change is basically a deprecation - i.e. acks > 1 was never documented, and is not supported by Kafka clients starting with version 0.8.2. I'm not sure this requires a protocol bump either, although it's a better case than new error codes. Thanks, Gwen

On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein joe.st...@stealth.ly wrote: Looping in the mailing list that the client developers live on, because they are all not on dev (though they should be if they want to be helping to build the best client libraries they can). I wholeheartedly believe that we need to not break existing functionality of the client protocol, ever. There are many reasons for this and we have other threads on the mailing list where we are discussing that topic (no pun intended) that I don't want to re-hash here. If we change wire protocol functionality OR the binary format (either), we must bump the version AND treat the version as a feature flag, with backward compatibility support until it is deprecated for some time for folks to deal with it.

version match {
  case 0 => keepDoingWhatWeWereDoing()
  case 1 => doNewStuff()
  case 2 => doEvenMoreNewStuff()
}

has to be a practice we adopt imho ... I know feature flags can be construed as messy code, but I am eager to hear another (better? different?) solution to this. If we don't do a feature flag like this specifically with this change, then what happens is that someone upgrades their brokers with a rolling restart in 0.8.3 and every single one of their producer requests starts to fail, and they have a major production outage.
Ok, I do 100% agree that acks > 1 makes no sense and we *REALLY* need people to start using 0, 1, -1, but we need to do that in a way that is going to work for everyone. Old producers and consumers must keep working with new brokers, and if we are not going to support that then I am unclear what the use of version is, based on our original intentions of having it because of the 0.7 -> 0.8 transition. We said no more breaking changes when we did that. - Joe Stein

On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava e...@confluent.io wrote: Right, so this looks like it could create an issue similar to what's currently being discussed in https://issues.apache.org/jira/browse/KAFKA-1649 where users now get errors under conditions when they previously wouldn't. Old clients won't even know about the error code, so besides failing they won't even be able to log any meaningful error messages. I think there are two options for compatibility: 1. An alternative change is to remove the acks > 1 code, but silently upgrade requests with acks > 1 to acks = -1. This isn't the same as other changes to behavior since the interaction between the client and server remains the same, no error codes change, etc. The client might just see some increased latency since the message might need to be replicated to more brokers than they requested. 2. Split this into two patches: one that bumps the protocol version on that message to include the new error code but maintains both old (now deprecated) and new behavior, then a second that would be applied in a later release that removes the old protocol + code for handling acks > 1. 2 is probably the right thing to do. If we specify the release when we'll remove the deprecated protocol at the time of deprecation, it makes things a lot easier for people writing non-java clients and could give users better predictability (e.g. if clients are at most 1 major release behind brokers, they'll remain compatible but possibly use deprecated features).
On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira gshap...@cloudera.com wrote: Hi Kafka Devs, We are working on KAFKA-1697 - remove code related to ack > 1 on the broker. Per Neha's suggestion, I'd like to give everyone a heads up on what these changes mean. Once this patch is included, any produce requests that include request.required.acks > 1 will result in an exception. This will be InvalidRequiredAcks in new versions (0.8.3 and up, I assume) and UnknownException in existing versions (sorry, but I can't add error codes retroactively). This behavior is already enforced by 0.8.2 producers (sync and new), but we expect impact on users with older producers that relied on acks > 1 and external clients (i.e. python, go, etc). Users who relied on acks > 1 are expected to switch to using acks = -1 and a min.isr parameter that matches their use case. This change was discussed in the past in the context of KAFKA-1555 (min.isr), but let us know if you have any questions or
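The feature-flag-by-version idea raised in this thread amounts to dispatching on the request's API version and keeping every old branch alive until it is formally deprecated. A hypothetical Java sketch (the error-code value and class names are illustrative, not Kafka's actual request-handling code):

```java
// Hypothetical version dispatch for a produce-style request handler:
// each wire version keeps its old semantics; only v2+ rejects acks > 1.
class ProduceHandler {
    static final short INVALID_REQUIRED_ACKS = 21; // illustrative error code
    static final short NONE = 0;                   // no error

    static short handle(int apiVersion, int requiredAcks) {
        switch (apiVersion) {
            case 0: // legacy semantics: acks > 1 silently accepted,
            case 1: // so old producers keep working after a broker upgrade
                return NONE;
            default: // newer versions: only -1, 0, 1 are valid
                if (requiredAcks < -1 || requiredAcks > 1) {
                    return INVALID_REQUIRED_ACKS;
                }
                return NONE;
        }
    }
}
```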
Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack > 1 on the broker
"clients don't break on unknown errors" may be true for the official java clients, but I don't think the assumption holds true for community-maintained clients and users of those clients. kafka-python generally follows the fail-fast philosophy and raises an exception on any unrecognized error code in any server response. In this case, kafka-python allows users to set their own required-acks policy when creating a producer instance. It is possible that users of kafka-python have deployed producer code that uses acks > 1 -- perhaps in production environments -- and for those users the new error code will crash their producer code. I would not be surprised if the same were true of other community clients. *One reason for the fail-fast approach is that there isn't great documentation on what errors to expect for each request / response -- so we use failures to alert that some error case is not handled properly. And because of that, introducing new error cases without bumping the api version is likely to cause those errors to get raised/thrown all the way back up to the user. Of course we (client maintainers) can fix the issues in the client libraries and suggest users upgrade, but it's not the ideal situation. Long-winded way of saying: I agree w/ Joe. -Dana
[jira] [Commented] (KAFKA-1476) Get a list of consumer groups
[ https://issues.apache.org/jira/browse/KAFKA-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279407#comment-14279407 ] Onur Karaman commented on KAFKA-1476: - Updated reviewboard https://reviews.apache.org/r/29831/diff/ against branch origin/trunk Get a list of consumer groups - Key: KAFKA-1476 URL: https://issues.apache.org/jira/browse/KAFKA-1476 Project: Kafka Issue Type: Wish Components: tools Affects Versions: 0.8.1.1 Reporter: Ryan Williams Assignee: Balaji Seshadri Labels: newbie Fix For: 0.9.0 Attachments: ConsumerCommand.scala, KAFKA-1476-LIST-GROUPS.patch, KAFKA-1476-RENAME.patch, KAFKA-1476-REVIEW-COMMENTS.patch, KAFKA-1476.patch, KAFKA-1476.patch, KAFKA-1476.patch, KAFKA-1476.patch, KAFKA-1476_2014-11-10_11:58:26.patch, KAFKA-1476_2014-11-10_12:04:01.patch, KAFKA-1476_2014-11-10_12:06:35.patch, KAFKA-1476_2014-12-05_12:00:12.patch, KAFKA-1476_2015-01-12_16:22:26.patch, KAFKA-1476_2015-01-12_16:31:20.patch, KAFKA-1476_2015-01-13_10:36:18.patch, KAFKA-1476_2015-01-15_14:30:04.patch It would be useful to have a way to get a list of consumer groups currently active via some tool/script that ships with kafka. This would be helpful so that the system tools can be explored more easily. For example, when running the ConsumerOffsetChecker, it requires a group option bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --topic test --group ? But, when just getting started with kafka, using the console producer and consumer, it is not clear what value to use for the group option. If a list of consumer groups could be listed, then it would be clear what value to use. Background: http://mail-archives.apache.org/mod_mbox/kafka-users/201405.mbox/%3cCAOq_b1w=slze5jrnakxvak0gu9ctdkpazak1g4dygvqzbsg...@mail.gmail.com%3e -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1476) Get a list of consumer groups
[ https://issues.apache.org/jira/browse/KAFKA-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Onur Karaman updated KAFKA-1476: Attachment: KAFKA-1476_2015-01-15_14:30:04.patch
[jira] [Commented] (KAFKA-1866) LogStartOffset gauge throws exceptions after log.delete()
[ https://issues.apache.org/jira/browse/KAFKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279412#comment-14279412 ] Gian Merlino commented on KAFKA-1866: - At least, I think it has that benefit. I haven't actually tested it yet :) LogStartOffset gauge throws exceptions after log.delete() - Key: KAFKA-1866 URL: https://issues.apache.org/jira/browse/KAFKA-1866 Project: Kafka Issue Type: Bug Reporter: Gian Merlino The LogStartOffset gauge does logSegments.head.baseOffset, which throws NoSuchElementException on an empty list, which can occur after a delete() of the log. This makes life harder for custom MetricsReporters, since they have to deal with .value() possibly throwing an exception. Locally we're dealing with this by having Log.delete() also call removeMetric on all the gauges. That also has the benefit of not having a bunch of metrics floating around for logs that the broker is not actually handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-1866) LogStartOffset gauge throws exceptions after log.delete()
Gian Merlino created KAFKA-1866: --- Summary: LogStartOffset gauge throws exceptions after log.delete() Key: KAFKA-1866 URL: https://issues.apache.org/jira/browse/KAFKA-1866 Project: Kafka Issue Type: Bug Reporter: Gian Merlino
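The failure mode and the local workaround described above can be sketched in a few lines of Scala. This is an illustrative stand-in, not Kafka's actual Log code: the segment list is modeled as a plain List[Long] of base offsets, and -1L is an arbitrary sentinel chosen for the example.

```scala
// Hypothetical sketch of the reported bug and a defensive alternative;
// this is NOT Kafka's actual Log implementation.
object LogStartOffsetSketch {
  // Mirrors the reported gauge: head throws NoSuchElementException
  // when the log has been delete()ed and the segment list is empty.
  def startOffsetUnsafe(segmentBaseOffsets: List[Long]): Long =
    segmentBaseOffsets.head

  // Defensive variant: return a sentinel instead of throwing, so a
  // custom MetricsReporter calling value() never sees an exception.
  def startOffsetSafe(segmentBaseOffsets: List[Long]): Long =
    segmentBaseOffsets.headOption.getOrElse(-1L)
}
```

The fix discussed in the comments takes the other route: having Log.delete() call removeMetric so the stale gauge disappears entirely rather than returning a sentinel.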
Re: Review Request 29831: Patch for KAFKA-1476
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29831/ --- (Updated Jan. 15, 2015, 10:30 p.m.) Review request for kafka. Bugs: KAFKA-1476 https://issues.apache.org/jira/browse/KAFKA-1476 Repository: kafka Description --- Merged in work for KAFKA-1476 and sub-task KAFKA-1826 Diffs (updated) - bin/kafka-consumer-groups.sh PRE-CREATION core/src/main/scala/kafka/admin/AdminUtils.scala 28b12c7b89a56c113b665fbde1b95f873f8624a3 core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala PRE-CREATION core/src/main/scala/kafka/utils/ZkUtils.scala c14bd455b6642f5e6eb254670bef9f57ae41d6cb core/src/test/scala/unit/kafka/admin/DeleteConsumerGroupTest.scala PRE-CREATION core/src/test/scala/unit/kafka/utils/TestUtils.scala ac15d34425795d5be20c51b01fa1108bdcd66583 Diff: https://reviews.apache.org/r/29831/diff/ Testing --- Thanks, Onur Karaman
Re: Review Request 29831: Patch for KAFKA-1476
On Jan. 13, 2015, 5:36 a.m., Neha Narkhede wrote: To make review easier, could you add the output of the command for all options for 2 consumer groups that consume 2 or more topics to the JIRA? It will make it easier to review. One thing to watch out for is the ease of scripting the output from this tool. I'd also suggest asking Clark/Todd or one of the SREs to review the output from the tool. I just attached the output from a run through all the commands and options on the JIRA. I excluded the noise coming from SLF4J that I was getting on every command. - Onur --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29831/#review67789 --- On Jan. 15, 2015, 10:30 p.m., Onur Karaman wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29831/ --- (Updated Jan. 15, 2015, 10:30 p.m.) Review request for kafka. Bugs: KAFKA-1476 https://issues.apache.org/jira/browse/KAFKA-1476 Repository: kafka Description --- Merged in work for KAFKA-1476 and sub-task KAFKA-1826 Diffs - bin/kafka-consumer-groups.sh PRE-CREATION core/src/main/scala/kafka/admin/AdminUtils.scala 28b12c7b89a56c113b665fbde1b95f873f8624a3 core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala PRE-CREATION core/src/main/scala/kafka/utils/ZkUtils.scala c14bd455b6642f5e6eb254670bef9f57ae41d6cb core/src/test/scala/unit/kafka/admin/DeleteConsumerGroupTest.scala PRE-CREATION core/src/test/scala/unit/kafka/utils/TestUtils.scala ac15d34425795d5be20c51b01fa1108bdcd66583 Diff: https://reviews.apache.org/r/29831/diff/ Testing --- Thanks, Onur Karaman
[jira] [Comment Edited] (KAFKA-1333) Add consumer co-ordinator module to the server
[ https://issues.apache.org/jira/browse/KAFKA-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279518#comment-14279518 ] Guozhang Wang edited comment on KAFKA-1333 at 1/15/15 11:34 PM: [~abiletskyi] sorry for getting late on this. We are currently making the following progress on KAFKA-1326: 1. I am working on KAFKA-1333, which will define the coordinator module along with its functionality declaration. 2. [~onurkaraman] will work on KAFKA-1334 once it gets unblocked from KAFKA-1333. 3. I am also reviewing KAFKA-1760 for the new client API. At the mean time, you could also take a look at KAFKA-1760 and see if the API definitions fits well for your use case. was (Author: guozhang): [~abiletskyi] sorry for getting late on this. We are currently making the following progress on KAFKA-1326: 1. I am working on KAFKA-1333, which will define the coordinator module along with its functionality declaration. 2. [~onurkaraman] will work on KAFKA-1334 once it gets unblocked from KAFKA-1333. 3. I am also reviewing KAFKA-1670 for the new client API. At the mean time, you could also take a look at KAFKA-1670 and see if the API definitions fits well for your use case. Add consumer co-ordinator module to the server -- Key: KAFKA-1333 URL: https://issues.apache.org/jira/browse/KAFKA-1333 Project: Kafka Issue Type: Sub-task Components: consumer Affects Versions: 0.9.0 Reporter: Neha Narkhede Assignee: Guozhang Wang Scope of this JIRA is to just add a consumer co-ordinator module that do the following: 1) coordinator start-up, metadata initialization 2) simple join group handling (just updating metadata, no failure detection / rebalancing): this should be sufficient for consumers doing self offset / partition management. Offset manager will still run side-by-side with the coordinator in this JIRA, and we will merge it in KAFKA-1740. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1333) Add consumer co-ordinator module to the server
[ https://issues.apache.org/jira/browse/KAFKA-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279518#comment-14279518 ] Guozhang Wang commented on KAFKA-1333: -- [~abiletskyi] sorry for getting to this late. We are currently making the following progress on KAFKA-1326: 1. I am working on KAFKA-1333, which will define the coordinator module along with its functionality declaration. 2. [~onurkaraman] will work on KAFKA-1334 once it gets unblocked from KAFKA-1333. 3. I am also reviewing KAFKA-1670 for the new client API. In the meantime, you could also take a look at KAFKA-1670 and see if the API definitions fit well for your use case.
Re: Latency Tracking Across All Kafka Component
Hi, At LinkedIn we used an audit module to track the latency / message counts at each tier of the pipeline (for your example it will have the producer / local / central / HDFS tiers). Some details can be found on our recent talk slides (slide 41/42): http://www.slideshare.net/GuozhangWang/apache-kafka-at-linkedin-43307044 This audit is specific to the usage of Avro as our serialization tool though, and we are considering ways to get it generalized hence open-sourced. Guozhang On Mon, Jan 5, 2015 at 3:33 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, That sounds a bit like needing a full, cross-app, cross-network transaction/call tracing, and not something specific or limited to Kafka, doesn't it? Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Mon, Jan 5, 2015 at 2:43 PM, Bhavesh Mistry mistry.p.bhav...@gmail.com wrote: Hi Kafka Team/Users, We are using the LinkedIn Kafka data pipeline end-to-end. Producer(s) -> Local DC Brokers -> MM -> Central brokers -> Camus Job -> HDFS This is working out very well for us, but we need to have visibility of latency at each layer (Local DC Brokers -> MM -> Central brokers -> Camus Job -> HDFS). Our events are time-based (the time the event was produced). Is there any feature or audit trail mentioned at ( https://github.com/linkedin/camus/ )? I would like to know the in-between latency and the time an event spent in each hop; otherwise we do not know where the problem is and what to optimize. Is any of this covered in 0.9.0 or any other upcoming Kafka release? How might we achieve this latency tracking across all components? Thanks, Bhavesh -- -- Guozhang
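For a time-stamped pipeline like the one Bhavesh describes, per-hop latency falls out of consecutive tier timestamps. A minimal sketch follows; it is purely illustrative (the tier names and millisecond timestamps are assumptions, and this is not the LinkedIn/Camus audit module):

```scala
// Illustrative only: derive per-hop latency from the time each tier
// first saw an event. Not the LinkedIn audit implementation.
object PipelineLatencySketch {
  // tierTimestamps: (tier name, epoch millis when the event arrived),
  // ordered from producer to final sink.
  def perHopLatencies(tierTimestamps: Seq[(String, Long)]): Seq[(String, Long)] =
    tierTimestamps.sliding(2).collect {
      case Seq((_, prev), (tier, curr)) => (tier, curr - prev)
    }.toSeq
}
```

In practice each tier would stamp the audit record as the event passes through, and a reporting job would aggregate these differences per tier.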
Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to acks > 1 on the broker
Gwen, I think the only option that wouldn't require a protocol version change is the one where acks > 1 is converted to acks = -1 since it's the only one that doesn't potentially break older clients. The protocol guide says that the expected upgrade path is servers first, then clients, so old clients, including non-java clients, that may be using acks > 1 should be able to work with a new broker version. It's more work, but I think dealing with the protocol change is the right thing to do since it eventually gets us to the behavior I think is better -- the broker should reject requests with invalid values. I think Joe and I were basically in agreement. In my mind the major piece missing from his description is how long we're going to maintain his case 0 behavior. It's impractical to maintain old versions forever, but it sounds like there hasn't been a decision on how long to maintain them. Maybe that's another item to add to KIPs -- protocol versions and behavior need to be listed as deprecated and the earliest version in which they'll be removed should be specified so users can understand which versions are guaranteed to be compatible, even if they're using (well-written) non-java clients. -Ewen On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers dana.pow...@gmail.com wrote: clients don't break on unknown errors maybe true for the official java clients, but I don't think the assumption holds true for community-maintained clients and users of those clients. kafka-python generally follows the fail-fast philosophy and raises an exception on any unrecognized error code in any server response. in this case, kafka-python allows users to set their own required-acks policy when creating a producer instance. It is possible that users of kafka-python have deployed producer code that uses acks > 1 -- perhaps in production environments -- and for those users the new error code will crash their producer code. I would not be surprised if the same were true of other community clients.
*one reason for the fail-fast approach is that there isn't great documentation on what errors to expect for each request / response -- so we use failures to alert that some error case is not handled properly. and because of that, introducing new error cases without bumping the api version is likely to cause those errors to get raised/thrown all the way back up to the user. of course we (client maintainers) can fix the issues in the client libraries and suggest users upgrade, but it's not the ideal situation. long-winded way of saying: I agree w/ Joe. -Dana On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira gshap...@cloudera.com wrote: Is the protocol bump caused by the behavior change or the new error code? 1) IMO, error_codes are data, and clients can expect to receive errors that they don't understand (i.e. unknown errors). AFAIK, clients don't break on unknown errors, they are simply more challenging to debug. If we document the new behavior, then it's definitely debuggable and fixable. 2) The behavior change is basically a deprecation - i.e. acks > 1 was never documented, and is not supported by Kafka clients starting with version 0.8.2. I'm not sure this requires a protocol bump either, although it's a better case than new error codes. Thanks, Gwen On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein joe.st...@stealth.ly wrote: Looping in the mailing list that the client developers live on because they are all not on dev (though they should be if they want to be helping to build the best client libraries they can). I wholeheartedly believe that we need to not break existing functionality of the client protocol, ever. There are many reasons for this and we have other threads on the mailing list where we are discussing that topic (no pun intended) that I don't want to re-hash here.
If we change wire protocol functionality OR the binary format (either) we must bump version AND treat version as a feature flag with backward compatibility support until it is deprecated for some time for folks to deal with it. match version = { case 0: keepDoingWhatWeWereDoing() case 1: doNewStuff() case 2: doEvenMoreNewStuff() } has to be a practice we adopt imho ... I know feature flags can be construed as messy code but I am eager to hear another (better? different?) solution to this. If we don't do a feature flag like this specifically with this change then what happens is that someone upgrades their brokers with a rolling restart in 0.8.3 and every single one of their producer requests start to fail and they have a major production outage. Ok, I do 100% agree that > 1 makes no sense and we *REALLY* need people to start using 0, 1, -1 but we need to do that in a way that is going to work for everyone. Old producers and consumers must keep working with new brokers and if
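Joe's match-on-version pseudocode above translates almost directly into Scala. A minimal sketch with hypothetical handlers (the behaviors and strings below are placeholders, not real broker code):

```scala
// Version-as-feature-flag dispatch, per Joe's sketch. Handlers are
// placeholders; a real broker would route to actual request logic.
object VersionDispatchSketch {
  def handle(version: Int): String = version match {
    case 0 => "keep doing what we were doing"   // legacy behavior preserved
    case 1 => "do new stuff"                    // new behavior, opted in by version
    case v => throw new IllegalArgumentException(s"unsupported request version $v")
  }
}
```

The point of the pattern is that old clients keep sending version 0 and get the old behavior, while clients that opt in to version 1 get the new behavior, so a rolling broker upgrade cannot break existing producers.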
[jira] [Assigned] (KAFKA-1866) LogStartOffset gauge throws exceptions after log.delete()
[ https://issues.apache.org/jira/browse/KAFKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani reassigned KAFKA-1866: - Assignee: Sriharsha Chintalapani
[jira] [Commented] (KAFKA-1866) LogStartOffset gauge throws exceptions after log.delete()
[ https://issues.apache.org/jira/browse/KAFKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279529#comment-14279529 ] Gian Merlino commented on KAFKA-1866: - Anything that calls log.delete() should do it. It looks like the replica-stopping code does this. In our case we started seeing these exceptions after reassigning some partitions between brokers -- perhaps a non-leading partition got moved and was delete()ed.
[jira] [Commented] (KAFKA-1866) LogStartOffset gauge throws exceptions after log.delete()
[ https://issues.apache.org/jira/browse/KAFKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279473#comment-14279473 ] Sriharsha Chintalapani commented on KAFKA-1866: --- [~gian] any steps to reproduce this? Thanks.
[jira] [Updated] (KAFKA-1476) Get a list of consumer groups
[ https://issues.apache.org/jira/browse/KAFKA-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Onur Karaman updated KAFKA-1476: Attachment: sample-kafka-consumer-groups-sh-output.txt I've attached a sample run of kafka-consumer-groups.sh covering all the commands and options. I've excluded the following noise from SLF4J coming from every command: {code} SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/vagrant/core/build/dependant-libs-2.10.4/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/vagrant/core/build/dependant-libs-2.10.4/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] {code}
Re: [DISCUSS] KIPs
Thanks Jay for kicking this off! I think the confluence page you wrote up is a great start. The KIP makes sense to me (at a minimum) if there is going to be any breaking change. This way Kafka can continue to grow and blossom and we have a process in place if we are going to release a thorn ... and when we do it is *CLEAR* about what and why that is. We can easily document which KIPs were involved with this release (which I think should get committed afterwards somewhere so there is no chance of edit after release). This approach I had been thinking about also allows changes to occur as they do now as long as they are backwards compatible. Hopefully we never need a KIP but when we do the PMC can vote on it and folks can read the release notes with a *CLEAR* understanding of what is going to break their existing setup... at least that is how I have been thinking about it. Let me know what you think about this bare-minimum approach... I hadn't really thought of the KIP for *ANY* major change and I have to think more about that. I have some other comments for minor items in the confluence page I will make once I think more about how I feel having a KIP for more than what I was thinking about already. I do think we should have major changes go through confluence, mailing list discussion and JIRA but kind of feel we have been doing that already ... if there are cases where that isn't the case we should highlight and learn from them and formalize that more if need be. /*** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop / On Thu, Jan 15, 2015 at 1:42 PM, Jay Kreps jay.kr...@gmail.com wrote: The idea of KIPs came up in a previous discussion but there was no real crisp definition of what they were.
Here is an attempt at defining a process: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals The trick here is to have something light-weight enough that it isn't a hassle for small changes, but enough so that changes get the eyeballs of the committers and heavy users. Thoughts? -Jay
[jira] [Assigned] (KAFKA-1864) Revisit defaults for the internal offsets topic
[ https://issues.apache.org/jira/browse/KAFKA-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Rao reassigned KAFKA-1864: -- Assignee: Jun Rao Revisit defaults for the internal offsets topic --- Key: KAFKA-1864 URL: https://issues.apache.org/jira/browse/KAFKA-1864 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2 Reporter: Neha Narkhede Assignee: Jun Rao Priority: Blocker Realize this is late, but as I was reviewing the 0.8.2 RC, I found that our defaults for the offsets topic are not ideal. The # of partitions currently default to 1 and the replication factor is 1 as well. Granted that the replication factor is changeable in the future (through the admin tool), changing the # of partitions is a very disruptive change. The concern is that this feature is on by default on the server and will be activated the moment the first client turns on kafka based offset storage. My proposal is to change the # of partitions to something large (50 or so) and change the replication factor to min(# of alive brokers, configured replication factor) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
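Neha's proposed defaults can be stated in a couple of lines. A sketch under the assumption that the broker knows the count of alive brokers at topic-creation time; the names below are illustrative, not Kafka's actual configuration API:

```scala
// Sketch of the proposed offsets-topic defaults: a large partition
// count, and never asking for more replicas than there are alive
// brokers. Names are illustrative.
object OffsetsTopicDefaultsSketch {
  val DefaultPartitions = 50          // "something large (50 or so)"

  // Cap the configured replication factor at the live broker count so
  // topic creation succeeds even on a small or partially-up cluster.
  def effectiveReplicationFactor(aliveBrokers: Int, configured: Int): Int =
    math.min(aliveBrokers, configured)
}
```

The asymmetry in the proposal is deliberate: replication factor can be raised later with the admin tool, but changing the partition count of the offsets topic after clients have committed offsets is very disruptive, so the partition count must be large from the start.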
Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to acks > 1 on the broker
The errors are part of the KIP process now, so I think the clients are safe :) https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin steve.mo...@gmail.com wrote: Agree errors should be part of the protocol On Jan 15, 2015, at 17:59, Gwen Shapira gshap...@cloudera.com wrote: Hi, I got convinced by Joe and Dana that errors are indeed part of the protocol and can't be randomly added. So, it looks like we need to bump the version of ProduceRequest in the following way: Version 0 - accept acks > 1. I think we should keep the existing behavior too (i.e. not replace it with -1) to avoid surprising clients, but I'm willing to hear other opinions. Version 1 - do not accept acks > 1 and return an error. Are we ok with the error I added in KAFKA-1697? We can use something less specific like InvalidRequestParameter. This error can be reused in the future and reduce the need to add errors, but will also be less clear to the client and its users. Maybe even add the error message string to the protocol in addition to the error code? (since we are bumping versions) I think maintaining the old version throughout 0.8.X makes sense. IMO dropping it for 0.9 is feasible, but I'll let client owners help make that call. Am I missing anything? Should I start a KIP? It seems like a KIP-type discussion :) Gwen On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava e...@confluent.io wrote: Gwen, I think the only option that wouldn't require a protocol version change is the one where acks > 1 is converted to acks = -1 since it's the only one that doesn't potentially break older clients. The protocol guide says that the expected upgrade path is servers first, then clients, so old clients, including non-java clients, that may be using acks > 1 should be able to work with a new broker version.
It's more work, but I think dealing with the protocol change is the right thing to do since it eventually gets us to the behavior I think is better -- the broker should reject requests with invalid values. I think Joe and I were basically in agreement. In my mind the major piece missing from his description is how long we're going to maintain his case 0 behavior. It's impractical to maintain old versions forever, but it sounds like there hasn't been a decision on how long to maintain them. Maybe that's another item to add to KIPs -- protocol versions and behavior need to be listed as deprecated and the earliest version in which they'll be removed should be specified so users can understand which versions are guaranteed to be compatible, even if they're using (well-written) non-java clients. -Ewen On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers dana.pow...@gmail.com wrote: clients don't break on unknown errors maybe true for the official java clients, but I don't think the assumption holds true for community-maintained clients and users of those clients. kafka-python generally follows the fail-fast philosophy and raises an exception on any unrecognized error code in any server response. in this case, kafka-python allows users to set their own required-acks policy when creating a producer instance. It is possible that users of kafka-python have deployed producer code that uses acks > 1 -- perhaps in production environments -- and for those users the new error code will crash their producer code. I would not be surprised if the same were true of other community clients. *one reason for the fail-fast approach is that there isn't great documentation on what errors to expect for each request / response -- so we use failures to alert that some error case is not handled properly. and because of that, introducing new error cases without bumping the api version is likely to cause those errors to get raised/thrown all the way back up to the user.
of course we (client maintainers) can fix the issues in the client libraries and suggest users upgrade, but it's not the ideal situation. long-winded way of saying: I agree w/ Joe. -Dana On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira gshap...@cloudera.com wrote: Is the protocol bump caused by the behavior change or the new error code? 1) IMO, error_codes are data, and clients can expect to receive errors that they don't understand (i.e. unknown errors). AFAIK, clients don't break on unknown errors, they are simply more challenging to debug. If we document the new behavior, then it's definitely debuggable and fixable. 2) The behavior change is basically a deprecation - i.e. acks > 1 was never documented, and is not supported by Kafka clients starting with version 0.8.2. I'm not sure this requires a protocol bump either, although it's a better case than new error codes. Thanks, Gwen On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein joe.st...@stealth.ly wrote: Looping in the mailing list that the client developers live on because they are all not on
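Gwen's versioned scheme from this thread can be sketched as a small validation function: version 0 keeps the legacy behavior for acks > 1, while version 1 rejects anything outside {-1, 0, 1} with an error. This is entirely illustrative; the names and the error are stand-ins, not Kafka's actual API:

```scala
// Sketch of version-gated acks validation, per the thread. Hypothetical.
object AcksValidationSketch {
  sealed trait Result
  case class Accepted(acks: Int) extends Result
  case object InvalidRequiredAcks extends Result   // stand-in error code

  def validate(requestVersion: Int, acks: Int): Result =
    if (requestVersion == 0) Accepted(acks)        // v0: legacy, acks > 1 still accepted
    else if (acks == -1 || acks == 0 || acks == 1) Accepted(acks)
    else InvalidRequiredAcks                       // v1+: reject out-of-range acks
}
```

This captures the compatibility argument: the broker only changes behavior for clients that explicitly send the new request version, so old clients are never surprised by a new error code.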
[jira] [Commented] (KAFKA-1476) Get a list of consumer groups
[ https://issues.apache.org/jira/browse/KAFKA-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279715#comment-14279715 ] Neha Narkhede commented on KAFKA-1476: -- Thanks for attaching the output, [~onurkaraman]. Here are some review comments- 1. The error stack trace for describing groups that don't exist is pretty ugly, so let's remove that and output a message that states the group doesn't exist {code} vagrant@worker1:/opt/kafka$ bin/kafka-consumer-groups.sh --zookeeper 192.168.50.11:2181 --describe --group g3 Error while executing consumer group command org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/g3/owners org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/g3/owners at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685) at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:413) at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409) at kafka.utils.ZkUtils$.getTopicsByConsumerGroup(ZkUtils.scala:758) at kafka.admin.ConsumerGroupCommand$.describe(ConsumerGroupCommand.scala:81) at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:56) at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala) Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/g3/owners at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500) at org.I0Itec.zkclient.ZkConnection.getChildren(ZkConnection.java:99) at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:416) at 
org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:413) at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675) ... 6 more {code} 2. I'm not sure why the header fields are not comma separated but the data is. It will be best to stick to one separator only. So if you pick , as the separator, then the output should change to {code} GROUP, TOPIC, PID, CURRENT, OFFSET, LOG SIZE, LAG, OWNER g2, t2, 0, 1, 1, 0, none {code} 3. For deleting only a specific topic's information, it is sufficient to rely on the user specifying the --topic along with --delete. The --delete-with-topic option seems unnecessary. Get a list of consumer groups - Key: KAFKA-1476 URL: https://issues.apache.org/jira/browse/KAFKA-1476 Project: Kafka Issue Type: Wish Components: tools Affects Versions: 0.8.1.1 Reporter: Ryan Williams Assignee: Balaji Seshadri Labels: newbie Fix For: 0.9.0 Attachments: ConsumerCommand.scala, KAFKA-1476-LIST-GROUPS.patch, KAFKA-1476-RENAME.patch, KAFKA-1476-REVIEW-COMMENTS.patch, KAFKA-1476.patch, KAFKA-1476.patch, KAFKA-1476.patch, KAFKA-1476.patch, KAFKA-1476_2014-11-10_11:58:26.patch, KAFKA-1476_2014-11-10_12:04:01.patch, KAFKA-1476_2014-11-10_12:06:35.patch, KAFKA-1476_2014-12-05_12:00:12.patch, KAFKA-1476_2015-01-12_16:22:26.patch, KAFKA-1476_2015-01-12_16:31:20.patch, KAFKA-1476_2015-01-13_10:36:18.patch, KAFKA-1476_2015-01-15_14:30:04.patch, sample-kafka-consumer-groups-sh-output.txt It would be useful to have a way to get a list of consumer groups currently active via some tool/script that ships with kafka. This would be helpful so that the system tools can be explored more easily. For example, when running the ConsumerOffsetChecker, it requires a group option bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --topic test --group ? But, when just getting started with kafka, using the console producer and consumer, it is not clear what value to use for the group option. 
If a list of consumer groups could be listed, then it would be clear what value to use. Background: http://mail-archives.apache.org/mod_mbox/kafka-users/201405.mbox/%3cCAOq_b1w=slze5jrnakxvak0gu9ctdkpazak1g4dygvqzbsg...@mail.gmail.com%3e -- This message was sent by Atlassian JIRA (v6.3.4#6332)
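The separator consistency Neha asks for in review point 2 (header and data rows using the same delimiter) can be sketched as follows; the column names and the helper itself are illustrative, not the actual ConsumerGroupCommand code, and "CURRENT OFFSET" is assumed to be a single column:

```python
def format_group_table(rows, sep=", "):
    """Render the header and every data row with the same separator,
    as the review suggests (column set is illustrative)."""
    header = ["GROUP", "TOPIC", "PID", "CURRENT OFFSET", "LOG SIZE", "LAG", "OWNER"]
    lines = [sep.join(header)]
    for row in rows:
        # Stringify each field so numeric columns join cleanly.
        lines.append(sep.join(str(field) for field in row))
    return "\n".join(lines)

print(format_group_table([("g2", "t2", 0, 1, 1, 0, "none")]))
```

With one separator for both lines, the output parses the same way whether a consumer reads the header or the data.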
Re: Compatibility + Unknown APIs
The hacky method that Dana suggests does not sound too hacky to me actually. Since such a scenario will only happen when 1) new clients talk to an older server and 2) older clients talk to a new server with some APIs deprecated, and correlation_id was always set to meaningful numbers before, old clients will not check it for validity. So we only need to upgrade the clients once to handle the -1 correlation_id, at ANY time; until that happens the old client will just throw SerializationException instead of "ERROR: Closing socket" for both cases, which gives them similar semantics. For such a situation we do not need to require a version bump. On Mon, Jan 12, 2015 at 6:36 PM, Jay Kreps jay.kr...@gmail.com wrote: I totally agree but I still think we shouldn't do it. :-) That change would cause the reimplementation of ALL existing Kafka clients. (You can't choose not to implement a new protocol version or else we are committing to keeping the old version supported both ways on the server forever.) The problem it fixes is fairly minor: clients that want to adaptively detect apis. In general I agree this isn't easy to do, but I also don't think it is really recommended. I think it is probably better for clients to just implement against reasonably conservative versions and trust us not to break them going forward. That is simpler and less likely to break. We also haven't actually addressed the issue originally brought up that led to not doing it--how to interpret and set the top-level error in the presence of nested errors (which exception does the client throw and when). This is kind of icky too, though probably preferable if we were starting over. I see either of these alternatives as imperfect, but changing now has a high cost and doesn't really address a top-50 pain point. But I do agree that KIPs would really help draw attention to these kinds of decisions as we make them and help us get serious about sticking with them without having that kind of "it sucks but..."
feeling. -Jay On Mon, Jan 12, 2015 at 5:57 PM, Joe Stein joe.st...@stealth.ly wrote: There are benefits of moving the error code to the response header. 1) I think it is the right thing to do from an implementation perspective. It makes the most sense. You send a request and you get back a response. The response tells you something is wrong in the header. 2) With such a large change we can make sure we have our solution to these issues (see other thread on Compatibility and KIP) set up and in place moving forward. If we can make such a large change then smaller ones should work well too. We could even use this one change as a way to best flesh out the way we want to implement it, preserving functionality AND adding the new response format. When we release 0.8.3 (assuming this was in there) developers can read KIP-1 (or whatever) and decide if they want to support the required version bump; if not, fine, keep working with 0.8.2 and you are good to go. /*** Joe Stein, Founder, Principal Consultant, Big Data Open Source Security LLC, http://www.stealth.ly, Twitter: @allthingshadoop ***/ On Mon, Jan 12, 2015 at 8:37 PM, Jay Kreps jay.kr...@gmail.com wrote: Yeah, adding it to the metadata request probably makes sense. What you describe, making it a per-broker field, is technically correct, since each broker could be on a different software version. But I wonder if it might not be more usable to just give back a single list of api versions. This will be more compact and also easier to interpret as a client. An easy implementation would be for the broker that answers the metadata request to just give whatever versions it supports. A slightly better implementation would be for each broker to register what it supports in ZK and have the responding broker give back the intersection (i.e. apis supported by all brokers).
Since the broker actually supports multiple versions at the same time this will need to be in the form [ApiId [ApiVersion]]. -Jay On Mon, Jan 12, 2015 at 5:19 PM, Dana Powers dana.pow...@rd.io wrote: Perhaps a bit hacky, but you could also reserve a specific correlationId (maybe -1) to represent errors and send back to the client an UnknownAPIResponse like: Response = -1 UnknownAPIResponse UnknownAPIResponse = originalCorrelationId errorCode The benefit here would be that it does not break the current API and current clients should be able to continue operating as usual as long as they ignore unknown correlationIds and don't use the reserved Id. For clients that want to catch unknownAPI errors, they can handle -1 correlationIds and dispatch as needed. Otherwise
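Dana's reserved-correlation-id proposal can be sketched as a tiny response dispatcher. The framing below (big-endian int32 correlation id, then for the reserved -1 case an int32 original id and an int16 error code) is an assumption for illustration only, not the real Kafka wire format:

```python
import struct

RESERVED_UNKNOWN_API = -1  # reserved id for UnknownAPIResponse, per the proposal

def dispatch_response(payload, pending):
    """payload: raw response bytes; pending: dict of in-flight correlation ids.
    Old clients that ignore unknown correlation ids keep working unchanged;
    new clients decode the reserved -1 id as an UnknownAPIResponse."""
    (corr_id,) = struct.unpack_from(">i", payload, 0)
    if corr_id == RESERVED_UNKNOWN_API:
        original_id, error_code = struct.unpack_from(">ih", payload, 4)
        return ("unknown_api", original_id, error_code)
    if corr_id not in pending:
        return None  # unknown correlation id: ignore, matching current behavior
    return ("ok", corr_id, pending[corr_id])

# A hypothetical UnknownAPIResponse for original request id 42, error code 35.
buf = struct.pack(">iih", -1, 42, 35)
print(dispatch_response(buf, {}))
```

The key property is the one Dana highlights: nothing about the existing request/response pairing changes for clients that never see the reserved id.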
Re: Review Request 29952: Patch for kafka-1864
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29952/#review68395 --- core/src/main/scala/kafka/server/OffsetManager.scala https://reviews.apache.org/r/29952/#comment112588 I'm wondering why you chose to change defaults here and not in KafkaConfig? Unless I'm missing something, it looks like we are defining defaults in two different places, and they don't match any more. - Gwen Shapira On Jan. 16, 2015, 12:52 a.m., Jun Rao wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29952/ --- (Updated Jan. 16, 2015, 12:52 a.m.) Review request for kafka. Bugs: kafka-1864 https://issues.apache.org/jira/browse/kafka-1864 Repository: kafka Description --- create offset topic with a larger replication factor by default Diffs - core/src/main/scala/kafka/server/KafkaApis.scala d626b1710813648524eefa5a3df098393c3e7743 core/src/main/scala/kafka/server/KafkaConfig.scala 6e26c5436feb4629d17f199011f3ebb674aa767f core/src/main/scala/kafka/server/OffsetManager.scala 43eb2a35bb54d32c66cdb94772df657b3a104d1a core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala 07a7beee9dec733eae943b425ae58c54f08458d8 core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala 4a3a5b264a021e55c39f4d7424ce04ee591503ef Diff: https://reviews.apache.org/r/29952/diff/ Testing --- Thanks, Jun Rao
Re: Review Request 29952: Patch for kafka-1864
On Jan. 16, 2015, 2:54 a.m., Gwen Shapira wrote: core/src/main/scala/kafka/server/OffsetManager.scala, line 77 https://reviews.apache.org/r/29952/diff/1/?file=823279#file823279line77 I'm wondering why you chose to change defaults here and not in KafkaConfig? Unless I'm missing something, it looks like we are defining defaults in two different places, and they don't match any more. KafkaConfig references this value in OffsetManager, right? - Jun --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29952/#review68395 --- On Jan. 16, 2015, 12:52 a.m., Jun Rao wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29952/ --- (Updated Jan. 16, 2015, 12:52 a.m.) Review request for kafka. Bugs: kafka-1864 https://issues.apache.org/jira/browse/kafka-1864 Repository: kafka Description --- create offset topic with a larger replication factor by default Diffs - core/src/main/scala/kafka/server/KafkaApis.scala d626b1710813648524eefa5a3df098393c3e7743 core/src/main/scala/kafka/server/KafkaConfig.scala 6e26c5436feb4629d17f199011f3ebb674aa767f core/src/main/scala/kafka/server/OffsetManager.scala 43eb2a35bb54d32c66cdb94772df657b3a104d1a core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala 07a7beee9dec733eae943b425ae58c54f08458d8 core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala 4a3a5b264a021e55c39f4d7424ce04ee591503ef Diff: https://reviews.apache.org/r/29952/diff/ Testing --- Thanks, Jun Rao
Re: Review Request 29952: Patch for kafka-1864
On Jan. 16, 2015, 2:54 a.m., Gwen Shapira wrote: core/src/main/scala/kafka/server/OffsetManager.scala, line 77 https://reviews.apache.org/r/29952/diff/1/?file=823279#file823279line77 I'm wondering why you chose to change defaults here and not in KafkaConfig? Unless I'm missing something, it looks like we are defining defaults in two different places, and they don't match any more. Jun Rao wrote: KafkaConfig references this value in OffsetManager, right? Right, sorry - got confused. Typically it's the other way around, I think. But this will work too. - Gwen --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29952/#review68395 --- On Jan. 16, 2015, 12:52 a.m., Jun Rao wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29952/ --- (Updated Jan. 16, 2015, 12:52 a.m.) Review request for kafka. Bugs: kafka-1864 https://issues.apache.org/jira/browse/kafka-1864 Repository: kafka Description --- create offset topic with a larger replication factor by default Diffs - core/src/main/scala/kafka/server/KafkaApis.scala d626b1710813648524eefa5a3df098393c3e7743 core/src/main/scala/kafka/server/KafkaConfig.scala 6e26c5436feb4629d17f199011f3ebb674aa767f core/src/main/scala/kafka/server/OffsetManager.scala 43eb2a35bb54d32c66cdb94772df657b3a104d1a core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala 07a7beee9dec733eae943b425ae58c54f08458d8 core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala 4a3a5b264a021e55c39f4d7424ce04ee591503ef Diff: https://reviews.apache.org/r/29952/diff/ Testing --- Thanks, Jun Rao
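The review exchange above is about where a default should live so the two places cannot drift apart. The single-source-of-truth pattern it settles on (one component defines the constant, the other references it) can be sketched in Python; all names here are hypothetical, not Kafka's actual classes:

```python
class OffsetManagerDefaults:
    """Single home for the defaults (values are illustrative)."""
    OFFSETS_TOPIC_REPLICATION_FACTOR = 3
    OFFSETS_TOPIC_NUM_PARTITIONS = 50

class KafkaConfigSketch:
    """References the defaults instead of redefining them, so a change
    in one place is automatically reflected in the other."""
    def __init__(self, overrides=None):
        overrides = overrides or {}
        self.offsets_topic_replication_factor = overrides.get(
            "offsets.topic.replication.factor",
            OffsetManagerDefaults.OFFSETS_TOPIC_REPLICATION_FACTOR)
        self.offsets_topic_num_partitions = overrides.get(
            "offsets.topic.num.partitions",
            OffsetManagerDefaults.OFFSETS_TOPIC_NUM_PARTITIONS)
```

Whether the config references the manager's defaults (as in the patch) or the other way around matters less than there being exactly one definition.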
Re: kafka consumer
Hi, 1. Not sure if I understand your question.. could you elaborate? 2. Yes, and then the data for that topic will be distributed at the granularity of partitions to your consumers. 3. The default value is set to 60 seconds I believe. You can read the config docs for its semantics here: http://kafka.apache.org/documentation.html#brokerconfigs Guozhang On Sun, Dec 28, 2014 at 6:00 PM, panqing...@163.com panqing...@163.com wrote: Hi, I am recently learning Kafka and have several questions. 1. In ActiveMQ the broker pushes messages and the consumer sets up a MessageListener to receive them, but in Kafka consumers pull messages from the broker. Does the consumer poll the broker on a timer, or can a listener be built on the broker? 2. I now have more than one consumer consuming the same topic; should I put them in the same group? 3. What value should zookeeper.session.timeout.ms be set to? The Java example uses 400 ms, but then "Unable to connect to zookeeper server within timeout: 400" appears. panqing...@163.com -- -- Guozhang
[jira] [Updated] (KAFKA-1864) Revisit defaults for the internal offsets topic
[ https://issues.apache.org/jira/browse/KAFKA-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Rao updated KAFKA-1864: --- Attachment: kafka-1864.patch Revisit defaults for the internal offsets topic --- Key: KAFKA-1864 URL: https://issues.apache.org/jira/browse/KAFKA-1864 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2 Reporter: Neha Narkhede Assignee: Jun Rao Priority: Blocker Attachments: kafka-1864.patch Realize this is late, but as I was reviewing the 0.8.2 RC, I found that our defaults for the offsets topic are not ideal. The # of partitions currently default to 1 and the replication factor is 1 as well. Granted that the replication factor is changeable in the future (through the admin tool), changing the # of partitions is a very disruptive change. The concern is that this feature is on by default on the server and will be activated the moment the first client turns on kafka based offset storage. My proposal is to change the # of partitions to something large (50 or so) and change the replication factor to min(# of alive brokers, configured replication factor) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1864) Revisit defaults for the internal offsets topic
[ https://issues.apache.org/jira/browse/KAFKA-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279615#comment-14279615 ] Jun Rao commented on KAFKA-1864: Created reviewboard https://reviews.apache.org/r/29952/diff/ against branch origin/0.8.2 Revisit defaults for the internal offsets topic --- Key: KAFKA-1864 URL: https://issues.apache.org/jira/browse/KAFKA-1864 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2 Reporter: Neha Narkhede Assignee: Jun Rao Priority: Blocker Attachments: kafka-1864.patch Realize this is late, but as I was reviewing the 0.8.2 RC, I found that our defaults for the offsets topic are not ideal. The # of partitions currently default to 1 and the replication factor is 1 as well. Granted that the replication factor is changeable in the future (through the admin tool), changing the # of partitions is a very disruptive change. The concern is that this feature is on by default on the server and will be activated the moment the first client turns on kafka based offset storage. My proposal is to change the # of partitions to something large (50 or so) and change the replication factor to min(# of alive brokers, configured replication factor) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1864) Revisit defaults for the internal offsets topic
[ https://issues.apache.org/jira/browse/KAFKA-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279616#comment-14279616 ] Jun Rao commented on KAFKA-1864: Attached is a patch. A couple of the unit tests fail because of KAFKA-1867. Fixing KAFKA-1867 is a bit tricky and we probably don't want to do that in 0.8.2. So, patching the unit test by overriding the default value for offsets.topic.replication.factor for now. Revisit defaults for the internal offsets topic --- Key: KAFKA-1864 URL: https://issues.apache.org/jira/browse/KAFKA-1864 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2 Reporter: Neha Narkhede Assignee: Jun Rao Priority: Blocker Attachments: kafka-1864.patch Realize this is late, but as I was reviewing the 0.8.2 RC, I found that our defaults for the offsets topic are not ideal. The # of partitions currently default to 1 and the replication factor is 1 as well. Granted that the replication factor is changeable in the future (through the admin tool), changing the # of partitions is a very disruptive change. The concern is that this feature is on by default on the server and will be activated the moment the first client turns on kafka based offset storage. My proposal is to change the # of partitions to something large (50 or so) and change the replication factor to min(# of alive brokers, configured replication factor) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker
Hi, I got convinced by Joe and Dana that errors are indeed part of the protocol and can't be randomly added. So, it looks like we need to bump the version of ProduceRequest in the following way: Version 0 - accept acks > 1. I think we should keep the existing behavior too (i.e. not replace it with -1) to avoid surprising clients, but I'm willing to hear other opinions. Version 1 - do not accept acks > 1 and return an error. Are we ok with the error I added in KAFKA-1697? We can use something less specific like InvalidRequestParameter. This error can be reused in the future and reduce the need to add errors, but will also be less clear to the client and its users. Maybe even add the error message string to the protocol in addition to the error code? (since we are bumping versions) I think maintaining the old version throughout 0.8.X makes sense. IMO dropping it for 0.9 is feasible, but I'll let client owners help make that call. Am I missing anything? Should I start a KIP? It seems like a KIP-type discussion :) Gwen On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava e...@confluent.io wrote: Gwen, I think the only option that wouldn't require a protocol version change is the one where acks > 1 is converted to acks = -1, since it's the only one that doesn't potentially break older clients. The protocol guide says that the expected upgrade path is servers first, then clients, so old clients, including non-java clients, that may be using acks > 1 should be able to work with a new broker version. It's more work, but I think dealing with the protocol change is the right thing to do since it eventually gets us to the behavior I think is better -- the broker should reject requests with invalid values. I think Joe and I were basically in agreement. In my mind the major piece missing from his description is how long we're going to maintain his case 0 behavior.
It's impractical to maintain old versions forever, but it sounds like there hasn't been a decision on how long to maintain them. Maybe that's another item to add to KIPs -- protocol versions and behavior need to be listed as deprecated, and the earliest version in which they'll be removed should be specified, so users can understand which versions are guaranteed to be compatible, even if they're using (well-written) non-java clients. -Ewen On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers dana.pow...@gmail.com wrote: "clients don't break on unknown errors" may be true for the official java clients, but I don't think the assumption holds true for community-maintained clients and users of those clients. kafka-python generally follows the fail-fast philosophy and raises an exception on any unrecognized error code in any server response. In this case, kafka-python allows users to set their own required-acks policy when creating a producer instance. It is possible that users of kafka-python have deployed producer code that uses acks > 1 -- perhaps in production environments -- and for those users the new error code will crash their producer code. I would not be surprised if the same were true of other community clients. *One reason for the fail-fast approach is that there isn't great documentation on what errors to expect for each request / response -- so we use failures to alert that some error case is not handled properly. And because of that, introducing new error cases without bumping the api version is likely to cause those errors to get raised/thrown all the way back up to the user. Of course we (client maintainers) can fix the issues in the client libraries and suggest users upgrade, but it's not the ideal situation. Long-winded way of saying: I agree w/ Joe. -Dana On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira gshap...@cloudera.com wrote: Is the protocol bump caused by the behavior change or the new error code?
1) IMO, error_codes are data, and clients can expect to receive errors that they don't understand (i.e. unknown errors). AFAIK, clients don't break on unknown errors, they are simply more challenging to debug. If we document the new behavior, then it's definitely debuggable and fixable. 2) The behavior change is basically a deprecation - i.e. acks > 1 was never documented, and is not supported by Kafka clients starting with version 0.8.2. I'm not sure this requires a protocol bump either, although it's a better case than new error codes. Thanks, Gwen On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein joe.st...@stealth.ly wrote: Looping in the mailing list that the client developers live on, because they are all not on dev (though they should be if they want to be helping to build the best client libraries they can). I wholeheartedly believe that we need to not break existing functionality of the client protocol, ever. There are many reasons for this and we have other threads on the mailing list where we are discussing that topic
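The fail-fast behavior Dana describes for kafka-python can be sketched generically. The error-code table below is a small illustrative subset (codes 3 and 6 do correspond to UnknownTopicOrPartition and NotLeaderForPartition in the Kafka protocol), and the exception name is hypothetical rather than kafka-python's actual class:

```python
class UnknownErrorCodeError(Exception):
    """Raised when the server returns an error code the client doesn't know."""

# Illustrative subset; real clients map many more codes.
KNOWN_ERRORS = {
    0: None,                        # no error
    3: "UnknownTopicOrPartition",
    6: "NotLeaderForPartition",
}

def check_error(code):
    """Fail fast on unrecognized codes so unhandled server behavior surfaces
    immediately instead of being silently ignored -- which is exactly why
    adding a new error code without a version bump crashes such clients."""
    if code not in KNOWN_ERRORS:
        raise UnknownErrorCodeError(f"unrecognized error code {code}")
    return KNOWN_ERRORS[code]

print(check_error(3))
```

This makes the trade-off in the thread concrete: fail-fast clients surface protocol gaps quickly, but they also turn any newly introduced error code into a user-visible crash.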
[jira] [Created] (KAFKA-1868) ConsoleConsumer shouldn't override dual.commit.enabled to false if not explicitly set
Jun Rao created KAFKA-1868: -- Summary: ConsoleConsumer shouldn't override dual.commit.enabled to false if not explicitly set Key: KAFKA-1868 URL: https://issues.apache.org/jira/browse/KAFKA-1868 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2 Reporter: Jun Rao Assignee: Jun Rao Priority: Blocker Fix For: 0.8.2 In ConsoleConsumer, we override dual.commit.enabled to false if not explicitly set. However, if offset.storage is set to kafka, by default, dual.commit.enabled is set to true and we shouldn't override that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
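The fix KAFKA-1868 describes (only default dual.commit.enabled when the user has not set it, and derive that default from the offset storage mode) can be sketched as follows; the function and property handling are illustrative, not ConsoleConsumer's actual Scala code:

```python
def effective_dual_commit(user_props):
    """Sketch of the intended behavior: an explicit user setting always wins;
    otherwise dual commit defaults to true only for kafka-based offset storage
    (the migration path), and false for zookeeper storage."""
    if "dual.commit.enabled" in user_props:
        return user_props["dual.commit.enabled"]
    return user_props.get("offsets.storage", "zookeeper") == "kafka"
```

The bug was the override in the "not explicitly set" branch: forcing false there silently defeats the kafka-storage default of true.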
Re: Review Request 29953: Patch for kafka-1868
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29953/#review68383 --- Ship it! Ship It! - Joel Koshy On Jan. 16, 2015, 1:02 a.m., Jun Rao wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29953/ --- (Updated Jan. 16, 2015, 1:02 a.m.) Review request for kafka. Bugs: kafka-1868 https://issues.apache.org/jira/browse/kafka-1868 Repository: kafka Description --- remove the overrides Diffs - core/src/main/scala/kafka/tools/ConsoleConsumer.scala 323fc8566d974acc4e5c7d7c2a065794f3b5df4a Diff: https://reviews.apache.org/r/29953/diff/ Testing --- Thanks, Jun Rao
[jira] [Commented] (KAFKA-1864) Revisit defaults for the internal offsets topic
[ https://issues.apache.org/jira/browse/KAFKA-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279677#comment-14279677 ] Jun Rao commented on KAFKA-1864: Also, another impact of this patch is that the offset topic may not be guaranteed to be created with the configured offset replication factor, since we take the min between the configured value and the # of live brokers. An alternative is to use negative values as the default, as suggested in KAFKA-1846. Then we can treat the positive values as the hard requirement. Not sure if this will cause more confusion. Revisit defaults for the internal offsets topic --- Key: KAFKA-1864 URL: https://issues.apache.org/jira/browse/KAFKA-1864 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2 Reporter: Neha Narkhede Assignee: Jun Rao Priority: Blocker Attachments: kafka-1864.patch Realize this is late, but as I was reviewing the 0.8.2 RC, I found that our defaults for the offsets topic are not ideal. The # of partitions currently default to 1 and the replication factor is 1 as well. Granted that the replication factor is changeable in the future (through the admin tool), changing the # of partitions is a very disruptive change. The concern is that this feature is on by default on the server and will be activated the moment the first client turns on kafka based offset storage. My proposal is to change the # of partitions to something large (50 or so) and change the replication factor to min(# of alive brokers, configured replication factor) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
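Jun's two alternatives (always take min(configured value, live brokers), versus treating a negative value as a soft default and a positive value as a hard requirement) can be sketched side by side; the function and its error handling are illustrative, not the actual OffsetManager logic:

```python
def offsets_topic_replication(configured, alive_brokers):
    """Sketch of the negative-as-soft-default proposal:
    configured < 0: soft default -- use min(|configured|, live brokers);
    configured > 0: hard requirement -- fail if not enough brokers are alive."""
    if alive_brokers <= 0:
        raise ValueError("no live brokers")
    if configured < 0:
        return min(-configured, alive_brokers)
    if configured > alive_brokers:
        raise ValueError(
            f"replication factor {configured} exceeds {alive_brokers} live brokers")
    return configured

print(offsets_topic_replication(-3, 2))  # soft default, capped by live brokers
print(offsets_topic_replication(3, 5))   # hard requirement, satisfiable here
```

The soft-default branch reproduces the patch's min() behavior, while the positive branch gives the guarantee Jun notes the min() approach loses.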
Review Request 29953: Patch for kafka-1868
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29953/ --- Review request for kafka. Bugs: kafka-1868 https://issues.apache.org/jira/browse/kafka-1868 Repository: kafka Description --- remove the overrides Diffs - core/src/main/scala/kafka/tools/ConsoleConsumer.scala 323fc8566d974acc4e5c7d7c2a065794f3b5df4a Diff: https://reviews.apache.org/r/29953/diff/ Testing --- Thanks, Jun Rao
[jira] [Commented] (KAFKA-1868) ConsoleConsumer shouldn't override dual.commit.enabled to false if not explicitly set
[ https://issues.apache.org/jira/browse/KAFKA-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279626#comment-14279626 ] Jun Rao commented on KAFKA-1868: Created reviewboard https://reviews.apache.org/r/29953/diff/ against branch origin/0.8.2 ConsoleConsumer shouldn't override dual.commit.enabled to false if not explicitly set - Key: KAFKA-1868 URL: https://issues.apache.org/jira/browse/KAFKA-1868 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2 Reporter: Jun Rao Assignee: Jun Rao Priority: Blocker Fix For: 0.8.2 Attachments: kafka-1868.patch In ConsoleConsumer, we override dual.commit.enabled to false if not explicitly set. However, if offset.storage is set to kafka, by default, dual.commit.enabled is set to true and we shouldn't override that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker
Agree errors should be part of the protocol On Jan 15, 2015, at 17:59, Gwen Shapira gshap...@cloudera.com wrote: Hi, I got convinced by Joe and Dana that errors are indeed part of the protocol and can't be randomly added. So, it looks like we need to bump the version of ProduceRequest in the following way: Version 0 - accept acks > 1. I think we should keep the existing behavior too (i.e. not replace it with -1) to avoid surprising clients, but I'm willing to hear other opinions. Version 1 - do not accept acks > 1 and return an error. Are we ok with the error I added in KAFKA-1697? We can use something less specific like InvalidRequestParameter. This error can be reused in the future and reduce the need to add errors, but will also be less clear to the client and its users. Maybe even add the error message string to the protocol in addition to the error code? (since we are bumping versions) I think maintaining the old version throughout 0.8.X makes sense. IMO dropping it for 0.9 is feasible, but I'll let client owners help make that call. Am I missing anything? Should I start a KIP? It seems like a KIP-type discussion :) Gwen On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava e...@confluent.io wrote: Gwen, I think the only option that wouldn't require a protocol version change is the one where acks > 1 is converted to acks = -1, since it's the only one that doesn't potentially break older clients. The protocol guide says that the expected upgrade path is servers first, then clients, so old clients, including non-java clients, that may be using acks > 1 should be able to work with a new broker version. It's more work, but I think dealing with the protocol change is the right thing to do since it eventually gets us to the behavior I think is better -- the broker should reject requests with invalid values. I think Joe and I were basically in agreement. In my mind the major piece missing from his description is how long we're going to maintain his case 0 behavior.
It's impractical to maintain old versions forever, but it sounds like there hasn't been a decision on how long to maintain them. Maybe that's another item to add to KIPs -- protocol versions and behavior need to be listed as deprecated, and the earliest version in which they'll be removed should be specified, so users can understand which versions are guaranteed to be compatible, even if they're using (well-written) non-java clients. -Ewen On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers dana.pow...@gmail.com wrote: "clients don't break on unknown errors" may be true for the official java clients, but I don't think the assumption holds true for community-maintained clients and users of those clients. kafka-python generally follows the fail-fast philosophy and raises an exception on any unrecognized error code in any server response. In this case, kafka-python allows users to set their own required-acks policy when creating a producer instance. It is possible that users of kafka-python have deployed producer code that uses acks > 1 -- perhaps in production environments -- and for those users the new error code will crash their producer code. I would not be surprised if the same were true of other community clients. *One reason for the fail-fast approach is that there isn't great documentation on what errors to expect for each request / response -- so we use failures to alert that some error case is not handled properly. And because of that, introducing new error cases without bumping the api version is likely to cause those errors to get raised/thrown all the way back up to the user. Of course we (client maintainers) can fix the issues in the client libraries and suggest users upgrade, but it's not the ideal situation. Long-winded way of saying: I agree w/ Joe. -Dana On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira gshap...@cloudera.com wrote: Is the protocol bump caused by the behavior change or the new error code?
1) IMO, error_codes are data, and clients can expect to receive errors that they don't understand (i.e. unknown errors). AFAIK, clients don't break on unknown errors, they are simply more challenging to debug. If we document the new behavior, then it's definitely debuggable and fixable. 2) The behavior change is basically a deprecation - i.e. acks > 1 was never documented, and is not supported by Kafka clients starting with version 0.8.2. I'm not sure this requires a protocol bump either, although it's a better case than new error codes. Thanks, Gwen On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein joe.st...@stealth.ly wrote: Looping in the mailing list that the client developers live on, because they are all not on dev (though they should be if they want to be helping to build the best client libraries they can). I wholeheartedly believe that we need to not break existing functionality of the client protocol,
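Gwen's versioning plan from earlier in the thread (version 0 keeps accepting acks > 1 to avoid surprising old clients; version 1 rejects it with an error) might look like this broker-side sketch. The error code value is an assumption chosen for illustration, standing in for whatever KAFKA-1697 settles on:

```python
INVALID_REQUIRED_ACKS = 21  # hypothetical error code for this sketch
NO_ERROR = 0

def validate_acks(api_version, acks):
    """Return an error code for a ProduceRequest's acks field.
    Version 0 preserves the old, undocumented acks > 1 behavior;
    version 1 rejects it, per the proposed protocol bump."""
    if acks in (-1, 0, 1):
        return NO_ERROR
    if api_version == 0:
        return NO_ERROR  # keep existing behavior rather than surprising old clients
    return INVALID_REQUIRED_ACKS

print(validate_acks(0, 3), validate_acks(1, 3))
```

Gating the rejection on the request's api version is what lets old clients keep working unmodified while new clients get a well-defined error.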
Re: Review Request 29952: Patch for kafka-1864
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29952/#review68381 --- core/src/main/scala/kafka/server/OffsetManager.scala https://reviews.apache.org/r/29952/#comment112577 The only issue here is the problem raised in KAFKA-1867 - even though that should not happen in practice since you would generally only commit offsets after topics do exist in the cluster. Anyway, wouldn't it just be simpler to keep the replication factor default as 1 given that it is possible to change it? - Joel Koshy On Jan. 16, 2015, 12:52 a.m., Jun Rao wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29952/ --- (Updated Jan. 16, 2015, 12:52 a.m.) Review request for kafka. Bugs: kafka-1864 https://issues.apache.org/jira/browse/kafka-1864 Repository: kafka Description --- create offset topic with a larger replication factor by default Diffs - core/src/main/scala/kafka/server/KafkaApis.scala d626b1710813648524eefa5a3df098393c3e7743 core/src/main/scala/kafka/server/KafkaConfig.scala 6e26c5436feb4629d17f199011f3ebb674aa767f core/src/main/scala/kafka/server/OffsetManager.scala 43eb2a35bb54d32c66cdb94772df657b3a104d1a core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala 07a7beee9dec733eae943b425ae58c54f08458d8 core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala 4a3a5b264a021e55c39f4d7424ce04ee591503ef Diff: https://reviews.apache.org/r/29952/diff/ Testing --- Thanks, Jun Rao
[jira] [Updated] (KAFKA-1868) ConsoleConsumer shouldn't override dual.commit.enabled to false if not explicitly set
[ https://issues.apache.org/jira/browse/KAFKA-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Rao updated KAFKA-1868: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the review. Committed to both 0.8.2 and trunk. ConsoleConsumer shouldn't override dual.commit.enabled to false if not explicitly set - Key: KAFKA-1868 URL: https://issues.apache.org/jira/browse/KAFKA-1868 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2 Reporter: Jun Rao Assignee: Jun Rao Priority: Blocker Fix For: 0.8.2 Attachments: kafka-1868.patch In ConsoleConsumer, we override dual.commit.enabled to false if not explicitly set. However, if offset.storage is set to kafka, by default, dual.commit.enabled is set to true and we shouldn't override that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-1476) Get a list of consumer groups
[ https://issues.apache.org/jira/browse/KAFKA-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279715#comment-14279715 ] Neha Narkhede edited comment on KAFKA-1476 at 1/16/15 2:21 AM:
---
Thanks for attaching the output, [~onurkaraman]. Here are some review comments:
1. The error stack trace for describing groups that don't exist is pretty ugly, so let's remove that and output a message that states the group doesn't exist.
{code}
vagrant@worker1:/opt/kafka$ bin/kafka-consumer-groups.sh --zookeeper 192.168.50.11:2181 --describe --group g3
Error while executing consumer group command org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/g3/owners
org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/g3/owners
	at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
	at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
	at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:413)
	at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409)
	at kafka.utils.ZkUtils$.getTopicsByConsumerGroup(ZkUtils.scala:758)
	at kafka.admin.ConsumerGroupCommand$.describe(ConsumerGroupCommand.scala:81)
	at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:56)
	at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/g3/owners
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
	at org.I0Itec.zkclient.ZkConnection.getChildren(ZkConnection.java:99)
	at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:416)
	at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:413)
	at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
	... 6 more
{code}
2. I'm not sure why the header fields are not comma separated but the data is. It would be best to stick to one separator. So if you pick comma as the separator, the output should change to
{code}
GROUP, TOPIC, PID, CURRENT OFFSET, LOG SIZE, LAG, OWNER
g2, t2, 0, 1, 1, 0, none
{code}
3. For deleting only a specific topic's information, it is sufficient to rely on the user specifying --topic along with --delete. The --delete-with-topic option seems unnecessary.
Re: [DISCUSS] KIPs
Hey Joe,

Yeah I guess the question is what is the definition of major? I agree we definitely don't want to generate a bunch of paperwork. We have enough problems just getting all the contributions reviewed and checked in in a timely fashion... So obviously bug fixes would not apply here. I think it is also pretty clear that big features should get reviewed and discussed.

To pick on myself, for example, the log compaction work was done without enough public discussion about how it worked and why (imho). I hope/claim that enough rigour in thinking about use-cases and operations and so on was done that it turned out well, but the discussion was just between a few people with no real public output. This kind of feature is clearly a big change and something we should discuss. If we limit ourselves to just the public contracts the KIP introduces, the discussion would just be on the new configs and monitoring without really a discussion of the design and how it works, which is obviously closely related.

I don't think this should be more work because in practice we are making wiki pages for any big thing anyway. So this would just be a consistent way of numbering and structuring these pages. This would also give a clear call to action: hey kafka people, come read my wiki and think this through.

I actually think the voting aspect is less important. It is generally clear when there is agreement on something and when there is not. So from my point of view we could actually just eliminate that part if it is too formal; it just seemed like a good way to formally adopt something.

To address some of your comments from the wiki:
1. This doesn't inhibit someone coming along and putting up a patch. It is just that when they do, if it is a big thing introducing new functionality, we would ask for a little discussion on the basic feature/contracts prior to code review.
2. We definitely don't want people generating a lot of these every time they have an idea that they aren't going to implement. So this is only applicable to things you absolutely will check in code for. We also don't want to be making proposals before things are thought through, which often requires writing the code. So I think the right time for a KIP is when you are far enough along that you know the issues and tradeoffs, but not so far along that you are going to be totally opposed to any change. Sometimes that is prior to writing any code and sometimes not until you are practically done.

The key problem I see this fixing is that there is enough new development happening that it is pretty hard for everyone to review every line of every iteration of every patch. But all of us should be fully aware of new features, the ramifications, the new public interfaces, etc. If we aren't aware of that we can't really build a holistic system that is beautiful and consistent throughout. So the idea is that if you fully review the KIPs you can be sure that even if you don't know every new line of code, you know the major changes coming in.

-Jay

On Thu, Jan 15, 2015 at 12:18 PM, Joe Stein joe.st...@stealth.ly wrote:

Thanks Jay for kicking this off! I think the confluence page you wrote up is a great start. The KIP makes sense to me (at a minimum) if there is going to be any breaking change. This way Kafka can continue to grow and blossom, and we have a process in place if we are going to release a thorn... and when we do, it is *CLEAR* what that is and why. We can easily document which KIPs were involved with a release (which I think should get committed afterwards somewhere so there is no chance of edits after release). This approach I had been thinking about also allows changes to occur as they do now, as long as they are backwards compatible.

Hopefully we never need a KIP, but when we do the PMC can vote on it and folks can read the release notes with a *CLEAR* understanding of what is going to break their existing setup... at least that is how I have been thinking about it. Let me know what you think about this base minimum approach... I hadn't really thought of the KIP for *ANY* major change and I have to think more about that. I have some other comments for minor items in the confluence page I will make once I think more about how I feel about having a KIP for more than what I was thinking about already. I do think we should have major changes go through confluence, mailing list discussion and JIRA, but I kind of feel we have been doing that already... if there are cases where that isn't the case we should highlight and learn from them, and formalize it more if need be.

Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop

On Thu, Jan 15, 2015 at 1:42 PM, Jay Kreps jay.kr...@gmail.com wrote: The idea of KIPs came up in a
[jira] [Updated] (KAFKA-1674) auto.create.topics.enable docs are misleading
[ https://issues.apache.org/jira/browse/KAFKA-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikumar Reddy updated KAFKA-1674: --- Fix Version/s: 0.8.2 This is a simple documentation correction. We can push this to 0.8.2. auto.create.topics.enable docs are misleading - Key: KAFKA-1674 URL: https://issues.apache.org/jira/browse/KAFKA-1674 Project: Kafka Issue Type: Bug Affects Versions: 0.8.1.1 Reporter: Stevo Slavic Priority: Minor Labels: newbie Fix For: 0.8.2 {{auto.create.topics.enable}} is currently [documented|http://kafka.apache.org/08/configuration.html] with {quote} Enable auto creation of topic on the server. If this is set to true then attempts to produce, consume, or fetch metadata for a non-existent topic will automatically create it with the default replication factor and number of partitions. {quote} In Kafka 0.8.1.1 reality, topics are only created when trying to publish a message on a non-existing topic. After [discussion|http://mail-archives.apache.org/mod_mbox/kafka-users/201410.mbox/%3CCAFbh0Q1WXLUDO-im1fQ1yEvrMduxmXbj5HXVc3Cq8B%3DfeMso9g%40mail.gmail.com%3E] with [~junrao], the conclusion was that it's a documentation issue which needs to be fixed. Before fixing the docs, please check once more whether this is just non-working functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
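For context, the setting under discussion is a broker property; a sketch of a server.properties entry (the comment reflects this JIRA's observation, not the official docs):

```
# server.properties
# Docs claim creation on produce, consume, or metadata fetch; per this JIRA,
# in 0.8.1.1 creation is only observed when publishing to a missing topic.
auto.create.topics.enable=true
```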
[jira] [Commented] (KAFKA-1333) Add consumer co-ordinator module to the server
[ https://issues.apache.org/jira/browse/KAFKA-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279930#comment-14279930 ] Andrii Biletskyi commented on KAFKA-1333: - [~guozhang] many thanks for such an informative update! Add consumer co-ordinator module to the server -- Key: KAFKA-1333 URL: https://issues.apache.org/jira/browse/KAFKA-1333 Project: Kafka Issue Type: Sub-task Components: consumer Affects Versions: 0.9.0 Reporter: Neha Narkhede Assignee: Guozhang Wang The scope of this JIRA is just to add a consumer co-ordinator module that does the following: 1) coordinator start-up, metadata initialization 2) simple join group handling (just updating metadata, no failure detection / rebalancing): this should be sufficient for consumers doing self offset / partition management. The offset manager will still run side-by-side with the coordinator in this JIRA, and we will merge them in KAFKA-1740. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-1804) Kafka network thread lacks top exception handler
[ https://issues.apache.org/jira/browse/KAFKA-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279396#comment-14279396 ] Alexey Ozeritskiy edited comment on KAFKA-1804 at 1/16/15 7:44 AM:
---
We've written a simple patch for the kafka network thread:
{code:java}
override def run(): Unit = {
  try {
    iteration() // the original run()
  } catch {
    case e: Throwable =>
      error("ERROR IN NETWORK THREAD: %s".format(e), e)
      Runtime.getRuntime.halt(1)
  }
}
{code}
and got the following trace:
{code}
[2015-01-15 23:04:08,537] ERROR ERROR IN NETWORK THREAD: java.util.NoSuchElementException: None.get (kafka.network.Processor)
java.util.NoSuchElementException: None.get
	at scala.None$.get(Option.scala:313)
	at scala.None$.get(Option.scala:311)
	at kafka.network.ConnectionQuotas.dec(SocketServer.scala:544)
	at kafka.network.AbstractServerThread.close(SocketServer.scala:165)
	at kafka.network.AbstractServerThread.close(SocketServer.scala:157)
	at kafka.network.Processor.close(SocketServer.scala:394)
	at kafka.network.Processor.processNewResponses(SocketServer.scala:426)
	at kafka.network.Processor.iteration(SocketServer.scala:328)
	at kafka.network.Processor.run(SocketServer.scala:381)
	at java.lang.Thread.run(Thread.java:745)
{code}
Kafka network thread lacks top exception handler Key: KAFKA-1804 URL: https://issues.apache.org/jira/browse/KAFKA-1804 Project: Kafka Issue Type: Bug Reporter: Oleg Golovin We have faced the problem that some kafka network threads may fail, so that jstack attached to the Kafka process showed fewer threads than we had defined in our Kafka configuration. This leads to API requests processed by such a thread getting stuck without a response. There were no error messages in the log regarding thread failure. We have examined the Kafka code and found there is no top try-catch block in the network thread code, which could at least log possible errors. Could you add a top-level try-catch block for the network thread, which should recover the network thread in case of exception? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-595) Decouple producer side compression from server-side compression.
[ https://issues.apache.org/jira/browse/KAFKA-595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278435#comment-14278435 ] Manikumar Reddy commented on KAFKA-595: --- [~nehanarkhede] We have given support for broker-side compression in KAFKA-1499. This issue is similar to KAFKA-1499. I think we can close this issue. Decouple producer side compression from server-side compression. Key: KAFKA-595 URL: https://issues.apache.org/jira/browse/KAFKA-595 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Neha Narkhede Labels: feature In 0.7 Kafka always appended messages to the log using whatever compression codec the client used. In 0.8, after the KAFKA-506 patch, the master always recompresses the message before appending to the log to assign ids. Currently the server uses a funky heuristic to choose a compression codec based on the codecs the producer used. This doesn't actually make that much sense. It would be better for the server to have its own compression (a global default and per-topic override) that specified the compression codec, and have the server always recompress with this codec regardless of the original codec. Compression currently happens in kafka.log.Log.assignOffsets (perhaps should be renamed if it takes on compression as an official responsibility instead of a side-effect). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
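For reference, the broker-side setting that KAFKA-1499 introduced is a sketch like the following (property name per the KAFKA-1499 patch on trunk; the value shown is illustrative):

```
# server.properties (sketch): broker-side compression, overridable per topic.
# "producer" keeps whatever codec the client used; a codec name such as
# gzip or snappy forces the broker to recompress with that codec.
compression.type=producer
```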
Re: [VOTE] 0.8.2.0 Candidate 1
I just downloaded the Kafka binary and am trying this on my 32-bit JVM (Java 7). Trying to start Zookeeper or the Kafka server keeps failing with Unrecognized VM option 'UseCompressedOops':

./zookeeper-server-start.sh ../config/zookeeper.properties
Unrecognized VM option 'UseCompressedOops'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

Same with the Kafka server startup scripts. My Java version is:

java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) Server VM (build 24.71-b01, mixed mode)

Should there be a check in the script before adding this option?

-Jaikiran

On Wednesday 14 January 2015 10:08 PM, Jun Rao wrote: + users mailing list. It would be great if people can test this out and report any blocker issues. Thanks, Jun On Tue, Jan 13, 2015 at 7:16 PM, Jun Rao j...@confluent.io wrote: This is the first candidate for release of Apache Kafka 0.8.2.0. There have been some changes since the 0.8.2 beta release, especially in the new java producer api and jmx mbean names. It would be great if people can test this out thoroughly. We are giving people 10 days for testing and voting. Release Notes for the 0.8.2.0 release: https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/RELEASE_NOTES.html *** Please download, test and vote by Friday, Jan 23rd, 7pm PT. Kafka's KEYS file containing PGP keys we use to sign the release: https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/KEYS in addition to the md5, sha1 and sha2 (SHA256) checksums.
* Release artifacts to be voted upon (source and binary): https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/
* Maven artifacts to be voted upon prior to release: https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/maven_staging/
* scala-doc: https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/scaladoc/#package
* java-doc: https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/javadoc/
* The tag to be voted upon (off the 0.8.2 branch) is the 0.8.2.0 tag: https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=b0c7d579f8aeb5750573008040a42b7377a651d5

Thanks,
Jun
Re: [VOTE] 0.8.2.0 Candidate 1
Yes, we can add a check. This option works only with a 64-bit JVM.

On Jan 15, 2015 6:53 PM, Jaikiran Pai jai.forums2...@gmail.com wrote: I just downloaded the Kafka binary and am trying this on my 32-bit JVM (Java 7). Trying to start Zookeeper or the Kafka server keeps failing with Unrecognized VM option 'UseCompressedOops': ./zookeeper-server-start.sh ../config/zookeeper.properties Unrecognized VM option 'UseCompressedOops' Error: Could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit. Same with the Kafka server startup scripts. My Java version is: java version 1.7.0_71 Java(TM) SE Runtime Environment (build 1.7.0_71-b14) Java HotSpot(TM) Server VM (build 24.71-b01, mixed mode) Should there be a check in the script before adding this option? -Jaikiran
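A minimal sketch of such a guard for the startup scripts: only emit -XX:+UseCompressedOops when `java -version` identifies a 64-bit VM. The helper name and the match on "64-Bit" are assumptions for illustration, not the committed fix.

```shell
# Guard the flag: 32-bit HotSpot rejects -XX:+UseCompressedOops at startup.
# compressed_oops_flag takes the output of `java -version 2>&1` and prints the
# flag only when the VM reports itself as 64-bit (hypothetical helper name).
compressed_oops_flag() {
  case "$1" in
    *64-Bit*) echo "-XX:+UseCompressedOops" ;;
    *)        echo "" ;;
  esac
}

# In kafka-run-class.sh one could then do something like:
#   KAFKA_JVM_PERFORMANCE_OPTS="$KAFKA_JVM_PERFORMANCE_OPTS $(compressed_oops_flag "$(java -version 2>&1)")"
```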
[jira] [Commented] (KAFKA-1333) Add consumer co-ordinator module to the server
[ https://issues.apache.org/jira/browse/KAFKA-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278739#comment-14278739 ] Andrii Biletskyi commented on KAFKA-1333: - Hey [~guozhang], We are considering using the 0.9 consumer parts for a full-fledged high-level consumer which supports all the basic stuff like rebalance etc., plus some additional features specific to our needs. Since you are the assignee of this ticket, can you please tell us where you are on this piece, and when there will be a first alpha version patch to review / contribute to? Thank you in advance! Add consumer co-ordinator module to the server -- Key: KAFKA-1333 URL: https://issues.apache.org/jira/browse/KAFKA-1333
Re: Review Request 29952: Patch for kafka-1864
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29952/#review68406 --- Ship it! core/src/main/scala/kafka/server/OffsetManager.scala https://reviews.apache.org/r/29952/#comment112600 Although I think the documentation makes this clear the default should probably be conservative so this sounds reasonable. - Joel Koshy On Jan. 16, 2015, 12:52 a.m., Jun Rao wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29952/ --- (Updated Jan. 16, 2015, 12:52 a.m.) Review request for kafka. Bugs: kafka-1864 https://issues.apache.org/jira/browse/kafka-1864 Repository: kafka Description --- create offset topic with a larger replication factor by default Diffs - core/src/main/scala/kafka/server/KafkaApis.scala d626b1710813648524eefa5a3df098393c3e7743 core/src/main/scala/kafka/server/KafkaConfig.scala 6e26c5436feb4629d17f199011f3ebb674aa767f core/src/main/scala/kafka/server/OffsetManager.scala 43eb2a35bb54d32c66cdb94772df657b3a104d1a core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala 07a7beee9dec733eae943b425ae58c54f08458d8 core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala 4a3a5b264a021e55c39f4d7424ce04ee591503ef Diff: https://reviews.apache.org/r/29952/diff/ Testing --- Thanks, Jun Rao
[jira] [Updated] (KAFKA-1810) Add IP Filtering / Whitelists-Blacklists
[ https://issues.apache.org/jira/browse/KAFKA-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Holoman updated KAFKA-1810: Attachment: KAFKA-1810_2015-01-15_19:47:14.patch Add IP Filtering / Whitelists-Blacklists - Key: KAFKA-1810 URL: https://issues.apache.org/jira/browse/KAFKA-1810 Project: Kafka Issue Type: New Feature Components: core, network Reporter: Jeff Holoman Assignee: Jeff Holoman Priority: Minor Fix For: 0.8.3 Attachments: KAFKA-1810.patch, KAFKA-1810_2015-01-15_19:47:14.patch While longer-term goals of security in Kafka are on the roadmap there exists some value for the ability to restrict connection to Kafka brokers based on IP address. This is not intended as a replacement for security but more of a precaution against misconfiguration and to provide some level of control to Kafka administrators about who is reading/writing to their cluster. 1) In some organizations software administration vs o/s systems administration and network administration is disjointed and not well choreographed. Providing software administrators the ability to configure their platform relatively independently (after initial configuration) from Systems administrators is desirable. 2) Configuration and deployment is sometimes error prone and there are situations when test environments could erroneously read/write to production environments 3) An additional precaution against reading sensitive data is typically welcomed in most large enterprise deployments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29714: Patch for KAFKA-1810
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29714/ --- (Updated Jan. 16, 2015, 12:47 a.m.) Review request for kafka. Bugs: KAFKA-1810 https://issues.apache.org/jira/browse/KAFKA-1810 Repository: kafka Description (updated) --- KAFKA-1810 Refactor KAFKA-1810 Refactor 2 Diffs (updated) - core/src/main/scala/kafka/network/IPFilter.scala PRE-CREATION core/src/main/scala/kafka/network/SocketServer.scala 39b1651b680b2995cedfde95d74c086d9c6219ef core/src/main/scala/kafka/server/KafkaConfig.scala 6e26c5436feb4629d17f199011f3ebb674aa767f core/src/main/scala/kafka/server/KafkaServer.scala 1691ad7fc80ca0b112f68e3ea0cbab265c75b26b core/src/main/scala/kafka/utils/VerifiableProperties.scala 2ffc7f452dc7a1b6a06ca7a509ed49e1ab3d1e68 core/src/test/scala/unit/kafka/network/IPFilterTest.scala PRE-CREATION core/src/test/scala/unit/kafka/network/SocketServerTest.scala 78b431f9c88cca1bc5e430ffd41083d0e92b7e75 core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 2377abe4933e065d037828a214c3a87e1773a8ef Diff: https://reviews.apache.org/r/29714/diff/ Testing --- This code centers around a new class, CIDRRange in IPFilter.scala. The IPFilter class is created and holds two fields, the ruleType (allow|deny|none) and a list of CIDRRange objects. This is used in the Socket Server acceptor thread. The check does an exists on the value in the list if the rule type is allow or deny. On object creation, we pre-calculate the lower and upper range values and store those as a BigInt. The overhead of the check should be fairly minimal as it involves converting the incoming IP Address to a BigInt and then just doing a compare to the low/high values. In writing this review up I realized that I can optimize this further to convert to bigint first and move that conversion out of the range check, which I can address. Testing covers the CIDRRange and IPFilter classes and validation of IPV6, IPV4, and configurations. 
Additionally the functionality is tested in SocketServerTest. Other changes are just to assist in configuration. I modified the SocketServerTest to use a method for grabbing the Socket server to make the code a bit more concise. One key point is that, if there is an error in configuration, we halt the startup of the broker. The thinking there is that if you screw up security-related configs, you want to know about it right away rather than silently accept connections (thanks Joe Stein for the input). There are two new exceptions related to this functionality, one to handle configuration errors, and one to handle blocking the request. Currently the level is set to INFO. Does it make sense to move this to WARN? Thanks, Jeff Holoman
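The range check described above (pre-compute the low/high bounds of each CIDR block once, then test an incoming address with two compares) can be sketched standalone as follows. Class and method names here are illustrative only, not the actual CIDRRange / IPFilter API from the patch:

```java
// Standalone sketch of a CIDR membership check: bounds are pre-calculated as
// integers at construction time, so each connection check is just two compares.
import java.math.BigInteger;
import java.net.InetAddress;
import java.net.UnknownHostException;

class CidrRange {
    private final BigInteger low;
    private final BigInteger high;

    CidrRange(String cidr) {
        String[] parts = cidr.split("/");
        byte[] addr = parse(parts[0]);
        int hostBits = addr.length * 8 - Integer.parseInt(parts[1]);
        BigInteger base = new BigInteger(1, addr);
        // Pre-calculate the lower and upper bounds of the block once.
        this.low = base.shiftRight(hostBits).shiftLeft(hostBits);
        this.high = low.add(BigInteger.ONE.shiftLeft(hostBits)).subtract(BigInteger.ONE);
    }

    boolean contains(String ip) {
        BigInteger n = new BigInteger(1, parse(ip));
        return n.compareTo(low) >= 0 && n.compareTo(high) <= 0;
    }

    // Expects literal IPv4/IPv6 addresses; no DNS lookup happens for numeric input.
    private static byte[] parse(String ip) {
        try {
            return InetAddress.getByName(ip).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a literal IP: " + ip, e);
        }
    }

    public static void main(String[] args) {
        CidrRange range = new CidrRange("192.168.50.0/24");
        System.out.println(range.contains("192.168.50.11")); // inside the /24
        System.out.println(range.contains("10.1.2.3"));      // outside it
    }
}
```

Because the same two-compare check works on the 16-byte form of IPv6 addresses, one class covers both families, which matches the review's claim of minimal per-connection overhead.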
[jira] [Commented] (KAFKA-1810) Add IP Filtering / Whitelists-Blacklists
[ https://issues.apache.org/jira/browse/KAFKA-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279609#comment-14279609 ] Jeff Holoman commented on KAFKA-1810: - Updated reviewboard https://reviews.apache.org/r/29714/diff/ against branch origin/trunk Add IP Filtering / Whitelists-Blacklists - Key: KAFKA-1810 URL: https://issues.apache.org/jira/browse/KAFKA-1810
[jira] [Updated] (KAFKA-1868) ConsoleConsumer shouldn't override dual.commit.enabled to false if not explicitly set
[ https://issues.apache.org/jira/browse/KAFKA-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Rao updated KAFKA-1868: --- Attachment: kafka-1868.patch ConsoleConsumer shouldn't override dual.commit.enabled to false if not explicitly set - Key: KAFKA-1868 URL: https://issues.apache.org/jira/browse/KAFKA-1868 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2 Reporter: Jun Rao Assignee: Jun Rao Priority: Blocker Fix For: 0.8.2 Attachments: kafka-1868.patch In ConsoleConsumer, we override dual.commit.enabled to false if not explicitly set. However, if offset.storage is set to kafka, by default, dual.commit.enabled is set to true and we shouldn't override that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1868) ConsoleConsumer shouldn't override dual.commit.enabled to false if not explicitly set
[ https://issues.apache.org/jira/browse/KAFKA-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Rao updated KAFKA-1868: --- Status: Patch Available (was: Open) ConsoleConsumer shouldn't override dual.commit.enabled to false if not explicitly set - Key: KAFKA-1868 URL: https://issues.apache.org/jira/browse/KAFKA-1868
[jira] [Updated] (KAFKA-1810) Add IP Filtering / Whitelists-Blacklists
[ https://issues.apache.org/jira/browse/KAFKA-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1810: - Reviewer: Gwen Shapira Add IP Filtering / Whitelists-Blacklists - Key: KAFKA-1810 URL: https://issues.apache.org/jira/browse/KAFKA-1810
[jira] [Commented] (KAFKA-1868) ConsoleConsumer shouldn't override dual.commit.enabled to false if not explicitly set
[ https://issues.apache.org/jira/browse/KAFKA-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279660#comment-14279660 ] Joel Koshy commented on KAFKA-1868: --- Appears to be a side-effect of KAFKA-924 ConsoleConsumer shouldn't override dual.commit.enabled to false if not explicitly set - Key: KAFKA-1868 URL: https://issues.apache.org/jira/browse/KAFKA-1868 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2 Reporter: Jun Rao Assignee: Jun Rao Priority: Blocker Fix For: 0.8.2 Attachments: kafka-1868.patch In ConsoleConsumer, we override dual.commit.enabled to false if not explicitly set. However, if offset.storage is set to kafka, by default, dual.commit.enabled is set to true and we shouldn't override that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29952: Patch for kafka-1864
On Jan. 16, 2015, 1:25 a.m., Joel Koshy wrote: core/src/main/scala/kafka/server/OffsetManager.scala, line 79 https://reviews.apache.org/r/29952/diff/1/?file=823279#file823279line79 The only issue here is the problem raised in KAFKA-1867 - even though that should not happen in practice since you would generally only commit offsets after topics do exist in the cluster. Anyway, wouldn't it just be simpler to keep the replication factor default as 1 given that it is possible to change it? The main purpose of the patch is to make the default behavior good. For that, we want to have enough partitions and enough redundancy. The issue with defaulting replication factor to 1 is that people may not realize the issue until one of the brokers goes down. At that time, it's too late to change the replication factor. - Jun --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29952/#review68381 --- On Jan. 16, 2015, 12:52 a.m., Jun Rao wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29952/ --- (Updated Jan. 16, 2015, 12:52 a.m.) Review request for kafka. Bugs: kafka-1864 https://issues.apache.org/jira/browse/kafka-1864 Repository: kafka Description --- create offset topic with a larger replication factor by default Diffs - core/src/main/scala/kafka/server/KafkaApis.scala d626b1710813648524eefa5a3df098393c3e7743 core/src/main/scala/kafka/server/KafkaConfig.scala 6e26c5436feb4629d17f199011f3ebb674aa767f core/src/main/scala/kafka/server/OffsetManager.scala 43eb2a35bb54d32c66cdb94772df657b3a104d1a core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala 07a7beee9dec733eae943b425ae58c54f08458d8 core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala 4a3a5b264a021e55c39f4d7424ce04ee591503ef Diff: https://reviews.apache.org/r/29952/diff/ Testing --- Thanks, Jun Rao
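The trade-off Jun and Joel are debating can be made concrete with a toy helper (hypothetical code, not the kafka-1864 patch itself): a default replication factor of 1 silently leaves committed offsets unreplicated, while a higher default means the internal topic cannot be fully replicated until enough brokers are alive, which is the KAFKA-1867 situation.

```python
def offsets_topic_creation_plan(configured_rf: int, live_brokers: int):
    """Toy illustration of the review discussion: prefer the configured
    replication factor (the patch raises the default above 1), and defer
    creation rather than silently clamp when too few brokers are up."""
    if live_brokers >= configured_rf:
        return ("create", configured_rf)
    # Clamping to live_brokers here would hide the under-replication
    # until a broker is lost -- exactly the risk Jun describes for a
    # default replication factor of 1.
    return ("defer", configured_rf)
```

With five live brokers and a configured factor of 3 the topic is created as requested; with one live broker, creation is deferred instead of quietly downgraded.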
Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to acks > 1 on the broker
+1 on Joe's suggestions, glad to see it happening! On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira gshap...@cloudera.com wrote: The errors are part of the KIP process now, so I think the clients are safe :) https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin steve.mo...@gmail.com wrote: Agree errors should be part of the protocol On Jan 15, 2015, at 17:59, Gwen Shapira gshap...@cloudera.com wrote: Hi, I got convinced by Joe and Dana that errors are indeed part of the protocol and can't be randomly added. So, it looks like we need to bump the version of ProduceRequest in the following way: Version 0 - accept acks > 1. I think we should keep the existing behavior too (i.e. not replace it with -1) to avoid surprising clients, but I'm willing to hear other opinions. Version 1 - do not accept acks > 1 and return an error. Are we ok with the error I added in KAFKA-1697? We can use something less specific like InvalidRequestParameter. This error can be reused in the future and reduce the need to add errors, but will also be less clear to the client and its users. Maybe even add the error message string to the protocol in addition to the error code? (since we are bumping versions) I think maintaining the old version throughout 0.8.X makes sense. IMO dropping it for 0.9 is feasible, but I'll let client owners help make that call. Am I missing anything? Should I start a KIP? It seems like a KIP-type discussion :) Gwen On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava e...@confluent.io wrote: Gwen, I think the only option that wouldn't require a protocol version change is the one where acks > 1 is converted to acks = -1 since it's the only one that doesn't potentially break older clients. The protocol guide says that the expected upgrade path is servers first, then clients, so old clients, including non-java clients, that may be using acks > 1 should be able to work with a new broker version.
It's more work, but I think dealing with the protocol change is the right thing to do since it eventually gets us to the behavior I think is better -- the broker should reject requests with invalid values. I think Joe and I were basically in agreement. In my mind the major piece missing from his description is how long we're going to maintain his case 0 behavior. It's impractical to maintain old versions forever, but it sounds like there hasn't been a decision on how long to maintain them. Maybe that's another item to add to KIPs -- protocol versions and behavior need to be listed as deprecated and the earliest version in which they'll be removed should be specified so users can understand which versions are guaranteed to be compatible, even if they're using (well-written) non-java clients. -Ewen On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers dana.pow...@gmail.com wrote: "clients don't break on unknown errors" -- maybe true for the official java clients, but I don't think the assumption holds true for community-maintained clients and users of those clients. kafka-python generally follows the fail-fast philosophy and raises an exception on any unrecognized error code in any server response. in this case, kafka-python allows users to set their own required-acks policy when creating a producer instance. It is possible that users of kafka-python have deployed producer code that uses acks > 1 -- perhaps in production environments -- and for those users the new error code will crash their producer code. I would not be surprised if the same were true of other community clients. *one reason for the fail-fast approach is that there isn't great documentation on what errors to expect for each request / response -- so we use failures to alert that some error case is not handled properly. and because of that, introducing new error cases without bumping the api version is likely to cause those errors to get raised/thrown all the way back up to the user.
of course we (client maintainers) can fix the issues in the client libraries and suggest users upgrade, but it's not the ideal situation. long-winded way of saying: I agree w/ Joe. -Dana On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira gshap...@cloudera.com wrote: Is the protocol bump caused by the behavior change or the new error code? 1) IMO, error_codes are data, and clients can expect to receive errors that they don't understand (i.e. unknown errors). AFAIK, clients don't break on unknown errors, they are simply more challenging to debug. If we document the new behavior, then it's definitely debuggable and fixable. 2) The behavior change is basically a deprecation - i.e. acks > 1 were never documented, and are not supported by Kafka clients starting with version 0.8.2. I'm not sure this
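Gwen's version-bump proposal can be sketched as broker-side validation gated on the request's API version (a hedged illustration: the error-code value and function name are made up for this sketch; the real error added in KAFKA-1697 may differ):

```python
INVALID_REQUIRED_ACKS = 21  # hypothetical error code, for illustration only

def validate_produce_acks(api_version: int, acks: int):
    """Sketch of the proposed ProduceRequest bump: version 0 keeps
    accepting acks > 1 so old clients aren't surprised; version 1
    rejects it with an error code. Returns None when valid."""
    if acks in (-1, 0, 1):
        return None  # always-valid settings
    if api_version == 0:
        return None  # legacy behavior preserved for old clients
    return INVALID_REQUIRED_ACKS
```

This also shows why Dana's point matters: a v0 client that sends acks > 1 keeps working, while the error only surfaces to clients that opt into the new request version.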
Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to acks > 1 on the broker
I created a KIP for this suggestion: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks Basically documenting what was already discussed here. Comments will be awesome! Gwen