[jira] [Commented] (KAFKA-1702) Messages silently Lost by producer
[ https://issues.apache.org/jira/browse/KAFKA-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168942#comment-14168942 ] Alexis Midon commented on KAFKA-1702: - Also if #groupMessagesToSet is not in a try/catch, the error will break the loop on the broker list. All messages will get dropped, retries ignored, metrics won't get updated, etc. > Messages silently Lost by producer > -- > > Key: KAFKA-1702 > URL: https://issues.apache.org/jira/browse/KAFKA-1702 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 0.8.1.1 >Reporter: Alexis Midon >Assignee: Jun Rao > Attachments: KAFKA-1702.0.patch > > > Hello, > we lost millions of messages because of this {{try/catch}} in the producer > {{DefaultEventHandler}}: > https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala#L114-L116 > If a Throwable is caught by this {{try/catch}}, the retry policy will have no > effect and all yet-to-be-sent messages are lost (the error will break the > loop over the broker list). > This issue is very hard to detect because: the producer (async or sync) > cannot even catch the error, and *all* the metrics are updated as if > everything was fine. > Only the abnormal drop in the producers network I/O, or the incoming message > rate on the brokers; or the alerting on errors in producer logs could have > revealed the issue. > This behavior was introduced by KAFKA-300. I can't see a good reason for it, > so here is a patch that will let the retry-policy do its job when such a > {{Throwable}} occurs. > Thanks in advance for your help. > Alexis > ps: you might wonder how could this {{try/catch}} ever caught something? > {{DefaultEventHandler#groupMessagesToSet}} looks so harmless. > Here are the details: > We use Snappy compression. When the native snappy library is not installed on > the host, Snappy, during the initialization of class > {{org.xerial.snappy.Snappy}} will [write a C > library|https://github.com/xerial/snappy-java/blob/1.1.0/src/main/java/org/xerial/snappy/SnappyLoader.java#L312] > in the JVM temp directory {{java.io.tmpdir}}. > In our scenario, {{java.io.tmpdir}} was a subdirectory of {{/tmp}}. After an > instance reboot (thank you > [AWS|https://twitter.com/hashtag/AWSReboot?src=hash]!), the JVM temp > directory was removed. The JVM was then running with a non-existing temp dir. > Snappy class would be impossible to initialize and the following message > would be silently logged: > {code} > ERROR [2014-10-07 22:23:56,530] kafka.producer.async.DefaultEventHandler: > Failed to send messages > ! java.lang.NoClassDefFoundError: Could not initialize class > org.xerial.snappy.Snappy > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1702) Messages silently Lost by producer
[ https://issues.apache.org/jira/browse/KAFKA-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168941#comment-14168941 ] Alexis Midon commented on KAFKA-1702: - I agree. In the async mode, until there is a callback, the best we can do is to make sure all the metrics are updated correctly, in particular ResendsPerSec, FailedSendsPerSec, which is critical for monitoring of async producers. In the sync mode, producer will get the exception, which is an improvement. thanks for your review > Messages silently Lost by producer > -- > > Key: KAFKA-1702 > URL: https://issues.apache.org/jira/browse/KAFKA-1702 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 0.8.1.1 >Reporter: Alexis Midon >Assignee: Jun Rao > Attachments: KAFKA-1702.0.patch > > > Hello, > we lost millions of messages because of this {{try/catch}} in the producer > {{DefaultEventHandler}}: > https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala#L114-L116 > If a Throwable is caught by this {{try/catch}}, the retry policy will have no > effect and all yet-to-be-sent messages are lost (the error will break the > loop over the broker list). > This issue is very hard to detect because: the producer (async or sync) > cannot even catch the error, and *all* the metrics are updated as if > everything was fine. > Only the abnormal drop in the producers network I/O, or the incoming message > rate on the brokers; or the alerting on errors in producer logs could have > revealed the issue. > This behavior was introduced by KAFKA-300. I can't see a good reason for it, > so here is a patch that will let the retry-policy do its job when such a > {{Throwable}} occurs. > Thanks in advance for your help. > Alexis > ps: you might wonder how could this {{try/catch}} ever caught something? > {{DefaultEventHandler#groupMessagesToSet}} looks so harmless. > Here are the details: > We use Snappy compression. When the native snappy library is not installed on > the host, Snappy, during the initialization of class > {{org.xerial.snappy.Snappy}} will [write a C > library|https://github.com/xerial/snappy-java/blob/1.1.0/src/main/java/org/xerial/snappy/SnappyLoader.java#L312] > in the JVM temp directory {{java.io.tmpdir}}. > In our scenario, {{java.io.tmpdir}} was a subdirectory of {{/tmp}}. After an > instance reboot (thank you > [AWS|https://twitter.com/hashtag/AWSReboot?src=hash]!), the JVM temp > directory was removed. The JVM was then running with a non-existing temp dir. > Snappy class would be impossible to initialize and the following message > would be silently logged: > {code} > ERROR [2014-10-07 22:23:56,530] kafka.producer.async.DefaultEventHandler: > Failed to send messages > ! java.lang.NoClassDefFoundError: Could not initialize class > org.xerial.snappy.Snappy > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (KAFKA-1702) Messages silently Lost by producer
[ https://issues.apache.org/jira/browse/KAFKA-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis Midon updated KAFKA-1702: Comment: was deleted (was: diff --git core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala index d8ac915..0f7f941 100644 --- core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala +++ core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala @@ -95,27 +95,28 @@ class DefaultEventHandler[K,V](config: ProducerConfig, val partitionedDataOpt = partitionAndCollate(messages) partitionedDataOpt match { case Some(partitionedData) => -val failedProduceRequests = new ArrayBuffer[KeyedMessage[K,Message]] -try { - for ((brokerid, messagesPerBrokerMap) <- partitionedData) { -if (logger.isTraceEnabled) - messagesPerBrokerMap.foreach(partitionAndEvent => -trace("Handling event for Topic: %s, Broker: %d, Partitions: %s".format(partitionAndEvent._1, brokerid, partitionAndEvent._2))) -val messageSetPerBroker = groupMessagesToSet(messagesPerBrokerMap) - -val failedTopicPartitions = send(brokerid, messageSetPerBroker) -failedTopicPartitions.foreach(topicPartition => { - messagesPerBrokerMap.get(topicPartition) match { -case Some(data) => failedProduceRequests.appendAll(data) -case None => // nothing - } -}) +val failedProduceRequests = new ArrayBuffer[KeyedMessage[K, Message]] +for ((brokerid, messagesPerBrokerMap) <- partitionedData) { + if (logger.isTraceEnabled) { +messagesPerBrokerMap.foreach(partitionAndEvent => + trace("Handling event for Topic: %s, Broker: %d, Partitions: %s".format(partitionAndEvent._1, brokerid, partitionAndEvent._2))) + } + val messageSetPerBrokerOpt = groupMessagesToSet(messagesPerBrokerMap) + messageSetPerBrokerOpt match { +case Some(messageSetPerBroker) => + val failedTopicPartitions = send(brokerid, messageSetPerBroker) + failedTopicPartitions.foreach(topicPartition => { +messagesPerBrokerMap.get(topicPartition) match { + case Some(data) => failedProduceRequests.appendAll(data) + case None => // nothing +} + }) +case None => // failed to group messages + messagesPerBrokerMap.values.foreach(m => failedProduceRequests.appendAll(m)) } -} catch { - case t: Throwable => error("Failed to send messages", t) } failedProduceRequests - case None => // all produce requests failed + case None => // failed to collate messages messages } } @@ -290,43 +291,46 @@ class DefaultEventHandler[K,V](config: ProducerConfig, } } - private def groupMessagesToSet(messagesPerTopicAndPartition: collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K,Message]]]) = { + private def groupMessagesToSet(messagesPerTopicAndPartition: collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K, Message]]]) = { /** enforce the compressed.topics config here. - * If the compression codec is anything other than NoCompressionCodec, - *Enable compression only for specified topics if any - *If the list of compressed topics is empty, then enable the specified compression codec for all topics - * If the compression codec is NoCompressionCodec, compression is disabled for all topics + * If the compression codec is anything other than NoCompressionCodec, + * Enable compression only for specified topics if any + * If the list of compressed topics is empty, then enable the specified compression codec for all topics + * If the compression codec is NoCompressionCodec, compression is disabled for all topics */ - -val messagesPerTopicPartition = messagesPerTopicAndPartition.map { case (topicAndPartition, messages) => - val rawMessages = messages.map(_.message) - ( topicAndPartition, -config.compressionCodec match { - case NoCompressionCodec => -debug("Sending %d messages with no compression to %s".format(messages.size, topicAndPartition)) -new ByteBufferMessageSet(NoCompressionCodec, rawMessages: _*) - case _ => -config.compressedTopics.size match { - case 0 => -debug("Sending %d messages with compression codec %d to %s" - .format(messages.size, config.compressionCodec.codec, topicAndPartition)) -new ByteBufferMessageSet(config.compressionCodec, rawMessages: _*) - case _ => -if(config.compressedTopics.contains(topicAndParti
[jira] [Updated] (KAFKA-1702) Messages silently Lost by producer
[ https://issues.apache.org/jira/browse/KAFKA-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis Midon updated KAFKA-1702: Attachment: KAFKA-1702.0.patch > Messages silently Lost by producer > -- > > Key: KAFKA-1702 > URL: https://issues.apache.org/jira/browse/KAFKA-1702 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 0.8.1.1 >Reporter: Alexis Midon >Assignee: Jun Rao > Attachments: KAFKA-1702.0.patch > > > Hello, > we lost millions of messages because of this {{try/catch}} in the producer > {{DefaultEventHandler}}: > https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala#L114-L116 > If a Throwable is caught by this {{try/catch}}, the retry policy will have no > effect and all yet-to-be-sent messages are lost (the error will break the > loop over the broker list). > This issue is very hard to detect because: the producer (async or sync) > cannot even catch the error, and *all* the metrics are updated as if > everything was fine. > Only the abnormal drop in the producers network I/O, or the incoming message > rate on the brokers; or the alerting on errors in producer logs could have > revealed the issue. > This behavior was introduced by KAFKA-300. I can't see a good reason for it, > so here is a patch that will let the retry-policy do its job when such a > {{Throwable}} occurs. > Thanks in advance for your help. > Alexis > ps: you might wonder how could this {{try/catch}} ever caught something? > {{DefaultEventHandler#groupMessagesToSet}} looks so harmless. > Here are the details: > We use Snappy compression. When the native snappy library is not installed on > the host, Snappy, during the initialization of class > {{org.xerial.snappy.Snappy}} will [write a C > library|https://github.com/xerial/snappy-java/blob/1.1.0/src/main/java/org/xerial/snappy/SnappyLoader.java#L312] > in the JVM temp directory {{java.io.tmpdir}}. > In our scenario, {{java.io.tmpdir}} was a subdirectory of {{/tmp}}. After an > instance reboot (thank you > [AWS|https://twitter.com/hashtag/AWSReboot?src=hash]!), the JVM temp > directory was removed. The JVM was then running with a non-existing temp dir. > Snappy class would be impossible to initialize and the following message > would be silently logged: > {code} > ERROR [2014-10-07 22:23:56,530] kafka.producer.async.DefaultEventHandler: > Failed to send messages > ! java.lang.NoClassDefFoundError: Could not initialize class > org.xerial.snappy.Snappy > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1702) Messages silently Lost by producer
[ https://issues.apache.org/jira/browse/KAFKA-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis Midon updated KAFKA-1702: Status: Patch Available (was: Open) diff --git core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala index d8ac915..0f7f941 100644 --- core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala +++ core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala @@ -95,27 +95,28 @@ class DefaultEventHandler[K,V](config: ProducerConfig, val partitionedDataOpt = partitionAndCollate(messages) partitionedDataOpt match { case Some(partitionedData) => -val failedProduceRequests = new ArrayBuffer[KeyedMessage[K,Message]] -try { - for ((brokerid, messagesPerBrokerMap) <- partitionedData) { -if (logger.isTraceEnabled) - messagesPerBrokerMap.foreach(partitionAndEvent => -trace("Handling event for Topic: %s, Broker: %d, Partitions: %s".format(partitionAndEvent._1, brokerid, partitionAndEvent._2))) -val messageSetPerBroker = groupMessagesToSet(messagesPerBrokerMap) - -val failedTopicPartitions = send(brokerid, messageSetPerBroker) -failedTopicPartitions.foreach(topicPartition => { - messagesPerBrokerMap.get(topicPartition) match { -case Some(data) => failedProduceRequests.appendAll(data) -case None => // nothing - } -}) +val failedProduceRequests = new ArrayBuffer[KeyedMessage[K, Message]] +for ((brokerid, messagesPerBrokerMap) <- partitionedData) { + if (logger.isTraceEnabled) { +messagesPerBrokerMap.foreach(partitionAndEvent => + trace("Handling event for Topic: %s, Broker: %d, Partitions: %s".format(partitionAndEvent._1, brokerid, partitionAndEvent._2))) + } + val messageSetPerBrokerOpt = groupMessagesToSet(messagesPerBrokerMap) + messageSetPerBrokerOpt match { +case Some(messageSetPerBroker) => + val failedTopicPartitions = send(brokerid, messageSetPerBroker) + failedTopicPartitions.foreach(topicPartition => { +messagesPerBrokerMap.get(topicPartition) match { + case Some(data) => failedProduceRequests.appendAll(data) + case None => // nothing +} + }) +case None => // failed to group messages + messagesPerBrokerMap.values.foreach(m => failedProduceRequests.appendAll(m)) } -} catch { - case t: Throwable => error("Failed to send messages", t) } failedProduceRequests - case None => // all produce requests failed + case None => // failed to collate messages messages } } @@ -290,43 +291,46 @@ class DefaultEventHandler[K,V](config: ProducerConfig, } } - private def groupMessagesToSet(messagesPerTopicAndPartition: collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K,Message]]]) = { + private def groupMessagesToSet(messagesPerTopicAndPartition: collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K, Message]]]) = { /** enforce the compressed.topics config here. - * If the compression codec is anything other than NoCompressionCodec, - *Enable compression only for specified topics if any - *If the list of compressed topics is empty, then enable the specified compression codec for all topics - * If the compression codec is NoCompressionCodec, compression is disabled for all topics + * If the compression codec is anything other than NoCompressionCodec, + * Enable compression only for specified topics if any + * If the list of compressed topics is empty, then enable the specified compression codec for all topics + * If the compression codec is NoCompressionCodec, compression is disabled for all topics */ - -val messagesPerTopicPartition = messagesPerTopicAndPartition.map { case (topicAndPartition, messages) => - val rawMessages = messages.map(_.message) - ( topicAndPartition, -config.compressionCodec match { - case NoCompressionCodec => -debug("Sending %d messages with no compression to %s".format(messages.size, topicAndPartition)) -new ByteBufferMessageSet(NoCompressionCodec, rawMessages: _*) - case _ => -config.compressedTopics.size match { - case 0 => -debug("Sending %d messages with compression codec %d to %s" - .format(messages.size, config.compressionCodec.codec, topicAndPartition)) -new ByteBufferMessageSet(config.compressionCodec, rawMessages: _*) - case _ => -if(config.compressedTopics.contains(topi
[jira] [Created] (KAFKA-1702) Messages silently Lost by producer
Alexis Midon created KAFKA-1702: --- Summary: Messages silently Lost by producer Key: KAFKA-1702 URL: https://issues.apache.org/jira/browse/KAFKA-1702 Project: Kafka Issue Type: Bug Components: producer Affects Versions: 0.8.1.1 Reporter: Alexis Midon Assignee: Jun Rao Hello, we lost millions of messages because of this {{try/catch}} in the producer {{DefaultEventHandler}}: https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala#L114-L116 If a Throwable is caught by this {{try/catch}}, the retry policy will have no effect and all yet-to-be-sent messages are lost (the error will break the loop over the broker list). This issue is very hard to detect because: the producer (async or sync) cannot even catch the error, and *all* the metrics are updated as if everything was fine. Only the abnormal drop in the producers network I/O, or the incoming message rate on the brokers; or the alerting on errors in producer logs could have revealed the issue. This behavior was introduced by KAFKA-300. I can't see a good reason for it, so here is a patch that will let the retry-policy do its job when such a {{Throwable}} occurs. Thanks in advance for your help. Alexis ps: you might wonder how could this {{try/catch}} ever caught something? {{DefaultEventHandler#groupMessagesToSet}} looks so harmless. Here are the details: We use Snappy compression. When the native snappy library is not installed on the host, Snappy, during the initialization of class {{org.xerial.snappy.Snappy}} will [write a C library|https://github.com/xerial/snappy-java/blob/1.1.0/src/main/java/org/xerial/snappy/SnappyLoader.java#L312] in the JVM temp directory {{java.io.tmpdir}}. In our scenario, {{java.io.tmpdir}} was a subdirectory of {{/tmp}}. After an instance reboot (thank you [AWS|https://twitter.com/hashtag/AWSReboot?src=hash]!), the JVM temp directory was removed. The JVM was then running with a non-existing temp dir. Snappy class would be impossible to initialize and the following message would be silently logged: {code} ERROR [2014-10-07 22:23:56,530] kafka.producer.async.DefaultEventHandler: Failed to send messages ! java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses
[ https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104815#comment-14104815 ] Alexis Midon commented on KAFKA-1597: - [~nehanarkhede] patch updated. thanks for your feedback! > New metrics: ResponseQueueSize and BeingSentResponses > - > > Key: KAFKA-1597 > URL: https://issues.apache.org/jira/browse/KAFKA-1597 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Alexis Midon >Priority: Minor > Labels: features > Attachments: ResponseQueueSize.patch, ResponsesBeingSent.patch > > > This patch adds two metrics: > h3. ResponseQueueSize > As of 0.8.1, the sizes of the response queues are [reported as different > metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] > - one per processor thread. This is not very ideal for different reasons: > * charts have to sum the different metrics > * the metrics collection system might not support 'wild card queries' like > {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which > case monitoring now depends on the number of configured network threads > * monitoring the response by thread is not very valuable. However the global > number of responses is useful. > * proposal* > So this patch exposes the total number of queued responses as a metric > {{ResponseQueueSize}} > *implementation* > In {{RequestChannel}}, create a Gauge that adds up the size of the response > queues. > h3. BeingSentResponses > As of 0.8.1, the processor threads will poll responses from the queues and > attach them to the SelectionKey as fast as possible. The consequence of that > is that the response queues are not a good indicator of the number of > "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of > response to be sent. > The current metrics don't reflect the size of this "buffer", which is an > issue. > *proposal* > This patch adds a gauge that keeps track of the number of responses being > handled by the {{ServerSocketChannel}}. > That new metric is named "BeingSentResponses" (who said naming was hard?) > *implementation* > To calculate that metric, the patch adds up the number of SelectionKeys > interested in writing, across processor threads. > Another approach could be to keep all in-flight responses in a data structure > (let's say a map) shared by the processor threads. A response will be added > to that map when dequeued from the response queue, and removed when the write > is complete. The gauge will simply report the size of that map. I decided > against that second approach as it is more intrusive and requires some > additional bookkeeping to gather information already available through the > {{SelectionKey}}'s -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses
[ https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis Midon updated KAFKA-1597: Attachment: (was: KAFKA-1594_ResponseQueueSize.patch) > New metrics: ResponseQueueSize and BeingSentResponses > - > > Key: KAFKA-1597 > URL: https://issues.apache.org/jira/browse/KAFKA-1597 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Alexis Midon >Priority: Minor > Labels: features > Attachments: ResponseQueueSize.patch, ResponsesBeingSent.patch > > > This patch adds two metrics: > h3. ResponseQueueSize > As of 0.8.1, the sizes of the response queues are [reported as different > metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] > - one per processor thread. This is not very ideal for different reasons: > * charts have to sum the different metrics > * the metrics collection system might not support 'wild card queries' like > {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which > case monitoring now depends on the number of configured network threads > * monitoring the response by thread is not very valuable. However the global > number of responses is useful. > * proposal* > So this patch exposes the total number of queued responses as a metric > {{ResponseQueueSize}} > *implementation* > In {{RequestChannel}}, create a Gauge that adds up the size of the response > queues. > h3. BeingSentResponses > As of 0.8.1, the processor threads will poll responses from the queues and > attach them to the SelectionKey as fast as possible. The consequence of that > is that the response queues are not a good indicator of the number of > "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of > response to be sent. > The current metrics don't reflect the size of this "buffer", which is an > issue. > *proposal* > This patch adds a gauge that keeps track of the number of responses being > handled by the {{ServerSocketChannel}}. > That new metric is named "BeingSentResponses" (who said naming was hard?) > *implementation* > To calculate that metric, the patch adds up the number of SelectionKeys > interested in writing, across processor threads. > Another approach could be to keep all in-flight responses in a data structure > (let's say a map) shared by the processor threads. A response will be added > to that map when dequeued from the response queue, and removed when the write > is complete. The gauge will simply report the size of that map. I decided > against that second approach as it is more intrusive and requires some > additional bookkeeping to gather information already available through the > {{SelectionKey}}'s -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses
[ https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis Midon updated KAFKA-1597: Attachment: (was: KAFKA-1594_ResponsesBeingSent.patch) > New metrics: ResponseQueueSize and BeingSentResponses > - > > Key: KAFKA-1597 > URL: https://issues.apache.org/jira/browse/KAFKA-1597 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Alexis Midon >Priority: Minor > Labels: features > Attachments: ResponseQueueSize.patch, ResponsesBeingSent.patch > > > This patch adds two metrics: > h3. ResponseQueueSize > As of 0.8.1, the sizes of the response queues are [reported as different > metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] > - one per processor thread. This is not very ideal for different reasons: > * charts have to sum the different metrics > * the metrics collection system might not support 'wild card queries' like > {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which > case monitoring now depends on the number of configured network threads > * monitoring the response by thread is not very valuable. However the global > number of responses is useful. > * proposal* > So this patch exposes the total number of queued responses as a metric > {{ResponseQueueSize}} > *implementation* > In {{RequestChannel}}, create a Gauge that adds up the size of the response > queues. > h3. BeingSentResponses > As of 0.8.1, the processor threads will poll responses from the queues and > attach them to the SelectionKey as fast as possible. The consequence of that > is that the response queues are not a good indicator of the number of > "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of > response to be sent. > The current metrics don't reflect the size of this "buffer", which is an > issue. > *proposal* > This patch adds a gauge that keeps track of the number of responses being > handled by the {{ServerSocketChannel}}. > That new metric is named "BeingSentResponses" (who said naming was hard?) > *implementation* > To calculate that metric, the patch adds up the number of SelectionKeys > interested in writing, across processor threads. > Another approach could be to keep all in-flight responses in a data structure > (let's say a map) shared by the processor threads. A response will be added > to that map when dequeued from the response queue, and removed when the write > is complete. The gauge will simply report the size of that map. I decided > against that second approach as it is more intrusive and requires some > additional bookkeeping to gather information already available through the > {{SelectionKey}}'s -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses
[ https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis Midon updated KAFKA-1597: Attachment: ResponsesBeingSent.patch ResponseQueueSize.patch > New metrics: ResponseQueueSize and BeingSentResponses > - > > Key: KAFKA-1597 > URL: https://issues.apache.org/jira/browse/KAFKA-1597 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Alexis Midon >Priority: Minor > Labels: features > Attachments: ResponseQueueSize.patch, ResponsesBeingSent.patch > > > This patch adds two metrics: > h3. ResponseQueueSize > As of 0.8.1, the sizes of the response queues are [reported as different > metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] > - one per processor thread. This is not very ideal for different reasons: > * charts have to sum the different metrics > * the metrics collection system might not support 'wild card queries' like > {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which > case monitoring now depends on the number of configured network threads > * monitoring the response by thread is not very valuable. However the global > number of responses is useful. > * proposal* > So this patch exposes the total number of queued responses as a metric > {{ResponseQueueSize}} > *implementation* > In {{RequestChannel}}, create a Gauge that adds up the size of the response > queues. > h3. BeingSentResponses > As of 0.8.1, the processor threads will poll responses from the queues and > attach them to the SelectionKey as fast as possible. The consequence of that > is that the response queues are not a good indicator of the number of > "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of > response to be sent. > The current metrics don't reflect the size of this "buffer", which is an > issue. > *proposal* > This patch adds a gauge that keeps track of the number of responses being > handled by the {{ServerSocketChannel}}. > That new metric is named "BeingSentResponses" (who said naming was hard?) > *implementation* > To calculate that metric, the patch adds up the number of SelectionKeys > interested in writing, across processor threads. > Another approach could be to keep all in-flight responses in a data structure > (let's say a map) shared by the processor threads. A response will be added > to that map when dequeued from the response queue, and removed when the write > is complete. The gauge will simply report the size of that map. I decided > against that second approach as it is more intrusive and requires some > additional bookkeeping to gather information already available through the > {{SelectionKey}}'s -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses
[ https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis Midon updated KAFKA-1597: Attachment: (was: KAFKA-1594_BeingSentResponses.patch) > New metrics: ResponseQueueSize and BeingSentResponses > - > > Key: KAFKA-1597 > URL: https://issues.apache.org/jira/browse/KAFKA-1597 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Alexis Midon >Priority: Minor > Labels: features > Attachments: KAFKA-1594_ResponseQueueSize.patch, > KAFKA-1594_ResponsesBeingSent.patch > > > This patch adds two metrics: > h3. ResponseQueueSize > As of 0.8.1, the sizes of the response queues are [reported as different > metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] > - one per processor thread. This is not very ideal for different reasons: > * charts have to sum the different metrics > * the metrics collection system might not support 'wild card queries' like > {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which > case monitoring now depends on the number of configured network threads > * monitoring the response by thread is not very valuable. However the global > number of responses is useful. > * proposal* > So this patch exposes the total number of queued responses as a metric > {{ResponseQueueSize}} > *implementation* > In {{RequestChannel}}, create a Gauge that adds up the size of the response > queues. > h3. BeingSentResponses > As of 0.8.1, the processor threads will poll responses from the queues and > attach them to the SelectionKey as fast as possible. The consequence of that > is that the response queues are not a good indicator of the number of > "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of > response to be sent. > The current metrics don't reflect the size of this "buffer", which is an > issue. > *proposal* > This patch adds a gauge that keeps track of the number of responses being > handled by the {{ServerSocketChannel}}. > That new metric is named "BeingSentResponses" (who said naming was hard?) > *implementation* > To calculate that metric, the patch adds up the number of SelectionKeys > interested in writing, across processor threads. > Another approach could be to keep all in-flight responses in a data structure > (let's say a map) shared by the processor threads. A response will be added > to that map when dequeued from the response queue, and removed when the write > is complete. The gauge will simply report the size of that map. I decided > against that second approach as it is more intrusive and requires some > additional bookkeeping to gather information already available through the > {{SelectionKey}}'s -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses
[ https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis Midon updated KAFKA-1597: Attachment: KAFKA-1594_ResponsesBeingSent.patch rename {{BeingSentResponses}} into {{ResponsesBeingSent}} > New metrics: ResponseQueueSize and BeingSentResponses > - > > Key: KAFKA-1597 > URL: https://issues.apache.org/jira/browse/KAFKA-1597 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Alexis Midon >Priority: Minor > Labels: features > Attachments: KAFKA-1594_ResponseQueueSize.patch, > KAFKA-1594_ResponsesBeingSent.patch > > > This patch adds two metrics: > h3. ResponseQueueSize > As of 0.8.1, the sizes of the response queues are [reported as different > metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] > - one per processor thread. This is not very ideal for different reasons: > * charts have to sum the different metrics > * the metrics collection system might not support 'wild card queries' like > {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which > case monitoring now depends on the number of configured network threads > * monitoring the response by thread is not very valuable. However the global > number of responses is useful. > * proposal* > So this patch exposes the total number of queued responses as a metric > {{ResponseQueueSize}} > *implementation* > In {{RequestChannel}}, create a Gauge that adds up the size of the response > queues. > h3. BeingSentResponses > As of 0.8.1, the processor threads will poll responses from the queues and > attach them to the SelectionKey as fast as possible. The consequence of that > is that the response queues are not a good indicator of the number of > "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of > response to be sent. > The current metrics don't reflect the size of this "buffer", which is an > issue. > *proposal* > This patch adds a gauge that keeps track of the number of responses being > handled by the {{ServerSocketChannel}}. > That new metric is named "BeingSentResponses" (who said naming was hard?) > *implementation* > To calculate that metric, the patch adds up the number of SelectionKeys > interested in writing, across processor threads. > Another approach could be to keep all in-flight responses in a data structure > (let's say a map) shared by the processor threads. A response will be added > to that map when dequeued from the response queue, and removed when the write > is complete. The gauge will simply report the size of that map. I decided > against that second approach as it is more intrusive and requires some > additional bookkeeping to gather information already available through the > {{SelectionKey}}'s -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses
[ https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102480#comment-14102480 ] Alexis Midon commented on KAFKA-1597: - # NetworkProcessorAvgIdlePercent and RequestHandlerAvgIdlePercent don't give information regarding response processing # could you explain what RequestBeingProcessed is? how different is it from the number of I/O threads? My understanding of the code is that requests are processed by the I/O threads process requests. https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaServer.scala#L90 https://dchtm6r471mui.cloudfront.net/hackpad.com_SIX7UAFo7hB_p.186494_1405370457117_KafkaThreadingModel.png # Regarding InflightRequests. A request flows through a sequence of transformations/states. To me, measuring every component of that pipeline is the most basic and reliable way of understanding the behavior of the system. It also gives users the option to use the data in any way they might want. So i'd that exposing these root metrics would be great. Considering that only BeingSentResponses is missing, the implementation cost is pretty low. > New metrics: ResponseQueueSize and BeingSentResponses > - > > Key: KAFKA-1597 > URL: https://issues.apache.org/jira/browse/KAFKA-1597 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Alexis Midon >Priority: Minor > Labels: features > Attachments: KAFKA-1594_BeingSentResponses.patch, > KAFKA-1594_ResponseQueueSize.patch > > > This patch adds two metrics: > h3. ResponseQueueSize > As of 0.8.1, the sizes of the response queues are [reported as different > metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] > - one per processor thread. This is not very ideal for different reasons: > * charts have to sum the different metrics > * the metrics collection system might not support 'wild card queries' like > {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which > case monitoring now depends on the number of configured network threads > * monitoring the response by thread is not very valuable. However the global > number of responses is useful. > * proposal* > So this patch exposes the total number of queued responses as a metric > {{ResponseQueueSize}} > *implementation* > In {{RequestChannel}}, create a Gauge that adds up the size of the response > queues. > h3. BeingSentResponses > As of 0.8.1, the processor threads will poll responses from the queues and > attach them to the SelectionKey as fast as possible. The consequence of that > is that the response queues are not a good indicator of the number of > "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of > response to be sent. > The current metrics don't reflect the size of this "buffer", which is an > issue. > *proposal* > This patch adds a gauge that keeps track of the number of responses being > handled by the {{ServerSocketChannel}}. > That new metric is named "BeingSentResponses" (who said naming was hard?) > *implementation* > To calculate that metric, the patch adds up the number of SelectionKeys > interested in writing, across processor threads. > Another approach could be to keep all in-flight responses in a data structure > (let's say a map) shared by the processor threads. A response will be added > to that map when dequeued from the response queue, and removed when the write > is complete. The gauge will simply report the size of that map. I decided > against that second approach as it is more intrusive and requires some > additional bookkeeping to gather information already available through the > {{SelectionKey}}'s -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses
[ https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101515#comment-14101515 ] Alexis Midon edited comment on KAFKA-1597 at 8/18/14 11:22 PM: --- * regarding {{ResponseQueueSize}}, I think I'm missing some context. Why is an analysis by response thread useful? Your experience might help me understand. It seems to me that the total number of queued responses is the only metric that helps understand the load on the Socket server (assuming it has some kind of back-pressure mechanism). * with {{BeingSentResponses}}, my goal was to avoid having a blind spot with the Socket server. Every element on the "request handling chain" is monitored except the last, the socket server. I considered tracking {{InflightResponses}} which would be {{ResponseQueueSize + BeingSentResponses}} but I decided against since having 2 metrics helps in monitoring 2 different component of the "response chain". I think that {{InflightRequests}} will have the same issue: it would be a superset of already available metrics {{InflightRequest = RequestQueueSize + number of I/O threads+ ResponseQueueSize + BeingSentResponses}}. And I think the break down would be more useful. was (Author: alexismidon): * regarding {{ResponseQueueSize}}, I think I'm missing some context. Why is an analysis by response thread useful? Your experience might help me understand. It seems to me that the total number of queued responses is the only metric that helps understand the load on the Socket server (assuming it has some kind of back-pressure mechanism). * with {{BeingSentResponses}}, my goal was to avoid having a blind spot with the Socket server. Every element on the "request handling chain" is monitored except the last, the socket server. I considered tracking {{InflightResponses}} which would be {{ResponseQueueSize + BeingSentResponses }} but I decided against since having 2 metrics helps in monitoring 2 different component of the "response chain". I think that {{InflightRequests}} will have the same issue: it would be a superset of already available metrics {{InflightRequest = RequestQueueSize + number of I/O threads+ ResponseQueueSize + BeingSentResponses}}. And I think the break down would be more useful. > New metrics: ResponseQueueSize and BeingSentResponses > - > > Key: KAFKA-1597 > URL: https://issues.apache.org/jira/browse/KAFKA-1597 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Alexis Midon >Priority: Minor > Labels: features > Attachments: KAFKA-1594_BeingSentResponses.patch, > KAFKA-1594_ResponseQueueSize.patch > > > This patch adds two metrics: > h3. ResponseQueueSize > As of 0.8.1, the sizes of the response queues are [reported as different > metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] > - one per processor thread. This is not very ideal for different reasons: > * charts have to sum the different metrics > * the metrics collection system might not support 'wild card queries' like > {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which > case monitoring now depends on the number of configured network threads > * monitoring the response by thread is not very valuable. However the global > number of responses is useful. > * proposal* > So this patch exposes the total number of queued responses as a metric > {{ResponseQueueSize}} > *implementation* > In {{RequestChannel}}, create a Gauge that adds up the size of the response > queues. > h3. BeingSentResponses > As of 0.8.1, the processor threads will poll responses from the queues and > attach them to the SelectionKey as fast as possible. The consequence of that > is that the response queues are not a good indicator of the number of > "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of > response to be sent. > The current metrics don't reflect the size of this "buffer", which is an > issue. > *proposal* > This patch adds a gauge that keeps track of the number of responses being > handled by the {{ServerSocketChannel}}. > That new metric is named "BeingSentResponses" (who said naming was hard?) > *implementation* > To calculate that metric, the patch adds up the number of SelectionKeys > interested in writing, across processor threads. > Another approach could be to keep all in-flight responses in a data structure > (let's say a map) shared by the processor threads. A response will be added > to that map when dequeued from the response queue, and removed when the write > is complete. The gauge will simply report the size of that map. I decided > against that second approach as it is more int
[jira] [Comment Edited] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses
[ https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101515#comment-14101515 ] Alexis Midon edited comment on KAFKA-1597 at 8/18/14 11:22 PM: --- * regarding {{ResponseQueueSize}}, I think I'm missing some context. Why is an analysis by response thread useful? Your experience might help me understand. It seems to me that the total number of queued responses is the only metric that helps understand the load on the Socket server (assuming it has some kind of back-pressure mechanism). * with {{BeingSentResponses}}, my goal was to avoid having a blind spot with the Socket server. Every element on the "request handling chain" is monitored except the last, the socket server. I considered tracking {{InflightResponses}} which would be {{ResponseQueueSize + BeingSentResponses }} but I decided against since having 2 metrics helps in monitoring 2 different component of the "response chain". I think that {{InflightRequests}} will have the same issue: it would be a superset of already available metrics {{InflightRequest = RequestQueueSize + number of I/O threads+ ResponseQueueSize + BeingSentResponses}}. And I think the break down would be more useful. was (Author: alexismidon): * regarding {{ResponseQueueSize}}, I think I'm missing some context. Why is an analysis by response thread useful? Your experience might help me understand. It seems to me that the total number of queued responses is the only metric that helps understand the load on the Socket server (assuming it has some kind of back-pressure mechanism). * with {{BeingSentResponses}}, my goal was to avoid having a blind spot with the Socket server. Every element on the "request handling chain" is monitored except the last, the socket server. I considered tracking {{InflightResponses}} which would be {{ ResponseQueueSize + BeingSentResponses }} but I decided against since having 2 metrics helps in monitoring 2 different component of the "response chain". I think that {{InflightRequests}} will have the same issue: it would be a superset of already available metrics {{InflightRequest = RequestQueueSize + number of I/O threads+ ResponseQueueSize + BeingSentResponses}}. And I think the break down would be more useful. > New metrics: ResponseQueueSize and BeingSentResponses > - > > Key: KAFKA-1597 > URL: https://issues.apache.org/jira/browse/KAFKA-1597 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Alexis Midon >Priority: Minor > Labels: features > Attachments: KAFKA-1594_BeingSentResponses.patch, > KAFKA-1594_ResponseQueueSize.patch > > > This patch adds two metrics: > h3. ResponseQueueSize > As of 0.8.1, the sizes of the response queues are [reported as different > metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] > - one per processor thread. This is not very ideal for different reasons: > * charts have to sum the different metrics > * the metrics collection system might not support 'wild card queries' like > {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which > case monitoring now depends on the number of configured network threads > * monitoring the response by thread is not very valuable. However the global > number of responses is useful. > * proposal* > So this patch exposes the total number of queued responses as a metric > {{ResponseQueueSize}} > *implementation* > In {{RequestChannel}}, create a Gauge that adds up the size of the response > queues. > h3. BeingSentResponses > As of 0.8.1, the processor threads will poll responses from the queues and > attach them to the SelectionKey as fast as possible. The consequence of that > is that the response queues are not a good indicator of the number of > "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of > response to be sent. > The current metrics don't reflect the size of this "buffer", which is an > issue. > *proposal* > This patch adds a gauge that keeps track of the number of responses being > handled by the {{ServerSocketChannel}}. > That new metric is named "BeingSentResponses" (who said naming was hard?) > *implementation* > To calculate that metric, the patch adds up the number of SelectionKeys > interested in writing, across processor threads. > Another approach could be to keep all in-flight responses in a data structure > (let's say a map) shared by the processor threads. A response will be added > to that map when dequeued from the response queue, and removed when the write > is complete. The gauge will simply report the size of that map. I decided > against that second approach as it is more i
[jira] [Commented] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses
[ https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101515#comment-14101515 ] Alexis Midon commented on KAFKA-1597: - * regarding {{ResponseQueueSize}}, I think I'm missing some context. Why is an analysis by response thread useful? Your experience might help me understand. It seems to me that the total number of queued responses is the only metric that helps understand the load on the Socket server (assuming it has some kind of back-pressure mechanism). * with {{BeingSentResponses}}, my goal was to avoid having a blind spot with the Socket server. Every element on the "request handling chain" is monitored except the last, the socket server. I considered tracking {{InflightResponses}} which would be {{ ResponseQueueSize + BeingSentResponses }} but I decided against since having 2 metrics helps in monitoring 2 different component of the "response chain". I think that {{InflightRequests}} will have the same issue: it would be a superset of already available metrics {{InflightRequest = RequestQueueSize + number of I/O threads+ ResponseQueueSize + BeingSentResponses}}. And I think the break down would be more useful. > New metrics: ResponseQueueSize and BeingSentResponses > - > > Key: KAFKA-1597 > URL: https://issues.apache.org/jira/browse/KAFKA-1597 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Alexis Midon >Priority: Minor > Labels: features > Attachments: KAFKA-1594_BeingSentResponses.patch, > KAFKA-1594_ResponseQueueSize.patch > > > This patch adds two metrics: > h3. ResponseQueueSize > As of 0.8.1, the sizes of the response queues are [reported as different > metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] > - one per processor thread. This is not very ideal for different reasons: > * charts have to sum the different metrics > * the metrics collection system might not support 'wild card queries' like > {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which > case monitoring now depends on the number of configured network threads > * monitoring the response by thread is not very valuable. However the global > number of responses is useful. > * proposal* > So this patch exposes the total number of queued responses as a metric > {{ResponseQueueSize}} > *implementation* > In {{RequestChannel}}, create a Gauge that adds up the size of the response > queues. > h3. BeingSentResponses > As of 0.8.1, the processor threads will poll responses from the queues and > attach them to the SelectionKey as fast as possible. The consequence of that > is that the response queues are not a good indicator of the number of > "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of > response to be sent. > The current metrics don't reflect the size of this "buffer", which is an > issue. > *proposal* > This patch adds a gauge that keeps track of the number of responses being > handled by the {{ServerSocketChannel}}. > That new metric is named "BeingSentResponses" (who said naming was hard?) > *implementation* > To calculate that metric, the patch adds up the number of SelectionKeys > interested in writing, across processor threads. > Another approach could be to keep all in-flight responses in a data structure > (let's say a map) shared by the processor threads. A response will be added > to that map when dequeued from the response queue, and removed when the write > is complete. The gauge will simply report the size of that map. I decided > against that second approach as it is more intrusive and requires some > additional bookkeeping to gather information already available through the > {{SelectionKey}}'s -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1594) new metrics: ResponseQueueSize and BeingSentResponses
[ https://issues.apache.org/jira/browse/KAFKA-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097722#comment-14097722 ] Alexis Midon commented on KAFKA-1594: - you know what, I closed KAFKA-1593 thinking you could re-open KAFKA-1594. The irony.. ;) I created a third ticket, KAFKA-1597. > new metrics: ResponseQueueSize and BeingSentResponses > - > > Key: KAFKA-1594 > URL: https://issues.apache.org/jira/browse/KAFKA-1594 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Alexis Midon >Priority: Minor > Labels: features > Attachments: KAFKA-1594_BeingSentResponses.patch, > KAFKA-1594_ResponseQueueSize.patch > > > This patch adds two metrics: > h3. ResponseQueueSize > As of 0.8.1, the sizes of the response queues are [reported as different > metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] > - one per processor thread. This is not very ideal for different reasons: > * charts have to sum the different metrics > * the metrics collection system might not support 'wild card queries' like > {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which > case monitoring now depends on the number of configured network threads > * monitoring the response by thread is not very valuable. However the global > number of responses is useful. > * proposal* > So this patch exposes the total number of queued responses as a metric > {{ResponseQueueSize}} > *implementation* > In {{RequestChannel}}, create a Gauge that adds up the size of the response > queues. > h3. BeingSentResponses > As of 0.8.1, the processor threads will poll responses from the queues and > attach them to the SelectionKey as fast as possible. The consequence of that > is that the response queues are not a good indicator of the number of > "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of > response to be sent. > The current metrics don't reflect the size of this "buffer", which is an > issue. > *proposal* > This patch adds a gauge that keeps track of the number of responses being > handled by the {{ServerSocketChannel}}. > That new metric is named "BeingSentResponses" (who said naming was hard?) > *implementation* > To calculate that metric, the patch adds up the number of SelectionKeys > interested in writing, across processor threads. > Another approach could be to keep all in-flight responses in a data structure > (let's say a map) shared by the processor threads. A response will be added > to that map when dequeued from the response queue, and removed when the write > is complete. The gauge will simply report the size of that map. I decided > against that second approach as it is more intrusive and requires some > additional bookkeeping to gather information already available through the > {{SelectionKey}}'s -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (KAFKA-1594) 2 new metrics
[ https://issues.apache.org/jira/browse/KAFKA-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis Midon updated KAFKA-1594: Summary: 2 new metrics (was: new metrics: ResponseQueueSize and BeingSentResponses) > 2 new metrics > - > > Key: KAFKA-1594 > URL: https://issues.apache.org/jira/browse/KAFKA-1594 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Alexis Midon >Priority: Minor > Labels: features > Attachments: KAFKA-1594_BeingSentResponses.patch, > KAFKA-1594_ResponseQueueSize.patch > > > This patch adds two metrics: > h3. ResponseQueueSize > As of 0.8.1, the sizes of the response queues are [reported as different > metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] > - one per processor thread. This is not very ideal for different reasons: > * charts have to sum the different metrics > * the metrics collection system might not support 'wild card queries' like > {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which > case monitoring now depends on the number of configured network threads > * monitoring the response by thread is not very valuable. However the global > number of responses is useful. > * proposal* > So this patch exposes the total number of queued responses as a metric > {{ResponseQueueSize}} > *implementation* > In {{RequestChannel}}, create a Gauge that adds up the size of the response > queues. > h3. BeingSentResponses > As of 0.8.1, the processor threads will poll responses from the queues and > attach them to the SelectionKey as fast as possible. The consequence of that > is that the response queues are not a good indicator of the number of > "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of > response to be sent. > The current metrics don't reflect the size of this "buffer", which is an > issue. > *proposal* > This patch adds a gauge that keeps track of the number of responses being > handled by the {{ServerSocketChannel}}. > That new metric is named "BeingSentResponses" (who said naming was hard?) > *implementation* > To calculate that metric, the patch adds up the number of SelectionKeys > interested in writing, across processor threads. > Another approach could be to keep all in-flight responses in a data structure > (let's say a map) shared by the processor threads. A response will be added > to that map when dequeued from the response queue, and removed when the write > is complete. The gauge will simply report the size of that map. I decided > against that second approach as it is more intrusive and requires some > additional bookkeeping to gather information already available through the > {{SelectionKey}}'s -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses
Alexis Midon created KAFKA-1597: --- Summary: New metrics: ResponseQueueSize and BeingSentResponses Key: KAFKA-1597 URL: https://issues.apache.org/jira/browse/KAFKA-1597 Project: Kafka Issue Type: New Feature Components: core Reporter: Alexis Midon Priority: Minor Attachments: KAFKA-1594_BeingSentResponses.patch, KAFKA-1594_ResponseQueueSize.patch This patch adds two metrics: h3. ResponseQueueSize As of 0.8.1, the sizes of the response queues are [reported as different metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] - one per processor thread. This is not very ideal for different reasons: * charts have to sum the different metrics * the metrics collection system might not support 'wild card queries' like {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which case monitoring now depends on the number of configured network threads * monitoring the response by thread is not very valuable. However the global number of responses is useful. * proposal* So this patch exposes the total number of queued responses as a metric {{ResponseQueueSize}} *implementation* In {{RequestChannel}}, create a Gauge that adds up the size of the response queues. h3. BeingSentResponses As of 0.8.1, the processor threads will poll responses from the queues and attach them to the SelectionKey as fast as possible. The consequence of that is that the response queues are not a good indicator of the number of "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of response to be sent. The current metrics don't reflect the size of this "buffer", which is an issue. *proposal* This patch adds a gauge that keeps track of the number of responses being handled by the {{ServerSocketChannel}}. That new metric is named "BeingSentResponses" (who said naming was hard?) *implementation* To calculate that metric, the patch adds up the number of SelectionKeys interested in writing, across processor threads. Another approach could be to keep all in-flight responses in a data structure (let's say a map) shared by the processor threads. A response will be added to that map when dequeued from the response queue, and removed when the write is complete. The gauge will simply report the size of that map. I decided against that second approach as it is more intrusive and requires some additional bookkeeping to gather information already available through the {{SelectionKey}}'s -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1594) new metrics: ResponseQueueSize and BeingSentResponses
[ https://issues.apache.org/jira/browse/KAFKA-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097100#comment-14097100 ] Alexis Midon commented on KAFKA-1594: - Hi [~nehanarkhede] could you please re-open this issue, it has more details than KAFKA-1593 and patches are also attached. I closed KAFKA-1593. Sorry for the confusion, Alexis > new metrics: ResponseQueueSize and BeingSentResponses > - > > Key: KAFKA-1594 > URL: https://issues.apache.org/jira/browse/KAFKA-1594 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Alexis Midon >Priority: Minor > Labels: features > Attachments: KAFKA-1594_BeingSentResponses.patch, > KAFKA-1594_ResponseQueueSize.patch > > > This patch adds two metrics: > h3. ResponseQueueSize > As of 0.8.1, the sizes of the response queues are [reported as different > metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] > - one per processor thread. This is not very ideal for different reasons: > * charts have to sum the different metrics > * the metrics collection system might not support 'wild card queries' like > {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which > case monitoring now depends on the number of configured network threads > * monitoring the response by thread is not very valuable. However the global > number of responses is useful. > * proposal* > So this patch exposes the total number of queued responses as a metric > {{ResponseQueueSize}} > *implementation* > In {{RequestChannel}}, create a Gauge that adds up the size of the response > queues. > h3. BeingSentResponses > As of 0.8.1, the processor threads will poll responses from the queues and > attach them to the SelectionKey as fast as possible. The consequence of that > is that the response queues are not a good indicator of the number of > "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of > response to be sent. > The current metrics don't reflect the size of this "buffer", which is an > issue. > *proposal* > This patch adds a gauge that keeps track of the number of responses being > handled by the {{ServerSocketChannel}}. > That new metric is named "BeingSentResponses" (who said naming was hard?) > *implementation* > To calculate that metric, the patch adds up the number of SelectionKeys > interested in writing, across processor threads. > Another approach could be to keep all in-flight responses in a data structure > (let's say a map) shared by the processor threads. A response will be added > to that map when dequeued from the response queue, and removed when the write > is complete. The gauge will simply report the size of that map. I decided > against that second approach as it is more intrusive and requires some > additional bookkeeping to gather information already available through the > {{SelectionKey}}'s -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (KAFKA-1593) new metrics: ResponseQueueSize and BeingSentResponses
[ https://issues.apache.org/jira/browse/KAFKA-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis Midon resolved KAFKA-1593. - Resolution: Duplicate created by error as a duplicate of KAFKA-1594 > new metrics: ResponseQueueSize and BeingSentResponses > - > > Key: KAFKA-1593 > URL: https://issues.apache.org/jira/browse/KAFKA-1593 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Alexis Midon >Priority: Minor > > This patch adds two metrics: > h3. ResponseQueueSize > As of 0.8.1, the sizes of the response queues are [reported as different > metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] > - one per processor thread. This is not very ideal for different reasons: > * charts have to sum the different metrics > * the metrics collection system might not support 'wild card queries' like > {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which > case monitoring now depends on the number of configured network threads > * monitoring the response by thread is not very valuable. However the global > number of responses is useful. > * proposal* > So this patch exposes the total number of queued responses as a metric > {{ResponseQueueSize}} > *implementation* > In {{RequestChannel}}, create a Gauge that adds up the size of the response > queues. > *BeingSentResponses > As of 0.8.1, the processor threads will poll responses from the queues and > attach them to the SelectionKey as fast as possible. The consequence of that > is that the response queues are not a good indicator of the number of > "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of > response to be sent. > The current metrics don't reflect the size of this "buffer", which is an > issue. > *proposal* > This patch adds a gauge that keeps track of the number of responses being > handled by the {{ServerSocketChannel}}. > That new metric is named "BeingSentResponses" (who said naming was hard?) > *implementation* > To calculate that metric, the patch adds up the number of SelectionKeys > interested in writing across processor threads. > An another approach could be to keep all in-flight responses in a data > structure (let's say a map) shared by the processor thread. A response will > be added to that map when dequeued from the response queue, and removed when > the write is complete. I decided agains that second approach as it is more > intrusive and requires some additional bookkeeping to gather information > already available through the {{SelectionKey}}'s -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (KAFKA-1594) new metrics: ResponseQueueSize and BeingSentResponses
[ https://issues.apache.org/jira/browse/KAFKA-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis Midon updated KAFKA-1594: Description: This patch adds two metrics: h3. ResponseQueueSize As of 0.8.1, the sizes of the response queues are [reported as different metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] - one per processor thread. This is not very ideal for different reasons: * charts have to sum the different metrics * the metrics collection system might not support 'wild card queries' like {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which case monitoring now depends on the number of configured network threads * monitoring the response by thread is not very valuable. However the global number of responses is useful. * proposal* So this patch exposes the total number of queued responses as a metric {{ResponseQueueSize}} *implementation* In {{RequestChannel}}, create a Gauge that adds up the size of the response queues. h3. BeingSentResponses As of 0.8.1, the processor threads will poll responses from the queues and attach them to the SelectionKey as fast as possible. The consequence of that is that the response queues are not a good indicator of the number of "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of response to be sent. The current metrics don't reflect the size of this "buffer", which is an issue. *proposal* This patch adds a gauge that keeps track of the number of responses being handled by the {{ServerSocketChannel}}. That new metric is named "BeingSentResponses" (who said naming was hard?) *implementation* To calculate that metric, the patch adds up the number of SelectionKeys interested in writing, across processor threads. Another approach could be to keep all in-flight responses in a data structure (let's say a map) shared by the processor threads. A response will be added to that map when dequeued from the response queue, and removed when the write is complete. The gauge will simply report the size of that map. I decided against that second approach as it is more intrusive and requires some additional bookkeeping to gather information already available through the {{SelectionKey}}'s was: This patch adds two metrics: h3. ResponseQueueSize As of 0.8.1, the sizes of the response queues are [reported as different metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] - one per processor thread. This is not very ideal for different reasons: * charts have to sum the different metrics * the metrics collection system might not support 'wild card queries' like {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which case monitoring now depends on the number of configured network threads * monitoring the response by thread is not very valuable. However the global number of responses is useful. * proposal* So this patch exposes the total number of queued responses as a metric {{ResponseQueueSize}} *implementation* In {{RequestChannel}}, create a Gauge that adds up the size of the response queues. h3. BeingSentResponses As of 0.8.1, the processor threads will poll responses from the queues and attach them to the SelectionKey as fast as possible. The consequence of that is that the response queues are not a good indicator of the number of "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of response to be sent. The current metrics don't reflect the size of this "buffer", which is an issue. *proposal* This patch adds a gauge that keeps track of the number of responses being handled by the {{ServerSocketChannel}}. That new metric is named "BeingSentResponses" (who said naming was hard?) *implementation* To calculate that metric, the patch adds up the number of SelectionKeys interested in writing across processor threads. Another approach could be to keep all in-flight responses in a data structure (let's say a map) shared by the processor threads. A response will be added to that map when dequeued from the response queue, and removed when the write is complete. The gauge will simply report the size of that map. I decided against that second approach as it is more intrusive and requires some additional bookkeeping to gather information already available through the {{SelectionKey}}'s > new metrics: ResponseQueueSize and BeingSentResponses > - > > Key: KAFKA-1594 > URL: https://issues.apache.org/jira/browse/KAFKA-1594 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Alexis Midon >Priority: Minor > Labels: features > Attachments: KAFKA-1594_BeingSentResponses.pat
[jira] [Updated] (KAFKA-1594) new metrics: ResponseQueueSize and BeingSentResponses
[ https://issues.apache.org/jira/browse/KAFKA-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis Midon updated KAFKA-1594: Attachment: KAFKA-1594_ResponseQueueSize.patch KAFKA-1594_BeingSentResponses.patch adding 2 patch files > new metrics: ResponseQueueSize and BeingSentResponses > - > > Key: KAFKA-1594 > URL: https://issues.apache.org/jira/browse/KAFKA-1594 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Alexis Midon >Priority: Minor > Labels: features > Attachments: KAFKA-1594_BeingSentResponses.patch, > KAFKA-1594_ResponseQueueSize.patch > > > This patch adds two metrics: > h3. ResponseQueueSize > As of 0.8.1, the sizes of the response queues are [reported as different > metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] > - one per processor thread. This is not very ideal for different reasons: > * charts have to sum the different metrics > * the metrics collection system might not support 'wild card queries' like > {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which > case monitoring now depends on the number of configured network threads > * monitoring the response by thread is not very valuable. However the global > number of responses is useful. > * proposal* > So this patch exposes the total number of queued responses as a metric > {{ResponseQueueSize}} > *implementation* > In {{RequestChannel}}, create a Gauge that adds up the size of the response > queues. > h3. BeingSentResponses > As of 0.8.1, the processor threads will poll responses from the queues and > attach them to the SelectionKey as fast as possible. The consequence of that > is that the response queues are not a good indicator of the number of > "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of > response to be sent. > The current metrics don't reflect the size of this "buffer", which is an > issue. > *proposal* > This patch adds a gauge that keeps track of the number of responses being > handled by the {{ServerSocketChannel}}. > That new metric is named "BeingSentResponses" (who said naming was hard?) > *implementation* > To calculate that metric, the patch adds up the number of SelectionKeys > interested in writing across processor threads. > An another approach could be to keep all in-flight responses in a data > structure (let's say a map) shared by the processor thread. A response will > be added to that map when dequeued from the response queue, and removed when > the write is complete. I decided agains that second approach as it is more > intrusive and requires some additional bookkeeping to gather information > already available through the {{SelectionKey}}'s -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (KAFKA-1594) new metrics: ResponseQueueSize and BeingSentResponses
[ https://issues.apache.org/jira/browse/KAFKA-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis Midon updated KAFKA-1594: Description: This patch adds two metrics: h3. ResponseQueueSize As of 0.8.1, the sizes of the response queues are [reported as different metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] - one per processor thread. This is not very ideal for different reasons: * charts have to sum the different metrics * the metrics collection system might not support 'wild card queries' like {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which case monitoring now depends on the number of configured network threads * monitoring the response by thread is not very valuable. However the global number of responses is useful. * proposal* So this patch exposes the total number of queued responses as a metric {{ResponseQueueSize}} *implementation* In {{RequestChannel}}, create a Gauge that adds up the size of the response queues. h3. BeingSentResponses As of 0.8.1, the processor threads will poll responses from the queues and attach them to the SelectionKey as fast as possible. The consequence of that is that the response queues are not a good indicator of the number of "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of response to be sent. The current metrics don't reflect the size of this "buffer", which is an issue. *proposal* This patch adds a gauge that keeps track of the number of responses being handled by the {{ServerSocketChannel}}. That new metric is named "BeingSentResponses" (who said naming was hard?) *implementation* To calculate that metric, the patch adds up the number of SelectionKeys interested in writing across processor threads. Another approach could be to keep all in-flight responses in a data structure (let's say a map) shared by the processor threads. A response will be added to that map when dequeued from the response queue, and removed when the write is complete. The gauge will simply report the size of that map. I decided against that second approach as it is more intrusive and requires some additional bookkeeping to gather information already available through the {{SelectionKey}}'s was: This patch adds two metrics: h3. ResponseQueueSize As of 0.8.1, the sizes of the response queues are [reported as different metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] - one per processor thread. This is not very ideal for different reasons: * charts have to sum the different metrics * the metrics collection system might not support 'wild card queries' like {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which case monitoring now depends on the number of configured network threads * monitoring the response by thread is not very valuable. However the global number of responses is useful. * proposal* So this patch exposes the total number of queued responses as a metric {{ResponseQueueSize}} *implementation* In {{RequestChannel}}, create a Gauge that adds up the size of the response queues. h3. BeingSentResponses As of 0.8.1, the processor threads will poll responses from the queues and attach them to the SelectionKey as fast as possible. The consequence of that is that the response queues are not a good indicator of the number of "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of response to be sent. The current metrics don't reflect the size of this "buffer", which is an issue. *proposal* This patch adds a gauge that keeps track of the number of responses being handled by the {{ServerSocketChannel}}. That new metric is named "BeingSentResponses" (who said naming was hard?) *implementation* To calculate that metric, the patch adds up the number of SelectionKeys interested in writing across processor threads. Another approach could be to keep all in-flight responses in a data structure (let's say a map) shared by the processor threads. A response will be added to that map when dequeued from the response queue, and removed when the write is complete. The gauge will simply report the size of that map. I decided agains that second approach as it is more intrusive and requires some additional bookkeeping to gather information already available through the {{SelectionKey}}'s > new metrics: ResponseQueueSize and BeingSentResponses > - > > Key: KAFKA-1594 > URL: https://issues.apache.org/jira/browse/KAFKA-1594 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Alexis Midon >Priority: Minor > Labels: features > Attachments: KAFKA-1594_BeingSentResponses.patch
[jira] [Updated] (KAFKA-1594) new metrics: ResponseQueueSize and BeingSentResponses
[ https://issues.apache.org/jira/browse/KAFKA-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis Midon updated KAFKA-1594: Description: This patch adds two metrics: h3. ResponseQueueSize As of 0.8.1, the sizes of the response queues are [reported as different metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] - one per processor thread. This is not very ideal for different reasons: * charts have to sum the different metrics * the metrics collection system might not support 'wild card queries' like {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which case monitoring now depends on the number of configured network threads * monitoring the response by thread is not very valuable. However the global number of responses is useful. * proposal* So this patch exposes the total number of queued responses as a metric {{ResponseQueueSize}} *implementation* In {{RequestChannel}}, create a Gauge that adds up the size of the response queues. h3. BeingSentResponses As of 0.8.1, the processor threads will poll responses from the queues and attach them to the SelectionKey as fast as possible. The consequence of that is that the response queues are not a good indicator of the number of "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of response to be sent. The current metrics don't reflect the size of this "buffer", which is an issue. *proposal* This patch adds a gauge that keeps track of the number of responses being handled by the {{ServerSocketChannel}}. That new metric is named "BeingSentResponses" (who said naming was hard?) *implementation* To calculate that metric, the patch adds up the number of SelectionKeys interested in writing across processor threads. Another approach could be to keep all in-flight responses in a data structure (let's say a map) shared by the processor threads. A response will be added to that map when dequeued from the response queue, and removed when the write is complete. The gauge will simply report the size of that map. I decided agains that second approach as it is more intrusive and requires some additional bookkeeping to gather information already available through the {{SelectionKey}}'s was: This patch adds two metrics: h3. ResponseQueueSize As of 0.8.1, the sizes of the response queues are [reported as different metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] - one per processor thread. This is not very ideal for different reasons: * charts have to sum the different metrics * the metrics collection system might not support 'wild card queries' like {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which case monitoring now depends on the number of configured network threads * monitoring the response by thread is not very valuable. However the global number of responses is useful. * proposal* So this patch exposes the total number of queued responses as a metric {{ResponseQueueSize}} *implementation* In {{RequestChannel}}, create a Gauge that adds up the size of the response queues. h3. BeingSentResponses As of 0.8.1, the processor threads will poll responses from the queues and attach them to the SelectionKey as fast as possible. The consequence of that is that the response queues are not a good indicator of the number of "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of response to be sent. The current metrics don't reflect the size of this "buffer", which is an issue. *proposal* This patch adds a gauge that keeps track of the number of responses being handled by the {{ServerSocketChannel}}. That new metric is named "BeingSentResponses" (who said naming was hard?) *implementation* To calculate that metric, the patch adds up the number of SelectionKeys interested in writing across processor threads. An another approach could be to keep all in-flight responses in a data structure (let's say a map) shared by the processor thread. A response will be added to that map when dequeued from the response queue, and removed when the write is complete. I decided agains that second approach as it is more intrusive and requires some additional bookkeeping to gather information already available through the {{SelectionKey}}'s > new metrics: ResponseQueueSize and BeingSentResponses > - > > Key: KAFKA-1594 > URL: https://issues.apache.org/jira/browse/KAFKA-1594 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Alexis Midon >Priority: Minor > Labels: features > Attachments: KAFKA-1594_BeingSentResponses.patch, > KAFKA-1594_ResponseQueueSize.patch > > > This
[jira] [Updated] (KAFKA-1594) new metrics: ResponseQueueSize and BeingSentResponses
[ https://issues.apache.org/jira/browse/KAFKA-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis Midon updated KAFKA-1594: Description: This patch adds two metrics: h3. ResponseQueueSize As of 0.8.1, the sizes of the response queues are [reported as different metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] - one per processor thread. This is not very ideal for different reasons: * charts have to sum the different metrics * the metrics collection system might not support 'wild card queries' like {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which case monitoring now depends on the number of configured network threads * monitoring the response by thread is not very valuable. However the global number of responses is useful. * proposal* So this patch exposes the total number of queued responses as a metric {{ResponseQueueSize}} *implementation* In {{RequestChannel}}, create a Gauge that adds up the size of the response queues. h3. BeingSentResponses As of 0.8.1, the processor threads will poll responses from the queues and attach them to the SelectionKey as fast as possible. The consequence of that is that the response queues are not a good indicator of the number of "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of response to be sent. The current metrics don't reflect the size of this "buffer", which is an issue. *proposal* This patch adds a gauge that keeps track of the number of responses being handled by the {{ServerSocketChannel}}. That new metric is named "BeingSentResponses" (who said naming was hard?) *implementation* To calculate that metric, the patch adds up the number of SelectionKeys interested in writing across processor threads. An another approach could be to keep all in-flight responses in a data structure (let's say a map) shared by the processor thread. A response will be added to that map when dequeued from the response queue, and removed when the write is complete. I decided agains that second approach as it is more intrusive and requires some additional bookkeeping to gather information already available through the {{SelectionKey}}'s was: This patch adds two metrics: h3. ResponseQueueSize As of 0.8.1, the sizes of the response queues are [reported as different metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] - one per processor thread. This is not very ideal for different reasons: * charts have to sum the different metrics * the metrics collection system might not support 'wild card queries' like {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which case monitoring now depends on the number of configured network threads * monitoring the response by thread is not very valuable. However the global number of responses is useful. * proposal* So this patch exposes the total number of queued responses as a metric {{ResponseQueueSize}} *implementation* In {{RequestChannel}}, create a Gauge that adds up the size of the response queues. *BeingSentResponses As of 0.8.1, the processor threads will poll responses from the queues and attach them to the SelectionKey as fast as possible. The consequence of that is that the response queues are not a good indicator of the number of "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of response to be sent. The current metrics don't reflect the size of this "buffer", which is an issue. *proposal* This patch adds a gauge that keeps track of the number of responses being handled by the {{ServerSocketChannel}}. That new metric is named "BeingSentResponses" (who said naming was hard?) *implementation* To calculate that metric, the patch adds up the number of SelectionKeys interested in writing across processor threads. An another approach could be to keep all in-flight responses in a data structure (let's say a map) shared by the processor thread. A response will be added to that map when dequeued from the response queue, and removed when the write is complete. I decided agains that second approach as it is more intrusive and requires some additional bookkeeping to gather information already available through the {{SelectionKey}}'s > new metrics: ResponseQueueSize and BeingSentResponses > - > > Key: KAFKA-1594 > URL: https://issues.apache.org/jira/browse/KAFKA-1594 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Alexis Midon >Priority: Minor > Labels: features > > This patch adds two metrics: > h3. ResponseQueueSize > As of 0.8.1, the sizes of the response queues are [reported as different > metrics|https://github.com
[jira] [Created] (KAFKA-1594) new metrics: ResponseQueueSize and BeingSentResponses
Alexis Midon created KAFKA-1594: --- Summary: new metrics: ResponseQueueSize and BeingSentResponses Key: KAFKA-1594 URL: https://issues.apache.org/jira/browse/KAFKA-1594 Project: Kafka Issue Type: New Feature Components: core Reporter: Alexis Midon Priority: Minor This patch adds two metrics: h3. ResponseQueueSize As of 0.8.1, the sizes of the response queues are [reported as different metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] - one per processor thread. This is not very ideal for different reasons: * charts have to sum the different metrics * the metrics collection system might not support 'wild card queries' like {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which case monitoring now depends on the number of configured network threads * monitoring the response by thread is not very valuable. However the global number of responses is useful. * proposal* So this patch exposes the total number of queued responses as a metric {{ResponseQueueSize}} *implementation* In {{RequestChannel}}, create a Gauge that adds up the size of the response queues. *BeingSentResponses As of 0.8.1, the processor threads will poll responses from the queues and attach them to the SelectionKey as fast as possible. The consequence of that is that the response queues are not a good indicator of the number of "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of response to be sent. The current metrics don't reflect the size of this "buffer", which is an issue. *proposal* This patch adds a gauge that keeps track of the number of responses being handled by the {{ServerSocketChannel}}. That new metric is named "BeingSentResponses" (who said naming was hard?) *implementation* To calculate that metric, the patch adds up the number of SelectionKeys interested in writing across processor threads. An another approach could be to keep all in-flight responses in a data structure (let's say a map) shared by the processor thread. A response will be added to that map when dequeued from the response queue, and removed when the write is complete. I decided agains that second approach as it is more intrusive and requires some additional bookkeeping to gather information already available through the {{SelectionKey}}'s -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (KAFKA-1593) new metrics: ResponseQueueSize and BeingSentResponses
Alexis Midon created KAFKA-1593: --- Summary: new metrics: ResponseQueueSize and BeingSentResponses Key: KAFKA-1593 URL: https://issues.apache.org/jira/browse/KAFKA-1593 Project: Kafka Issue Type: New Feature Components: core Reporter: Alexis Midon Priority: Minor This patch adds two metrics: h3. ResponseQueueSize As of 0.8.1, the sizes of the response queues are [reported as different metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134] - one per processor thread. This is not very ideal for different reasons: * charts have to sum the different metrics * the metrics collection system might not support 'wild card queries' like {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which case monitoring now depends on the number of configured network threads * monitoring the response by thread is not very valuable. However the global number of responses is useful. * proposal* So this patch exposes the total number of queued responses as a metric {{ResponseQueueSize}} *implementation* In {{RequestChannel}}, create a Gauge that adds up the size of the response queues. *BeingSentResponses As of 0.8.1, the processor threads will poll responses from the queues and attach them to the SelectionKey as fast as possible. The consequence of that is that the response queues are not a good indicator of the number of "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of response to be sent. The current metrics don't reflect the size of this "buffer", which is an issue. *proposal* This patch adds a gauge that keeps track of the number of responses being handled by the {{ServerSocketChannel}}. That new metric is named "BeingSentResponses" (who said naming was hard?) *implementation* To calculate that metric, the patch adds up the number of SelectionKeys interested in writing across processor threads. An another approach could be to keep all in-flight responses in a data structure (let's say a map) shared by the processor thread. A response will be added to that map when dequeued from the response queue, and removed when the write is complete. I decided agains that second approach as it is more intrusive and requires some additional bookkeeping to gather information already available through the {{SelectionKey}}'s -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1112) broker can not start itself after kafka is killed with -9
[ https://issues.apache.org/jira/browse/KAFKA-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1406#comment-1406 ] Alexis Midon commented on KAFKA-1112: - I created https://issues.apache.org/jira/browse/KAFKA-1554. thanks > broker can not start itself after kafka is killed with -9 > - > > Key: KAFKA-1112 > URL: https://issues.apache.org/jira/browse/KAFKA-1112 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 0.8.0, 0.8.1 >Reporter: Kane Kim >Assignee: Jay Kreps >Priority: Critical > Fix For: 0.8.1 > > Attachments: KAFKA-1112-v1.patch, KAFKA-1112-v2.patch, > KAFKA-1112-v3.patch, KAFKA-1112-v4.patch, KAFKA-1112.out > > > When I kill kafka with -9, broker cannot start itself because of corrupted > index logs. I think kafka should try to delete/rebuild indexes itself without > manual intervention. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (KAFKA-1554) Corrupt index found on clean startup
Alexis Midon created KAFKA-1554: --- Summary: Corrupt index found on clean startup Key: KAFKA-1554 URL: https://issues.apache.org/jira/browse/KAFKA-1554 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.1 Environment: ubuntu 12.04, oracle jdk 1.7 Reporter: Alexis Midon Priority: Critical On a clean start up, corrupted index files are found. After investigations, it appears that some pre-allocated index files are not "compacted" correctly and the end of the file is full of zeroes. As a result, on start up, the last relative offset is zero which yields an offset equal to the base offset. The workaround is to delete all index files of size 10MB (the size of the pre-allocated files), and restart. Index files will be re-created. {code} find $your_data_directory -size 10485760c -name *.index #-delete {code} This is issue might be related/similar to https://issues.apache.org/jira/browse/KAFKA-1112 {code} 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,696 INFO main kafka.server.KafkaServer.info - [Kafka Server 847605514], starting 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,698 INFO main kafka.server.KafkaServer.info - [Kafka Server 847605514], Connecting to zookeeper on zk-main0.XXX:2181,zk-main1.XXX:2181,zk-main2.:2181/production/kafka/main 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,708 INFO ZkClient-EventThread-14-zk-main0.XXX.com:2181,zk-main1.XXX.com:2181,zk-main2.XXX.com:2181,zk-main3.XXX.com:2181,zk-main4.XXX.com:2181/production/kafka/main org.I0Itec.zkclient.ZkEventThread.run - Starting ZkClient event thread. 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,714 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:zookeeper.version=3.3.3-1203054, built on 11/17/2011 05:47 GMT 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,714 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:host.name=i-6b948138.inst.aws.airbnb.com 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,714 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:java.version=1.7.0_55 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,715 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:java.vendor=Oracle Corporation 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,715 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:java.home=/usr/lib/jvm/jre-7-oracle-x64/jre 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,715 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:java.class.path=libs/snappy-java-1.0.5.jar:libs/scala-library-2.10.1.jar:libs/slf4j-api-1.7.2.jar:libs/jopt-simple-3.2.jar:libs/metrics-annotation-2.2.0.jar:libs/log4j-1.2.15.jar:libs/kafka_2.10-0.8.1.jar:libs/zkclient-0.3.jar:libs/zookeeper-3.3.4.jar:libs/metrics-core-2.2.0.jar 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,715 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,716 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:java.io.tmpdir=/tmp 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,716 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:java.compiler= 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,716 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:os.name=Linux 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,716 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:os.arch=amd64 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,717 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:os.version=3.2.0-61-virtual 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,717 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:user.name=kafka 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,717 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:user.home=/srv/kafka 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,717 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:user.dir=/srv/kafka/kafka_2.10-0.8.1 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,718 INFO main org.apache.zookeeper.ZooKeeper. - Initiating client connection, connectString=zk-main0.XXX.com:2181,zk-main1.XXX.com:2181,zk-main2.XXX.com:2181,zk-main3.XXX.com:2181,zk-main4.XXX.com:2181/producti
[jira] [Commented] (KAFKA-1112) broker can not start itself after kafka is killed with -9
[ https://issues.apache.org/jira/browse/KAFKA-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059533#comment-14059533 ] Alexis Midon commented on KAFKA-1112: - Hello, I suffered from the same error using Kafka 0.8.1. Should I reopen this issue or create a new one? {code} 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,696 INFO main kafka.server.KafkaServer.info - [Kafka Server 847605514], starting 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,698 INFO main kafka.server.KafkaServer.info - [Kafka Server 847605514], Connecting to zookeeper on zk-main0.XXX:2181,zk-main1.XXX:2181,zk-main2.:2181/production/kafka/main 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,708 INFO ZkClient-EventThread-14-zk-main0.XXX.com:2181,zk-main1.XXX.com:2181,zk-main2.XXX.com:2181,zk-main3.XXX.com:2181,zk-main4.XXX.com:2181/production/kafka/main org.I0Itec.zkclient.ZkEventThread.run - Starting ZkClient event thread. 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,714 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:zookeeper.version=3.3.3-1203054, built on 11/17/2011 05:47 GMT 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,714 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:host.name=i-6b948138.inst.aws.airbnb.com 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,714 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:java.version=1.7.0_55 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,715 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:java.vendor=Oracle Corporation 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,715 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:java.home=/usr/lib/jvm/jre-7-oracle-x64/jre 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,715 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:java.class.path=libs/snappy-java-1.0.5.jar:libs/scala-library-2.10.1.jar:libs/slf4j-api-1.7.2.jar:libs/jopt-simple-3.2.jar:libs/metrics-annotation-2.2.0.jar:libs/log4j-1.2.15.jar:libs/kafka_2.10-0.8.1.jar:libs/zkclient-0.3.jar:libs/zookeeper-3.3.4.jar:libs/metrics-core-2.2.0.jar 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,715 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,716 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:java.io.tmpdir=/tmp 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,716 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:java.compiler= 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,716 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:os.name=Linux 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,716 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:os.arch=amd64 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,717 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:os.version=3.2.0-61-virtual 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,717 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:user.name=kafka 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,717 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:user.home=/srv/kafka 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,717 INFO main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:user.dir=/srv/kafka/kafka_2.10-0.8.1 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,718 INFO main org.apache.zookeeper.ZooKeeper. - Initiating client connection, connectString=zk-main0.XXX.com:2181,zk-main1.XXX.com:2181,zk-main2.XXX.com:2181,zk-main3.XXX.com:2181,zk-main4.XXX.com:2181/production/kafka/main sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@4758af63 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,733 INFO main-SendThread() org.apache.zookeeper.ClientCnxn.startConnect - Opening socket connection to server zk-main1.XXX.com/10.12.135.61:2181 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,738 INFO main-SendThread(zk-main1.XXX.com:2181) org.apache.zookeeper.ClientCnxn.primeConnection - Socket connection established to zk-main1.XXX.com/10.12.135.61:2181, initiating session 2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,745 INFO main-SendThread(zk-main1.XXX.com:2181) org.apache.zookeeper.ClientCnxn.readConnectR
[jira] [Comment Edited] (KAFKA-1300) Added WaitForReplaction admin tool.
[ https://issues.apache.org/jira/browse/KAFKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059512#comment-14059512 ] Alexis Midon edited comment on KAFKA-1300 at 7/11/14 11:47 PM: --- Consiering that Kafka is designed to handle some replication lag, if you need to shutdown a broker it does not seem very useful to wait for the replica lag to be zero. (If the broker is X messages behind, and my maintenance requires Y=F(message throughput) minutes, I can safely shutdown the broker is X+Y/throughput < replica.lag.max.messages. So maybe that command will be more useful if it could take an argument that characterize X, i.e. how far behind can the broker be before a shutdown. was (Author: alexismidon): Consiering that Kafka is designed to handle some replication lag, if you need to shutdown a broker it does not seem very useful to wait for the replica lag to be zero. (If the broker is X messages behind, and my maintenance requires Y=F(message throughput) minutes, I can safely shutdown the broker is X+Y < replica.lag.max.messages. So maybe that command will be more useful if it could take an argument that characterize X, i.e. how far behind can the broker be before a shutdown. > Added WaitForReplaction admin tool. > --- > > Key: KAFKA-1300 > URL: https://issues.apache.org/jira/browse/KAFKA-1300 > Project: Kafka > Issue Type: New Feature > Components: tools >Affects Versions: 0.8.0 > Environment: Ubuntu 12.04 >Reporter: Brenden Matthews > Labels: patch > Fix For: 0.8.1 > > Attachments: 0001-Added-WaitForReplaction-admin-tool.patch > > > I have created a tool similar to the broker shutdown tool for doing rolling > restarts of Kafka clusters. > The tool watches the max replica lag of the specified broker, and waits until > the lag drops to 0 before exiting. > To do a rolling restart, here's the process we use: > for (broker <- brokers) { > run shutdown tool for broker > terminate broker > start new broker > run wait for replication tool on new broker > } > Here's an example command line use: > ./kafka-run-class.sh kafka.admin.WaitForReplication --zookeeper > zk.host.com:2181 --num.retries 100 --retry.interval.ms 6 --broker 0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (KAFKA-1300) Added WaitForReplaction admin tool.
[ https://issues.apache.org/jira/browse/KAFKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059512#comment-14059512 ] Alexis Midon edited comment on KAFKA-1300 at 7/11/14 11:46 PM: --- Consiering that Kafka is designed to handle some replication lag, if you need to shutdown a broker it does not seem very useful to wait for the replica lag to be zero. (If the broker is X messages behind, and my maintenance requires Y=F(message throughput) minutes, I can safely shutdown the broker is X+Y < replica.lag.max.messages. So maybe that command will be more useful if it could take an argument that characterize X, i.e. how far behind can the broker be before a shutdown. was (Author: alexismidon): Consiering that Kafka is designed to handle some replication lag, if you need to shutdown a broker it does not seem very useful to wait for the replica lag to be zero. (If the broker is X messages behind, and my maintenance requires F(message throughput) minutes, I can safely shutdown the broker is X+ throughput*Y < replica.lag.max.messages. So maybe that command will be more useful if it could take an argument that characterize X, i.e. how far behind can the broker be before a shutdown. > Added WaitForReplaction admin tool. > --- > > Key: KAFKA-1300 > URL: https://issues.apache.org/jira/browse/KAFKA-1300 > Project: Kafka > Issue Type: New Feature > Components: tools >Affects Versions: 0.8.0 > Environment: Ubuntu 12.04 >Reporter: Brenden Matthews > Labels: patch > Fix For: 0.8.1 > > Attachments: 0001-Added-WaitForReplaction-admin-tool.patch > > > I have created a tool similar to the broker shutdown tool for doing rolling > restarts of Kafka clusters. > The tool watches the max replica lag of the specified broker, and waits until > the lag drops to 0 before exiting. > To do a rolling restart, here's the process we use: > for (broker <- brokers) { > run shutdown tool for broker > terminate broker > start new broker > run wait for replication tool on new broker > } > Here's an example command line use: > ./kafka-run-class.sh kafka.admin.WaitForReplication --zookeeper > zk.host.com:2181 --num.retries 100 --retry.interval.ms 6 --broker 0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1300) Added WaitForReplaction admin tool.
[ https://issues.apache.org/jira/browse/KAFKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059512#comment-14059512 ] Alexis Midon commented on KAFKA-1300: - Consiering that Kafka is designed to handle some replication lag, if you need to shutdown a broker it does not seem very useful to wait for the replica lag to be zero. (If the broker is X messages behind, and my maintenance requires F(message throughput) minutes, I can safely shutdown the broker is X+ throughput*Y < replica.lag.max.messages. So maybe that command will be more useful if it could take an argument that characterize X, i.e. how far behind can the broker be before a shutdown. > Added WaitForReplaction admin tool. > --- > > Key: KAFKA-1300 > URL: https://issues.apache.org/jira/browse/KAFKA-1300 > Project: Kafka > Issue Type: New Feature > Components: tools >Affects Versions: 0.8.0 > Environment: Ubuntu 12.04 >Reporter: Brenden Matthews > Labels: patch > Fix For: 0.8.1 > > Attachments: 0001-Added-WaitForReplaction-admin-tool.patch > > > I have created a tool similar to the broker shutdown tool for doing rolling > restarts of Kafka clusters. > The tool watches the max replica lag of the specified broker, and waits until > the lag drops to 0 before exiting. > To do a rolling restart, here's the process we use: > for (broker <- brokers) { > run shutdown tool for broker > terminate broker > start new broker > run wait for replication tool on new broker > } > Here's an example command line use: > ./kafka-run-class.sh kafka.admin.WaitForReplication --zookeeper > zk.host.com:2181 --num.retries 100 --retry.interval.ms 6 --broker 0 -- This message was sent by Atlassian JIRA (v6.2#6252)