[jira] [Commented] (KAFKA-1702) Messages silently Lost by producer

2014-10-12 Thread Alexis Midon (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168942#comment-14168942
 ] 

Alexis Midon commented on KAFKA-1702:
-

Also, if the call to #groupMessagesToSet is not inside a try/catch, an error 
thrown there will break the loop over the broker list: all messages get 
dropped, retries are ignored, metrics aren't updated, etc.
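
A minimal, self-contained sketch of the pattern the attached patch applies 
(simplified placeholder types, not Kafka's actual classes): catch per broker 
batch and funnel the messages into the failed list so the retry policy sees 
them, instead of letting one Throwable abort the whole loop.

{code}
// Sketch only: simplified stand-in for DefaultEventHandler's dispatch loop.
// `send` is a placeholder for the per-broker send (where message grouping and
// compression happen); a failure there must not abort the other brokers.
import scala.collection.mutable.ArrayBuffer

object RetryFriendlyDispatch {
  def dispatch[M](batchesPerBroker: Map[Int, Seq[M]], send: Seq[M] => Unit): Seq[M] = {
    val failed = new ArrayBuffer[M]
    for ((brokerId, batch) <- batchesPerBroker) {
      try {
        send(batch)              // may throw, e.g. NoClassDefFoundError from Snappy
      } catch {
        case _: Throwable =>
          failed ++= batch       // recorded for retry instead of silently dropped
      }
    }
    failed.toSeq                 // handed back to the caller's retry loop
  }
}
{code}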

> Messages silently Lost by producer
> --
>
> Key: KAFKA-1702
> URL: https://issues.apache.org/jira/browse/KAFKA-1702
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1.1
>Reporter: Alexis Midon
>Assignee: Jun Rao
> Attachments: KAFKA-1702.0.patch
>
>
> Hello,
> we lost millions of messages because of this {{try/catch}} in  the producer 
> {{DefaultEventHandler}}:
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala#L114-L116
> If a Throwable is caught by this {{try/catch}}, the retry policy will have no 
> effect and all yet-to-be-sent messages are lost (the error will break the 
> loop over the broker list).
> This issue is very hard to detect because the producer (async or sync) 
> cannot even catch the error, and *all* the metrics are updated as if 
> everything was fine.
> Only an abnormal drop in the producer's network I/O or in the brokers' 
> incoming message rate, or alerting on errors in the producer logs, could 
> have revealed the issue. 
> This behavior was introduced by KAFKA-300. I can't see a good reason for it, 
> so here is a patch that will let the retry-policy do its job when such a 
> {{Throwable}} occurs.
> Thanks in advance for your help.
> Alexis
> ps: you might wonder how this {{try/catch}} could ever catch anything; 
> {{DefaultEventHandler#groupMessagesToSet}} looks so harmless. 
> Here are the details:
> We use Snappy compression. When the native snappy library is not installed on 
> the host, the initialization of the class 
> {{org.xerial.snappy.Snappy}} will [write a C 
> library|https://github.com/xerial/snappy-java/blob/1.1.0/src/main/java/org/xerial/snappy/SnappyLoader.java#L312]
>  to the JVM temp directory {{java.io.tmpdir}}.
> In our scenario, {{java.io.tmpdir}} was a subdirectory of {{/tmp}}. After an 
> instance reboot (thank you 
> [AWS|https://twitter.com/hashtag/AWSReboot?src=hash]!), the JVM temp 
> directory was removed. The JVM was then running with a non-existent temp 
> dir, the Snappy class could not be initialized, and the following message 
> was silently logged:
> {code}
> ERROR [2014-10-07 22:23:56,530] kafka.producer.async.DefaultEventHandler: 
> Failed to send messages
> ! java.lang.NoClassDefFoundError: Could not initialize class 
> org.xerial.snappy.Snappy
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1702) Messages silently Lost by producer

2014-10-12 Thread Alexis Midon (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168941#comment-14168941
 ] 

Alexis Midon commented on KAFKA-1702:
-

I agree. In async mode, until there is a callback, the best we can do is make 
sure all the metrics are updated correctly, in particular ResendsPerSec and 
FailedSendsPerSec, which are critical for monitoring async producers.

In sync mode, the producer will get the exception, which is an improvement.

Thanks for your review.
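
For illustration, bumping such meters in the failure path could look like the 
sketch below. It uses the Yammer metrics-core API that the 0.8 code base is 
built on; the registration and call sites shown here are standalone assumptions, 
not the producer's actual stats wiring.

{code}
// Standalone sketch: meters named after the metrics discussed above.
import java.util.concurrent.TimeUnit
import com.yammer.metrics.Metrics

object ProducerSendMetricsSketch {
  private val resends     = Metrics.newMeter(getClass, "ResendsPerSec", "sends", TimeUnit.SECONDS)
  private val failedSends = Metrics.newMeter(getClass, "FailedSendsPerSec", "sends", TimeUnit.SECONDS)

  def recordRetry(): Unit = resends.mark()                 // one more retry attempt
  def recordFailure(messages: Long): Unit = failedSends.mark(messages)  // messages given up on
}
{code}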

> Messages silently Lost by producer
> --
>
> Key: KAFKA-1702
> URL: https://issues.apache.org/jira/browse/KAFKA-1702
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1.1
>Reporter: Alexis Midon
>Assignee: Jun Rao
> Attachments: KAFKA-1702.0.patch
>
>
> Hello,
> we lost millions of messages because of this {{try/catch}} in  the producer 
> {{DefaultEventHandler}}:
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala#L114-L116
> If a Throwable is caught by this {{try/catch}}, the retry policy will have no 
> effect and all yet-to-be-sent messages are lost (the error will break the 
> loop over the broker list).
> This issue is very hard to detect because the producer (async or sync) 
> cannot even catch the error, and *all* the metrics are updated as if 
> everything was fine.
> Only an abnormal drop in the producer's network I/O or in the brokers' 
> incoming message rate, or alerting on errors in the producer logs, could 
> have revealed the issue. 
> This behavior was introduced by KAFKA-300. I can't see a good reason for it, 
> so here is a patch that will let the retry-policy do its job when such a 
> {{Throwable}} occurs.
> Thanks in advance for your help.
> Alexis
> ps: you might wonder how this {{try/catch}} could ever catch anything; 
> {{DefaultEventHandler#groupMessagesToSet}} looks so harmless. 
> Here are the details:
> We use Snappy compression. When the native snappy library is not installed on 
> the host, the initialization of the class 
> {{org.xerial.snappy.Snappy}} will [write a C 
> library|https://github.com/xerial/snappy-java/blob/1.1.0/src/main/java/org/xerial/snappy/SnappyLoader.java#L312]
>  to the JVM temp directory {{java.io.tmpdir}}.
> In our scenario, {{java.io.tmpdir}} was a subdirectory of {{/tmp}}. After an 
> instance reboot (thank you 
> [AWS|https://twitter.com/hashtag/AWSReboot?src=hash]!), the JVM temp 
> directory was removed. The JVM was then running with a non-existent temp 
> dir, the Snappy class could not be initialized, and the following message 
> was silently logged:
> {code}
> ERROR [2014-10-07 22:23:56,530] kafka.producer.async.DefaultEventHandler: 
> Failed to send messages
> ! java.lang.NoClassDefFoundError: Could not initialize class 
> org.xerial.snappy.Snappy
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (KAFKA-1702) Messages silently Lost by producer

2014-10-12 Thread Alexis Midon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis Midon updated KAFKA-1702:

Comment: was deleted

(was: diff --git 
core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala 
core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala
index d8ac915..0f7f941 100644
--- core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala
+++ core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala
@@ -95,27 +95,28 @@ class DefaultEventHandler[K,V](config: ProducerConfig,
 val partitionedDataOpt = partitionAndCollate(messages)
 partitionedDataOpt match {
   case Some(partitionedData) =>
-val failedProduceRequests = new ArrayBuffer[KeyedMessage[K,Message]]
-try {
-  for ((brokerid, messagesPerBrokerMap) <- partitionedData) {
-if (logger.isTraceEnabled)
-  messagesPerBrokerMap.foreach(partitionAndEvent =>
-trace("Handling event for Topic: %s, Broker: %d, Partitions: 
%s".format(partitionAndEvent._1, brokerid, partitionAndEvent._2)))
-val messageSetPerBroker = groupMessagesToSet(messagesPerBrokerMap)
-
-val failedTopicPartitions = send(brokerid, messageSetPerBroker)
-failedTopicPartitions.foreach(topicPartition => {
-  messagesPerBrokerMap.get(topicPartition) match {
-case Some(data) => failedProduceRequests.appendAll(data)
-case None => // nothing
-  }
-})
+val failedProduceRequests = new ArrayBuffer[KeyedMessage[K, Message]]
+for ((brokerid, messagesPerBrokerMap) <- partitionedData) {
+  if (logger.isTraceEnabled) {
+messagesPerBrokerMap.foreach(partitionAndEvent =>
+  trace("Handling event for Topic: %s, Broker: %d, Partitions: 
%s".format(partitionAndEvent._1, brokerid, partitionAndEvent._2)))
+  }
+  val messageSetPerBrokerOpt = groupMessagesToSet(messagesPerBrokerMap)
+  messageSetPerBrokerOpt match {
+case Some(messageSetPerBroker) =>
+  val failedTopicPartitions = send(brokerid, messageSetPerBroker)
+  failedTopicPartitions.foreach(topicPartition => {
+messagesPerBrokerMap.get(topicPartition) match {
+  case Some(data) => failedProduceRequests.appendAll(data)
+  case None => // nothing
+}
+  })
+case None => // failed to group messages
+  messagesPerBrokerMap.values.foreach(m => 
failedProduceRequests.appendAll(m))
   }
-} catch {
-  case t: Throwable => error("Failed to send messages", t)
 }
 failedProduceRequests
-  case None => // all produce requests failed
+  case None => // failed to collate messages
 messages
 }
   }
@@ -290,43 +291,46 @@ class DefaultEventHandler[K,V](config: ProducerConfig,
 }
   }
 
-  private def groupMessagesToSet(messagesPerTopicAndPartition: 
collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K,Message]]]) = {
+  private def groupMessagesToSet(messagesPerTopicAndPartition: 
collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K, Message]]]) = {
 /** enforce the compressed.topics config here.
-  *  If the compression codec is anything other than NoCompressionCodec,
-  *Enable compression only for specified topics if any
-  *If the list of compressed topics is empty, then enable the 
specified compression codec for all topics
-  *  If the compression codec is NoCompressionCodec, compression is 
disabled for all topics
+  * If the compression codec is anything other than NoCompressionCodec,
+  * Enable compression only for specified topics if any
+  * If the list of compressed topics is empty, then enable the specified 
compression codec for all topics
+  * If the compression codec is NoCompressionCodec, compression is 
disabled for all topics
   */
-
-val messagesPerTopicPartition = messagesPerTopicAndPartition.map { case 
(topicAndPartition, messages) =>
-  val rawMessages = messages.map(_.message)
-  ( topicAndPartition,
-config.compressionCodec match {
-  case NoCompressionCodec =>
-debug("Sending %d messages with no compression to 
%s".format(messages.size, topicAndPartition))
-new ByteBufferMessageSet(NoCompressionCodec, rawMessages: _*)
-  case _ =>
-config.compressedTopics.size match {
-  case 0 =>
-debug("Sending %d messages with compression codec %d to %s"
-  .format(messages.size, config.compressionCodec.codec, 
topicAndPartition))
-new ByteBufferMessageSet(config.compressionCodec, rawMessages: 
_*)
-  case _ =>
-if(config.compressedTopics.contains(topicAndParti

[jira] [Updated] (KAFKA-1702) Messages silently Lost by producer

2014-10-12 Thread Alexis Midon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis Midon updated KAFKA-1702:

Attachment: KAFKA-1702.0.patch

> Messages silently Lost by producer
> --
>
> Key: KAFKA-1702
> URL: https://issues.apache.org/jira/browse/KAFKA-1702
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1.1
>Reporter: Alexis Midon
>Assignee: Jun Rao
> Attachments: KAFKA-1702.0.patch
>
>
> Hello,
> we lost millions of messages because of this {{try/catch}} in  the producer 
> {{DefaultEventHandler}}:
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala#L114-L116
> If a Throwable is caught by this {{try/catch}}, the retry policy will have no 
> effect and all yet-to-be-sent messages are lost (the error will break the 
> loop over the broker list).
> This issue is very hard to detect because the producer (async or sync) 
> cannot even catch the error, and *all* the metrics are updated as if 
> everything was fine.
> Only an abnormal drop in the producer's network I/O or in the brokers' 
> incoming message rate, or alerting on errors in the producer logs, could 
> have revealed the issue. 
> This behavior was introduced by KAFKA-300. I can't see a good reason for it, 
> so here is a patch that will let the retry-policy do its job when such a 
> {{Throwable}} occurs.
> Thanks in advance for your help.
> Alexis
> ps: you might wonder how this {{try/catch}} could ever catch anything; 
> {{DefaultEventHandler#groupMessagesToSet}} looks so harmless. 
> Here are the details:
> We use Snappy compression. When the native snappy library is not installed on 
> the host, the initialization of the class 
> {{org.xerial.snappy.Snappy}} will [write a C 
> library|https://github.com/xerial/snappy-java/blob/1.1.0/src/main/java/org/xerial/snappy/SnappyLoader.java#L312]
>  to the JVM temp directory {{java.io.tmpdir}}.
> In our scenario, {{java.io.tmpdir}} was a subdirectory of {{/tmp}}. After an 
> instance reboot (thank you 
> [AWS|https://twitter.com/hashtag/AWSReboot?src=hash]!), the JVM temp 
> directory was removed. The JVM was then running with a non-existent temp 
> dir, the Snappy class could not be initialized, and the following message 
> was silently logged:
> {code}
> ERROR [2014-10-07 22:23:56,530] kafka.producer.async.DefaultEventHandler: 
> Failed to send messages
> ! java.lang.NoClassDefFoundError: Could not initialize class 
> org.xerial.snappy.Snappy
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1702) Messages silently Lost by producer

2014-10-12 Thread Alexis Midon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis Midon updated KAFKA-1702:

Status: Patch Available  (was: Open)

diff --git core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala 
core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala
index d8ac915..0f7f941 100644
--- core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala
+++ core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala
@@ -95,27 +95,28 @@ class DefaultEventHandler[K,V](config: ProducerConfig,
 val partitionedDataOpt = partitionAndCollate(messages)
 partitionedDataOpt match {
   case Some(partitionedData) =>
-val failedProduceRequests = new ArrayBuffer[KeyedMessage[K,Message]]
-try {
-  for ((brokerid, messagesPerBrokerMap) <- partitionedData) {
-if (logger.isTraceEnabled)
-  messagesPerBrokerMap.foreach(partitionAndEvent =>
-trace("Handling event for Topic: %s, Broker: %d, Partitions: 
%s".format(partitionAndEvent._1, brokerid, partitionAndEvent._2)))
-val messageSetPerBroker = groupMessagesToSet(messagesPerBrokerMap)
-
-val failedTopicPartitions = send(brokerid, messageSetPerBroker)
-failedTopicPartitions.foreach(topicPartition => {
-  messagesPerBrokerMap.get(topicPartition) match {
-case Some(data) => failedProduceRequests.appendAll(data)
-case None => // nothing
-  }
-})
+val failedProduceRequests = new ArrayBuffer[KeyedMessage[K, Message]]
+for ((brokerid, messagesPerBrokerMap) <- partitionedData) {
+  if (logger.isTraceEnabled) {
+messagesPerBrokerMap.foreach(partitionAndEvent =>
+  trace("Handling event for Topic: %s, Broker: %d, Partitions: 
%s".format(partitionAndEvent._1, brokerid, partitionAndEvent._2)))
+  }
+  val messageSetPerBrokerOpt = groupMessagesToSet(messagesPerBrokerMap)
+  messageSetPerBrokerOpt match {
+case Some(messageSetPerBroker) =>
+  val failedTopicPartitions = send(brokerid, messageSetPerBroker)
+  failedTopicPartitions.foreach(topicPartition => {
+messagesPerBrokerMap.get(topicPartition) match {
+  case Some(data) => failedProduceRequests.appendAll(data)
+  case None => // nothing
+}
+  })
+case None => // failed to group messages
+  messagesPerBrokerMap.values.foreach(m => 
failedProduceRequests.appendAll(m))
   }
-} catch {
-  case t: Throwable => error("Failed to send messages", t)
 }
 failedProduceRequests
-  case None => // all produce requests failed
+  case None => // failed to collate messages
 messages
 }
   }
@@ -290,43 +291,46 @@ class DefaultEventHandler[K,V](config: ProducerConfig,
 }
   }
 
-  private def groupMessagesToSet(messagesPerTopicAndPartition: 
collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K,Message]]]) = {
+  private def groupMessagesToSet(messagesPerTopicAndPartition: 
collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K, Message]]]) = {
 /** enforce the compressed.topics config here.
-  *  If the compression codec is anything other than NoCompressionCodec,
-  *Enable compression only for specified topics if any
-  *If the list of compressed topics is empty, then enable the 
specified compression codec for all topics
-  *  If the compression codec is NoCompressionCodec, compression is 
disabled for all topics
+  * If the compression codec is anything other than NoCompressionCodec,
+  * Enable compression only for specified topics if any
+  * If the list of compressed topics is empty, then enable the specified 
compression codec for all topics
+  * If the compression codec is NoCompressionCodec, compression is 
disabled for all topics
   */
-
-val messagesPerTopicPartition = messagesPerTopicAndPartition.map { case 
(topicAndPartition, messages) =>
-  val rawMessages = messages.map(_.message)
-  ( topicAndPartition,
-config.compressionCodec match {
-  case NoCompressionCodec =>
-debug("Sending %d messages with no compression to 
%s".format(messages.size, topicAndPartition))
-new ByteBufferMessageSet(NoCompressionCodec, rawMessages: _*)
-  case _ =>
-config.compressedTopics.size match {
-  case 0 =>
-debug("Sending %d messages with compression codec %d to %s"
-  .format(messages.size, config.compressionCodec.codec, 
topicAndPartition))
-new ByteBufferMessageSet(config.compressionCodec, rawMessages: 
_*)
-  case _ =>
-if(config.compressedTopics.contains(topi

[jira] [Created] (KAFKA-1702) Messages silently Lost by producer

2014-10-12 Thread Alexis Midon (JIRA)
Alexis Midon created KAFKA-1702:
---

 Summary: Messages silently Lost by producer
 Key: KAFKA-1702
 URL: https://issues.apache.org/jira/browse/KAFKA-1702
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.8.1.1
Reporter: Alexis Midon
Assignee: Jun Rao



Hello,

we lost millions of messages because of this {{try/catch}} in  the producer 
{{DefaultEventHandler}}:
https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala#L114-L116

If a Throwable is caught by this {{try/catch}}, the retry policy will have no 
effect and all yet-to-be-sent messages are lost (the error will break the loop 
over the broker list).
This issue is very hard to detect because the producer (async or sync) cannot 
even catch the error, and *all* the metrics are updated as if everything was 
fine.

Only an abnormal drop in the producer's network I/O or in the brokers' 
incoming message rate, or alerting on errors in the producer logs, could have 
revealed the issue. 

This behavior was introduced by KAFKA-300. I can't see a good reason for it, so 
here is a patch that will let the retry-policy do its job when such a 
{{Throwable}} occurs.
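
Reduced to its essentials, the problematic shape looks like the sketch below 
(simplified placeholder types, not Kafka's actual code): one Throwable thrown 
while preparing a batch aborts the whole loop, nothing is added to the failed 
list, and the caller's retry logic never sees the loss.

{code}
// Sketch of the bug described above, with simplified placeholders.
object SwallowedThrowableSketch {
  def handleBatches[M](batchesPerBroker: Map[Int, Seq[M]], send: Seq[M] => Unit): Seq[M] = {
    val failed = scala.collection.mutable.ArrayBuffer.empty[M]
    try {
      for ((_, batch) <- batchesPerBroker)
        send(batch)             // compression happens in here; may throw NoClassDefFoundError
    } catch {
      case t: Throwable =>
        // swallowed: nothing is appended to `failed`, so the caller sees success
        println("Failed to send messages: " + t)
    }
    failed.toSeq                // empty even though messages were lost
  }
}
{code}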

Thanks in advance for your help.

Alexis

ps: you might wonder how this {{try/catch}} could ever catch anything; 
{{DefaultEventHandler#groupMessagesToSet}} looks so harmless. 

Here are the details:
We use Snappy compression. When the native snappy library is not installed on 
the host, the initialization of the class 
{{org.xerial.snappy.Snappy}} will [write a C 
library|https://github.com/xerial/snappy-java/blob/1.1.0/src/main/java/org/xerial/snappy/SnappyLoader.java#L312]
 to the JVM temp directory {{java.io.tmpdir}}.

In our scenario, {{java.io.tmpdir}} was a subdirectory of {{/tmp}}. After an 
instance reboot (thank you 
[AWS|https://twitter.com/hashtag/AWSReboot?src=hash]!), the JVM temp directory 
was removed. The JVM was then running with a non-existent temp dir, the 
Snappy class could not be initialized, and the following message was 
silently logged:

{code}
ERROR [2014-10-07 22:23:56,530] kafka.producer.async.DefaultEventHandler: 
Failed to send messages
! java.lang.NoClassDefFoundError: Could not initialize class 
org.xerial.snappy.Snappy
{code}
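
A defensive preflight along these lines (a sketch only, not part of Kafka or 
snappy-java; the object and method names are made up) would have turned the 
silent failure into a loud one at startup, by checking that {{java.io.tmpdir}} 
exists and forcing the native-library extraction early:

{code}
import java.io.File
import org.xerial.snappy.Snappy

object SnappyPreflight {
  def check(): Unit = {
    val tmp = new File(System.getProperty("java.io.tmpdir"))
    require(tmp.isDirectory, s"java.io.tmpdir does not exist: $tmp")
    // Touching the Snappy class here makes SnappyLoader extract the native library now.
    val payload = "snappy preflight".getBytes("UTF-8")
    val roundTrip = Snappy.uncompress(Snappy.compress(payload))
    require(java.util.Arrays.equals(roundTrip, payload), "Snappy round-trip failed")
  }
}
{code}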





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses

2014-08-20 Thread Alexis Midon (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104815#comment-14104815
 ] 

Alexis Midon commented on KAFKA-1597:
-

[~nehanarkhede] patch updated. thanks for your feedback!

> New metrics: ResponseQueueSize and BeingSentResponses
> -
>
> Key: KAFKA-1597
> URL: https://issues.apache.org/jira/browse/KAFKA-1597
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Alexis Midon
>Priority: Minor
>  Labels: features
> Attachments: ResponseQueueSize.patch, ResponsesBeingSent.patch
>
>
> This patch adds two metrics:
> h3. ResponseQueueSize
> As of 0.8.1, the sizes of the response queues are [reported as different 
> metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
>  - one per processor thread. This is not ideal, for several reasons:
> * charts have to sum the different metrics
> * the metrics collection system might not support 'wild card queries' like 
> {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
> case monitoring now depends on the number of configured network threads
> * monitoring responses by thread is not very valuable; however, the global 
> number of responses is useful.
> *proposal*
> So this patch exposes the total number of queued responses as a metric 
> {{ResponseQueueSize}}
> *implementation*
> In {{RequestChannel}}, create a Gauge that adds up the size of the response 
> queues.
> h3. BeingSentResponses
> As of 0.8.1, the processor threads will poll responses from the queues and 
> attach them to the SelectionKey as fast as possible. The consequence of that 
> is that the response queues are not a good indicator of the number of 
> "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of 
> response to be sent.
> The current metrics don't reflect the size of this "buffer", which is an 
> issue.
> *proposal*
> This patch adds a gauge that keeps track of the number of responses being 
> handled by the {{ServerSocketChannel}}.
> That new metric is named "BeingSentResponses" (who said naming was hard?)
> *implementation*
> To calculate that metric, the patch adds up the number of SelectionKeys 
> interested in writing, across processor threads.
> Another approach could be to keep all in-flight responses in a data structure 
> (let's say a map) shared by the processor threads. A response will be added 
> to that map when dequeued from the response queue, and removed when the write 
> is complete. The gauge will simply report the size of that map. I decided 
> against that second approach as it is more intrusive and requires some 
> additional bookkeeping to gather information already available through the 
> {{SelectionKey}}'s
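
For illustration, the two gauges described above could be registered roughly 
as in the sketch below, using the Yammer metrics API the 0.8 broker is built 
on. The queue list and selector access here are stand-ins; in Kafka the real 
wiring lives in {{RequestChannel}} and the processor threads and differs in 
detail.

{code}
import java.nio.channels.{SelectionKey, Selector}
import java.util.concurrent.BlockingQueue
import com.yammer.metrics.Metrics
import com.yammer.metrics.core.Gauge
import scala.collection.JavaConverters._

class SocketServerGaugesSketch(responseQueues: Seq[BlockingQueue[AnyRef]],
                               selectors: Seq[Selector]) {

  // ResponseQueueSize: total queued responses across all processor threads.
  Metrics.newGauge(getClass, "ResponseQueueSize", new Gauge[Int] {
    override def value(): Int = responseQueues.map(_.size).sum
  })

  // BeingSentResponses: selection keys whose interest set includes OP_WRITE,
  // i.e. responses currently attached to a channel and waiting to be written.
  Metrics.newGauge(getClass, "BeingSentResponses", new Gauge[Int] {
    override def value(): Int =
      selectors.map(_.keys.asScala.count(k =>
        k.isValid && (k.interestOps & SelectionKey.OP_WRITE) != 0)).sum
  })
}
{code}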



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses

2014-08-20 Thread Alexis Midon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis Midon updated KAFKA-1597:


Attachment: (was: KAFKA-1594_ResponseQueueSize.patch)

> New metrics: ResponseQueueSize and BeingSentResponses
> -
>
> Key: KAFKA-1597
> URL: https://issues.apache.org/jira/browse/KAFKA-1597
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Alexis Midon
>Priority: Minor
>  Labels: features
> Attachments: ResponseQueueSize.patch, ResponsesBeingSent.patch
>
>
> This patch adds two metrics:
> h3. ResponseQueueSize
> As of 0.8.1, the sizes of the response queues are [reported as different 
> metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
>  - one per processor thread. This is not ideal, for several reasons:
> * charts have to sum the different metrics
> * the metrics collection system might not support 'wild card queries' like 
> {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
> case monitoring now depends on the number of configured network threads
> * monitoring responses by thread is not very valuable; however, the global 
> number of responses is useful.
> *proposal*
> So this patch exposes the total number of queued responses as a metric 
> {{ResponseQueueSize}}
> *implementation*
> In {{RequestChannel}}, create a Gauge that adds up the size of the response 
> queues.
> h3. BeingSentResponses
> As of 0.8.1, the processor threads will poll responses from the queues and 
> attach them to the SelectionKey as fast as possible. The consequence of that 
> is that the response queues are not a good indicator of the number of 
> "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of 
> response to be sent.
> The current metrics don't reflect the size of this "buffer", which is an 
> issue.
> *proposal*
> This patch adds a gauge that keeps track of the number of responses being 
> handled by the {{ServerSocketChannel}}.
> That new metric is named "BeingSentResponses" (who said naming was hard?)
> *implementation*
> To calculate that metric, the patch adds up the number of SelectionKeys 
> interested in writing, across processor threads.
> Another approach could be to keep all in-flight responses in a data structure 
> (let's say a map) shared by the processor threads. A response will be added 
> to that map when dequeued from the response queue, and removed when the write 
> is complete. The gauge will simply report the size of that map. I decided 
> against that second approach as it is more intrusive and requires some 
> additional bookkeeping to gather information already available through the 
> {{SelectionKey}}'s



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses

2014-08-20 Thread Alexis Midon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis Midon updated KAFKA-1597:


Attachment: (was: KAFKA-1594_ResponsesBeingSent.patch)

> New metrics: ResponseQueueSize and BeingSentResponses
> -
>
> Key: KAFKA-1597
> URL: https://issues.apache.org/jira/browse/KAFKA-1597
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Alexis Midon
>Priority: Minor
>  Labels: features
> Attachments: ResponseQueueSize.patch, ResponsesBeingSent.patch
>
>
> This patch adds two metrics:
> h3. ResponseQueueSize
> As of 0.8.1, the sizes of the response queues are [reported as different 
> metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
>  - one per processor thread. This is not ideal, for several reasons:
> * charts have to sum the different metrics
> * the metrics collection system might not support 'wild card queries' like 
> {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
> case monitoring now depends on the number of configured network threads
> * monitoring responses by thread is not very valuable; however, the global 
> number of responses is useful.
> *proposal*
> So this patch exposes the total number of queued responses as a metric 
> {{ResponseQueueSize}}
> *implementation*
> In {{RequestChannel}}, create a Gauge that adds up the size of the response 
> queues.
> h3. BeingSentResponses
> As of 0.8.1, the processor threads will poll responses from the queues and 
> attach them to the SelectionKey as fast as possible. The consequence of that 
> is that the response queues are not a good indicator of the number of 
> "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of 
> response to be sent.
> The current metrics don't reflect the size of this "buffer", which is an 
> issue.
> *proposal*
> This patch adds a gauge that keeps track of the number of responses being 
> handled by the {{ServerSocketChannel}}.
> That new metric is named "BeingSentResponses" (who said naming was hard?)
> *implementation*
> To calculate that metric, the patch adds up the number of SelectionKeys 
> interested in writing, across processor threads.
> Another approach could be to keep all in-flight responses in a data structure 
> (let's say a map) shared by the processor threads. A response will be added 
> to that map when dequeued from the response queue, and removed when the write 
> is complete. The gauge will simply report the size of that map. I decided 
> against that second approach as it is more intrusive and requires some 
> additional bookkeeping to gather information already available through the 
> {{SelectionKey}}'s



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses

2014-08-20 Thread Alexis Midon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis Midon updated KAFKA-1597:


Attachment: ResponsesBeingSent.patch
ResponseQueueSize.patch

> New metrics: ResponseQueueSize and BeingSentResponses
> -
>
> Key: KAFKA-1597
> URL: https://issues.apache.org/jira/browse/KAFKA-1597
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Alexis Midon
>Priority: Minor
>  Labels: features
> Attachments: ResponseQueueSize.patch, ResponsesBeingSent.patch
>
>
> This patch adds two metrics:
> h3. ResponseQueueSize
> As of 0.8.1, the sizes of the response queues are [reported as different 
> metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
>  - one per processor thread. This is not ideal, for several reasons:
> * charts have to sum the different metrics
> * the metrics collection system might not support 'wild card queries' like 
> {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
> case monitoring now depends on the number of configured network threads
> * monitoring responses by thread is not very valuable; however, the global 
> number of responses is useful.
> *proposal*
> So this patch exposes the total number of queued responses as a metric 
> {{ResponseQueueSize}}
> *implementation*
> In {{RequestChannel}}, create a Gauge that adds up the size of the response 
> queues.
> h3. BeingSentResponses
> As of 0.8.1, the processor threads will poll responses from the queues and 
> attach them to the SelectionKey as fast as possible. The consequence of that 
> is that the response queues are not a good indicator of the number of 
> "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of 
> response to be sent.
> The current metrics don't reflect the size of this "buffer", which is an 
> issue.
> *proposal*
> This patch adds a gauge that keeps track of the number of responses being 
> handled by the {{ServerSocketChannel}}.
> That new metric is named "BeingSentResponses" (who said naming was hard?)
> *implementation*
> To calculate that metric, the patch adds up the number of SelectionKeys 
> interested in writing, across processor threads.
> Another approach could be to keep all in-flight responses in a data structure 
> (let's say a map) shared by the processor threads. A response will be added 
> to that map when dequeued from the response queue, and removed when the write 
> is complete. The gauge will simply report the size of that map. I decided 
> against that second approach as it is more intrusive and requires some 
> additional bookkeeping to gather information already available through the 
> {{SelectionKey}}'s



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses

2014-08-20 Thread Alexis Midon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis Midon updated KAFKA-1597:


Attachment: (was: KAFKA-1594_BeingSentResponses.patch)

> New metrics: ResponseQueueSize and BeingSentResponses
> -
>
> Key: KAFKA-1597
> URL: https://issues.apache.org/jira/browse/KAFKA-1597
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Alexis Midon
>Priority: Minor
>  Labels: features
> Attachments: KAFKA-1594_ResponseQueueSize.patch, 
> KAFKA-1594_ResponsesBeingSent.patch
>
>
> This patch adds two metrics:
> h3. ResponseQueueSize
> As of 0.8.1, the sizes of the response queues are [reported as different 
> metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
>  - one per processor thread. This is not ideal, for several reasons:
> * charts have to sum the different metrics
> * the metrics collection system might not support 'wild card queries' like 
> {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
> case monitoring now depends on the number of configured network threads
> * monitoring responses by thread is not very valuable; however, the global 
> number of responses is useful.
> *proposal*
> So this patch exposes the total number of queued responses as a metric 
> {{ResponseQueueSize}}
> *implementation*
> In {{RequestChannel}}, create a Gauge that adds up the size of the response 
> queues.
> h3. BeingSentResponses
> As of 0.8.1, the processor threads will poll responses from the queues and 
> attach them to the SelectionKey as fast as possible. The consequence of that 
> is that the response queues are not a good indicator of the number of 
> "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of 
> response to be sent.
> The current metrics don't reflect the size of this "buffer", which is an 
> issue.
> *proposal*
> This patch adds a gauge that keeps track of the number of responses being 
> handled by the {{ServerSocketChannel}}.
> That new metric is named "BeingSentResponses" (who said naming was hard?)
> *implementation*
> To calculate that metric, the patch adds up the number of SelectionKeys 
> interested in writing, across processor threads.
> Another approach could be to keep all in-flight responses in a data structure 
> (let's say a map) shared by the processor threads. A response will be added 
> to that map when dequeued from the response queue, and removed when the write 
> is complete. The gauge will simply report the size of that map. I decided 
> against that second approach as it is more intrusive and requires some 
> additional bookkeeping to gather information already available through the 
> {{SelectionKey}}'s



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses

2014-08-20 Thread Alexis Midon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis Midon updated KAFKA-1597:


Attachment: KAFKA-1594_ResponsesBeingSent.patch

Rename {{BeingSentResponses}} to {{ResponsesBeingSent}}.

> New metrics: ResponseQueueSize and BeingSentResponses
> -
>
> Key: KAFKA-1597
> URL: https://issues.apache.org/jira/browse/KAFKA-1597
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Alexis Midon
>Priority: Minor
>  Labels: features
> Attachments: KAFKA-1594_ResponseQueueSize.patch, 
> KAFKA-1594_ResponsesBeingSent.patch
>
>
> This patch adds two metrics:
> h3. ResponseQueueSize
> As of 0.8.1, the sizes of the response queues are [reported as different 
> metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
>  - one per processor thread. This is not ideal, for several reasons:
> * charts have to sum the different metrics
> * the metrics collection system might not support 'wild card queries' like 
> {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
> case monitoring now depends on the number of configured network threads
> * monitoring responses by thread is not very valuable; however, the global 
> number of responses is useful.
> *proposal*
> So this patch exposes the total number of queued responses as a metric 
> {{ResponseQueueSize}}
> *implementation*
> In {{RequestChannel}}, create a Gauge that adds up the size of the response 
> queues.
> h3. BeingSentResponses
> As of 0.8.1, the processor threads will poll responses from the queues and 
> attach them to the SelectionKey as fast as possible. The consequence of that 
> is that the response queues are not a good indicator of the number of 
> "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of 
> response to be sent.
> The current metrics don't reflect the size of this "buffer", which is an 
> issue.
> *proposal*
> This patch adds a gauge that keeps track of the number of responses being 
> handled by the {{ServerSocketChannel}}.
> That new metric is named "BeingSentResponses" (who said naming was hard?)
> *implementation*
> To calculate that metric, the patch adds up the number of SelectionKeys 
> interested in writing, across processor threads.
> Another approach could be to keep all in-flight responses in a data structure 
> (let's say a map) shared by the processor threads. A response will be added 
> to that map when dequeued from the response queue, and removed when the write 
> is complete. The gauge will simply report the size of that map. I decided 
> against that second approach as it is more intrusive and requires some 
> additional bookkeeping to gather information already available through the 
> {{SelectionKey}}'s



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses

2014-08-19 Thread Alexis Midon (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102480#comment-14102480
 ] 

Alexis Midon commented on KAFKA-1597:
-


# NetworkProcessorAvgIdlePercent and RequestHandlerAvgIdlePercent don't give 
information regarding response processing.
# Could you explain what RequestBeingProcessed is? How different is it from the 
number of I/O threads? My understanding of the code is that requests are 
processed by the I/O threads.
https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaServer.scala#L90
https://dchtm6r471mui.cloudfront.net/hackpad.com_SIX7UAFo7hB_p.186494_1405370457117_KafkaThreadingModel.png
# Regarding InflightRequests: a request flows through a sequence of 
transformations/states. To me, measuring every component of that pipeline is 
the most basic and reliable way of understanding the behavior of the system. It 
also gives users the option to use the data in any way they might want. So I'd 
say that exposing these root metrics would be great. Considering that only 
BeingSentResponses is missing, the implementation cost is pretty low.





> New metrics: ResponseQueueSize and BeingSentResponses
> -
>
> Key: KAFKA-1597
> URL: https://issues.apache.org/jira/browse/KAFKA-1597
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Alexis Midon
>Priority: Minor
>  Labels: features
> Attachments: KAFKA-1594_BeingSentResponses.patch, 
> KAFKA-1594_ResponseQueueSize.patch
>
>
> This patch adds two metrics:
> h3. ResponseQueueSize
> As of 0.8.1, the sizes of the response queues are [reported as different 
> metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
>  - one per processor thread. This is not ideal, for several reasons:
> * charts have to sum the different metrics
> * the metrics collection system might not support 'wild card queries' like 
> {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
> case monitoring now depends on the number of configured network threads
> * monitoring responses by thread is not very valuable; however, the global 
> number of responses is useful.
> *proposal*
> So this patch exposes the total number of queued responses as a metric 
> {{ResponseQueueSize}}
> *implementation*
> In {{RequestChannel}}, create a Gauge that adds up the size of the response 
> queues.
> h3. BeingSentResponses
> As of 0.8.1, the processor threads will poll responses from the queues and 
> attach them to the SelectionKey as fast as possible. The consequence of that 
> is that the response queues are not a good indicator of the number of 
> "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of 
> response to be sent.
> The current metrics don't reflect the size of this "buffer", which is an 
> issue.
> *proposal*
> This patch adds a gauge that keeps track of the number of responses being 
> handled by the {{ServerSocketChannel}}.
> That new metric is named "BeingSentResponses" (who said naming was hard?)
> *implementation*
> To calculate that metric, the patch adds up the number of SelectionKeys 
> interested in writing, across processor threads.
> Another approach could be to keep all in-flight responses in a data structure 
> (let's say a map) shared by the processor threads. A response will be added 
> to that map when dequeued from the response queue, and removed when the write 
> is complete. The gauge will simply report the size of that map. I decided 
> against that second approach as it is more intrusive and requires some 
> additional bookkeeping to gather information already available through the 
> {{SelectionKey}}'s



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses

2014-08-18 Thread Alexis Midon (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101515#comment-14101515
 ] 

Alexis Midon edited comment on KAFKA-1597 at 8/18/14 11:22 PM:
---

* regarding {{ResponseQueueSize}}, I think I'm missing some context. Why is an 
analysis by response thread useful? Your experience might help me understand. 
It seems to me that the total number of queued responses is the only metric 
that helps understand the load on the Socket server (assuming it has some kind 
of back-pressure mechanism). 

* with {{BeingSentResponses}}, my goal was to avoid having a blind spot with 
the Socket server. Every element on the "request handling chain" is monitored 
except the last, the socket server. I considered tracking {{InflightResponses}} 
which would be {{ResponseQueueSize + BeingSentResponses}} but I decided against 
since having 2 metrics helps in monitoring 2 different components of the 
"response chain". 
I think that {{InflightRequests}} will have the same issue: it would be a 
superset of already available metrics {{InflightRequest = RequestQueueSize + 
number of I/O threads + ResponseQueueSize + BeingSentResponses}}. And I think 
the breakdown would be more useful.


was (Author: alexismidon):
* regarding {{ResponseQueueSize}}, I think I'm missing some context. Why is an 
analysis by response thread useful? Your experience might help me understand. 
It seems to me that the total number of queued responses is the only metric 
that helps understand the load on the Socket server (assuming it has some kind 
of back-pressure mechanism). 

* with {{BeingSentResponses}}, my goal was to avoid having a blind spot with 
the Socket server. Every element on the "request handling chain" is monitored 
except the last, the socket server. I considered tracking {{InflightResponses}} 
which would be {{ResponseQueueSize + BeingSentResponses }} but I decided 
against since having 2 metrics helps in monitoring 2 different component of the 
"response chain". 
I think that  {{InflightRequests}} will have the same issue: it would be a 
superset of already available metrics {{InflightRequest = RequestQueueSize + 
number of I/O threads+ ResponseQueueSize + BeingSentResponses}}. And I think 
the break down would be more useful.

> New metrics: ResponseQueueSize and BeingSentResponses
> -
>
> Key: KAFKA-1597
> URL: https://issues.apache.org/jira/browse/KAFKA-1597
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Alexis Midon
>Priority: Minor
>  Labels: features
> Attachments: KAFKA-1594_BeingSentResponses.patch, 
> KAFKA-1594_ResponseQueueSize.patch
>
>
> This patch adds two metrics:
> h3. ResponseQueueSize
> As of 0.8.1, the sizes of the response queues are [reported as different 
> metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
>  - one per processor thread. This is not ideal, for several reasons:
> * charts have to sum the different metrics
> * the metrics collection system might not support 'wild card queries' like 
> {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
> case monitoring now depends on the number of configured network threads
> * monitoring responses by thread is not very valuable; however, the global 
> number of responses is useful.
> *proposal*
> So this patch exposes the total number of queued responses as a metric 
> {{ResponseQueueSize}}
> *implementation*
> In {{RequestChannel}}, create a Gauge that adds up the size of the response 
> queues.
> h3. BeingSentResponses
> As of 0.8.1, the processor threads will poll responses from the queues and 
> attach them to the SelectionKey as fast as possible. The consequence of that 
> is that the response queues are not a good indicator of the number of 
> "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of 
> response to be sent.
> The current metrics don't reflect the size of this "buffer", which is an 
> issue.
> *proposal*
> This patch adds a gauge that keeps track of the number of responses being 
> handled by the {{ServerSocketChannel}}.
> That new metric is named "BeingSentResponses" (who said naming was hard?)
> *implementation*
> To calculate that metric, the patch adds up the number of SelectionKeys 
> interested in writing, across processor threads.
> Another approach could be to keep all in-flight responses in a data structure 
> (let's say a map) shared by the processor threads. A response will be added 
> to that map when dequeued from the response queue, and removed when the write 
> is complete. The gauge will simply report the size of that map. I decided 
> against that second approach as it is more int

[jira] [Comment Edited] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses

2014-08-18 Thread Alexis Midon (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101515#comment-14101515
 ] 

Alexis Midon edited comment on KAFKA-1597 at 8/18/14 11:22 PM:
---

* regarding {{ResponseQueueSize}}, I think I'm missing some context. Why is an 
analysis by response thread useful? Your experience might help me understand. 
It seems to me that the total number of queued responses is the only metric 
that helps understand the load on the Socket server (assuming it has some kind 
of back-pressure mechanism). 

* with {{BeingSentResponses}}, my goal was to avoid having a blind spot with 
the Socket server. Every element on the "request handling chain" is monitored 
except the last, the socket server. I considered tracking {{InflightResponses}} 
which would be {{ResponseQueueSize + BeingSentResponses }} but I decided 
against since having 2 metrics helps in monitoring 2 different components of the 
"response chain". 
I think that {{InflightRequests}} will have the same issue: it would be a 
superset of already available metrics {{InflightRequest = RequestQueueSize + 
number of I/O threads + ResponseQueueSize + BeingSentResponses}}. And I think 
the breakdown would be more useful.


was (Author: alexismidon):
* regarding {{ResponseQueueSize}}, I think I'm missing some context. Why is an 
analysis by response thread useful? Your experience might help me understand. 
It seems to me that the total number of queued responses is the only metric 
that helps understand the load on the Socket server (assuming it has some kind 
of back-pressure mechanism). 

* with {{BeingSentResponses}}, my goal was to avoid having a blind spot with 
the Socket server. Every element on the "request handling chain" is monitored 
except the last, the socket server. I considered tracking {{InflightResponses}} 
which would be {{ ResponseQueueSize + BeingSentResponses }} but I decided 
against since having 2 metrics helps in monitoring 2 different component of the 
"response chain". 
I think that  {{InflightRequests}} will have the same issue: it would be a 
superset of already available metrics {{InflightRequest = RequestQueueSize + 
number of I/O threads+ ResponseQueueSize + BeingSentResponses}}. And I think 
the break down would be more useful.

> New metrics: ResponseQueueSize and BeingSentResponses
> -
>
> Key: KAFKA-1597
> URL: https://issues.apache.org/jira/browse/KAFKA-1597
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Alexis Midon
>Priority: Minor
>  Labels: features
> Attachments: KAFKA-1594_BeingSentResponses.patch, 
> KAFKA-1594_ResponseQueueSize.patch
>
>
> This patch adds two metrics:
> h3. ResponseQueueSize
> As of 0.8.1, the sizes of the response queues are [reported as different 
> metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
>  - one per processor thread. This is not ideal, for several reasons:
> * charts have to sum the different metrics
> * the metrics collection system might not support 'wild card queries' like 
> {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
> case monitoring now depends on the number of configured network threads
> * monitoring responses by thread is not very valuable; however, the global 
> number of responses is useful.
> *proposal*
> So this patch exposes the total number of queued responses as a metric 
> {{ResponseQueueSize}}
> *implementation*
> In {{RequestChannel}}, create a Gauge that adds up the size of the response 
> queues.
> h3. BeingSentResponses
> As of 0.8.1, the processor threads will poll responses from the queues and 
> attach them to the SelectionKey as fast as possible. The consequence of that 
> is that the response queues are not a good indicator of the number of 
> "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of 
> response to be sent.
> The current metrics don't reflect the size of this "buffer", which is an 
> issue.
> *proposal*
> This patch adds a gauge that keeps track of the number of responses being 
> handled by the {{ServerSocketChannel}}.
> That new metric is named "BeingSentResponses" (who said naming was hard?)
> *implementation*
> To calculate that metric, the patch adds up the number of SelectionKeys 
> interested in writing, across processor threads.
> Another approach could be to keep all in-flight responses in a data structure 
> (let's say a map) shared by the processor threads. A response will be added 
> to that map when dequeued from the response queue, and removed when the write 
> is complete. The gauge will simply report the size of that map. I decided 
> against that second approach as it is more i

[jira] [Commented] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses

2014-08-18 Thread Alexis Midon (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101515#comment-14101515
 ] 

Alexis Midon commented on KAFKA-1597:
-

* regarding {{ResponseQueueSize}}, I think I'm missing some context. Why is an 
analysis by response thread useful? Your experience might help me understand. 
It seems to me that the total number of queued responses is the only metric 
that helps understand the load on the Socket server (assuming it has some kind 
of back-pressure mechanism). 

* with {{BeingSentResponses}}, my goal was to avoid having a blind spot with 
the Socket server. Every element on the "request handling chain" is monitored 
except the last, the socket server. I considered tracking {{InflightResponses}} 
which would be {{ ResponseQueueSize + BeingSentResponses }} but I decided 
against since having 2 metrics helps in monitoring 2 different components of the 
"response chain". 
I think that {{InflightRequests}} will have the same issue: it would be a 
superset of already available metrics {{InflightRequest = RequestQueueSize + 
number of I/O threads + ResponseQueueSize + BeingSentResponses}}. And I think 
the breakdown would be more useful.

> New metrics: ResponseQueueSize and BeingSentResponses
> -
>
> Key: KAFKA-1597
> URL: https://issues.apache.org/jira/browse/KAFKA-1597
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Alexis Midon
>Priority: Minor
>  Labels: features
> Attachments: KAFKA-1594_BeingSentResponses.patch, 
> KAFKA-1594_ResponseQueueSize.patch
>
>
> This patch adds two metrics:
> h3. ResponseQueueSize
> As of 0.8.1, the sizes of the response queues are [reported as different 
> metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
>  - one per processor thread. This is not ideal, for several reasons:
> * charts have to sum the different metrics
> * the metrics collection system might not support 'wild card queries' like 
> {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
> case monitoring now depends on the number of configured network threads
> * monitoring responses by thread is not very valuable; however, the global 
> number of responses is useful.
> *proposal*
> So this patch exposes the total number of queued responses as a metric 
> {{ResponseQueueSize}}
> *implementation*
> In {{RequestChannel}}, create a Gauge that adds up the size of the response 
> queues.
> h3. BeingSentResponses
> As of 0.8.1, the processor threads will poll responses from the queues and 
> attach them to the SelectionKey as fast as possible. The consequence of that 
> is that the response queues are not a good indicator of the number of 
> "in-flight" responses. The {{ServerSocketChannel}} acting as another queue of 
> response to be sent.
> The current metrics don't reflect the size of this "buffer", which is an 
> issue.
> *proposal*
> This patch adds a gauge that keeps track of the number of responses being 
> handled by the {{ServerSocketChannel}}.
> That new metric is named "BeingSentResponses" (who said naming was hard?)
> *implementation*
> To calculate that metric, the patch adds up the number of SelectionKeys 
> interested in writing, across processor threads.
> Another approach could be to keep all in-flight responses in a data structure 
> (let's say a map) shared by the processor threads. A response will be added 
> to that map when dequeued from the response queue, and removed when the write 
> is complete. The gauge will simply report the size of that map. I decided 
> against that second approach as it is more intrusive and requires some 
> additional bookkeeping to gather information already available through the 
> {{SelectionKey}}'s



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1594) new metrics: ResponseQueueSize and BeingSentResponses

2014-08-14 Thread Alexis Midon (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097722#comment-14097722
 ] 

Alexis Midon commented on KAFKA-1594:
-

you know what, I closed KAFKA-1593 thinking you could re-open KAFKA-1594. The 
irony.. ;)

I created a third ticket, KAFKA-1597.


> new metrics: ResponseQueueSize and BeingSentResponses
> -
>
> Key: KAFKA-1594
> URL: https://issues.apache.org/jira/browse/KAFKA-1594
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Alexis Midon
>Priority: Minor
>  Labels: features
> Attachments: KAFKA-1594_BeingSentResponses.patch, 
> KAFKA-1594_ResponseQueueSize.patch
>
>
> This patch adds two metrics:
> h3. ResponseQueueSize
> As of 0.8.1, the sizes of the response queues are [reported as different 
> metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
>  - one per processor thread. This is not ideal, for several reasons:
> * charts have to sum the different metrics
> * the metrics collection system might not support 'wild card queries' like 
> {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
> case monitoring now depends on the number of configured network threads
> * monitoring the responses per thread is not very valuable; however, the global 
> number of responses is useful.
> *proposal*
> So this patch exposes the total number of queued responses as a metric 
> {{ResponseQueueSize}}
> *implementation*
> In {{RequestChannel}}, create a Gauge that adds up the size of the response 
> queues.
> h3. BeingSentResponses
> As of 0.8.1, the processor threads will poll responses from the queues and 
> attach them to the SelectionKey as fast as possible. The consequence of that 
> is that the response queues are not a good indicator of the number of 
> "in-flight" responses. The {{ServerSocketChannel}} acts as another queue of 
> responses to be sent.
> The current metrics don't reflect the size of this "buffer", which is an 
> issue.
> *proposal*
> This patch adds a gauge that keeps track of the number of responses being 
> handled by the {{ServerSocketChannel}}.
> That new metric is named "BeingSentResponses" (who said naming was hard?)
> *implementation*
> To calculate that metric, the patch adds up the number of SelectionKeys 
> interested in writing, across processor threads.
> Another approach could be to keep all in-flight responses in a data structure 
> (let's say a map) shared by the processor threads. A response will be added 
> to that map when dequeued from the response queue, and removed when the write 
> is complete. The gauge will simply report the size of that map. I decided 
> against that second approach as it is more intrusive and requires some 
> additional bookkeeping to gather information already available through the 
> {{SelectionKey}}s.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1594) 2 new metrics

2014-08-14 Thread Alexis Midon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis Midon updated KAFKA-1594:


Summary: 2 new metrics  (was: new metrics: ResponseQueueSize and 
BeingSentResponses)

> 2 new metrics
> -
>
> Key: KAFKA-1594
> URL: https://issues.apache.org/jira/browse/KAFKA-1594
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Alexis Midon
>Priority: Minor
>  Labels: features
> Attachments: KAFKA-1594_BeingSentResponses.patch, 
> KAFKA-1594_ResponseQueueSize.patch
>
>
> This patch adds two metrics:
> h3. ResponseQueueSize
> As of 0.8.1, the sizes of the response queues are [reported as different 
> metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
>  - one per processor thread. This is not ideal for several reasons:
> * charts have to sum the different metrics
> * the metrics collection system might not support 'wild card queries' like 
> {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
> case monitoring now depends on the number of configured network threads
> * monitoring the responses per thread is not very valuable; however, the global 
> number of responses is useful.
> *proposal*
> So this patch exposes the total number of queued responses as a metric 
> {{ResponseQueueSize}}
> *implementation*
> In {{RequestChannel}}, create a Gauge that adds up the size of the response 
> queues.
> h3. BeingSentResponses
> As of 0.8.1, the processor threads will poll responses from the queues and 
> attach them to the SelectionKey as fast as possible. The consequence of that 
> is that the response queues are not a good indicator of the number of 
> "in-flight" responses. The {{ServerSocketChannel}} acts as another queue of 
> responses to be sent.
> The current metrics don't reflect the size of this "buffer", which is an 
> issue.
> *proposal*
> This patch adds a gauge that keeps track of the number of responses being 
> handled by the {{ServerSocketChannel}}.
> That new metric is named "BeingSentResponses" (who said naming was hard?)
> *implementation*
> To calculate that metric, the patch adds up the number of SelectionKeys 
> interested in writing, across processor threads.
> Another approach could be to keep all in-flight responses in a data structure 
> (let's say a map) shared by the processor threads. A response will be added 
> to that map when dequeued from the response queue, and removed when the write 
> is complete. The gauge will simply report the size of that map. I decided 
> against that second approach as it is more intrusive and requires some 
> additional bookkeeping to gather information already available through the 
> {{SelectionKey}}s.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (KAFKA-1597) New metrics: ResponseQueueSize and BeingSentResponses

2014-08-14 Thread Alexis Midon (JIRA)
Alexis Midon created KAFKA-1597:
---

 Summary: New metrics: ResponseQueueSize and BeingSentResponses
 Key: KAFKA-1597
 URL: https://issues.apache.org/jira/browse/KAFKA-1597
 Project: Kafka
  Issue Type: New Feature
  Components: core
Reporter: Alexis Midon
Priority: Minor
 Attachments: KAFKA-1594_BeingSentResponses.patch, 
KAFKA-1594_ResponseQueueSize.patch

This patch adds two metrics:

h3. ResponseQueueSize
As of 0.8.1, the sizes of the response queues are [reported as different 
metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
 - one per processor thread. This is not ideal for several reasons:
* charts have to sum the different metrics
* the metrics collection system might not support 'wild card queries' like 
{{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
case monitoring now depends on the number of configured network threads
* monitoring the responses per thread is not very valuable; however, the global 
number of responses is useful.

*proposal*
So this patch exposes the total number of queued responses as a metric 
{{ResponseQueueSize}}

*implementation*
In {{RequestChannel}}, create a Gauge that adds up the size of the response 
queues.
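
For illustration, a minimal, self-contained sketch of that gauge (hypothetical names, not the attached patch), assuming the Yammer metrics-core 2.x API already bundled with 0.8.1 and a stand-in {{responseQueues}} array:

{code}
import java.util.concurrent.{ArrayBlockingQueue, BlockingQueue}

import com.yammer.metrics.Metrics
import com.yammer.metrics.core.Gauge

// Sketch only: one gauge whose value is the sum of all per-processor response
// queue sizes, instead of one Processor_*_ResponseQueueSize gauge per queue.
object ResponseQueueSizeSketch {
  private val numProcessors = 3
  private val responseQueues: Array[BlockingQueue[AnyRef]] =
    Array.fill(numProcessors)(new ArrayBlockingQueue[AnyRef](500))

  // Registered once; dashboards then read a single aggregated value and no
  // longer need wildcard sums over per-thread gauges.
  Metrics.newGauge(getClass, "ResponseQueueSize", new Gauge[Int] {
    override def value: Int = responseQueues.map(_.size).sum
  })
}
{code}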


h3. BeingSentResponses
As of 0.8.1, the processor threads will poll responses from the queues and 
attach them to the SelectionKey as fast as possible. The consequence of that is 
that the response queues are not a good indicator of the number of "in-flight" 
responses. The {{ServerSocketChannel}} acts as another queue of responses to 
be sent.
The current metrics don't reflect the size of this "buffer", which is an issue.

*proposal*
This patch adds a gauge that keeps track of the number of responses being 
handled by the {{ServerSocketChannel}}.
That new metric is named "BeingSentResponses" (who said naming was hard?)

*implementation*
To calculate that metric, the patch adds up the number of SelectionKeys 
interested in writing, across processor threads.
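
As a rough sketch of that counting logic (hypothetical, not the attached patch; it assumes each processor exposes its {{java.nio.channels.Selector}}), the gauge value could be computed like this:

{code}
import java.nio.channels.{SelectionKey, Selector}

import scala.collection.JavaConverters._

// Sketch only: a response that has been dequeued and attached to its key keeps
// the key interested in OP_WRITE until the write completes, so counting those
// keys across the processors' selectors approximates "responses being sent".
object BeingSentResponsesSketch {
  def beingSentResponses(selectors: Seq[Selector]): Int =
    selectors.map { selector =>
      selector.keys.asScala.count { key =>
        key.isValid && (key.interestOps & SelectionKey.OP_WRITE) != 0
      }
    }.sum
}
{code}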

Another approach could be to keep all in-flight responses in a data structure 
(let's say a map) shared by the processor threads. A response will be added to 
that map when dequeued from the response queue, and removed when the write is 
complete. The gauge will simply report the size of that map. I decided against 
that second approach as it is more intrusive and requires some additional 
bookkeeping to gather information already available through the 
{{SelectionKey}}s.





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1594) new metrics: ResponseQueueSize and BeingSentResponses

2014-08-14 Thread Alexis Midon (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097100#comment-14097100
 ] 

Alexis Midon commented on KAFKA-1594:
-

Hi [~nehanarkhede]

could you please re-open this issue, it has more details than KAFKA-1593 and 
patches are also attached.
I closed KAFKA-1593.

Sorry for the confusion,

Alexis

> new metrics: ResponseQueueSize and BeingSentResponses
> -
>
> Key: KAFKA-1594
> URL: https://issues.apache.org/jira/browse/KAFKA-1594
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Alexis Midon
>Priority: Minor
>  Labels: features
> Attachments: KAFKA-1594_BeingSentResponses.patch, 
> KAFKA-1594_ResponseQueueSize.patch
>
>
> This patch adds two metrics:
> h3. ResponseQueueSize
> As of 0.8.1, the sizes of the response queues are [reported as different 
> metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
>  - one per processor thread. This is not ideal for several reasons:
> * charts have to sum the different metrics
> * the metrics collection system might not support 'wild card queries' like 
> {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
> case monitoring now depends on the number of configured network threads
> * monitoring the responses per thread is not very valuable; however, the global 
> number of responses is useful.
> *proposal*
> So this patch exposes the total number of queued responses as a metric 
> {{ResponseQueueSize}}
> *implementation*
> In {{RequestChannel}}, create a Gauge that adds up the size of the response 
> queues.
> h3. BeingSentResponses
> As of 0.8.1, the processor threads will poll responses from the queues and 
> attach them to the SelectionKey as fast as possible. The consequence of that 
> is that the response queues are not a good indicator of the number of 
> "in-flight" responses. The {{ServerSocketChannel}} acts as another queue of 
> responses to be sent.
> The current metrics don't reflect the size of this "buffer", which is an 
> issue.
> *proposal*
> This patch adds a gauge that keeps track of the number of responses being 
> handled by the {{ServerSocketChannel}}.
> That new metric is named "BeingSentResponses" (who said naming was hard?)
> *implementation*
> To calculate that metric, the patch adds up the number of SelectionKeys 
> interested in writing, across processor threads.
> Another approach could be to keep all in-flight responses in a data structure 
> (let's say a map) shared by the processor threads. A response will be added 
> to that map when dequeued from the response queue, and removed when the write 
> is complete. The gauge will simply report the size of that map. I decided 
> against that second approach as it is more intrusive and requires some 
> additional bookkeeping to gather information already available through the 
> {{SelectionKey}}s.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (KAFKA-1593) new metrics: ResponseQueueSize and BeingSentResponses

2014-08-14 Thread Alexis Midon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis Midon resolved KAFKA-1593.
-

Resolution: Duplicate

created by error as a duplicate of KAFKA-1594

> new metrics: ResponseQueueSize and BeingSentResponses
> -
>
> Key: KAFKA-1593
> URL: https://issues.apache.org/jira/browse/KAFKA-1593
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Alexis Midon
>Priority: Minor
>
> This patch adds two metrics:
> h3. ResponseQueueSize
> As of 0.8.1, the sizes of the response queues are [reported as different 
> metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
>  - one per processor thread. This is not ideal for several reasons:
> * charts have to sum the different metrics
> * the metrics collection system might not support 'wild card queries' like 
> {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
> case monitoring now depends on the number of configured network threads
> * monitoring the responses per thread is not very valuable; however, the global 
> number of responses is useful.
> *proposal*
> So this patch exposes the total number of queued responses as a metric 
> {{ResponseQueueSize}}
> *implementation*
> In {{RequestChannel}}, create a Gauge that adds up the size of the response 
> queues.
> *BeingSentResponses*
> As of 0.8.1, the processor threads will poll responses from the queues and 
> attach them to the SelectionKey as fast as possible. The consequence of that 
> is that the response queues are not a good indicator of the number of 
> "in-flight" responses. The {{ServerSocketChannel}} acts as another queue of 
> responses to be sent.
> The current metrics don't reflect the size of this "buffer", which is an 
> issue.
> *proposal*
> This patch adds a gauge that keeps track of the number of responses being 
> handled by the {{ServerSocketChannel}}.
> That new metric is named "BeingSentResponses" (who said naming was hard?)
> *implementation*
> To calculate that metric, the patch adds up the number of SelectionKeys 
> interested in writing across processor threads.
> Another approach could be to keep all in-flight responses in a data 
> structure (let's say a map) shared by the processor threads. A response will 
> be added to that map when dequeued from the response queue, and removed when 
> the write is complete. I decided against that second approach as it is more 
> intrusive and requires some additional bookkeeping to gather information 
> already available through the {{SelectionKey}}s.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1594) new metrics: ResponseQueueSize and BeingSentResponses

2014-08-13 Thread Alexis Midon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis Midon updated KAFKA-1594:


Description: 
This patch adds two metrics:

h3. ResponseQueueSize
As of 0.8.1, the sizes of the response queues are [reported as different 
metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
 - one per processor thread. This is not ideal for several reasons:
* charts have to sum the different metrics
* the metrics collection system might not support 'wild card queries' like 
{{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
case monitoring now depends on the number of configured network threads
* monitoring the responses per thread is not very valuable; however, the global 
number of responses is useful.

*proposal*
So this patch exposes the total number of queued responses as a metric 
{{ResponseQueueSize}}

*implementation*
In {{RequestChannel}}, create a Gauge that adds up the size of the response 
queues.


h3. BeingSentResponses
As of 0.8.1, the processor threads will poll responses from the queues and 
attach them to the SelectionKey as fast as possible. The consequence of that is 
that the response queues are not a good indicator of the number of "in-flight" 
responses. The {{ServerSocketChannel}} acts as another queue of responses to 
be sent.
The current metrics don't reflect the size of this "buffer", which is an issue.

*proposal*
This patch adds a gauge that keeps track of the number of responses being 
handled by the {{ServerSocketChannel}}.
That new metric is named "BeingSentResponses" (who said naming was hard?)

*implementation*
To calculate that metric, the patch adds up the number of SelectionKeys 
interested in writing, across processor threads.

Another approach could be to keep all in-flight responses in a data structure 
(let's say a map) shared by the processor threads. A response will be added to 
that map when dequeued from the response queue, and removed when the write is 
complete. The gauge will simply report the size of that map. I decided against 
that second approach as it is more intrusive and requires some additional 
bookkeeping to gather information already available through the 
{{SelectionKey}}s.



  was:
This patch adds two metrics:

h3. ResponseQueueSize
As of 0.8.1, the sizes of the response queues are [reported as different 
metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
 - one per processor thread. This is not ideal for several reasons:
* charts have to sum the different metrics
* the metrics collection system might not support 'wild card queries' like 
{{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
case monitoring now depends on the number of configured network threads
* monitoring the responses per thread is not very valuable; however, the global 
number of responses is useful.

*proposal*
So this patch exposes the total number of queued responses as a metric 
{{ResponseQueueSize}}

*implementation*
In {{RequestChannel}}, create a Gauge that adds up the size of the response 
queues.


h3. BeingSentResponses
As of 0.8.1, the processor threads will poll responses from the queues and 
attach them to the SelectionKey as fast as possible. The consequence of that is 
that the response queues are not a good indicator of the number of "in-flight" 
responses. The {{ServerSocketChannel}} acts as another queue of responses to 
be sent.
The current metrics don't reflect the size of this "buffer", which is an issue.

*proposal*
This patch adds a gauge that keeps track of the number of responses being 
handled by the {{ServerSocketChannel}}.
That new metric is named "BeingSentResponses" (who said naming was hard?)

*implementation*
To calculate that metric, the patch adds up the number of SelectionKeys 
interested in writing across processor threads.

Another approach could be to keep all in-flight responses in a data structure 
(let's say a map) shared by the processor threads. A response will be added to 
that map when dequeued from the response queue, and removed when the write is 
complete. The gauge will simply report the size of that map. I decided against 
that second approach as it is more intrusive and requires some additional 
bookkeeping to gather information already available through the 
{{SelectionKey}}s.




> new metrics: ResponseQueueSize and BeingSentResponses
> -
>
> Key: KAFKA-1594
> URL: https://issues.apache.org/jira/browse/KAFKA-1594
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Alexis Midon
>Priority: Minor
>  Labels: features
> Attachments: KAFKA-1594_BeingSentResponses.pat

[jira] [Updated] (KAFKA-1594) new metrics: ResponseQueueSize and BeingSentResponses

2014-08-13 Thread Alexis Midon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis Midon updated KAFKA-1594:


Attachment: KAFKA-1594_ResponseQueueSize.patch
KAFKA-1594_BeingSentResponses.patch

adding 2 patch files

> new metrics: ResponseQueueSize and BeingSentResponses
> -
>
> Key: KAFKA-1594
> URL: https://issues.apache.org/jira/browse/KAFKA-1594
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Alexis Midon
>Priority: Minor
>  Labels: features
> Attachments: KAFKA-1594_BeingSentResponses.patch, 
> KAFKA-1594_ResponseQueueSize.patch
>
>
> This patch adds two metrics:
> h3. ResponseQueueSize
> As of 0.8.1, the sizes of the response queues are [reported as different 
> metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
>  - one per processor thread. This is not ideal for several reasons:
> * charts have to sum the different metrics
> * the metrics collection system might not support 'wild card queries' like 
> {{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
> case monitoring now depends on the number of configured network threads
> * monitoring the responses per thread is not very valuable; however, the global 
> number of responses is useful.
> *proposal*
> So this patch exposes the total number of queued responses as a metric 
> {{ResponseQueueSize}}
> *implementation*
> In {{RequestChannel}}, create a Gauge that adds up the size of the response 
> queues.
> h3. BeingSentResponses
> As of 0.8.1, the processor threads will poll responses from the queues and 
> attach them to the SelectionKey as fast as possible. The consequence of that 
> is that the response queues are not a good indicator of the number of 
> "in-flight" responses. The {{ServerSocketChannel}} acts as another queue of 
> responses to be sent.
> The current metrics don't reflect the size of this "buffer", which is an 
> issue.
> *proposal*
> This patch adds a gauge that keeps track of the number of responses being 
> handled by the {{ServerSocketChannel}}.
> That new metric is named "BeingSentResponses" (who said naming was hard?)
> *implementation*
> To calculate that metric, the patch adds up the number of SelectionKeys 
> interested in writing across processor threads.
> Another approach could be to keep all in-flight responses in a data 
> structure (let's say a map) shared by the processor threads. A response will 
> be added to that map when dequeued from the response queue, and removed when 
> the write is complete. I decided against that second approach as it is more 
> intrusive and requires some additional bookkeeping to gather information 
> already available through the {{SelectionKey}}s.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1594) new metrics: ResponseQueueSize and BeingSentResponses

2014-08-13 Thread Alexis Midon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis Midon updated KAFKA-1594:


Description: 
This patch adds two metrics:

h3. ResponseQueueSize
As of 0.8.1, the sizes of the response queues are [reported as different 
metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
 - one per processor thread. This is not ideal for several reasons:
* charts have to sum the different metrics
* the metrics collection system might not support 'wild card queries' like 
{{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
case monitoring now depends on the number of configured network threads
* monitoring the responses per thread is not very valuable; however, the global 
number of responses is useful.

*proposal*
So this patch exposes the total number of queued responses as a metric 
{{ResponseQueueSize}}

*implementation*
In {{RequestChannel}}, create a Gauge that adds up the size of the response 
queues.


h3. BeingSentResponses
As of 0.8.1, the processor threads will poll responses from the queues and 
attach them to the SelectionKey as fast as possible. The consequence of that is 
that the response queues are not a good indicator of the number of "in-flight" 
responses. The {{ServerSocketChannel}} acts as another queue of responses to 
be sent.
The current metrics don't reflect the size of this "buffer", which is an issue.

*proposal*
This patch adds a gauge that keeps track of the number of responses being 
handled by the {{ServerSocketChannel}}.
That new metric is named "BeingSentResponses" (who said naming was hard?)

*implementation*
To calculate that metric, the patch adds up the number of SelectionKeys 
interested in writing across processor threads.

Another approach could be to keep all in-flight responses in a data structure 
(let's say a map) shared by the processor threads. A response will be added to 
that map when dequeued from the response queue, and removed when the write is 
complete. The gauge will simply report the size of that map. I decided against 
that second approach as it is more intrusive and requires some additional 
bookkeeping to gather information already available through the 
{{SelectionKey}}s.



  was:
This patch adds two metrics:

h3. ResponseQueueSize
As of 0.8.1, the sizes of the response queues are [reported as different 
metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
 - one per processor thread. This is not ideal for several reasons:
* charts have to sum the different metrics
* the metrics collection system might not support 'wild card queries' like 
{{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
case monitoring now depends on the number of configured network threads
* monitoring the responses per thread is not very valuable; however, the global 
number of responses is useful.

*proposal*
So this patch exposes the total number of queued responses as a metric 
{{ResponseQueueSize}}

*implementation*
In {{RequestChannel}}, create a Gauge that adds up the size of the response 
queues.


h3. BeingSentResponses
As of 0.8.1, the processor threads will poll responses from the queues and 
attach them to the SelectionKey as fast as possible. The consequence of that is 
that the response queues are not a good indicator of the number of "in-flight" 
responses. The {{ServerSocketChannel}} acts as another queue of responses to 
be sent.
The current metrics don't reflect the size of this "buffer", which is an issue.

*proposal*
This patch adds a gauge that keeps track of the number of responses being 
handled by the {{ServerSocketChannel}}.
That new metric is named "BeingSentResponses" (who said naming was hard?)

*implementation*
To calculate that metric, the patch adds up the number of SelectionKeys 
interested in writing across processor threads.

Another approach could be to keep all in-flight responses in a data structure 
(let's say a map) shared by the processor threads. A response will be added to 
that map when dequeued from the response queue, and removed when the write is 
complete. The gauge will simply report the size of that map. I decided against 
that second approach as it is more intrusive and requires some additional 
bookkeeping to gather information already available through the 
{{SelectionKey}}s.




> new metrics: ResponseQueueSize and BeingSentResponses
> -
>
> Key: KAFKA-1594
> URL: https://issues.apache.org/jira/browse/KAFKA-1594
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Alexis Midon
>Priority: Minor
>  Labels: features
> Attachments: KAFKA-1594_BeingSentResponses.patch

[jira] [Updated] (KAFKA-1594) new metrics: ResponseQueueSize and BeingSentResponses

2014-08-13 Thread Alexis Midon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis Midon updated KAFKA-1594:


Description: 
This patch adds two metrics:

h3. ResponseQueueSize
As of 0.8.1, the sizes of the response queues are [reported as different 
metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
 - one per processor thread. This is not ideal for several reasons:
* charts have to sum the different metrics
* the metrics collection system might not support 'wild card queries' like 
{{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
case monitoring now depends on the number of configured network threads
* monitoring the responses per thread is not very valuable; however, the global 
number of responses is useful.

*proposal*
So this patch exposes the total number of queued responses as a metric 
{{ResponseQueueSize}}

*implementation*
In {{RequestChannel}}, create a Gauge that adds up the size of the response 
queues.


h3. BeingSentResponses
As of 0.8.1, the processor threads will poll responses from the queues and 
attach them to the SelectionKey as fast as possible. The consequence of that is 
that the response queues are not a good indicator of the number of "in-flight" 
responses. The {{ServerSocketChannel}} acts as another queue of responses to 
be sent.
The current metrics don't reflect the size of this "buffer", which is an issue.

*proposal*
This patch adds a gauge that keeps track of the number of responses being 
handled by the {{ServerSocketChannel}}.
That new metric is named "BeingSentResponses" (who said naming was hard?)

*implementation*
To calculate that metric, the patch adds up the number of SelectionKeys 
interested in writing across processor threads.

Another approach could be to keep all in-flight responses in a data structure 
(let's say a map) shared by the processor threads. A response will be added to 
that map when dequeued from the response queue, and removed when the write is 
complete. The gauge will simply report the size of that map. I decided against 
that second approach as it is more intrusive and requires some additional 
bookkeeping to gather information already available through the 
{{SelectionKey}}s.



  was:
This patch adds two metrics:

h3. ResponseQueueSize
As of 0.8.1, the sizes of the response queues are [reported as different 
metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
 - one per processor thread. This is not ideal for several reasons:
* charts have to sum the different metrics
* the metrics collection system might not support 'wild card queries' like 
{{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
case monitoring now depends on the number of configured network threads
* monitoring the responses per thread is not very valuable; however, the global 
number of responses is useful.

*proposal*
So this patch exposes the total number of queued responses as a metric 
{{ResponseQueueSize}}

*implementation*
In {{RequestChannel}}, create a Gauge that adds up the size of the response 
queues.


h3. BeingSentResponses
As of 0.8.1, the processor threads will poll responses from the queues and 
attach them to the SelectionKey as fast as possible. The consequence of that is 
that the response queues are not a good indicator of the number of "in-flight" 
responses. The {{ServerSocketChannel}} acts as another queue of responses to 
be sent.
The current metrics don't reflect the size of this "buffer", which is an issue.

*proposal*
This patch adds a gauge that keeps track of the number of responses being 
handled by the {{ServerSocketChannel}}.
That new metric is named "BeingSentResponses" (who said naming was hard?)

*implementation*
To calculate that metric, the patch adds up the number of SelectionKeys 
interested in writing across processor threads.

Another approach could be to keep all in-flight responses in a data 
structure (let's say a map) shared by the processor threads. A response will be 
added to that map when dequeued from the response queue, and removed when the 
write is complete. I decided against that second approach as it is more 
intrusive and requires some additional bookkeeping to gather information 
already available through the {{SelectionKey}}s.




> new metrics: ResponseQueueSize and BeingSentResponses
> -
>
> Key: KAFKA-1594
> URL: https://issues.apache.org/jira/browse/KAFKA-1594
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Alexis Midon
>Priority: Minor
>  Labels: features
> Attachments: KAFKA-1594_BeingSentResponses.patch, 
> KAFKA-1594_ResponseQueueSize.patch
>
>
> This 

[jira] [Updated] (KAFKA-1594) new metrics: ResponseQueueSize and BeingSentResponses

2014-08-13 Thread Alexis Midon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis Midon updated KAFKA-1594:


Description: 
This patch adds two metrics:

h3. ResponseQueueSize
As of 0.8.1, the sizes of the response queues are [reported as different 
metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
 - one per processor thread. This is not ideal for several reasons:
* charts have to sum the different metrics
* the metrics collection system might not support 'wild card queries' like 
{{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
case monitoring now depends on the number of configured network threads
* monitoring the responses per thread is not very valuable; however, the global 
number of responses is useful.

*proposal*
So this patch exposes the total number of queued responses as a metric 
{{ResponseQueueSize}}

*implementation*
In {{RequestChannel}}, create a Gauge that adds up the size of the response 
queues.


h3. BeingSentResponses
As of 0.8.1, the processor threads will poll responses from the queues and 
attach them to the SelectionKey as fast as possible. The consequence of that is 
that the response queues are not a good indicator of the number of "in-flight" 
responses. The {{ServerSocketChannel}} acts as another queue of responses to 
be sent.
The current metrics don't reflect the size of this "buffer", which is an issue.

*proposal*
This patch adds a gauge that keeps track of the number of responses being 
handled by the {{ServerSocketChannel}}.
That new metric is named "BeingSentResponses" (who said naming was hard?)

*implementation*
To calculate that metric, the patch adds up the number of SelectionKeys 
interested in writing across processor threads.

Another approach could be to keep all in-flight responses in a data 
structure (let's say a map) shared by the processor threads. A response will be 
added to that map when dequeued from the response queue, and removed when the 
write is complete. I decided against that second approach as it is more 
intrusive and requires some additional bookkeeping to gather information 
already available through the {{SelectionKey}}s.



  was:

This patch adds two metrics:

h3. ResponseQueueSize
As of 0.8.1, the sizes of the response queues are [reported as different 
metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
 - one per processor thread. This is not ideal for several reasons:
* charts have to sum the different metrics
* the metrics collection system might not support 'wild card queries' like 
{{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
case monitoring now depends on the number of configured network threads
* monitoring the responses per thread is not very valuable; however, the global 
number of responses is useful.

*proposal*
So this patch exposes the total number of queued responses as a metric 
{{ResponseQueueSize}}

*implementation*
In {{RequestChannel}}, create a Gauge that adds up the size of the response 
queues.


*BeingSentResponses*
As of 0.8.1, the processor threads will poll responses from the queues and 
attach them to the SelectionKey as fast as possible. The consequence of that is 
that the response queues are not a good indicator of the number of "in-flight" 
responses. The {{ServerSocketChannel}} acts as another queue of responses to 
be sent.
The current metrics don't reflect the size of this "buffer", which is an issue.

*proposal*
This patch adds a gauge that keeps track of the number of responses being 
handled by the {{ServerSocketChannel}}.
That new metric is named "BeingSentResponses" (who said naming was hard?)

*implementation*
To calculate that metric, the patch adds up the number of SelectionKeys 
interested in writing across processor threads.

Another approach could be to keep all in-flight responses in a data 
structure (let's say a map) shared by the processor threads. A response will be 
added to that map when dequeued from the response queue, and removed when the 
write is complete. I decided against that second approach as it is more 
intrusive and requires some additional bookkeeping to gather information 
already available through the {{SelectionKey}}s.




> new metrics: ResponseQueueSize and BeingSentResponses
> -
>
> Key: KAFKA-1594
> URL: https://issues.apache.org/jira/browse/KAFKA-1594
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Alexis Midon
>Priority: Minor
>  Labels: features
>
> This patch adds two metrics:
> h3. ResponseQueueSize
> As of 0.8.1, the sizes of the response queues are [reported as different 
> metrics|https://github.com

[jira] [Created] (KAFKA-1594) new metrics: ResponseQueueSize and BeingSentResponses

2014-08-13 Thread Alexis Midon (JIRA)
Alexis Midon created KAFKA-1594:
---

 Summary: new metrics: ResponseQueueSize and BeingSentResponses
 Key: KAFKA-1594
 URL: https://issues.apache.org/jira/browse/KAFKA-1594
 Project: Kafka
  Issue Type: New Feature
  Components: core
Reporter: Alexis Midon
Priority: Minor



This patch adds two metrics:

h3. ResponseQueueSize
As of 0.8.1, the sizes of the response queues are [reported as different 
metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
 - one per processor thread. This is not ideal for several reasons:
* charts have to sum the different metrics
* the metrics collection system might not support 'wild card queries' like 
{{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
case monitoring now depends on the number of configured network threads
* monitoring the responses per thread is not very valuable; however, the global 
number of responses is useful.

*proposal*
So this patch exposes the total number of queued responses as a metric 
{{ResponseQueueSize}}

*implementation*
In {{RequestChannel}}, create a Gauge that adds up the size of the response 
queues.


*BeingSentResponses*
As of 0.8.1, the processor threads will poll responses from the queues and 
attach them to the SelectionKey as fast as possible. The consequence of that is 
that the response queues are not a good indicator of the number of "in-flight" 
responses. The {{ServerSocketChannel}} acts as another queue of responses to 
be sent.
The current metrics don't reflect the size of this "buffer", which is an issue.

*proposal*
This patch adds a gauge that keeps track of the number of responses being 
handled by the {{ServerSocketChannel}}.
That new metric is named "BeingSentResponses" (who said naming was hard?)

*implementation*
To calculate that metric, the patch adds up the number of SelectionKeys 
interested in writing across processor threads.

Another approach could be to keep all in-flight responses in a data 
structure (let's say a map) shared by the processor threads. A response will be 
added to that map when dequeued from the response queue, and removed when the 
write is complete. I decided against that second approach as it is more 
intrusive and requires some additional bookkeeping to gather information 
already available through the {{SelectionKey}}s.





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (KAFKA-1593) new metrics: ResponseQueueSize and BeingSentResponses

2014-08-13 Thread Alexis Midon (JIRA)
Alexis Midon created KAFKA-1593:
---

 Summary: new metrics: ResponseQueueSize and BeingSentResponses
 Key: KAFKA-1593
 URL: https://issues.apache.org/jira/browse/KAFKA-1593
 Project: Kafka
  Issue Type: New Feature
  Components: core
Reporter: Alexis Midon
Priority: Minor



This patch adds two metrics:

h3. ResponseQueueSize
As of 0.8.1, the sizes of the response queues are [reported as different 
metrics|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/network/RequestChannel.scala#L127-L134]
 - one per processor thread. This is not ideal for several reasons:
* charts have to sum the different metrics
* the metrics collection system might not support 'wild card queries' like 
{{sum:kafka.network.RequestChannel.Processor_*_ResponseQueueSize}} in which 
case monitoring now depends on the number of configured network threads
* monitoring the responses per thread is not very valuable; however, the global 
number of responses is useful.

*proposal*
So this patch exposes the total number of queued responses as a metric 
{{ResponseQueueSize}}

*implementation*
In {{RequestChannel}}, create a Gauge that adds up the size of the response 
queues.


*BeingSentResponses*
As of 0.8.1, the processor threads will poll responses from the queues and 
attach them to the SelectionKey as fast as possible. The consequence of that is 
that the response queues are not a good indicator of the number of "in-flight" 
responses. The {{ServerSocketChannel}} acts as another queue of responses to 
be sent.
The current metrics don't reflect the size of this "buffer", which is an issue.

*proposal*
This patch adds a gauge that keeps track of the number of responses being 
handled by the {{ServerSocketChannel}}.
That new metric is named "BeingSentResponses" (who said naming was hard?)

*implementation*
To calculate that metric, the patch adds up the number of SelectionKeys 
interested in writing across processor threads.

Another approach could be to keep all in-flight responses in a data 
structure (let's say a map) shared by the processor threads. A response will be 
added to that map when dequeued from the response queue, and removed when the 
write is complete. I decided against that second approach as it is more 
intrusive and requires some additional bookkeeping to gather information 
already available through the {{SelectionKey}}s.





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1112) broker can not start itself after kafka is killed with -9

2014-07-21 Thread Alexis Midon (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1406#comment-1406
 ] 

Alexis Midon commented on KAFKA-1112:
-

I created https://issues.apache.org/jira/browse/KAFKA-1554.
thanks

> broker can not start itself after kafka is killed with -9
> -
>
> Key: KAFKA-1112
> URL: https://issues.apache.org/jira/browse/KAFKA-1112
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.8.0, 0.8.1
>Reporter: Kane Kim
>Assignee: Jay Kreps
>Priority: Critical
> Fix For: 0.8.1
>
> Attachments: KAFKA-1112-v1.patch, KAFKA-1112-v2.patch, 
> KAFKA-1112-v3.patch, KAFKA-1112-v4.patch, KAFKA-1112.out
>
>
> When I kill kafka with -9, broker cannot start itself because of corrupted 
> index logs. I think kafka should try to delete/rebuild indexes itself without 
> manual intervention. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (KAFKA-1554) Corrupt index found on clean startup

2014-07-21 Thread Alexis Midon (JIRA)
Alexis Midon created KAFKA-1554:
---

 Summary: Corrupt index found on clean startup
 Key: KAFKA-1554
 URL: https://issues.apache.org/jira/browse/KAFKA-1554
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.1
 Environment: ubuntu 12.04, oracle jdk 1.7
Reporter: Alexis Midon
Priority: Critical


On a clean startup, corrupted index files are found.
After investigation, it appears that some pre-allocated index files are not 
"compacted" correctly and the end of the file is full of zeroes.
As a result, on start up, the last relative offset is zero which yields an 
offset equal to the base offset.

The workaround is to delete all index files of size 10MB (the size of the 
pre-allocated files), and restart. Index files will be re-created.

{code}
find $your_data_directory -size 10485760c -name '*.index' #-delete
{code}

This issue might be related/similar to 
https://issues.apache.org/jira/browse/KAFKA-1112

{code}
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,696 INFO 
main kafka.server.KafkaServer.info - [Kafka Server 847605514], starting
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,698 INFO 
main kafka.server.KafkaServer.info - [Kafka Server 847605514], Connecting to 
zookeeper on 
zk-main0.XXX:2181,zk-main1.XXX:2181,zk-main2.:2181/production/kafka/main
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,708 INFO 
ZkClient-EventThread-14-zk-main0.XXX.com:2181,zk-main1.XXX.com:2181,zk-main2.XXX.com:2181,zk-main3.XXX.com:2181,zk-main4.XXX.com:2181/production/kafka/main
 org.I0Itec.zkclient.ZkEventThread.run - Starting ZkClient event thread.
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,714 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:zookeeper.version=3.3.3-1203054, built on 11/17/2011 05:47 GMT
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,714 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:host.name=i-6b948138.inst.aws.airbnb.com
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,714 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:java.version=1.7.0_55
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,715 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:java.vendor=Oracle Corporation
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,715 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:java.home=/usr/lib/jvm/jre-7-oracle-x64/jre
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,715 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:java.class.path=libs/snappy-java-1.0.5.jar:libs/scala-library-2.10.1.jar:libs/slf4j-api-1.7.2.jar:libs/jopt-simple-3.2.jar:libs/metrics-annotation-2.2.0.jar:libs/log4j-1.2.15.jar:libs/kafka_2.10-0.8.1.jar:libs/zkclient-0.3.jar:libs/zookeeper-3.3.4.jar:libs/metrics-core-2.2.0.jar
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,715 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,716 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:java.io.tmpdir=/tmp
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,716 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:java.compiler=
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,716 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:os.name=Linux
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,716 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:os.arch=amd64
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,717 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:os.version=3.2.0-61-virtual
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,717 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:user.name=kafka
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,717 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:user.home=/srv/kafka
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,717 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:user.dir=/srv/kafka/kafka_2.10-0.8.1
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,718 INFO 
main org.apache.zookeeper.ZooKeeper. - Initiating client connection, 
connectString=zk-main0.XXX.com:2181,zk-main1.XXX.com:2181,zk-main2.XXX.com:2181,zk-main3.XXX.com:2181,zk-main4.XXX.com:2181/producti

[jira] [Commented] (KAFKA-1112) broker can not start itself after kafka is killed with -9

2014-07-11 Thread Alexis Midon (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059533#comment-14059533
 ] 

Alexis Midon commented on KAFKA-1112:
-

Hello,

I suffered from the same error using Kafka 0.8.1. Should I reopen this issue or 
create a new one?

{code}
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,696 INFO 
main kafka.server.KafkaServer.info - [Kafka Server 847605514], starting
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,698 INFO 
main kafka.server.KafkaServer.info - [Kafka Server 847605514], Connecting to 
zookeeper on 
zk-main0.XXX:2181,zk-main1.XXX:2181,zk-main2.:2181/production/kafka/main
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,708 INFO 
ZkClient-EventThread-14-zk-main0.XXX.com:2181,zk-main1.XXX.com:2181,zk-main2.XXX.com:2181,zk-main3.XXX.com:2181,zk-main4.XXX.com:2181/production/kafka/main
 org.I0Itec.zkclient.ZkEventThread.run - Starting ZkClient event thread.
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,714 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:zookeeper.version=3.3.3-1203054, built on 11/17/2011 05:47 GMT
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,714 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:host.name=i-6b948138.inst.aws.airbnb.com
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,714 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:java.version=1.7.0_55
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,715 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:java.vendor=Oracle Corporation
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,715 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:java.home=/usr/lib/jvm/jre-7-oracle-x64/jre
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,715 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:java.class.path=libs/snappy-java-1.0.5.jar:libs/scala-library-2.10.1.jar:libs/slf4j-api-1.7.2.jar:libs/jopt-simple-3.2.jar:libs/metrics-annotation-2.2.0.jar:libs/log4j-1.2.15.jar:libs/kafka_2.10-0.8.1.jar:libs/zkclient-0.3.jar:libs/zookeeper-3.3.4.jar:libs/metrics-core-2.2.0.jar
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,715 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,716 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:java.io.tmpdir=/tmp
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,716 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:java.compiler=
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,716 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:os.name=Linux
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,716 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:os.arch=amd64
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,717 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:os.version=3.2.0-61-virtual
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,717 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client environment:user.name=kafka
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,717 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:user.home=/srv/kafka
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,717 INFO 
main org.apache.zookeeper.ZooKeeper.logEnv - Client 
environment:user.dir=/srv/kafka/kafka_2.10-0.8.1
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,718 INFO 
main org.apache.zookeeper.ZooKeeper. - Initiating client connection, 
connectString=zk-main0.XXX.com:2181,zk-main1.XXX.com:2181,zk-main2.XXX.com:2181,zk-main3.XXX.com:2181,zk-main4.XXX.com:2181/production/kafka/main
 sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@4758af63
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,733 INFO 
main-SendThread() org.apache.zookeeper.ClientCnxn.startConnect - Opening socket 
connection to server zk-main1.XXX.com/10.12.135.61:2181
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,738 INFO 
main-SendThread(zk-main1.XXX.com:2181) 
org.apache.zookeeper.ClientCnxn.primeConnection - Socket connection established 
to zk-main1.XXX.com/10.12.135.61:2181, initiating session
2014-07-11T00:53:17+00:00 i-6b948138 local3.info 2014-07-11 - 00:53:17,745 INFO 
main-SendThread(zk-main1.XXX.com:2181) 
org.apache.zookeeper.ClientCnxn.readConnectR

[jira] [Comment Edited] (KAFKA-1300) Added WaitForReplaction admin tool.

2014-07-11 Thread Alexis Midon (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059512#comment-14059512
 ] 

Alexis Midon edited comment on KAFKA-1300 at 7/11/14 11:47 PM:
---

Considering that Kafka is designed to handle some replication lag, if you need 
to shut down a broker it does not seem very useful to wait for the replica lag 
to be zero.
(If the broker is X messages behind, and my maintenance requires Y = F(message 
throughput) minutes, I can safely shut down the broker if X + Y*throughput < 
replica.lag.max.messages.)

So maybe that command would be more useful if it could take an argument that 
characterizes X, i.e. how far behind the broker can be before a shutdown.
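
For illustration only (hypothetical numbers, assuming lag grows by roughly ingest rate times maintenance time while the broker is down), such a back-of-envelope check could look like:

{code}
// Sketch of the check described above: a broker currently `lag` messages behind,
// taken down for `minutes` while the topic ingests `msgsPerMinute`, comes back
// roughly lag + minutes * msgsPerMinute messages behind.
object SafeShutdownCheck {
  def safeToShutDown(lag: Long, minutes: Double, msgsPerMinute: Double,
                     replicaLagMaxMessages: Long): Boolean =
    lag + minutes * msgsPerMinute < replicaLagMaxMessages

  def main(args: Array[String]): Unit = {
    // 1000 messages behind, 10 minutes of maintenance at 300 msg/min,
    // replica.lag.max.messages = 4000: 1000 + 3000 is not < 4000, so not safe.
    println(safeToShutDown(1000L, 10.0, 300.0, 4000L)) // prints false
  }
}
{code}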


was (Author: alexismidon):
Considering that Kafka is designed to handle some replication lag, if you need 
to shut down a broker it does not seem very useful to wait for the replica lag 
to be zero.
(If the broker is X messages behind, and my maintenance requires Y = F(message 
throughput) minutes, I can safely shut down the broker if X+Y < 
replica.lag.max.messages.)

So maybe that command would be more useful if it could take an argument that 
characterizes X, i.e. how far behind the broker can be before a shutdown.

> Added WaitForReplaction admin tool.
> ---
>
> Key: KAFKA-1300
> URL: https://issues.apache.org/jira/browse/KAFKA-1300
> Project: Kafka
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 0.8.0
> Environment: Ubuntu 12.04
>Reporter: Brenden Matthews
>  Labels: patch
> Fix For: 0.8.1
>
> Attachments: 0001-Added-WaitForReplaction-admin-tool.patch
>
>
> I have created a tool similar to the broker shutdown tool for doing rolling 
> restarts of Kafka clusters.
> The tool watches the max replica lag of the specified broker, and waits until 
> the lag drops to 0 before exiting.
> To do a rolling restart, here's the process we use:
> for (broker <- brokers) {
>   run shutdown tool for broker
>   terminate broker
>   start new broker
>   run wait for replication tool on new broker
> }
> Here's an example command line use:
> ./kafka-run-class.sh kafka.admin.WaitForReplication --zookeeper 
> zk.host.com:2181 --num.retries 100 --retry.interval.ms 6 --broker 0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (KAFKA-1300) Added WaitForReplaction admin tool.

2014-07-11 Thread Alexis Midon (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059512#comment-14059512
 ] 

Alexis Midon edited comment on KAFKA-1300 at 7/11/14 11:46 PM:
---

Considering that Kafka is designed to handle some replication lag, if you need 
to shut down a broker it does not seem very useful to wait for the replica lag 
to be zero.
(If the broker is X messages behind, and my maintenance requires Y = F(message 
throughput) minutes, I can safely shut down the broker if X+Y < 
replica.lag.max.messages.)

So maybe that command would be more useful if it could take an argument that 
characterizes X, i.e. how far behind the broker can be before a shutdown.


was (Author: alexismidon):
Considering that Kafka is designed to handle some replication lag, if you need 
to shut down a broker it does not seem very useful to wait for the replica lag 
to be zero.
(If the broker is X messages behind, and my maintenance requires F(message 
throughput) minutes, I can safely shut down the broker if X + throughput*Y < 
replica.lag.max.messages.)

So maybe that command would be more useful if it could take an argument that 
characterizes X, i.e. how far behind the broker can be before a shutdown.

> Added WaitForReplaction admin tool.
> ---
>
> Key: KAFKA-1300
> URL: https://issues.apache.org/jira/browse/KAFKA-1300
> Project: Kafka
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 0.8.0
> Environment: Ubuntu 12.04
>Reporter: Brenden Matthews
>  Labels: patch
> Fix For: 0.8.1
>
> Attachments: 0001-Added-WaitForReplaction-admin-tool.patch
>
>
> I have created a tool similar to the broker shutdown tool for doing rolling 
> restarts of Kafka clusters.
> The tool watches the max replica lag of the specified broker, and waits until 
> the lag drops to 0 before exiting.
> To do a rolling restart, here's the process we use:
> for (broker <- brokers) {
>   run shutdown tool for broker
>   terminate broker
>   start new broker
>   run wait for replication tool on new broker
> }
> Here's an example command line use:
> ./kafka-run-class.sh kafka.admin.WaitForReplication --zookeeper 
> zk.host.com:2181 --num.retries 100 --retry.interval.ms 6 --broker 0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1300) Added WaitForReplaction admin tool.

2014-07-11 Thread Alexis Midon (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059512#comment-14059512
 ] 

Alexis Midon commented on KAFKA-1300:
-

Considering that Kafka is designed to handle some replication lag, if you need 
to shut down a broker it does not seem very useful to wait for the replica lag 
to be zero.
(If the broker is X messages behind, and my maintenance requires F(message 
throughput) minutes, I can safely shut down the broker if X + throughput*Y < 
replica.lag.max.messages.)

So maybe that command would be more useful if it could take an argument that 
characterizes X, i.e. how far behind the broker can be before a shutdown.

> Added WaitForReplaction admin tool.
> ---
>
> Key: KAFKA-1300
> URL: https://issues.apache.org/jira/browse/KAFKA-1300
> Project: Kafka
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 0.8.0
> Environment: Ubuntu 12.04
>Reporter: Brenden Matthews
>  Labels: patch
> Fix For: 0.8.1
>
> Attachments: 0001-Added-WaitForReplaction-admin-tool.patch
>
>
> I have created a tool similar to the broker shutdown tool for doing rolling 
> restarts of Kafka clusters.
> The tool watches the max replica lag of the specified broker, and waits until 
> the lag drops to 0 before exiting.
> To do a rolling restart, here's the process we use:
> for (broker <- brokers) {
>   run shutdown tool for broker
>   terminate broker
>   start new broker
>   run wait for replication tool on new broker
> }
> Here's an example command line use:
> ./kafka-run-class.sh kafka.admin.WaitForReplication --zookeeper 
> zk.host.com:2181 --num.retries 100 --retry.interval.ms 6 --broker 0



--
This message was sent by Atlassian JIRA
(v6.2#6252)