[jira] [Updated] (KAFKA-2202) ConsumerPerformance reports a throughput much higher than the actual one

2015-06-14 Thread Manikumar Reddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar Reddy updated KAFKA-2202:
---
Attachment: KAFKA-2202.patch

 ConsumerPerformance reports a throughput much higher than the actual one
 

 Key: KAFKA-2202
 URL: https://issues.apache.org/jira/browse/KAFKA-2202
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 0.8.2.0
Reporter: Micael Capitão
Priority: Minor
 Attachments: KAFKA-2202.patch


 I've been using the kafka.tools.ConsumerPerformance tool for some 
 benchmarking until, in one of my tests, I got a throughput much higher than 
 my network interface supports.
 The test consisted of consuming around ~4900 MB from one topic using one 
 consumer with one thread. The reported throughput was ~1400 MB/s, which 
 surpasses the 10 Gbps of the network. The whole operation took ~8 seconds, 
 which should correspond to a throughput of ~612 MB/s.
 Digging into the ConsumerPerformance code, I've found this at line 73:
 {code:java}
 val elapsedSecs = (endMs - startMs - config.consumerConfig.consumerTimeoutMs) / 1000.0
 {code}
 The {{consumerTimeoutMs}}, defined as 5000 at line 131, is always subtracted, 
 leading to wrong results.
 This bug seems to be related to 
 [https://issues.apache.org/jira/browse/KAFKA-1828]
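 To make the arithmetic concrete: subtracting the 5 s timeout from a ~8.5 s 
 run leaves ~3.5 s, so ~4900 MB / ~3.5 s ≈ 1400 MB/s gets reported, whereas 
 ~4900 MB / ~8 s ≈ 612 MB/s is the actual rate.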



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2202) ConsumerPerformance reports a throughput much higher than the actual one

2015-06-14 Thread Manikumar Reddy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585040#comment-14585040
 ] 

Manikumar Reddy commented on KAFKA-2202:


Created reviewboard https://reviews.apache.org/r/35437/diff/
 against branch origin/trunk

 ConsumerPerformance reports a throughput much higher than the actual one
 

 Key: KAFKA-2202
 URL: https://issues.apache.org/jira/browse/KAFKA-2202
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 0.8.2.0
Reporter: Micael Capitão
Priority: Minor
 Attachments: KAFKA-2202.patch


 I've been using the kafka.tools.ConsumerPerformance tool for some 
 benchmarking until, in one of my tests, I got a throughput much higher than 
 my network interface supports.
 The test consisted of consuming around ~4900 MB from one topic using one 
 consumer with one thread. The reported throughput was ~1400 MB/s, which 
 surpasses the 10 Gbps of the network. The whole operation took ~8 seconds, 
 which should correspond to a throughput of ~612 MB/s.
 Digging into the ConsumerPerformance code, I've found this at line 73:
 {code:java}
 val elapsedSecs = (endMs - startMs - config.consumerConfig.consumerTimeoutMs) / 1000.0
 {code}
 The {{consumerTimeoutMs}}, defined as 5000 at line 131, is always subtracted, 
 leading to wrong results.
 This bug seems to be related to 
 [https://issues.apache.org/jira/browse/KAFKA-1828]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 35437: Patch for KAFKA-2202

2015-06-14 Thread Manikumar Reddy O

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35437/
---

Review request for kafka.


Bugs: KAFKA-2202
https://issues.apache.org/jira/browse/KAFKA-2202


Repository: kafka


Description
---

While computing stats, consumerTimeoutMs is now taken into account only when 
a ConsumerTimeoutException occurs; this should also solve KAFKA-1828.
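
A minimal sketch of the idea (the consumerTimedOut flag is an assumed name; 
the exact change is in the attached diff):

    // Subtract consumerTimeoutMs from the wall-clock time only when the
    // run actually ended with a ConsumerTimeoutException; otherwise the
    // consumer stopped on its own and no timeout padding was included.
    val elapsedSecs =
      if (consumerTimedOut)
        (endMs - startMs - config.consumerConfig.consumerTimeoutMs) / 1000.0
      else
        (endMs - startMs) / 1000.0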


Diffs
-

  core/src/main/scala/kafka/tools/ConsumerPerformance.scala 
903318d15893af08104a97499798c9ad0ba98013 

Diff: https://reviews.apache.org/r/35437/diff/


Testing
---


Thanks,

Manikumar Reddy O



[jira] [Resolved] (KAFKA-1083) Unregister JMX MBean when shutting down ConsumerConnector

2015-06-14 Thread Manikumar Reddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar Reddy resolved KAFKA-1083.

Resolution: Duplicate

 Unregister JMX MBean when shutting down ConsumerConnector
 -

 Key: KAFKA-1083
 URL: https://issues.apache.org/jira/browse/KAFKA-1083
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.8.0
Reporter: Tanguy
Assignee: Neha Narkhede
Priority: Minor

 In our application, we are currently starting a Kafka consumer with the
 following lines of code:
 connector = Consumer.createJavaConsumerConnector(consumerConfig);
 streams = connector.createMessageStreams(map);
 Then, each KafkaStream is processed in a dedicated thread per topic and
 partition, as documented here:
 https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
 When we call:
 connector.shutdown()
 we can see that the MBean is not unregistered at shutdown time and is still 
 alive even though the consumer connector has been shut down.
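 A minimal way to observe this (a sketch; the "kafka.consumer" substring in 
 the filter is an assumption -- match whatever domain your JMX console shows):
 {code:java}
 import java.lang.management.ManagementFactory
 import scala.collection.JavaConverters._

 // ... create the connector and streams as above, consume, then:
 connector.shutdown()

 // List MBeans that survived shutdown; this still prints consumer MBeans
 // even though the connector is down.
 val server = ManagementFactory.getPlatformMBeanServer
 server.queryNames(null, null).asScala
   .filter(_.toString.contains("kafka.consumer"))
   .foreach(println)
 {code}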



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2202) ConsumerPerformance reports a throughput much higher than the actual one

2015-06-14 Thread Manikumar Reddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar Reddy updated KAFKA-2202:
---
Assignee: Manikumar Reddy
  Status: Patch Available  (was: Open)

 ConsumerPerformance reports a throughput much higher than the actual one
 

 Key: KAFKA-2202
 URL: https://issues.apache.org/jira/browse/KAFKA-2202
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 0.8.2.0
Reporter: Micael Capitão
Assignee: Manikumar Reddy
Priority: Minor
 Attachments: KAFKA-2202.patch


 I've been using the kafka.tools.ConsumerPerformance tool for some 
 benchmarking until, in one of my tests, I got a throughput much higher than 
 my network interface supports.
 The test consisted of consuming around ~4900 MB from one topic using one 
 consumer with one thread. The reported throughput was ~1400 MB/s, which 
 surpasses the 10 Gbps of the network. The whole operation took ~8 seconds, 
 which should correspond to a throughput of ~612 MB/s.
 Digging into the ConsumerPerformance code, I've found this at line 73:
 {code:java}
 val elapsedSecs = (endMs - startMs - config.consumerConfig.consumerTimeoutMs) / 1000.0
 {code}
 The {{consumerTimeoutMs}}, defined as 5000 at line 131, is always subtracted, 
 leading to wrong results.
 This bug seems to be related to 
 [https://issues.apache.org/jira/browse/KAFKA-1828]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1828) [ConsumerPerformance] the test result is negative number

2015-06-14 Thread Manikumar Reddy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585041#comment-14585041
 ] 

Manikumar Reddy commented on KAFKA-1828:


Submitted a simple patch for KAFKA-2202, which should solve this issue.

 [ConsumerPerformance] the test result is negative number
 

 Key: KAFKA-1828
 URL: https://issues.apache.org/jira/browse/KAFKA-1828
 Project: Kafka
  Issue Type: Bug
  Components: tools
Reporter: maji2014
Priority: Minor

 The test result looks like:
 2014-12-23 19:15:15:329, 2014-12-23 19:15:15:400, 1048576, 0.0790, -0.1842, 
 1000, -2331.0023
 The result is negative because the running time is less than 
 consumer.timeout.ms, but the user doesn't know the reason, so a check 
 should be added for that.
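 To see why (assuming consumer.timeout.ms = 5000 for illustration): the run 
 above spans only 71 ms, so elapsedSecs = (71 - 5000) / 1000 = -4.929, and 
 every rate divided by that comes out negative.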



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-2233) Log deletion is not removing log metrics

2015-06-14 Thread Manikumar Reddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar Reddy resolved KAFKA-2233.

   Resolution: Duplicate
Fix Version/s: 0.8.3

This was fixed in KAFKA-1866.

 Log deletion is not removing log metrics
 

 Key: KAFKA-2233
 URL: https://issues.apache.org/jira/browse/KAFKA-2233
 Project: Kafka
  Issue Type: Bug
  Components: log
Affects Versions: 0.8.2.1
Reporter: Stevo Slavic
Assignee: Jay Kreps
Priority: Minor
  Labels: newbie
 Fix For: 0.8.3


 Topic deletion does not remove associated metrics. Any configured Kafka 
 metrics reporter that runs after a topic is deleted will, when polling 
 log metrics for the deleted logs, throw something like:
 {noformat}
 java.util.NoSuchElementException
 at java.util.concurrent.ConcurrentSkipListMap$Iter.advance(ConcurrentSkipListMap.java:2299)
 at java.util.concurrent.ConcurrentSkipListMap$ValueIterator.next(ConcurrentSkipListMap.java:2326)
 at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
 at scala.collection.IterableLike$class.head(IterableLike.scala:107)
 at scala.collection.AbstractIterable.head(Iterable.scala:54)
 at kafka.log.Log.logStartOffset(Log.scala:502)
 at kafka.log.Log$$anon$2.value(Log.scala:86)
 at kafka.log.Log$$anon$2.value(Log.scala:85)
 {noformat}
 since on log deletion the {{Log}} segments collection gets cleared, so the 
 logSegments {{Iterable}} has no (next) elements.
 The known workaround is to restart the broker: the metric registry is in 
 memory, not persisted, so on restart it will be recreated with metrics for 
 existing/non-deleted topics only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2267) kafka-reassign-partitions.sh --generate doesn't work when reducing replication factor

2015-06-14 Thread Manikumar Reddy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584951#comment-14584951
 ] 

Manikumar Reddy commented on KAFKA-2267:


AFAIK, the kafka-reassign-partitions.sh --generate option does not support 
decommissioning brokers or reducing the replication factor. We need to 
manually create the JSON file, as your --execute example below demonstrates.

https://kafka.apache.org/documentation.html#basic_ops_decommissioning_brokers

 kafka-reassign-partitions.sh --generate doesn't work when reducing 
 replication factor
 -

 Key: KAFKA-2267
 URL: https://issues.apache.org/jira/browse/KAFKA-2267
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 0.8.2.1
 Environment: Centos 6.5, scala_version 2.10.4
Reporter: Robin Yamaguchi
Priority: Minor

 Running kafka-reassign-partitions.sh --generate fails when attempting to 
 reduce the replication factor of a topic.
 {code}
 $ /usr/local/kafka/bin/kafka-topics.sh --describe --zookeeper 
 server01:2181,server02:2181,server03:2181 --topic OpsTesting
 Topic:OpsTesting  PartitionCount:1  ReplicationFactor:3  Configs:
   Topic: OpsTesting  Partition: 0  Leader: 2  Replicas: 2,0,1  Isr: 2,0,1
 $ more topics-to-move.json
 {"topics": [{"topic": "OpsTesting"}],
  "version": 1
 }
 $ /usr/local/kafka/bin/kafka-reassign-partitions.sh --zookeeper 
 server01:2181,server02:2181,server03:2181 
 --topics-to-move-json-file topics-to-move.json --broker-list 1,2 --generate
 Partitions reassignment failed due to replication factor: 3 larger than 
 available brokers: 2
 kafka.admin.AdminOperationException: replication factor: 3 larger than 
 available brokers: 2
   at kafka.admin.AdminUtils$.assignReplicasToBrokers(AdminUtils.scala:70)
   at kafka.admin.ReassignPartitionsCommand$$anonfun$generateAssignment$1.apply(ReassignPartitionsCommand.scala:97)
   at kafka.admin.ReassignPartitionsCommand$$anonfun$generateAssignment$1.apply(ReassignPartitionsCommand.scala:96)
   at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
   at kafka.admin.ReassignPartitionsCommand$.generateAssignment(ReassignPartitionsCommand.scala:96)
   at kafka.admin.ReassignPartitionsCommand$.main(ReassignPartitionsCommand.scala:45)
   at kafka.admin.ReassignPartitionsCommand.main(ReassignPartitionsCommand.scala)
 {code}
 Yet I can manually create a json file, and kafka-reassign-partitions.sh 
 --execute will work:
 {code}
 $ more part.json
 {"version":1,"partitions":[{"topic":"OpsTesting","partition":0,"replicas":[0,1]}]}
 $ /usr/local/kafka/bin/kafka-reassign-partitions.sh --zookeeper 
 server01:2181,server02:2181,server03:2181 --reassignment-json-file part.json 
 --execute
 Current partition replica assignment
 {"version":1,"partitions":[{"topic":"OpsTesting","partition":0,"replicas":[2,0,1]}]}
 Save this to use as the --reassignment-json-file option during rollback
 Successfully started reassignment of partitions 
 {"version":1,"partitions":[{"topic":"OpsTesting","partition":0,"replicas":[0,1]}]}
 {code}
 kafka-reassign-partitions.sh --verify also works as expected after the 
 --execute:
 {code}
 $ /usr/local/kafka/bin/kafka-topics.sh --describe --zookeeper 
 server01:2181,server02:2181,server03:2181 --topic OpsTesting
 Topic:OpsTesting  PartitionCount:1  ReplicationFactor:2  Configs:
   Topic: OpsTesting  Partition: 0  Leader: 0  Replicas: 0,1  Isr: 0,1
 $ /usr/local/kafka/bin/kafka-reassign-partitions.sh --zookeeper 
 server01:2181,server02:2181,server03:2181 --reassignment-json-file part.json 
 --verify
 Status of partition reassignment:
 Reassignment of partition [OpsTesting,0] completed successfully
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-23 - Add JSON/CSV output and looping options to ConsumerGroupCommand

2015-06-14 Thread Ashish Singh
Hi Neha,

Answers inline.

On Thu, Jun 11, 2015 at 7:20 PM, Neha Narkhede n...@confluent.io wrote:

 Thanks for submitting the KIP, Ashish! A few questions.

 1. Can you specify more details around how you expect csv output to be
 used. Same for json.

CSV takes less storage space and is more convenient for shell operations. A
simple diff between two csv outputs would tell you if something changed or
not. It's also common in certain industries when dealing with legacy
systems and workflows. Try importing JSON into MS Excel.

JSON, on the other hand, is easy to interpret, has a compact notation, and
supports hierarchical data. If someone is planning to run the tool
periodically and send the output to some server or even just persist it
somewhere, JSON is probably the way to go.
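
For illustration, here is what the two formats might look like for a single
group/topic/partition (hypothetical column names, not taken from the KIP):

    CSV:   group,topic,partition,offset,logEndOffset,lag
           test-group,test-topic,0,100,120,20

    JSON:  {"group": "test-group", "offsets": [{"topic": "test-topic",
            "partition": 0, "offset": 100, "logEndOffset": 120, "lag": 20}]}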

 2. If we add these options, would you still need the old format? If
 csv/json offers more convenience, should we have a plan to phase out the
 old format?

Probably not, but having it around will not hurt. Having three output
formats is not that bad, and I do not expect this list to grow in the future.


 On Thu, Jun 11, 2015 at 6:05 PM, Ashish Singh asi...@cloudera.com wrote:

  Jun,
 
  Can we add this as part of next KIP's agenda?
 
  On Thu, Jun 11, 2015 at 3:00 PM, Gwen Shapira gshap...@cloudera.com
  wrote:
 
   Maybe bring it up at the next KIP call, to make sure everyone is aware?
  
   On Thu, Jun 11, 2015 at 2:17 PM, Ashish Singh asi...@cloudera.com
  wrote:
Hi Guys,
   
This has been lying around for quite some time. Should I start a
 voting
thread on this?
   
On Thu, May 7, 2015 at 12:20 PM, Ashish Singh asi...@cloudera.com
   wrote:
   
Had to change the title of the page and that surprisingly changed the link
as well. KIP-23 is now available here:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=56852556
   
On Thu, May 7, 2015 at 11:34 AM, Ashish Singh asi...@cloudera.com
   wrote:
   
Hi Guys,
   
I just added a KIP, KIP-23 - Add JSON/CSV output and looping options to
ConsumerGroupCommand
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-23), for KAFKA-313
(https://issues.apache.org/jira/browse/KAFKA-313). The changes made as
part of the JIRA can be found here: https://reviews.apache.org/r/28096/.
   
Comments and suggestions are welcome!
   
--
   
Regards,
Ashish
   
   
   
   
--
   
Regards,
Ashish
   
   
   
   
--
   
Regards,
Ashish
  
 
 
 
  --
 
  Regards,
  Ashish
 



 --
 Thanks,
 Neha




-- 

Regards,
Ashish


[jira] [Updated] (KAFKA-1866) LogStartOffset gauge throws exceptions after log.delete()

2015-06-14 Thread Stevo Slavic (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stevo Slavic updated KAFKA-1866:

Affects Version/s: 0.8.2.1

 LogStartOffset gauge throws exceptions after log.delete()
 -

 Key: KAFKA-1866
 URL: https://issues.apache.org/jira/browse/KAFKA-1866
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2.1
Reporter: Gian Merlino
Assignee: Sriharsha Chintalapani
 Fix For: 0.8.3

 Attachments: KAFKA-1866.patch, KAFKA-1866_2015-02-10_22:50:09.patch, 
 KAFKA-1866_2015-02-11_09:25:33.patch


 The LogStartOffset gauge does logSegments.head.baseOffset, which throws 
 NoSuchElementException on an empty list, which can occur after a delete() of 
 the log. This makes life harder for custom MetricsReporters, since they have 
 to deal with .value() possibly throwing an exception.
 Locally we're dealing with this by having Log.delete() also call removeMetric 
 on all the gauges. That also has the benefit of not having a bunch of metrics 
 floating around for logs that the broker is not actually handling.
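 A minimal sketch of that local workaround (gauge names as registered in 
 Log's constructor; the removeMetric(name, tags) helper and the exact 
 delete() body are assumptions):
 {code:java}
 // Unregister the per-log gauges when the log is deleted, so reporters
 // never poll a gauge whose backing segment list is already empty.
 def delete() {
   lock synchronized {
     logSegments.foreach(_.delete())
     segments.clear()
   }
   removeMetric("NumLogSegments", tags)
   removeMetric("LogStartOffset", tags)
   removeMetric("LogEndOffset", tags)
   removeMetric("Size", tags)
 }
 {code}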



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 35437: Patch for KAFKA-2202

2015-06-14 Thread Aditya Auradkar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35437/#review87876
---


This patch only addresses the old consumer. Can you check if the rate 
calculation for the new consumer is accurate? The two calculate end time in 
different ways. The new consumer uses endMs = System.currentTimeMillis but 
ignores the fixed timeout of 1 sec (which shouldn't affect the rate much). 
You can perhaps return lastConsumed from consume() and use that instead of 
endMs.
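
A sketch of that suggestion (names assumed, not the actual patch):

    // Track the wall-clock time of the last successfully consumed batch
    // and report it as the end of the measured interval, so idle waiting
    // at the end of the run is excluded from the rate.
    def consume(poll: () => Boolean): Long = {
      var lastConsumedMs = System.currentTimeMillis
      while (poll())                        // poll() returns false on timeout
        lastConsumedMs = System.currentTimeMillis
      lastConsumedMs
    }

    val pollOnce: () => Boolean = () => false  // dummy poll for illustration
    val startMs = System.currentTimeMillis
    val endMs = consume(pollOnce)              // last-consumed time, not "now"
    val elapsedSecs = (endMs - startMs) / 1000.0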


core/src/main/scala/kafka/tools/ConsumerPerformance.scala
https://reviews.apache.org/r/35437/#comment140290

I think the general convention in Kafka is:

if (consumerTimeout.get())


core/src/main/scala/kafka/tools/ConsumerPerformance.scala
https://reviews.apache.org/r/35437/#comment140291

Can probably do this inline:

case _: ConsumerEx => consumerTimeout.set(true)


- Aditya Auradkar


On June 14, 2015, 11:27 a.m., Manikumar Reddy O wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35437/
 ---
 
 (Updated June 14, 2015, 11:27 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-2202
 https://issues.apache.org/jira/browse/KAFKA-2202
 
 
 Repository: kafka
 
 
 Description
 ---
 
 While computing stats, consumerTimeoutMs is now taken into account only when 
 a ConsumerTimeoutException occurs; this should also solve KAFKA-1828.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/tools/ConsumerPerformance.scala 
 903318d15893af08104a97499798c9ad0ba98013 
 
 Diff: https://reviews.apache.org/r/35437/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Manikumar Reddy O