[jira] [Created] (KAFKA-687) Rebalance algorithm should consider partitions from all topics
Pablo Barrera created KAFKA-687:
-----------------------------------

Summary: Rebalance algorithm should consider partitions from all topics
Key: KAFKA-687
URL: https://issues.apache.org/jira/browse/KAFKA-687
Project: Kafka
Issue Type: Improvement
Reporter: Pablo Barrera

The current rebalance step, as stated in the original Kafka paper [1], splits the partitions of each topic between all the consumers. So if you have 100 topics with 2 partitions each and 10 consumers, only two consumers will be used. That is, for each topic, all partitions are listed and shared between the consumers in the consumer group in order (not randomly).

If the consumer group is reading from several topics at the same time, it makes sense to split all the partitions from all topics between all the consumers. Following the example, we would have 200 partitions in total, 20 per consumer, using all 10 consumers.

The load per topic could be different, and the division should take this into account. However, even a random division should be better than the current algorithm when reading from several topics, and it shouldn't harm reading from a few topics with several partitions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
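The difference between the two strategies can be sketched as follows. This is an illustrative model, not Kafka's actual rebalance code; the function names and round-robin details are assumptions:

```python
# Hypothetical sketch contrasting the current per-topic rebalance with a
# global assignment over all topics, using the numbers from the issue.

def per_topic_assignment(topics, partitions_per_topic, consumers):
    """Current behavior: each topic's partitions are split among consumers
    in order, topic by topic, so the same few consumers get every topic."""
    assignment = {c: [] for c in consumers}
    for topic in topics:
        for p in range(partitions_per_topic):
            # The partition index alone picks the consumer; with only 2
            # partitions per topic, consumers[0] and consumers[1] get all work.
            assignment[consumers[p % len(consumers)]].append((topic, p))
    return assignment

def global_assignment(topics, partitions_per_topic, consumers):
    """Proposed behavior: round-robin over the flat list of all partitions
    from all topics."""
    assignment = {c: [] for c in consumers}
    all_partitions = [(t, p) for t in topics for p in range(partitions_per_topic)]
    for i, tp in enumerate(all_partitions):
        assignment[consumers[i % len(consumers)]].append(tp)
    return assignment

topics = [f"topic-{i}" for i in range(100)]
consumers = [f"consumer-{i}" for i in range(10)]

per_topic = per_topic_assignment(topics, 2, consumers)
global_ = global_assignment(topics, 2, consumers)

busy_per_topic = sum(1 for parts in per_topic.values() if parts)
busy_global = sum(1 for parts in global_.values() if parts)
print(busy_per_topic, busy_global)  # 2 10 -- only 2 consumers busy vs. all 10
```

With 100 topics of 2 partitions and 10 consumers, the per-topic scheme leaves 8 consumers idle, while the global scheme gives each consumer 20 partitions.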
[jira] [Updated] (KAFKA-684) ConsoleProducer does not have the queue-size option
[ https://issues.apache.org/jira/browse/KAFKA-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxime Brugidou updated KAFKA-684:
----------------------------------

Attachment: KAFKA-684-2.patch

I added:
queue.enqueueTimeout.ms
producer.request.required.acks
producer.request.timeout.ms

ConsoleProducer does not have the queue-size option
---------------------------------------------------

Key: KAFKA-684
URL: https://issues.apache.org/jira/browse/KAFKA-684
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.8
Reporter: Maxime Brugidou
Fix For: 0.8
Attachments: KAFKA-684-2.patch, KAFKA-684.patch

When using the Kafka ConsoleProducer (from the script kafka-console-producer.sh), you cannot set queue.size, which gets very annoying when you want to quickly produce a lot of messages. You definitely need to increase queue.size (or decrease the send timeout). Here is a simple patch to add the option.
[jira] [Updated] (KAFKA-684) ConsoleProducer does not have the queue-size option
[ https://issues.apache.org/jira/browse/KAFKA-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxime Brugidou updated KAFKA-684:
----------------------------------

Attachment: KAFKA-684-3.patch

While I'm at it, I added this small feature: exit at end of input stream, so you can do "echo test | ./kafka-console-producer.sh" or "./kafka-console-producer.sh < test" without stopping the producer manually.

ConsoleProducer does not have the queue-size option
---------------------------------------------------

Key: KAFKA-684
URL: https://issues.apache.org/jira/browse/KAFKA-684
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.8
Reporter: Maxime Brugidou
Fix For: 0.8
Attachments: KAFKA-684-2.patch, KAFKA-684-3.patch, KAFKA-684.patch

When using the Kafka ConsoleProducer (from the script kafka-console-producer.sh), you cannot set queue.size, which gets very annoying when you want to quickly produce a lot of messages. You definitely need to increase queue.size (or decrease the send timeout). Here is a simple patch to add the option.
[jira] [Commented] (KAFKA-281) support multiple root log directories
[ https://issues.apache.org/jira/browse/KAFKA-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546858#comment-13546858 ]

Maxime Brugidou commented on KAFKA-281:
---------------------------------------

I guess this was done in 0.8 as part of KAFKA-188.

support multiple root log directories
-------------------------------------

Key: KAFKA-281
URL: https://issues.apache.org/jira/browse/KAFKA-281
Project: Kafka
Issue Type: Improvement
Components: core
Reporter: Jun Rao

Currently, the log layout is {log.dir}/topicname-partitionid and one can only specify one {log.dir}. This limits the # of topics we can have per broker. We can potentially support multiple directories for {log.dir} and just assign topics using hashing or round-robin.
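The hashing idea from the issue can be sketched like this. This is a minimal illustration, not Kafka's implementation; the directory names and the pick_log_dir helper are hypothetical:

```python
# Hypothetical sketch of spreading topic-partition directories across
# multiple root log directories by hashing, as the issue suggests.
import hashlib

log_dirs = ["/data1/kafka-logs", "/data2/kafka-logs", "/data3/kafka-logs"]

def pick_log_dir(topic, partition, dirs):
    """Deterministically hash a topic-partition onto one of the root dirs,
    so the same partition always lands in the same directory."""
    key = f"{topic}-{partition}".encode()
    digest = int(hashlib.md5(key).hexdigest(), 16)
    return dirs[digest % len(dirs)]

d1 = pick_log_dir("orders", 0, log_dirs)
d2 = pick_log_dir("orders", 0, log_dirs)
print(d1 == d2)  # True: placement is stable across restarts
```

Round-robin at creation time would balance counts more evenly but requires remembering the assignment; hashing is stateless at the cost of possible skew.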
[jira] [Created] (KAFKA-688) System Test - Update all testcase_xxxx_properties.json for properties keys uniform naming convention
John Fung created KAFKA-688:
----------------------------

Summary: System Test - Update all testcase_xxxx_properties.json for properties keys uniform naming convention
Key: KAFKA-688
URL: https://issues.apache.org/jira/browse/KAFKA-688
Project: Kafka
Issue Type: Task
Reporter: John Fung
Assignee: John Fung

After the changes made for the uniform naming convention of properties keys (KAFKA-648), all testcase_xxxx_properties.json files need to be updated for the following properties key changes:

brokerid => broker.id
hostname => host.name
log.file.size => log.segment.size
[jira] [Commented] (KAFKA-648) Use uniform convention for naming properties keys
[ https://issues.apache.org/jira/browse/KAFKA-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547063#comment-13547063 ]

John Fung commented on KAFKA-648:
---------------------------------

The testcase_xxxx_properties.json files in System Test will need to be updated for the following changes:

brokerid => broker.id
hostname => host.name
log.file.size => log.segment.size

A separate JIRA, KAFKA-688, has been created for this task.

Use uniform convention for naming properties keys
-------------------------------------------------

Key: KAFKA-648
URL: https://issues.apache.org/jira/browse/KAFKA-648
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8
Reporter: Swapnil Ghike
Assignee: Sriram Subramanian
Priority: Blocker
Fix For: 0.8, 0.8.1
Attachments: configchanges-1.patch, configchanges-v2.patch

Currently, the convention that we seem to use to get a property value in *Config is as follows:

val configVal = property.getType(config.val, ...) // dot is used to separate two words in the key and the first letter of the second word is capitalized in configVal

We should use a similar convention for groupId, consumerId, clientId, correlationId. This change will probably be backward incompatible.
[jira] [Updated] (KAFKA-688) System Test - Update all testcase_xxxx_properties.json for properties keys uniform naming convention
[ https://issues.apache.org/jira/browse/KAFKA-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Fung updated KAFKA-688:
----------------------------

Description:
After the changes made for the uniform naming convention of properties keys (KAFKA-648), all testcase_xxxx_properties.json files need to be updated for the following properties key changes:

brokerid => broker.id
hostname => host.name
log.file.size => log.segment.size
groupid => group.id

was:
After the changes made for the uniform naming convention of properties keys (KAFKA-648), all testcase_xxxx_properties.json files need to be updated for the following properties key changes:

brokerid => broker.id
hostname => host.name
log.file.size => log.segment.size

System Test - Update all testcase_xxxx_properties.json for properties keys uniform naming convention
----------------------------------------------------------------------------------------------------

Key: KAFKA-688
URL: https://issues.apache.org/jira/browse/KAFKA-688
Project: Kafka
Issue Type: Task
Reporter: John Fung
Assignee: John Fung
Labels: replication-testing

After the changes made for the uniform naming convention of properties keys (KAFKA-648), all testcase_xxxx_properties.json files need to be updated for the following properties key changes:

brokerid => broker.id
hostname => host.name
log.file.size => log.segment.size
groupid => group.id
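The rename the task describes is mechanical, so it could be scripted along these lines. This is a sketch under the assumption that the old keys appear as JSON object keys in each testcase properties file; the helper name and sample data are hypothetical:

```python
# Hypothetical sketch of rewriting old property keys to the new uniform
# names (KAFKA-648) inside a parsed testcase properties JSON document.
import json

RENAMES = {
    "brokerid": "broker.id",
    "hostname": "host.name",
    "log.file.size": "log.segment.size",
    "groupid": "group.id",
}

def rename_keys(obj):
    """Recursively apply RENAMES to every dict key in a parsed JSON value."""
    if isinstance(obj, dict):
        return {RENAMES.get(k, k): rename_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [rename_keys(v) for v in obj]
    return obj

# Illustrative input shaped like a testcase properties entry:
old = {"entities": [{"brokerid": "1", "log.file.size": "102400"}]}
new = rename_keys(old)
print(json.dumps(new))
```

Running this over each file (json.load, rename_keys, json.dump) would apply all four renames consistently.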
[jira] [Commented] (KAFKA-682) java.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/KAFKA-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547129#comment-13547129 ]

Joel Koshy commented on KAFKA-682:
----------------------------------

That's why I asked for the configured num-required-acks for the producer requests. If it is the default (0), then it shouldn't be added to the request purgatory, which would rule out KAFKA-671, no?

java.lang.OutOfMemoryError: Java heap space
-------------------------------------------

Key: KAFKA-682
URL: https://issues.apache.org/jira/browse/KAFKA-682
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.8
Environment:
$ uname -a
Linux rngadam-think 3.5.0-17-generic #28-Ubuntu SMP Tue Oct 9 19:32:08 UTC 2012 i686 i686 i686 GNU/Linux
$ java -version
java version "1.7.0_09"
OpenJDK Runtime Environment (IcedTea7 2.3.3) (7u9-2.3.3-0ubuntu1~12.04.1)
OpenJDK Server VM (build 23.2-b09, mixed mode)
Reporter: Ricky Ng-Adam
Attachments: java_pid22281.hprof.gz, java_pid22281_Leak_Suspects.zip

git pull (commit 32dae955d5e2e2dd45bddb628cb07c874241d856)

...build...
./sbt update
./sbt package

...run...
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties

...then configured fluentd with the kafka plugin...
gem install fluentd --no-ri --no-rdoc
gem install fluent-plugin-kafka
fluentd -c ./fluent/fluent.conf -vv

...then flooded fluentd with messages input from syslog and output to kafka.
This results in (after about 1 messages of 1K each in 3s):

[2013-01-05 02:00:52,087] ERROR Closing socket for /127.0.0.1 because of error (kafka.network.Processor)
java.lang.OutOfMemoryError: Java heap space
    at kafka.api.ProducerRequest$$anonfun$1$$anonfun$apply$1.apply(ProducerRequest.scala:45)
    at kafka.api.ProducerRequest$$anonfun$1$$anonfun$apply$1.apply(ProducerRequest.scala:42)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
    at scala.collection.immutable.Range$ByOne$class.foreach(Range.scala:282)
    at scala.collection.immutable.Range$$anon$1.foreach(Range.scala:274)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
    at scala.collection.immutable.Range.map(Range.scala:39)
    at kafka.api.ProducerRequest$$anonfun$1.apply(ProducerRequest.scala:42)
    at kafka.api.ProducerRequest$$anonfun$1.apply(ProducerRequest.scala:38)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:227)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:227)
    at scala.collection.immutable.Range$ByOne$class.foreach(Range.scala:282)
    at scala.collection.immutable.Range$$anon$1.foreach(Range.scala:274)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:227)
    at scala.collection.immutable.Range.flatMap(Range.scala:39)
    at kafka.api.ProducerRequest$.readFrom(ProducerRequest.scala:38)
    at kafka.api.RequestKeys$$anonfun$1.apply(RequestKeys.scala:32)
    at kafka.api.RequestKeys$$anonfun$1.apply(RequestKeys.scala:32)
    at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:47)
    at kafka.network.Processor.read(SocketServer.scala:298)
    at kafka.network.Processor.run(SocketServer.scala:209)
    at java.lang.Thread.run(Thread.java:722)
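The trace shows the heap blowing up inside ProducerRequest.readFrom, i.e. while parsing the request off the wire. One classic way that happens (a sketch of the general failure mode, not a confirmed diagnosis of this ticket; the parser and cap below are hypothetical) is a parser that trusts size fields from the socket and allocates whatever they claim:

```python
# Hypothetical sketch: a length-prefixed protocol parser. If the 4-byte
# size field is corrupt or misframed, an unguarded parser tries to allocate
# a buffer of that claimed size, which can exhaust the heap.
import io
import struct

MAX_PAYLOAD_SIZE = 1_000_000  # assumed broker-side cap, purely illustrative

def read_sized_payload(stream):
    """Read a 4-byte big-endian size prefix, then that many payload bytes,
    rejecting implausible sizes before allocating anything."""
    size = struct.unpack(">i", stream.read(4))[0]
    if size < 0 or size > MAX_PAYLOAD_SIZE:
        raise ValueError(f"invalid payload size {size}")
    return stream.read(size)

good = io.BytesIO(struct.pack(">i", 5) + b"hello")
print(read_sized_payload(good))  # b'hello'

# A misframed request claiming a ~2 GB payload is rejected instead of
# triggering a giant allocation:
bad = io.BytesIO(struct.pack(">i", 2_000_000_000))
try:
    read_sized_payload(bad)
except ValueError as e:
    print("rejected:", e)
```

This is why a misbehaving client (here, possibly the fluentd plugin's framing) can surface as an OutOfMemoryError on the broker side during request deserialization.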
[jira] [Updated] (KAFKA-687) Rebalance algorithm should consider partitions from all topics
[ https://issues.apache.org/jira/browse/KAFKA-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-687:
--------------------------------

Affects Version/s: 0.8.1

Rebalance algorithm should consider partitions from all topics
--------------------------------------------------------------

Key: KAFKA-687
URL: https://issues.apache.org/jira/browse/KAFKA-687
Project: Kafka
Issue Type: Improvement
Affects Versions: 0.8.1
Reporter: Pablo Barrera

The current rebalance step, as stated in the original Kafka paper [1], splits the partitions of each topic between all the consumers. So if you have 100 topics with 2 partitions each and 10 consumers, only two consumers will be used. That is, for each topic, all partitions are listed and shared between the consumers in the consumer group in order (not randomly).

If the consumer group is reading from several topics at the same time, it makes sense to split all the partitions from all topics between all the consumers. Following the example, we would have 200 partitions in total, 20 per consumer, using all 10 consumers.

The load per topic could be different, and the division should take this into account. However, even a random division should be better than the current algorithm when reading from several topics, and it shouldn't harm reading from a few topics with several partitions.
[jira] [Updated] (KAFKA-688) System Test - Update all testcase_xxxx_properties.json for properties keys uniform naming convention
[ https://issues.apache.org/jira/browse/KAFKA-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Fung updated KAFKA-688:
----------------------------

Description:
After the changes made for the uniform naming convention of properties keys (KAFKA-648), all testcase_xxxx_properties.json files need to be updated for the following properties key changes:

brokerid => broker.id
log.file.size => log.segment.size
groupid => group.id

was:
After the changes made for the uniform naming convention of properties keys (KAFKA-648), all testcase_xxxx_properties.json files need to be updated for the following properties key changes:

brokerid => broker.id
hostname => host.name
log.file.size => log.segment.size
groupid => group.id

System Test - Update all testcase_xxxx_properties.json for properties keys uniform naming convention
----------------------------------------------------------------------------------------------------

Key: KAFKA-688
URL: https://issues.apache.org/jira/browse/KAFKA-688
Project: Kafka
Issue Type: Task
Reporter: John Fung
Assignee: John Fung
Labels: replication-testing

After the changes made for the uniform naming convention of properties keys (KAFKA-648), all testcase_xxxx_properties.json files need to be updated for the following properties key changes:

brokerid => broker.id
log.file.size => log.segment.size
groupid => group.id
[jira] [Commented] (KAFKA-689) Can't append to a topic/partition that does not already exist
[ https://issues.apache.org/jira/browse/KAFKA-689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547328#comment-13547328 ]

David Arthur commented on KAFKA-689:
------------------------------------

Here is the same payload base64 encoded:

WwxrYWZrYS1weXRob24AAQAAA+gBAAR0ZXN0AQAp
AAAdemDkywIAA2Zvbwx0ZXN0IG1lc3NhZ2U=

Can't append to a topic/partition that does not already exist
-------------------------------------------------------------

Key: KAFKA-689
URL: https://issues.apache.org/jira/browse/KAFKA-689
Project: Kafka
Issue Type: Bug
Components: clients
Affects Versions: 0.8
Reporter: David Arthur
Attachments: kafka.log, produce-payload.bin

With a totally fresh Kafka (empty logs dir and empty ZK), if I send a ProduceRequest for a new topic, Kafka responds with kafka.common.UnknownTopicOrPartitionException: Topic test partition 0 doesn't exist on 0. This happens when sending a ProduceRequest over the network (from Python, in this case). If I use the console producer it works fine (the topic and partition get created). If I then send the same payload as before over the network, it works.
[jira] [Commented] (KAFKA-689) Can't append to a topic/partition that does not already exist
[ https://issues.apache.org/jira/browse/KAFKA-689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547417#comment-13547417 ]

David Arthur commented on KAFKA-689:
------------------------------------

Thanks, I was hoping it was something simple like this. Feel free to invalidate this bug :)
-David

Can't append to a topic/partition that does not already exist
-------------------------------------------------------------

Key: KAFKA-689
URL: https://issues.apache.org/jira/browse/KAFKA-689
Project: Kafka
Issue Type: Bug
Components: clients
Affects Versions: 0.8
Reporter: David Arthur
Attachments: kafka.log, produce-payload.bin

With a totally fresh Kafka (empty logs dir and empty ZK), if I send a ProduceRequest for a new topic, Kafka responds with kafka.common.UnknownTopicOrPartitionException: Topic test partition 0 doesn't exist on 0. This happens when sending a ProduceRequest over the network (from Python, in this case). If I use the console producer it works fine (the topic and partition get created). If I then send the same payload as before over the network, it works.
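The thread suggests the first produce to a brand-new topic can fail while the topic is still being created, and later attempts succeed. A common client-side pattern for that situation (a generic sketch, not the Kafka client API; the exception class and helper below are hypothetical) is to treat the error as transient and retry with a short backoff:

```python
# Hypothetical sketch: retry a send when the broker reports the topic or
# partition as unknown, since it may simply not exist yet on first contact.
import time

class UnknownTopicOrPartitionError(Exception):
    pass

def send_with_retry(send, retries=3, backoff_s=0.05):
    """Call send(); on a transient 'unknown topic' error, wait and retry
    up to `retries` times before giving up."""
    for attempt in range(retries):
        try:
            return send()
        except UnknownTopicOrPartitionError:
            if attempt == retries - 1:
                raise
            time.sleep(backoff_s)

attempts = []
def fake_send():
    attempts.append(1)
    if len(attempts) < 2:
        raise UnknownTopicOrPartitionError()  # first try: topic not ready yet
    return "ok"

print(send_with_retry(fake_send))  # ok (succeeds on the second attempt)
```

Refreshing metadata between attempts (omitted here) is what real clients would do, so the retry hits a broker that knows about the newly created topic.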
[jira] [Created] (KAFKA-690) TopicMetadataRequest throws exception when no topics are specified
David Arthur created KAFKA-690:
-------------------------------

Summary: TopicMetadataRequest throws exception when no topics are specified
Key: KAFKA-690
URL: https://issues.apache.org/jira/browse/KAFKA-690
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.8
Reporter: David Arthur
Fix For: 0.8

If no topics are sent in a TopicMetadataRequest, `readFrom` throws an exception when trying to get the head of the topic list for a debug statement.

java.util.NoSuchElementException: head of empty list
    at scala.collection.immutable.Nil$.head(List.scala:386)
    at scala.collection.immutable.Nil$.head(List.scala:383)
    at kafka.api.TopicMetadataRequest$$anonfun$readFrom$2.apply(TopicMetadataRequest.scala:43)
    at kafka.api.TopicMetadataRequest$$anonfun$readFrom$2.apply(TopicMetadataRequest.scala:43)
    at kafka.utils.Logging$class.debug(Logging.scala:51)
    at kafka.api.TopicMetadataRequest$.debug(TopicMetadataRequest.scala:25)
    at kafka.api.TopicMetadataRequest$.readFrom(TopicMetadataRequest.scala:43)
    at kafka.api.RequestKeys$$anonfun$4.apply(RequestKeys.scala:37)
    at kafka.api.RequestKeys$$anonfun$4.apply(RequestKeys.scala:37)
    at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:47)
    at kafka.network.Processor.read(SocketServer.scala:320)
    at kafka.network.Processor.run(SocketServer.scala:231)
    at java.lang.Thread.run(Thread.java:680)
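The bug pattern is worth spelling out: a debug statement that indexes into a possibly-empty list crashes the request path even though it only exists for logging. A language-neutral sketch of the pattern and a safe alternative (illustrative function names, not the Kafka code):

```python
# Hypothetical sketch of the bug: building a debug message that assumes the
# topic list is non-empty, versus logging without that assumption.
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("TopicMetadataRequest")

def read_topics_buggy(topics):
    # topics[0] is evaluated before the debug call, so an empty list raises
    # IndexError (the analogue of Nil.head's NoSuchElementException).
    log.debug("first topic: %s" % topics[0])
    return topics

def read_topics_fixed(topics):
    # Log the whole list instead; an empty list is perfectly loggable.
    log.debug("topics: %s", topics)
    return topics

print(read_topics_fixed([]))  # [] -- no crash on an empty request
try:
    read_topics_buggy([])
except IndexError:
    print("buggy version crashed on empty topic list")
```

Note that lowering the log level does not help: the message (and the list access) is built eagerly, before the logger decides whether to emit it.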
kafka 0.8 producer throughput
According to the official Kafka documentation, producer throughput is about 50 MB/s, but in my tests the producer throughput is only about 2 MB/s. My test environment matches what the document describes: one producer, one broker, and one ZooKeeper, each on an independent machine. Message size is 100 bytes, batch size is 200, and the flush interval is 600 messages. The environment and configuration are the same, so why is there such a big gap between my test results and what the document says?
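When comparing numbers like these, it helps to be explicit about how throughput is computed: total bytes sent divided by wall-clock time. A minimal measurement harness along those lines (the send loop is a simulated no-op here; real figures depend entirely on the producer, network, and broker):

```python
# Hypothetical throughput harness: time a send loop and report MB/s.
import time

MESSAGE_SIZE = 100      # bytes, as in the post
NUM_MESSAGES = 100_000

def measure_throughput(send):
    """Send NUM_MESSAGES fixed-size messages and return MB/s."""
    start = time.perf_counter()
    for _ in range(NUM_MESSAGES):
        send(b"x" * MESSAGE_SIZE)
    elapsed = time.perf_counter() - start
    return (NUM_MESSAGES * MESSAGE_SIZE) / elapsed / 1_000_000

sent = []
mb_per_s = measure_throughput(sent.append)
print(f"{mb_per_s:.1f} MB/s for an in-memory no-op sender")
```

With 100-byte messages, 2 MB/s is about 20,000 messages/s, so per-message overhead (sync vs. async send, acks, batching actually taking effect, network round trips) dominates; checking those settings is usually the first step in closing such a gap.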
[jira] [Commented] (KAFKA-690) TopicMetadataRequest throws exception when no topics are specified
[ https://issues.apache.org/jira/browse/KAFKA-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547665#comment-13547665 ]

Neha Narkhede commented on KAFKA-690:
-------------------------------------

+1. Thanks for the patch!

TopicMetadataRequest throws exception when no topics are specified
------------------------------------------------------------------

Key: KAFKA-690
URL: https://issues.apache.org/jira/browse/KAFKA-690
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.8
Reporter: David Arthur
Fix For: 0.8
Attachments: KAFKA-690.patch

If no topics are sent in a TopicMetadataRequest, `readFrom` throws an exception when trying to get the head of the topic list for a debug statement.

java.util.NoSuchElementException: head of empty list
    at scala.collection.immutable.Nil$.head(List.scala:386)
    at scala.collection.immutable.Nil$.head(List.scala:383)
    at kafka.api.TopicMetadataRequest$$anonfun$readFrom$2.apply(TopicMetadataRequest.scala:43)
    at kafka.api.TopicMetadataRequest$$anonfun$readFrom$2.apply(TopicMetadataRequest.scala:43)
    at kafka.utils.Logging$class.debug(Logging.scala:51)
    at kafka.api.TopicMetadataRequest$.debug(TopicMetadataRequest.scala:25)
    at kafka.api.TopicMetadataRequest$.readFrom(TopicMetadataRequest.scala:43)
    at kafka.api.RequestKeys$$anonfun$4.apply(RequestKeys.scala:37)
    at kafka.api.RequestKeys$$anonfun$4.apply(RequestKeys.scala:37)
    at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:47)
    at kafka.network.Processor.read(SocketServer.scala:320)
    at kafka.network.Processor.run(SocketServer.scala:231)
    at java.lang.Thread.run(Thread.java:680)