[jira] [Created] (KAFKA-687) Rebalance algorithm should consider partitions from all topics

2013-01-08 Thread Pablo Barrera (JIRA)
Pablo Barrera created KAFKA-687:
---

 Summary: Rebalance algorithm should consider partitions from all 
topics
 Key: KAFKA-687
 URL: https://issues.apache.org/jira/browse/KAFKA-687
 Project: Kafka
  Issue Type: Improvement
Reporter: Pablo Barrera


The current rebalance step, as stated in the original Kafka paper [1], splits 
the partitions per topic between all the consumers. So if you have 100 topics 
with 2 partitions each and 10 consumers only two consumers will be used. That 
is, for each topic all partitions will be listed and shared between the 
consumers in the consumer group in order (not randomly).

If the consumer group is reading from several topics at the same time it makes 
sense to split all the partitions from all topics between all the consumers. 
Following the example, we will have 200 partitions in total, 20 per consumer, 
using all 10 consumers.

The load per topic could be different and the division should take this into 
account. However, even a random division should be better than the current 
algorithm when reading from several topics, and should not harm reading from a 
few topics with several partitions.
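To make the difference concrete, here is a hypothetical sketch in Python of the two strategies (the function names and structure are illustrative only, not Kafka's actual rebalance code):

```python
def per_topic_assignment(topics, partitions_per_topic, consumers):
    """Current behavior: each topic's partitions are split independently."""
    assignment = {c: [] for c in consumers}
    for topic in topics:
        for p in range(partitions_per_topic):
            # Every topic's partitions land on the first few consumers.
            assignment[consumers[p % len(consumers)]].append((topic, p))
    return assignment

def global_assignment(topics, partitions_per_topic, consumers):
    """Proposed behavior: pool partitions from all topics, then split."""
    all_partitions = [(t, p) for t in topics
                      for p in range(partitions_per_topic)]
    assignment = {c: [] for c in consumers}
    for i, tp in enumerate(all_partitions):
        assignment[consumers[i % len(consumers)]].append(tp)
    return assignment

topics = ["topic-%d" % i for i in range(100)]
consumers = ["consumer-%d" % i for i in range(10)]

current = per_topic_assignment(topics, 2, consumers)
proposed = global_assignment(topics, 2, consumers)

# With 100 topics of 2 partitions and 10 consumers: the current scheme
# loads only 2 consumers; the proposed one gives all 10 consumers 20 each.
print(sum(1 for c in consumers if current[c]))   # 2
print(sum(1 for c in consumers if proposed[c]))  # 10
```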


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (KAFKA-684) ConsoleProducer does not have the queue-size option

2013-01-08 Thread Maxime Brugidou (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxime Brugidou updated KAFKA-684:
--

Attachment: KAFKA-684-2.patch

I added:

queue.enqueueTimeout.ms
producer.request.required.acks
producer.request.timeout.ms


 ConsoleProducer does not have the queue-size option
 ---

 Key: KAFKA-684
 URL: https://issues.apache.org/jira/browse/KAFKA-684
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8
Reporter: Maxime Brugidou
 Fix For: 0.8

 Attachments: KAFKA-684-2.patch, KAFKA-684.patch


 When using the kafka ConsoleProducer (from script kafka-console-producer.sh), 
 you cannot set the queue.size, which gets very annoying when you want to 
 quickly produce a lot of messages. You definitely need to increase the 
 queue.size (or decrease the send timeout).
 Here is a simple patch to add the option.



[jira] [Updated] (KAFKA-684) ConsoleProducer does not have the queue-size option

2013-01-08 Thread Maxime Brugidou (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxime Brugidou updated KAFKA-684:
--

Attachment: KAFKA-684-3.patch

While I'm at it, I added this small feature:

Exit at end of input stream (so you can do "echo test | 
./kafka-console-producer.sh" or "./kafka-console-producer.sh < test" without 
stopping the producer manually).

 ConsoleProducer does not have the queue-size option
 ---

 Key: KAFKA-684
 URL: https://issues.apache.org/jira/browse/KAFKA-684
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8
Reporter: Maxime Brugidou
 Fix For: 0.8

 Attachments: KAFKA-684-2.patch, KAFKA-684-3.patch, KAFKA-684.patch


 When using the kafka ConsoleProducer (from script kafka-console-producer.sh), 
 you cannot set the queue.size, which gets very annoying when you want to 
 quickly produce a lot of messages. You definitely need to increase the 
 queue.size (or decrease the send timeout).
 Here is a simple patch to add the option.



[jira] [Commented] (KAFKA-281) support multiple root log directories

2013-01-08 Thread Maxime Brugidou (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546858#comment-13546858
 ] 

Maxime Brugidou commented on KAFKA-281:
---

I guess this was done on 0.8 as part of KAFKA-188

 support multiple root log directories
 -

 Key: KAFKA-281
 URL: https://issues.apache.org/jira/browse/KAFKA-281
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Jun Rao

 Currently, the log layout is {log.dir}/topicname-partitionid and one can only 
 specify 1 {log.dir}. This limits the # of topics we can have per broker. We 
 can potentially support multiple directories for {log.dir} and just assign 
 topics using hashing or round-robin.
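 The two placement ideas mentioned (hashing or round-robin over multiple 
 directories) could look roughly like this; a hypothetical sketch, not 
 Kafka's actual implementation:

```python
import zlib

def assign_by_hash(topic_partition, log_dirs):
    # Stable hash (crc32, not Python's salted hash()) so a given
    # topic-partition always maps to the same directory.
    return log_dirs[zlib.crc32(topic_partition.encode()) % len(log_dirs)]

def assign_round_robin(topic_partitions, log_dirs):
    # Spread partitions over directories in creation order.
    return {tp: log_dirs[i % len(log_dirs)]
            for i, tp in enumerate(topic_partitions)}

dirs = ["/data1/kafka-logs", "/data2/kafka-logs", "/data3/kafka-logs"]
print(assign_by_hash("mytopic-0", dirs))
print(assign_round_robin(["a-0", "a-1", "b-0"], dirs))
```

 Hashing needs no state but can be uneven; round-robin balances counts but 
 requires remembering prior assignments.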



[jira] [Created] (KAFKA-688) System Test - Update all testcase_xxxx_properties.json for properties keys uniform naming convention

2013-01-08 Thread John Fung (JIRA)
John Fung created KAFKA-688:
---

 Summary: System Test - Update all testcase_xxxx_properties.json 
for properties keys uniform naming convention
 Key: KAFKA-688
 URL: https://issues.apache.org/jira/browse/KAFKA-688
 Project: Kafka
  Issue Type: Task
Reporter: John Fung
Assignee: John Fung


After the changes made for the uniform naming convention of properties keys 
(KAFKA-648), all testcase_xxxx_properties.json files need to be updated for 
the following properties keys changes:

brokerid => broker.id
hostname => host.name
log.file.size => log.segment.size
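The bulk update above could be done with a small helper like the following; the key names come from this ticket, but the script itself is hypothetical:

```python
import json

# Old property key -> new property key, per KAFKA-648.
RENAMES = {
    "brokerid": "broker.id",
    "hostname": "host.name",
    "log.file.size": "log.segment.size",
}

def rename_keys(props):
    """Return a copy of a properties dict with old keys mapped to new names."""
    return {RENAMES.get(k, k): v for k, v in props.items()}

old = {"brokerid": "1", "hostname": "localhost",
       "log.file.size": "536870912"}
print(json.dumps(rename_keys(old), sort_keys=True))
```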



[jira] [Commented] (KAFKA-648) Use uniform convention for naming properties keys

2013-01-08 Thread John Fung (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547063#comment-13547063
 ] 

John Fung commented on KAFKA-648:
-

The testcase_xxxx_properties.json files in System Test will need to be updated 
for the following changes:

brokerid => broker.id
hostname => host.name
log.file.size => log.segment.size

A separate JIRA KAFKA-688 is created for this task.

 Use uniform convention for naming properties keys 
 --

 Key: KAFKA-648
 URL: https://issues.apache.org/jira/browse/KAFKA-648
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8
Reporter: Swapnil Ghike
Assignee: Sriram Subramanian
Priority: Blocker
 Fix For: 0.8, 0.8.1

 Attachments: configchanges-1.patch, configchanges-v2.patch


 Currently, the convention that we seem to use to get a property value in 
 *Config is as follows:
 val configVal = property.getType("config.val", ...) // the dot separates 
 two words in the key, and the first letter of the second word is 
 capitalized in configVal.
 We should use a similar convention for groupId, consumerId, clientId, 
 correlationId.
 This change will probably be backward incompatible.
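The key-to-field mapping described above can be sketched as a small helper (Python used here purely for illustration of the naming rule):

```python
def config_val_name(key):
    """Map a dotted property key to the camelCased val name used in *Config."""
    first, *rest = key.split(".")
    return first + "".join(part.capitalize() for part in rest)

print(config_val_name("group.id"))        # groupId
print(config_val_name("correlation.id"))  # correlationId
```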



[jira] [Updated] (KAFKA-688) System Test - Update all testcase_xxxx_properties.json for properties keys uniform naming convention

2013-01-08 Thread John Fung (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Fung updated KAFKA-688:


Description: 
After the changes made for the uniform naming convention of properties keys 
(KAFKA-648), all testcase_xxxx_properties.json files need to be updated for 
the following properties keys changes:

brokerid => broker.id
hostname => host.name
log.file.size => log.segment.size
groupid => group.id

  was:
After the changes made for the uniform naming convention of properties keys 
(KAFKA-648), all testcase_xxxx_properties.json files need to be updated for 
the following properties keys changes:

brokerid => broker.id
hostname => host.name
log.file.size => log.segment.size


 System Test - Update all testcase_xxxx_properties.json for properties keys 
 uniform naming convention
 

 Key: KAFKA-688
 URL: https://issues.apache.org/jira/browse/KAFKA-688
 Project: Kafka
  Issue Type: Task
Reporter: John Fung
Assignee: John Fung
  Labels: replication-testing

 After the changes made for the uniform naming convention of properties keys 
 (KAFKA-648), all testcase_xxxx_properties.json files need to be updated for 
 the following properties keys changes:
 brokerid => broker.id
 hostname => host.name
 log.file.size => log.segment.size
 groupid => group.id



[jira] [Commented] (KAFKA-682) java.lang.OutOfMemoryError: Java heap space

2013-01-08 Thread Joel Koshy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547129#comment-13547129
 ] 

Joel Koshy commented on KAFKA-682:
--

That's why I asked for the configured num-required-acks for the producer 
requests. If it is the default (0) then the request shouldn't be added to the 
request purgatory, which would rule out KAFKA-671, no?

 java.lang.OutOfMemoryError: Java heap space
 ---

 Key: KAFKA-682
 URL: https://issues.apache.org/jira/browse/KAFKA-682
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8
 Environment: $ uname -a
 Linux rngadam-think 3.5.0-17-generic #28-Ubuntu SMP Tue Oct 9 19:32:08 UTC 
 2012 i686 i686 i686 GNU/Linux
 $ java -version
 java version "1.7.0_09"
 OpenJDK Runtime Environment (IcedTea7 2.3.3) (7u9-2.3.3-0ubuntu1~12.04.1)
 OpenJDK Server VM (build 23.2-b09, mixed mode)
Reporter: Ricky Ng-Adam
 Attachments: java_pid22281.hprof.gz, java_pid22281_Leak_Suspects.zip


 git pull (commit 32dae955d5e2e2dd45bddb628cb07c874241d856)
 ...build...
 ./sbt update
 ./sbt package
 ...run...
 bin/zookeeper-server-start.sh config/zookeeper.properties
 bin/kafka-server-start.sh config/server.properties
 ...then configured fluentd with kafka plugin...
 gem install fluentd --no-ri --no-rdoc
 gem install fluent-plugin-kafka
 fluentd -c ./fluent/fluent.conf -vv
 ...then flood fluentd with messages inputted from syslog and outputted to 
 kafka.
 results in (after about 1 messages of 1K each in 3s):
 [2013-01-05 02:00:52,087] ERROR Closing socket for /127.0.0.1 because of 
 error (kafka.network.Processor)
 java.lang.OutOfMemoryError: Java heap space
 at 
 kafka.api.ProducerRequest$$anonfun$1$$anonfun$apply$1.apply(ProducerRequest.scala:45)
 at 
 kafka.api.ProducerRequest$$anonfun$1$$anonfun$apply$1.apply(ProducerRequest.scala:42)
 at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
 at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
 at scala.collection.immutable.Range$ByOne$class.foreach(Range.scala:282)
 at scala.collection.immutable.Range$$anon$1.foreach(Range.scala:274)
 at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
 at scala.collection.immutable.Range.map(Range.scala:39)
 at kafka.api.ProducerRequest$$anonfun$1.apply(ProducerRequest.scala:42)
 at kafka.api.ProducerRequest$$anonfun$1.apply(ProducerRequest.scala:38)
 at 
 scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:227)
 at 
 scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:227)
 at scala.collection.immutable.Range$ByOne$class.foreach(Range.scala:282)
 at scala.collection.immutable.Range$$anon$1.foreach(Range.scala:274)
 at 
 scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:227)
 at scala.collection.immutable.Range.flatMap(Range.scala:39)
 at kafka.api.ProducerRequest$.readFrom(ProducerRequest.scala:38)
 at kafka.api.RequestKeys$$anonfun$1.apply(RequestKeys.scala:32)
 at kafka.api.RequestKeys$$anonfun$1.apply(RequestKeys.scala:32)
 at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:47)
 at kafka.network.Processor.read(SocketServer.scala:298)
 at kafka.network.Processor.run(SocketServer.scala:209)
 at java.lang.Thread.run(Thread.java:722)



[jira] [Updated] (KAFKA-687) Rebalance algorithm should consider partitions from all topics

2013-01-08 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-687:


Affects Version/s: 0.8.1

 Rebalance algorithm should consider partitions from all topics
 --

 Key: KAFKA-687
 URL: https://issues.apache.org/jira/browse/KAFKA-687
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.1
Reporter: Pablo Barrera

 The current rebalance step, as stated in the original Kafka paper [1], splits 
 the partitions per topic between all the consumers. So if you have 100 topics 
 with 2 partitions each and 10 consumers only two consumers will be used. That 
 is, for each topic all partitions will be listed and shared between the 
 consumers in the consumer group in order (not randomly).
 If the consumer group is reading from several topics at the same time it 
 makes sense to split all the partitions from all topics between all the 
 consumers. Following the example, we will have 200 partitions in total, 20 
 per consumer, using all 10 consumers.
 The load per topic could be different and the division should take this into 
 account. However, even a random division should be better than the current 
 algorithm when reading from several topics, and should not harm reading from 
 a few topics with several partitions.



[jira] [Updated] (KAFKA-688) System Test - Update all testcase_xxxx_properties.json for properties keys uniform naming convention

2013-01-08 Thread John Fung (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Fung updated KAFKA-688:


Description: 
After the changes made for the uniform naming convention of properties keys 
(KAFKA-648), all testcase_xxxx_properties.json files need to be updated for 
the following properties keys changes:

brokerid => broker.id
log.file.size => log.segment.size
groupid => group.id

  was:
After the changes made for the uniform naming convention of properties keys 
(KAFKA-648), all testcase_xxxx_properties.json files need to be updated for 
the following properties keys changes:

brokerid => broker.id
hostname => host.name
log.file.size => log.segment.size
groupid => group.id


 System Test - Update all testcase_xxxx_properties.json for properties keys 
 uniform naming convention
 

 Key: KAFKA-688
 URL: https://issues.apache.org/jira/browse/KAFKA-688
 Project: Kafka
  Issue Type: Task
Reporter: John Fung
Assignee: John Fung
  Labels: replication-testing

 After the changes made for the uniform naming convention of properties keys 
 (KAFKA-648), all testcase_xxxx_properties.json files need to be updated for 
 the following properties keys changes:
 brokerid => broker.id
 log.file.size => log.segment.size
 groupid => group.id



[jira] [Commented] (KAFKA-689) Can't append to a topic/partition that does not already exist

2013-01-08 Thread David Arthur (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547328#comment-13547328
 ] 

David Arthur commented on KAFKA-689:


Here is the same payload base64 encoded: 

WwxrYWZrYS1weXRob24AAQAAA+gBAAR0ZXN0AQAp
AAAdemDkywIAA2Zvbwx0ZXN0IG1lc3NhZ2U=
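The payload can be inspected directly; a quick check (assuming the base64 above survived the mail archive intact) shows the client id and message fields are readable:

```python
import base64

payload_b64 = (
    "WwxrYWZrYS1weXRob24AAQAAA+gBAAR0ZXN0AQAp"
    "AAAdemDkywIAA2Zvbwx0ZXN0IG1lc3NhZ2U="
)
raw = base64.b64decode(payload_b64)

# Recognizable ProduceRequest fields: client id, topic, key, and value.
print(b"kafka-python" in raw)   # True
print(b"test" in raw)           # True
print(b"test message" in raw)   # True
```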

 Can't append to a topic/partition that does not already exist
 -

 Key: KAFKA-689
 URL: https://issues.apache.org/jira/browse/KAFKA-689
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 0.8
Reporter: David Arthur
 Attachments: kafka.log, produce-payload.bin


 With a totally fresh Kafka (empty logs dir and empty ZK), if I send a 
 ProduceRequest for a new topic, Kafka responds with 
 kafka.common.UnknownTopicOrPartitionException: Topic test partition 0 
 doesn't exist on 0. This is when sending a ProduceRequest over the network 
 (from Python, in this case).
 If I use the console producer it works fine (topic and partition get 
 created). If I then send the same payload from before over the network, it 
 works.



[jira] [Commented] (KAFKA-689) Can't append to a topic/partition that does not already exist

2013-01-08 Thread David Arthur (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547417#comment-13547417
 ] 

David Arthur commented on KAFKA-689:


Thanks, I was hoping it was something simple like this. Feel free to 
invalidate this bug :)

-David




 Can't append to a topic/partition that does not already exist
 -

 Key: KAFKA-689
 URL: https://issues.apache.org/jira/browse/KAFKA-689
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 0.8
Reporter: David Arthur
 Attachments: kafka.log, produce-payload.bin


 With a totally fresh Kafka (empty logs dir and empty ZK), if I send a 
 ProduceRequest for a new topic, Kafka responds with 
 kafka.common.UnknownTopicOrPartitionException: Topic test partition 0 
 doesn't exist on 0. This is when sending a ProduceRequest over the network 
 (from Python, in this case).
 If I use the console producer it works fine (topic and partition get 
 created). If I then send the same payload from before over the network, it 
 works.



[jira] [Created] (KAFKA-690) TopicMetadataRequest throws exception when no topics are specified

2013-01-08 Thread David Arthur (JIRA)
David Arthur created KAFKA-690:
--

 Summary: TopicMetadataRequest throws exception when no topics are 
specified
 Key: KAFKA-690
 URL: https://issues.apache.org/jira/browse/KAFKA-690
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8
Reporter: David Arthur
 Fix For: 0.8


If no topics are sent in a TopicMetadataRequest, `readFrom` throws an exception 
when trying to get the head of the topic list for a debug statement.

java.util.NoSuchElementException: head of empty list
at scala.collection.immutable.Nil$.head(List.scala:386)
at scala.collection.immutable.Nil$.head(List.scala:383)
at 
kafka.api.TopicMetadataRequest$$anonfun$readFrom$2.apply(TopicMetadataRequest.scala:43)
at 
kafka.api.TopicMetadataRequest$$anonfun$readFrom$2.apply(TopicMetadataRequest.scala:43)
at kafka.utils.Logging$class.debug(Logging.scala:51)
at kafka.api.TopicMetadataRequest$.debug(TopicMetadataRequest.scala:25)
at 
kafka.api.TopicMetadataRequest$.readFrom(TopicMetadataRequest.scala:43)
at kafka.api.RequestKeys$$anonfun$4.apply(RequestKeys.scala:37)
at kafka.api.RequestKeys$$anonfun$4.apply(RequestKeys.scala:37)
at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:47)
at kafka.network.Processor.read(SocketServer.scala:320)
at kafka.network.Processor.run(SocketServer.scala:231)
at java.lang.Thread.run(Thread.java:680)
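The failure mode is easy to reproduce in miniature. A guarded version of the debug call (a sketch of the idea only, not the actual fix in KAFKA-690.patch) avoids taking the head of an empty list:

```python
def read_from_buggy(topics):
    # Mirrors the bug: unconditionally takes the first element for a
    # debug line, like Nil.head in Scala this raises on an empty list.
    print("Processing metadata request for topic %s" % topics[0])
    return topics

def read_from_fixed(topics):
    # Guard the debug statement so an empty topic list is legal
    # (an empty request means "give me metadata for all topics").
    if topics:
        print("Processing metadata request for topic %s" % topics[0])
    else:
        print("Processing metadata request for all topics")
    return topics

read_from_fixed([])        # no exception
read_from_fixed(["test"])
```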




kafka 0.8 producer throughput

2013-01-08 Thread Jun Guo -X (jungu - CIIC at Cisco)
According to the official Kafka documentation, the producer throughput is about 
50MB/s, but in my tests the producer throughput is only about 2MB/s. The test 
environment matches what the document describes: one producer, one broker, and 
one ZooKeeper, each on an independent machine. Message size is 100 bytes, batch 
size is 200, and the flush interval is 600 messages. The test environment and 
configuration are the same, so why is there such a big gap between my test 
results and what the document says?
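For scale, the gap can be sanity-checked with simple arithmetic: at 100-byte messages, 50MB/s corresponds to roughly 500,000 messages per second, while 2MB/s is only about 20,000 per second.

```python
def throughput_mb_per_s(num_messages, msg_size_bytes, elapsed_s):
    """Throughput in MB/s for fixed-size messages."""
    return num_messages * msg_size_bytes / elapsed_s / (1024 * 1024)

# ~500k messages of 100 bytes in one second is the documented ~50MB/s.
print(round(throughput_mb_per_s(500_000, 100, 1.0), 1))  # 47.7
# ~20k messages of 100 bytes per second is the observed ~2MB/s.
print(round(throughput_mb_per_s(20_000, 100, 1.0), 1))   # 1.9
```

A 25x shortfall at these settings usually points at the test harness or configuration rather than the broker itself.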


[jira] [Commented] (KAFKA-690) TopicMetadataRequest throws exception when no topics are specified

2013-01-08 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547665#comment-13547665
 ] 

Neha Narkhede commented on KAFKA-690:
-

+1. Thanks for the patch !

 TopicMetadataRequest throws exception when no topics are specified
 --

 Key: KAFKA-690
 URL: https://issues.apache.org/jira/browse/KAFKA-690
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8
Reporter: David Arthur
 Fix For: 0.8

 Attachments: KAFKA-690.patch


 If no topics are sent in a TopicMetadataRequest, `readFrom` throws an 
 exception when trying to get the head of the topic list for a debug 
 statement.
 java.util.NoSuchElementException: head of empty list
   at scala.collection.immutable.Nil$.head(List.scala:386)
   at scala.collection.immutable.Nil$.head(List.scala:383)
   at 
 kafka.api.TopicMetadataRequest$$anonfun$readFrom$2.apply(TopicMetadataRequest.scala:43)
   at 
 kafka.api.TopicMetadataRequest$$anonfun$readFrom$2.apply(TopicMetadataRequest.scala:43)
   at kafka.utils.Logging$class.debug(Logging.scala:51)
   at kafka.api.TopicMetadataRequest$.debug(TopicMetadataRequest.scala:25)
   at 
 kafka.api.TopicMetadataRequest$.readFrom(TopicMetadataRequest.scala:43)
   at kafka.api.RequestKeys$$anonfun$4.apply(RequestKeys.scala:37)
   at kafka.api.RequestKeys$$anonfun$4.apply(RequestKeys.scala:37)
   at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:47)
   at kafka.network.Processor.read(SocketServer.scala:320)
   at kafka.network.Processor.run(SocketServer.scala:231)
   at java.lang.Thread.run(Thread.java:680)
