[ 
https://issues.apache.org/jira/browse/KAFKA-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swapnil Ghike updated KAFKA-991:
--------------------------------

    Description: 
Currently the queue.size in the hadoop producer is 10MB. This means that the 
KafkaRecordWriter will trigger a send on the kafka producer once the size of 
uncompressed queued messages exceeds 10MB. (The other condition that triggers 
a send is the number of queued messages exceeding Short.MAX_VALUE.)
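The two flush triggers described above can be sketched roughly as follows 
(names and structure are hypothetical, for illustration only; this is not the 
actual KafkaRecordWriter code):

```java
// Sketch of the hadoop producer's send conditions (hypothetical names).
public class QueueFlushSketch {
    // Current default send threshold: 10MB of uncompressed queued messages.
    static final long QUEUE_SIZE_BYTES = 10L * 1024 * 1024;
    // The other trigger: number of queued messages reaching Short.MAX_VALUE.
    static final int MAX_QUEUED_MESSAGES = Short.MAX_VALUE;

    static boolean shouldFlush(long queuedBytes, int queuedMessages) {
        // Send when either the uncompressed byte total or the message
        // count crosses its threshold.
        return queuedBytes > QUEUE_SIZE_BYTES
                || queuedMessages >= MAX_QUEUED_MESSAGES;
    }

    public static void main(String[] args) {
        System.out.println(shouldFlush(10L * 1024 * 1024 + 1, 1)); // byte limit hit
        System.out.println(shouldFlush(1024, Short.MAX_VALUE));    // count limit hit
        System.out.println(shouldFlush(1024, 100));                // below both limits
    }
}
```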

Considering that the server accepts a (compressed) batch of messages of size 
up to 1 million bytes minus the log overhead, we should probably reduce the 
queue size in the hadoop producer. We should do two things:

1. Change the max message size on the broker to 1 million bytes + log 
overhead, because that makes the client message size easy to remember. Right 
now the maximum number of bytes that can be accepted from a client in a batch 
of messages is an awkward 999988. (I don't have a stronger reason.) We have 
set the fetch size on the consumer to 1MB, which gives us a lot of room even 
if the log overhead increases in future versions.

2. Set the default queue size in the hadoop producer to 1 million bytes. 
Anyone who wants higher throughput can override this config using 
kafka.output.queue.size.
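As an illustration, a job that wants higher throughput could restore the old 
10MB threshold via the config above (the property name comes from this issue; 
the value is only an example):

```properties
# Hadoop job configuration (sketch): override the proposed 1 million byte
# default back to 10MB for a high-throughput job.
kafka.output.queue.size=10000000
```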

  was:
Currently the queue.size in hadoop producer is 10MB. This means that the 
KafkaRecordWriter will hit the send button on kafka producer after the size of 
uncompressed queued messages becomes greater than 10MB. (The other condition on 
which the messages are sent is if their number exceeds SHORT.MAX_VALUE).

Considering that the server accepts a (compressed) batch of messages of 
size up to 1 million bytes minus the log overhead, we should probably reduce the 
queue size in hadoop producer. We should do two things:

1. change max message size on the broker to 1 million + log overhead, just 
because I think the client message size could be an easy to remember number. 
Right now the maximum number of bytes that can be accepted from a client in a 
batch of messages is an awkward 999988. (I don't have a stronger reason). We 
have set fetch size on the consumer to 1MB, this gives us a lot of room even if 
the log overhead increased with further versions.

2. Set the default number of bytes on hadoop producer to 1 million bytes. 
Anyone who wants higher throughput can override this config using 
kafka.output.queue.size

    
> Reduce the queue size in hadoop producer
> ----------------------------------------
>
>                 Key: KAFKA-991
>                 URL: https://issues.apache.org/jira/browse/KAFKA-991
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Swapnil Ghike
>            Assignee: Swapnil Ghike
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: kafka-991-v1.patch
>
>
> Currently the queue.size in the hadoop producer is 10MB. This means that the 
> KafkaRecordWriter will trigger a send on the kafka producer once the size 
> of uncompressed queued messages exceeds 10MB. (The other condition that 
> triggers a send is the number of queued messages exceeding Short.MAX_VALUE.)
> Considering that the server accepts a (compressed) batch of messages of 
> size up to 1 million bytes minus the log overhead, we should probably reduce 
> the queue size in the hadoop producer. We should do two things:
> 1. Change the max message size on the broker to 1 million bytes + log 
> overhead, because that makes the client message size easy to remember. Right 
> now the maximum number of bytes that can be accepted from a client in a 
> batch of messages is an awkward 999988. (I don't have a stronger reason.) We 
> have set the fetch size on the consumer to 1MB, which gives us a lot of room 
> even if the log overhead increases in future versions.
> 2. Set the default queue size in the hadoop producer to 1 million bytes. 
> Anyone who wants higher throughput can override this config using 
> kafka.output.queue.size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
