[jira] [Created] (KAFKA-2986) Consumer group doesn't lend itself well for slow consumers with varying message size

Jens Rantil (JIRA) Sun, 13 Dec 2015 05:24:05 -0800

Jens Rantil created KAFKA-2986:
----------------------------------

             Summary: Consumer group doesn't lend itself well for slow 
consumers with varying message size
                 Key: KAFKA-2986
                 URL: https://issues.apache.org/jira/browse/KAFKA-2986
             Project: Kafka
          Issue Type: Bug
          Components: consumer
    Affects Versions: 0.9.0.0
         Environment: Java consumer API 0.9.0.0
            Reporter: Jens Rantil
            Assignee: Neha Narkhede



I sent a related post to the Kafka mailing list, but haven't received any 
response: 
http://mail-archives.apache.org/mod_mbox/kafka-users/201512.mbox/%3CCAL%2BArfWNfkpymkNDuf6UJ06CJJ63XC1bPHeT4TSYXKjSsOpu-Q%40mail.gmail.com%3E
 So far, I think this is a design issue in Kafka so I'm taking the liberty of 
creating an issue.

*Use case:*
 - Slow consumption. Maybe around 20 seconds per record.
 - Large variation in message size: Serialized tasks are in the range of ~300 
bytes up to ~3 MB.
 - Consumption latency (20 seconds) is independent of message size.

*Code example:*
{noformat}
while (isRunning()) {
  ConsumerRecords<String, byte[]> records = consumer.poll(100);
  for (final ConsumerRecord<String, byte[]> record : records) {
    // Handle record...
  }
}
{noformat}

*Problem:* Kafka doesn't have any issues with large messages (as long as you 
bump some configuration flags). However, the problem is two-fold:
- KafkaConsumer#poll is the only call that sends heartbeats (health checks) to the broker.
- There is no limit on how many messages KafkaConsumer#poll returns. The only 
limit is on the total number of bytes prefetched. This is problematic for 
varying message sizes because the session timeout becomes extremely hard to 
tune:
-- The delay until the next KafkaConsumer#poll call is proportional to the 
number of records returned by the previous KafkaConsumer#poll call.
-- KafkaConsumer#poll will return either many small records or just a few 
larger records. With many small messages, the risk of the session timeout 
kicking in is very large. Raising the session timeout by the orders of 
magnitude required to handle the smaller messages increases the time it takes 
to discover a dead consumer a thousandfold.
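
To put rough numbers on this, here is a minimal back-of-the-envelope sketch. It is not Kafka code. It assumes a single partition, fixed-size records, and a byte-only fetch budget of 3 MB (a hypothetical value, chosen so the largest task still fits); the class and method names are made up for illustration.

```java
import java.time.Duration;

public class PollGapEstimate {
    /** Records that fit in one fetch when the only limit is a byte budget (at least one). */
    public static long recordsPerPoll(long fetchBudgetBytes, long recordSizeBytes) {
        return Math.max(1, fetchBudgetBytes / recordSizeBytes);
    }

    /** Time spent processing one batch, i.e. the gap between two consecutive poll calls. */
    public static Duration gapBetweenPolls(long fetchBudgetBytes, long recordSizeBytes,
                                           Duration perRecord) {
        return perRecord.multipliedBy(recordsPerPoll(fetchBudgetBytes, recordSizeBytes));
    }

    public static void main(String[] args) {
        long budget = 3L * 1024 * 1024;              // assumed fetch budget: 3 MB
        Duration perRecord = Duration.ofSeconds(20); // processing time from the use case
        long[] sizes = {300, 3L * 1024 * 1024};      // smallest and largest task sizes
        for (long size : sizes) {
            System.out.printf("record size %d bytes: %d records per poll, %d s between polls%n",
                    size,
                    recordsPerPoll(budget, size),
                    gapBetweenPolls(budget, size, perRecord).getSeconds());
        }
    }
}
```

Under these assumptions a batch of 300-byte records keeps the consumer away from poll for about 58 hours, while a single 3 MB record takes 20 seconds, so no single session timeout fits both cases.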

*Proposed fixes:* I do not claim to be a Kafka expert, but two ideas are to 
either
 - add a `KafkaConsumer#healthy` call to let the broker know we are still 
processing records; or
 - add an upper limit on the number of messages returned by 
`KafkaConsumer#poll`. I am thinking of something like 
`KafkaConsumer#poll(timeout, nMaxMessages)`.
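
As a rough illustration of the second idea, the sketch below models the semantics I have in mind. It is a toy, not the Kafka client: `BoundedPoller`, the `fetch` supplier and the `heartbeat` hook are all hypothetical names. Each call sends a heartbeat, refills an internal buffer only when it is empty, and hands back at most `maxRecords` records.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

/** Toy model of a poll call that caps the number of records returned (hypothetical API). */
public class BoundedPoller<T> {
    private final Supplier<List<T>> fetch; // stands in for the byte-limited prefetch
    private final Runnable heartbeat;      // stands in for the heartbeat sent to the broker
    private final Deque<T> buffer = new ArrayDeque<>();

    public BoundedPoller(Supplier<List<T>> fetch, Runnable heartbeat) {
        this.fetch = fetch;
        this.heartbeat = heartbeat;
    }

    /** Returns at most maxRecords records; every call counts as a sign of life. */
    public List<T> poll(int maxRecords) {
        if (maxRecords < 1) {
            throw new IllegalArgumentException("maxRecords must be at least 1");
        }
        heartbeat.run();
        if (buffer.isEmpty()) {
            buffer.addAll(fetch.get()); // prefetch a new batch only when drained
        }
        List<T> out = new ArrayList<>();
        while (out.size() < maxRecords && !buffer.isEmpty()) {
            out.add(buffer.poll());
        }
        return out;
    }
}
```

With a cap of one record per call and 20 seconds per record, the gap between heartbeats would stay near 20 seconds regardless of how many small records a single prefetch happened to contain.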

*Workaround:* Use different topics for different message sizes. This makes the 
partition prefetch size easier to tune.

*Questions:* Should Kafka be able to handle this case? Maybe I am using the 
wrong tool for this and Kafka is simply designed for high-throughput, 
low-latency workloads?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
