Hi,

I was wondering whether there is any difference in the memory footprint of
a high-level consumer when:

1. the consumer is live and continuously consuming messages with no backlog
2. the consumer has been down for quite some time and needs to be brought up
to clear the backlog.

My test case with Kafka 0.8.2.1 uses a single topic, set up as follows (the
consumer side is sketched after the list):

Setup: 6 brokers and 3 ZooKeeper nodes
Message size: 1 MB
Producer load: 100 threads, 1000 messages per thread
No. of partitions in the topic: 100
Consumer threads: 100, all in the same consumer group
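
For context, the consumer side looks roughly like the sketch below. This is
only a minimal outline of the 0.8 high-level consumer API as I am using it:
the topic name, group id and ZooKeeper connect string are placeholders, and
the per-thread work is reduced to just draining the stream.

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class BacklogConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zk1:2181,zk2:2181,zk3:2181"); // placeholder
        props.put("group.id", "load-test-group");                     // placeholder
        // fetch.message.max.bytes and queued.max.message.chunks are left at
        // their defaults, as in the test described below.
        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // Ask for 100 streams on the 100-partition topic, one per consumer thread.
        final String topic = "test-topic"; // placeholder
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap(topic, 100));

        ExecutorService pool = Executors.newFixedThreadPool(100);
        for (final KafkaStream<byte[], byte[]> stream : streams.get(topic)) {
            pool.submit(new Runnable() {
                public void run() {
                    ConsumerIterator<byte[], byte[]> it = stream.iterator();
                    while (it.hasNext()) {
                        byte[] message = it.next().message(); // ~1 MB payload
                        // ... process the message ...
                    }
                }
            });
        }
    }
}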

I initially started the producer and consumer in the same Java process with a
1 GB heap. The producer was able to send all the messages to the brokers, but
the consumer started throwing OutOfMemoryError after consuming about 26k
messages.

After restarting the process with a 5 GB heap, the consumer got through around
4.8k messages before going OOM again (while clearing a backlog of around 74k).
The remaining messages were consumed only after I bumped the heap up to 10 GB.

On the consumer side, I am using the default values for fetch.message.max.bytes
and queued.max.message.chunks.

If the estimate
(fetch.message.max.bytes) * (queued.max.message.chunks) * (no. of consumer
threads) holds for the consumer, then 1024 * 1024 * 10 * 100 (close to 1 GB)
is well below the 5 GB heap allocated. Did I leave something out of this
calculation?
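
Spelled out, the back-of-the-envelope estimate I am working from is the sketch
below. The class and variable names are just for illustration; the values are
the ones quoted above.

public class ConsumerMemoryEstimate {
    public static void main(String[] args) {
        long fetchMessageMaxBytes = 1024L * 1024L; // fetch.message.max.bytes (1 MB)
        long queuedMaxMessageChunks = 10;          // queued.max.message.chunks (value used above)
        long consumerThreads = 100;                // consumer threads / streams in the group

        long estimatedBytes =
                fetchMessageMaxBytes * queuedMaxMessageChunks * consumerThreads;

        // Prints 1048576000 bytes (1000 MB, i.e. roughly 1 GB),
        // which is well below the 5 GB heap.
        System.out.println(estimatedBytes + " bytes = "
                + estimatedBytes / (1024 * 1024) + " MB");
    }
}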


Regards,
Kris
