Jay Kreps created KAFKA-1955:
--------------------------------

             Summary: Explore disk-based buffering in new Kafka Producer
                 Key: KAFKA-1955
                 URL: https://issues.apache.org/jira/browse/KAFKA-1955
             Project: Kafka
          Issue Type: Improvement
          Components: producer 
    Affects Versions: 0.8.2.0
            Reporter: Jay Kreps


There are two approaches to using Kafka for capturing event data that has no 
other "source of truth store":
1. Just write to Kafka and try hard to keep the Kafka cluster up as you would a 
database.
2. Write to some kind of local disk store and copy from that to Kafka.

The cons of the second approach are the following:
1. You end up depending on disks on all the producer machines. If you have 
10000 producers, that is 10k places state is kept. These tend to fail a lot.
2. You can get data arbitrarily delayed
3. You still don't tolerate hard outages since there is no replication in the 
producer tier
4. This tends to make problems with duplicates more common in certain failure 
scenarios.

There is one big pro, though: you don't have to keep Kafka running all the time.

So far we have done nothing in Kafka to help support approach (2), but people 
have built a lot of buffering things. It's not clear that this is necessarily 
bad.

However implementing this in the new Kafka producer might actually be quite 
easy. Here is an idea for how to do it. Implementation of this idea is probably 
pretty easy but it would require some pretty thorough testing to see if it was 
a success.

The new producer maintains a pool of ByteBuffer instances which it attempts to 
recycle and uses to buffer and send messages. When unsent data is queuing 
waiting to be sent to the cluster it is hanging out in this pool.

One approach to implementing a disk-baked buffer would be to slightly 
generalize this so that the buffer pool has the option to use a mmap'd file 
backend for it's ByteBuffers. When the BufferPool was created with a 
totalMemory setting of 1GB it would preallocate a 1GB sparse file and memory 
map it, then chop the file into batchSize MappedByteBuffer pieces and populate 
it's buffer with those.

Everything else would work normally except now all the buffered data would be 
disk backed and in cases where there was significant backlog these would start 
to fill up and page out.

We currently allow messages larger than batchSize and to handle these we do a 
one-off allocation of the necessary size. We would have to disallow this when 
running in mmap mode. However since the disk buffer will be really big this 
should not be a significant limitation as the batch size can be pretty big.

We would want to ensure that the pooling always gives out the most recently 
used ByteBuffer (I think it does). This way under normal operation where 
requests are processed quickly a given buffer would be reused many times before 
any physical disk write activity occurred.

Note that although this let's the producer buffer very large amounts of data 
the buffer isn't really fault-tolerant, since the ordering in the file isn't 
known so there is no easy way to recovery the producer's buffer in a failure. 
So the scope of this feature would just be to provide a bigger buffer for short 
outages or latency spikes in the Kafka cluster during which you would hope you 
don't also experience failures in your producer processes.

To complete the feature we would need to:
a. Get some unit tests that would cover disk-backed usage
b. Do some manual performance testing of this usage and understand the impact 
on throughput.
c. Do some manual testing of failure cases (i.e. if the broker goes down for 30 
seconds we should be able to keep taking writes) and observe how well the 
producer handles the catch up time when it has a large backlog to get rid of.
d. Add a new configuration for the producer to enable this, something like 
use.file.buffers=true/false.
e. Add documentation that covers these new options.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to