[ https://issues.apache.org/jira/browse/KAFKA-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014550#comment-16014550 ]
Brandon Bradley commented on KAFKA-1955:
----------------------------------------

The rebased patch applies cleanly to 68ad80f8. I'm trying to get it updated to trunk and submit a proper pull request.

> Explore disk-based buffering in new Kafka Producer
> --------------------------------------------------
>
>                 Key: KAFKA-1955
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1955
>             Project: Kafka
>          Issue Type: Improvement
>          Components: producer
>    Affects Versions: 0.8.2.0
>            Reporter: Jay Kreps
>            Assignee: Jay Kreps
>         Attachments: KAFKA-1955.patch, KAFKA-1955-RABASED-TO-8th-AUG-2015.patch
>
> There are two approaches to using Kafka for capturing event data that has no other "source of truth" store:
> 1. Just write to Kafka and try hard to keep the Kafka cluster up as you would a database.
> 2. Write to some kind of local disk store and copy from that to Kafka.
> The cons of the second approach are the following:
> 1. You end up depending on disks on all the producer machines. If you have 10000 producers, that is 10k places state is kept. These tend to fail a lot.
> 2. You can get data arbitrarily delayed.
> 3. You still don't tolerate hard outages, since there is no replication in the producer tier.
> 4. This tends to make problems with duplicates more common in certain failure scenarios.
> There is one big pro, though: you don't have to keep Kafka running all the time.
> So far we have done nothing in Kafka to help support approach (2), but people have built a lot of buffering things. It's not clear that this is necessarily bad.
> However, implementing this in the new Kafka producer might actually be quite easy. Here is an idea for how to do it. Implementation of this idea is probably pretty easy, but it would require some pretty thorough testing to see if it was a success.
> The new producer maintains a pool of ByteBuffer instances which it attempts to recycle and uses to buffer and send messages. When unsent data is queuing, waiting to be sent to the cluster, it is hanging out in this pool.
> One approach to implementing a disk-backed buffer would be to slightly generalize this so that the buffer pool has the option to use a mmap'd file backend for its ByteBuffers. When the BufferPool was created with a totalMemory setting of 1GB, it would preallocate a 1GB sparse file and memory-map it, then chop the file into batchSize MappedByteBuffer pieces and populate its buffer with those.
> Everything else would work normally, except now all the buffered data would be disk backed, and in cases where there was significant backlog these would start to fill up and page out.
> We currently allow messages larger than batchSize, and to handle these we do a one-off allocation of the necessary size. We would have to disallow this when running in mmap mode. However, since the disk buffer will be really big, this should not be a significant limitation as the batch size can be pretty big.
> We would want to ensure that the pooling always gives out the most recently used ByteBuffer (I think it does). This way, under normal operation where requests are processed quickly, a given buffer would be reused many times before any physical disk write activity occurred.
> Note that although this lets the producer buffer very large amounts of data, the buffer isn't really fault-tolerant, since the ordering in the file isn't known, so there is no easy way to recover the producer's buffer in a failure.
> So the scope of this feature would just be to provide a bigger buffer for short outages or latency spikes in the Kafka cluster, during which you would hope you don't also experience failures in your producer processes.
> To complete the feature we would need to:
> a. Get some unit tests that would cover disk-backed usage.
> b. Do some manual performance testing of this usage and understand the impact on throughput.
> c. Do some manual testing of failure cases (i.e. if the broker goes down for 30 seconds we should be able to keep taking writes) and observe how well the producer handles the catch-up time when it has a large backlog to get rid of.
> d. Add a new configuration for the producer to enable this, something like use.file.buffers=true/false.
> e. Add documentation that covers these new options.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
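
For illustration only, here is a rough Java sketch of the mmap'd buffer-pool idea from the quoted description: preallocate a sparse file of totalMemory bytes, memory-map it, slice the mapping into batchSize MappedByteBuffer pieces, and hand buffers out LIFO so hot pages rarely touch disk under normal operation. The class name FileBackedBufferPool and its method shapes are assumptions for this sketch, not the producer's actual BufferPool, and it deliberately skips details the real pool has to handle (e.g. blocking when memory is exhausted, oversized one-off allocations).

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of a file-backed buffer pool: preallocate a sparse file
 * of totalMemory bytes, mmap it, and chop the mapping into batchSize pieces.
 * Not the producer's real BufferPool; names and structure are assumptions.
 */
public class FileBackedBufferPool {

    // LIFO free list: handing out the most recently returned buffer keeps the
    // hot pages resident, so little physical disk I/O happens while the
    // producer is draining quickly.
    private final Deque<MappedByteBuffer> free = new ArrayDeque<>();

    public FileBackedBufferPool(String path, long totalMemory, int batchSize) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
            file.setLength(totalMemory); // sparse on most filesystems
            FileChannel channel = file.getChannel();
            for (long offset = 0; offset + batchSize <= totalMemory; offset += batchSize) {
                free.addLast(channel.map(FileChannel.MapMode.READ_WRITE, offset, batchSize));
            }
            // The mappings remain valid after the channel is closed.
        }
    }

    /** Hand out one batchSize buffer; larger one-off allocations would be disallowed in this mode. */
    public synchronized MappedByteBuffer allocate() {
        MappedByteBuffer buffer = free.pollLast();
        if (buffer == null) {
            // The real producer would block (bounded by max.block.ms) rather than fail here.
            throw new IllegalStateException("buffer pool exhausted");
        }
        buffer.clear();
        return buffer;
    }

    /** Return a buffer to the free list so it can be recycled. */
    public synchronized void deallocate(MappedByteBuffer buffer) {
        free.addLast(buffer);
    }
}

If the use.file.buffers=true/false configuration proposed in item (d) were added, the producer would presumably construct something along these lines in place of its heap-backed pool; everything downstream of allocation would be unchanged, which is what keeps the idea small.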