Kafka looks like an exciting project, thanks for opening it up. I have a few questions:
1. Are checksums end to end (i.e., created by the producer and verified by the consumer), or are they only used to confirm buffer-cache behavior on disk, as mentioned in the documentation? Bit errors occur far more often than most people assume, often because of device driver bugs, and TCP's 16-bit checksum fails to detect roughly 1 in 65,536 corrupted segments, so errors can flow through. (If you like, I can send links to papers describing the need for checksums everywhere.)

2. The consumer has a pretty solid mechanism to ensure it hasn't missed any messages (I like the design, by the way), but how does the producer know that all of its messages have been stored? There is no apparent message id on that side, since the message id isn't known until the message is written to the file. I'm especially curious how failover/replication could be implemented, and I'm thinking that acks on the publisher side may help.

3. Has the consumer's flow control been tested over high bandwidth*delay links? (What bandwidth can a London consumer get from an SF cluster?)

4. What kind of performance do you get if you set the producer's message delay to zero? That is, is there a separate system call for each message, or do you manage to aggregate messages into a smaller number of system calls even with a delay of 0?

5. Have you considered using a library like ZeroMQ for the messaging layer instead of rolling your own? ZeroMQ handles the aggregation in #4 cleanly at millions of messages per second and has clients in 20 languages.

6. Do you have any plans to support intermediate processing elements the way Flume does?

7. The docs mention that new versions will only be released after they are in production at LinkedIn. Does that mean that the latest version of the source code is hidden at LinkedIn, and contributors would have to throw patches over the wall and wait months to get the integrated product?

Thanks!
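P.S. To make question 1 concrete, here is a minimal sketch of what I mean by an end-to-end checksum: the producer computes a CRC32 over the payload and the consumer re-verifies it after the bytes have crossed the network and disk. The 4-byte framing below is purely illustrative, not Kafka's actual wire format.

```python
import struct
import zlib


def produce(payload: bytes) -> bytes:
    """Producer side: prepend a CRC32 computed over the payload.
    (Hypothetical 4-byte big-endian framing, for illustration only.)"""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return struct.pack(">I", crc) + payload


def consume(message: bytes) -> bytes:
    """Consumer side: recompute the CRC32 and reject corrupted messages,
    catching bit flips introduced by drivers, disks, or the network."""
    (crc,) = struct.unpack(">I", message[:4])
    payload = message[4:]
    if zlib.crc32(payload) & 0xFFFFFFFF != crc:
        raise ValueError("checksum mismatch: message corrupted in transit")
    return payload


msg = produce(b"hello kafka")
assert consume(msg) == b"hello kafka"

# A single flipped bit anywhere along the path is detected end to end:
corrupted = msg[:-1] + bytes([msg[-1] ^ 0x01])
try:
    consume(corrupted)
    detected = False
except ValueError:
    detected = True
assert detected
```

The point is that the same checksum travels with the message from producer to consumer, so a broker- or kernel-side verification alone would not cover the full path.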
