Thanks for the response, Andrew, I appreciate the help!

Just a few thoughts that came up while reading your points:


  1.  In theory, Redis also handles and stores data in memory, which makes me 
wonder why it is that Kafka does it better. Perhaps it has to do with the API 
contract, where, as you said, there's no complex transactional software that 
might hurt performance.
  2.  I didn't know there was such a big difference between linear and random 
writes, pretty awesome! But I still don't understand how using the disk, even 
with linear writes, still allows 2 to 3x the throughput of Redis, which 
doesn't read from or write to disk at all and keeps messages stored in 
memory.
  3.  I didn't know about this zero-copy technique; I'll read more about it, 
but it feels like the result would be similar to Kafka having the data stored 
in memory (as Redis does), which still makes me wonder how it is that Kafka 
can handle a higher throughput if the "design" is so similar.


________________________________
From: Andrew Grant <andrewgrant...@gmail.com>
Sent: Thursday, October 14, 2021 15:55
To: dev@kafka.apache.org <dev@kafka.apache.org>
Subject: Re: Why does Kafka have a higher throughput than Redis?

Hi Vitor,

I'm not an expert and probably some more knowledgeable folks can also chime
in (and correct me) but a few things came to mind:

1) On the write side (i.e. when using the producer), Kafka does not flush
data to disk by default. It writes to the page cache, so all writes are, in
a sense, in-memory: they're staged in the page cache and the kernel flushes
the data asynchronously. Also, the API contract for Kafka is quite "simple"
in that it mostly reads and writes arbitrary sequences of bytes - there
isn't as much complex transactional software in front of the writing/reading
that might hurt performance compared to some other data stores. Note, Kafka
does provide things like idempotence and transactions, so it's not like
there is never any overhead to consider.
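To make that concrete, here's a tiny Python sketch (an illustration only, not
Kafka code) of the gap between writing into the page cache and forcing the
data to disk with fsync:

```python
import os
import tempfile
import time

# A plain write() only copies bytes into the kernel page cache and returns;
# os.fsync() is what actually forces them to physical storage.
path = os.path.join(tempfile.mkdtemp(), "segment.log")
payload = b"x" * (1024 * 1024)  # 1 MiB stand-in for a record batch

with open(path, "wb") as f:
    t0 = time.perf_counter()
    f.write(payload)            # lands in the page cache, returns quickly
    cached = time.perf_counter() - t0

    t0 = time.perf_counter()
    f.flush()
    os.fsync(f.fileno())        # now the kernel must persist it
    synced = time.perf_counter() - t0

print(f"write-to-cache: {cached:.6f}s  fsync-to-disk: {synced:.6f}s")
```

On most machines the fsync step dominates, which is why deferring it to the
kernel (as Kafka does by default) keeps the write path fast.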

2) Kafka reads and writes are conducive to being linear which helps a lot
with performance. Random writes are a lot slower than linear ones.
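A rough way to see this for yourself (a sketch, not a rigorous benchmark:
both paths below still go through the page cache, and the gap is far larger
on spinning disks or when each write is fsynced):

```python
import os
import random
import tempfile
import time

BLOCK = 4096
COUNT = 2048  # 8 MiB total

def sequential_write(path):
    # Append blocks one after another, front to back.
    with open(path, "wb") as f:
        t0 = time.perf_counter()
        for _ in range(COUNT):
            f.write(b"a" * BLOCK)
        return time.perf_counter() - t0

def random_write(path):
    # Pre-size the file, then write the same blocks in shuffled order,
    # seeking before every write.
    with open(path, "wb") as f:
        f.truncate(BLOCK * COUNT)
        offsets = [i * BLOCK for i in range(COUNT)]
        random.shuffle(offsets)
        t0 = time.perf_counter()
        for off in offsets:
            f.seek(off)
            f.write(b"a" * BLOCK)
        return time.perf_counter() - t0

d = tempfile.mkdtemp()
seq = sequential_write(os.path.join(d, "seq.dat"))
rnd = random_write(os.path.join(d, "rnd.dat"))
print(f"sequential: {seq:.4f}s  random: {rnd:.4f}s")
```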

3) For reading data (i.e. when using the consumer), Kafka uses a zero-copy
technique in which data is sent directly from the page cache to the network
buffer without going through user space, which helps a lot.

4) Kafka batches aggressively.
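For context, batching is tunable on the producer. These are the stock
producer config names (the values shown are just illustrative, not
recommendations):

```properties
# Wait up to 10 ms to fill a batch before sending (default 0)
linger.ms=10
# Max bytes per partition batch (default 16384)
batch.size=65536
# Compress whole batches at once, which amplifies the batching win
compression.type=lz4
```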

Here are two resources which might provide more information:
https://docs.confluent.io/platform/current/kafka/design.html
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

Hope this helps a bit.

Andrew

On Thu, Oct 14, 2021 at 1:11 PM Vitor Augusto de Medeiros <
v.medei...@aluno.ufabc.edu.br> wrote:

> Hi everyone,
>
>  i'm doing a benchmark comparison between Kafka and Redis for my final
> bachelor paper and would like to understand more about why Kafka have
> higher throughput if compared to Redis.
>
>  I noticed Redis has lower overall latency (and makes sense since it's
> stored in memory) but cant figure out the difference in throughput.
>
> I found a study (not sure if I can post links here, but it's titled A
> COMPARISON OF DATA INGESTION PLATFORMS IN REAL-TIME STREAM PROCESSING
> PIPELINES by Sebastian Tallberg)
> showing Kafka's throughput hitting 3x Redis's msg/s rate for a 1 kB
> payload. I would like to understand what in Kafka's architecture allows
> it to be so much faster than other message brokers, and Redis in
> particular.
>
> Thanks!
>


--
Andrew Grant
8054482621
