Hey Sean,

Thanks for the interest. We haven't done anything rigorous on the 
performance-testing side of things.

Jay has a small perf test that exercises a few things, and it reaches roughly 
280k messages/sec, but that number is pretty meaningless without context. He 
can probably say more about what the perf test does, how big the messages are, 
whether it hits the Kafka broker, etc.

As far as upcoming perf work goes, the big thing is eliminating some concurrent 
data structures (queues/maps). HProf shows that a large share (>20%) of our CPU 
cycles go there. This can be done once Kafka's consumer API has been cleaned 
up a bit, which is a work in progress 
(https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite).

The theoretical max throughput we could achieve when using Kafka with Samza 
would be something along the lines of the numbers in Kafka's consumer/producer 
performance tests, but I'm sure we're not near that (yet). See the grid at the 
bottom of the page here: 
https://cwiki.apache.org/confluence/display/KAFKA/Performance+testing

Our largest job is currently processing about 13 megs/sec at peak, spread 
across 5 containers, but the need for 5 containers has more to do with memory 
requirements than throughput requirements at this point.
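
For a rough sense of scale, here is a back-of-the-envelope sketch in Python. 
It assumes "megs" means megabytes and that load is spread evenly across 
containers, and it uses a hypothetical 100-byte message size for the perf 
test, since the real size isn't stated here. The function names are made up 
for illustration and are not part of Samza or Kafka.

```python
def per_container_throughput(total_mb_per_sec: float, containers: int) -> float:
    """Average MB/sec per container, assuming an even spread of load."""
    if containers <= 0:
        raise ValueError("containers must be positive")
    return total_mb_per_sec / containers


def messages_to_mb_per_sec(messages_per_sec: float, message_bytes: int) -> float:
    """Convert a message rate to MB/sec for an assumed message size."""
    if message_bytes <= 0:
        raise ValueError("message_bytes must be positive")
    return messages_per_sec * message_bytes / 1_000_000


if __name__ == "__main__":
    # Figures from this email: ~13 MB/sec peak across 5 containers.
    print(per_container_throughput(13, 5))

    # ~280k messages/sec from the perf test; 100 bytes is an ASSUMED size.
    print(messages_to_mb_per_sec(280_000, 100))
```

The second number moves linearly with the assumed message size, which is why 
a messages/sec figure on its own doesn't say much.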

I'm sorry I can't be more specific; it's just been "fast enough" so far. This 
is something we should take seriously. I've opened a JIRA to track the 
creation of a performance test suite:

    https://issues.apache.org/jira/browse/SAMZA-6

Feel free to add yourself as a watcher to keep tabs on progress.

Cheers,
Chris
________________________________________
From: Sean Zhong(clockfly) [[email protected]]
Sent: Sunday, August 11, 2013 8:08 PM
To: [email protected]
Subject: About SAMZA performance

Hi, SAMZA Developers,

Have you done a performance comparison on SAMZA, including throughput and
latency?

I am very curious to see the performance difference compared with Storm or
Spark Streaming.

Sean
