Hey Sean,

Thanks for the interest. We haven't done anything rigorous on the performance-testing side of things. Jay has a little perf test to exercise a few things, and it gets in the 280k messages/sec range, but that's a pretty meaningless statement. He can probably speak more about what the perf test does, how big the messages are, whether it hits the Kafka broker, etc.

As far as upcoming perf work goes, the big thing is eliminating some concurrent data structures (queues/maps). HProf shows that this is where most (>20%) of our CPU cycles go. This can be done once Kafka's consumer API has been cleaned up a bit, which is a work in progress (https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite).

The theoretical max throughput we could achieve when using Kafka with Samza would be something along the lines of the numbers in Kafka's consumer/producer performance tests, but I'm sure we're not near that (yet). See the grid at the bottom of the page here:

https://cwiki.apache.org/confluence/display/KAFKA/Performance+testing

Our largest job is currently processing about 13 megs/sec at peak, spread across 5 containers, but the need for 5 containers has more to do with memory requirements than throughput requirements at this point.

I'm sorry I can't be more specific; it's just been "fast enough" so far. This is something we should take seriously. I've opened up a JIRA to track the creation of a performance test suite:

https://issues.apache.org/jira/browse/SAMZA-6

Feel free to add yourself as a watcher to keep tabs on progress.

Cheers,
Chris
________________________________________
From: Sean Zhong(clockfly) [[email protected]]
Sent: Sunday, August 11, 2013 8:08 PM
To: [email protected]
Subject: About SAMZA performance

Hi, SAMZA Developers,

Have you done a performance comparison on SAMZA, including throughput and latency?

I am very curious to see the performance difference compared with Storm or Spark Streaming.

Sean
