Hey Steve,

I'd be very interested in hearing what you discover. Most performance-related knowledge that I have is about tuning Kafka to go fast. :)

As far as implementation goes, I think you'll need to implement a SystemConsumer, SystemProducer, SystemAdmin, and SystemFactory in order to fully support direct memory.

The main problem with "swapping" out Kafka is that you're going to lose some of Samza's guarantees. Samza depends a lot on the guarantees of the underlying streaming system for things like:

* Message ordering.
* At-least-once messaging.
* Replayability (offsets).
* Fault tolerance (replication).

If your direct memory implementation doesn't provide some of these features, then neither can Samza. That may be fine, or that may be unsatisfactory for your use case. Samza will work without these features, but makes no effort to provide them itself. This means that if, for example, your direct memory implementation isn't replayable, then your offset checkpoints are useless in Samza and will be disregarded (you'll always start consuming from wherever the direct memory SystemConsumer implementation decides to start).

Cheers,
Chris

On 9/16/14 4:21 AM, "Steven Yates" <[email protected]> wrote:

>Hi devs, I am looking to get as much performance out of Samza as possible
>and am interested in looking at what effect a direct memory approach has
>on performance and whether frameworks like Kafka can be swapped out for a
>more direct off-heap approach. I am trialling this implementation now in
>my local env; however, I don't have exact metrics yet. I was wondering if
>you guys had any further thoughts on this?
>
>-Steve
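
To make the replayability point in the reply above concrete, here is a minimal Java sketch. It does not use the Samza API: `InMemoryPartitionLog`, `DrainingBuffer`, and `ReplayDemo` are hypothetical names invented for illustration, plain heap collections stand in for a direct memory store, and the checkpoint is simplified to "the next offset to read". It only contrasts a store that can honor a saved offset with one that cannot.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: simulates a crash after a checkpoint and a restart.
final class ReplayDemo {
    public static void main(String[] args) {
        InMemoryPartitionLog log = new InMemoryPartitionLog();
        DrainingBuffer buffer = new DrainingBuffer();
        for (String m : new String[] {"a", "b", "c"}) {
            log.append(m);
            buffer.append(m);
        }

        // First run: both stores hand over a, b, c, but the task only
        // checkpoints after processing "a" and then crashes.
        log.readFrom(0);
        buffer.readFrom(0);
        long checkpoint = 1; // simplification: the next offset to read

        // Restart from the checkpoint.
        System.out.println("replayable log resumes with: " + log.readFrom(checkpoint));
        System.out.println("draining buffer resumes with: " + buffer.readFrom(checkpoint));
    }
}

// Hypothetical replayable store: messages are retained and addressed by
// offset (here, the list index).
final class InMemoryPartitionLog {
    private final List<String> messages = new ArrayList<>();

    // Appends a message and returns the offset assigned to it.
    synchronized long append(String message) {
        messages.add(message);
        return messages.size() - 1L;
    }

    // Returns all messages at or after startOffset, in append order.
    synchronized List<String> readFrom(long startOffset) {
        int from = (int) Math.max(0L, Math.min(startOffset, (long) messages.size()));
        return new ArrayList<>(messages.subList(from, messages.size()));
    }
}

// Hypothetical non-replayable store: a read hands over whatever is queued
// and forgets it, so a saved offset cannot be honored.
final class DrainingBuffer {
    private final ArrayDeque<String> queue = new ArrayDeque<>();

    synchronized void append(String message) {
        queue.add(message);
    }

    // The requested offset is ignored; already-consumed messages are gone.
    synchronized List<String> readFrom(long ignoredOffset) {
        List<String> out = new ArrayList<>(queue);
        queue.clear();
        return out;
    }
}
```

In this sketch the restarted read on the log returns [b, c], while the draining buffer returns an empty list, so "b" and "c" are never reprocessed. That is the sense in which a checkpointed offset is useless when the underlying system cannot replay.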
