This refers to using Kafka as the place where we deposit messages after one job has processed them and (potentially) before the next job processes them. Since Kafka writes all messages to disk and we (generally) write all messages to Kafka, Kafka is our buffer to disk. The statement could be more explicit that this holds when using Kafka, and not necessarily with other producers or consumers.
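
As a rough illustration of the idea above, here is a toy sketch in Python. It is not Samza or Kafka code: `DiskLog`, `read_from`, and `run_demo` are hypothetical names made up for this example, and a plain file stands in for a Kafka topic partition. The upstream job appends every message to the file, and the downstream job reads from its own offset whenever it gets a chance to run.

```python
import json
import os
import tempfile


class DiskLog:
    """Toy append-only log on disk; a stand-in for one Kafka topic partition."""

    def __init__(self, path):
        self.path = path
        open(path, "a").close()  # create the file if it does not exist yet

    def append(self, message):
        # Every message goes to disk, one JSON document per line.
        with open(self.path, "a") as f:
            f.write(json.dumps(message) + "\n")

    def read_from(self, offset, max_messages):
        # Return up to max_messages, starting at the given line offset.
        with open(self.path) as f:
            lines = f.readlines()
        return [json.loads(line) for line in lines[offset:offset + max_messages]]


def run_demo():
    log = DiskLog(os.path.join(tempfile.mkdtemp(), "topic.log"))
    offset = 0  # the downstream job's position in the log
    processed = []

    for i in range(10):
        log.append({"id": i})  # upstream job emits one message per tick
        if i % 3 == 2:  # downstream job only runs every third tick
            batch = log.read_from(offset, max_messages=2)
            processed.extend(m["id"] for m in batch)
            offset += len(batch)

    backlog = 10 - offset  # messages still sitting on disk, unread

    # The downstream job catches up later; nothing was held in memory meanwhile.
    while True:
        batch = log.read_from(offset, max_messages=2)
        if not batch:
            break
        processed.extend(m["id"] for m in batch)
        offset += len(batch)

    return backlog, processed


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the downstream loop falls behind the upstream one, and the unread messages simply stay in the file until the reader catches up.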
This approach is in contrast to other systems that try to keep messages in memory before passing them to another processor.

-jg

On Tue, Jul 8, 2014 at 9:19 AM, Yan Fang <[email protected]> wrote:

> I was a little confused by the statement "Samza takes a different approach
> to buffering. We buffer to disk at every hop between a StreamTask.
> <http://samza.incubator.apache.org/learn/documentation/0.7.0/comparisons/storm.html>".
>
> What does "buffer to disk" mean here? I actually do not get how we deal
> with the situation when the processing is slower than receiving messages.
> Thank you.
>
> Cheers,
>
> Fang, Yan
> [email protected]
> +1 (206) 849-4108
