On Tue, May 8, 2012 at 10:14 AM, S Ahmed <[email protected]> wrote:
> Greetings,
>
> Just looking over the code a bit, and I really appreciate the level of
> comments in the code!
>
> I am interested in learning how the generic design works when it comes to
> this (with my assumptions, please correct me where appropriate):
>
> 1. When data is being stored in-memory, it is stored in some sort of
> collection like a ConcurrentHashMap. So this in-memory structure gets
> appended to until a certain criterion is met (time based, # of items, size
> of data), then it gets flushed/sinked to one of the many implementations.
There are two implementations of storing events in memory. There is
MemoryChannel, which uses a LinkedBlockingDeque:

https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/channel/MemoryChannel.java

And there is FileChannel, which uses a circular array:

https://github.com/apache/flume/blob/trunk/flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java

In both cases the Channel only stores data until a sink takes the data.
Sinks/Sources implement their own batching.

>
> 2. How does this collection get sinked all the while accepting new data. I
> also am guessing that this process is abstracted, so future implementations
> can just borrow on this functionality and not have to worry about
> concurrency issues.

Sinks/Sources use a Transaction to take data off and put data on the
Channel. They don't have to worry about which channel they are using. If
you were writing a Channel, you'd have to worry about how to handle the
problems described above.

Brock

-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
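To make the Transaction idea above a bit more concrete, here is a toy
sketch in Java. It is not Flume's code: ToyChannel and ToyTransaction are
made-up names, events are plain Strings, and capacity limits and
per-thread transaction handling are left out. It only assumes a queue
backed by a LinkedBlockingDeque, with puts that become visible on commit
and takes that are handed back on rollback.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.LinkedBlockingDeque;

/** Hypothetical stand-in for a channel; illustration only, not Flume's MemoryChannel. */
class ToyChannel {
    // Unbounded deque, so committing puts and restoring takes never fails for lack of space.
    private final LinkedBlockingDeque<String> queue = new LinkedBlockingDeque<>();

    ToyTransaction newTransaction() {
        return new ToyTransaction();
    }

    int size() {
        return queue.size();
    }

    /** Hypothetical transaction: buffers puts and remembers takes until commit or rollback. */
    class ToyTransaction {
        private final List<String> puts = new ArrayList<>();
        private final Deque<String> takes = new ArrayDeque<>();

        /** Stage an event; it is not visible on the channel until commit(). */
        void put(String event) {
            puts.add(event);
        }

        /** Remove the head event (or return null if empty) and remember it for rollback. */
        String take() {
            String event = queue.pollFirst();
            if (event != null) {
                takes.addLast(event);
            }
            return event;
        }

        /** Publish staged puts at the tail and forget the takes for good. */
        void commit() {
            queue.addAll(puts);
            puts.clear();
            takes.clear();
        }

        /** Drop staged puts and push taken events back onto the head in their original order. */
        void rollback() {
            puts.clear();
            while (!takes.isEmpty()) {
                queue.addFirst(takes.pollLast());
            }
        }
    }

    public static void main(String[] args) {
        ToyChannel channel = new ToyChannel();

        // A "source" puts a batch and commits it.
        ToyTransaction sourceTx = channel.newTransaction();
        sourceTx.put("event-1");
        sourceTx.put("event-2");
        sourceTx.commit();

        // A "sink" takes an event, fails to deliver it, and rolls back.
        ToyTransaction sinkTx = channel.newTransaction();
        sinkTx.take();
        sinkTx.rollback();

        System.out.println("events still on channel: " + channel.size());
    }
}
```

The source and sink code only touch put/take/commit/rollback, which is
why they do not need to know which channel is underneath. A real channel
also has to deal with capacity and with several sources and sinks running
transactions at once, which this sketch ignores.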
