Hi All,

Most streaming systems have built-in support for batching, since it often offers major throughput benefits.
I'm a little confused about the state of batching in Flume today. It looks like a ChannelProcessor can process a batch of events within one transaction, but internally this just calls Channel.put() several times. As far as I can tell, both of the durable channels (JDBC and File) actually flush to disk in some fashion on every doPut().

It seems to me that it would make sense to buffer all of those puts in memory and only flush once per transaction. Otherwise, isn't the benefit of batching put()s within a transaction lost?

I think I might be missing something here; any pointers are appreciated.

- Patrick
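P.S. To make the cost concrete, here is a toy sketch (these are invented classes for illustration, not Flume's actual Channel/Transaction implementation) of how flushing inside every put() multiplies disk flushes compared to buffering until commit:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchFlushDemo {
    /** Toy durable channel; flushPerPut mimics flushing inside each doPut(). */
    static class ToyDurableChannel {
        private final boolean flushPerPut;
        private final List<String> buffer = new ArrayList<>();
        int flushCount = 0;  // stand-in for fsync/disk-flush calls

        ToyDurableChannel(boolean flushPerPut) { this.flushPerPut = flushPerPut; }

        void put(String event) {
            buffer.add(event);
            if (flushPerPut) flush();   // flush on every put()
        }

        void commit() {                 // transaction boundary
            if (!flushPerPut) flush();  // buffered: one flush per transaction
            buffer.clear();
        }

        private void flush() { flushCount++; }
    }

    public static void main(String[] args) {
        ToyDurableChannel perPut = new ToyDurableChannel(true);
        ToyDurableChannel buffered = new ToyDurableChannel(false);
        // one 100-event transaction on each channel
        for (int i = 0; i < 100; i++) {
            perPut.put("e" + i);
            buffered.put("e" + i);
        }
        perPut.commit();
        buffered.commit();
        System.out.println("per-put flushes:  " + perPut.flushCount);   // 100
        System.out.println("buffered flushes: " + buffered.flushCount); // 1
    }
}
```

If each flush carries a disk-sync cost, the per-put strategy pays it 100 times for a 100-event transaction, which is exactly the overhead batching is supposed to amortize.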
