Hi everyone, I'm using the Flume-NG code in a system for collecting usage data via tracking pixels, and have implemented the pipeline I described in FLUME-896 <https://issues.apache.org/jira/browse/FLUME-896?focusedCommentId=13182644&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13182644>.
I'm now trying to contribute the generic portions of the code in case anyone else is interested, but I need some direction on the batching/unbatching part. I've done it differently from Flume-OG in order to support proper transactional behavior: individual events are split out and consumed in transactions that need not be 1:1 with the batches themselves.

I've introduced a BatchSplitter component that is both a sink for batch events and a source for the individual events contained within them. It extends AbstractSink, but not AbstractSource (it does not even implement Source), because otherwise the same setChannel() method would be called with both the upstream and downstream channel objects at configuration time. Instead, I have added a separate setDownstreamChannel() method, and since for the moment I'm configuring everything directly via POJOs, it just works.

Assuming that it is not inadvisable to bridge channels this way, I wonder how it is envisioned that such bridges would be configured "for real". Should something like BatchSplitter actually implement the Source interface? Or should the configuration language support sinks with arbitrary configuration properties of type Channel?

It seems that the former would require Sources and Sinks to have distinctly named setChannel() methods, while the latter would require that Configurable.configure() be passed a ChannelFactory or otherwise have a way to "cast" a channel name to a Channel instance.

Thoughts?

-peter
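
P.S. To make the shape of the bridge concrete, here is a rough sketch of the idea only. It is not the actual code and does not use the real Flume-NG classes: SimpleChannel is a hypothetical stand-in with staged puts/takes and a commit/rollback pair, and everything apart from the BatchSplitter, setChannel() and setDownstreamChannel() names is an assumption made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a channel; NOT the real org.apache.flume.Channel API.
// Puts become visible, and takes become permanent, only on commit().
final class SimpleChannel<T> {
    private final Deque<T> queue = new ArrayDeque<>();
    private final List<T> stagedPuts = new ArrayList<>();
    private final List<T> stagedTakes = new ArrayList<>();

    void put(T item) { stagedPuts.add(item); }

    T take() {
        T item = queue.poll();
        if (item != null) stagedTakes.add(item);
        return item;
    }

    void commit() {
        queue.addAll(stagedPuts);
        stagedPuts.clear();
        stagedTakes.clear();
    }

    void rollback() {
        // Return taken items to the head of the queue in their original order.
        for (int i = stagedTakes.size() - 1; i >= 0; i--) {
            queue.addFirst(stagedTakes.get(i));
        }
        stagedTakes.clear();
        stagedPuts.clear();
    }

    int size() { return queue.size(); }
}

// Sketch of the bridge: a sink for batches upstream, a source of events downstream.
final class BatchSplitter {
    private SimpleChannel<List<String>> upstream;
    private SimpleChannel<String> downstream;

    // The usual sink-side setter receives the upstream (batch) channel.
    void setChannel(SimpleChannel<List<String>> channel) { this.upstream = channel; }

    // Separate setter so the two channels cannot be confused at configuration time.
    void setDownstreamChannel(SimpleChannel<String> channel) { this.downstream = channel; }

    // Moves one batch downstream; returns the number of events emitted (0 if no batch).
    int process() {
        try {
            List<String> batch = upstream.take();
            if (batch == null) {
                upstream.commit();
                return 0;
            }
            for (String event : batch) {
                downstream.put(event);
            }
            // Commit downstream first: a failure in between redelivers the batch
            // (at-least-once) instead of losing it.
            downstream.commit();
            upstream.commit();
            return batch.size();
        } catch (RuntimeException e) {
            downstream.rollback();
            upstream.rollback();
            throw e;
        }
    }
}
```

In this sketch the downstream consumer is free to take the individual events in transactions of whatever size it likes, which is the point of not tying them 1:1 to the batches.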
