Hi everyone,

I'm using the Flume-NG code in a system for collecting usage data via tracking 
pixels, and have implemented the pipeline I described in FLUME-896 
<https://issues.apache.org/jira/browse/FLUME-896?focusedCommentId=13182644&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13182644>.

I'm now trying to contribute the generic portions of the code in case anyone 
else is interested in it, but I need some direction with respect to the 
batching/unbatching part.  I've done it differently than it was done in 
Flume-OG, in order to support proper transactional behavior as individual 
events are split out and consumed in transactions that need not be 1:1 with the 
batches themselves.

I've introduced a BatchSplitter component that is both a sink for batch events 
and a source for the individual events contained within them.  It extends 
AbstractSink, but not AbstractSource (nor does it implement Source), because 
otherwise the same setChannel() method would be called with both the upstream 
and downstream channel objects at configuration time.  Instead, I have added a 
separate setDownstreamChannel() method, and since for the moment I'm 
configuring everything directly via POJOs, it just works.
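
To make the shape concrete, here is a rough sketch of the idea rather than the 
actual code.  Everything in it (Event, BatchEvent, QueueChannel, 
BatchSplitterSketch) is a simplified stand-in invented for illustration, not a 
Flume-NG type, and transactions are left out entirely; only the two setters 
mirror what is described above.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Simplified stand-ins for illustration only; NOT the real Flume-NG interfaces.
interface Event {}

final class SimpleEvent implements Event {
    final String body;
    SimpleEvent(String body) { this.body = body; }
}

// A batch is itself an event that carries the individual events.
final class BatchEvent implements Event {
    final List<Event> events;
    BatchEvent(List<Event> events) { this.events = events; }
}

// A trivial in-memory channel with no transaction support (assumption).
final class QueueChannel {
    private final Queue<Event> queue = new ArrayDeque<>();
    void put(Event e) { queue.add(e); }
    Event take() { return queue.poll(); }  // returns null when empty
    int size() { return queue.size(); }
}

// Sink for batches on the upstream channel, source of single events downstream.
final class BatchSplitterSketch {
    private QueueChannel upstream;
    private QueueChannel downstream;

    // The setter a sink would normally receive its channel through.
    void setChannel(QueueChannel channel) { this.upstream = channel; }

    // Separate setter, so the two channels are never confused.
    void setDownstreamChannel(QueueChannel channel) { this.downstream = channel; }

    // Takes one event from upstream and returns how many events were emitted.
    int process() {
        Event e = upstream.take();
        if (e == null) {
            return 0;
        }
        if (e instanceof BatchEvent) {
            List<Event> events = ((BatchEvent) e).events;
            for (Event inner : events) {
                downstream.put(inner);
            }
            return events.size();
        }
        downstream.put(e);  // non-batch events pass through unchanged
        return 1;
    }
}
```

A real version would wrap the upstream take and the downstream puts in their 
respective channel transactions, which is where the non-1:1 relationship 
between batches and transactions comes in.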

Assuming it is not inadvisable to bridge channels this way, I wonder how such 
bridges are envisioned to be configured "for real".  Should something like 
BatchSplitter actually implement the Source interface?  Or should the 
configuration language support sinks with arbitrary configuration properties 
of type Channel?  The former would seem to require Sources and Sinks to have 
distinctly named setChannel() methods, while the latter would require that 
Configurable.configure() be passed a ChannelFactory or otherwise have a way to 
"cast" a channel name to a Channel instance.
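
As an illustration of the second option only, the sketch below shows one way a 
component could turn a configured channel name into a channel instance.  All 
names here (NamedChannel, ChannelResolver, MapChannelResolver, 
ConfigurableSplitter, the "downstreamChannel" property, and the two-argument 
configure()) are hypothetical and do not exist in Flume-NG.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a channel; NOT a Flume-NG type.
final class NamedChannel {
    final String name;
    NamedChannel(String name) { this.name = name; }
}

// The "cast a channel name to a Channel instance" hook (hypothetical).
interface ChannelResolver {
    NamedChannel resolve(String channelName);
}

// Simple resolver backed by a map of already-built channels.
final class MapChannelResolver implements ChannelResolver {
    private final Map<String, NamedChannel> channels = new HashMap<>();

    void register(NamedChannel channel) { channels.put(channel.name, channel); }

    @Override
    public NamedChannel resolve(String channelName) {
        NamedChannel channel = channels.get(channelName);
        if (channel == null) {
            throw new IllegalArgumentException("Unknown channel: " + channelName);
        }
        return channel;
    }
}

// A sink-like component with an extra property of type "channel".
final class ConfigurableSplitter {
    NamedChannel downstream;

    // Hypothetical variant of configure() that also receives the resolver.
    void configure(Map<String, String> props, ChannelResolver resolver) {
        String name = props.get("downstreamChannel");
        if (name == null) {
            throw new IllegalArgumentException("downstreamChannel is required");
        }
        downstream = resolver.resolve(name);
    }
}
```

With something like this, the upstream channel would still arrive through the 
usual setChannel() call, and only the extra channel would go through the 
resolver.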

Thoughts?

-peter
