Hi all, I just took a first look at the alpha2 of flume-ng and thought I'd
provide a little early high-level feedback. First of all, thanks for doing this
re-architecture. We (Medio Systems) have had problems with OG, and at first
glance NG looks very promising. We're very excited to try it out.

Okay, on to the feedback. Personally, I would find it very helpful if Source,
Sink, and Channel were more strictly defined. On the wiki, I see the line:

"You still have sources and sinks and they still do the same thing. They are 
now connected by channels."

I'm not finding that to be true. In OG, events are appended to Sinks via
append(Event) and polled from Sources via next(). That is, some upstream
component is responsible for getting events somehow and appending them to a
Sink; some downstream component is responsible for getting events from a
Source and processing them. For example, Driver is a special case, acting as
the downstream component of a Source and the upstream component of a Sink; its
processing is simply appending events to the downstream Sink.
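To check that I'm describing OG correctly, here is a minimal Java sketch of
that pull/push split. OgEvent, OgSource, OgSink, OgDriver, ListSource, and
CollectingSink are hypothetical simplifications made up for illustration, not
the actual OG classes; only the shape (next() on a source, append(Event) on a
sink, a driver pumping between them) is the point.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-ins for the OG roles; not OG's real classes.
interface OgEvent { String body(); }

// An OG-style Source is pulled from: next() returns the next event, or null when exhausted.
interface OgSource { OgEvent next(); }

// An OG-style Sink is pushed to: some upstream component calls append(event).
interface OgSink { void append(OgEvent event); }

// Plays the Driver role: downstream of a Source, upstream of a Sink.
final class OgDriver {
    private final OgSource source;
    private final OgSink sink;

    OgDriver(OgSource source, OgSink sink) {
        this.source = source;
        this.sink = sink;
    }

    // Pulls until the source is exhausted; "processing" is just appending downstream.
    int run() {
        int count = 0;
        OgEvent event;
        while ((event = source.next()) != null) {
            sink.append(event);
            count++;
        }
        return count;
    }
}

// Example source that replays a fixed list of bodies.
final class ListSource implements OgSource {
    private final Iterator<String> bodies;

    ListSource(List<String> bodies) { this.bodies = bodies.iterator(); }

    public OgEvent next() {
        if (!bodies.hasNext()) return null;
        String body = bodies.next();
        return () -> body;
    }
}

// Example sink that just records what it was given.
final class CollectingSink implements OgSink {
    final List<String> received = new ArrayList<>();

    public void append(OgEvent event) { received.add(event.body()); }
}
```

The driver owns the loop, so neither the source nor the sink knows about the
other.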

In NG, it seems like Sources map to the upstream components of an OG Sink, and
Sinks map to the downstream components of an OG Source. A channel maps to the
combination of OG Sources and Sinks. That is, I see the high-level modeling as:
Sources - A component that puts events on a channel. (The implementation
defines where those events come from.)
Sinks - A component that polls events from a channel and applies some
processing to them. (The implementation defines the processing.)
Channel - Transport between sources and sinks. (The implementation defines
durability, transport mechanism, etc.)
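Here is the same three-role picture as a small sketch, assuming a single
Channel interface with both put() and take(), as NG has it today. All names
(Channel, MemoryChannel, StaticSource, UpperCaseSink) and the use of plain
String events are hypothetical and not flume-ng's actual API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical channel contract mirroring the three NG roles; not flume-ng's real types.
interface Channel {
    void put(String event);   // called by sources
    String take();            // called by sinks; returns null when empty
}

// The channel implementation decides durability/transport; this one is in-memory only.
final class MemoryChannel implements Channel {
    private final Deque<String> queue = new ArrayDeque<>();

    public synchronized void put(String event) { queue.addLast(event); }

    public synchronized String take() { return queue.pollFirst(); }
}

// A source's implementation decides where events come from (here: a fixed list).
final class StaticSource {
    private final List<String> lines;
    private final Channel channel;

    StaticSource(List<String> lines, Channel channel) {
        this.lines = lines;
        this.channel = channel;
    }

    void start() {
        for (String line : lines) channel.put(line);
    }
}

// A sink's implementation decides the processing (here: upper-casing into a list).
final class UpperCaseSink {
    private final Channel channel;
    final List<String> output = new ArrayList<>();

    UpperCaseSink(Channel channel) { this.channel = channel; }

    void drain() {
        String event;
        while ((event = channel.take()) != null) output.add(event.toUpperCase());
    }
}
```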

Please let me know if I have the high-level picture right.

It seems to me that most Source/Sink/Channel implementations do follow the
above definitions, but the Avro pieces seem to be a major divergence. I'm a
little confused about AvroSource; it doesn't seem to do much. It appears to be
just a vanilla Source that lets you manually pass in events, which it then
passes straight through to a channel. I'm guessing that's a work in progress?
Avro transport, it seems to me, is best modeled as an AvroChannel linking a
source and a sink, which in turn define where the events come from and what to
do with them on the other side. Modeling Avro transport as a channel seems like
it might provide more flexibility for less configuration, and it would also
make it easier to configure different reliability levels through composition
with other channels. The way AvroSink is written can work, but I see advantages
in modeling Avro transport as a channel instead. Any thoughts?
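To illustrate what I mean by composition, here is a rough sketch of one
channel wrapping another to add store-and-forward behavior. The Channel
interface is hypothetical, SimulatedRemoteChannel only stands in for something
like an AvroChannel (it doesn't use Avro at all), and SpoolingChannel is a
hypothetical decorator. Its spool is in memory; a real version would presumably
spool to something durable, such as a file-backed channel.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical channel contract used only for this sketch.
interface Channel {
    void put(String event);
    String take(); // returns null when empty
}

// Stand-in for a transport channel such as the proposed AvroChannel. It does not
// use Avro; it only simulates a link that can go down.
final class SimulatedRemoteChannel implements Channel {
    private final Deque<String> delivered = new ArrayDeque<>();
    private boolean up = true;

    synchronized void setUp(boolean up) { this.up = up; }

    public synchronized void put(String event) {
        if (!up) throw new IllegalStateException("remote link down");
        delivered.addLast(event);
    }

    public synchronized String take() { return delivered.pollFirst(); }
}

// Decorator that adds store-and-forward to any Channel: every event is spooled
// first, then forwarded in order for as long as the wrapped channel accepts them.
final class SpoolingChannel implements Channel {
    private final Channel downstream;
    private final Deque<String> spool = new ArrayDeque<>();

    SpoolingChannel(Channel downstream) { this.downstream = downstream; }

    public synchronized void put(String event) {
        spool.addLast(event);
        flush();
    }

    // Forwards spooled events in order; stops at the first failure and keeps the rest.
    synchronized int flush() {
        int forwarded = 0;
        while (!spool.isEmpty()) {
            try {
                downstream.put(spool.peekFirst());
            } catch (IllegalStateException e) {
                break;
            }
            spool.pollFirst();
            forwarded++;
        }
        return forwarded;
    }

    synchronized int spooled() { return spool.size(); }

    public String take() { return downstream.take(); }
}
```

The idea is that reliability becomes a matter of which channels you stack,
rather than something each source or sink has to be configured for.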

For the most part, I'm a big fan of the high-level modeling in NG. One thing I
want to bring up is that Channel has both put() and take() on it. I don't see
a case where the same component would want to both take() an event for
processing and put() an event onto the same channel, since that component has
a good chance of being the one that ends up take()ing the event back. Because
of that, I think it could be a good idea to split Channel into two interfaces.
I can imagine channel-like implementations for which it's difficult to
implement both put() and take(), and I don't see the need for every
implementation to have both (though most will, and that's ok). What I have in
mind is something along the lines of:
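// Event is the existing flume-ng event type.
interface ChannelPoller { Event take(); }
interface ChannelSender { void put(Event event); }
// A file-backed channel can implement both sides.
class FileChannel implements ChannelPoller, ChannelSender {
    public Event take() { /* read the next event from disk */ return null; }
    public void put(Event event) { /* append the event to disk */ }
}
// Sources only see put(); sinks only see take().
class SomeSource { private ChannelSender channel; }
class SomeSink { private ChannelPoller channel; }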
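A self-contained version of that split, with String standing in for Event (a
simplification), might look like the following. MemoryChannel and
ForwardingSender are hypothetical: ForwardingSender is a send-only
implementation meant to show the case where take() doesn't make sense, and its
list just stands in for a network write.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// The two halves proposed above, with String standing in for an event type.
interface ChannelSender { void put(String event); }
interface ChannelPoller { String take(); } // returns null when nothing is available

// Most implementations can offer both sides.
final class MemoryChannel implements ChannelSender, ChannelPoller {
    private final Deque<String> queue = new ArrayDeque<>();

    public synchronized void put(String event) { queue.addLast(event); }

    public synchronized String take() { return queue.pollFirst(); }
}

// Hypothetical send-only implementation: events leave the process, so there is
// nothing meaningful for take() to return, and it simply omits ChannelPoller.
final class ForwardingSender implements ChannelSender {
    final List<String> forwarded = new ArrayList<>(); // stands in for a network write

    public void put(String event) { forwarded.add(event); }
}

// A source depends only on the put() side...
final class SomeSource {
    private final ChannelSender channel;

    SomeSource(ChannelSender channel) { this.channel = channel; }

    void emit(String event) { channel.put(event); }
}

// ...and a sink only on the take() side, so it cannot put() back onto its own input.
final class SomeSink {
    private final ChannelPoller channel;

    SomeSink(ChannelPoller channel) { this.channel = channel; }

    String poll() { return channel.take(); }
}
```

The compiler then enforces the direction of flow: a sink handed a
ChannelPoller has no put() to call.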

One final thing: if I have the right idea about the high-level modeling, then
it seems like a method such as process() should be defined on the Sink
interface, and a method such as List<Event> getNext() should be defined on the
Source interface. Thoughts? What might a sink do if it doesn't have process()
defined?
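Roughly what I have in mind, as a sketch: Source and Sink here are
hypothetical interfaces (not the current flume-ng ones) in which the source
only says where events come from and the sink only says what to do with each
one, while a framework-side Pipeline class (also hypothetical) owns the channel
and moves events between them. The in-memory deque stands in for any channel
implementation, and String stands in for Event.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical interfaces following the suggestion above; not flume-ng's API.
interface Source { List<String> getNext(); }    // empty list means nothing right now
interface Sink { void process(String event); }  // what to do with each event

// The framework, not the Source or Sink, owns the channel and moves events through it.
final class Pipeline {
    private final Deque<String> channel = new ArrayDeque<>(); // in-memory channel stand-in
    private final Source source;
    private final Sink sink;

    Pipeline(Source source, Sink sink) {
        this.source = source;
        this.sink = sink;
    }

    // One pass: fetch a batch from the source, put it on the channel, then drain to the sink.
    int runOnce() {
        for (String event : source.getNext()) channel.addLast(event);
        int processed = 0;
        String event;
        while ((event = channel.pollFirst()) != null) {
            sink.process(event);
            processed++;
        }
        return processed;
    }
}
```

Because each interface has a single method, a source or sink can even be a
lambda, e.g. `new Pipeline(() -> List.of("a"), System.out::println).runOnce()`.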

Anyway, thanks again for doing this work; I think it's very positive. I'll be
talking to people internally about helping out, which I think could be good for
all involved. I apologize if I've misunderstood anything or made any wrong
assumptions. When we get around to testing it out, I'll get back to you guys
on lower-level issues.

Cheers,
Shu
