Hi all, I just took a first look at the alpha2 of flume-ng and thought I'd
provide a little early high-level feedback. First of all, thanks for doing this
re-architecture. We (Medio Systems) have had problems with the OG, and at first
glance NG looks very promising; we're very excited to try it out.
OK, on to the feedback. Personally, I would find it very helpful if Source/Sink/Channel
were strongly defined. On the wiki, I see the line:
"You still have sources and sinks and they still do the same thing. They are
now connected by channels."
I'm not finding that to be true. In OG, events are appended to Sinks via
append(Event) and polled from Sources via next(). That is, some upstream
component is responsible for getting events somehow and appending them to a
Sink; some downstream component is responsible for getting events from a Source
and processing them. For example, Driver is a special case, acting as the
downstream component of a Source and the upstream component of a Sink; its
processing is simply appending events to the downstream Sink.
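To make the OG flow described above concrete, here is a minimal sketch. The type names (OgEvent, OgSource, OgSink, OgDriver) are hypothetical stand-ins and not the real Flume OG classes; only the append(Event)/next() shape comes from the description above, and treating a null from next() as "nothing left" is an assumption of the sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for an OG event; not the real Flume class.
final class OgEvent {
    final String body;
    OgEvent(String body) { this.body = body; }
}

// Polled by a downstream component. Returning null when drained is an assumption.
interface OgSource { OgEvent next(); }

// Fed by an upstream component.
interface OgSink { void append(OgEvent event); }

// Driver: the downstream component of a Source and the upstream component of a Sink.
final class OgDriver {
    // Moves every available event from source to sink and returns how many were moved.
    static int drain(OgSource source, OgSink sink) {
        int moved = 0;
        for (OgEvent e = source.next(); e != null; e = source.next()) {
            sink.append(e);
            moved++;
        }
        return moved;
    }
}

final class OgDriverSketch {
    public static void main(String[] args) {
        Queue<OgEvent> pending =
            new ArrayDeque<>(List.of(new OgEvent("a"), new OgEvent("b")));
        List<String> delivered = new ArrayList<>();
        // The queue's poll() plays the Source; the lambda plays the Sink.
        int moved = OgDriver.drain(pending::poll, e -> delivered.add(e.body));
        System.out.println(moved + " events delivered: " + delivered);
    }
}
```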
In NG, it seems like Sources map to upstream components of an OG Sink and Sinks
map to downstream components of an OG Source. A channel maps to the combination
of OG Sources and Sinks. That is, I see the high level modeling as:
Sources - A component which puts events on a channel. (Implementation defines
where those events come from).
Sinks - A component which polls events from a channel and applies some
processing on them. (Implementation defines processing)
Channel - Transport between sources and sinks (Implementation defines
durability, transport mechanism, etc.)
Please let me know if I have the high-level picture correct.
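Here is a rough sketch of the three roles as I read them. Everything in it (NgEvent, NgChannel, MemoryChannelSketch, LineSourceSketch, CollectingSinkSketch) is a made-up name for illustration and not an actual flume-ng interface; the in-memory queue and the null-on-empty take() are assumptions as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical event type for the sketch.
final class NgEvent {
    final String body;
    NgEvent(String body) { this.body = body; }
}

// Channel: transport between sources and sinks.
interface NgChannel {
    void put(NgEvent event);
    NgEvent take(); // returns null when empty (assumption of this sketch)
}

// One possible channel: non-durable, in-memory transport.
final class MemoryChannelSketch implements NgChannel {
    private final BlockingQueue<NgEvent> queue = new LinkedBlockingQueue<>();
    @Override public void put(NgEvent event) { queue.add(event); }
    @Override public NgEvent take() { return queue.poll(); }
}

// Source: puts events on a channel. Here the events come from lines handed in by a caller.
final class LineSourceSketch {
    private final NgChannel channel;
    LineSourceSketch(NgChannel channel) { this.channel = channel; }
    void accept(String line) { channel.put(new NgEvent(line)); }
}

// Sink: polls events from a channel and processes them. Here "processing" is upper-casing.
final class CollectingSinkSketch {
    private final NgChannel channel;
    final List<String> processed = new ArrayList<>();
    CollectingSinkSketch(NgChannel channel) { this.channel = channel; }

    // Handles at most one event; returns false if the channel was empty.
    boolean pollOnce() {
        NgEvent event = channel.take();
        if (event == null) return false;
        processed.add(event.body.toUpperCase());
        return true;
    }
}
```

Swapping MemoryChannelSketch for something durable would change the reliability of the pipeline without touching the source or the sink, which is how I understand the intent of the channel abstraction.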
It seems to me that most Source/Sink/Channel implementations do follow the
above definition, but the avro stuff seems to be a major divergence. I'm a
little confused about AvroSource; it doesn't seem to do much. It appears to be
just a vanilla Source that lets you manually pass in events, which it'll then
pass straight through to a channel. I'm guessing that's a work in progress?
Avro transport seems to me best modeled as an AvroChannel linking a source and
a sink, which in turn define where the events come from and what to do with
them on the other side. Modeling avro transport as a channel seems like it
might provide more flexibility for less configuration. It also seems like it
would make it easier to configure for different reliability levels through
composition with other channels. The way AvroSink is written can work, but I
see advantages in modeling avro transport as a channel instead. Any thoughts?
For the most part, I'm a big fan of the high-level modeling in NG. One thing I
want to bring up is the fact that Channel has both put() and take() on it. I'm
not seeing the case where the same component would want to both take() an event
for processing and also put() an event onto the same channel, since that
component has a good chance of being the one that ends up take()ing the event
back. Because of that, I think it could be a good idea to separate Channel into
two interfaces. I can see channel-like implementations for which it's more
difficult to implement both put() and take(), and I don't see the need for both
to be in every implementation (though most will have both, and that's OK). I
guess what I'm thinking is along the lines of
interface ChannelPoller { take() }
interface ChannelSender { put() }
public class FileChannel implements ChannelPoller, ChannelSender
{...}
public class SomeSource { ChannelSender _channel; ... }
public class SomeSink { ChannelPoller _channel; ... }
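A compilable version of that idea might look like the sketch below. The Event class, QueueChannel, and LogOnlyChannel are hypothetical names invented for illustration, and the method signatures (Event take(), void put(Event)) are my guess at what the two halves would carry. LogOnlyChannel is there to show a channel-like thing that only makes sense on the sending side.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical event type for the sketch.
final class Event {
    final String body;
    Event(String body) { this.body = body; }
}

// The two halves of the proposed split.
interface ChannelPoller { Event take(); }
interface ChannelSender { void put(Event event); }

// Most channels would implement both sides, as this in-memory one does.
final class QueueChannel implements ChannelPoller, ChannelSender {
    private final Queue<Event> queue = new ArrayDeque<>();
    @Override public synchronized void put(Event event) { queue.add(event); }
    @Override public synchronized Event take() { return queue.poll(); } // null when empty
}

// A send-only channel: it records events but offers no way to take them back.
final class LogOnlyChannel implements ChannelSender {
    final StringBuilder log = new StringBuilder();
    @Override public void put(Event event) { log.append(event.body).append('\n'); }
}

// A source only ever sees the sending side.
final class SomeSource {
    private final ChannelSender channel;
    SomeSource(ChannelSender channel) { this.channel = channel; }
    void emit(String body) { channel.put(new Event(body)); }
}

// A sink only ever sees the polling side.
final class SomeSink {
    private final ChannelPoller channel;
    SomeSink(ChannelPoller channel) { this.channel = channel; }
    Event poll() { return channel.take(); }
}
```

With this split, a source cannot accidentally take() back its own events, and LogOnlyChannel never has to fake a take() it cannot support.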
One final thing: if I have the right idea on the high-level modeling, then it
seems like a method like process() should be defined on the Sink interface and
a method like List<Event> getNext() should be defined on the Source interface.
Thoughts? What might a sink do if it doesn't have a process() defined?
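As a sketch of what I mean, assuming the framework (not the source or sink itself) moves events on and off the channel: PollableSource, ProcessingSink, Runner, and Evt are hypothetical names, and a plain java.util.Queue stands in for the channel.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical event type for the sketch.
final class Evt {
    final String body;
    Evt(String body) { this.body = body; }
}

// Source side: the implementation decides where the next batch comes from.
interface PollableSource { List<Evt> getNext(); }

// Sink side: the implementation decides what processing means.
interface ProcessingSink { void process(Evt event); }

// Framework glue that drives both interfaces around a channel (a Queue here).
final class Runner {
    // Pulls one batch from the source and puts it on the channel.
    static int fill(PollableSource source, Queue<Evt> channel) {
        List<Evt> batch = source.getNext();
        channel.addAll(batch);
        return batch.size();
    }

    // Takes everything currently on the channel and hands it to the sink.
    static int drain(Queue<Evt> channel, ProcessingSink sink) {
        int handled = 0;
        for (Evt e = channel.poll(); e != null; e = channel.poll()) {
            sink.process(e);
            handled++;
        }
        return handled;
    }
}

final class RunnerSketch {
    public static void main(String[] args) {
        Queue<Evt> channel = new ArrayDeque<>();
        List<String> seen = new ArrayList<>();
        Runner.fill(() -> List.of(new Evt("a"), new Evt("b")), channel);
        Runner.drain(channel, e -> seen.add(e.body));
        System.out.println(seen);
    }
}
```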
Anyway, thanks again for doing this work; I think it's very positive. I'll be
talking to people internally about helping out, as I think it could be good for
all involved. I apologize if I've misunderstood anything or made any wrong
assumptions. When we get around to testing it out, I'll get back to you guys on
lower-level issues.
Cheers,
Shu