Hello,

This might belong in the developer guide, or it may already be
documented somewhere and I just missed it.  I feel like we should set
down some expectations regarding:

1) Source behavior when:
  a) Channel put fails
  b) Source started but is unable to obtain new events for some reason
2) Channel behavior when:
  a) Channel capacity exceeded
  b) Take when channel is empty
3) Sink behavior when:
  a) Channel take returns null
  b) Sink cannot write to the downstream location

This came up when I noticed some inconsistencies.  For example, a
take in MemoryChannel blocks for a few seconds by default and a take
in JDBCChannel does not (FLUME-998). Combined with HDFSEventSink, the
non-blocking take causes tremendous CPU consumption. Also, currently
if HDFS is unavailable for a period, Flume needs to be restarted
(FLUME-985).
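
To make the take case concrete, here is a rough sketch of the sink-side
behavior I have in mind: when a take returns null, the sink reports that
it has nothing to do and the caller pauses briefly instead of spinning.
This is not Flume code. The class, the Status enum, and the method names
are all made up for illustration, and a java.util.concurrent queue stands
in for a channel whose take returns immediately.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names are Flume APIs.
class BackoffSinkSketch {
    // Result of one processing pass, so the caller can decide whether to pause.
    enum Status { READY, BACKOFF }

    // Stand-in for a channel take that returns null immediately when empty.
    static String nonBlockingTake(BlockingQueue<String> channel) {
        return channel.poll();
    }

    // One pass: write a single event downstream, or ask the caller to back off.
    static Status process(BlockingQueue<String> channel, StringBuilder downstream) {
        String event = nonBlockingTake(channel);
        if (event == null) {
            return Status.BACKOFF;
        }
        downstream.append(event).append('\n');
        return Status.READY;
    }

    // Runs a fixed number of passes, sleeping after each empty take.
    // Returns how many passes found the channel empty.
    static int drain(BlockingQueue<String> channel, StringBuilder downstream,
                     int maxPasses, long backoffMillis) throws InterruptedException {
        int emptyPasses = 0;
        for (int i = 0; i < maxPasses; i++) {
            if (process(channel, downstream) == Status.BACKOFF) {
                emptyPasses++;
                TimeUnit.MILLISECONDS.sleep(backoffMillis);
            }
        }
        return emptyPasses;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> channel = new LinkedBlockingQueue<>();
        channel.add("event-1");
        channel.add("event-2");
        StringBuilder downstream = new StringBuilder();
        int emptyPasses = drain(channel, downstream, 5, 10L);
        System.out.println("empty passes: " + emptyPasses);
        System.out.print(downstream);
    }
}
```

With a pause like this on the sink side, it matters less whether a given
channel blocks on take or not, though I would still prefer that the
channels agree with each other.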

My general thoughts are based on experience working with JMS-based services.

1) Source/Channel/Sink should not require a restart when upstream or
downstream services are restarted or become temporarily unavailable.
2) Channel capacity being exceeded should not lead to sources dying
and thus requiring a Flume restart. Capacity will be exceeded whenever
downstream destinations slow down, for whatever reason.
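
As a rough sketch of point 2, a source could retry a failed put with a
growing wait and report failure to its caller rather than dying. Again,
this is not Flume code. The class and method names are hypothetical, and
a bounded java.util.concurrent queue stands in for a channel with a fixed
capacity.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names are Flume APIs.
class RetryingSourceSketch {
    // Tries to put an event into a bounded channel, doubling the wait after
    // each failed attempt. Returns false once all attempts are used up, so
    // the caller can keep the event and try again later instead of dying.
    static boolean putWithRetry(BlockingQueue<String> channel, String event,
                                int maxAttempts, long initialWaitMillis)
            throws InterruptedException {
        long waitMillis = initialWaitMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // offer() waits up to waitMillis for free capacity.
            if (channel.offer(event, waitMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
            waitMillis *= 2;
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(1);
        System.out.println(putWithRetry(channel, "event-1", 3, 10L));
        // The channel is now full, so this put is retried and then given up on.
        System.out.println(putWithRetry(channel, "event-2", 3, 10L));
        // Once a sink drains the channel, the same put succeeds.
        channel.take();
        System.out.println(putWithRetry(channel, "event-2", 3, 10L));
    }
}
```

Whether the source should block, drop, or push back on its own upstream
when the retries run out is exactly the kind of expectation I think we
should write down.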

Brock

-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
