[
https://issues.apache.org/jira/browse/FLUME-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240989#comment-13240989
]
Arvind Prabhakar commented on FLUME-1045:
-----------------------------------------
bq. Curious to know what is the right way in the current Flume architecture to
trade off transactional guarantees against very high throughput, while providing
a certain degree of reliability in case the next link is down?

This seems to be a confusion between design and implementation. The design
_requires_ that the channel expose transactional semantics. The channel
implementation _decides_ how strictly those semantics are enforced. For example,
the transactional semantics implemented by the JDBC channel are very strict,
whereas those implemented by the Memory channel are weak.

However, since the design requires both these channels to expose transactional
semantics, you can switch the channels to suit your flow needs.
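To make the contract concrete, here is a minimal sketch of what "exposing
transactional semantics" means at the channel boundary. It is not the Flume API:
the class and method names (TxChannelSketch, Tx, begin, size) are hypothetical,
events are plain strings, and there is no concurrency control or durability. A
real channel decides how much of that to provide.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; names and behavior are assumptions, not Flume's API.
class TxChannelSketch {
    // Events that have been committed and are visible to takers.
    private final Deque<String> committed = new ArrayDeque<>();

    /** One unit of work: puts are staged, takes are remembered until commit. */
    class Tx {
        private final List<String> stagedPuts = new ArrayList<>();
        private final Deque<String> taken = new ArrayDeque<>();

        void put(String event) {
            stagedPuts.add(event); // not visible to takers until commit
        }

        String take() {
            String event = committed.pollFirst();
            if (event != null) {
                taken.addLast(event); // remembered so rollback can restore it
            }
            return event;
        }

        void commit() {
            committed.addAll(stagedPuts); // staged events become visible, in order
            stagedPuts.clear();
            taken.clear();
        }

        void rollback() {
            stagedPuts.clear();
            // Return taken events to the head of the queue in their original order.
            while (!taken.isEmpty()) {
                committed.addFirst(taken.pollLast());
            }
        }
    }

    Tx begin() {
        return new Tx();
    }

    int size() {
        return committed.size();
    }
}
```

A sink that fails to deliver would call rollback(), leaving the events in the
channel for a later retry. That retry is the safety guarantee the design is
built around.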
The solution being discussed here - disk based spooling on the sink side - goes
outside the scope of this design to accommodate throughput requirements. If
implemented, the messages that are spooled will be outside of the transaction
boundary and thus will invalidate the safety guarantee of the system.
bq. One solution I can think of, where the IO cost is incurred only on
failures and things are still transactional: wrap the MemoryChannel and
FileChannel into a new channel, say SpoolingMemoryChannel. Events flow via the
memory channel; on reaching the buffer capacity of the memory channel, events
are spooled into the FileChannel. Since the underlying channels are
transactional, SpoolingMemoryChannel can also be easily made transactional.

This sounds like a promising solution. The key thing to watch out for here is
the ordering requirement. In general, channels are expected to preserve the
order of events. As long as that is taken care of and the transactional
semantics make sense, it could be the stop-gap solution until we have a
high-throughput file based channel implemented.
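The ordering concern can be sketched as follows. This is only an illustration
under stated assumptions: SpoolingQueueSketch and its methods are hypothetical
names, the "spool" is an in-memory deque standing in for a file-backed store,
and transactions and thread safety are left out entirely. The one rule it
demonstrates is that once anything has been spooled, new events must also go to
the spool until it drains, otherwise newer events in memory would overtake older
events on disk.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only; not part of Flume and not the proposed patch.
class SpoolingQueueSketch {
    private final int memoryCapacity;
    private final Deque<String> memory = new ArrayDeque<>();
    // Stand-in for a file-backed store; a real spool would write to disk.
    private final Deque<String> spool = new ArrayDeque<>();

    SpoolingQueueSketch(int memoryCapacity) {
        this.memoryCapacity = memoryCapacity;
    }

    void put(String event) {
        // Spool when memory is full, and keep spooling while the spool is
        // non-empty so that arrival order is preserved.
        if (!spool.isEmpty() || memory.size() >= memoryCapacity) {
            spool.addLast(event);
        } else {
            memory.addLast(event);
        }
    }

    /** Returns the oldest event, or null if both stores are empty. */
    String take() {
        // Everything in memory is older than everything in the spool.
        String event = memory.pollFirst();
        return (event != null) ? event : spool.pollFirst();
    }

    int spooled() {
        return spool.size();
    }
}
```

With a memory capacity of 2, putting e1, e2, e3 spools e3. A later put of e4
also lands in the spool even if memory has room by then, so takes still return
e1, e2, e3, e4 in that order.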
> Proposal to support disk based spooling
> ---------------------------------------
>
> Key: FLUME-1045
> URL: https://issues.apache.org/jira/browse/FLUME-1045
> Project: Flume
> Issue Type: New Feature
> Affects Versions: v1.0.0
> Reporter: Inder SIngh
> Priority: Minor
> Labels: patch
> Attachments: FLUME-1045-1.patch, FLUME-1045-2.patch
>
>
> 1. Problem Description
> A sink being unavailable at any stage in the pipeline causes it to back off
> and retry after a while. Channels associated with such sinks start buffering
> data, with the caveat that if you are using a memory channel it can result in
> a domino effect on the entire pipeline. There could be legitimate downtimes,
> e.g. the HDFS sink being down for name node maintenance or Hadoop upgrades.
> 2. Why not use a durable channel (JDBC, FileChannel)?
> Want high throughput and support sink down times as a first class use-case.