[
https://issues.apache.org/jira/browse/FLUME-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238295#comment-13238295
]
Sharad Agarwal commented on FLUME-1045:
---------------------------------------
bq. as it violates the transactional exchange invariant of the design
Some systems have very high throughput requirements and relaxed transactional
needs. Typically these applications want the system to run at very high throughput
and, in case of failures, are OK with losing or replaying a small number of events.
FileChannel intends to be fully transactional and also high throughput. However, it
will be IO/disk bound.
1. What is the right way, in the current Flume architecture, to trade off
transactional guarantees for very high throughput while still providing a certain
degree of reliability in case the next link is down?
2. One solution I can think of, where the IO cost is incurred only on
failures and things are still transactional:
Wrap the MemoryChannel and FileChannel in a new channel, say
SpoolingMemoryChannel. Events flow via the memory channel; on reaching the buffer
capacity of the memory channel, events are spooled into the FileChannel. Since the
underlying channels are transactional, SpoolingMemoryChannel can also be easily
made transactional.
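To make the spooling idea concrete, here is a minimal sketch in plain Java. It
is only a model of the routing logic, not Flume code: the class name
SpoolingBufferSketch and its methods are hypothetical, both queues are
in-memory stand-ins (the "spool" queue plays the role of FileChannel), and
transactions are left out entirely.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the proposed SpoolingMemoryChannel; not part of the Flume API.
class SpoolingBufferSketch<E> {
    private final int memoryCapacity;
    // Stand-in for MemoryChannel: the fast path used while the sink keeps up.
    private final Deque<E> memory = new ArrayDeque<>();
    // Stand-in for FileChannel: a real implementation would write to disk here.
    private final Deque<E> spool = new ArrayDeque<>();

    SpoolingBufferSketch(int memoryCapacity) {
        if (memoryCapacity <= 0) {
            throw new IllegalArgumentException("memoryCapacity must be positive");
        }
        this.memoryCapacity = memoryCapacity;
    }

    // Returns true if the event stayed in memory, false if it was spooled.
    synchronized boolean put(E event) {
        // Once anything has been spooled, keep spooling until the spool drains,
        // so that events are still taken in arrival order.
        if (spool.isEmpty() && memory.size() < memoryCapacity) {
            memory.addLast(event);
            return true;
        }
        spool.addLast(event);
        return false;
    }

    // Returns the oldest buffered event, or null if nothing is buffered.
    synchronized E take() {
        E event = memory.pollFirst();
        return event != null ? event : spool.pollFirst();
    }

    // Number of events currently held in the spool (the "disk" side).
    synchronized int spooledCount() {
        return spool.size();
    }
}
```

With a memory capacity of 2, the first two put() calls stay in memory and the
third goes to the spool, so the disk cost is paid only when the sink falls
behind. A real channel would additionally have to coordinate the put and take
transactions of the two underlying channels, which this sketch does not attempt.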
> Proposal to support disk based spooling
> ---------------------------------------
>
> Key: FLUME-1045
> URL: https://issues.apache.org/jira/browse/FLUME-1045
> Project: Flume
> Issue Type: New Feature
> Affects Versions: v1.0.0
> Reporter: Inder SIngh
> Priority: Minor
> Labels: patch
> Attachments: FLUME-1045-1.patch, FLUME-1045-2.patch
>
>
> 1. Problem Description
> A sink being unavailable at any stage in the pipeline causes it to back off
> and retry after a while. Channels associated with such sinks start buffering
> data, with the caveat that if you are using a memory channel it can result in
> a domino effect on the entire pipeline. There can be legitimate downtime,
> e.g. the HDFS sink being down for NameNode maintenance or Hadoop upgrades.
> 2. Why not use a durable channel (JDBC, FileChannel)?
> We want high throughput and support for sink downtime as a first-class use case.