[jira] [Commented] (FLUME-1045) Proposal to support disk based spooling

Arvind Prabhakar (Commented) (JIRA) Thu, 22 Mar 2012 09:48:44 -0700

    [ 
https://issues.apache.org/jira/browse/FLUME-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235705#comment-13235705
 ]


Arvind Prabhakar commented on FLUME-1045:
-----------------------------------------

Hi Inder - thanks for taking the initiative to provide this functionality. 
However, I am not sure this fits well with the design, for the following 
reasons:

* When a sink spools events, they get removed from the channel, thereby losing 
the transactional exchange guarantee for the next hop. This implies that the 
flow is no longer reliable.
* The domino effect of channels reaching capacity is not really a problem. In 
fact, this is the main value add of buffered flows, which allow every agent to 
queue up events while waiting for the destination to get unblocked.
* Even with spooling implemented, the problem remains, since the spool may 
itself fill to capacity and cause the same domino effect.
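
To make the first two points concrete, here is a toy model of a bounded 
channel with a transactional take. It is only a sketch and not Flume's API: 
ToyChannel, take_transactionally and next_hop_down are hypothetical names made 
up for this illustration. The event leaves the channel only after the next hop 
has accepted it, and a full channel rejects further puts, which is how 
back-pressure reaches the previous agent.

```python
from collections import deque


class ToyChannel:
    """Bounded in-memory queue with a transactional take (toy model, not Flume's API)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        # A full channel rejects the put; this is how back-pressure
        # propagates to the previous hop.
        if len(self.events) >= self.capacity:
            return False
        self.events.append(event)
        return True

    def take_transactionally(self, deliver):
        """Remove the head event only if deliver(event) succeeds."""
        if not self.events:
            return False
        event = self.events[0]      # peek, do not remove yet
        try:
            deliver(event)
        except IOError:
            return False            # rollback: the event stays queued
        self.events.popleft()       # commit: the next hop has accepted it
        return True


def next_hop_down(event):
    # Stand-in for an unavailable destination (hypothetical).
    raise IOError("destination unavailable")


channel = ToyChannel(capacity=2)
channel.put("e1")
channel.put("e2")

delivered = channel.take_transactionally(next_hop_down)
overflow_accepted = channel.put("e3")
print(delivered, list(channel.events), overflow_accepted)
```

A sink that spools would instead remove the event from the channel before the 
next hop has accepted it, so a failure after that point is no longer covered 
by the rollback path shown above.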

As you have noted, the motivation for this change is to have a high throughput 
flow that supports sink downtimes. The expected way to address this is to go 
with a high performance channel that is capable of delivering the same 
throughput levels as the Memory channel without being limited to the available 
system memory.
                
> Proposal to support disk based spooling
> ---------------------------------------
>
>                 Key: FLUME-1045
>                 URL: https://issues.apache.org/jira/browse/FLUME-1045
>             Project: Flume
>          Issue Type: New Feature
>    Affects Versions: v1.0.0
>            Reporter: Inder SIngh
>            Priority: Minor
>              Labels: patch
>         Attachments: FLUME-1045-1.patch, FLUME-1045-2.patch
>
>
> 1. Problem Description 
> A sink being unavailable at any stage in the pipeline causes it to back off 
> and retry after a while. Channels associated with such sinks start buffering 
> data, with the caveat that if you are using a memory channel it can result in 
> a domino effect on the entire pipeline. There could be legitimate downtimes, 
> e.g. the HDFS sink being down for name node maintenance or Hadoop upgrades. 
> 2. Why not use a durable channel (JDBC, FileChannel)?
> We want high throughput and support for sink downtimes as a first class use-case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
