[jira] [Commented] (FLUME-1045) Proposal to support disk based spooling

Patrick Wendell (JIRA) Tue, 07 Aug 2012 20:43:17 -0700

    [ 
https://issues.apache.org/jira/browse/FLUME-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430846#comment-13430846
 ]


Patrick Wendell commented on FLUME-1045:
----------------------------------------

Hey, I'm just getting caught up on this discussion. One issue (or 
misunderstanding) I have with Sharad's proposal, and with any of the proposals 
that suggest a composed "MemoryChannel + FileChannel" is what we want here, is 
that the existing FileChannel has certain transaction guarantees that you 
would not want in this case.

If you are running a memory channel and you want to spill over to disk, you are 
already accepting "best effort" delivery semantics in the normal case, where 
all of the data fits in memory.

If our spillover implementation directly uses, or functionally mirrors, the 
existing FileChannel, we'll be offering much stronger semantics once the data 
has spilled over to disk, at a high throughput cost.

For instance, the FileChannel flushes to disk on every transaction to avoid 
data loss. If we were to build a disk-spilling extension to the existing 
MemoryChannel, we'd likely want to batch these disk flushes to make the 
aggregate disk throughput better. We just wouldn't want the strong semantics 
offered by the FileChannel.
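
To make the flush trade-off concrete, here is a minimal sketch. It is not Flume 
code: SpillFlushSketch, CountingStream, SpillWriter and the flushEvery 
parameter are hypothetical names made up for illustration, and an in-memory 
stream that counts flush() calls stands in for the disk. It only shows how the 
number of flushes changes between a flush-per-transaction policy and a batched 
one.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch, not Flume code: compares flush counts under two policies. */
final class SpillFlushSketch {

    /** Stand-in for a disk file: stores bytes in memory and counts flush() calls. */
    static final class CountingStream extends OutputStream {
        private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int flushes = 0;

        @Override public void write(int b) { sink.write(b); }
        @Override public void flush() { flushes++; }
    }

    /** Appends committed transactions and flushes once every flushEvery commits. */
    static final class SpillWriter {
        private final OutputStream out;
        private final int flushEvery;
        private int pending = 0; // commits written since the last flush

        SpillWriter(OutputStream out, int flushEvery) {
            if (flushEvery < 1) {
                throw new IllegalArgumentException("flushEvery must be >= 1");
            }
            this.out = out;
            this.flushEvery = flushEvery;
        }

        void commit(byte[] txn) throws IOException {
            out.write(txn);
            pending++;
            if (pending >= flushEvery) {
                out.flush();
                pending = 0;
            }
        }

        /** Commits that would be lost if the process died right now. */
        int unflushed() { return pending; }
    }

    /** Runs the given number of commits and returns how many flushes were issued. */
    static int flushesFor(int transactions, int flushEvery) {
        CountingStream disk = new CountingStream();
        SpillWriter writer = new SpillWriter(disk, flushEvery);
        try {
            for (int i = 0; i < transactions; i++) {
                writer.commit(("event-" + i + "\n").getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return disk.flushes;
    }

    public static void main(String[] args) {
        // flushEvery = 1 mimics flushing on every transaction.
        System.out.println("per-transaction flushes: " + flushesFor(1000, 1));
        // flushEvery = 100 mimics a best-effort, batched spill.
        System.out.println("batched flushes: " + flushesFor(1000, 100));
    }
}
```

With flushEvery = 1 every commit pays for a flush; with flushEvery = 100 the 
same 1000 commits cost 10 flushes, at the price of losing up to 99 commits if 
the process dies between flushes. That loss window is the "best effort" part.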

That is why I think that just extending the MemoryChannel to have some type of 
best-effort disk spilling would be best, since it differs in fundamental ways 
from what is accomplished with the FileChannel.
                
> Proposal to support disk based spooling
> ---------------------------------------
>
>                 Key: FLUME-1045
>                 URL: https://issues.apache.org/jira/browse/FLUME-1045
>             Project: Flume
>          Issue Type: New Feature
>    Affects Versions: v1.0.0
>            Reporter: Inder Singh
>            Priority: Minor
>              Labels: patch
>         Attachments: FLUME-1045-1.patch, FLUME-1045-2.patch
>
>
> 1. Problem Description 
> A sink being unavailable at any stage in the pipeline causes it to back off 
> and retry after a while. Channels associated with such sinks start buffering 
> data, with the caveat that if you are using a memory channel it can result in 
> a domino effect on the entire pipeline. There could be legitimate downtimes, 
> e.g., the HDFS sink being down for NameNode maintenance or Hadoop upgrades. 
> 2. Why not use a durable channel (JDBC, FileChannel)?
> We want high throughput and support for sink downtime as a first-class use 
> case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
