[
https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853707#comment-13853707
]
Roshan Naik commented on FLUME-1227:
------------------------------------
Thanks for the feedback, [~brocknoland].
I will incorporate your feedback and update the patch soon.
With regard to adding notes on file channel best practices to the Spillable
Channel section, I am not too keen on that unless they specifically concern the
file channel's coupling with the Spillable channel. In FLUME-2239 I recently
made a note about multiple data dirs helping file channel performance. Also,
the dual checkpoint feature is broken on Windows (FLUME-2224). Let me know if
you feel otherwise.
> Introduce some sort of SpillableChannel
> ---------------------------------------
>
> Key: FLUME-1227
> URL: https://issues.apache.org/jira/browse/FLUME-1227
> Project: Flume
> Issue Type: New Feature
> Components: Channel
> Reporter: Jarek Jarcec Cecho
> Assignee: Roshan Naik
> Attachments: 1227.patch.1, FLUME-1227.v2.patch, FLUME-1227.v5.patch,
> FLUME-1227.v6.patch, FLUME-1227.v7.patch, FLUME-1227.v8.patch,
> SpillableMemory Channel Design 2.pdf, SpillableMemory Channel Design.pdf
>
>
> I would like to introduce a new channel that would behave similarly to scribe
> (https://github.com/facebook/scribe). It would be something between the memory
> and file channels. Input events would be saved directly to memory (only)
> and would be served from there. If the memory became full, we
> would spill the events to a file.
> Let me describe the use case behind this request. We have plenty of frontend
> servers that are generating events. We want to send all events to just a
> limited number of machines from where we would send the data to HDFS (some
> sort of staging layer). The reason for this second layer is our need to
> decouple event aggregation and front end code onto separate machines. Using
> the memory channel is fully sufficient, as we can survive the loss of some
> portion of the events. However, in order to sustain maintenance windows or
> networking issues we would end up with a lot of memory assigned to those
> "staging" machines. The referenced "scribe" deals with this problem by
> implementing the following logic: events are saved in memory, similarly to our
> MemoryChannel. However, if the memory gets full (because of maintenance,
> networking issues, ...) it will spill data to disk, where they will sit until
> everything starts working again.
> I would like to introduce a channel that would implement similar logic. Its
> durability guarantees would be the same as MemoryChannel's: if someone
> removed the power cord, this channel would lose data. Based on the
> discussion in FLUME-1201, I would propose to keep the implementation
> completely independent of any other channel's internal code.
> Jarcec
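
For illustration only, the spill logic described in the issue above could be sketched roughly as follows. This is not the code from the attached patches and not Flume's API: the class name SpillableQueue, its parameters, and the newline-delimited spill file are all assumptions made for the sketch, and Python is used only to keep it short. Like the proposal, it offers no durability beyond what the memory channel gives.

```python
import os
from collections import deque


class SpillableQueue:
    """Hypothetical memory-first FIFO that overflows to a file when memory is full.

    Events are assumed to be strings without embedded newlines.
    """

    def __init__(self, memory_capacity, spill_path):
        self.memory_capacity = memory_capacity
        self.memory = deque()
        self.spill_path = spill_path
        self.spilled = 0       # number of unread events currently on disk
        self.read_offset = 0   # byte offset of the next unread event in the file

    def put(self, event):
        # Use memory only while nothing is on disk; once events have spilled,
        # newer events must follow them to disk so FIFO order is preserved.
        if self.spilled == 0 and len(self.memory) < self.memory_capacity:
            self.memory.append(event)
        else:
            with open(self.spill_path, "ab") as f:
                f.write(event.encode("utf-8") + b"\n")
            self.spilled += 1

    def take(self):
        # Events in memory are always older than events on disk.
        if self.memory:
            return self.memory.popleft()
        if self.spilled:
            with open(self.spill_path, "rb") as f:
                f.seek(self.read_offset)
                line = f.readline()
                self.read_offset = f.tell()
            self.spilled -= 1
            if self.spilled == 0:
                # Disk backlog drained: drop the file and return to memory-only mode.
                os.remove(self.spill_path)
                self.read_offset = 0
            return line.rstrip(b"\n").decode("utf-8")
        return None  # queue is empty
```

With a memory capacity of 2, putting five events keeps the first two in memory and appends the other three to the spill file; taking them back returns all five in their original order and removes the file once the backlog is drained.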