[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610111#comment-13610111 ]
Juhani Connolly commented on FLUME-1227:
----------------------------------------

I would personally prefer seeing a dependence on existing channels rather than another implementation of something like the file channel and something like the memory channel. The code base is already getting pretty big, and the interfaces are fixed. The spillable channel shouldn't even know or care what type the main/sub channels are; it should just feed them data. While it might not be the optimal solution performance-wise, I think the cost would be small and it would give us less code to maintain overall.

Either approach certainly has its merits.

> Introduce some sort of SpillableChannel
> ---------------------------------------
>
> Key: FLUME-1227
> URL: https://issues.apache.org/jira/browse/FLUME-1227
> Project: Flume
> Issue Type: New Feature
> Components: Channel
> Reporter: Jarek Jarcec Cecho
> Assignee: Roshan Naik
> Attachments: 1227.patch.1, SpillableMemory Channel Design.pdf
>
>
> I would like to introduce a new channel that would behave similarly to
> scribe (https://github.com/facebook/scribe). It would be something between
> the memory and file channels. Input events would be saved directly to
> memory (only) and would be served from there. If the memory became full, we
> would offload the events to a file.
>
> Let me describe the use case behind this request. We have plenty of
> frontend servers that are generating events. We want to send all events to
> just a limited number of machines, from where we would send the data to
> HDFS (a sort of staging layer). The reason for this second layer is our
> need to decouple event aggregation and front-end code onto separate
> machines. Using the memory channel is fully sufficient, as we can survive
> the loss of some portion of the events. However, in order to sustain
> maintenance windows or networking issues, we would end up having to assign
> a lot of memory to those "staging" machines.
>
> The referenced scribe deals with this problem by implementing the following
> logic: events are saved in memory, similarly to our MemoryChannel. However,
> when the memory gets full (because of maintenance, networking issues, ...),
> it spills data to disk, where it sits until everything starts working
> again.
>
> I would like to introduce a channel that would implement similar logic. Its
> durability guarantees would be the same as MemoryChannel's: if someone
> removed the power cord, this channel would lose data. Based on the
> discussion in FLUME-1201, I would propose to have the implementation
> completely independent of any other channel's internal code.
>
> Jarcec
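To make the composition approach from the comment concrete, here is a minimal sketch of a spillable channel that only feeds a primary channel and an overflow channel without knowing their concrete types. It assumes a hypothetical, simplified `SimpleChannel` interface with non-blocking `offer`/`poll`; this is NOT Flume's transactional `org.apache.flume.Channel` API, and `BoundedMemoryChannel` merely stands in for an existing memory-like channel. The sketch is single-threaded and ignores transactions.

```java
import java.util.ArrayDeque;

// Hypothetical, simplified channel interface (NOT Flume's transactional
// org.apache.flume.Channel API); used only to illustrate composition.
interface SimpleChannel<E> {
    boolean offer(E event); // false if the channel has no room
    E poll();               // null if the channel is empty
}

// Hypothetical stand-in for an existing bounded, memory-like channel.
class BoundedMemoryChannel<E> implements SimpleChannel<E> {
    private final ArrayDeque<E> queue = new ArrayDeque<>();
    private final int capacity;

    BoundedMemoryChannel(int capacity) { this.capacity = capacity; }

    @Override public boolean offer(E event) {
        if (queue.size() >= capacity) return false;
        queue.addLast(event);
        return true;
    }

    @Override public E poll() { return queue.pollFirst(); }
}

// Spills from a primary channel to an overflow channel without knowing
// what kind of channel either one is. Not thread-safe.
class SpillableChannel<E> implements SimpleChannel<E> {
    private final SimpleChannel<E> primary;
    private final SimpleChannel<E> overflow;
    private long spilled = 0; // events currently held by the overflow channel

    SpillableChannel(SimpleChannel<E> primary, SimpleChannel<E> overflow) {
        this.primary = primary;
        this.overflow = overflow;
    }

    @Override public boolean offer(E event) {
        // While anything is spilled, keep writing to the overflow channel so
        // events are still delivered in arrival order.
        if (spilled == 0 && primary.offer(event)) return true;
        if (overflow.offer(event)) {
            spilled++;
            return true;
        }
        return false; // both channels are full
    }

    @Override public E poll() {
        // Drain the primary first; it only holds events older than the spill.
        E event = primary.poll();
        if (event == null && spilled > 0) {
            event = overflow.poll();
            if (event != null) spilled--;
        }
        return event;
    }
}
```

Because new events go to the overflow channel for as long as it holds anything, and the overflow is only read once the primary is empty, FIFO order is preserved across the two channels. The overflow could just as well be a file-backed channel, which is the point of the comment: the wrapper doesn't care.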
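The self-contained alternative proposed in the issue description (scribe-style: serve events from memory, spill to disk only when memory is full, with MemoryChannel-level durability) could look roughly like the sketch below. All names (`MemoryThenDiskBuffer`, `put`, `take`) are hypothetical and not part of Flume; events are plain strings, and each spilled event is stored as one `RandomAccessFile.writeUTF` record, which limits an event to 65535 encoded bytes.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayDeque;

// Hypothetical scribe-style buffer: events live in memory until the memory
// budget is exhausted, then further events are appended to a temp file.
// Like MemoryChannel it is NOT durable: the spill file is never replayed
// after a crash and is deleted on close(). Not thread-safe.
class MemoryThenDiskBuffer implements AutoCloseable {
    private final ArrayDeque<String> memory = new ArrayDeque<>();
    private final int memoryCapacity;
    private final File spillFile;
    private final RandomAccessFile spill;
    private long readOffset = 0;   // file offset of the next spilled record
    private long spilledCount = 0; // records written but not yet read

    MemoryThenDiskBuffer(int memoryCapacity) throws IOException {
        this.memoryCapacity = memoryCapacity;
        this.spillFile = File.createTempFile("spill", ".dat");
        this.spill = new RandomAccessFile(spillFile, "rw");
    }

    void put(String event) throws IOException {
        // Once anything is on disk, keep appending there to preserve order.
        if (spilledCount == 0 && memory.size() < memoryCapacity) {
            memory.addLast(event);
            return;
        }
        spill.seek(spill.length());
        spill.writeUTF(event); // length-prefixed modified UTF-8 record
        spilledCount++;
    }

    String take() throws IOException {
        if (!memory.isEmpty()) return memory.pollFirst();
        if (spilledCount == 0) return null;
        spill.seek(readOffset);
        String event = spill.readUTF();
        readOffset = spill.getFilePointer();
        if (--spilledCount == 0) { // disk drained: reclaim the space
            spill.setLength(0);
            readOffset = 0;
        }
        return event;
    }

    @Override public void close() throws IOException {
        spill.close();
        spillFile.delete();
    }
}
```

Compared with the composition sketch, this keeps the spill logic and its file format in one place, at the cost of re-implementing something file-channel-like, which is exactly the maintenance trade-off discussed in the comment.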