[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431496#comment-13431496 ]
Jarek Jarcec Cecho commented on FLUME-1227:
-------------------------------------------
I see a pretty active discussion in FLUME-1045, so let me jump in and share my
thoughts. Please feel free to comment, raise objections, or suggest something
completely different:
* I wanted to add new public methods to the Channel interface to get the current
number of items and the maximum number of items (if applicable).
* I wanted to use those methods to create a new SpillableChannel that would wrap
both the memory and file channels.
* I wanted this channel to be based only on the public interface of the
underlying channels. It definitely should not use any internal details (that's
why I wanted to add the new methods from the first point).
* I was thinking about implementing the logic as Inder proposed: put all events
into memory. When memory gets full, move all events to disk in one transaction
(i.e., no flushing issues). On reads, serve events from disk first (if there
are any) and then from memory.
A couple of notes:
* I believe that this channel will not introduce any significant issues as long
as it is based only on the public interfaces of the underlying channels.
* This channel could lose data; however, it would lose only the events currently
stored in memory (users can use this knowledge to size the memory
appropriately).
* Spilling data from memory to disk could be visible to the user. Every so
often, when the memory gets full, one transaction would block until all events
have been migrated from memory to disk. Please note that this issue could be
solved by another thread doing the migration in the background; however, for
simplicity I wanted to avoid that in the first implementation.
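To illustrate the data-loss note, here is a small hypothetical simulation (SpillLossBound is an illustrative name, and the "disk" list merely stands in for a durable file channel). Because a spill always empties memory, a simulated power loss discards at most the memory capacity's worth of events, which is the bound a user would size the memory against.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class SpillLossBound {
    final int memoryCapacity;
    final Deque<Integer> memory = new ArrayDeque<>(); // volatile tier
    final List<Integer> disk = new ArrayList<>();     // assumed to survive a crash

    SpillLossBound(int memoryCapacity) { this.memoryCapacity = memoryCapacity; }

    void put(int event) {
        if (memory.size() >= memoryCapacity) {
            // Spill the whole memory buffer at once, as in the proposal.
            disk.addAll(memory);
            memory.clear();
        }
        memory.addLast(event);
    }

    // Simulated power loss: memory contents vanish, disk contents remain.
    int crashAndCountLost() {
        int lost = memory.size();
        memory.clear();
        return lost;
    }

    public static void main(String[] args) {
        int capacity = 8;
        int worst = 0;
        for (int n = 0; n <= 50; n++) {
            SpillLossBound ch = new SpillLossBound(capacity);
            for (int i = 0; i < n; i++) ch.put(i);
            worst = Math.max(worst, ch.crashAndCountLost());
        }
        System.out.println("worst-case loss: " + worst + " of capacity " + capacity); // prints 8 of 8
    }
}
```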
Jarcec
> Introduce some sort of SpillableChannel
> ---------------------------------------
>
> Key: FLUME-1227
> URL: https://issues.apache.org/jira/browse/FLUME-1227
> Project: Flume
> Issue Type: New Feature
> Components: Channel
> Reporter: Jarek Jarcec Cecho
> Assignee: Patrick Wendell
>
> I would like to introduce a new channel that would behave similarly to scribe
> (https://github.com/facebook/scribe). It would be something between the memory
> and file channels. Input events would be saved directly to memory (only)
> and would be served from there. If the memory became full, we would offload
> the events to a file.
> Let me describe the use case behind this request. We have plenty of frontend
> servers that are generating events. We want to send all events to just a
> limited number of machines, from where we would send the data to HDFS (a
> sort of staging layer). The reason for this second layer is our need to
> decouple event aggregation from the front-end code by running them on separate
> machines. Using the memory channel is fully sufficient, as we can survive the
> loss of some portion of the events. However, in order to sustain maintenance
> windows or networking issues, we would end up with a lot of memory assigned
> to those "staging" machines. The referenced "scribe" deals with this problem
> by implementing the following logic: events are saved in memory, similarly to
> our MemoryChannel. However, when the memory gets full (because of maintenance,
> networking issues, ...), it spills data to disk, where the data sits until
> everything starts working again.
> I would like to introduce a channel that would implement similar logic. Its
> durability guarantees would be the same as MemoryChannel's: if someone
> pulled the power cord, this channel would lose data. Based on the
> discussion in FLUME-1201, I would propose keeping the implementation
> completely independent of any other channel's internal code.
> Jarcec