[
https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608700#comment-13608700
]
Roshan Naik commented on FLUME-1227:
------------------------------------
Thanks Hari.
1) WRT the concern about not depending on another channel: I went down this
path since it looked like there was some consensus when I started. What
alternative design do you have in mind?
2) WRT a change in the memory/file channel breaking the Spillable Channel: could
you expand a bit? I am not familiar with the replay order issue or how it could
have an impact. I don't think any intrinsic assumption is being made about any
specific channel's behavior. Just to be doubly sure, I made sure the tests do
not rely on a single type of overflow channel. The only material dependency (as
far as I can tell) that the Spillable Channel has on the overflow channel is the
interface-level guarantee expected from all channels: that order is maintained
in the case of a single source/sink.
Do you see any other assumptions/dependencies hiding there?
3) WRT reserving capacity on both channels: if you mean that each txn should
not reserve capacity on both channels, I agree, and the current implementation
does not do that. Or were you by any chance referring to the issue of upfront
reservation (at put() time) versus reservation at commit() time?
4) WRT testing with fsyncs removed: I have not pursued it since I felt that
would compromise the durability guarantees. Do you think it's useful to do
that?
5) WRT "we should make the configuration change": can you elaborate? I am not
certain which change specifically you are referring to. Or are you referring
to the whole config approach?
6) WRT lifecycle management and dependencies: after configuration, any
channel that is found not to be connected to a source/sink is automatically
discarded from the list of components managed by the lifecycle system.
Consequently, the Spillable Channel becomes the sole lifecycle manager of the
overflow channel. Otherwise, yes, there would be havoc.
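To make the ordering point in (2) concrete, here is a minimal toy sketch in
Python. It is not the code in the patch and does not use any Flume API; the
class name SpillableQueue and its fields are hypothetical, and a plain deque
stands in for the overflow channel. It only assumes what (2) assumes: that the
overflow preserves FIFO order for a single producer and a single consumer.

```python
from collections import deque


class SpillableQueue:
    """Toy model: bounded in-memory queue that spills to an overflow queue."""

    def __init__(self, memory_capacity):
        self.memory_capacity = memory_capacity
        self.memory = deque()    # primary, in-memory storage
        self.overflow = deque()  # stand-in for the overflow channel

    def put(self, event):
        # Use memory only while nothing is waiting in the overflow.
        # Otherwise a newer event in memory could be taken before an
        # older event that was already spilled.
        if not self.overflow and len(self.memory) < self.memory_capacity:
            self.memory.append(event)
        else:
            self.overflow.append(event)

    def take(self):
        # Everything in memory is older than everything in the overflow,
        # so draining memory first keeps the overall order.
        if self.memory:
            return self.memory.popleft()
        if self.overflow:
            return self.overflow.popleft()
        return None  # nothing to take


q = SpillableQueue(memory_capacity=2)
for i in range(5):
    q.put(i)
drained = [q.take() for _ in range(5)]
```

In this model the only thing required of the overflow is that it hands events
back in the order it received them, which is why no particular overflow channel
type should matter.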
> Introduce some sort of SpillableChannel
> ---------------------------------------
>
> Key: FLUME-1227
> URL: https://issues.apache.org/jira/browse/FLUME-1227
> Project: Flume
> Issue Type: New Feature
> Components: Channel
> Reporter: Jarek Jarcec Cecho
> Assignee: Roshan Naik
> Attachments: 1227.patch.1, SpillableMemory Channel Design.pdf
>
>
> I would like to introduce a new channel that would behave similarly to scribe
> (https://github.com/facebook/scribe). It would be something between the memory
> and file channels. Input events would be saved directly to memory (only)
> and would be served from there. If the memory became full, we
> would spill the events to a file.
> Let me describe the use case behind this request. We have plenty of frontend
> servers that are generating events. We want to send all events to just a
> limited number of machines from where we would send the data to HDFS (some
> sort of staging layer). The reason for this second layer is our need to
> decouple event aggregation and frontend code onto separate machines. Using the
> memory channel is fully sufficient as we can survive the loss of some portion
> of the events. However, in order to sustain maintenance windows or networking
> issues we would have to end up with a lot of memory assigned to those "staging"
> machines. The referenced "scribe" deals with this problem by implementing the
> following logic: events are saved in memory similarly to our MemoryChannel.
> However, if the memory gets full (because of maintenance, networking
> issues, ...) it will spill data to disk, where it will sit until
> everything starts working again.
> I would like to introduce a channel that would implement similar logic. Its
> durability guarantees would be the same as MemoryChannel's: if someone
> removed the power cord, this channel would lose data. Based on the
> discussion in FLUME-1201, I would propose to make the implementation
> completely independent of any other channel's internal code.
> Jarcec