[
https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504291#comment-13504291
]
Roshan Naik commented on FLUME-1227:
------------------------------------
Continuing the discussion...
I spent some time studying the discussions in the JIRAs related to solving the
problem of spilling over (and/or failover). I think failover and spillover
should not be treated as the same problem ... even though it may be
possible to address them both in the same solution.
There is a consensus that the problem is worth addressing. The concerns raised
so far fall along these dimensions:
1) Complexity of implementation and configuration, plus potential
[enhancements|https://issues.apache.org/jira/browse/FLUME-1045?focusedCommentId=13430529&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13430529]
to existing interfaces
2) complexity of testing
3) Ensuring transaction guarantees are preserved, and how strong or weak they are
4) Defining the durability level (durable or not) of the final solution .. this
is simple IMHO
5) Efficiency of the solution (batching requests when spilling over)
6) Flexibility
The solutions discussed so far, along with their concerns ..
1) FailOver Sink processor - has issues with retaining transaction
guarantees
([Reference|https://issues.apache.org/jira/browse/FLUME-1045?focusedCommentId=13235705&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13235705])
2) Mechanisms for Composing Existing Channels
([1201|https://issues.apache.org/jira/browse/FLUME-1201] and [my
proposal|https://issues.apache.org/jira/browse/FLUME-1227?focusedCommentId=13492828&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13492828])
- Flexible but has complexities with regard to testing ([mixed opinions
here|https://issues.apache.org/jira/browse/FLUME-1201?focusedCommentId=13282018&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13282018]),
implementation & determining durability
[See|https://issues.apache.org/jira/browse/FLUME-1045?focusedCommentId=13235705&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13235705]
3) Spillable Channel - Limited functionality but easier to test and determine
transaction+durability semantics.
My thoughts...
The concerns related to mechanisms for composing channels are largely centered
around complexity. I feel some of them do not hold.
Testing a composition mechanism is not as complex as has been feared, for
reasons stated
[here|https://issues.apache.org/jira/browse/FLUME-1201?focusedCommentId=13282018&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13282018].
In a pluggable system (like the rest of Flume) we rely on guarantees from the
interface itself. There is no need to test every combination of all possible
channels, just as it does not make sense to test all combinations
of sink/channel/source/interceptors/sink-processors in Flume.
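To illustrate the point about relying on interface guarantees, here is a minimal
sketch of a single contract check run against both a plain channel and a
composite one. Everything in it is hypothetical: MemoryLikeChannel,
ChainedChannel and check_channel_contract are toy names, the put/take methods
are not Flume's Channel API, and the two properties checked (empty take returns
nothing, events come out in insertion order) are chosen only for illustration.

```python
from collections import deque


class MemoryLikeChannel:
    """Toy FIFO channel standing in for any real implementation."""

    def __init__(self):
        self._events = deque()

    def put(self, event):
        self._events.append(event)

    def take(self):
        return self._events.popleft() if self._events else None


class ChainedChannel:
    """Toy composite built only on the put/take interface of its parts."""

    def __init__(self, primary, overflow, limit):
        self.primary, self.overflow, self.limit = primary, overflow, limit
        self._in_primary = 0
        self._in_overflow = 0

    def put(self, event):
        # Once anything has overflowed, keep writing to the overflow channel
        # until it drains, so insertion order is preserved.
        if self._in_overflow == 0 and self._in_primary < self.limit:
            self.primary.put(event)
            self._in_primary += 1
        else:
            self.overflow.put(event)
            self._in_overflow += 1

    def take(self):
        if self._in_primary:
            self._in_primary -= 1
            return self.primary.take()
        if self._in_overflow:
            self._in_overflow -= 1
            return self.overflow.take()
        return None


def check_channel_contract(make_channel):
    """Assumed contract: empty take yields None, events come out in order."""
    channel = make_channel()
    assert channel.take() is None
    for event in ["a", "b", "c"]:
        channel.put(event)
    assert [channel.take() for _ in range(3)] == ["a", "b", "c"]
    assert channel.take() is None


# The same check covers a plain channel and a composition of two channels.
check_channel_contract(MemoryLikeChannel)
check_channel_contract(
    lambda: ChainedChannel(MemoryLikeChannel(), MemoryLikeChannel(), limit=2)
)
```

The composite is exercised through the same check as its parts, so adding a new
channel type adds one more call rather than a new row and column of
combinations.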
Implementation of a composite mechanism would also be simpler. It would be
focused only on the issues involved in stitching channels together, not on
actually providing a robust backing store.
The spillover channel (Mem + File) seems a little too specialized .. for
instance it does not provide durability for users who need it. It would be nice
to allow the primary channel to be on a fast, smaller durable store (like SSDs)
and overflow into another slower durable store (like hard disk/JDBC).
The following general strategy for compounding channels seems worth discussing
..
agent1.channels.compoundChannel.type = compound
agent1.channels.compoundChannel.1 = memChannel1
agent1.channels.compoundChannel.2 = fileChannel1
agent1.channels.compoundChannel.3 = jdbcChannel1
# batch size when spilling into fileChannel1
agent1.channels.compoundChannel.1.overflowBatchSize = 100
# batch size when spilling into jdbcChannel1
agent1.channels.compoundChannel.2.overflowBatchSize = 1000
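To make the strategy above a little more concrete, here is a rough sketch of
how an ordered chain of channels with a per-level overflow batch size might
behave. It is not Flume code and makes several assumptions that the proposal
does not settle: the Tier and CompoundChannel classes are hypothetical, each
level has a fixed capacity, the oldest events are the ones spilled, take()
reads from the deepest non-empty level to keep insertion order, and
transactions are not modeled at all.

```python
from collections import deque


class ChannelFullError(Exception):
    """Raised when no tier can make room for an incoming event."""


class Tier:
    """One bounded FIFO store in the chain (stand-in for a real channel)."""

    def __init__(self, name, capacity, overflow_batch_size=1):
        self.name = name
        self.capacity = capacity
        # Events moved to the next tier per spill; unused on the last tier.
        self.overflow_batch_size = overflow_batch_size
        self.events = deque()

    def free(self):
        return self.capacity - len(self.events)


class CompoundChannel:
    """Hypothetical composite: tiers are ordered from fastest to slowest."""

    def __init__(self, tiers):
        self.tiers = tiers

    def _make_room(self, index, needed):
        tier = self.tiers[index]
        shortfall = needed - tier.free()
        if shortfall <= 0:
            return
        if index + 1 == len(self.tiers) or needed > tier.capacity:
            raise ChannelFullError(tier.name)
        # Spill at least the shortfall, and a whole batch when possible.
        count = min(len(tier.events), max(shortfall, tier.overflow_batch_size))
        self._make_room(index + 1, count)
        next_tier = self.tiers[index + 1]
        for _ in range(count):
            # Oldest events move down, so deeper tiers always hold older data.
            next_tier.events.append(tier.events.popleft())

    def put(self, event):
        self._make_room(0, 1)
        self.tiers[0].events.append(event)

    def take(self):
        # Assumed policy: read the deepest non-empty tier first (oldest data).
        for tier in reversed(self.tiers):
            if tier.events:
                return tier.events.popleft()
        return None


# Small capacities so the spilling is easy to follow by hand.
channel = CompoundChannel([
    Tier("memChannel1", capacity=4, overflow_batch_size=2),
    Tier("fileChannel1", capacity=4, overflow_batch_size=2),
    Tier("jdbcChannel1", capacity=8),
])
for event in range(1, 10):
    channel.put(event)
```

Spilling the oldest events in batches keeps writes to the slower stores
amortized, at the cost of reading from the slowest store first while a backlog
exists. Other policies (spilling the newest events, or reading from the
fastest level first) are equally possible under the proposed configuration.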
> Introduce some sort of SpillableChannel
> ---------------------------------------
>
> Key: FLUME-1227
> URL: https://issues.apache.org/jira/browse/FLUME-1227
> Project: Flume
> Issue Type: New Feature
> Components: Channel
> Reporter: Jarek Jarcec Cecho
>
> I would like to introduce a new channel that would behave similarly to scribe
> (https://github.com/facebook/scribe). It would be something between memory
> and file channel. Input events would be saved directly to memory (only)
> and would be served from there. If the memory were full, we
> would spill the events to file.
> Let me describe the use case behind this request. We have plenty of frontend
> servers that are generating events. We want to send all events to just
> limited number of machines from where we would send the data to HDFS (some
> sort of staging layer). Reason for this second layer is our need to decouple
> event aggregation and front end code to separate machines. Using memory
> channel is fully sufficient as we can survive the loss of some portion of the
> events. However in order to sustain maintenance windows or networking issues
> we would have to end up with a lot of memory assigned to those "staging"
> machines. Referenced "scribe" is dealing with this problem by implementing
> following logic - events are saved in memory similarly as our MemoryChannel.
> However in case that the memory gets full (because of maintenance, networking
> issues, ...) it will spill data to disk where they will be sitting until
> everything starts working again.
> I would like to introduce a channel that would implement similar logic. Its
> durability guarantees would be the same as MemoryChannel's - if someone
> pulled the power cord, this channel would lose data. Based on the
> discussion in FLUME-1201, I would propose to have the implementation
> completely independent of any other channel's internal code.
> Jarcec