[ 
https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13606137#comment-13606137
 ] 

Hari Shreedharan commented on FLUME-1227:
-----------------------------------------

Roshan, 

Sorry it took me this long to get to this one. I reviewed the design document 
and I have a couple of relatively major concerns:

1. This channel implicitly depends on the behavior of the current channels, the 
File Channel and the Memory Channel. As one of the people who maintain the File 
Channel, I strongly feel this is not the correct thing to do. The behavior of 
either channel could change (this is not without precedent: in FLUME-1437, we 
changed the replay order). At that point, the change would break the 
unit/integration tests for this channel, which could delay a commit. 

2. I don't think we should make the configuration change. The idea of the 
Lifecycle manager is to handle all the components and keep them independent of 
each other. Dependencies on other components managed by the Lifecycle system are 
a bad idea, and this sets a bad precedent: it can lead to patches that make 
components inter-dependent and reliant on another component being a particular 
one (for example, a source using this hook to figure out whether it is operating 
on a Memory Channel or a File Channel). 

I believe the current design is a bit more complex than it needs to be, due to 
the handling of more than one transaction. Also, reserving transaction capacity 
on both channels is a poor indicator of where the txn should go. In my 
experience, people set the transaction capacity to a value much higher than 
the size of the average transaction. 
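
To make the transaction capacity point concrete, here is a minimal sketch with 
made-up numbers. The function names and figures are hypothetical and are not 
taken from the patch or the design document. It only shows that a decision 
based on the reserved capacity can differ from one based on the events a 
transaction actually holds; it does not say how the channel should decide.

```python
# Hypothetical names and numbers, for illustration only; not from the patch.

def route_by_reservation(free_memory_slots: int, transaction_capacity: int) -> str:
    """Pick a channel by reserving the full configured transaction capacity."""
    return "memory" if free_memory_slots >= transaction_capacity else "file"


def route_by_actual_size(free_memory_slots: int, events_in_txn: int) -> str:
    """Pick a channel using the number of events the transaction really holds."""
    return "memory" if free_memory_slots >= events_in_txn else "file"


free_memory_slots = 5_000      # room left in the in-memory queue (assumed)
transaction_capacity = 10_000  # configured well above typical usage (assumed)
events_in_txn = 100            # size of a typical transaction (assumed)

by_reservation = route_by_reservation(free_memory_slots, transaction_capacity)
by_actual_size = route_by_actual_size(free_memory_slots, events_in_txn)
print(by_reservation, by_actual_size)
```

With these numbers the reservation-based choice sends the transaction to the 
file side even though its 100 events would fit in memory many times over. 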

Also, have you tested this against a slightly modified File Channel with all of 
the fsyncs removed (or commented out)? I'd be interested in seeing the 
difference in performance at that point. Also, see FLUME-1423 where Denny 
removed the fsyncs for performance (the performance of the channel has improved 
even more since then though).
                
> Introduce some sort of SpillableChannel
> ---------------------------------------
>
>                 Key: FLUME-1227
>                 URL: https://issues.apache.org/jira/browse/FLUME-1227
>             Project: Flume
>          Issue Type: New Feature
>          Components: Channel
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Roshan Naik
>         Attachments: 1227.patch.1, SpillableMemory Channel Design.pdf
>
>
> I would like to introduce a new channel that would behave similarly to scribe 
> (https://github.com/facebook/scribe). It would be something between the memory 
> and file channels. Input events would be saved directly to memory (only) 
> and would be served from there. If the memory became full, we 
> would spill the events to a file.
> Let me describe the use case behind this request. We have plenty of frontend 
> servers that are generating events. We want to send all events to just a 
> limited number of machines from where we would send the data to HDFS (some 
> sort of staging layer). The reason for this second layer is our need to 
> decouple event aggregation and frontend code onto separate machines. Using the 
> memory channel is fully sufficient, as we can survive the loss of some portion 
> of the events. However, in order to sustain maintenance windows or networking 
> issues we would have to end up with a lot of memory assigned to those "staging" 
> machines. The referenced "scribe" deals with this problem by implementing the 
> following logic: events are saved in memory, similarly to our MemoryChannel. 
> However, if the memory gets full (because of maintenance, networking 
> issues, ...) it will spill data to disk, where it will sit until 
> everything starts working again.
> I would like to introduce a channel that would implement similar logic. Its 
> durability guarantees would be the same as MemoryChannel's: if someone 
> removed the power cord, this channel would lose data. Based on the 
> discussion in FLUME-1201, I would propose to have the implementation 
> completely independent of any other channel's internal code.
> Jarcec
