[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel

Mike Percy (JIRA) Mon, 25 Mar 2013 19:37:17 -0700

    [ 
https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613418#comment-13613418
 ]


Mike Percy commented on FLUME-1227:
-----------------------------------

Roshan, thanks a lot for this design documentation.

Guys, based on my prior [reviewboard 
comment|https://reviews.apache.org/r/9544/] one big problem I have with this 
implementation is the way that the channels are allowed to know about each 
other. I am completely against this because it violates separation of 
responsibilities and encourages unmaintainable spaghetti dependencies between 
components. What's next, sinks? That is why we have SinkProcessors (so sinks 
don't have to know about each other). We simply cannot afford to open that 
Pandora's box. Let the SpillableChannel instantiate its own dependencies and 
govern their lifecycle.
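
A minimal sketch of that ownership model, for illustration only. Everything here is hypothetical: `SimpleChannel`, `BoundedQueueChannel` and `OwningSpillableChannel` are stand-ins and not Flume's real Channel API, and the overflow store is an in-memory queue standing in for a file-backed one. The point is only that the spillable channel constructs both underlying stores itself and starts and stops them, so no channel is configured to know about another.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a channel; NOT Flume's real Channel interface.
interface SimpleChannel {
    void start();
    void stop();
    boolean offer(String event); // returns false when the event cannot be accepted
    String poll();               // returns null when no event is available
}

// Bounded FIFO store used here for both the primary and the overflow role.
final class BoundedQueueChannel implements SimpleChannel {
    private final Deque<String> queue = new ArrayDeque<>();
    private final int capacity;
    private boolean running;

    BoundedQueueChannel(int capacity) {
        this.capacity = capacity;
    }

    public void start() { running = true; }
    public void stop() { running = false; }

    public boolean offer(String event) {
        if (!running || queue.size() >= capacity) {
            return false;
        }
        queue.addLast(event);
        return true;
    }

    public String poll() {
        return running ? queue.pollFirst() : null;
    }
}

// The spillable channel creates its own dependencies and governs their lifecycle.
final class OwningSpillableChannel implements SimpleChannel {
    private final BoundedQueueChannel primary;
    private final BoundedQueueChannel overflow;

    OwningSpillableChannel(int primaryCapacity, int overflowCapacity) {
        this.primary = new BoundedQueueChannel(primaryCapacity);
        this.overflow = new BoundedQueueChannel(overflowCapacity);
    }

    public void start() {
        overflow.start();
        primary.start();
    }

    public void stop() {
        primary.stop();
        overflow.stop();
    }

    // Spill to the overflow store only when the primary store is full.
    public boolean offer(String event) {
        return primary.offer(event) || overflow.offer(event);
    }

    // Drain the primary store first; this sketch makes no global ordering guarantee.
    public String poll() {
        String event = primary.poll();
        return event != null ? event : overflow.poll();
    }
}
```

With a primary capacity of 1 and an overflow capacity of 2, three events are accepted and a fourth is rejected, so the counts stay correct even though ordering across the two stores is not guaranteed in general.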

If explicitly depending on the file channel is a problem, then let's talk about 
ways to mitigate that... either forking a copy of the FC code into SC so that 
FC can evolve separately, or explicitly not relying on ordering in SC, if that 
is the issue. Therefore SC would not have ordering guarantees. Can the Drain 
Order Queue survive that situation? It makes me a little nervous that DOQ even 
exists to be honest... I don't really like it. It seems like a somewhat complex 
and brittle mechanism for achieving this spill functionality. But I would not 
block this patch because I'm not in love with the DOQ. And I think that if the SC 
doesn't have to guarantee order, then as long as its counts are correct it 
should still work. Correct me if I'm wrong.

If specific non-explicit guarantees of the FC are being relied on then an 
alternative is to consider a different design that relies on different 
invariants than the DOQ does. I'm not necessarily advocating for that, I'm just 
throwing it out there as an option. But I'd be happy with forking the FC and 
getting this checked in without a total redesign to make progress if that 
addresses others' concerns.

My other as-yet unresolved item of code review feedback involved what happens 
when the agent is stopped then restarted while the channel has events in both 
the primary and secondary channels. Can this please be addressed as well?

Additionally, I agree with Hari on the use of transactionCapacity as a poor 
substitute for a reservation amount on the underlying channels. We need a 
better way, and if exposing channel size and capacity via an interface will 
help then I'm all for it.
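
One possible shape for such an interface, as a hedged sketch. `SizedChannel`, `CountingChannel` and `SpillPolicy` are hypothetical names invented for illustration; nothing here is an existing Flume interface. The idea is that a spillable channel could ask each underlying channel how much room it has left, instead of inferring a reservation from transactionCapacity.

```java
// Hypothetical interface exposing size and capacity; not part of Flume.
interface SizedChannel {
    int size();      // number of events currently held
    int capacity();  // maximum number of events the channel can hold

    default int remainingCapacity() {
        return Math.max(0, capacity() - size());
    }
}

// Toy implementation that only tracks counts, enough to exercise the policy below.
final class CountingChannel implements SizedChannel {
    private final int capacity;
    private int size;

    CountingChannel(int capacity) {
        this.capacity = capacity;
    }

    public int size() { return size; }
    public int capacity() { return capacity; }

    // Accept a batch only if it fits entirely; returns false otherwise.
    boolean put(int events) {
        if (events < 0 || events > remainingCapacity()) {
            return false;
        }
        size += events;
        return true;
    }
}

final class SpillPolicy {
    private SpillPolicy() {}

    // Pick the channel that can hold the whole batch, preferring the primary.
    // Returns null when neither channel has enough room.
    static SizedChannel chooseTarget(SizedChannel primary, SizedChannel overflow, int batchSize) {
        if (batchSize <= primary.remainingCapacity()) {
            return primary;
        }
        if (batchSize <= overflow.remainingCapacity()) {
            return overflow;
        }
        return null;
    }
}
```

In a real channel the check and the subsequent put would have to happen atomically (for example under a lock or via an explicit reservation), since another transaction could consume the remaining capacity in between; this sketch ignores concurrency.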

Regards,
Mike

                
> Introduce some sort of SpillableChannel
> ---------------------------------------
>
>                 Key: FLUME-1227
>                 URL: https://issues.apache.org/jira/browse/FLUME-1227
>             Project: Flume
>          Issue Type: New Feature
>          Components: Channel
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Roshan Naik
>         Attachments: 1227.patch.1, SpillableMemory Channel Design.pdf
>
>
> I would like to introduce a new channel that would behave similarly to Scribe 
> (https://github.com/facebook/scribe). It would be something between the memory 
> and file channels. Input events would be saved directly to memory (only) 
> and would be served from there. If the memory became full, we 
> would spill the events to a file.
> Let me describe the use case behind this request. We have plenty of frontend 
> servers that are generating events. We want to send all events to just a 
> limited number of machines, from where we would send the data to HDFS (some 
> sort of staging layer). The reason for this second layer is our need to decouple 
> event aggregation and frontend code onto separate machines. Using the memory 
> channel is fully sufficient, as we can survive the loss of some portion of the 
> events. However, in order to sustain maintenance windows or networking issues 
> we would have to end up with a lot of memory assigned to those "staging" 
> machines. The referenced Scribe deals with this problem by implementing the 
> following logic: events are saved in memory, similarly to our MemoryChannel. 
> However, if the memory gets full (because of maintenance, networking 
> issues, ...) it will spill data to disk, where it will sit until 
> everything starts working again.
> I would like to introduce a channel that would implement similar logic. Its 
> durability guarantees would be the same as MemoryChannel's: if someone 
> removed the power cord, this channel would lose data. Based on the 
> discussion in FLUME-1201, I would propose making the implementation 
> completely independent of any other channel's internal code.
> Jarcec

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
