[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel

Roshan Naik (JIRA) Sat, 23 Mar 2013 16:31:17 -0700

    [ 
https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611899#comment-13611899
 ]


Roshan Naik commented on FLUME-1227:
------------------------------------

 
- I concur that unspecified guarantees should not be depended upon. I can drop 
that assumption from the tests.

- I think it's very important not to keep leaving the guarantees 
unspecified. But that's for another Jira.

- WRT deferring the decision to commit() time: let me revisit that issue. 
 

*Instantiation & config*:
For discussion, I would like to treat instantiation (newing up the object) 
separately from life cycle (start/stop), since an existing instance may get 
reused during reconfigure. 

Overflow does not need to be instantiated or configured before SC! Sources, 
sinks and channels can already be instantiated and configured independently, in 
any order. Only start/stop needs to be coordinated between the two. We also need 
to ensure that SC cannot get a reference to overflow if overflow had 
configuration errors.

 All components (sinks/sources/channels) get introduced to each other after 
they are correctly configured. There is already a step to introduce configured 
sinks and sources to their channels. I have extended that step to introduce 
channels to each other. The current implementation is a bit permissive and 
could be tightened up so that SC can obtain a handle only to its own 
overflow (not to other channels).
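
To make the tightened-up introduction step concrete, here is a rough sketch. It 
is not code from the patch: ConfiguredChannel, IntroductionSketch and their 
fields are hypothetical stand-ins, not Flume classes. It assumes every channel 
has already been instantiated and configured (in any order) and that each one 
records whether its configuration succeeded.

```java
import java.util.Map;

// Hypothetical stand-in for a configured channel; not the Flume Channel interface.
final class ConfiguredChannel {
    final String name;
    final boolean configOk;      // false if configuration reported errors
    final String overflowName;   // null if this channel declares no overflow
    ConfiguredChannel overflow;  // set only by the introduction step

    ConfiguredChannel(String name, boolean configOk, String overflowName) {
        this.name = name;
        this.configOk = configOk;
        this.overflowName = overflowName;
    }
}

final class IntroductionSketch {
    // Runs after all channels are instantiated and configured, whatever the order.
    // Each channel receives at most one handle: its own declared overflow, and
    // only if that overflow was configured without errors.
    static void introduce(Map<String, ConfiguredChannel> channels) {
        for (ConfiguredChannel channel : channels.values()) {
            if (channel.overflowName == null) {
                continue; // nothing to introduce
            }
            ConfiguredChannel candidate = channels.get(channel.overflowName);
            if (candidate != null && candidate.configOk) {
                channel.overflow = candidate;
            }
        }
    }
}
```

In this sketch a channel never sees the full map, so it cannot reach channels 
other than its own overflow, and a misconfigured overflow is simply never handed 
out.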

*Life cycle*:
Hari, correct me if you think it's not the case, but I think the current design 
is in tune with your desire that the SC own the lifecycle (start/stop) of the 
overflow. The config subsystem merely instantiates, configures and introduces 
the two channels to each other. Thereafter it disowns the lifecycle of the 
overflow and lets the SC manage it. It retains ownership of the SC's lifecycle, 
however. This is nice because we don't have to replicate solutions to some of 
the config-related aspects in SC. We do not have to worry about the order in 
which channels are instantiated and configured, and at the same time we gain 
control over the order in which start/stop is called on the SC and its 
overflow.
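
The ownership split could look roughly like the sketch below. All names 
(LifecycleSketch, RecordingChannel, OwningChannelSketch) are made up for 
illustration and are not Flume's lifecycle types. The particular ordering, 
overflow started before the SC and stopped after it, is an assumption of the 
sketch rather than something settled in this discussion.

```java
import java.util.List;

// Hypothetical minimal lifecycle contract; not Flume's own lifecycle interface.
interface LifecycleSketch {
    void start();
    void stop();
}

// Stand-in overflow channel that only records lifecycle calls.
final class RecordingChannel implements LifecycleSketch {
    private final String name;
    private final List<String> log;

    RecordingChannel(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    @Override public void start() { log.add("start " + name); }
    @Override public void stop()  { log.add("stop " + name); }
}

// Stand-in SC: the config subsystem calls start/stop only on this object,
// and this object alone drives the overflow's lifecycle.
final class OwningChannelSketch implements LifecycleSketch {
    private final LifecycleSketch overflow;
    private final List<String> log;

    OwningChannelSketch(LifecycleSketch overflow, List<String> log) {
        this.overflow = overflow;
        this.log = log;
    }

    @Override public void start() {
        overflow.start();      // assumed order: overflow is ready before the SC
        log.add("start sc");
    }

    @Override public void stop() {
        log.add("stop sc");
        overflow.stop();       // assumed order: overflow goes down last
    }
}
```

The point is only that the ordering lives in one place (the SC), regardless of 
the order in which the two objects were created and configured.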


*Scribe*:
 Juhani, I think the spilling policy can definitely be tweaked. Right now I 
spill into overflow only when the primary is full. I like the idea that we can 
take a cue from the fact that takes have begun to fail and start spilling early 
to minimize data loss. I have a throughput concern with Scribe's operating mode, 
where it switches exclusively to using either memory or disk. In SC's design we 
do not need to wait for the overflow to completely drain before resuming use of 
the faster primary. I'll look more into Scribe and see what we can leverage.
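
As a toy illustration of the current policy (spill only when the primary is 
full, go back to the primary as soon as it has room), consider the sketch 
below. SpillPolicySketch is a hypothetical class, not the patch: it uses two 
in-memory queues in place of the real primary and overflow, ignores 
transactions, and makes no attempt to preserve ordering across the two queues.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the spill policy; both "channels" are plain in-memory queues.
final class SpillPolicySketch {
    private final Deque<String> primary = new ArrayDeque<>();
    private final Deque<String> overflow = new ArrayDeque<>();
    private final int primaryCapacity;

    SpillPolicySketch(int primaryCapacity) {
        this.primaryCapacity = primaryCapacity;
    }

    // Spill only when the primary is full. The overflow does not have to be
    // empty for a put to land in the primary again.
    void put(String event) {
        if (primary.size() < primaryCapacity) {
            primary.addLast(event);
        } else {
            overflow.addLast(event);
        }
    }

    // Assumption of this sketch: serve from the primary first, then from the
    // overflow. Returns null when both are empty.
    String take() {
        String event = primary.pollFirst();
        return event != null ? event : overflow.pollFirst();
    }

    int primarySize()  { return primary.size(); }
    int overflowSize() { return overflow.size(); }
}
```

With a primary capacity of 2, putting three events leaves one in the overflow; 
after a single take, the next put goes to the primary again even though the 
overflow still holds an event. A Scribe-style exclusive switch would instead 
keep writing to disk until the overflow had drained.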


- The fsync experiment is something I would like to defer until the other 
open items are resolved. It does not look like a blocker, more of a perf tuning 
thing. Does that sound reasonable?

                
> Introduce some sort of SpillableChannel
> ---------------------------------------
>
>                 Key: FLUME-1227
>                 URL: https://issues.apache.org/jira/browse/FLUME-1227
>             Project: Flume
>          Issue Type: New Feature
>          Components: Channel
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Roshan Naik
>         Attachments: 1227.patch.1, SpillableMemory Channel Design.pdf
>
>
> I would like to introduce new channel that would behave similarly as scribe 
> (https://github.com/facebook/scribe). It would be something between memory 
> and file channel. Input events would be saved directly to the memory (only) 
> and would be served from there. In case that the memory would be full, we 
> would outsource the events to file.
> Let me describe the use case behind this request. We have plenty of frontend 
> servers that are generating events. We want to send all events to just 
> limited number of machines from where we would send the data to HDFS (some 
> sort of staging layer). Reason for this second layer is our need to decouple 
> event aggregation and front end code to separate machines. Using memory 
> channel is fully sufficient as we can survive loss of some portion of the 
> events. However in order to sustain maintenance windows or networking issues 
> we would have to end up with a lot of memory assigned to those "staging" 
> machines. Referenced "scribe" is dealing with this problem by implementing 
> following logic - events are saved in memory similarly as our MemoryChannel. 
> However in case that the memory gets full (because of maintenance, networking 
> issues, ...) it will spill data to disk where they will be sitting until 
> everything starts working again.
> I would like to introduce a channel that would implement similar logic. Its 
> durability guarantees would be same as MemoryChannel - in case that someone 
> would remove power cord, this channel would lose data. Based on the 
> discussion in FLUME-1201, I would propose to have the implementation 
> completely independent on any other channel internal code.
> Jarcec

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
