Hi,
Due to design decisions made very early on in Flume NG - specifically the
fact that Sink only has a simple process() method - I don't see a good way
to get multiple sinks pulling from the same channel in a way that is
backwards-compatible with the current implementation.

Probably the "right" way to support this would be to have an interface
where the SinkRunner (or something outside of each Sink) is in control of
the transaction, and then it can easily send events to each sink serially
or in parallel within a single transaction. I think that is basically what
you are describing. If you look at SourceRunner and SourceProcessor you
will see similar ideas to what you are describing but they are only
implemented at the Source->Channel level. The current SinkProcessor is not
an analog of SourceProcessor, but if it were, I think that's where this
functionality might fit. However, once you do that you have to handle a
large number of failure cases and threading models in a very general way,
which might be tough to get right for all use cases. I'm not 100% sure,
but I think that's why this was not pursued at the time.

To me, this seems like a potential design change (it would have to be very
carefully thought out) to consider for a future major Flume code line
(maybe a Flume 2.x).

By the way, if one is trying to get maximum throughput, then duplicating
events onto multiple channels, and having different threads running the
sinks (the current design) will be faster and more resilient in general
than a single thread and a single channel writing to multiple
sinks/destinations. The multiple-channel design pattern will allow periodic
downtimes or delays on a single sink to not affect the others, assuming the
channel sizes are large enough for buffering during downtime and assuming
that each sink is fast enough to recover from temporary delays. Without a
dedicated buffer per destination, one is at the mercy of the slowest sink
at every stage in the transaction.
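
For concreteness, here is a minimal configuration sketch of the
multiple-channel pattern described above. The agent, source, and channel
names (agent, src, mcA, mcB) and the capacity value are assumptions for
illustration only; hsa and hsb are the sink names used elsewhere in this
thread. Source and sink types are omitted, so this is a fragment rather
than a complete agent definition.

```properties
# Hypothetical component names; only hsa/hsb come from this thread.
agent.sources = src
agent.channels = mcA mcB
agent.sinks = hsa hsb

# The replicating selector (also the default) copies each event
# to every channel listed for the source.
agent.sources.src.channels = mcA mcB
agent.sources.src.selector.type = replicating

# One buffer per destination. The capacity is illustrative and should be
# sized for the downtime each sink is expected to ride out.
agent.channels.mcA.type = memory
agent.channels.mcA.capacity = 100000
agent.channels.mcB.type = memory
agent.channels.mcB.capacity = 100000

# Each sink drains its own channel on its own runner thread.
agent.sinks.hsa.channel = mcA
agent.sinks.hsb.channel = mcB
```

With this layout a stall in hsb only fills mcB; hsa keeps draining mcA.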

One last thing worth noting is that the current channels are all well
ordered. This means that Flume currently provides a weak ordering guarantee
(across a single hop). That is a helpful property for testing and
validation, and it is what many people expect when they are storing logs
on a single hop. I hope we don't backpedal on that weak ordering
guarantee without a really good reason.

Regards,
Mike

On Fri, Aug 10, 2012 at 9:30 PM, Wang, Yongkun | Yongkun | BDD <
[email protected]> wrote:

> Hi Juhani,
>
> Yes, we can use two (or several) channels to fan out data to different
> sinks. Then we will have two channels holding the same data, which may not
> be an optimal solution. So I want to use just ONE channel, creating a
> processor that pulls the data once from the channel and then distributes
> it to the different sinks.
>
> Regards,
> Yongkun Wang
>
> On 12/08/10 18:07, "Juhani Connolly" <[email protected]>
> wrote:
>
> >Hi Yongkun,
> >
> >I'm curious why you need to pull the data twice from the sink? Do you
> >need all sinks to have read the same amount of data? Normally for the
> >case of splitting data into batch and analytics, we will send data from
> >the source to two separate channels and have the sinks read from
> >separate channels.
> >
> >On 08/10/2012 02:48 PM, Wang, Yongkun | Yongkun | BDD wrote:
> >> Hi Denny,
> >>
> >> I am working on the patch now; it's not difficult. I have listed the
> >> changes in that JIRA.
> >> I think you misunderstand my design. I don't maintain the order of the
> >> events. Instead, I make sure that each sink gets the same events (or
> >> different events, as specified by a selector).
> >>
> >> Suppose Channel (mc) contains the following events: 4,3,2,1
> >>
> >> If we simply enable it by configuration, it may work like this:
> >> Sink "hsa" may get 1,3;
> >> Sink "hsb" may get 2,4;
> >> So different sinks will get different data. Is this what the user wants?
> >>
> >>
> >> In my design, "hsa" and "hsb" will both get "4,3,2,1". This is a typical
> >> case when a user wants to fan out the data into two places (e.g., one
> >> for batch and another for real-time analysis).
> >>
> >> Regards,
> >> Yongkun Wang
> >>
> >>
> >> On 12/08/10 14:29, "Denny Ye" <[email protected]> wrote:
> >>
> >>> hi Yongkun,
> >>>
> >>>    JIRA can be accessed now.
> >>>
> >>>    I think it might be difficult to reason about the order of events in
> >>> your proposal. If we don't care about the order, we can discuss the
> >>> value and feasibility. In my opinion, the data ingest flow is
> >>> order-unaware, or at least ordering is not that important for us. You
> >>> can try to verify your proposal and give us the result. There may be
> >>> some difficulty in keeping a transaction across several Sinks.
> >>>
> >>> -Regards
> >>> Denny Ye
> >>>
> >>>
> >>> 2012/8/10 Wang, Yongkun | Yongkun | BDD <[email protected]>
> >>>
> >>>> JIRA is down again? I cannot connect to it and comment there.
> >>>>
> >>>> I have a proposal, "Transactional Multiplex (fan out) Sink":
> >>>> https://issues.apache.org/jira/browse/FLUME-1435
> >>>> which contains the design of one channel to multiple sinks.
> >>>>
> >>>> You can search the email since JIRA cannot be accessed.
> >>>>
> >>>> I think this is more than a configuration issue. If we simply enable
> >>>> several sinks on the same channel, they will take events either in a
> >>>> round-robin mode or in an unpredictable mode if the speeds of the
> >>>> sinks differ.
> >>>>
> >>>> So it's better to have an even higher-level transaction control
> >>>> instead of the transaction in the process() of each sink, as I
> >>>> describe in FLUME-1435.
> >>>>
> >>>> Regards,
> >>>> Yongkun Wang
> >>>>
> >>>>
> >>>> On 12/08/10 12:30, "Denny Ye (JIRA)" <[email protected]> wrote:
> >>>>
> >>>>> Denny Ye created FLUME-1479:
> >>>>> -------------------------------
> >>>>>
> >>>>>              Summary: Multiple Sinks can connect to single Channel
> >>>>>                  Key: FLUME-1479
> >>>>>                  URL: https://issues.apache.org/jira/browse/FLUME-1479
> >>>>>              Project: Flume
> >>>>>           Issue Type: Bug
> >>>>>           Components: Configuration
> >>>>>     Affects Versions: v1.2.0
> >>>>>             Reporter: Denny Ye
> >>>>>             Assignee: Denny Ye
> >>>>>              Fix For: v1.3.0
> >>>>>
> >>>>>
> >>>>> If we have one Channel (mc) and two Sinks (hsa, hsb), then both Sinks
> >>>>> may be connected to it with a configuration such as
> >>>>> {quote}
> >>>>> agent.sinks.hsa.channel = mc
> >>>>> agent.sinks.hsb.channel = mc
> >>>>> {quote}
> >>>>> This means that multiple Sinks can connect to a single Channel.
> >>>>> Normally, one Sink only can connect to unified Channel
> >>>>>
> >>>>> --
> >>>>> This message is automatically generated by JIRA.
> >>>>> If you think it was sent incorrectly, please contact your JIRA
> >>>>> administrators:
> >>>>>
> >>>>>
> >>>>> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> >>>>> For more information on JIRA, see:
> >>>> http://www.atlassian.com/software/jira
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>
> >>
> >
> >
>
>
>
