[
https://issues.apache.org/jira/browse/FLUME-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17613083#comment-17613083
]
Ralph Goers commented on FLUME-3439:
------------------------------------
OK. I do see that the put commit blocks until the take commit is done.
While I see now that it does function as you describe, I can envision many use
cases for which it would not be suitable, since it essentially makes Flume a
synchronous proxy between the originator of the data and the destination.
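
To make the blocking behavior concrete, here is a minimal Java sketch of a
put that does not return until the corresponding take has been committed.
This is not the SynchronousChannel code from the linked repository and does
not use the Flume Channel or Transaction APIs. The names HandoffSketch,
Handoff, putAndCommit, take and commitTake are hypothetical, chosen only for
illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

// Illustration only: a zero-capacity handoff where the put side waits for
// the take side to commit. All names here are hypothetical.
public class HandoffSketch {
    /** An event plus a latch that the taker releases when it commits. */
    static final class Handoff {
        final String event;
        final CountDownLatch takeCommitted = new CountDownLatch(1);
        Handoff(String event) { this.event = event; }
    }

    // SynchronousQueue holds no elements: put() waits for a matching take().
    private final SynchronousQueue<Handoff> queue = new SynchronousQueue<>();

    /** Put side: returns only after the taker has called commitTake(). */
    void putAndCommit(String event) throws InterruptedException {
        Handoff h = new Handoff(event);
        queue.put(h);             // wait for a taker to receive the event
        h.takeCommitted.await();  // wait for that taker to commit
    }

    /** Take side: receive the event; the caller commits later. */
    Handoff take() throws InterruptedException {
        return queue.take();
    }

    /** Take side: signal that the event was delivered downstream. */
    void commitTake(Handoff h) {
        h.takeCommitted.countDown();
    }

    /** Returns true if the put stayed blocked until the take was committed. */
    static boolean demo() {
        try {
            HandoffSketch channel = new HandoffSketch();
            CountDownLatch putReturned = new CountDownLatch(1);
            Thread source = new Thread(() -> {
                try {
                    channel.putAndCommit("event-1");
                    putReturned.countDown();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            source.start();

            Handoff h = channel.take();
            // The event has been taken but not committed: put is still blocked.
            boolean blockedBeforeCommit =
                !putReturned.await(200, TimeUnit.MILLISECONDS);
            channel.commitTake(h);
            boolean releasedAfterCommit =
                putReturned.await(2, TimeUnit.SECONDS);
            source.join();
            return blockedBeforeCommit && releasedAfterCommit;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the source thread is held for as long as the downstream side
takes to commit, which is the "synchronous proxy" effect described above:
nothing is buffered, so a slow or unavailable destination directly stalls the
originator.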
> Introduce SynchronousChannel, a fast disk-less channel that doesn't lose
> events
> -------------------------------------------------------------------------------
>
> Key: FLUME-3439
> URL: https://issues.apache.org/jira/browse/FLUME-3439
> Project: Flume
> Issue Type: Improvement
> Components: Channel
> Reporter: Eiichi Sato
> Priority: Major
>
> Recently, I implemented
> [SynchronousChannel|https://github.com/eiiches/flume-synchronous-channel], in
> which every transaction that puts events waits for corresponding transactions
> that take the events to complete.
> * It's fast because it doesn't use disks.
> * It doesn't lose events because it doesn't actually store events. It has no
> capacity.
> The motivation behind this channel is that, when using a Taildir Source to
> collect logs and send them to a remote Flume instance, we typically use
> File Channel or Memory Channel. Memory Channel is fast, but can lose
> events. File Channel is durable, but slow. Using a File Channel also means we
> write the same contents to disk twice: once for the log file that
> Taildir Source is watching and again for the channel data. We don't need
> to buffer events in a channel because the events are already in a log file
> and Taildir Source can just read them at its own pace.
> Expected use cases are:
> * Taildir Source --> Synchronous Channel --> Avro Sink
> * Kinesis Source --> Synchronous Channel --> Avro Sink
> * Cloud Pub/Sub Source --> Synchronous Channel --> Avro Sink
> In all these cases, the channel doesn't need to buffer events because the
> source already works like a buffer.
> In [this
> benchmark|https://github.com/eiiches/flume-synchronous-channel/tree/main/docs/benchmark]
> that uses Taildir Source + Synchronous Channel, I observed an 84% increase in
> throughput and a 75-81% reduction in CPU usage compared to File Channel when
> the event body is 512 bytes.
>
> ----
>
> The code is around 220 LOC (excluding tests) and doesn't pull additional
> third-party dependencies.
> I can work on a PR, but before doing so, I would like general feedback from
> the community. I'm wondering whether this channel is useful and generic enough
> to be included in Flume, or whether it should be kept in a separate
> repository. What do you think?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)