Eiichi Sato created FLUME-3439:
----------------------------------
Summary: Introduce SynchronousChannel, a fast disk-less channel
that doesn't lose events
Key: FLUME-3439
URL: https://issues.apache.org/jira/browse/FLUME-3439
Project: Flume
Issue Type: Improvement
Components: Channel
Reporter: Eiichi Sato
Recently, I implemented
[SynchronousChannel|https://github.com/eiiches/flume-synchronous-channel], in
which every transaction that puts events waits for the corresponding
transactions that take those events to complete.
* It's fast because it doesn't use disks.
* It doesn't lose events because it doesn't actually store events. It has no
capacity.
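To illustrate the hand-off semantics described above, here is a minimal sketch in plain Java. It is not the code from the linked repository and it does not use Flume's Channel or Transaction APIs; the names HandoffSketch, Handoff, putAndWait and take are hypothetical and exist only for this example. It assumes a single batch is handed over per put, and that the taking side reports commit or rollback by completing a future.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.SynchronousQueue;

// Hypothetical illustration only; not the SynchronousChannel implementation.
public class HandoffSketch {
    // A batch of events plus a future that the taking side completes
    // once it has finished (true = committed, false = rolled back).
    static final class Handoff {
        final List<String> events;
        final CompletableFuture<Boolean> done = new CompletableFuture<>();

        Handoff(List<String> events) {
            this.events = events;
        }
    }

    // Zero-capacity queue: put() blocks until another thread calls take().
    private final SynchronousQueue<Handoff> queue = new SynchronousQueue<>();

    // Putting side: blocks until a taker has received the batch AND
    // reported the outcome. Returns true only if the taker committed.
    public boolean putAndWait(List<String> events) throws InterruptedException {
        Handoff handoff = new Handoff(events);
        queue.put(handoff);
        return handoff.done.join();
    }

    // Taking side: blocks until a putter offers a batch.
    public Handoff take() throws InterruptedException {
        return queue.take();
    }

    public static void main(String[] args) throws Exception {
        HandoffSketch channel = new HandoffSketch();

        Thread sink = new Thread(() -> {
            try {
                Handoff handoff = channel.take();
                // A real sink would deliver handoff.events downstream here.
                handoff.done.complete(true); // report commit to the putter
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sink.start();

        boolean committed = channel.putAndWait(Arrays.asList("line 1", "line 2"));
        sink.join();
        System.out.println("committed=" + committed);
    }
}
```

Because the putting side only returns after the taking side reports its outcome, nothing is ever held in the channel itself; if the taker reports a rollback, the putter sees `false` and can retry from its own source position.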
The motivation behind this channel is that, when using a Taildir Source to
collect logs and send them to a remote Flume instance, we typically use File
Channel or Memory Channel. Memory Channel is fast, but can lose events. File
Channel is durable, but slow. Using a File Channel also means we write the same
contents to disk twice: once for the log file that Taildir Source is watching
and once for the channel data. We don't need to buffer events in a channel
because the events are already in the log file and Taildir Source can just read
them at its own pace.
Expected use cases are:
* Taildir Source --> Synchronous Channel --> Avro Sink
* Kinesis Source --> Synchronous Channel --> Avro Sink
* Cloud Pub/Sub Source --> Synchronous Channel --> Avro Sink
In all these cases, the channel doesn't need to buffer events because the
source already works like a buffer.
In [this
benchmark|https://github.com/eiiches/flume-synchronous-channel/tree/main/docs/benchmark]
that uses Taildir Source + Synchronous Channel, I observed an 84% increase in
throughput and a 75-81% reduction in CPU usage compared to File Channel when
the event body is 512 bytes.
----
The code is around 220 LOC (excluding tests) and doesn't pull in any additional
third-party dependencies.
I can work on a PR, but before doing so, I'd like some general feedback from
the community. I'm wondering whether this channel is useful and generic enough
to be included in Flume, or whether it should be kept in a separate repository.
What do you think?