Eiichi Sato created FLUME-3439:
----------------------------------

             Summary: Introduce SynchronousChannel, a fast disk-less channel 
that doesn't lose events
                 Key: FLUME-3439
                 URL: https://issues.apache.org/jira/browse/FLUME-3439
             Project: Flume
          Issue Type: Improvement
          Components: Channel
            Reporter: Eiichi Sato


Recently, I implemented 
[SynchronousChannel|https://github.com/eiiches/flume-synchronous-channel], in 
which every transaction that puts events waits for the corresponding 
transactions that take those events to complete.
 * It's fast because it doesn't use disks.
 * It doesn't lose events because it doesn't actually store events. It has no 
capacity.
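
The handoff described above can be illustrated with a minimal sketch. It is 
not the actual SynchronousChannel code and does not use Flume's Channel or 
Transaction API; the names HandoffSketch, Offer and putAndWait are 
hypothetical, and the sketch assumes one batch per put and plain strings as 
events. It only shows the idea that a put succeeds once a taker has received 
the batch and reported a commit, so nothing is ever stored in between.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a zero-capacity handoff; not the Flume Channel API. */
final class HandoffSketch {

    /** A batch offered by the put side, plus the outcome reported by the take side. */
    static final class Offer {
        final List<String> events;
        final CompletableFuture<Boolean> outcome = new CompletableFuture<>();

        Offer(List<String> events) {
            this.events = events;
        }

        void commit() { outcome.complete(true); }

        void rollback() { outcome.complete(false); }
    }

    // SynchronousQueue holds no elements: a timed offer() succeeds only once
    // a taker actually receives the batch.
    private final SynchronousQueue<Offer> handoff = new SynchronousQueue<>();

    /** Put side: returns true only after the take side has committed the batch. */
    boolean putAndWait(List<String> events, long timeoutMillis) throws InterruptedException {
        Offer offer = new Offer(events);
        if (!handoff.offer(offer, timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false; // no taker arrived in time; the caller can retry later
        }
        try {
            return offer.outcome.get(); // block until the taker commits or rolls back
        } catch (ExecutionException e) {
            return false;
        }
    }

    /** Take side: returns the next offered batch, or null if none arrives in time. */
    Offer take(long timeoutMillis) throws InterruptedException {
        return handoff.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        HandoffSketch channel = new HandoffSketch();

        // Stand-in for a sink: take one batch, "deliver" it, then commit.
        Thread sink = new Thread(() -> {
            try {
                Offer offer = channel.take(1000);
                if (offer != null) {
                    System.out.println("delivering " + offer.events);
                    offer.commit();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sink.start();

        // Stand-in for a source: this call returns only after the sink has committed.
        boolean delivered = channel.putAndWait(List.of("event-1", "event-2"), 1000);
        sink.join();
        System.out.println("put committed: " + delivered);
    }
}
```

In this sketch a put that finds no taker, or whose taker rolls back, returns 
false, which corresponds to a source keeping its read position and retrying 
later.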

The motivation behind this channel is that, when using a Taildir Source to 
collect logs and send them to a remote Flume instance, we typically use File 
Channel or Memory Channel. Memory Channel is fast, but can lose events. File 
Channel is durable, but slow. Using a File Channel also means we write the same 
content to disk twice: first in the log file that Taildir Source is watching, 
and again in the channel data. We don't need to buffer events in a channel 
because the events are already in the log file and Taildir Source can just read 
it at its own pace.

Expected use cases are:
 * Taildir Source --> Synchronous Channel --> Avro Sink
 * Kinesis Source --> Synchronous Channel --> Avro Sink
 * Cloud Pub/Sub Source --> Synchronous Channel --> Avro Sink

In all these cases, the channel doesn't need to buffer events because the 
source already works like a buffer.

In [this 
benchmark|https://github.com/eiiches/flume-synchronous-channel/tree/main/docs/benchmark]
that uses Taildir Source + Synchronous Channel, I observed an 84% increase in 
throughput and a 75-81% reduction in CPU usage compared to File Channel when 
the event body is 512 bytes.

 
----
 

The code is around 220 LOC (excluding tests) and doesn't pull in any additional 
third-party dependencies.

I can work on a PR, but before doing so, I would like some general feedback 
from the community. I'm wondering whether this channel is useful or generic 
enough to be included in Flume, or whether it should be kept in a separate 
repository. What do you think?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
