Re: Batching in appenders

Carter Kozak Thu, 24 Sep 2020 13:06:11 -0700

We don't provide a great reusable abstraction to batch log events and hand them 
off to a sink (appender). If we're writing to disk (or similar) this isn't a 
problem because we buffer at an entirely different layer -- using a 
RandomAccessFile appender, after passing through the layout, my log events are 
batched in an output buffer until either the buffer is filled or an event with 
endOfBatch=true is received (note: this requires immediateFlush=false). When an 
appender is capable of pushing data to a more efficient format, it's overall 
less work and less risk for the system to take advantage of that.
For example, when I log an event, the retained heap for that LogEvent is 
relatively large and includes the message string as well as all parameter 
references which may not be held elsewhere. By passing the LogEvent through a 
layout we are left with fewer total bytes to track: all parameter references 
are gone, old the formatted values if requested, contextData and contextStack 
may not have been used at all.


> Regarding your remark about "being sympathetic to the ring buffer design",
> I did not fully understand this. Would you mind elaborating on this a
> little bit more and maybe even sharing a comparison between this and the
> aforementioned explicit batching at the interface level, please?

Of course :-) We use the lmax disruptor for high performance, low latency, 
asynchronous logging. It takes advantage of several interesting performance 
optimizations (allocation avoidance, but also cache line optimization to avoid 
contention between producers and consumers in most cases) that allow 
substantially greater throughput than a BlockingQueue model. This difference is 
best illustrated by our benchmarks between fully async loggers 
(AsyncLoggerContextSelector) compared to AsyncAppender (ArrayBlockingQueue with 
a consumer thread).
In order to avoid creating intermediate objects (collections of events), 
disruptor uses an end-of-batch flag to suggest to inform consumers that there 
are no more events in the queue: 
https://lmax-exchange.github.io/disruptor/docs/com/lmax/disruptor/EventHandler.html
Our LogEvent endOfBatch marker is based on that, it does not allow us to 
enforce an upper bound, but it does allow us to take moments when load has 
receded to opportunistically flush appenders.

Written quickly, hopefully there aren't terribly many typos! I'm still settling 
after moving and things are still a little hectic, sorry for the recent lack of 
engagement.

TLDR: I think the idea that I've danced around is that batching based on the 
number of events may not give us enough data. One large event may be too large 
for a batch, and a thousand tiny events may be too small, sometimes the 
appender knows best. Sometimes it doesn't ;-)

-ck

On Thu, Sep 24, 2020, at 14:48, Volkan Yazıcı wrote:
> I think endOfBatch flag is still falling short of addressing the main
> issue: the appender is oblivious to the batching. Put another way, it
> cannot have an assumption whether endOfBatch is employed or not. Consider
> appender A honoring endOfBatch flag, though the preceding filter F never
> takes advantage of it, hence A will keep on piling events until an OOM
> exception. This is my main motivation for introducing batching explicitly
> at the appender interface. For instance,
> 
>     interface Appender {
>         default void append(LogEvent logEvent) {
> append(Batch.of(logEvent)); }    // for backward-compatibility
>         void append(Batch<LogEvent> batch);
>     }
> 
>     interface Batch<E> {
>         void forEach(Consumer<E> item);
>     }
> 
> This way we can introduce reusable (and hence, mutable) Batch<E>
> implementations, while the appender is free to perform its own batching
> magic tuned for the underlying sink, e.g., JDBC connection, network socket,
> etc.
> 
> Regarding your remark about "being sympathetic to the ring buffer design",
> I did not fully understand this. Would you mind elaborating on this a
> little bit more and maybe even sharing a comparison between this and the
> aforementioned explicit batching at the interface level, please?
> 
> In conclusion, IMHO, making batching/aggregation explicit at the interface
> level will help us to avoid quite some code repetition and make the life
> easier for future appenders.
> 
> On Thu, Sep 24, 2020 at 5:56 PM Carter Kozak <[email protected]> wrote:
> 
> > The AsyncAppender more or less implements what you've described, except it
> > forwards each event in the batch individually to a delegate appender and
> > sets the end-of-batch marker on the last one so the API isn't quite as
> > pretty. We could implement something similar that groups events up to some
> > interval or maximum capacity before forwarding them, but we'd have to be
> > careful to avoid creating garbage collection problems. Potentially holding
> > otherwise short lived parameters in the background can have unintended
> > impact on garbage collector performance :-)
> >
> > The LogEvent.isEndOfBatch method is designed to be sympathetic to the ring
> > buffer design, where we never create an intermediate collection, but
> > instead signal when an event is the last in a group that we're aware of.
> >
> > Batching comes up fairly frequently for network and database appenders, I
> > think we've duplicated that code a few times. I'd be supportive of a
> > canonical implementation (or reusable utility functionality) if we can
> > extract it out.
> >
> > -ck
> >
> > On Thu, Sep 24, 2020, at 11:46, Ralph Goers wrote:
> > > Well, Log4j only feeds events to Appenders one at a time. It has no
> > place to aggregate them. However, one Appender could wrap others to create
> > a batch at which point this would become useful.  I suppose some other
> > component could be invented to create a batch but I am not sure where or
> > how that would fit into the process -   does the batch apply only to a
> > single appender or to multiple appenders?  What filtering takes place
> > before it is added to the batch and what, if any, happens after?
> > >
> > > Ralph
> > >
> > > > On Sep 24, 2020, at 8:00 AM, Volkan Yazıcı <[email protected]>
> > wrote:
> > > >
> > > > If one would look close to existing appenders, almost everyone of them
> > > > implements a custom batching solution. To the best of my knowledge,
> > this is
> > > > mainly due to the constraint in the Appender interface:
> > > >
> > > >    void append(LogEvent event);
> > > >
> > > > If this would have been replaced (enhanced?) with
> > > >
> > > >    void append(List<LogEvent> event);
> > > >
> > > > appenders wouldn't need to implement any batching at all. Further,
> > Log4j
> > > > could have also provided a batching filter replacing all the custom
> > > > batching code internal to the appenders.
> > > >
> > > > What do you think? Am I missing something obvious here?
> > >
> > >
> > >
> >
>

Re: Batching in appenders

Reply via email to