Re: Cek's new log4j vs logback benchmark

Carter Kozak Mon, 20 Sep 2021 14:09:35 -0700

https://github.com/apache/logging-log4j2/pull/573 improves performance by 
double digits in my testing, unfortunately it's difficult and time-consuming to 
individually compare performance implications of each part of the change. I'd 
appreciate review/feedback on the change. When I apply this PR and benchmark 
with "-Dlog4j2.enable.direct.encoders=false" (results in 
StringBuilder.toString().getBytes(charset) rather than using CharsetEncoders) 
the performance in blocking mode matches logback, and async performance is much 
better.


StringBuilder.toString.getBytes vs CharsetEncoder:
I have some benchmarks here which illustrate the problem: 
https://github.com/carterkozak/stringbuilder-encoding-performance
The CharsetEncoder itself doesn't seem to make a great deal of sense these days 
because Strings are no longer backed by char arrays, the conversion from bytes 
(in String/StringBuilder) into chars for the CharBuffer over-complicates the 
process. It could be feasible to add an API method to CharsetEncoder which 
takes a String/StringBuilder (charsequence + instanceof?), extracts the 
bytes+coder, and writes to an existing buffer without the CharBuffer 
conversion, but I haven't had a chance to sketch this out.
I put together a quick and dirty hack showing a zero-gc fast-path for 
StringBuilder-to-bytes conversion, however it suffers from lack of intrinsics 
used in the String.getBytes(UTF_8) implementation:
https://github.com/carterkozak/stringbuilder-encoding-performance/pull/3

On Mon, Sep 20, 2021, at 15:34, Volkan Yazıcı wrote:
> Carter, I am really curious about your ongoing crusade. Mind updating us on
> it a bit, please? Where are we? What is your plan? How can we help you?
> 
> On Wed, Sep 1, 2021 at 2:57 PM Carter Kozak <cko...@ckozak.net> wrote:
> 
> > Thanks, Volkan!
> >
> > I had not seen InstantFormatter, it does look helpful, however I think it
> > may
> > have a bug. It appears to only compare milliseconds of Instant values
> > while FixedDateFormat has some patterns which support microsecond
> > and nanosecond precision. Currently I think this will batch together all
> > events within a millisecond and assign them the microsecond value
> > of the first cached event.
> > I think this is what we want:
> > https://github.com/apache/logging-log4j2/pull/576
> >
> > Good idea reaching out to Claes Redestad, it would be helpful to have
> > someone more familiar with modern jvm string internals than us take
> > a look if he's interested!
> >
> > -ck
> >
> > On Wed, Sep 1, 2021, at 03:44, Volkan Yazıcı wrote:
> > > Great work Carter!
> > >
> > > Have you seen `o.a.l.l.layout.template.json.util.InstantFormatter`,
> > > particularly its `Formatter#isInstantMatching` methods used for
> > > invalidating the cache? I was thinking of making it even smarter, e.g.,
> > if
> > > the pattern only has seconds, compare `Instant`s by their seconds. I
> > aspire
> > > to pull it to the core, replace access to all individual formatters with
> > > this one, and mark the rest deprecated. Another idea I was thinking about
> > > is enhancing these individual formatters to contain the precision they
> > > require and use that in `isInstantMatching` methods.
> > >
> > > Regarding your unicode character problems, shall we try pinging Claes
> > > Redestad (@cl4es), who has recently enhanced String.format() in JDK 17
> > > <https://twitter.com/cl4es/status/1432361530268528642>, via Twitter?
> > >
> > > On Mon, Aug 30, 2021 at 11:32 PM Carter Kozak <cko...@ckozak.net> wrote:
> > >
> > > > I've merged the fix for our FixedDateFormat caching bug which caused
> > us to
> > > > recompute the same millisecond-precision formatted timestamp
> > unnecessarily
> > > > each time our microsecond-precision clock incremented.
> > > > https://issues.apache.org/jira/browse/LOG4J2-3153
> > > >
> > > > I've also been unwrapping a few layers of complexity, wrapping several
> > > > layers of components with conditional logic makes it harder for the
> > jit to
> > > > optimize code, so we can improve things by using separate types based
> > on
> > > > the requested features:
> > > > https://github.com/apache/logging-log4j2/pull/573
> > > > TODO: I'm not happy with the way I unwrapped PatternFormatter objects
> > in
> > > > this PR, I think it could work better as an optional wrapper around the
> > > > delegate LogEventPatternConverter (where the default FormattingInfo
> > returns
> > > > the delegate instance directly)
> > > > TODO: simplify MessagePatternConverter  a bit, the body is giant for
> > > > something that's generally relatively simple. The method is too large
> > for
> > > > me to read in a glance, so I imagine the jit will have a hard time
> > making
> > > > it fast as well. I don't really like the message-format feature which
> > > > allows lookups in the formatted message text because it leaks details
> > of
> > > > the framework implementation/configuration into consumers of logging
> > APIs
> > > > (which may not even be log4j-core), however I'm not sure how
> > reasonable it
> > > > would be to change the default to disallow lookups given I'm sure folks
> > > > depend on that behavior.
> > > >
> > > > I'm not sure what to do about the CharsetEncoder vs
> > > > string.getBytes(charset) issue. The CharsetEncoder does not require
> > > > allocation and outperforms getBytes when I add a unicode character to
> > the
> > > > message. When the message contains only ascii characters, getBytes
> > performs
> > > > better. Using CharBuffer.wrap(StringBuilder) produces substantially
> > worse
> > > > performance -- it shouldn't be copying the buffer in memory, but I
> > suppose
> > > > the heap buffer is more efficient to deal with. I need to do more
> > research
> > > > in this area.
> > > >
> > > > Thoughts/ideas/concerns?
> > > > -ck
> > > >
> > >
> >
> 

-ck

Re: Cek's new log4j vs logback benchmark

Reply via email to