https://github.com/apache/logging-log4j2/pull/573 improves performance by double digits in my testing, unfortunately it's difficult and time-consuming to individually compare performance implications of each part of the change. I'd appreciate review/feedback on the change. When I apply this PR and benchmark with "-Dlog4j2.enable.direct.encoders=false" (results in StringBuilder.toString().getBytes(charset) rather than using CharsetEncoders) the performance in blocking mode matches logback, and async performance is much better.
StringBuilder.toString.getBytes vs CharsetEncoder: I have some benchmarks here which illustrate the problem: https://github.com/carterkozak/stringbuilder-encoding-performance The CharsetEncoder itself doesn't seem to make a great deal of sense these days because Strings are no longer backed by char arrays, the conversion from bytes (in String/StringBuilder) into chars for the CharBuffer over-complicates the process. It could be feasible to add an API method to CharsetEncoder which takes a String/StringBuilder (charsequence + instanceof?), extracts the bytes+coder, and writes to an existing buffer without the CharBuffer conversion, but I haven't had a chance to sketch this out. I put together a quick and dirty hack showing a zero-gc fast-path for StringBuilder-to-bytes conversion, however it suffers from lack of intrinsics used in the String.getBytes(UTF_8) implementation: https://github.com/carterkozak/stringbuilder-encoding-performance/pull/3 On Mon, Sep 20, 2021, at 15:34, Volkan Yazıcı wrote: > Carter, I am really curious about your ongoing crusade. Mind updating us on > it a bit, please? Where are we? What is your plan? How can we help you? > > On Wed, Sep 1, 2021 at 2:57 PM Carter Kozak <[email protected]> wrote: > > > Thanks, Volkan! > > > > I had not seen InstantFormatter, it does look helpful, however I think it > > may > > have a bug. It appears to only compare milliseconds of Instant values > > while FixedDateFormat has some patterns which support microsecond > > and nanosecond precision. Currently I think this will batch together all > > events within a millisecond and assign them the microsecond value > > of the first cached event. > > I think this is what we want: > > https://github.com/apache/logging-log4j2/pull/576 > > > > Good idea reaching out to Claes Redestad, it would be helpful to have > > someone more familiar with modern jvm string internals than us take > > a look if he's interested! > > > > -ck > > > > On Wed, Sep 1, 2021, at 03:44, Volkan Yazıcı wrote: > > > Great work Carter! > > > > > > Have you seen `o.a.l.l.layout.template.json.util.InstantFormatter`, > > > particularly its `Formatter#isInstantMatching` methods used for > > > invalidating the cache? I was thinking of making it even smarter, e.g., > > if > > > the pattern only has seconds, compare `Instant`s by their seconds. I > > aspire > > > to pull it to the core, replace access to all individual formatters with > > > this one, and mark the rest deprecated. Another idea I was thinking about > > > is enhancing these individual formatters to contain the precision they > > > require and use that in `isInstantMatching` methods. > > > > > > Regarding your unicode character problems, shall we try pinging Claes > > > Redestad (@cl4es), who has recently enhanced String.format() in JDK 17 > > > <https://twitter.com/cl4es/status/1432361530268528642>, via Twitter? > > > > > > On Mon, Aug 30, 2021 at 11:32 PM Carter Kozak <[email protected]> wrote: > > > > > > > I've merged the fix for our FixedDateFormat caching bug which caused > > us to > > > > recompute the same millisecond-precision formatted timestamp > > unnecessarily > > > > each time our microsecond-precision clock incremented. > > > > https://issues.apache.org/jira/browse/LOG4J2-3153 > > > > > > > > I've also been unwrapping a few layers of complexity, wrapping several > > > > layers of components with conditional logic makes it harder for the > > jit to > > > > optimize code, so we can improve things by using separate types based > > on > > > > the requested features: > > > > https://github.com/apache/logging-log4j2/pull/573 > > > > TODO: I'm not happy with the way I unwrapped PatternFormatter objects > > in > > > > this PR, I think it could work better as an optional wrapper around the > > > > delegate LogEventPatternConverter (where the default FormattingInfo > > returns > > > > the delegate instance directly) > > > > TODO: simplify MessagePatternConverter a bit, the body is giant for > > > > something that's generally relatively simple. The method is too large > > for > > > > me to read in a glance, so I imagine the jit will have a hard time > > making > > > > it fast as well. I don't really like the message-format feature which > > > > allows lookups in the formatted message text because it leaks details > > of > > > > the framework implementation/configuration into consumers of logging > > APIs > > > > (which may not even be log4j-core), however I'm not sure how > > reasonable it > > > > would be to change the default to disallow lookups given I'm sure folks > > > > depend on that behavior. > > > > > > > > I'm not sure what to do about the CharsetEncoder vs > > > > string.getBytes(charset) issue. The CharsetEncoder does not require > > > > allocation and outperforms getBytes when I add a unicode character to > > the > > > > message. When the message contains only ascii characters, getBytes > > performs > > > > better. Using CharBuffer.wrap(StringBuilder) produces substantially > > worse > > > > performance -- it shouldn't be copying the buffer in memory, but I > > suppose > > > > the heap buffer is more efficient to deal with. I need to do more > > research > > > > in this area. > > > > > > > > Thoughts/ideas/concerns? > > > > -ck > > > > > > > > > > -ck
