Thanks, Ralph!
You're correct that the direct encoderes are required for GC-free logging, some
of our consumers rely on that
for niche (low latency) workloads where it's important to understand precisely
where a pause may occur.
GC performance has become much better over the last several years, and the
throughput of the allocation-
heavy workflow has increased significantly, however there are a few things we
should keep in mind:
1. Java 8 performs better with direct encoders (although afaict the performance
isn't any better than java 9+,
string.getBytes just has several new optimizations
2. If this[1] change is merged into openjdk, the direct encoders may once again
be the best-performing option,
I'm planning to put together a local build for comparison later.
3. My testing has primarily targetted UTF-8 as it's the only charset we target
at work, and the default charset
on most modern servers. The jdk has a great deal of optimization which
special cases UTF-8 for this reason,
especially from string.getBytes. Other charsets are likely to perform
differently.
I'm in favor of changing the default value of "direct.encoders" based on the
java version, for example
Java 8 should continue to use them, while versions after compact-strings was
introduced are better
off without direct encoders until the jdk can be improved using something like
the linked PR.
1. https://github.com/openjdk/jdk/pull/5621#issuecomment-925413767
-ck
On Thu, Sep 23, 2021, at 11:51, Ralph Goers wrote:
> I ran Ceki’s benchmarks against 2.14.1, 2.15.0-SNAPSHOT and Carter’s branch.
> I ran them with 16 threads on my
> MacBook Pro and ran each test twice since the tests show some variability
> from time to time.
>
> I can draw 2 conclusions from this.
> 1. The improvements Carter has made to 2.15.0 have already made a noticeable
> improvement.
> 2. His new changes show a definite improvement on top of the 2.15.0
> improvements when enable direct encoders is false.
>
> These results definitely make me wonder if we should just remove the direct
> encoders logic at least in master. From what
> I can tell it makes things complicated and makes no noticeable impact on
> performance in 2.15.0 even before Carter’s
> changes. Are they required for gc-free? Can we make direct.encoders = false
> the default if it isn’t now?
>
> I apologize if the performance results don’t look good in your email. They
> are formatted with a fixed-width font but it is
> likely they will be messed up once they reach your mail client.
>
> Ralph
>
> 2.14.1 - enable direct encoders = false
>
> Benchmark Mode Cnt Score Error
> Units
> AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1221.995 ± 150.876
> ops/ms
> AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1495.072 ± 41.715
> ops/ms
> AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1529.807 ± 55.876
> ops/ms
> FileAppenderBenchmark.log4j1File thrpt 10 681.690 ± 17.631
> ops/ms
> FileAppenderBenchmark.log4j2File thrpt 10 1134.543 ± 35.065
> ops/ms
> FileAppenderBenchmark.logbackFile thrpt 10 1546.811 ± 95.721
> ops/ms
>
> Benchmark Mode Cnt Score Error
> Units
> AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1103.526 ± 78.201
> ops/ms
> AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1402.669 ± 144.374
> ops/ms
> AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1475.666 ± 87.885
> ops/ms
> FileAppenderBenchmark.log4j1File thrpt 10 582.841 ± 57.156
> ops/ms
> FileAppenderBenchmark.log4j2File thrpt 10 860.587 ± 185.170
> ops/ms
> FileAppenderBenchmark.logbackFile thrpt 10 1560.206 ± 323.229
> ops/ms
>
> 2.14.1 - enable direct encoders = true
>
> Benchmark Mode Cnt Score Error
> Units
> AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1250.808 ± 116.848
> ops/ms
> AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1564.926 ± 146.195
> ops/ms
> AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1703.731 ± 183.304
> ops/ms
> FileAppenderBenchmark.log4j1File thrpt 10 736.189 ± 40.798
> ops/ms
> FileAppenderBenchmark.log4j2File thrpt 10 1070.958 ± 97.173
> ops/ms
> FileAppenderBenchmark.logbackFile thrpt 10 1350.507 ± 225.104
> ops/ms
>
> Benchmark Mode Cnt Score Error
> Units
> AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1218.639 ± 141.343
> ops/ms
> AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1525.081 ± 112.058
> ops/ms
> AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1724.925 ± 169.680
> ops/ms
> FileAppenderBenchmark.log4j1File thrpt 10 680.453 ± 32.330
> ops/ms
> FileAppenderBenchmark.log4j2File thrpt 10 1054.517 ± 52.147
> ops/ms
> FileAppenderBenchmark.logbackFile thrpt 10 1353.180 ± 224.415
> ops/ms
>
> 2.15.0-SNAPSHOT - enable direct encoders = false
>
> Benchmark Mode Cnt Score Error
> Units
> AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1121.838 ± 112.026
> ops/ms
> AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1606.602 ± 217.776
> ops/ms
> AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1607.875 ± 226.656
> ops/ms
> FileAppenderBenchmark.log4j1File thrpt 10 694.809 ± 33.380
> ops/ms
> FileAppenderBenchmark.log4j2File thrpt 10 1500.557 ± 402.779
> ops/ms
> FileAppenderBenchmark.logbackFile thrpt 10 1387.306 ± 213.081
> ops/ms
>
> Benchmark Mode Cnt Score Error
> Units
> AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1199.591 ± 149.157
> ops/ms
> AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1744.558 ± 190.732
> ops/ms
> AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1718.862 ± 142.328
> ops/ms
> FileAppenderBenchmark.log4j1File thrpt 10 604.785 ± 78.884
> ops/ms
> FileAppenderBenchmark.log4j2File thrpt 10 1604.844 ± 712.807
> ops/ms
> FileAppenderBenchmark.logbackFile thrpt 10 1578.756 ± 109.343
> ops/ms
>
> 2.15.0-SNAPSHOT - enable direct encoders = true
>
> Benchmark Mode Cnt Score Error
> Units
> AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1244.029 ± 61.842
> ops/ms
> AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1564.244 ± 263.322
> ops/ms
> AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1648.783 ± 217.936
> ops/ms
> FileAppenderBenchmark.log4j1File thrpt 10 717.024 ± 43.631
> ops/ms
> FileAppenderBenchmark.log4j2File thrpt 10 1513.402 ± 438.311
> ops/ms
> FileAppenderBenchmark.logbackFile thrpt 10 1425.745 ± 231.201
> ops/ms
>
> Benchmark Mode Cnt Score Error
> Units
> AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1207.924 ± 136.763
> ops/ms
> AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1514.119 ± 98.725
> ops/ms
> AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1620.834 ± 238.498
> ops/ms
> FileAppenderBenchmark.log4j1File thrpt 10 716.294 ± 65.527
> ops/ms
> FileAppenderBenchmark.log4j2File thrpt 10 1523.430 ± 387.178
> ops/ms
> FileAppenderBenchmark.logbackFile thrpt 10 1402.748 ± 312.713
> ops/ms
>
> Carter 2.15.0-SNAPSHOT - enable direct encoders = false
>
> Benchmark Mode Cnt Score Error
> Units
> AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1351.481 ± 68.356
> ops/ms
> AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 2048.458 ± 118.329
> ops/ms
> AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1762.185 ± 155.733
> ops/ms
> FileAppenderBenchmark.log4j1File thrpt 10 693.222 ± 94.051
> ops/ms
> FileAppenderBenchmark.log4j2File thrpt 10 1707.576 ± 299.891
> ops/ms
> FileAppenderBenchmark.logbackFile thrpt 10 1439.624 ± 60.957
> ops/ms
>
> Benchmark Mode Cnt Score Error
> Units
> AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1230.511 ± 111.351
> ops/ms
> AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1950.005 ± 166.843
> ops/ms
> AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1783.227 ± 53.359
> ops/ms
> FileAppenderBenchmark.log4j1File thrpt 10 702.814 ± 38.263
> ops/ms
> FileAppenderBenchmark.log4j2File thrpt 10 1738.084 ± 344.135
> ops/ms
> FileAppenderBenchmark.logbackFile thrpt 10 1463.187 ± 158.787
> ops/ms
>
> Carter 2.15.0-SNAPSHOT - enable direct encoders = true
>
> Benchmark Mode Cnt Score Error
> Units
> AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1212.045 ± 165.745
> ops/ms
> AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1671.374 ± 190.449
> ops/ms
> AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1637.298 ± 169.134
> ops/ms
> FileAppenderBenchmark.log4j1File thrpt 10 708.106 ± 30.035
> ops/ms
> FileAppenderBenchmark.log4j2File thrpt 10 1523.321 ± 429.586
> ops/ms
> FileAppenderBenchmark.logbackFile thrpt 10 1418.236 ± 233.527
> ops/ms
>
> Benchmark Mode Cnt Score Error
> Units
> AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1189.572 ± 117.298
> ops/ms
> AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1499.633 ± 186.132
> ops/ms
> AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1631.038 ± 214.951
> ops/ms
> FileAppenderBenchmark.log4j1File thrpt 10 697.329 ± 48.757
> ops/ms
> FileAppenderBenchmark.log4j2File thrpt 10 1542.492 ± 395.759
> ops/ms
> FileAppenderBenchmark.logbackFile thrpt 10 1321.256 ± 148.364
> ops/ms
>
>
> > On Sep 22, 2021, at 12:30 AM, Volkan Yazıcı <[email protected]> wrote:
> >
> > That's great news! Then I will be looking forward to your signal for
> > putting a ribbon onto 2.15.0.
> >
> > I have followed the conversation with Claes Redestad from Oracle on Twitter
> > <https://twitter.com/carter_kozak/status/1433798391604162561>. My
> > conclusion was also that there apparently is no way to make CharsetEncoder
> > beat .toString().getBytes() in Java 9+, until a CE specialization gets
> > introduced similar to what has been done in .toString().getBytes().
> > Nevertheless, I was also (naively?) thinking about a similar branching
> > strategy like the one you proposed as a temporary solution: preJava9 ?
> > useCE() : useStringGetBytes(). Would you be able to get this done?
> >
> > Another idea I thought of yesterday evening was to introduce our own
> > hand-written StringBuilder- or char[]-to-byte[] encoders for common cases,
> > i.e., ASCII and UTF-8. What do you think?
> >
> >
> > On Wed, Sep 22, 2021 at 3:48 AM Carter Kozak <[email protected]> wrote:
> >
> >> Thanks, Volkan!
> >>
> >> Rerunning the benchmarks on my branch (specifically the PatternLayout
> >> benchmarks in log4j-perf) produced much better improvements than I had
> >> anticipated. Beyond that, the date format cache invalidation problem
> >> resulted in a substantial speedup. I agree that it would be helpful to get
> >> a release out the door once this is merged.
> >>
> >> Re getBytes vs CharsetEncoder, I don't want to use the unsafe hack I put
> >> together in my benchmark project, that was just for experimentation :-)
> >> Future java releases (or changes in minor patch releases) could cause it to
> >> fail in frightening ways. We may be better off recommending the getBytes
> >> approach for now on some java versions (possibly by changing our default on
> >> java 9+).
> >> Claes has a potential change[1] which appears to buy us a great deal of
> >> performance in future Javas (assuming it is merged) and we may be able to
> >> engage for additional encoding APIs, for example something like this could
> >> avoid allocations and additional buffers:
> >>
> >> /** Returns the number of characters encoded to destination, or -1 if more
> >> space is needed (equivalent to CoderResult.OVERFLOW) */
> >> int CharSetEncoder.encode(charsequence, inputStart, inputLimit, byte[]
> >> destination, int destOffset, int destLimit)
> >> (I haven't put a great deal of thought into this API and it's getting
> >> late, so pardon any terrible ideas!)
> >>
> >> 1. https://github.com/openjdk/jdk/pull/5621
> >>
> >> -ck
> >>
> >> On Tue, Sep 21, 2021, at 15:31, Volkan Yazıcı wrote:
> >>> First and foremost, fantastic job Carter!
> >>>
> >>> For #573, I see that Gary and Ralph have already shared some remarks. I
> >>> would appreciate it if we can get this merged and cut the ribbon for
> >> 2.15.0
> >>> release.
> >>>
> >>> Regarding `StringBuilder#toString().getBytes()`-vs-`CharsetEncoder`...
> >> That
> >>> is a tough one. In your hack branch there, I am not sure if using
> >> `Unsafe`
> >>> is a future-proof path forward. I was trying to wrap my mind around
> >> Daniel
> >>> Sun's fast-reflection <https://github.com/danielsun1106/fast-reflection>
> >>> (for the records, I couldn't) and was triggered by his ASM usage there. I
> >>> was curious if we could do something similar via ASM to hack
> >>> `CharsetEncoder`? (I am probably talking nonsense, I hope it anyway
> >>> triggers a practical idea for you.)
> >>
>
>