On Wed, 6 Sep 2023 13:36:22 GMT, Claes Redestad <redes...@openjdk.org> wrote:
> This PR seeks to improve formatting of hex digits using `java.util.HexFormat` > somewhat. > > This is achieved getting rid of a couple of lookup tables, caching the result > of `HexFormat.of().withUpperCase()`, and removing tiny allocation that > happens in the `formatHex(A, byte)` method. Improvements range from 20-40% on > throughput, and some operations allocate less: > > > Name Cnt Base Error Test Error Unit > Diff% > HexFormatBench.appenderLower 15 1,330 ± 0,021 1,065 ± 0,067 us/op > 19,9% (p = 0,000*) > :gc.alloc.rate 15 11,481 ± 0,185 0,007 ± 0,000 MB/sec > -99,9% (p = 0,000*) > :gc.alloc.rate.norm 15 16,009 ± 0,000 0,007 ± 0,000 B/op > -100,0% (p = 0,000*) > :gc.count 15 3,000 0,000 counts > :gc.time 3 2,000 ms > HexFormatBench.appenderLowerCached 15 1,317 ± 0,013 1,065 ± 0,054 us/op > 19,1% (p = 0,000*) > :gc.alloc.rate 15 11,590 ± 0,111 0,007 ± 0,000 MB/sec > -99,9% (p = 0,000*) > :gc.alloc.rate.norm 15 16,009 ± 0,000 0,007 ± 0,000 B/op > -100,0% (p = 0,000*) > :gc.count 15 3,000 0,000 counts > :gc.time 3 2,000 ms > HexFormatBench.appenderUpper 15 1,330 ± 0,022 1,065 ± 0,036 us/op > 19,9% (p = 0,000*) > :gc.alloc.rate 15 34,416 ± 0,559 0,007 ± 0,000 MB/sec > -100,0% (p = 0,000*) > :gc.alloc.rate.norm 15 48,009 ± 0,000 0,007 ± 0,000 B/op > -100,0% (p = 0,000*) > :gc.count 15 0,000 0,000 counts > HexFormatBench.appenderUpperCached 15 1,353 ± 0,009 1,033 ± 0,014 us/op > 23,6% (p = 0,000*) > :gc.alloc.rate 15 11,284 ± 0,074 0,007 ± 0,000 MB/sec > -99,9% (p = 0,000*) > :gc.alloc.rate.norm 15 16,009 ± 0,000 0,007 ± 0,000 B/op > -100,0% (p = 0,000*) > :gc.count 15 3,000 0,000 counts > :gc.time 3 2,000 ms > HexFormatBench.toHexLower 15 0,198 ± 0,001 0,119 ± 0,008 us/op > 40,1% (p = 0,000*) > :gc.alloc.rate 15 0,007 ± 0,000 0,007 ± 0,000 MB/sec > -0,0% (p = 0,816 ) > :gc.alloc.rate.norm 15 0,001 ± 0,000 0,001 ± 0,000 B/op > -40,1% (p = 0,000*) > :gc.count 15 0,000 0,000 ... I'm not sure that micro-benchmarks are very indicative on whether a lookup table performs better than short and straightforward code. The reason is that, once in the CPU caches, a lookup table in micro-benchmarks stays there, whereas in more realistic situations, where access is more spread out in time, it is often evicted to make room for other, more lively data. A micro-benchmark using a lookup table shows good results because of a high rate of cache hits, whereas in other real-world workloads a lookup table might result in many cache misses on access. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15591#issuecomment-1711307164