On Fri, 8 Sep 2023 08:47:04 GMT, Raffaello Giulietti <rgiulie...@openjdk.org> wrote:
>> This PR seeks to improve formatting of hex digits using
>> `java.util.HexFormat` somewhat.
>>
>> This is achieved by getting rid of a couple of lookup tables, caching the
>> result of `HexFormat.of().withUpperCase()`, and removing a tiny allocation
>> that happens in the `formatHex(A, byte)` method. Improvements range from
>> 20-40% on throughput, and some operations allocate less:
>>
>> Name                                Cnt   Base    Error    Test    Error  Unit    Diff%
>> HexFormatBench.appenderLower         15   1,330 ± 0,021    1,065 ± 0,067  us/op    19,9% (p = 0,000*)
>>   :gc.alloc.rate                     15  11,481 ± 0,185    0,007 ± 0,000  MB/sec  -99,9% (p = 0,000*)
>>   :gc.alloc.rate.norm                15  16,009 ± 0,000    0,007 ± 0,000  B/op   -100,0% (p = 0,000*)
>>   :gc.count                          15   3,000            0,000          counts
>>   :gc.time                            3   2,000                           ms
>> HexFormatBench.appenderLowerCached   15   1,317 ± 0,013    1,065 ± 0,054  us/op    19,1% (p = 0,000*)
>>   :gc.alloc.rate                     15  11,590 ± 0,111    0,007 ± 0,000  MB/sec  -99,9% (p = 0,000*)
>>   :gc.alloc.rate.norm                15  16,009 ± 0,000    0,007 ± 0,000  B/op   -100,0% (p = 0,000*)
>>   :gc.count                          15   3,000            0,000          counts
>>   :gc.time                            3   2,000                           ms
>> HexFormatBench.appenderUpper         15   1,330 ± 0,022    1,065 ± 0,036  us/op    19,9% (p = 0,000*)
>>   :gc.alloc.rate                     15  34,416 ± 0,559    0,007 ± 0,000  MB/sec -100,0% (p = 0,000*)
>>   :gc.alloc.rate.norm                15  48,009 ± 0,000    0,007 ± 0,000  B/op   -100,0% (p = 0,000*)
>>   :gc.count                          15   0,000            0,000          counts
>> HexFormatBench.appenderUpperCached   15   1,353 ± 0,009    1,033 ± 0,014  us/op    23,6% (p = 0,000*)
>>   :gc.alloc.rate                     15  11,284 ± 0,074    0,007 ± 0,000  MB/sec  -99,9% (p = 0,000*)
>>   :gc.alloc.rate.norm                15  16,009 ± 0,000    0,007 ± 0,000  B/op   -100,0% (p = 0,000*)
>>   :gc.count                          15   3,000            0,000          counts
>>   :gc.time                            3   2,000                           ms
>> HexFormatBench.toHexLower            15   0,198 ± 0,001    0,119 ± 0,008  us/op    40,1% (p = 0,000*)
>>   :gc.alloc.rate                     15   0,007 ± 0,000    0,007 ± 0,000  MB/sec   -0,0% (p = 0,816 )
>>   :gc.alloc.rate.norm                15   0,001 ± 0,000    0,001 ± 0,000  B/op    -40,1% (p = 0,000*)
>>   :gc....
> I'm not sure that micro-benchmarks are very indicative on whether a lookup
> table performs better than short and straightforward code.
> The reason is that, once in the CPU caches, a lookup table in
> micro-benchmarks stays there, whereas in more realistic situations, where
> access is more spread out in time, it is often evicted to make room for
> other, more lively data.
> A micro-benchmark using a lookup table shows good results because of a high
> rate of cache hits, whereas in other real-world workloads a lookup table
> might result in many cache misses on access.

As @rgiulietti says, lookup-table algorithms may outperform in microbenchmarks but lose out in real-world scenarios, so we should steer clear of them unless there's a major benefit. And as it turns out, the relative benefit seems to come mainly from the use of `ByteArrayLittleEndian`, not from using the lookup table in `HexDigits.DIGITS`:

```diff
     public String toHexDigits(byte value) {
         byte[] rep = new byte[2];
-        rep[0] = (byte)toHighHexDigit(value);
-        rep[1] = (byte)toLowHexDigit(value);
+        ByteArrayLittleEndian.setChar(rep, 0, (char)(toLowHexDigit(value) << 8 | toHighHexDigit(value)));
         try {
             return jla.newStringNoRepl(rep, StandardCharsets.ISO_8859_1);
         } catch (CharacterCodingException cce) {
```

This gets the same speed-up (4%) as calling `HexDigits.DIGITS`:

```
Name                        Cnt  Base    Error   Test    Error  Unit  Diff%
HexFormatBench.toHexDigits   15  1,943 ± 0,014   1,860 ± 0,014  us/op  4,2% (p = 0,000*)
  * = significant
```

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15591#issuecomment-1711370758
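For readers outside the JDK source tree, the single two-byte store discussed above can be sketched with the public `VarHandle` API instead of the internal `ByteArrayLittleEndian` class. The class and helper names below are illustrative, not the PR's actual code; the point is only the byte-ordering trick: with a little-endian `char` store, the char's low-order byte lands at index 0, so the *high* hex digit must occupy bits 0-7.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Illustrative stand-in for the internal ByteArrayLittleEndian.setChar:
// a little-endian char view over a byte[], built from public API.
public class HexPackSketch {
    private static final VarHandle CHAR_LE =
            MethodHandles.byteArrayViewVarHandle(char[].class, ByteOrder.LITTLE_ENDIAN);

    private static final String HEX = "0123456789abcdef";

    static int toHighHexDigit(byte b) { return HEX.charAt((b >> 4) & 0xF); }
    static int toLowHexDigit(byte b)  { return HEX.charAt(b & 0xF); }

    public static String toHexDigits(byte value) {
        byte[] rep = new byte[2];
        // One 16-bit little-endian store writes both digit bytes at once:
        // the high digit sits in bits 0-7 (written to rep[0]) and the low
        // digit in bits 8-15 (written to rep[1]).
        CHAR_LE.set(rep, 0, (char) (toLowHexDigit(value) << 8 | toHighHexDigit(value)));
        return new String(rep, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(toHexDigits((byte) 0x5A)); // prints "5a"
    }
}
```

Note that the JDK's internal helper exists precisely so hot paths can avoid repeating this `VarHandle` boilerplate; outside the JDK, a plain two-element store is usually clear enough unless profiling says otherwise.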