RFR: 8280124: Reduce branches decoding latin-1 chars from UTF-8 encoded bytes

Claes Redestad Tue, 18 Jan 2022 02:16:24 -0800

This resolves a minor inefficiency in the fast path for decoding latin-1 chars 
from UTF-8. I also took the opportunity to refactor the StringDecode 
microbenchmark to align it with recent changes to the StringEncode micro.


The inefficiency is that this test is quite branchy:

`if ((b1 == (byte)0xc2 || b1 == (byte)0xc3) && ...`
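For context, here is a minimal sketch of the kind of decode loop such a test sits in. It is not the JDK's code: the class `Latin1FastPathSketch` and the method `decodeLatin1OrNull` are hypothetical, the continuation-byte check is written inline, and the lead-byte test is the two-comparison form quoted above. A lead byte of 0xC2 or 0xC3 followed by a continuation byte encodes U+0080..U+00FF, so such pairs (plus plain ASCII) can be compacted to one latin-1 byte per char.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1FastPathSketch {

    // Hypothetical helper: decodes UTF-8 bytes to one byte per char if every
    // char is in U+0000..U+00FF, otherwise returns null so a caller could fall
    // back to a general decoder. Illustrative only, not the JDK implementation.
    static byte[] decodeLatin1OrNull(byte[] src) {
        byte[] dst = new byte[src.length]; // latin-1 output never exceeds the input length
        int sp = 0, dp = 0;
        while (sp < src.length) {
            int b1 = src[sp];              // sign-extended: bytes >= 0x80 become negative
            if (b1 >= 0) {                 // ASCII: copy as is
                dst[dp++] = (byte) b1;
                sp++;
                continue;
            }
            // Lead byte 0xC2 or 0xC3: a two-byte sequence for U+0080..U+00FF
            if ((b1 == (byte) 0xc2 || b1 == (byte) 0xc3) && sp + 1 < src.length) {
                int b2 = src[sp + 1];
                if ((b2 & 0xc0) == 0x80) { // continuation byte 10xxxxxx
                    dst[dp++] = (byte) (((b1 & 0x1f) << 6) | (b2 & 0x3f));
                    sp += 2;
                    continue;
                }
            }
            return null;                   // outside latin-1, truncated, or malformed
        }
        return Arrays.copyOf(dst, dp);
    }

    public static void main(String[] args) {
        byte[] latin1 = decodeLatin1OrNull("caf\u00e9".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));
        // U+20AC needs a three-byte sequence, so the fast path gives up
        System.out.println(decodeLatin1OrNull("\u20ac".getBytes(StandardCharsets.UTF_8)));
    }
}
```

The second call returns `null` because the lead byte of the euro sign's encoding is 0xE2, which the fast path does not handle.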

Since the two constant bytes differ only in the lowest bit, the test can be 
rewritten as follows, saving us a branch:

`if ((b1 & 0xfe) == 0xc2 && ...`
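A quick way to convince oneself that the rewrite is safe is to compare both predicates on every possible byte value. The sketch assumes, as in the snippets above, that `b1` holds a byte sign-extended to `int`; masking with `0xfe` clears both the upper 24 bits and the lowest bit, so only 0xC2 and 0xC3 pass. The class and method names are illustrative.

```java
public class LeadByteMaskCheck {

    // Original form: two comparisons against sign-extended byte constants.
    static boolean twoCompares(int b1) {
        return b1 == (byte) 0xc2 || b1 == (byte) 0xc3;
    }

    // Rewritten form: drop the lowest bit (and the sign-extension bits),
    // then a single comparison.
    static boolean masked(int b1) {
        return (b1 & 0xfe) == 0xc2;
    }

    public static void main(String[] args) {
        int matches = 0;
        // Iterate over every byte value, widened to int as when read from a byte[].
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            if (twoCompares(v) != masked(v)) {
                throw new AssertionError("forms disagree for byte " + v);
            }
            if (masked(v)) {
                matches++;
            }
        }
        System.out.println("forms agree on all 256 byte values; " + matches + " match");
    }
}
```

Running it reports that the two forms agree everywhere and that exactly two values (0xC2 and 0xC3) match.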

This provides a small speed-up (roughly 5% in the run below) on 
microbenchmarks where the input can be internally encoded as latin-1:


Benchmark                           (charsetName)  Mode  Cnt     Score    Error  Units
StringDecode.decodeLatin1LongStart          UTF-8  avgt   50  2283.591 ± 12.332  ns/op  (baseline)
StringDecode.decodeLatin1LongStart          UTF-8  avgt   50  2165.984 ± 13.136  ns/op  (patched)
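The size of the improvement can be checked with a little arithmetic on the two result lines above, the second being the faster run (in `avgt` mode lower is better). To our understanding JMH's Error column is by default the half-width of a 99.9% confidence interval; this sketch simply treats each result as a score ± error interval and checks whether the two overlap. The class and method names are illustrative.

```java
public class SpeedupArithmetic {

    // Relative reduction in time per operation, in percent.
    static double percentReduction(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    // Two score ± error intervals overlap iff the gap between the scores
    // is no larger than the sum of the two error margins.
    static boolean intervalsOverlap(double a, double errA, double b, double errB) {
        return Math.abs(a - b) <= errA + errB;
    }

    public static void main(String[] args) {
        // Scores and error margins from the two result lines above, in ns/op.
        double before = 2283.591, beforeErr = 12.332;
        double after = 2165.984, afterErr = 13.136;

        System.out.printf("saved %.3f ns/op (%.2f%%); intervals overlap: %b%n",
                before - after,
                percentReduction(before, after),
                intervalsOverlap(before, beforeErr, after, afterErr));
    }
}
```

For these numbers it reports about 117.6 ns/op saved (roughly 5.15%) and no overlap, so the difference is larger than the reported error margins.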

-------------

Commit messages:
 - 8280124: Reduce branches decoding latin-1 chars from UTF-8 encoded bytes
 - Align StringDecode microbenchmark with StringEncode

Changes: https://git.openjdk.java.net/jdk/pull/7122/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7122&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8280124
  Stats: 139 lines in 2 files changed: 93 ins; 34 del; 12 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7122.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7122/head:pull/7122

PR: https://git.openjdk.java.net/jdk/pull/7122