On Sun, 19 Feb 2023 07:24:30 GMT, David Schlosnagle <d...@openjdk.org> wrote:
>> When encoding Strings to US-ASCII we can speed up the happy path >> significantly by using `StringCoding.countPositives` as a speculative check >> for whether there are any chars that needs to be replaced by `'?'`. Once a >> non-ASCII char is encountered we fall back to the slow loop and replace as >> needed. >> >> An alternative could be unrolling or using a byte array VarHandle, as >> show-cased by Brett Okken here: >> https://mail.openjdk.org/pipermail/core-libs-dev/2023-February/100573.html >> Having to replace chars with `?` is essentially an encoding error so it >> might be safe to assume this case is exceptional in practice. > > src/java.base/share/classes/java/lang/String.java line 976: > >> 974: private static byte[] encodeASCII(byte coder, byte[] val) { >> 975: if (coder == LATIN1) { >> 976: byte[] dst = Arrays.copyOf(val, val.length); > > Given the tweaks in https://git.openjdk.org/jdk/pull/12613 should this use > `val.clone()` (would skip the length check) > > Suggestion: > > byte[] dst = val.clone(); Yeah, probably makes sense. On that note I found that `val.clone()` underperform `Arrays.copyOf(val, val.length)` in C1 compiled code: https://bugs.openjdk.org/browse/JDK-8302850 - while this shouldn't affect peak performance it might be cause for a warmup regression. > src/java.base/share/classes/java/lang/String.java line 982: > >> 980: if (dst[i] < 0) { >> 981: dst[i] = '?'; >> 982: } > > I'm curious if using countPositives (and vectorization) to scan forward would > be valuable for long (mostly ASCII) strings or if the method call > overhead/non-constant stride is not a win for shorter strings or heavily > non-ascii inputs. > > Suggestion: > > for (int i = positives; i < dst.length; i = > StringCoding.countPositives(dst, i + 1, dst.length - i);) { > if (dst[i] < 0) { > dst[i] = '?'; > } There's some overhead doing countPositives calls, and doing it in a loop might provoke poor worst case performance. Im sure you can find inputs for which this is a win, though. ------------- PR: https://git.openjdk.org/jdk/pull/12640