On Thu, 15 Jan 2026 19:23:43 GMT, Roger Riggs wrote:
> While it is convenient that those UTF-16 charsets have an easy-to-compute size, I
> doubt those two are in sufficient use to justify a commitment to support them in
> the fast path. If you are going to support charsets beyond the most common
> UTF-8, ASCII, and ISO-8859-1, then computing the encoded length should be
> delegated to the Charset itself, with separate code in the different packages.
Thanks, that makes sense to me. My opinion is that a large amount of the value
here is in optimizing UTF-8, that there's an argument for optimizing the other
standard charsets that `String` already has fast paths for, and that returns
diminish sharply beyond that. I would be inclined to stop at the standard
charsets, but I'm also happy to make changes if there's a preference for having
more or fewer fast paths.
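For a sense of why UTF-8 and the UTF-16 family are cheap to size, here is a minimal sketch that computes the encoded length without allocating the byte array. The class and method names are hypothetical, not JDK API. It assumes `String.getBytes` semantics with the default replacement: in UTF-8 an unpaired surrogate becomes a single `'?'`, and the `UTF-16` charset prepends a two-byte big-endian byte-order mark (the sketch assumes no mark for an empty string).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedLength {

    // Number of bytes String.getBytes(UTF_8) would produce, counted without encoding.
    static int utf8Length(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                n += 1;                   // ASCII
            } else if (c < 0x800) {
                n += 2;                   // two-byte sequence
            } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                n += 4;                   // valid pair: one supplementary code point
                i++;                      // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                n += 1;                   // unpaired surrogate, assumed replaced by '?'
            } else {
                n += 3;                   // remaining BMP characters
            }
        }
        return n;
    }

    // UTF-16BE/LE: two bytes per char; UTF-16 additionally writes a byte-order mark.
    static int utf16Length(String s, Charset cs) {
        int body = 2 * s.length();
        if (cs.equals(StandardCharsets.UTF_16)) {
            return s.isEmpty() ? 0 : body + 2;
        }
        if (cs.equals(StandardCharsets.UTF_16BE) || cs.equals(StandardCharsets.UTF_16LE)) {
            return body;
        }
        throw new IllegalArgumentException("not a UTF-16 charset: " + cs);
    }

    public static void main(String[] args) {
        // ASCII, Latin-1, BMP and supplementary characters
        String s = "h\u00e9llo \u20ac \uD83D\uDE00";
        // prints "15 15"
        System.out.println(utf8Length(s) + " " + s.getBytes(StandardCharsets.UTF_8).length);
        // prints "22 22"
        System.out.println(utf16Length(s, StandardCharsets.UTF_16) + " "
                + s.getBytes(StandardCharsets.UTF_16).length);
    }
}
```

The UTF-8 case needs a pass over the chars, while the UTF-16 variants reduce to arithmetic on `length()`, which is the "easy to compute size" mentioned above.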
> Have you looked at `CharsetEncoder.maxBytesPerChar()`? It might only be
> useful for single-byte formats, but if `maxBytesPerChar` is equal to
> `averageBytesPerChar`, that might be a useful shortcut.
I had a quick look at that, and saw errors for `IBM-Thai`:
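```java
CharsetEncoder encoder = cs.newEncoder();
// Only taken when the encoder reports exactly one byte per char (max == average == 1)
if (encoder.maxBytesPerChar() == 1f
        && encoder.maxBytesPerChar() == encoder.averageBytesPerChar()) {
    // maxBytesPerChar() is 1 here, so this is just value.length
    return value.length * (int) encoder.maxBytesPerChar();
}
```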
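A minimal sketch of the same check as a standalone predicate, together with one caveat worth testing alongside it: even for ISO-8859-1 and US-ASCII, which report exactly one byte per char, `String.getBytes` replaces a whole surrogate pair with a single `'?'`, so the byte count equals the code point count rather than `length()`. The class and method names are hypothetical, and whether this is what produced the `IBM-Thai` errors is not established here; the sketch only covers the two standard single-byte charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class SingleByteShortcut {

    // The condition from the snippet: max and average are both exactly one byte per char.
    static boolean reportsOneBytePerChar(Charset cs) {
        CharsetEncoder encoder = cs.newEncoder();
        return encoder.maxBytesPerChar() == 1f
                && encoder.maxBytesPerChar() == encoder.averageBytesPerChar();
    }

    // Assumed length for ISO-8859-1 / US-ASCII: one byte per code point. A valid
    // surrogate pair counts once, as does an unpaired surrogate; unmappable
    // code points are replaced by '?'.
    static int singleByteLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "caf\u00e9 \uD83D\uDE00"; // ends with one supplementary character
        for (Charset cs : new Charset[] {StandardCharsets.ISO_8859_1, StandardCharsets.US_ASCII}) {
            System.out.println(cs + ": oneByte=" + reportsOneBytePerChar(cs)
                    + " length()=" + s.length()
                    + " codePoints=" + singleByteLength(s)
                    + " getBytes=" + s.getBytes(cs).length);
        }
    }
}
```

For this input `length()` is 7 while both encodings produce 6 bytes, so multiplying the char count by `maxBytesPerChar()` overestimates whenever supplementary characters are present.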
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695769015