On Thu, 15 Jan 2026 19:23:43 GMT, Roger Riggs <[email protected]> wrote:

> While it is convenient that those UTF16 charsets have an easy to compute size, I 
> doubt those two are in sufficient use to justify a commitment to support them in 
> the fast path. If you are going to support charsets beyond the most common 
> utf8, ascii, and ISO-8859-1, then computing the encoded length should be 
> delegated to the Charset itself and have separate code in different packages.

Thanks, that makes sense to me. My opinion is that a large amount of the value 
here is in optimizing UTF-8, and that there's an argument to optimize the other 
standard charsets that `String` has other fast paths for, but sharply 
diminishing returns beyond that. I would be inclined to stop at the standard 
charsets, but also happy to make changes if there's a preference for having 
more or fewer fast paths.

> Have you looked at `CharsetEncoder.maxBytesPerChar()`? It might only be 
> useful for single byte formats, but if `maxBytesPerChar` is equal to 
> `averageBytesPerChar` that might be a useful shortcut.

I had a quick look at that, and saw errors for `IBM-Thai`:


        CharsetEncoder encoder = cs.newEncoder();
        if (encoder.maxBytesPerChar() == 1f
                && encoder.maxBytesPerChar() == encoder.averageBytesPerChar()) {
            return value.length * (int) encoder.maxBytesPerChar();
        }
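
For illustration, here is a minimal standalone sketch of how a shortcut of this shape can disagree with the real encoded length. It is an assumption that this is related to the `IBM-Thai` errors; the sketch uses `ISO-8859-1` instead, because that charset is always available, and the class and method names (`EncodedLengthShortcut`, `shortcutLength`, `actualLength`) are hypothetical and not part of the PR. The point it shows is that `String.getBytes` replaces an unmappable surrogate pair (two `char`s) with a single replacement byte, so counting `char`s overestimates.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class EncodedLengthShortcut {

    // Returns the shortcut estimate (one byte per char), or -1 if the
    // encoder does not report max == average == 1 byte per char.
    static int shortcutLength(String s, Charset cs) {
        if (!cs.canEncode()) {
            return -1; // decode-only charset: no encoder available
        }
        CharsetEncoder encoder = cs.newEncoder();
        if (encoder.maxBytesPerChar() == 1f
                && encoder.maxBytesPerChar() == encoder.averageBytesPerChar()) {
            return s.length();
        }
        return -1;
    }

    // Reference answer: actually encode and count the bytes.
    static int actualLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        Charset cs = StandardCharsets.ISO_8859_1;
        String plain = "abc";
        // 'a' followed by U+1F600, which is a surrogate pair: 3 chars total.
        String supplementary = "a\uD83D\uDE00";

        for (String s : new String[] {plain, supplementary}) {
            System.out.println("chars=" + s.length()
                    + " shortcut=" + shortcutLength(s, cs)
                    + " actual=" + actualLength(s, cs));
        }
    }
}
```

For the second string the shortcut reports 3 while the encoded form is 2 bytes, so a check based only on `maxBytesPerChar()` and `averageBytesPerChar()` would also need to account for surrogate pairs and any other input that the encoder handles by replacement.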

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695769015
