On Thu, 15 Jan 2026 18:18:31 GMT, Liam Miller-Cushon <[email protected]> wrote:
>> src/java.base/share/classes/java/lang/String.java line 2151:
>>
>>> 2149: } else if (cs == US_ASCII.INSTANCE) {
>>> 2150: return encodedLengthASCII(coder, value);
>>> 2151: } else if (cs instanceof sun.nio.cs.UTF_16LE || cs instanceof
>>> sun.nio.cs.UTF_16BE) {
>>
>> I see that `sun.nio.cs.UTF_16{LE,BE}` specialization is suggested by
>> @ExE-Boss [here]. Though I'm not really sure if this is really needed. I
>> cannot spot any other usage of these constants in `java.base`, except
>> `jdk.internal.foreign.StringSupport`, which is irrelevant.
>>
>> [here]: https://github.com/openjdk/jdk/pull/28454/files#r2552768341
>
> I don't have a strong opinion about these charsets. It's nice that the
> encoded length for them can be calculated in constant time, but on the other
> hand if they are less frequently used and there isn't precedent for special
> casing them in `java.base`, then this part could be dropped.

While it is convenient that those UTF-16 charsets have an easy-to-compute size,
I doubt those two are in sufficient use to justify a commitment to support them
in the fast path.

If you are going to support charsets beyond the most common UTF-8, ASCII, and
ISO-8859-1, then computing the encoded length should be delegated to the
Charset itself and have separate code in different packages.

Have you looked at `CharsetEncoder.maxBytesPerChar()`? It might only be useful
for single-byte formats, but if `maxBytesPerChar` is equal to
`averageBytesPerChar`, that might be a useful shortcut.
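
To make the idea concrete, here is a rough sketch using only the public
`CharsetEncoder` API. The class and method names (`FixedWidthLength`,
`fixedWidthEncodedLength`) are made up for illustration and are not part of the
PR. It also assumes that equal max/average values imply a fixed width per
`char`, which would need checking for surrogate pairs and unmappable
characters (a single-byte charset may replace a surrogate pair with a single
replacement byte, for example).

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;

public class FixedWidthLength {

    // Hypothetical helper, not an existing JDK method.
    // Returns length * bytesPerChar when the encoder reports the same whole
    // number for maxBytesPerChar and averageBytesPerChar, otherwise empty.
    static OptionalInt fixedWidthEncodedLength(String s, Charset cs) {
        if (!cs.canEncode()) {
            return OptionalInt.empty(); // decode-only charset, no encoder
        }
        CharsetEncoder enc = cs.newEncoder();
        float max = enc.maxBytesPerChar();
        if (max != enc.averageBytesPerChar() || max != (int) max) {
            return OptionalInt.empty(); // variable width, fall back to slow path
        }
        // Assumption: every char encodes to exactly `max` bytes. This is not
        // guaranteed for surrogate pairs or unmappable input.
        return OptionalInt.of(Math.multiplyExact(s.length(), (int) max));
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16LE,
        };
        for (Charset cs : charsets) {
            System.out.println(cs + " -> " + fixedWidthEncodedLength("hello", cs));
        }
    }
}
```

With the encoders in current JDKs, UTF-8 would take the fallback path, since
its average and maximum differ, while ISO-8859-1 and UTF-16LE/BE would take the
shortcut without any charset-specific `instanceof` checks in `String`.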
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695660230