Re: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v5]

Roger Riggs Thu, 15 Jan 2026 11:28:30 -0800

On Thu, 15 Jan 2026 18:18:31 GMT, Liam Miller-Cushon <[email protected]> wrote:


>> src/java.base/share/classes/java/lang/String.java line 2151:
>> 
>>> 2149:         } else if (cs == US_ASCII.INSTANCE) {
>>> 2150:             return encodedLengthASCII(coder, value);
>>> 2151:         } else if (cs instanceof sun.nio.cs.UTF_16LE || cs instanceof 
>>> sun.nio.cs.UTF_16BE) {
>> 
>> I see that `sun.nio.cs.UTF_16{LE,BE}` specialization is suggested by 
>> @ExE-Boss [here]. Though I'm not really sure if this is really needed. I 
>> cannot spot any other usage of these constants in `java.base`, except 
>> `jdk.internal.foreign.StringSupport`, which is irrelevant.
>> 
>> [here]: https://github.com/openjdk/jdk/pull/28454/files#r2552768341
>
> I don't have a strong opinion about these charsets. It's nice that the 
> encoded length for them can be calculated in constant time, but on the other 
> hand if they are less frequently used and there isn't precedent for special 
> casing them in `java.base`, then this part could be dropped.

While it is convenient that those UTF-16 charsets have an easy-to-compute size, I 
doubt those two are in sufficient use to justify a commitment to support them in 
the fast path.
If you are going to support charsets beyond the most common UTF-8, ASCII, and 
ISO-8859-1, then computing the encoded length should be delegated to the 
Charset itself, with separate code in different packages.
Have you looked at `CharsetEncoder.maxBytesPerChar()`? It might only be useful 
for single-byte formats, but if `maxBytesPerChar` is equal to 
`averageBytesPerChar`, that might be a useful shortcut.
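
A rough sketch of that shortcut, for illustration only. The class and method 
names are hypothetical and not part of the PR. It assumes that when both 
values are 1.0 and every char is encodable, each char encodes to exactly one 
byte; `averageBytesPerChar()` is only an estimate, so this is a heuristic 
rather than a guarantee. Anything else falls back to actually encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not the PR's implementation.
final class EncodedLengthSketch {

    static int encodedLength(String s, Charset cs) {
        // newEncoder() throws UnsupportedOperationException for
        // charsets that do not support encoding.
        CharsetEncoder enc = cs.newEncoder();
        float max = enc.maxBytesPerChar();

        // Assumed shortcut: a single-byte charset reports 1.0 for both
        // values. canEncode() rules out unmappable input such as
        // surrogate pairs, where the char count and the encoded byte
        // count could differ.
        if (max == 1.0f && max == enc.averageBytesPerChar() && enc.canEncode(s)) {
            return s.length();
        }

        // Slow path: encode and measure.
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("h\u00e9llo", StandardCharsets.ISO_8859_1));
        System.out.println(encodedLength("h\u00e9llo", StandardCharsets.UTF_8));
    }
}
```

The `canEncode` scan is itself linear in the string length, so the gain over 
the slow path is avoiding the byte array allocation, not constant-time 
behaviour.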

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695660230
