Re: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v5]

Liam Miller-Cushon Thu, 15 Jan 2026 10:31:27 -0800

On Thu, 15 Jan 2026 16:17:22 GMT, Volkan Yazici <[email protected]> wrote:


>> Liam Miller-Cushon has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Update tests
>
> src/java.base/share/classes/java/lang/String.java line 2143:
> 
>> 2141:      */
>> 2142:     public int getBytesLength(Charset cs) {
>> 2143:         if (cs == UTF_8.INSTANCE) {
> 
> It'd be nice to catch null values as early as possible. I suggest adding a 
> `Objects.requireNonNull(cs)` along with `@throws NullPointerException If 
> {@code cs} is null` in docs.

I added the `requireNonNull`, omitting the `@throws` as suggested in 
https://github.com/openjdk/jdk/pull/28454/changes#r2695394410

> src/java.base/share/classes/java/lang/String.java line 2148:
> 
>> 2146:             if (isLatin1()) {
>> 2147:                 return value.length;
>> 2148:             }
> 
> Any particular reason you avoided introducing a `encodedLength8859_1` here? 
> (There is a `encode8859_1` method.)

I have tentatively added `encodedLength8859_1`

`encode8859_1` is implemented in terms of the `implEncodeISOArray`, so it is 
less similar than the other examples. In general I figured there was a tradeoff 
between the performance benefit and the additional code to have fast paths for 
each charset, and UTF-8 may be more frequently used.

> src/java.base/share/classes/java/lang/String.java line 2151:
> 
>> 2149:         } else if (cs == US_ASCII.INSTANCE) {
>> 2150:             return encodedLengthASCII(coder, value);
>> 2151:         } else if (cs instanceof sun.nio.cs.UTF_16LE || cs instanceof 
>> sun.nio.cs.UTF_16BE) {
> 
> I see that `sun.nio.cs.UTF_16{LE,BE}` specialization is suggested by 
> @ExE-Boss [here]. Though I'm not really sure if this is really needed. I 
> cannot spot any other usage of these constants in `java.base`, except 
> `jdk.internal.foreign.StringSupport`, which is irrelevant.
> 
> [here]: https://github.com/openjdk/jdk/pull/28454/files#r2552768341

I don't have a strong opinion about these charsets. It's nice that the encoded 
length for them can be calculated in constant time, but on the other hand if 
they are less frequently used and there isn't precedent for special casing them 
in `java.base`, then this part could be dropped.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695394410
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695468134
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695471999

Re: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v5]

Reply via email to