Re: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v2]

Roger Riggs Wed, 14 Jan 2026 14:55:05 -0800

On Wed, 14 Jan 2026 10:59:32 GMT, Liam Miller-Cushon <[email protected]> wrote:


>> This implements an API to return the byte length of a String encoded in a 
>> given charset. See 
>> [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background.
>> 
>> ---
>> 
>> 
>> Benchmark                              (encoding)  (stringLength)   Mode  
>> Cnt          Score          Error  Units
>> StringLoopJmhBenchmark.getBytes             ASCII              10  thrpt    
>> 5  406782650.595 ± 16960032.852  ops/s
>> StringLoopJmhBenchmark.getBytes             ASCII             100  thrpt    
>> 5  172936926.189 ±  4532029.201  ops/s
>> StringLoopJmhBenchmark.getBytes             ASCII            1000  thrpt    
>> 5   38830681.232 ±  2413274.766  ops/s
>> StringLoopJmhBenchmark.getBytes             ASCII          100000  thrpt    
>> 5     458881.155 ±    12818.317  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1              10  thrpt    
>> 5   37193762.990 ±  3962947.391  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1             100  thrpt    
>> 5   55400876.236 ±  1267331.434  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1            1000  thrpt    
>> 5   11104514.001 ±    41718.545  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1          100000  thrpt    
>> 5     182535.414 ±    10296.120  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16              10  thrpt    
>> 5  113474681.457 ±  8326589.199  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16             100  thrpt    
>> 5   37854103.127 ±  4808526.773  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16            1000  thrpt    
>> 5    4139833.009 ±    70636.784  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16          100000  thrpt    
>> 5      57644.637 ±     1887.112  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII              10  thrpt    
>> 5  946701647.247 ± 76938927.141  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII             100  thrpt    
>> 5  396615374.479 ± 15167234.884  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII            1000  thrpt    
>> 5  100464784.979 ±   794027.897  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII          100000  thrpt    
>> 5    1215487.689 ±     1916.468  ops/s
>> StringLoopJmhBenchmark.getBytesLength      LATIN1              10  thrpt    
>> 5  221265102.323 ± 17013983.056  ops/s
>> StringLoopJmhBenchmark.getBytesLength      LATIN1             100  thrpt    
>> 5  137617873.887 ±  5842185.781  ops/s
>> StringLoopJmhBenchmark.getBytesLength      LATIN1            1000  thrpt    
>> 5   92540259.1...
>
> Liam Miller-Cushon has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Review feedback

src/java.base/share/classes/java/lang/String.java line 1080:

> 1078:             return value.length;
> 1079:         }
> 1080:         int len = value.length >> 1;

I don't think I understand what's being done and what Charset encoder it is 
mimicking.
It probably needs to document the assumptions about unmappable characters and 
malformed surrogates.
(Likely it is correct since the test of US_ASCII passes, but could use an 
explanation).

src/java.base/share/classes/java/lang/String.java line 2130:

> 2128: 
> 2129:     /**
> 2130:      * Returns the length in bytes of the given String encoded with the 
> given {@link Charset}.

You can use the javadoc tag `@return` and skip the duplication.  This first 
sentence reads better then the @return below since it emphasies the "encoded 
string" aspect.

src/java.base/share/classes/java/lang/String.java line 2136:

> 2134:      * @implNote This method may allocate memory to compute the length 
> for some charsets.
> 2135:      *
> 2136:      * @param cs the charset used to the compute the length

Capitalize and perhaps link "Charset".

src/java.base/share/classes/java/lang/String.java line 2155:

> 2153:     }
> 2154: 
> 2155:     private int byteFor(int length, int coder) {

Add a //comment please.
The method name should be plural.  `bytesForCoder`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2692343837
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2692299982
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2692301234
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2692304224

Re: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v2]

Reply via email to