On Wed, 14 Jan 2026 10:59:32 GMT, Liam Miller-Cushon <[email protected]> wrote:
>> This implements an API to return the byte length of a String encoded in a >> given charset. See >> [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. >> >> --- >> >> >> Benchmark (encoding) (stringLength) Mode >> Cnt Score Error Units >> StringLoopJmhBenchmark.getBytes ASCII 10 thrpt >> 5 406782650.595 ± 16960032.852 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100 thrpt >> 5 172936926.189 ± 4532029.201 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt >> 5 38830681.232 ± 2413274.766 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt >> 5 458881.155 ± 12818.317 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt >> 5 37193762.990 ± 3962947.391 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt >> 5 55400876.236 ± 1267331.434 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt >> 5 11104514.001 ± 41718.545 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt >> 5 182535.414 ± 10296.120 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 10 thrpt >> 5 113474681.457 ± 8326589.199 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100 thrpt >> 5 37854103.127 ± 4808526.773 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt >> 5 4139833.009 ± 70636.784 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt >> 5 57644.637 ± 1887.112 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt >> 5 946701647.247 ± 76938927.141 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt >> 5 396615374.479 ± 15167234.884 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt >> 5 100464784.979 ± 794027.897 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt >> 5 1215487.689 ± 1916.468 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt >> 5 221265102.323 ± 17013983.056 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt >> 5 137617873.887 ± 5842185.781 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt >> 5 92540259.1... > > Liam Miller-Cushon has updated the pull request incrementally with one > additional commit since the last revision: > > Review feedback src/java.base/share/classes/java/lang/String.java line 1080: > 1078: return value.length; > 1079: } > 1080: int len = value.length >> 1; I don't think I understand what's being done and what Charset encoder it is mimicking. It probably needs to document the assumptions about unmappable characters and malformed surrogates. (Likely it is correct since the test of US_ASCII passes, but could use an explanation). src/java.base/share/classes/java/lang/String.java line 2130: > 2128: > 2129: /** > 2130: * Returns the length in bytes of the given String encoded with the > given {@link Charset}. You can use the javadoc tag `@return` and skip the duplication. This first sentence reads better then the @return below since it emphasies the "encoded string" aspect. src/java.base/share/classes/java/lang/String.java line 2136: > 2134: * @implNote This method may allocate memory to compute the length > for some charsets. > 2135: * > 2136: * @param cs the charset used to the compute the length Capitalize and perhaps link "Charset". src/java.base/share/classes/java/lang/String.java line 2155: > 2153: } > 2154: > 2155: private int byteFor(int length, int coder) { Add a //comment please. The method name should be plural. `bytesForCoder`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2692343837 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2692299982 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2692301234 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2692304224
