On Mon, 7 Mar 2022 23:36:41 GMT, Xin Liu <x...@openjdk.org> wrote: >> If AbstractStringBuilder only grow, the inflated value which has been >> encoded in UTF16 can't be compressed. >> toString() can skip compression in this case. This can save an >> ArrayAllocation in StringUTF16::compress(). >> >> java.io.BufferedRead::readLine() is a case that StringBuilder grows only. >> >> In microbench, we expect to see that allocation/op reduces 20%. The initial >> capacity of StringBuilder is S in bytes. When it encounters the 1st >> character that can't be encoded in LATIN1, it inflates and allocate a new >> array of 2*S. `toString()` will try to compress that value so it need to >> allocate S bytes. The last step allocates 2*S bytes because it has to copy >> the string. so it requires to allocate 5 * S bytes in total. By skipping >> the failed compression, it only allocates 4 * S bytes. that's 20%. In real >> execution, we observe 16% allocation reduction, tracked by JMH GC profiler >> `gc.alloc.rate.norm `. I think it's because HotSpot can't track all >> allocations. >> >> Not only allocation drops, the runtime performance(ns/op) also increases >> from 3.34% to 18.91%. >> >> Before: >> >> $$make test >> TEST="micro:org.openjdk.bench.java.lang.StringBuilders.toStringWithMixedChars" >> MICRO="OPTIONS=-prof gc -gc true -o before.log -jvm >> $HOME/Devel/jdk_baseline/bin/java" >> >> Benchmark >> (MIXED_SIZE) Mode Cnt Score Error Units >> StringBuilders.toStringWithMixedChars >> 128 avgt 15 649.846 ± 76.291 ns/op >> StringBuilders.toStringWithMixedChars:·gc.alloc.rate >> 128 avgt 15 872.855 ± 128.259 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.alloc.rate.norm >> 128 avgt 15 880.121 ± 0.050 B/op >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space >> 128 avgt 15 707.730 ± 194.421 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space.norm >> 128 avgt 15 706.602 ± 94.504 B/op >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space >> 128 avgt 15 0.001 ± 0.002 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space.norm >> 128 avgt 15 0.001 ± 0.001 B/op >> StringBuilders.toStringWithMixedChars:·gc.count >> 128 avgt 15 113.000 counts >> StringBuilders.toStringWithMixedChars:·gc.time >> 128 avgt 15 85.000 ms >> StringBuilders.toStringWithMixedChars >> 256 avgt 15 1316.652 ± 112.771 ns/op >> StringBuilders.toStringWithMixedChars:·gc.alloc.rate >> 256 avgt 15 800.864 ± 76.869 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.alloc.rate.norm >> 256 avgt 15 1648.288 ± 0.162 B/op >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space >> 256 avgt 15 599.736 ± 174.001 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space.norm >> 256 avgt 15 1229.669 ± 318.518 B/op >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space >> 256 avgt 15 0.001 ± 0.001 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space.norm >> 256 avgt 15 0.001 ± 0.002 B/op >> StringBuilders.toStringWithMixedChars:·gc.count >> 256 avgt 15 133.000 counts >> StringBuilders.toStringWithMixedChars:·gc.time >> 256 avgt 15 92.000 ms >> StringBuilders.toStringWithMixedChars >> 1024 avgt 15 5204.303 ± 418.115 ns/op >> StringBuilders.toStringWithMixedChars:·gc.alloc.rate >> 1024 avgt 15 768.730 ± 72.945 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.alloc.rate.norm >> 1024 avgt 15 6256.844 ± 0.358 B/op >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space >> 1024 avgt 15 655.852 ± 121.602 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space.norm >> 1024 avgt 15 5315.265 ± 578.878 B/op >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space >> 1024 avgt 15 0.002 ± 0.002 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space.norm >> 1024 avgt 15 0.014 ± 0.011 B/op >> StringBuilders.toStringWithMixedChars:·gc.count >> 1024 avgt 15 96.000 counts >> StringBuilders.toStringWithMixedChars:·gc.time >> 1024 avgt 15 86.000 ms >> >> >> After >> >> $make test >> TEST="micro:org.openjdk.bench.java.lang.StringBuilders.toStringWithMixedChars" >> MICRO="OPTIONS=-prof gc -gc true -o after.log" >> >> Benchmark >> (MIXED_SIZE) Mode Cnt Score Error Units >> StringBuilders.toStringWithMixedChars >> 128 avgt 15 627.522 ± 54.804 ns/op >> StringBuilders.toStringWithMixedChars:·gc.alloc.rate >> 128 avgt 15 751.353 ± 83.478 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.alloc.rate.norm >> 128 avgt 15 736.120 ± 0.062 B/op >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space >> 128 avgt 15 589.102 ± 164.083 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space.norm >> 128 avgt 15 575.425 ± 127.021 B/op >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space >> 128 avgt 15 ≈ 10⁻³ MB/sec >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space.norm >> 128 avgt 15 ≈ 10⁻³ B/op >> StringBuilders.toStringWithMixedChars:·gc.count >> 128 avgt 15 116.000 counts >> StringBuilders.toStringWithMixedChars:·gc.time >> 128 avgt 15 86.000 ms >> StringBuilders.toStringWithMixedChars >> 256 avgt 15 1185.884 ± 137.364 ns/op >> StringBuilders.toStringWithMixedChars:·gc.alloc.rate >> 256 avgt 15 746.179 ± 92.534 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.alloc.rate.norm >> 256 avgt 15 1376.244 ± 0.125 B/op >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space >> 256 avgt 15 579.137 ± 208.636 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space.norm >> 256 avgt 15 1055.150 ± 307.010 B/op >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space >> 256 avgt 15 0.002 ± 0.002 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space.norm >> 256 avgt 15 0.003 ± 0.003 B/op >> StringBuilders.toStringWithMixedChars:·gc.count >> 256 avgt 15 126.000 counts >> StringBuilders.toStringWithMixedChars:·gc.time >> 256 avgt 15 97.000 ms >> StringBuilders.toStringWithMixedChars >> 1024 avgt 15 4220.415 ± 440.169 ns/op >> StringBuilders.toStringWithMixedChars:·gc.alloc.rate >> 1024 avgt 15 791.945 ± 75.231 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.alloc.rate.norm >> 1024 avgt 15 5217.435 ± 0.543 B/op >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space >> 1024 avgt 15 657.270 ± 235.803 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space.norm >> 1024 avgt 15 4232.470 ± 1267.388 B/op >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space >> 1024 avgt 15 0.005 ± 0.002 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space.norm >> 1024 avgt 15 0.033 ± 0.014 B/op >> StringBuilders.toStringWithMixedChars:·gc.count >> 1024 avgt 15 202.000 counts >> StringBuilders.toStringWithMixedChars:·gc.time >> 1024 avgt 15 121.000 ms > > Xin Liu has updated the pull request incrementally with one additional commit > since the last revision: > > make sure String(StringBuffer) is still synchronized.
I could find at least 2 other cases where `AbstractStringBuilder` is initialized with UTF16 coder, `growOnly` is `true`, and the string is compressible: - the constructor that takes a `CharSequence` may be used with `AbstractStringBuilder`; it then copies the coder, but does not copy `growOnly` - the same happens in `append(AbstractStringBuilder asb)`; coder may be upgraded to UTF16 even if asb has `growOnly` = `false` As for a better name for `growOnly`, something like `mayBeLatin` would better convey the variable's purpose. What do you think? ------------- PR: https://git.openjdk.java.net/jdk/pull/7671