On Wed, 9 Mar 2022 08:33:36 GMT, Xin Liu <x...@openjdk.org> wrote:

>> If AbstractStringBuilder only grow, the inflated value which has been 
>> encoded in UTF16 can't be compressed. 
>> toString() can skip compression in this case. This can save an 
>> ArrayAllocation in StringUTF16::compress().
>> 
>> java.io.BufferedRead::readLine() is a case that StringBuilder grows only. 
>> 
>> In microbench, we expect to see that allocation/op reduces 20%.  The initial 
>> capacity of StringBuilder is S in bytes. When it encounters the 1st 
>> character that can't be encoded in LATIN1, it inflates and allocate a new 
>> array of 2*S. `toString()` will try to compress that value so it need to 
>> allocate S bytes. The last step allocates 2*S bytes because it has to copy 
>> the string.  so it requires to allocate 5 * S bytes in total.  By skipping 
>> the failed compression, it only allocates 4 * S bytes.  that's 20%. In real 
>> execution, we observe 16% allocation reduction, tracked by JMH GC profiler 
>> `gc.alloc.rate.norm `.  I think it's because HotSpot can't track all 
>> allocations. 
>> 
>> Not only allocation drops, the runtime performance(ns/op) also increases 
>> from 3.34% to 18.91%. 
>> 
>> Before: 
>> 
>> $$make test 
>> TEST="micro:org.openjdk.bench.java.lang.StringBuilders.toStringWithMixedChars"
>>  MICRO="OPTIONS=-prof gc -gc true -o before.log -jvm 
>> $HOME/Devel/jdk_baseline/bin/java" 
>> 
>> Benchmark                                                               
>> (MIXED_SIZE)  Mode  Cnt     Score     Error   Units
>> StringBuilders.toStringWithMixedChars                                        
>>     128  avgt   15   649.846 ±  76.291   ns/op
>> StringBuilders.toStringWithMixedChars:·gc.alloc.rate                         
>>     128  avgt   15   872.855 ± 128.259  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.alloc.rate.norm                    
>>     128  avgt   15   880.121 ±   0.050    B/op
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space                
>>     128  avgt   15   707.730 ± 194.421  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space.norm           
>>     128  avgt   15   706.602 ±  94.504    B/op
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space            
>>     128  avgt   15     0.001 ±   0.002  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space.norm       
>>     128  avgt   15     0.001 ±   0.001    B/op
>> StringBuilders.toStringWithMixedChars:·gc.count                              
>>     128  avgt   15   113.000            counts
>> StringBuilders.toStringWithMixedChars:·gc.time                               
>>     128  avgt   15    85.000                ms
>> StringBuilders.toStringWithMixedChars                                        
>>     256  avgt   15  1316.652 ± 112.771   ns/op
>> StringBuilders.toStringWithMixedChars:·gc.alloc.rate                         
>>     256  avgt   15   800.864 ±  76.869  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.alloc.rate.norm                    
>>     256  avgt   15  1648.288 ±   0.162    B/op
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space                
>>     256  avgt   15   599.736 ± 174.001  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space.norm           
>>     256  avgt   15  1229.669 ± 318.518    B/op
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space            
>>     256  avgt   15     0.001 ±   0.001  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space.norm       
>>     256  avgt   15     0.001 ±   0.002    B/op
>> StringBuilders.toStringWithMixedChars:·gc.count                              
>>     256  avgt   15   133.000            counts
>> StringBuilders.toStringWithMixedChars:·gc.time                               
>>     256  avgt   15    92.000                ms
>> StringBuilders.toStringWithMixedChars                                        
>>    1024  avgt   15  5204.303 ± 418.115   ns/op
>> StringBuilders.toStringWithMixedChars:·gc.alloc.rate                         
>>    1024  avgt   15   768.730 ±  72.945  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.alloc.rate.norm                    
>>    1024  avgt   15  6256.844 ±   0.358    B/op
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space                
>>    1024  avgt   15   655.852 ± 121.602  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space.norm           
>>    1024  avgt   15  5315.265 ± 578.878    B/op
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space            
>>    1024  avgt   15     0.002 ±   0.002  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space.norm       
>>    1024  avgt   15     0.014 ±   0.011    B/op
>> StringBuilders.toStringWithMixedChars:·gc.count                              
>>    1024  avgt   15    96.000            counts
>> StringBuilders.toStringWithMixedChars:·gc.time                               
>>    1024  avgt   15    86.000                ms
>> 
>> 
>> After
>> 
>> $make test 
>> TEST="micro:org.openjdk.bench.java.lang.StringBuilders.toStringWithMixedChars"
>>  MICRO="OPTIONS=-prof gc -gc true -o after.log" 
>> 
>> Benchmark                                                               
>> (MIXED_SIZE)  Mode  Cnt     Score      Error   Units
>> StringBuilders.toStringWithMixedChars                                        
>>     128  avgt   15   627.522 ±   54.804   ns/op
>> StringBuilders.toStringWithMixedChars:·gc.alloc.rate                         
>>     128  avgt   15   751.353 ±   83.478  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.alloc.rate.norm                    
>>     128  avgt   15   736.120 ±    0.062    B/op
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space                
>>     128  avgt   15   589.102 ±  164.083  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space.norm           
>>     128  avgt   15   575.425 ±  127.021    B/op
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space            
>>     128  avgt   15    ≈ 10⁻³             MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space.norm       
>>     128  avgt   15    ≈ 10⁻³               B/op
>> StringBuilders.toStringWithMixedChars:·gc.count                              
>>     128  avgt   15   116.000             counts
>> StringBuilders.toStringWithMixedChars:·gc.time                               
>>     128  avgt   15    86.000                 ms
>> StringBuilders.toStringWithMixedChars                                        
>>     256  avgt   15  1185.884 ±  137.364   ns/op
>> StringBuilders.toStringWithMixedChars:·gc.alloc.rate                         
>>     256  avgt   15   746.179 ±   92.534  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.alloc.rate.norm                    
>>     256  avgt   15  1376.244 ±    0.125    B/op
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space                
>>     256  avgt   15   579.137 ±  208.636  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space.norm           
>>     256  avgt   15  1055.150 ±  307.010    B/op
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space            
>>     256  avgt   15     0.002 ±    0.002  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space.norm       
>>     256  avgt   15     0.003 ±    0.003    B/op
>> StringBuilders.toStringWithMixedChars:·gc.count                              
>>     256  avgt   15   126.000             counts
>> StringBuilders.toStringWithMixedChars:·gc.time                               
>>     256  avgt   15    97.000                 ms
>> StringBuilders.toStringWithMixedChars                                        
>>    1024  avgt   15  4220.415 ±  440.169   ns/op
>> StringBuilders.toStringWithMixedChars:·gc.alloc.rate                         
>>    1024  avgt   15   791.945 ±   75.231  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.alloc.rate.norm                    
>>    1024  avgt   15  5217.435 ±    0.543    B/op
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space                
>>    1024  avgt   15   657.270 ±  235.803  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space.norm           
>>    1024  avgt   15  4232.470 ± 1267.388    B/op
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space            
>>    1024  avgt   15     0.005 ±    0.002  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space.norm       
>>    1024  avgt   15     0.033 ±    0.014    B/op
>> StringBuilders.toStringWithMixedChars:·gc.count                              
>>    1024  avgt   15   202.000             counts
>> StringBuilders.toStringWithMixedChars:·gc.time                               
>>    1024  avgt   15   121.000                 ms
>
> Xin Liu has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   Change growOnly to maybeLatin.
>   
>   This patch also copys over the attribute from the other 
> AbstractStringBuilder.
>   Add a unit test to cover methods which cause maybeLatin1 becomes true.

Code changes look ok overall. We might inflate instance size of 
`StringBuilder`s a bit (could be examined via 
https://openjdk.java.net/projects/code-tools/jol/), but since SBs are typically 
short-lived entities I'm not too concerned about a few extra bytes here.

The microbenchmark needs a small refactoring to avoid running micros that don't 
care about the newly added `MIXED_SIZE` `@Param` multiple times.

src/java.base/share/classes/java/lang/AbstractStringBuilder.java line 540:

> 538:             }
> 539:             StringUTF16.putCharSB(value, index, ch);
> 540:             maybeLatin1 = true;

Only possible if `StringLatin1.canEncode(ch) == true` (and strictly speaking if 
the char at `index` is not latin1-encodable). Considering the overheads of 
adding a test it might be preferable to be conservative and just always mark it 
as a "maybe" like you do, though.

src/java.base/share/classes/java/lang/AbstractStringBuilder.java line 1008:

> 1006:         this.count = newCount;
> 1007:         putStringAt(start, str);
> 1008:         if (end - start > 0) {

regardless of value of `end - start` you could also skip setting `maybeLatin1 = 
true` if:
- `str.coder() == UTF16`
- `this.coder == LATIN1`

test/micro/org/openjdk/bench/java/lang/StringBuilders.java line 208:

> 206:     }
> 207: 
> 208:     @Param({"128", "256", "1024"})

I think adding this parameter at this level means all other benchmarks will now 
be redundantly run once per different value for MIXED_SIZE. You can avoid this 
by refactor this parameter and the associated Benchmark into subclass.

-------------

Changes requested by redestad (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7671

Reply via email to