On Sun, 21 Nov 2021 00:29:46 GMT, Vicente Romero <vrom...@openjdk.org> wrote:

>> Please review this PR which aims to optimize the implementation of the 
>> `toString` method we provide for records. A benchmark comparing the 
>> implementation we are providing for records with lombok found out that 
>> lombok is much faster mainly because our implementation uses 
>> `String::format`. This fix is basically delegating on 
>> StringConcatFactory::makeConcatWithConstants which is faster.
>> 
>> TIA
>> 
>> This is the result of the benchmark comparing records to lombok with vanilla 
>> JDK:
>> 
>> Benchmark                          Mode  Cnt    Score    Error  Units
>> MyBenchmark.base                   avgt    4    0.849 ±  0.111  ns/op
>> MyBenchmark.equals_record          avgt    4    7.343 ±  2.740  ns/op
>> MyBenchmark.equals_value           avgt    4    6.644 ±  1.920  ns/op
>> MyBenchmark.record_hash_code       avgt    4    5.763 ±  3.882  ns/op
>> MyBenchmark.record_to_string       avgt    4  262.626 ± 12.574  ns/op        
>>        <------ Before
>> MyBenchmark.value_class_to_string  avgt    4   30.325 ± 21.389  ns/op
>> MyBenchmark.value_hash_code        avgt    4    5.048 ±  3.936  ns/op
>> 
>> 
>> after this patch:
>> 
>> Benchmark                          Mode  Cnt   Score   Error  Units
>> MyBenchmark.base                   avgt    4   0.680 ± 0.185  ns/op
>> MyBenchmark.equals_record          avgt    4   5.599 ± 1.348  ns/op
>> MyBenchmark.equals_value           avgt    4   5.718 ± 4.633  ns/op
>> MyBenchmark.record_hash_code       avgt    4   4.628 ± 4.368  ns/op
>> MyBenchmark.record_to_string       avgt    4  26.791 ± 1.817  ns/op          
>>        <------- After
>> MyBenchmark.value_class_to_string  avgt    4  35.473 ± 2.626  ns/op
>> MyBenchmark.value_hash_code        avgt    4   6.152 ± 5.101  ns/op
>
> Vicente Romero has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   setting max split size to 20

FWIW I did a few experiments trying to move the chunking to `SCF`. Most made 
things worse, but this is getting close: 
https://github.com/openjdk/jdk/compare/master...cl4es:scf_split?expand=1

The threshold for when the JIT fails to optimize seem to be pushed up a bit and 
I get a 4x gains when concatenating 100 Strings (though it takes a good while 
for the microbenchmark to stabilize). It also fails to make much of a 
difference when looking at 200 arguments (maybe 1.4x win). The experiment I 
have done so far are crude and aggregates the results into a String builder 
with no size estimation, so it adds a but of unnecessary allocation that could 
be improved. I think this needs way more work to be a good enhancement and that 
splitting up outside is going to remain better for now. 

A "hidden" drawback with chunking is that you're likely going to be allocating 
more.

I also re-realized that it'll be pretty hard (or impossible) to get around 
having some MH slot limit: even though we chunk it up internally, the MH we 
return must have a type that matches the type given to the bootstrap method, 
and any operations on the tree that require some temporary arguments we need to 
pass around will require some argument slots. The current strategy doesn't use 
too many temporaries, so the real limit might be around 250, but it makes sense 
to give some room for alternative implementations that use more temporaries if 
that makes things more efficient.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6403

Reply via email to