Re: RFR: 8261847: performace of java.lang.Record::toString should be improved [v4]

2021-11-20 Thread Vicente Romero
> Please review this PR which aims to optimize the implementation of the 
> `toString` method we provide for records. A benchmark comparing the 
> implementation we are providing for records with lombok found out that lombok 
> is much faster mainly because our implementation uses `String::format`. This 
> fix is basically delegating on StringConcatFactory::makeConcatWithConstants 
> which is faster.
> 
> TIA
> 
> This is the result of the benchmark comparing records to lombok with vanilla 
> JDK:
> 
> Benchmark  Mode  CntScoreError  Units
> MyBenchmark.base   avgt40.849 ±  0.111  ns/op
> MyBenchmark.equals_record  avgt47.343 ±  2.740  ns/op
> MyBenchmark.equals_value   avgt46.644 ±  1.920  ns/op
> MyBenchmark.record_hash_code   avgt45.763 ±  3.882  ns/op
> MyBenchmark.record_to_string   avgt4  262.626 ± 12.574  ns/op 
>   <-- Before
> MyBenchmark.value_class_to_string  avgt4   30.325 ± 21.389  ns/op
> MyBenchmark.value_hash_codeavgt45.048 ±  3.936  ns/op
> 
> 
> after this patch:
> 
> Benchmark  Mode  Cnt   Score   Error  Units
> MyBenchmark.base   avgt4   0.680 ± 0.185  ns/op
> MyBenchmark.equals_record  avgt4   5.599 ± 1.348  ns/op
> MyBenchmark.equals_value   avgt4   5.718 ± 4.633  ns/op
> MyBenchmark.record_hash_code   avgt4   4.628 ± 4.368  ns/op
> MyBenchmark.record_to_string   avgt4  26.791 ± 1.817  ns/op   
>   <--- After
> MyBenchmark.value_class_to_string  avgt4  35.473 ± 2.626  ns/op
> MyBenchmark.value_hash_codeavgt4   6.152 ± 5.101  ns/op

Vicente Romero has updated the pull request incrementally with one additional 
commit since the last revision:

  setting max split size to 20

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6403/files
  - new: https://git.openjdk.java.net/jdk/pull/6403/files/4698f98e..2d3240e1

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6403&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6403&range=02-03

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6403.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6403/head:pull/6403

PR: https://git.openjdk.java.net/jdk/pull/6403


Re: RFR: 8261847: performace of java.lang.Record::toString should be improved [v4]

2021-11-22 Thread Jim Laskey
On Sun, 21 Nov 2021 00:29:46 GMT, Vicente Romero  wrote:

>> Please review this PR which aims to optimize the implementation of the 
>> `toString` method we provide for records. A benchmark comparing the 
>> implementation we are providing for records with lombok found out that 
>> lombok is much faster mainly because our implementation uses 
>> `String::format`. This fix is basically delegating on 
>> StringConcatFactory::makeConcatWithConstants which is faster.
>> 
>> TIA
>> 
>> This is the result of the benchmark comparing records to lombok with vanilla 
>> JDK:
>> 
>> Benchmark  Mode  CntScoreError  Units
>> MyBenchmark.base   avgt40.849 ±  0.111  ns/op
>> MyBenchmark.equals_record  avgt47.343 ±  2.740  ns/op
>> MyBenchmark.equals_value   avgt46.644 ±  1.920  ns/op
>> MyBenchmark.record_hash_code   avgt45.763 ±  3.882  ns/op
>> MyBenchmark.record_to_string   avgt4  262.626 ± 12.574  ns/op
>><-- Before
>> MyBenchmark.value_class_to_string  avgt4   30.325 ± 21.389  ns/op
>> MyBenchmark.value_hash_codeavgt45.048 ±  3.936  ns/op
>> 
>> 
>> after this patch:
>> 
>> Benchmark  Mode  Cnt   Score   Error  Units
>> MyBenchmark.base   avgt4   0.680 ± 0.185  ns/op
>> MyBenchmark.equals_record  avgt4   5.599 ± 1.348  ns/op
>> MyBenchmark.equals_value   avgt4   5.718 ± 4.633  ns/op
>> MyBenchmark.record_hash_code   avgt4   4.628 ± 4.368  ns/op
>> MyBenchmark.record_to_string   avgt4  26.791 ± 1.817  ns/op  
>><--- After
>> MyBenchmark.value_class_to_string  avgt4  35.473 ± 2.626  ns/op
>> MyBenchmark.value_hash_codeavgt4   6.152 ± 5.101  ns/op
>
> Vicente Romero has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   setting max split size to 20

Marked as reviewed by jlaskey (Reviewer).

src/java.base/share/classes/java/lang/runtime/ObjectMethods.java line 276:

> 274: if (i != names.size() - 1) {
> 275: constants[1 + 2 * i + 1] = ", ";
> 276: }

Wondering if it would be cleaner to use a full recipe.


StringBuilder recipeSB = new StringBuilder();
recipeSB.append(receiverClass.getSimpleName());
recipeSB.append('[');

for (int i = 0; i < names.size(); i++) {
if (i != 0) {
recipeSB.append(", ");
}

recipeSB.append(names.get(i));
recipeSB.append('=');
recipeSB.append('\1');
}

recipeSB.append(']');
String recipe = recipeSB.toString();

-

PR: https://git.openjdk.java.net/jdk/pull/6403


Re: RFR: 8261847: performace of java.lang.Record::toString should be improved [v4]

2021-11-22 Thread Claes Redestad
On Sun, 21 Nov 2021 00:29:46 GMT, Vicente Romero  wrote:

>> Please review this PR which aims to optimize the implementation of the 
>> `toString` method we provide for records. A benchmark comparing the 
>> implementation we are providing for records with lombok found out that 
>> lombok is much faster mainly because our implementation uses 
>> `String::format`. This fix is basically delegating on 
>> StringConcatFactory::makeConcatWithConstants which is faster.
>> 
>> TIA
>> 
>> This is the result of the benchmark comparing records to lombok with vanilla 
>> JDK:
>> 
>> Benchmark  Mode  CntScoreError  Units
>> MyBenchmark.base   avgt40.849 ±  0.111  ns/op
>> MyBenchmark.equals_record  avgt47.343 ±  2.740  ns/op
>> MyBenchmark.equals_value   avgt46.644 ±  1.920  ns/op
>> MyBenchmark.record_hash_code   avgt45.763 ±  3.882  ns/op
>> MyBenchmark.record_to_string   avgt4  262.626 ± 12.574  ns/op
>><-- Before
>> MyBenchmark.value_class_to_string  avgt4   30.325 ± 21.389  ns/op
>> MyBenchmark.value_hash_codeavgt45.048 ±  3.936  ns/op
>> 
>> 
>> after this patch:
>> 
>> Benchmark  Mode  Cnt   Score   Error  Units
>> MyBenchmark.base   avgt4   0.680 ± 0.185  ns/op
>> MyBenchmark.equals_record  avgt4   5.599 ± 1.348  ns/op
>> MyBenchmark.equals_value   avgt4   5.718 ± 4.633  ns/op
>> MyBenchmark.record_hash_code   avgt4   4.628 ± 4.368  ns/op
>> MyBenchmark.record_to_string   avgt4  26.791 ± 1.817  ns/op  
>><--- After
>> MyBenchmark.value_class_to_string  avgt4  35.473 ± 2.626  ns/op
>> MyBenchmark.value_hash_codeavgt4   6.152 ± 5.101  ns/op
>
> Vicente Romero has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   setting max split size to 20

FWIW I did a few experiments trying to move the chunking to `SCF`. Most made 
things worse, but this is getting close: 
https://github.com/openjdk/jdk/compare/master...cl4es:scf_split?expand=1

The threshold for when the JIT fails to optimize seem to be pushed up a bit and 
I get a 4x gains when concatenating 100 Strings (though it takes a good while 
for the microbenchmark to stabilize). It also fails to make much of a 
difference when looking at 200 arguments (maybe 1.4x win). The experiment I 
have done so far are crude and aggregates the results into a String builder 
with no size estimation, so it adds a but of unnecessary allocation that could 
be improved. I think this needs way more work to be a good enhancement and that 
splitting up outside is going to remain better for now. 

A "hidden" drawback with chunking is that you're likely going to be allocating 
more.

I also re-realized that it'll be pretty hard (or impossible) to get around 
having some MH slot limit: even though we chunk it up internally, the MH we 
return must have a type that matches the type given to the bootstrap method, 
and any operations on the tree that require some temporary arguments we need to 
pass around will require some argument slots. The current strategy doesn't use 
too many temporaries, so the real limit might be around 250, but it makes sense 
to give some room for alternative implementations that use more temporaries if 
that makes things more efficient.

-

PR: https://git.openjdk.java.net/jdk/pull/6403


Re: RFR: 8261847: performace of java.lang.Record::toString should be improved [v4]

2021-11-22 Thread Claes Redestad
On Sun, 21 Nov 2021 00:29:46 GMT, Vicente Romero  wrote:

>> Please review this PR which aims to optimize the implementation of the 
>> `toString` method we provide for records. A benchmark comparing the 
>> implementation we are providing for records with lombok found out that 
>> lombok is much faster mainly because our implementation uses 
>> `String::format`. This fix is basically delegating on 
>> StringConcatFactory::makeConcatWithConstants which is faster.
>> 
>> TIA
>> 
>> This is the result of the benchmark comparing records to lombok with vanilla 
>> JDK:
>> 
>> Benchmark  Mode  CntScoreError  Units
>> MyBenchmark.base   avgt40.849 ±  0.111  ns/op
>> MyBenchmark.equals_record  avgt47.343 ±  2.740  ns/op
>> MyBenchmark.equals_value   avgt46.644 ±  1.920  ns/op
>> MyBenchmark.record_hash_code   avgt45.763 ±  3.882  ns/op
>> MyBenchmark.record_to_string   avgt4  262.626 ± 12.574  ns/op
>><-- Before
>> MyBenchmark.value_class_to_string  avgt4   30.325 ± 21.389  ns/op
>> MyBenchmark.value_hash_codeavgt45.048 ±  3.936  ns/op
>> 
>> 
>> after this patch:
>> 
>> Benchmark  Mode  Cnt   Score   Error  Units
>> MyBenchmark.base   avgt4   0.680 ± 0.185  ns/op
>> MyBenchmark.equals_record  avgt4   5.599 ± 1.348  ns/op
>> MyBenchmark.equals_value   avgt4   5.718 ± 4.633  ns/op
>> MyBenchmark.record_hash_code   avgt4   4.628 ± 4.368  ns/op
>> MyBenchmark.record_to_string   avgt4  26.791 ± 1.817  ns/op  
>><--- After
>> MyBenchmark.value_class_to_string  avgt4  35.473 ± 2.626  ns/op
>> MyBenchmark.value_hash_codeavgt4   6.152 ± 5.101  ns/op
>
> Vicente Romero has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   setting max split size to 20

Marked as reviewed by redestad (Reviewer).

src/java.base/share/classes/java/lang/runtime/ObjectMethods.java line 276:

> 274: int namesIndex = 0;
> 275: do {
> 276: /* StringConcatFactory::makeConcatWithConstants can only 
> deal with 200 spots, longs and double occupy two

typos: spot -> slot, chunck -> chunk, diference -> difference

src/java.base/share/classes/java/lang/runtime/ObjectMethods.java line 314:

> 312: ).getTarget();
> 313: mhs[splitIndex] = 
> MethodHandles.filterArguments(mhs[splitIndex], 0, currentSplitGetters);
> 314: mhs[splitIndex] = MethodHandles.permuteArguments(

This is some gnarly logic. Could we add some comments on what permuteArguments 
with a reorder array of just zeros is doing here?

src/java.base/share/classes/java/lang/runtime/ObjectMethods.java line 330:

> 328: 
> 329: /**
> 330:  * Chops the getters into smaller chunks according to the maximum 
> number of spots

typo: spots -> slots

src/java.base/share/classes/java/lang/runtime/ObjectMethods.java line 333:

> 331:  * StringConcatFactory::makeConcatWithConstants can chew
> 332:  * @param getters the current getters
> 333:  * @return chunks that wont surpass the maximum number of spots 
> StringConcatFactory::makeConcatWithConstants can chew

ditto

-

PR: https://git.openjdk.java.net/jdk/pull/6403


Re: RFR: 8261847: performace of java.lang.Record::toString should be improved [v4]

2021-11-22 Thread Jim Laskey
On Mon, 22 Nov 2021 15:56:46 GMT, Claes Redestad  wrote:

>> Vicente Romero has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   setting max split size to 20
>
> src/java.base/share/classes/java/lang/runtime/ObjectMethods.java line 314:
> 
>> 312: ).getTarget();
>> 313: mhs[splitIndex] = 
>> MethodHandles.filterArguments(mhs[splitIndex], 0, currentSplitGetters);
>> 314: mhs[splitIndex] = MethodHandles.permuteArguments(
> 
> This is some gnarly logic. Could we add some comments on what 
> permuteArguments with a reorder array of just zeros is doing here?

This is not unusual. It spreads a single argument across several "getters". But 
a comment wouldn't hurt.

-

PR: https://git.openjdk.java.net/jdk/pull/6403


Re: RFR: 8261847: performace of java.lang.Record::toString should be improved [v4]

2021-11-22 Thread Vicente Romero
On Sun, 21 Nov 2021 00:29:46 GMT, Vicente Romero  wrote:

>> Please review this PR which aims to optimize the implementation of the 
>> `toString` method we provide for records. A benchmark comparing the 
>> implementation we are providing for records with lombok found out that 
>> lombok is much faster mainly because our implementation uses 
>> `String::format`. This fix is basically delegating on 
>> StringConcatFactory::makeConcatWithConstants which is faster.
>> 
>> TIA
>> 
>> This is the result of the benchmark comparing records to lombok with vanilla 
>> JDK:
>> 
>> Benchmark  Mode  CntScoreError  Units
>> MyBenchmark.base   avgt40.849 ±  0.111  ns/op
>> MyBenchmark.equals_record  avgt47.343 ±  2.740  ns/op
>> MyBenchmark.equals_value   avgt46.644 ±  1.920  ns/op
>> MyBenchmark.record_hash_code   avgt45.763 ±  3.882  ns/op
>> MyBenchmark.record_to_string   avgt4  262.626 ± 12.574  ns/op
>><-- Before
>> MyBenchmark.value_class_to_string  avgt4   30.325 ± 21.389  ns/op
>> MyBenchmark.value_hash_codeavgt45.048 ±  3.936  ns/op
>> 
>> 
>> after this patch:
>> 
>> Benchmark  Mode  Cnt   Score   Error  Units
>> MyBenchmark.base   avgt4   0.680 ± 0.185  ns/op
>> MyBenchmark.equals_record  avgt4   5.599 ± 1.348  ns/op
>> MyBenchmark.equals_value   avgt4   5.718 ± 4.633  ns/op
>> MyBenchmark.record_hash_code   avgt4   4.628 ± 4.368  ns/op
>> MyBenchmark.record_to_string   avgt4  26.791 ± 1.817  ns/op  
>><--- After
>> MyBenchmark.value_class_to_string  avgt4  35.473 ± 2.626  ns/op
>> MyBenchmark.value_hash_codeavgt4   6.152 ± 5.101  ns/op
>
> Vicente Romero has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   setting max split size to 20

hi guys, can I have a thumbs up for this one? I will continue doing additional 
research for the `hashCode` and `equals` methods as a follow up but would like 
to close this one as the performance for the `Record::toString` was the worst 
compared to lombok.

TIA

-

PR: https://git.openjdk.java.net/jdk/pull/6403


Re: RFR: 8261847: performace of java.lang.Record::toString should be improved [v4]

2021-11-22 Thread Claes Redestad
On Mon, 22 Nov 2021 16:03:16 GMT, Jim Laskey  wrote:

>> src/java.base/share/classes/java/lang/runtime/ObjectMethods.java line 314:
>> 
>>> 312: ).getTarget();
>>> 313: mhs[splitIndex] = 
>>> MethodHandles.filterArguments(mhs[splitIndex], 0, currentSplitGetters);
>>> 314: mhs[splitIndex] = MethodHandles.permuteArguments(
>> 
>> This is some gnarly logic. Could we add some comments on what 
>> permuteArguments with a reorder array of just zeros is doing here?
>
> This is not unusual. It spreads a single argument across several "getters". 
> But a comment wouldn't hurt.

I haven't seen it used like this before. The duplication behavior is mentioned 
explicitly with an example in the javadoc, sure, but it still took me a deep 
reading of the docs to understand what was going on here and what the intent 
was.

-

PR: https://git.openjdk.java.net/jdk/pull/6403


Re: RFR: 8261847: performace of java.lang.Record::toString should be improved [v4]

2021-11-22 Thread Vicente Romero
On Mon, 22 Nov 2021 15:53:53 GMT, Claes Redestad  wrote:

> FWIW I did a few experiments trying to move the chunking to `SCF`. Most made 
> things worse, but this is getting close: 
> https://github.com/openjdk/jdk/compare/master...cl4es:scf_split?expand=1
> 
> The threshold for when the JIT fails to optimize seem to be pushed up a bit 
> and I get a 4x gains when concatenating 100 Strings (though it takes a good 
> while for the microbenchmark to stabilize). It also fails to make much of a 
> difference when looking at 200 arguments (maybe 1.4x win). The experiment I 
> have done so far are crude and aggregates the results into a String builder 
> with no size estimation, so it adds a but of unnecessary allocation that 
> could be improved. I think this needs way more work to be a good enhancement 
> and that splitting up outside is going to remain better for now.
> 
> A "hidden" drawback with chunking is that you're likely going to be 
> allocating more.
> 
> I also re-realized that it'll be pretty hard (or impossible) to get around 
> having some MH slot limit: even though we chunk it up internally, the MH we 
> return must have a type that matches the type given to the bootstrap method, 
> and any operations on the tree that require some temporary arguments we need 
> to pass around will require some argument slots. The current strategy doesn't 
> use too many temporaries, so the real limit might be around 250, but it makes 
> sense to give some room for alternative implementations that use more 
> temporaries if that makes things more efficient.

Thanks for sharing this and for doing the experiment!

-

PR: https://git.openjdk.java.net/jdk/pull/6403