Re: RFR: 8265237: String.join and StringJoiner can be improved further [v2]

2021-04-18 Thread Peter Levart
On Fri, 16 Apr 2021 23:02:33 GMT, Peter Levart  wrote:

>> src/java.base/share/classes/java/lang/String.java line 3254:
>> 
>>> 3252: 
>>> 3253: byte[] value = StringConcatHelper.newArray(((long) icoder << 
>>> 32) | llen);
>>> 3254: int off = 0;
>> 
>> StringConcatHelper.newArray() can double the length (based on the coder) and 
>> it is then truncated to 32 bits when passed to 
>> UNSAFE.allocatlUnitializedArray.
>> The test of length above only ensures llen can be truncated to 32 bits 
>> without loss of data.
>
> I thought about that, yes. And I think we have to do the check for the 
> doubled length before calling the newArray. I checked the StringJoinerTest 
> and it only deals with ascii strings unfortunately. Will have to add a test 
> for that too...

I do the checks before calling `StringConcatHelper.newArray()` now and pass a 
long value to it that already holds the number of bytes needed and where the 
upper 32  bits (coder) is always 0.

-

PR: https://git.openjdk.java.net/jdk/pull/3501


Re: RFR: 8265237: String.join and StringJoiner can be improved further [v2]

2021-04-17 Thread Tagir F . Valeev
On Thu, 15 Apr 2021 19:26:48 GMT, Peter Levart  wrote:

>> While JDK-8148937 improved StringJoiner class by replacing internal use of 
>> getChars that copies out characters from String elements into a char[] array 
>> with StringBuilder which is somehow more optimal, the improvement was 
>> marginal in speed (0% ... 10%) and mainly for smaller strings, while GC was 
>> reduced by about 50% in average per operation.
>> Initial attempt to tackle that issue was more involved, but was later 
>> discarded because it was apparently using too much internal String details 
>> in code that lives outside String and outside java.lang package.
>> But there is another way to package such "intimate" code - we can put it 
>> into String itself and just call it from StringJoiner.
>> This PR is an attempt at doing just that. It introduces new package-private 
>> method in `java.lang.String` which is then used from both pubic static 
>> `String.join` methods as well as from `java.util.StringJoiner` (via 
>> SharedSecrets). The improvements can be seen by running the following JMH 
>> benchmark:
>> 
>> https://gist.github.com/plevart/86ac7fc6d4541dbc08256cde544019ce
>> 
>> The comparative results are here:
>> 
>> https://jmh.morethan.io/?gist=7eb421cf7982456a2962269137f71c15
>> 
>> The jmh-result.json files are here:
>> 
>> https://gist.github.com/plevart/7eb421cf7982456a2962269137f71c15
>> 
>> Improvement in speed ranges from 8% (for small strings) to 200% (for long 
>> strings), while creation of garbage has been further reduced to an almost 
>> garbage-free operation.
>> 
>> So WDYT?
>
> Peter Levart has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Add String.join benchmark method to StringJoinerBenchmark and adjust some 
> parameters to cover bigger range

Great work, thanks! I tried to do something similar a couple of years ago but 
did not submit my patch. One thing that stopped me is that joining many 
one-character strings was slower with my patch, compared to the baseline. 
Something like this:


// setup
String[] strings = new String[100];
Arrays.fill(strings, "a");
// benchmark
return String.join(",", strings);


Could you add this case to your benchmark, for completeness?

-

PR: https://git.openjdk.java.net/jdk/pull/3501


Re: RFR: 8265237: String.join and StringJoiner can be improved further [v2]

2021-04-16 Thread Peter Levart
On Fri, 16 Apr 2021 19:05:25 GMT, Roger Riggs  wrote:

>> Peter Levart has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Add String.join benchmark method to StringJoinerBenchmark and adjust some 
>> parameters to cover bigger range
>
> src/java.base/share/classes/java/lang/String.java line 3254:
> 
>> 3252: 
>> 3253: byte[] value = StringConcatHelper.newArray(((long) icoder << 
>> 32) | llen);
>> 3254: int off = 0;
> 
> StringConcatHelper.newArray() can double the length (based on the coder) and 
> it is then truncated to 32 bits when passed to 
> UNSAFE.allocatlUnitializedArray.
> The test of length above only ensures llen can be truncated to 32 bits 
> without loss of data.

I thought about that, yes. And I think we have to do the check for the doubled 
length before calling the newArray. I checked the StringJoinerTest and it only 
deals with ascii strings unfortunately. Will have to add a test for that too...

> src/java.base/share/classes/java/lang/String.java line 3256:
> 
>> 3254: int off = 0;
>> 3255: prefix.getBytes(value, off, coder); off += prefix.length();
>> 3256: for (int i = 0; i < size; i++) {
> 
> Can you save a branch inside the loop by handling element 0 outside the loop 
> and
> then do the loop for the rest?

Thanks, I'll do that and then re-test to see if there's any improvement.

-

PR: https://git.openjdk.java.net/jdk/pull/3501


Re: RFR: 8265237: String.join and StringJoiner can be improved further [v2]

2021-04-16 Thread Roger Riggs
On Thu, 15 Apr 2021 19:26:48 GMT, Peter Levart  wrote:

>> While JDK-8148937 improved StringJoiner class by replacing internal use of 
>> getChars that copies out characters from String elements into a char[] array 
>> with StringBuilder which is somehow more optimal, the improvement was 
>> marginal in speed (0% ... 10%) and mainly for smaller strings, while GC was 
>> reduced by about 50% in average per operation.
>> Initial attempt to tackle that issue was more involved, but was later 
>> discarded because it was apparently using too much internal String details 
>> in code that lives outside String and outside java.lang package.
>> But there is another way to package such "intimate" code - we can put it 
>> into String itself and just call it from StringJoiner.
>> This PR is an attempt at doing just that. It introduces new package-private 
>> method in `java.lang.String` which is then used from both pubic static 
>> `String.join` methods as well as from `java.util.StringJoiner` (via 
>> SharedSecrets). The improvements can be seen by running the following JMH 
>> benchmark:
>> 
>> https://gist.github.com/plevart/86ac7fc6d4541dbc08256cde544019ce
>> 
>> The comparative results are here:
>> 
>> https://jmh.morethan.io/?gist=7eb421cf7982456a2962269137f71c15
>> 
>> The jmh-result.json files are here:
>> 
>> https://gist.github.com/plevart/7eb421cf7982456a2962269137f71c15
>> 
>> Improvement in speed ranges from 8% (for small strings) to 200% (for long 
>> strings), while creation of garbage has been further reduced to an almost 
>> garbage-free operation.
>> 
>> So WDYT?
>
> Peter Levart has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Add String.join benchmark method to StringJoinerBenchmark and adjust some 
> parameters to cover bigger range

Look very good.

src/java.base/share/classes/java/lang/String.java line 3254:

> 3252: 
> 3253: byte[] value = StringConcatHelper.newArray(((long) icoder << 
> 32) | llen);
> 3254: int off = 0;

StringConcatHelper.newArray() can double the length (based on the coder) and it 
is then truncated to 32 bits when passed to UNSAFE.allocatlUnitializedArray.
The test of length above only ensures llen can be truncated to 32 bits without 
loss of data.

src/java.base/share/classes/java/lang/String.java line 3256:

> 3254: int off = 0;
> 3255: prefix.getBytes(value, off, coder); off += prefix.length();
> 3256: for (int i = 0; i < size; i++) {

Can you save a branch inside the loop by handling element 0 outside the loop and
then do the loop for the rest?

-

PR: https://git.openjdk.java.net/jdk/pull/3501


Re: RFR: 8265237: String.join and StringJoiner can be improved further [v2]

2021-04-15 Thread Peter Levart
> While JDK-8148937 improved StringJoiner class by replacing internal use of 
> getChars that copies out characters from String elements into a char[] array 
> with StringBuilder which is somehow more optimal, the improvement was 
> marginal in speed (0% ... 10%) and mainly for smaller strings, while GC was 
> reduced by about 50% in average per operation.
> Initial attempt to tackle that issue was more involved, but was later 
> discarded because it was apparently using too much internal String details in 
> code that lives outside String and outside java.lang package.
> But there is another way to package such "intimate" code - we can put it into 
> String itself and just call it from StringJoiner.
> This PR is an attempt at doing just that. It introduces new package-private 
> method in `java.lang.String` which is then used from both pubic static 
> `String.join` methods as well as from `java.util.StringJoiner` (via 
> SharedSecrets). The improvements can be seen by running the following JMH 
> benchmark:
> 
> https://gist.github.com/plevart/86ac7fc6d4541dbc08256cde544019ce
> 
> The comparative results are here:
> 
> https://jmh.morethan.io/?gist=7eb421cf7982456a2962269137f71c15
> 
> The jmh-result.json files are here:
> 
> https://gist.github.com/plevart/7eb421cf7982456a2962269137f71c15
> 
> Improvement in speed ranges from 8% (for small strings) to 200% (for long 
> strings), while creation of garbage has been further reduced to an almost 
> garbage-free operation.
> 
> So WDYT?

Peter Levart has updated the pull request incrementally with one additional 
commit since the last revision:

  Add String.join benchmark method to StringJoinerBenchmark and adjust some 
parameters to cover bigger range

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3501/files
  - new: https://git.openjdk.java.net/jdk/pull/3501/files/62b577fd..6160e5aa

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=3501=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=3501=00-01

  Stats: 11 lines in 1 file changed: 8 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3501.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3501/head:pull/3501

PR: https://git.openjdk.java.net/jdk/pull/3501