Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v2]

2020-09-22 Thread Volker Simonis
On Fri, 18 Sep 2020 11:56:09 GMT, Jason Tatton 
 wrote:

>> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 
>> byte encoded Strings). It is provided for
>> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) 
>> intrinsic for StringUTF16. To incorporate it
>> I had to make a small change to StringLatin1.java (refactor of functionality 
>> to intrisified private method) as well as
>> code for C2.  Submitted to: hotspot-compiler-dev and core-libs-dev as this 
>> patch contains a change to hotspot and
>> java/lang/StringLatin1.java  https://bugs.openjdk.java.net/browse/JDK-8173585
>> 
>> Details of testing:
>> 
>> I have created a jtreg test 
>> “compiler/intrinsics/string/TestStringLatin1IndexOfChar” to cover this new 
>> intrinsic. Note
>> that, particularly for the x86 implementation of the intrinsic, the code 
>> path taken is dependent upon the length of the
>> input String. Hence the test has been designed to cover all these cases. In 
>> summary they are:
>> - A “short” string of < 16 characters.
>> - A SIMD String of 16 – 31 characters.
>> - A AVX2 SIMD String of 32 characters+.
>> 
>> Hardware used for testing:
>> -
>> 
>> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) 
>> • Intel i7 processor (with AVX2 support).
>> - AWS Graviton 2 (ARM 64 processor).
>> 
>> I also ran; ‘run-test-tier1’ and ‘run-test-tier2’ for: x86_64 and aarch64.
>> 
>> Possible future enhancements:
>> 
>> For the x86 implementation there may be two further improvements we can make 
>> in order to improve performance of both
>> the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
>> 1. Make use of AVX-512 instructions.
>> 2. For “short” Strings (see below), I think it may be possible to modify the 
>> existing algorithm to still use SSE SIMD
>> instructions instead of a loop.
>> Benchmark results:
>> 
>> **Without** the new StringLatin1 indexOf(char) intrinsic:
>> 
>> | Benchmark  | Mode | Cnt | Score | Error | Units |
>> | - | - |- |- |- 
>> |- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 
>> | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933  | 
>> ns/op |
>> 
>> 
>> **With** the new StringLatin1 indexOf(char) intrinsic:
>> 
>> | Benchmark  | Mode | Cnt | Score | Error | Units |
>> | - | - |- |- |- 
>> |- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 
>>  | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306  | 
>> ns/op |
>> 
>> 
>> The objective of the patch is to bring the performance of StringLatin1 
>> indexOf(char) in line with StringUTF16
>> indexOf(char) for x86 and ARM64. We can see above that this has been 
>> achieved. Similar results were obtained when
>> running on ARM.
>
> Jason Tatton has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now
> contains four commits:
>  - Merge master
>  - 8173585: further whitespace changes required by jcheck
>  - JDK-8173585 - whitespace changes required by jcheck
>  - JDK-8173585

Hi Jason,

thanks for bringing String.indexOf() for latin strings up to date with the 
Unicode version.

Your changes look good except a few minor issues I've commented on right in the 
code.

I'd only like to ask you if you could possibly improve your test a little bit. 
As far as I understand, your search text
is a consecutive sequence of "abc" characters, so you'll always find the 
character your searching for within the next
three characters of the source text. This won't exercise the loops of your 
intrinsic. Maybe you can also add some test
versions where the search character will be found beyond the first 32/64 
characters after "fromIndex"?

test/hotspot/jtreg/compiler/intrinsics/string/TestStringLatin1IndexOfChar.java 
line 24:

> 22:
> 23: public static void main(String[] args) throws Exception {
> 24: for (int i = 0; i < 100_0; ++i) {//repeat such that we enter into 
> C2 code...

The placement of the underscore looks strange to me. I'd expect it to separate 
thousands (like 1_000) if at all but not
sure if id use it for one thousand at all as that's really not such a big 
number that it is hard to read..

Also, the Tier4InvocationThreshold is 5000 so I'm not sure youre reaching C2?

src/hotspot/share/classfile/vmSymbols.cpp line 295:

> 293:   if (symbol == NULL)  return NO_SID;
> 294:   return find_sid(symbol);
> 295: }

I think it is common sense to have a newline at the end of a file.

test/hotspot/jtreg/compiler/intrinsics/string/TestStringLatin1IndexOfChar.java 
line 84:

> 82: }
> 83:
> 84:  }

Please put an EOL at the end of the file.

test/micro/org/openjdk/bench/java/lang/StringIndexOf

Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v2]

2020-09-21 Thread Volker Simonis
On Fri, 18 Sep 2020 11:56:09 GMT, Jason Tatton 
 wrote:

>> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 
>> byte encoded Strings). It is provided for
>> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) 
>> intrinsic for StringUTF16. To incorporate it
>> I had to make a small change to StringLatin1.java (refactor of functionality 
>> to intrisified private method) as well as
>> code for C2.  Submitted to: hotspot-compiler-dev and core-libs-dev as this 
>> patch contains a change to hotspot and
>> java/lang/StringLatin1.java  https://bugs.openjdk.java.net/browse/JDK-8173585
>> 
>> Details of testing:
>> 
>> I have created a jtreg test 
>> “compiler/intrinsics/string/TestStringLatin1IndexOfChar” to cover this new 
>> intrinsic. Note
>> that, particularly for the x86 implementation of the intrinsic, the code 
>> path taken is dependent upon the length of the
>> input String. Hence the test has been designed to cover all these cases. In 
>> summary they are:
>> - A “short” string of < 16 characters.
>> - A SIMD String of 16 – 31 characters.
>> - A AVX2 SIMD String of 32 characters+.
>> 
>> Hardware used for testing:
>> -
>> 
>> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) 
>> • Intel i7 processor (with AVX2 support).
>> - AWS Graviton 2 (ARM 64 processor).
>> 
>> I also ran; ‘run-test-tier1’ and ‘run-test-tier2’ for: x86_64 and aarch64.
>> 
>> Possible future enhancements:
>> 
>> For the x86 implementation there may be two further improvements we can make 
>> in order to improve performance of both
>> the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
>> 1. Make use of AVX-512 instructions.
>> 2. For “short” Strings (see below), I think it may be possible to modify the 
>> existing algorithm to still use SSE SIMD
>> instructions instead of a loop.
>> Benchmark results:
>> 
>> **Without** the new StringLatin1 indexOf(char) intrinsic:
>> 
>> | Benchmark  | Mode | Cnt | Score | Error | Units |
>> | - | - |- |- |- 
>> |- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 
>> | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933  | 
>> ns/op |
>> 
>> 
>> **With** the new StringLatin1 indexOf(char) intrinsic:
>> 
>> | Benchmark  | Mode | Cnt | Score | Error | Units |
>> | - | - |- |- |- 
>> |- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 
>>  | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306  | 
>> ns/op |
>> 
>> 
>> The objective of the patch is to bring the performance of StringLatin1 
>> indexOf(char) in line with StringUTF16
>> indexOf(char) for x86 and ARM64. We can see above that this has been 
>> achieved. Similar results were obtained when
>> running on ARM.
>
> Jason Tatton has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now
> contains four commits:
>  - Merge master
>  - 8173585: further whitespace changes required by jcheck
>  - JDK-8173585 - whitespace changes required by jcheck
>  - JDK-8173585

src/hotspot/share/classfile/vmSymbols.cpp line 295:

> 293:   if (symbol == NULL)  return NO_SID;
> 294:   return find_sid(symbol);
> 295: }

I think it is common sense to have a newline at the end of a file.

-

PR: https://git.openjdk.java.net/jdk/pull/71


Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v2]

2020-09-18 Thread Jason Tatton
> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 
> byte encoded Strings). It is provided for
> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) 
> intrinsic for StringUTF16. To incorporate it
> I had to make a small change to StringLatin1.java (refactor of functionality 
> to intrisified private method) as well as
> code for C2.  Submitted to: hotspot-compiler-dev and core-libs-dev as this 
> patch contains a change to hotspot and
> java/lang/StringLatin1.java  https://bugs.openjdk.java.net/browse/JDK-8173585
> 
> Details of testing:
> 
> I have created a jtreg test 
> “compiler/intrinsics/string/TestStringLatin1IndexOfChar” to cover this new 
> intrinsic. Note
> that, particularly for the x86 implementation of the intrinsic, the code path 
> taken is dependent upon the length of the
> input String. Hence the test has been designed to cover all these cases. In 
> summary they are:
> - A “short” string of < 16 characters.
> - A SIMD String of 16 – 31 characters.
> - A AVX2 SIMD String of 32 characters+.
> 
> Hardware used for testing:
> -
> 
> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) 
> • Intel i7 processor (with AVX2 support).
> - AWS Graviton 2 (ARM 64 processor).
> 
> I also ran; ‘run-test-tier1’ and ‘run-test-tier2’ for: x86_64 and aarch64.
> 
> Possible future enhancements:
> 
> For the x86 implementation there may be two further improvements we can make 
> in order to improve performance of both
> the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
> 1. Make use of AVX-512 instructions.
> 2. For “short” Strings (see below), I think it may be possible to modify the 
> existing algorithm to still use SSE SIMD
> instructions instead of a loop.
> Benchmark results:
> 
> **Without** the new StringLatin1 indexOf(char) intrinsic:
> 
> | Benchmark  | Mode | Cnt | Score | Error | Units |
> | - | - |- |- |- 
> |- |
> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 
> | ns/op |
> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933  | 
> ns/op |
> 
> 
> **With** the new StringLatin1 indexOf(char) intrinsic:
> 
> | Benchmark  | Mode | Cnt | Score | Error | Units |
> | - | - |- |- |- 
> |- |
> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716  
> | ns/op |
> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306  | 
> ns/op |
> 
> 
> The objective of the patch is to bring the performance of StringLatin1 
> indexOf(char) in line with StringUTF16
> indexOf(char) for x86 and ARM64. We can see above that this has been 
> achieved. Similar results were obtained when
> running on ARM.

Jason Tatton has updated the pull request with a new target base due to a merge 
or a rebase. The pull request now
contains four commits:

 - Merge master
 - 8173585: further whitespace changes required by jcheck
 - JDK-8173585 - whitespace changes required by jcheck
 - JDK-8173585

-

Changes: https://git.openjdk.java.net/jdk/pull/71/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=71&range=01
  Stats: 523 lines in 16 files changed: 506 ins; 0 del; 17 mod
  Patch: https://git.openjdk.java.net/jdk/pull/71.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/71/head:pull/71

PR: https://git.openjdk.java.net/jdk/pull/71