On Tue, 8 Sep 2020 11:59:36 GMT, Jason Tatton <github.com+70893615+jasontatton-...@openjdk.org> wrote:
> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 > byte encoded Strings). It is provided for > x86 and ARM64. The implementation is greatly inspired by the indexOf(char) > intrinsic for StringUTF16. To incorporate it > I had to make a small change to StringLatin1.java (refactor of functionality > to intrisified private method) as well as > code for C2. Submitted to: hotspot-compiler-dev and core-libs-dev as this > patch contains a change to hotspot and > java/lang/StringLatin1.java https://bugs.openjdk.java.net/browse/JDK-8173585 > > Details of testing: > ============ > I have created a jtreg test > “compiler/intrinsics/string/TestStringLatin1IndexOfChar” to cover this new > intrinsic. Note > that, particularly for the x86 implementation of the intrinsic, the code path > taken is dependent upon the length of the > input String. Hence the test has been designed to cover all these cases. In > summary they are: > - A “short” string of < 16 characters. > - A SIMD String of 16 – 31 characters. > - A AVX2 SIMD String of 32 characters+. > > Hardware used for testing: > ----------------------------- > > - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) > • Intel i7 processor (with AVX2 support). > - AWS Graviton 2 (ARM 64 processor). > > I also ran; ‘run-test-tier1’ and ‘run-test-tier2’ for: x86_64 and aarch64. > > Possible future enhancements: > ==================== > For the x86 implementation there may be two further improvements we can make > in order to improve performance of both > the StringUTF16 and StringLatin1 indexOf(char) intrinsics: > 1. Make use of AVX-512 instructions. > 2. For “short” Strings (see below), I think it may be possible to modify the > existing algorithm to still use SSE SIMD > instructions instead of a loop. > Benchmark results: > ============ > **Without** the new StringLatin1 indexOf(char) intrinsic: > > | Benchmark | Mode | Cnt | Score | Error | Units | > | ------------- | ------------- |------------- |------------- |------------- > |------------- | > | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 > | ns/op | > | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933 | > ns/op | > > > **With** the new StringLatin1 indexOf(char) intrinsic: > > | Benchmark | Mode | Cnt | Score | Error | Units | > | ------------- | ------------- |------------- |------------- |------------- > |------------- | > | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 > | ns/op | > | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306 | > ns/op | > > > The objective of the patch is to bring the performance of StringLatin1 > indexOf(char) in line with StringUTF16 > indexOf(char) for x86 and ARM64. We can see above that this has been > achieved. Similar results were obtained when > running on ARM. This pull request has now been integrated. Changeset: f71e8a61 Author: Jason Tatton (AWS) <jptat...@amazon.com> Committer: Paul Hohensee <p...@openjdk.org> URL: https://git.openjdk.java.net/jdk/commit/f71e8a61 Stats: 613 lines in 15 files changed: 597 ins; 0 del; 16 mod 8173585: Intrinsify StringLatin1.indexOf(char) Reviewed-by: neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/71