On Mon, 12 Oct 2020 11:17:25 GMT, Jason Tatton <github.com+70893615+jasontatton-...@openjdk.org> wrote:
>> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 >> byte encoded Strings). It is provided for >> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) >> intrinsic for StringUTF16. To incorporate it >> I had to make a small change to StringLatin1.java (refactor of functionality >> to intrisified private method) as well as >> code for C2. Submitted to: hotspot-compiler-dev and core-libs-dev as this >> patch contains a change to hotspot and >> java/lang/StringLatin1.java https://bugs.openjdk.java.net/browse/JDK-8173585 >> >> Details of testing: >> ============ >> I have created a jtreg test >> “compiler/intrinsics/string/TestStringLatin1IndexOfChar” to cover this new >> intrinsic. Note >> that, particularly for the x86 implementation of the intrinsic, the code >> path taken is dependent upon the length of the >> input String. Hence the test has been designed to cover all these cases. In >> summary they are: >> - A “short” string of < 16 characters. >> - A SIMD String of 16 – 31 characters. >> - A AVX2 SIMD String of 32 characters+. >> >> Hardware used for testing: >> ----------------------------- >> >> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) >> • Intel i7 processor (with AVX2 support). >> - AWS Graviton 2 (ARM 64 processor). >> >> I also ran; ‘run-test-tier1’ and ‘run-test-tier2’ for: x86_64 and aarch64. >> >> Possible future enhancements: >> ==================== >> For the x86 implementation there may be two further improvements we can make >> in order to improve performance of both >> the StringUTF16 and StringLatin1 indexOf(char) intrinsics: >> 1. Make use of AVX-512 instructions. >> 2. For “short” Strings (see below), I think it may be possible to modify the >> existing algorithm to still use SSE SIMD >> instructions instead of a loop. >> Benchmark results: >> ============ >> **Without** the new StringLatin1 indexOf(char) intrinsic: >> >> | Benchmark | Mode | Cnt | Score | Error | Units | >> | ------------- | ------------- |------------- |------------- |------------- >> |------------- | >> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 >> | ns/op | >> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933 | >> ns/op | >> >> >> **With** the new StringLatin1 indexOf(char) intrinsic: >> >> | Benchmark | Mode | Cnt | Score | Error | Units | >> | ------------- | ------------- |------------- |------------- |------------- >> |------------- | >> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 >> | ns/op | >> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306 | >> ns/op | >> >> >> The objective of the patch is to bring the performance of StringLatin1 >> indexOf(char) in line with StringUTF16 >> indexOf(char) for x86 and ARM64. We can see above that this has been >> achieved. Similar results were obtained when >> running on ARM. > > Jason Tatton has updated the pull request incrementally with one additional > commit since the last revision: > > Added missing copyright notices Marked as reviewed by neliasso (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/71