Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]
On Wed, 15 Mar 2023 14:31:03 GMT, Eirik Bjorsnos wrote: >> By avoiding a bit shift operation for the latin1 fast-path test, we can >> speed up the `java.lang.CharacterData.of` method by ~25% for latin1 code >> points. >> >> The latin1 test is currently implemented as `ch >>> 8 == 0`. We can replace >> this with `ch >= 0 && ch <= 0xFF` for a noticable performance gain >> (especially for Latin1 code points): >> >> This method is called frequently by various property-determining methods in >> `java.lang.Character` like `isLowerCase`, `isDigit` etc, so one should >> expect improvements for all these methods. >> >> Performance is tested using the `Characters.isDigit` benchmark using the >> digits '0' (decimal 48, in CharacterDataLatin1) and '\u0660' (decimal 1632, >> in CharacterData00): >> >> Baseline: >> >> >> Benchmark (codePoint) Mode Cnt Score Error Units >> Characters.isDigit 48 avgt 15 0.870 ± 0.011 ns/op >> Characters.isDigit 1632 avgt 15 2.168 ± 0.017 ns/op >> >> PR: >> >> >> Benchmark (codePoint) Mode Cnt Score Error Units >> Characters.isDigit 48 avgt 15 0.654 ± 0.007 ns/op >> Characters.isDigit 1632 avgt 15 2.032 ± 0.019 ns/op > > Eirik Bjorsnos has updated the pull request incrementally with one additional > commit since the last revision: > > Update StringLatin1.canEncode to sync with same test in CharacterData.of I'm withdrawing this PR since its merits seem suspect. - PR Comment: https://git.openjdk.org/jdk/pull/13040#issuecomment-1500087677
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]
On Sat, 18 Mar 2023 17:06:52 GMT, jmehrens wrote: >> We could, but benchmarks show no performance improvements over the current >> PR. I think the current code is perhaps slightly more readable. > > Does non-short-circuit logical AND operator perform similar to baseline? In > this very specific case there is no risk of side effects due to method calls, > no risk of null pointer exception because of primitives, and there is not an > excessive amount of conditions. Therefore, I would think either `&` or `&&` > are viable options. > > `if (ch >= 0 & ch <= 0xFF) {` > > I ask just for the sake of my own learning on this topic as it relates to > performance as opposed to code readability preferences. Both choices of operators en up with identical generated code on my Intel Mac. For reference, I did a comparison between the generated code for the current code, and the one in this PR: Current generated code start like this: movl%eax, -0x14000(%rsp) pushq %rbp subq$0x20, %rsp cmpl$1, 0x20(%r15) jne 0x119cae072 movl%esi, %r11d shrl$8, %r11d testl %r11d, %r11d jne 0x119cae043 movabsq $0x70fe03c90, %rax ; {oop(a 'java/lang/CharacterDataLatin1'{0x00070fe03c90})} ``` While the PR generated code starts like this: movl%eax, -0x14000(%rsp) pushq %rbp subq$0x20, %rsp cmpl$1, 0x20(%r15) jne 0x1171d41f2 nopw(%rax, %rax) cmpl$0x100, %esi jae 0x1171d41c5 movabsq $0x70fe03c90, %rax ; {oop(a 'java/lang/CharacterDataLatin1'{0x00070fe03c90})} The rest of the code seems identical. - PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]
On Wed, 15 Mar 2023 14:56:28 GMT, Eirik Bjorsnos wrote: >> Eirik Bjorsnos has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Update StringLatin1.canEncode to sync with same test in CharacterData.of > > Just for fun, I tried with a benchmark where the code point is Latin1 in > every other call: > > > @Benchmark > public void isDigitVarying(Blackhole blackhole) { > blackhole.consume(Character.isDigit(48)); > blackhole.consume(Character.isDigit(1632)); > } > > > With this benchmark, there is no difference between the baseline, the PR and > using StringLatin1.canEncode: > > Baseline: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigitVarying 1632 avgt 15 1.198 ± 0.056 ns/op > > > PR: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigitVarying 1632 avgt 15 1.195 ± 0.058 ns/op > > > StringLatin1.canEncode: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigitVarying 1632 avgt 15 1.193 ± 0.055 ns/op > ``` > > At this point, I'm starting to wonder a bit if the performance benefits > suggested by this PR might be dubious and will only surface in very narrow > benchmarks. On the other hand, it does not seem harmful either. What do > people think? > @eirbjo I suggest to add `-prof perfasm` and see what's going on; I suspect > it's better that inputs are loaded from `@State` field...and don't use > `BlackHole` but combine the results of the 2 operations (although I remember > that compiler assisted black holes are not as noisy as I'm used to observe > with JDK 11 ones) I think I'm done with benchmarking for this PR. Any StringLatin1.canEncode method call regression can be solved outside this PR. The PR as it stands seems to have some benefit for Latin1-only apps/use cases, which I think is not uncommon. Any benefit for mixed cases are more dubious, but don't seem to hurt. - PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]
On Wed, 15 Mar 2023 14:57:43 GMT, Quan Anh Mai wrote: >>> The StringLatin1.canEncode regression disappears. >> >> I mixed things up so StringLatin1.canEncode was benchmarked without the >> updated code. >> >> Here are updated benchmark results: >> >> >> Baseline: >> >> >> Benchmark (codePoint) Mode Cnt Score Error Units >> Characters.isDigitRandom 1632 avgt 15 5.437 ± 0.235 ns/op >> >> >> PR: >> >> >> Benchmark (codePoint) Mode Cnt Score Error Units >> Characters.isDigitRandom 1632 avgt 15 5.319 ± 0.341 ns/op >> >> >> StringLatin1.canEncode: >> >> >> Benchmark (codePoint) Mode Cnt Score Error Units >> Characters.isDigitRandom 1632 avgt 15 5.447 ± 0.304 ns/op >> ``` >> >> So it seems using StringLatin1.canEncode still might have a regression also >> in the randomized input case. >> >> For this PR, I suggest we update StringLatin1.canEncode to be in sync with >> CharacterData.of, without one calling the other. If anyone wants to >> investigate the regression further, than can be done outside this PR. >> >> I have independently verified that StringLatin1.canEncode sees performance >> improvements using the StringIndexOf benchmark. > > We can do `Integer.compareUnsigned(ch, 0xFF) <= 0` We could, but benchmarks show no performance improvements over the current PR. I think the current code is perhaps slightly more readable. - PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]
On Wed, 15 Mar 2023 14:56:28 GMT, Eirik Bjorsnos wrote: >> Eirik Bjorsnos has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Update StringLatin1.canEncode to sync with same test in CharacterData.of > > Just for fun, I tried with a benchmark where the code point is Latin1 in > every other call: > > > @Benchmark > public void isDigitVarying(Blackhole blackhole) { > blackhole.consume(Character.isDigit(48)); > blackhole.consume(Character.isDigit(1632)); > } > > > With this benchmark, there is no difference between the baseline, the PR and > using StringLatin1.canEncode: > > Baseline: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigitVarying 1632 avgt 15 1.198 ± 0.056 ns/op > > > PR: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigitVarying 1632 avgt 15 1.195 ± 0.058 ns/op > > > StringLatin1.canEncode: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigitVarying 1632 avgt 15 1.193 ± 0.055 ns/op > ``` > > At this point, I'm starting to wonder a bit if the performance benefits > suggested by this PR might be dubious and will only surface in very narrow > benchmarks. On the other hand, it does not seem harmful either. What do > people think? @eirbjo I suggest to add `-prof perfasm` and see what's going on; I suspect it's better that inputs are loaded from `@State` field...and don't use `BlackHole` but combine the results of the 2 operations (although I remember that compiler assisted black holes are not as noisy as I'm used to observe with JDK 11 ones) - PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]
On Wed, 15 Mar 2023 14:23:35 GMT, Eirik Bjorsnos wrote: >> Many thanks to have tried, yep, I was curious indeed re the >> "StringLatin1.canEncode regression" case. >> I would still modify the benchmark to use inputs (I know that will make it >> memory bound sadly, due to reading inputs - but the size of such inputs can >> be a benchmark parameter, together with the bias eg "latin","mix", >> "non-latin") "semi-randomly" generated based on the mentioned >> strategies/biases. >> It will benefit future tests on this, although could be provided as a >> separate PR. > >> The StringLatin1.canEncode regression disappears. > > I mixed things up so StringLatin1.canEncode was benchmarked without the > updated code. > > Here are updated benchmark results: > > > Baseline: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigitRandom 1632 avgt 15 5.437 ± 0.235 ns/op > > > PR: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigitRandom 1632 avgt 15 5.319 ± 0.341 ns/op > > > StringLatin1.canEncode: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigitRandom 1632 avgt 15 5.447 ± 0.304 ns/op > ``` > > So it seems using StringLatin1.canEncode still might have a regression also > in the randomized input case. > > For this PR, I suggest we update StringLatin1.canEncode to be in sync with > CharacterData.of, without one calling the other. If anyone wants to > investigate the regression further, than can be done outside this PR. > > I have independently verified that StringLatin1.canEncode sees performance > improvements using the StringIndexOf benchmark. We can do `Integer.compareUnsigned(ch, 0xFF) <= 0` - PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]
On Wed, 15 Mar 2023 14:31:03 GMT, Eirik Bjorsnos wrote: >> By avoiding a bit shift operation for the latin1 fast-path test, we can >> speed up the `java.lang.CharacterData.of` method by ~25% for latin1 code >> points. >> >> The latin1 test is currently implemented as `ch >>> 8 == 0`. We can replace >> this with `ch >= 0 && ch <= 0xFF` for a noticable performance gain >> (especially for Latin1 code points): >> >> This method is called frequently by various property-determining methods in >> `java.lang.Character` like `isLowerCase`, `isDigit` etc, so one should >> expect improvements for all these methods. >> >> Performance is tested using the `Characters.isDigit` benchmark using the >> digits '0' (decimal 48, in CharacterDataLatin1) and '\u0660' (decimal 1632, >> in CharacterData00): >> >> Baseline: >> >> >> Benchmark (codePoint) Mode Cnt Score Error Units >> Characters.isDigit 48 avgt 15 0.870 ± 0.011 ns/op >> Characters.isDigit 1632 avgt 15 2.168 ± 0.017 ns/op >> >> PR: >> >> >> Benchmark (codePoint) Mode Cnt Score Error Units >> Characters.isDigit 48 avgt 15 0.654 ± 0.007 ns/op >> Characters.isDigit 1632 avgt 15 2.032 ± 0.019 ns/op > > Eirik Bjorsnos has updated the pull request incrementally with one additional > commit since the last revision: > > Update StringLatin1.canEncode to sync with same test in CharacterData.of Just for fun, I tried with a benchmark where the code point is Latin1 in every other call: @Benchmark public void isDigitVarying(Blackhole blackhole) { blackhole.consume(Character.isDigit(48)); blackhole.consume(Character.isDigit(1632)); } With this benchmark, there is no difference between the baseline, the PR and using StringLatin1.canEncode: Baseline: Benchmark (codePoint) Mode Cnt Score Error Units Characters.isDigitVarying 1632 avgt 15 1.198 ± 0.056 ns/op PR: Benchmark (codePoint) Mode Cnt Score Error Units Characters.isDigitVarying 1632 avgt 15 1.195 ± 0.058 ns/op StringLatin1.canEncode: Benchmark (codePoint) Mode Cnt Score Error Units Characters.isDigitVarying 1632 avgt 15 1.193 ± 0.055 ns/op ``` At this point, I'm starting to wonder a bit if the performance benefits suggested by this PR might be dubious and will only surface in very narrow benchmarks. On the other hand, it does not seem harmful either. What do people think? - PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]
On Wed, 15 Mar 2023 14:31:03 GMT, Eirik Bjorsnos wrote: >> By avoiding a bit shift operation for the latin1 fast-path test, we can >> speed up the `java.lang.CharacterData.of` method by ~25% for latin1 code >> points. >> >> The latin1 test is currently implemented as `ch >>> 8 == 0`. We can replace >> this with `ch >= 0 && ch <= 0xFF` for a noticable performance gain >> (especially for Latin1 code points): >> >> This method is called frequently by various property-determining methods in >> `java.lang.Character` like `isLowerCase`, `isDigit` etc, so one should >> expect improvements for all these methods. >> >> Performance is tested using the `Characters.isDigit` benchmark using the >> digits '0' (decimal 48, in CharacterDataLatin1) and '\u0660' (decimal 1632, >> in CharacterData00): >> >> Baseline: >> >> >> Benchmark (codePoint) Mode Cnt Score Error Units >> Characters.isDigit 48 avgt 15 0.870 ± 0.011 ns/op >> Characters.isDigit 1632 avgt 15 2.168 ± 0.017 ns/op >> >> PR: >> >> >> Benchmark (codePoint) Mode Cnt Score Error Units >> Characters.isDigit 48 avgt 15 0.654 ± 0.007 ns/op >> Characters.isDigit 1632 avgt 15 2.032 ± 0.019 ns/op > > Eirik Bjorsnos has updated the pull request incrementally with one additional > commit since the last revision: > > Update StringLatin1.canEncode to sync with same test in CharacterData.of Marked as reviewed by stsypanov (Author). - PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]
On Wed, 15 Mar 2023 12:00:59 GMT, Claes Redestad wrote: >> Eirik Bjorsnos has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Update StringLatin1.canEncode to sync with same test in CharacterData.of > > Nice one! @cl4es Do you agree that the StringLatin1.canEncode change should be included in this PR? If so, can you approve the latest update? - PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]
On Wed, 15 Mar 2023 13:50:44 GMT, Francesco Nigro wrote: >> I created a randomized version of `Characters.isDigit` which tests with code >> points picked at random such that any category (Latin1, negative, different >> planes, unassiged) are equally probable. >> >> Baseline: >> >> >> Benchmark (codePoint) Mode Cnt Score Error Units >> Characters.isDigitRandom 1632 avgt 15 5.503 ± 0.371 ns/op >> >> >> Current PR: >> >> >> Benchmark (codePoint) Mode Cnt Score Error Units >> Characters.isDigitRandom 1632 avgt 15 5.393 ± 0.336 ns/op >> >> >> Using StringLatin1.canEncode: >> >> >> Benchmark (codePoint) Mode Cnt Score Error Units >> Characters.isDigitRandom 1632 avgt 15 5.377 ± 0.322 ns/op >> >> >> Seems the PR still has a small improvement for this scenario. The >> StringLatin1.canEncode regression disappears. >> >> In the real world ASCII/Latin1 seems to dominate most data, so this scenario >> is perhaps not very realistic. >> >> I'm running this on a Mac, so cannot try `-prof perfnorm`. > > Many thanks to have tried, yep, I was curious indeed re the > "StringLatin1.canEncode regression" case. > I would still modify the benchmark to use inputs (I know that will make it > memory bound sadly, due to reading inputs - but the size of such inputs can > be a benchmark parameter, together with the bias eg "latin","mix", > "non-latin") "semi-randomly" generated based on the mentioned > strategies/biases. > It will benefit future tests on this, although could be provided as a > separate PR. > The StringLatin1.canEncode regression disappears. I mixed things up so StringLatin1.canEncode was benchmarked without the updated code. Here are updated benchmark results: Baseline: Benchmark (codePoint) Mode Cnt Score Error Units Characters.isDigitRandom 1632 avgt 15 5.437 ± 0.235 ns/op PR: Benchmark (codePoint) Mode Cnt Score Error Units Characters.isDigitRandom 1632 avgt 15 5.319 ± 0.341 ns/op StringLatin1.canEncode: Benchmark (codePoint) Mode Cnt Score Error Units Characters.isDigitRandom 1632 avgt 15 5.447 ± 0.304 ns/op ``` So it seems using StringLatin1.canEncode still might have a regression also in the randomized input case. For this PR, I suggest we update StringLatin1.canEncode to be in sync with CharacterData.of, without one calling the other. If anyone wants to investigate the regression further, than can be done outside this PR. I have independently verified that StringLatin1.canEncode sees performance improvements using the StringIndexOf benchmark. - PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]
> By avoiding a bit shift operation for the latin1 fast-path test, we can speed > up the `java.lang.CharacterData.of` method by ~25% for latin1 code points. > > The latin1 test is currently implemented as `ch >>> 8 == 0`. We can replace > this with `ch >= 0 && ch <= 0xFF` for a noticable performance gain > (especially for Latin1 code points): > > This method is called frequently by various property-determining methods in > `java.lang.Character` like `isLowerCase`, `isDigit` etc, so one should expect > improvements for all these methods. > > Performance is tested using the `Characters.isDigit` benchmark using the > digits '0' (decimal 48, in CharacterDataLatin1) and '\u0660' (decimal 1632, > in CharacterData00): > > Baseline: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigit 48 avgt 15 0.870 ± 0.011 ns/op > Characters.isDigit 1632 avgt 15 2.168 ± 0.017 ns/op > > PR: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigit 48 avgt 15 0.654 ± 0.007 ns/op > Characters.isDigit 1632 avgt 15 2.032 ± 0.019 ns/op Eirik Bjorsnos has updated the pull request incrementally with one additional commit since the last revision: Update StringLatin1.canEncode to sync with same test in CharacterData.of - Changes: - all: https://git.openjdk.org/jdk/pull/13040/files - new: https://git.openjdk.org/jdk/pull/13040/files/0174440e..0baf690e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=13040=01 - incr: https://webrevs.openjdk.org/?repo=jdk=13040=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13040.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13040/head:pull/13040 PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test
On Wed, 15 Mar 2023 13:42:22 GMT, Eirik Bjorsnos wrote: >> Can you check what happen adding much more inputs to the dataset including >> non-latin chars as well and use `-prof perfnorm` to check what `perf` report >> re branches/branch-misses? >> >> You can use `SplittableRandom` to pre-populate an array of inputs which >> sequence is "random" but still allow deterministic benchmarking and feed the >> benchmark method by cycling the pre-computed inputs. >> In the real world I expect `isDigit` to happen on different input types and >> both having C2 with both branches places based on prev inputs distribution >> and a confused branch-predictor to allow comparing vs something that looks a >> bit nearest to the real world (TBD, I know). >> I expect in that case that a single cmp + mask to work better depending on >> latin input distribution/occurrence > > I created a randomized version of `Characters.isDigit` which tests with code > points picked at random such that any category (Latin1, negative, different > planes, unassiged) are equally probable. > > Baseline: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigitRandom 1632 avgt 15 5.503 ± 0.371 ns/op > > > Current PR: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigitRandom 1632 avgt 15 5.393 ± 0.336 ns/op > > > Using StringLatin1.canEncode: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigitRandom 1632 avgt 15 5.377 ± 0.322 ns/op > > > Seems the PR still has a small improvement for this scenario. The > StringLatin1.canEncode regression disappears. > > In the real world ASCII/Latin1 seems to dominate most data, so this scenario > is perhaps not very realistic. > > I'm running this on a Mac, so cannot try `-prof perfnorm`. Many thanks to have tried, yep, I was curious indeed re the "StringLatin1.canEncode regression" case. I would still modify the benchmark to use inputs (I know that will make it memory bound sadly, due to reading inputs - but the size of such inputs can be a benchmark parameter, together with the bias eg "latin","mix", "non-latin") "semi-randomly" generated based on the mentioned strategies/biases. It will benefit future tests on this, although could be provided as a separate PR. - PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test
On Wed, 15 Mar 2023 12:37:24 GMT, Francesco Nigro wrote: >>> It seems reasonable to keep these two in sync, yes. (`CharacterData.of` >>> could even call into `StringLatin1.canEncode`, unless that's cause for some >>> performance anomaly) >> >> If I update `StringLatin1.canEncode` and call into that from >> `CharacterData.of`, I observe no regression for the Latin1 case, but a >> significant regression for the non-Latin1 case. I have no idea how to >> explain that: >> >> >> Benchmark (codePoint) Mode Cnt Score Error Units >> Characters.isDigit 48 avgt 15 0.675 ± 0.029 ns/op >> Characters.isDigit 1632 avgt 15 2.435 ± 0.032 ns/op > > Can you check what happen adding much more inputs to the dataset including > non-latin chars as well and use `-prof perfnorm` to check what `perf` report > re branches/branch-misses? > > You can use `SplittableRandom` to pre-populate an array of inputs which > sequence is "random" but still allow deterministic benchmarking and feed the > benchmark method by cycling the pre-computed inputs. > In the real world I expect `isDigit` to happen on different input types and > both having C2 with both branches places based on prev inputs distribution > and a confused branch-predictor to allow comparing vs something that looks a > bit nearest to the real world (TBD, I know). > I expect in that case that a single cmp + mask to work better depending on > latin input distribution/occurrence I created a randomized version of `Characters.isDigit` which tests with code points picked at random such that any category (Latin1, negative, different planes, unassiged) are equally probable. Baseline: Benchmark (codePoint) Mode Cnt Score Error Units Characters.isDigitRandom 1632 avgt 15 5.503 ± 0.371 ns/op Current PR: Benchmark (codePoint) Mode Cnt Score Error Units Characters.isDigitRandom 1632 avgt 15 5.393 ± 0.336 ns/op Using StringLatin1.canEncode: Benchmark (codePoint) Mode Cnt Score Error Units Characters.isDigitRandom 1632 avgt 15 5.377 ± 0.322 ns/op Seems the PR still has a small improvement for this scenario. The StringLatin1.canEncode regression disappears. In the real world ASCII/Latin1 seems to dominate most data, so this scenario is perhaps not very realistic. I'm running this on a Mac, so cannot try `-prof perfnorm`. - PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test
On Wed, 15 Mar 2023 12:28:05 GMT, Eirik Bjorsnos wrote: >>> `if (ch && 0xFF00 == 0) {` >> >> This seems to perform similar to baseline: >> >> >> Benchmark (codePoint) Mode Cnt Score Error Units >> Characters.isDigit 48 avgt 15 0.890 ± 0.025 ns/op >> Characters.isDigit 1632 avgt 15 2.174 ± 0.011 ns/op >> >> >> Would be interesting to check the performance on non-Intel architectures. If >> you want to give it a spin on your M1, here's the benchmark command I used: >> >> `make test TEST='micro:java.lang.Characters.isDigit' MICRO="OPTIONS=-p >> codePoint=48,1632"` > >> It seems reasonable to keep these two in sync, yes. (`CharacterData.of` >> could even call into `StringLatin1.canEncode`, unless that's cause for some >> performance anomaly) > > If I update `StringLatin1.canEncode` and call into that from > `CharacterData.of`, I observe no regression for the Latin1 case, but a > significant regression for the non-Latin1 case. I have no idea how to explain > that: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigit 48 avgt 15 0.675 ± 0.029 ns/op > Characters.isDigit 1632 avgt 15 2.435 ± 0.032 ns/op Can you check what happen adding much more inputs to the dataset that includes non-latin chars as well and use `-prof perfnorm` to check what `perf` report re branches/branch-misses? You can use SplittableRandom to pre-populate an array of inputs which sequence is "random" but still allow deterministic benchmarking and feed the benchmark method cycling the pre-computed inputs - PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test
On Wed, 15 Mar 2023 12:15:39 GMT, Eirik Bjorsnos wrote: >> It seems reasonable to keep these two in sync, yes. (`CharacterData.of` >> could even call into `StringLatin1.canEncode`, unless that's cause for some >> performance anomaly) > >> `if (ch && 0xFF00 == 0) {` > > This seems to perform similar to baseline: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigit 48 avgt 15 0.890 ± 0.025 ns/op > Characters.isDigit 1632 avgt 15 2.174 ± 0.011 ns/op > > > Would be interesting to check the performance on non-Intel architectures. If > you want to give it a spin on your M1, here's the benchmark command I used: > > `make test TEST='micro:java.lang.Characters.isDigit' MICRO="OPTIONS=-p > codePoint=48,1632"` > It seems reasonable to keep these two in sync, yes. (`CharacterData.of` could > even call into `StringLatin1.canEncode`, unless that's cause for some > performance anomaly) If I update `StringLatin1.canEncode` and call into that from `CharacterData.of`, I observe no regression for the Latin1 case, but a significant regression for the non-Latin1 case. I have no idea how to explain that: Benchmark (codePoint) Mode Cnt Score Error Units Characters.isDigit 48 avgt 15 0.675 ± 0.029 ns/op Characters.isDigit 1632 avgt 15 2.435 ± 0.032 ns/op - PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test
On Wed, 15 Mar 2023 12:13:00 GMT, Claes Redestad wrote: >> Btw, I think we can do the same for `StringLatin1.canEncode()` > > It seems reasonable to keep these two in sync, yes. (`CharacterData.of` could > even call into `StringLatin1.canEncode`, unless that's cause for some > performance anomaly) > `if (ch && 0xFF00 == 0) {` This seems to perform similar to baseline: Benchmark (codePoint) Mode Cnt Score Error Units Characters.isDigit 48 avgt 15 0.890 ± 0.025 ns/op Characters.isDigit 1632 avgt 15 2.174 ± 0.011 ns/op Would be interesting to check the performance on non-Intel architectures. If you want to give it a spin on your M1, here's the benchmark command I used: `make test TEST='micro:java.lang.Characters.isDigit' MICRO="OPTIONS=-p codePoint=48,1632"` - PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test
On Wed, 15 Mar 2023 12:07:12 GMT, Sergey Tsypanov wrote: >> src/java.base/share/classes/java/lang/CharacterData.java line 72: >> >>> 70: >>> 71: static final CharacterData of(int ch) { >>> 72: if (ch >= 0 && ch <= 0xFF) { // fast-path >> >> Maybereducing to a single branch with a mask op helps further? Or maybe the >> JIT already effectively does that: >> >> `if (ch && 0xFF00 == 0) {` > > Btw, I think we can do the same for `StringLatin1.canEncode()` It seems reasonable to keep these two in sync, yes. (`CharacterData.of` could even call into `StringLatin1.canEncode`, unless that's cause for some performance anomaly) - PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test
On Wed, 15 Mar 2023 11:58:14 GMT, Claes Redestad wrote: >> By avoiding a bit shift operation for the latin1 fast-path test, we can >> speed up the `java.lang.CharacterData.of` method by ~25% for latin1 code >> points. >> >> The latin1 test is currently implemented as `ch >>> 8 == 0`. We can replace >> this with `ch >= 0 && ch <= 0xFF` for a noticable performance gain >> (especially for Latin1 code points): >> >> This method is called frequently by various property-determining methods in >> `java.lang.Character` like `isLowerCase`, `isDigit` etc, so one should >> expect improvements for all these methods. >> >> Performance is tested using the `Characters.isDigit` benchmark using the >> digits '0' (decimal 48, in CharacterDataLatin1) and '\u0660' (decimal 1632, >> in CharacterData00): >> >> Baseline: >> >> >> Benchmark (codePoint) Mode Cnt Score Error Units >> Characters.isDigit 48 avgt 15 0.870 ± 0.011 ns/op >> Characters.isDigit 1632 avgt 15 2.168 ± 0.017 ns/op >> >> PR: >> >> >> Benchmark (codePoint) Mode Cnt Score Error Units >> Characters.isDigit 48 avgt 15 0.654 ± 0.007 ns/op >> Characters.isDigit 1632 avgt 15 2.032 ± 0.019 ns/op > > src/java.base/share/classes/java/lang/CharacterData.java line 72: > >> 70: >> 71: static final CharacterData of(int ch) { >> 72: if (ch >= 0 && ch <= 0xFF) { // fast-path > > Maybereducing to a single branch with a mask op helps further? Or maybe the > JIT already effectively does that: > > `if (ch && 0xFF00 == 0) {` Btw, I think we can do the same for `StringLatin1.canEncode()` - PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test
On Wed, 15 Mar 2023 11:26:21 GMT, Eirik Bjorsnos wrote: > By avoiding a bit shift operation for the latin1 fast-path test, we can speed > up the `java.lang.CharacterData.of` method by ~25% for latin1 code points. > > The latin1 test is currently implemented as `ch >>> 8 == 0`. We can replace > this with `ch >= 0 && ch <= 0xFF` for a noticable performance gain > (especially for Latin1 code points): > > This method is called frequently by various property-determining methods in > `java.lang.Character` like `isLowerCase`, `isDigit` etc, so one should expect > improvements for all these methods. > > Performance is tested using the `Characters.isDigit` benchmark using the > digits '0' (decimal 48, in CharacterDataLatin1) and '\u0660' (decimal 1632, > in CharacterData00): > > Baseline: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigit 48 avgt 15 0.870 ± 0.011 ns/op > Characters.isDigit 1632 avgt 15 2.168 ± 0.017 ns/op > > PR: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigit 48 avgt 15 0.654 ± 0.007 ns/op > Characters.isDigit 1632 avgt 15 2.032 ± 0.019 ns/op Nice one! src/java.base/share/classes/java/lang/CharacterData.java line 72: > 70: > 71: static final CharacterData of(int ch) { > 72: if (ch >= 0 && ch <= 0xFF) { // fast-path Maybereducing to a single branch with a mask op helps further? Or maybe the JIT already effectively does that: `if (ch && 0xFF00 == 0) {` - Marked as reviewed by redestad (Reviewer). PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test
On Wed, 15 Mar 2023 11:26:21 GMT, Eirik Bjorsnos wrote: > By avoiding a bit shift operation for the latin1 fast-path test, we can speed > up the `java.lang.CharacterData.of` method by ~25% for latin1 code points. > > The latin1 test is currently implemented as `ch >>> 8 == 0`. We can replace > this with `ch >= 0 && ch <= 0xFF` for a noticable performance gain > (especially for Latin1 code points): > > This method is called frequently by various property-determining methods in > `java.lang.Character` like `isLowerCase`, `isDigit` etc, so one should expect > improvements for all these methods. > > Performance is tested using the `Characters.isDigit` benchmark using the > digits '0' (decimal 48, in CharacterDataLatin1) and '\u0660' (decimal 1632, > in CharacterData00): > > Baseline: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigit 48 avgt 15 0.870 ± 0.011 ns/op > Characters.isDigit 1632 avgt 15 2.168 ± 0.017 ns/op > > PR: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigit 48 avgt 15 0.654 ± 0.007 ns/op > Characters.isDigit 1632 avgt 15 2.032 ± 0.019 ns/op Marked as reviewed by stsypanov (Author). - PR: https://git.openjdk.org/jdk/pull/13040
RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test
By avoiding a bit shift operation for the latin1 fast-path test, we can speed up the `java.lang.CharacterData.of` method by ~25% for latin1 code points. The latin1 test is currently implemented as `ch >>> 8 == 0`. We can replace this with `ch >= 0 && ch <= 0xFF` for a noticable performance gain (especially for ASCII chars): This methods is called frequently by various property-determining methods in `java.lang.Character` like `isLowerCase`, `isDigit` etc, so one should expect improvements for all these methods. Performance is tested using the `Characters.isDigit` benchmark using the digits '0' (decimal 48, in CharacterDataLatin1) and '\u0660' (decimal 1632, in CharacterData00): Baseline: Benchmark (codePoint) Mode Cnt Score Error Units Characters.isDigit 48 avgt 15 0.870 ± 0.011 ns/op Characters.isDigit 1632 avgt 15 2.168 ± 0.017 ns/op PR: Benchmark (codePoint) Mode Cnt Score Error Units Characters.isDigit 48 avgt 15 0.654 ± 0.007 ns/op Characters.isDigit 1632 avgt 15 2.032 ± 0.019 ns/op - Commit messages: - Update copyright year - Avoid bit shifting in the CharacterDataLatin1 fast-path Changes: https://git.openjdk.org/jdk/pull/13040/files Webrev: https://webrevs.openjdk.org/?repo=jdk=13040=00 Issue: https://bugs.openjdk.org/browse/JDK-8304245 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13040.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13040/head:pull/13040 PR: https://git.openjdk.org/jdk/pull/13040