Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]

2023-04-07 Thread Eirik Bjorsnos
On Wed, 15 Mar 2023 14:31:03 GMT, Eirik Bjorsnos  wrote:

>> By avoiding a bit shift operation for the latin1 fast-path test, we can 
>> speed up the `java.lang.CharacterData.of` method by ~25% for latin1 code 
>> points.
>> 
>> The latin1 test is currently implemented as `ch >>> 8 == 0`.  We can replace 
>> this with `ch >= 0 && ch <= 0xFF` for a noticable performance gain 
>> (especially for Latin1 code points):
>> 
>> This method is called frequently by various property-determining methods in 
>> `java.lang.Character` like `isLowerCase`, `isDigit` etc, so one should 
>> expect improvements for all these methods.
>> 
>> Performance is tested using the `Characters.isDigit` benchmark using the 
>> digits '0' (decimal 48, in CharacterDataLatin1) and '\u0660' (decimal 1632, 
>> in CharacterData00):
>> 
>> Baseline:
>> 
>> 
>> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
>> Characters.isDigit   48  avgt   15  0.870 ± 0.011  ns/op
>> Characters.isDigit 1632  avgt   15  2.168 ± 0.017  ns/op
>> 
>> PR:
>> 
>> 
>> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
>> Characters.isDigit   48  avgt   15  0.654 ± 0.007  ns/op
>> Characters.isDigit 1632  avgt   15  2.032 ± 0.019  ns/op
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Update StringLatin1.canEncode to sync with same test in CharacterData.of

I'm withdrawing this PR since its merits seem suspect.

-

PR Comment: https://git.openjdk.org/jdk/pull/13040#issuecomment-1500087677


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]

2023-03-18 Thread Eirik Bjorsnos
On Sat, 18 Mar 2023 17:06:52 GMT, jmehrens  wrote:

>> We could, but benchmarks show no performance improvements over the current 
>> PR. I think the current code is perhaps slightly more readable.
>
> Does non-short-circuit logical AND operator perform similar to baseline?  In 
> this very specific case there is no risk of side effects due to method calls, 
> no risk of null pointer exception because of primitives, and there is not an 
> excessive amount of conditions.  Therefore, I would think either `&` or `&&` 
> are viable options.
> 
> `if (ch >= 0 & ch <= 0xFF) {`
> 
> I ask just for the sake of my own learning on this topic as it relates to 
> performance as opposed to code readability preferences.

Both choices of operators en up with identical generated code on my Intel Mac.

For reference, I did a comparison between the generated code for the current 
code, and the one in this PR:

Current generated code start like this:


movl%eax, -0x14000(%rsp)
pushq   %rbp
subq$0x20, %rsp
cmpl$1, 0x20(%r15)
jne 0x119cae072
movl%esi, %r11d
shrl$8, %r11d
testl   %r11d, %r11d
jne 0x119cae043
movabsq $0x70fe03c90, %rax  ;   {oop(a 
'java/lang/CharacterDataLatin1'{0x00070fe03c90})}
``` 

While the PR generated code starts like this:


movl%eax, -0x14000(%rsp)
pushq   %rbp
subq$0x20, %rsp
cmpl$1, 0x20(%r15)
jne 0x1171d41f2
nopw(%rax, %rax)
cmpl$0x100, %esi
jae 0x1171d41c5
movabsq $0x70fe03c90, %rax  ;   {oop(a 
'java/lang/CharacterDataLatin1'{0x00070fe03c90})}


The rest of the code seems identical.

-

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]

2023-03-15 Thread Eirik Bjorsnos
On Wed, 15 Mar 2023 14:56:28 GMT, Eirik Bjorsnos  wrote:

>> Eirik Bjorsnos has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Update StringLatin1.canEncode to sync with same test in CharacterData.of
>
> Just for fun, I tried with a benchmark where the code point is Latin1 in 
> every other call:
> 
> 
> @Benchmark
> public void isDigitVarying(Blackhole blackhole) {
> blackhole.consume(Character.isDigit(48));
> blackhole.consume(Character.isDigit(1632));
> }
> 
> 
> With this benchmark, there is no difference between the baseline, the PR and 
> using StringLatin1.canEncode:
> 
> Baseline:
> 
> 
> Benchmark  (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigitVarying 1632  avgt   15  1.198 ± 0.056  ns/op
> 
> 
> PR:
> 
> 
> Benchmark  (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigitVarying 1632  avgt   15  1.195 ± 0.058  ns/op
> 
> 
> StringLatin1.canEncode:
> 
> 
> Benchmark  (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigitVarying 1632  avgt   15  1.193 ± 0.055  ns/op
> ``` 
> 
> At this point, I'm starting to wonder a bit if the performance benefits 
> suggested by this PR might be dubious and will only surface in very narrow 
> benchmarks. On the other hand, it does not seem harmful either. What do 
> people think?

> @eirbjo I suggest to add `-prof perfasm` and see what's going on; I suspect 
> it's better that inputs are loaded from `@State` field...and don't use 
> `BlackHole` but combine the results of the 2 operations (although I remember 
> that compiler assisted black holes are not as noisy as I'm used to observe 
> with JDK 11 ones)

I think I'm done with benchmarking for this PR. Any StringLatin1.canEncode 
method call regression can be solved outside this PR.

The PR as it stands seems to have some benefit for Latin1-only apps/use cases, 
which I think is not uncommon. Any benefit for mixed cases are more dubious, 
but don't seem to hurt.

-

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]

2023-03-15 Thread Eirik Bjorsnos
On Wed, 15 Mar 2023 14:57:43 GMT, Quan Anh Mai  wrote:

>>> The StringLatin1.canEncode regression disappears.
>> 
>> I mixed things up so StringLatin1.canEncode was benchmarked without the 
>> updated code.
>> 
>> Here are updated benchmark results:
>> 
>> 
>> Baseline:
>> 
>> 
>> Benchmark (codePoint)  Mode  Cnt  Score   Error  Units
>> Characters.isDigitRandom 1632  avgt   15  5.437 ± 0.235  ns/op
>> 
>> 
>> PR:
>> 
>> 
>> Benchmark (codePoint)  Mode  Cnt  Score   Error  Units
>> Characters.isDigitRandom 1632  avgt   15  5.319 ± 0.341  ns/op
>> 
>> 
>> StringLatin1.canEncode:
>> 
>> 
>> Benchmark (codePoint)  Mode  Cnt  Score   Error  Units
>> Characters.isDigitRandom 1632  avgt   15  5.447 ± 0.304  ns/op
>> ``` 
>> 
>> So it seems using StringLatin1.canEncode still might have a regression also 
>> in the randomized input case.
>> 
>> For this PR, I suggest we update StringLatin1.canEncode to be in sync with 
>> CharacterData.of, without one calling the other. If anyone wants to 
>> investigate the regression further, than can be done outside this PR.
>> 
>> I have independently verified that StringLatin1.canEncode sees performance 
>> improvements using the StringIndexOf benchmark.
>
> We can do `Integer.compareUnsigned(ch, 0xFF) <= 0`

We could, but benchmarks show no performance improvements over the current PR. 
I think the current code is perhaps slightly more readable.

-

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]

2023-03-15 Thread Francesco Nigro
On Wed, 15 Mar 2023 14:56:28 GMT, Eirik Bjorsnos  wrote:

>> Eirik Bjorsnos has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Update StringLatin1.canEncode to sync with same test in CharacterData.of
>
> Just for fun, I tried with a benchmark where the code point is Latin1 in 
> every other call:
> 
> 
> @Benchmark
> public void isDigitVarying(Blackhole blackhole) {
> blackhole.consume(Character.isDigit(48));
> blackhole.consume(Character.isDigit(1632));
> }
> 
> 
> With this benchmark, there is no difference between the baseline, the PR and 
> using StringLatin1.canEncode:
> 
> Baseline:
> 
> 
> Benchmark  (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigitVarying 1632  avgt   15  1.198 ± 0.056  ns/op
> 
> 
> PR:
> 
> 
> Benchmark  (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigitVarying 1632  avgt   15  1.195 ± 0.058  ns/op
> 
> 
> StringLatin1.canEncode:
> 
> 
> Benchmark  (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigitVarying 1632  avgt   15  1.193 ± 0.055  ns/op
> ``` 
> 
> At this point, I'm starting to wonder a bit if the performance benefits 
> suggested by this PR might be dubious and will only surface in very narrow 
> benchmarks. On the other hand, it does not seem harmful either. What do 
> people think?

@eirbjo I suggest to add `-prof perfasm` and see what's going on; I suspect 
it's better that inputs are loaded from `@State` field...and don't use 
`BlackHole` but combine the results of the 2 operations (although I remember 
that compiler assisted black holes are not as noisy as I'm used to observe with 
JDK 11 ones)

-

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]

2023-03-15 Thread Quan Anh Mai
On Wed, 15 Mar 2023 14:23:35 GMT, Eirik Bjorsnos  wrote:

>> Many thanks to have tried, yep, I was curious indeed re the 
>> "StringLatin1.canEncode regression" case.
>> I would still modify the benchmark to use inputs (I know that will make it 
>> memory bound sadly, due to reading inputs - but the size of such inputs can 
>> be a benchmark parameter, together with the bias eg "latin","mix", 
>> "non-latin") "semi-randomly" generated based on the mentioned 
>> strategies/biases. 
>> It will benefit future tests on this, although could be provided as a 
>> separate PR.
>
>> The StringLatin1.canEncode regression disappears.
> 
> I mixed things up so StringLatin1.canEncode was benchmarked without the 
> updated code.
> 
> Here are updated benchmark results:
> 
> 
> Baseline:
> 
> 
> Benchmark (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigitRandom 1632  avgt   15  5.437 ± 0.235  ns/op
> 
> 
> PR:
> 
> 
> Benchmark (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigitRandom 1632  avgt   15  5.319 ± 0.341  ns/op
> 
> 
> StringLatin1.canEncode:
> 
> 
> Benchmark (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigitRandom 1632  avgt   15  5.447 ± 0.304  ns/op
> ``` 
> 
> So it seems using StringLatin1.canEncode still might have a regression also 
> in the randomized input case.
> 
> For this PR, I suggest we update StringLatin1.canEncode to be in sync with 
> CharacterData.of, without one calling the other. If anyone wants to 
> investigate the regression further, than can be done outside this PR.
> 
> I have independently verified that StringLatin1.canEncode sees performance 
> improvements using the StringIndexOf benchmark.

We can do `Integer.compareUnsigned(ch, 0xFF) <= 0`

-

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]

2023-03-15 Thread Eirik Bjorsnos
On Wed, 15 Mar 2023 14:31:03 GMT, Eirik Bjorsnos  wrote:

>> By avoiding a bit shift operation for the latin1 fast-path test, we can 
>> speed up the `java.lang.CharacterData.of` method by ~25% for latin1 code 
>> points.
>> 
>> The latin1 test is currently implemented as `ch >>> 8 == 0`.  We can replace 
>> this with `ch >= 0 && ch <= 0xFF` for a noticable performance gain 
>> (especially for Latin1 code points):
>> 
>> This method is called frequently by various property-determining methods in 
>> `java.lang.Character` like `isLowerCase`, `isDigit` etc, so one should 
>> expect improvements for all these methods.
>> 
>> Performance is tested using the `Characters.isDigit` benchmark using the 
>> digits '0' (decimal 48, in CharacterDataLatin1) and '\u0660' (decimal 1632, 
>> in CharacterData00):
>> 
>> Baseline:
>> 
>> 
>> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
>> Characters.isDigit   48  avgt   15  0.870 ± 0.011  ns/op
>> Characters.isDigit 1632  avgt   15  2.168 ± 0.017  ns/op
>> 
>> PR:
>> 
>> 
>> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
>> Characters.isDigit   48  avgt   15  0.654 ± 0.007  ns/op
>> Characters.isDigit 1632  avgt   15  2.032 ± 0.019  ns/op
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Update StringLatin1.canEncode to sync with same test in CharacterData.of

Just for fun, I tried with a benchmark where the code point is Latin1 in every 
other call:


@Benchmark
public void isDigitVarying(Blackhole blackhole) {
blackhole.consume(Character.isDigit(48));
blackhole.consume(Character.isDigit(1632));
}


With this benchmark, there is no difference between the baseline, the PR and 
using StringLatin1.canEncode:

Baseline:


Benchmark  (codePoint)  Mode  Cnt  Score   Error  Units
Characters.isDigitVarying 1632  avgt   15  1.198 ± 0.056  ns/op


PR:


Benchmark  (codePoint)  Mode  Cnt  Score   Error  Units
Characters.isDigitVarying 1632  avgt   15  1.195 ± 0.058  ns/op


StringLatin1.canEncode:


Benchmark  (codePoint)  Mode  Cnt  Score   Error  Units
Characters.isDigitVarying 1632  avgt   15  1.193 ± 0.055  ns/op
``` 

At this point, I'm starting to wonder a bit if the performance benefits 
suggested by this PR might be dubious and will only surface in very narrow 
benchmarks. On the other hand, it does not seem harmful either. What do people 
think?

-

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]

2023-03-15 Thread Sergey Tsypanov
On Wed, 15 Mar 2023 14:31:03 GMT, Eirik Bjorsnos  wrote:

>> By avoiding a bit shift operation for the latin1 fast-path test, we can 
>> speed up the `java.lang.CharacterData.of` method by ~25% for latin1 code 
>> points.
>> 
>> The latin1 test is currently implemented as `ch >>> 8 == 0`.  We can replace 
>> this with `ch >= 0 && ch <= 0xFF` for a noticable performance gain 
>> (especially for Latin1 code points):
>> 
>> This method is called frequently by various property-determining methods in 
>> `java.lang.Character` like `isLowerCase`, `isDigit` etc, so one should 
>> expect improvements for all these methods.
>> 
>> Performance is tested using the `Characters.isDigit` benchmark using the 
>> digits '0' (decimal 48, in CharacterDataLatin1) and '\u0660' (decimal 1632, 
>> in CharacterData00):
>> 
>> Baseline:
>> 
>> 
>> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
>> Characters.isDigit   48  avgt   15  0.870 ± 0.011  ns/op
>> Characters.isDigit 1632  avgt   15  2.168 ± 0.017  ns/op
>> 
>> PR:
>> 
>> 
>> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
>> Characters.isDigit   48  avgt   15  0.654 ± 0.007  ns/op
>> Characters.isDigit 1632  avgt   15  2.032 ± 0.019  ns/op
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Update StringLatin1.canEncode to sync with same test in CharacterData.of

Marked as reviewed by stsypanov (Author).

-

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]

2023-03-15 Thread Eirik Bjorsnos
On Wed, 15 Mar 2023 12:00:59 GMT, Claes Redestad  wrote:

>> Eirik Bjorsnos has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Update StringLatin1.canEncode to sync with same test in CharacterData.of
>
> Nice one!

@cl4es Do you agree that the StringLatin1.canEncode change should be included 
in this PR? If so, can you approve the latest update?

-

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]

2023-03-15 Thread Eirik Bjorsnos
On Wed, 15 Mar 2023 13:50:44 GMT, Francesco Nigro  wrote:

>> I created a randomized version of `Characters.isDigit` which tests with code 
>> points picked at random such that any category (Latin1, negative, different 
>> planes, unassiged) are equally probable.
>> 
>> Baseline:
>> 
>> 
>> Benchmark (codePoint)  Mode  Cnt  Score   Error  Units
>> Characters.isDigitRandom 1632  avgt   15  5.503 ± 0.371  ns/op
>> 
>> 
>> Current PR:
>> 
>> 
>> Benchmark (codePoint)  Mode  Cnt  Score   Error  Units
>> Characters.isDigitRandom 1632  avgt   15  5.393 ± 0.336  ns/op
>> 
>> 
>> Using StringLatin1.canEncode:
>> 
>> 
>> Benchmark (codePoint)  Mode  Cnt  Score   Error  Units
>> Characters.isDigitRandom 1632  avgt   15  5.377 ± 0.322  ns/op
>> 
>> 
>> Seems the PR still has a small improvement for this scenario. The 
>> StringLatin1.canEncode regression disappears.
>> 
>> In the real world ASCII/Latin1 seems to dominate most data, so this scenario 
>> is perhaps not very realistic.
>> 
>> I'm running this on a Mac, so cannot try `-prof perfnorm`.
>
> Many thanks to have tried, yep, I was curious indeed re the 
> "StringLatin1.canEncode regression" case.
> I would still modify the benchmark to use inputs (I know that will make it 
> memory bound sadly, due to reading inputs - but the size of such inputs can 
> be a benchmark parameter, together with the bias eg "latin","mix", 
> "non-latin") "semi-randomly" generated based on the mentioned 
> strategies/biases. 
> It will benefit future tests on this, although could be provided as a 
> separate PR.

> The StringLatin1.canEncode regression disappears.

I mixed things up so StringLatin1.canEncode was benchmarked without the updated 
code.

Here are updated benchmark results:


Baseline:


Benchmark (codePoint)  Mode  Cnt  Score   Error  Units
Characters.isDigitRandom 1632  avgt   15  5.437 ± 0.235  ns/op


PR:


Benchmark (codePoint)  Mode  Cnt  Score   Error  Units
Characters.isDigitRandom 1632  avgt   15  5.319 ± 0.341  ns/op


StringLatin1.canEncode:


Benchmark (codePoint)  Mode  Cnt  Score   Error  Units
Characters.isDigitRandom 1632  avgt   15  5.447 ± 0.304  ns/op
``` 

So it seems using StringLatin1.canEncode still might have a regression also in 
the randomized input case.

For this PR, I suggest we update StringLatin1.canEncode to be in sync with 
CharacterData.of, without one calling the other. If anyone wants to investigate 
the regression further, than can be done outside this PR.

I have independently verified that StringLatin1.canEncode sees performance 
improvements using the StringIndexOf benchmark.

-

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]

2023-03-15 Thread Eirik Bjorsnos
> By avoiding a bit shift operation for the latin1 fast-path test, we can speed 
> up the `java.lang.CharacterData.of` method by ~25% for latin1 code points.
> 
> The latin1 test is currently implemented as `ch >>> 8 == 0`.  We can replace 
> this with `ch >= 0 && ch <= 0xFF` for a noticable performance gain 
> (especially for Latin1 code points):
> 
> This method is called frequently by various property-determining methods in 
> `java.lang.Character` like `isLowerCase`, `isDigit` etc, so one should expect 
> improvements for all these methods.
> 
> Performance is tested using the `Characters.isDigit` benchmark using the 
> digits '0' (decimal 48, in CharacterDataLatin1) and '\u0660' (decimal 1632, 
> in CharacterData00):
> 
> Baseline:
> 
> 
> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigit   48  avgt   15  0.870 ± 0.011  ns/op
> Characters.isDigit 1632  avgt   15  2.168 ± 0.017  ns/op
> 
> PR:
> 
> 
> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigit   48  avgt   15  0.654 ± 0.007  ns/op
> Characters.isDigit 1632  avgt   15  2.032 ± 0.019  ns/op

Eirik Bjorsnos has updated the pull request incrementally with one additional 
commit since the last revision:

  Update StringLatin1.canEncode to sync with same test in CharacterData.of

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13040/files
  - new: https://git.openjdk.org/jdk/pull/13040/files/0174440e..0baf690e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=13040=01
 - incr: https://webrevs.openjdk.org/?repo=jdk=13040=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/13040.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/13040/head:pull/13040

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test

2023-03-15 Thread Francesco Nigro
On Wed, 15 Mar 2023 13:42:22 GMT, Eirik Bjorsnos  wrote:

>> Can you check what happen adding much more inputs to the dataset including 
>> non-latin chars as well and use `-prof perfnorm` to check what `perf` report 
>> re branches/branch-misses?
>> 
>> You can use `SplittableRandom` to pre-populate an array of inputs which 
>> sequence is "random" but still allow deterministic benchmarking and feed the 
>> benchmark method by cycling the pre-computed inputs.
>> In the real world I expect `isDigit` to happen on different input types and 
>> both having C2 with both branches places based on prev inputs distribution 
>> and a confused branch-predictor to allow comparing vs something that looks a 
>> bit nearest to the real world (TBD, I know).
>> I expect in that case that a single cmp + mask to work better depending on 
>> latin input distribution/occurrence
>
> I created a randomized version of `Characters.isDigit` which tests with code 
> points picked at random such that any category (Latin1, negative, different 
> planes, unassiged) are equally probable.
> 
> Baseline:
> 
> 
> Benchmark (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigitRandom 1632  avgt   15  5.503 ± 0.371  ns/op
> 
> 
> Current PR:
> 
> 
> Benchmark (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigitRandom 1632  avgt   15  5.393 ± 0.336  ns/op
> 
> 
> Using StringLatin1.canEncode:
> 
> 
> Benchmark (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigitRandom 1632  avgt   15  5.377 ± 0.322  ns/op
> 
> 
> Seems the PR still has a small improvement for this scenario. The 
> StringLatin1.canEncode regression disappears.
> 
> In the real world ASCII/Latin1 seems to dominate most data, so this scenario 
> is perhaps not very realistic.
> 
> I'm running this on a Mac, so cannot try `-prof perfnorm`.

Many thanks to have tried, yep, I was curious indeed re the 
"StringLatin1.canEncode regression" case.
I would still modify the benchmark to use inputs (I know that will make it 
memory bound sadly, due to reading inputs - but the size of such inputs can be 
a benchmark parameter, together with the bias eg "latin","mix", "non-latin") 
"semi-randomly" generated based on the mentioned strategies/biases. 
It will benefit future tests on this, although could be provided as a separate 
PR.

-

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test

2023-03-15 Thread Eirik Bjorsnos
On Wed, 15 Mar 2023 12:37:24 GMT, Francesco Nigro  wrote:

>>> It seems reasonable to keep these two in sync, yes. (`CharacterData.of` 
>>> could even call into `StringLatin1.canEncode`, unless that's cause for some 
>>> performance anomaly)
>> 
>> If I update `StringLatin1.canEncode` and call into that from 
>> `CharacterData.of`, I observe no regression for the Latin1 case, but a 
>> significant regression for the non-Latin1 case. I have no idea how to 
>> explain that:
>> 
>> 
>> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
>> Characters.isDigit   48  avgt   15  0.675 ± 0.029  ns/op
>> Characters.isDigit 1632  avgt   15  2.435 ± 0.032  ns/op
>
> Can you check what happen adding much more inputs to the dataset including 
> non-latin chars as well and use `-prof perfnorm` to check what `perf` report 
> re branches/branch-misses?
> 
> You can use `SplittableRandom` to pre-populate an array of inputs which 
> sequence is "random" but still allow deterministic benchmarking and feed the 
> benchmark method by cycling the pre-computed inputs.
> In the real world I expect `isDigit` to happen on different input types and 
> both having C2 with both branches places based on prev inputs distribution 
> and a confused branch-predictor to allow comparing vs something that looks a 
> bit nearest to the real world (TBD, I know).
> I expect in that case that a single cmp + mask to work better depending on 
> latin input distribution/occurrence

I created a randomized version of `Characters.isDigit` which tests with code 
points picked at random such that any category (Latin1, negative, different 
planes, unassiged) are equally probable.

Baseline:


Benchmark (codePoint)  Mode  Cnt  Score   Error  Units
Characters.isDigitRandom 1632  avgt   15  5.503 ± 0.371  ns/op


Current PR:


Benchmark (codePoint)  Mode  Cnt  Score   Error  Units
Characters.isDigitRandom 1632  avgt   15  5.393 ± 0.336  ns/op


Using StringLatin1.canEncode:


Benchmark (codePoint)  Mode  Cnt  Score   Error  Units
Characters.isDigitRandom 1632  avgt   15  5.377 ± 0.322  ns/op


Seems the PR still has a small improvement for this scenario. The 
StringLatin1.canEncode regression disappears.

In the real world ASCII/Latin1 seems to dominate most data, so this scenario is 
perhaps not very realistic.

I'm running this on a Mac, so cannot try `-prof perfnorm`.

-

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test

2023-03-15 Thread Francesco Nigro
On Wed, 15 Mar 2023 12:28:05 GMT, Eirik Bjorsnos  wrote:

>>> `if (ch && 0xFF00 == 0) {`
>> 
>> This seems to perform similar to baseline:
>> 
>> 
>> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
>> Characters.isDigit   48  avgt   15  0.890 ± 0.025  ns/op
>> Characters.isDigit 1632  avgt   15  2.174 ± 0.011  ns/op
>> 
>> 
>> Would be interesting to check the performance on non-Intel architectures. If 
>> you want to give it a spin on your M1, here's the benchmark command I used:
>> 
>> `make test TEST='micro:java.lang.Characters.isDigit' MICRO="OPTIONS=-p 
>> codePoint=48,1632"`
>
>> It seems reasonable to keep these two in sync, yes. (`CharacterData.of` 
>> could even call into `StringLatin1.canEncode`, unless that's cause for some 
>> performance anomaly)
> 
> If I update `StringLatin1.canEncode` and call into that from 
> `CharacterData.of`, I observe no regression for the Latin1 case, but a 
> significant regression for the non-Latin1 case. I have no idea how to explain 
> that:
> 
> 
> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigit   48  avgt   15  0.675 ± 0.029  ns/op
> Characters.isDigit 1632  avgt   15  2.435 ± 0.032  ns/op

Can you check what happen adding much more inputs to the dataset that includes 
non-latin chars as well and use `-prof perfnorm` to check what `perf` report re 
branches/branch-misses?

You can use SplittableRandom to pre-populate an array of inputs which sequence 
is "random" but still allow deterministic benchmarking and feed the benchmark 
method cycling the pre-computed inputs

-

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test

2023-03-15 Thread Eirik Bjorsnos
On Wed, 15 Mar 2023 12:15:39 GMT, Eirik Bjorsnos  wrote:

>> It seems reasonable to keep these two in sync, yes. (`CharacterData.of` 
>> could even call into `StringLatin1.canEncode`, unless that's cause for some 
>> performance anomaly)
>
>> `if (ch && 0xFF00 == 0) {`
> 
> This seems to perform similar to baseline:
> 
> 
> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigit   48  avgt   15  0.890 ± 0.025  ns/op
> Characters.isDigit 1632  avgt   15  2.174 ± 0.011  ns/op
> 
> 
> Would be interesting to check the performance on non-Intel architectures. If 
> you want to give it a spin on your M1, here's the benchmark command I used:
> 
> `make test TEST='micro:java.lang.Characters.isDigit' MICRO="OPTIONS=-p 
> codePoint=48,1632"`

> It seems reasonable to keep these two in sync, yes. (`CharacterData.of` could 
> even call into `StringLatin1.canEncode`, unless that's cause for some 
> performance anomaly)

If I update `StringLatin1.canEncode` and call into that from 
`CharacterData.of`, I observe no regression for the Latin1 case, but a 
significant regression for the non-Latin1 case. I have no idea how to explain 
that:


Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
Characters.isDigit   48  avgt   15  0.675 ± 0.029  ns/op
Characters.isDigit 1632  avgt   15  2.435 ± 0.032  ns/op

-

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test

2023-03-15 Thread Eirik Bjorsnos
On Wed, 15 Mar 2023 12:13:00 GMT, Claes Redestad  wrote:

>> Btw, I think we can do the same for `StringLatin1.canEncode()`
>
> It seems reasonable to keep these two in sync, yes. (`CharacterData.of` could 
> even call into `StringLatin1.canEncode`, unless that's cause for some 
> performance anomaly)

> `if (ch && 0xFF00 == 0) {`

This seems to perform similar to baseline:


Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
Characters.isDigit   48  avgt   15  0.890 ± 0.025  ns/op
Characters.isDigit 1632  avgt   15  2.174 ± 0.011  ns/op


Would be interesting to check the performance on non-Intel architectures. If 
you want to give it a spin on your M1, here's the benchmark command I used:

`make test TEST='micro:java.lang.Characters.isDigit' MICRO="OPTIONS=-p 
codePoint=48,1632"`

-

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test

2023-03-15 Thread Claes Redestad
On Wed, 15 Mar 2023 12:07:12 GMT, Sergey Tsypanov  wrote:

>> src/java.base/share/classes/java/lang/CharacterData.java line 72:
>> 
>>> 70: 
>>> 71: static final CharacterData of(int ch) {
>>> 72: if (ch >= 0 && ch <= 0xFF) { // fast-path
>> 
>> Maybereducing to a single branch with a mask op helps further? Or maybe the 
>> JIT already effectively does that:
>> 
>> `if (ch && 0xFF00 == 0) {`
>
> Btw, I think we can do the same for `StringLatin1.canEncode()`

It seems reasonable to keep these two in sync, yes. (`CharacterData.of` could 
even call into `StringLatin1.canEncode`, unless that's cause for some 
performance anomaly)

-

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test

2023-03-15 Thread Sergey Tsypanov
On Wed, 15 Mar 2023 11:58:14 GMT, Claes Redestad  wrote:

>> By avoiding a bit shift operation for the latin1 fast-path test, we can 
>> speed up the `java.lang.CharacterData.of` method by ~25% for latin1 code 
>> points.
>> 
>> The latin1 test is currently implemented as `ch >>> 8 == 0`.  We can replace 
>> this with `ch >= 0 && ch <= 0xFF` for a noticable performance gain 
>> (especially for Latin1 code points):
>> 
>> This method is called frequently by various property-determining methods in 
>> `java.lang.Character` like `isLowerCase`, `isDigit` etc, so one should 
>> expect improvements for all these methods.
>> 
>> Performance is tested using the `Characters.isDigit` benchmark using the 
>> digits '0' (decimal 48, in CharacterDataLatin1) and '\u0660' (decimal 1632, 
>> in CharacterData00):
>> 
>> Baseline:
>> 
>> 
>> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
>> Characters.isDigit   48  avgt   15  0.870 ± 0.011  ns/op
>> Characters.isDigit 1632  avgt   15  2.168 ± 0.017  ns/op
>> 
>> PR:
>> 
>> 
>> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
>> Characters.isDigit   48  avgt   15  0.654 ± 0.007  ns/op
>> Characters.isDigit 1632  avgt   15  2.032 ± 0.019  ns/op
>
> src/java.base/share/classes/java/lang/CharacterData.java line 72:
> 
>> 70: 
>> 71: static final CharacterData of(int ch) {
>> 72: if (ch >= 0 && ch <= 0xFF) { // fast-path
> 
> Maybereducing to a single branch with a mask op helps further? Or maybe the 
> JIT already effectively does that:
> 
> `if (ch && 0xFF00 == 0) {`

Btw, I think we can do the same for `StringLatin1.canEncode()`

-

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test

2023-03-15 Thread Claes Redestad
On Wed, 15 Mar 2023 11:26:21 GMT, Eirik Bjorsnos  wrote:

> By avoiding a bit shift operation for the latin1 fast-path test, we can speed 
> up the `java.lang.CharacterData.of` method by ~25% for latin1 code points.
> 
> The latin1 test is currently implemented as `ch >>> 8 == 0`.  We can replace 
> this with `ch >= 0 && ch <= 0xFF` for a noticable performance gain 
> (especially for Latin1 code points):
> 
> This method is called frequently by various property-determining methods in 
> `java.lang.Character` like `isLowerCase`, `isDigit` etc, so one should expect 
> improvements for all these methods.
> 
> Performance is tested using the `Characters.isDigit` benchmark using the 
> digits '0' (decimal 48, in CharacterDataLatin1) and '\u0660' (decimal 1632, 
> in CharacterData00):
> 
> Baseline:
> 
> 
> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigit   48  avgt   15  0.870 ± 0.011  ns/op
> Characters.isDigit 1632  avgt   15  2.168 ± 0.017  ns/op
> 
> PR:
> 
> 
> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigit   48  avgt   15  0.654 ± 0.007  ns/op
> Characters.isDigit 1632  avgt   15  2.032 ± 0.019  ns/op

Nice one!

src/java.base/share/classes/java/lang/CharacterData.java line 72:

> 70: 
> 71: static final CharacterData of(int ch) {
> 72: if (ch >= 0 && ch <= 0xFF) { // fast-path

Maybereducing to a single branch with a mask op helps further? Or maybe the JIT 
already effectively does that:

`if (ch && 0xFF00 == 0) {`

-

Marked as reviewed by redestad (Reviewer).

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test

2023-03-15 Thread Sergey Tsypanov
On Wed, 15 Mar 2023 11:26:21 GMT, Eirik Bjorsnos  wrote:

> By avoiding a bit shift operation for the latin1 fast-path test, we can speed 
> up the `java.lang.CharacterData.of` method by ~25% for latin1 code points.
> 
> The latin1 test is currently implemented as `ch >>> 8 == 0`.  We can replace 
> this with `ch >= 0 && ch <= 0xFF` for a noticable performance gain 
> (especially for Latin1 code points):
> 
> This method is called frequently by various property-determining methods in 
> `java.lang.Character` like `isLowerCase`, `isDigit` etc, so one should expect 
> improvements for all these methods.
> 
> Performance is tested using the `Characters.isDigit` benchmark using the 
> digits '0' (decimal 48, in CharacterDataLatin1) and '\u0660' (decimal 1632, 
> in CharacterData00):
> 
> Baseline:
> 
> 
> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigit   48  avgt   15  0.870 ± 0.011  ns/op
> Characters.isDigit 1632  avgt   15  2.168 ± 0.017  ns/op
> 
> PR:
> 
> 
> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigit   48  avgt   15  0.654 ± 0.007  ns/op
> Characters.isDigit 1632  avgt   15  2.032 ± 0.019  ns/op

Marked as reviewed by stsypanov (Author).

-

PR: https://git.openjdk.org/jdk/pull/13040


RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test

2023-03-15 Thread Eirik Bjorsnos
By avoiding a bit shift operation for the latin1 fast-path test, we can speed 
up the `java.lang.CharacterData.of` method by ~25% for latin1 code points.

The latin1 test is currently implemented as `ch >>> 8 == 0`.  We can replace 
this with `ch >= 0 && ch <= 0xFF` for a noticable performance gain (especially 
for ASCII chars):

This methods is called frequently by various property-determining methods in 
`java.lang.Character` like `isLowerCase`, `isDigit` etc, so one should expect 
improvements for all these methods.

Performance is tested using the `Characters.isDigit` benchmark using the digits 
'0' (decimal 48, in CharacterDataLatin1) and '\u0660' (decimal 1632, in 
CharacterData00):

Baseline:


Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
Characters.isDigit   48  avgt   15  0.870 ± 0.011  ns/op
Characters.isDigit 1632  avgt   15  2.168 ± 0.017  ns/op

PR:


Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
Characters.isDigit   48  avgt   15  0.654 ± 0.007  ns/op
Characters.isDigit 1632  avgt   15  2.032 ± 0.019  ns/op

-

Commit messages:
 - Update copyright year
 - Avoid bit shifting in the CharacterDataLatin1 fast-path

Changes: https://git.openjdk.org/jdk/pull/13040/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk=13040=00
  Issue: https://bugs.openjdk.org/browse/JDK-8304245
  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/13040.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/13040/head:pull/13040

PR: https://git.openjdk.org/jdk/pull/13040