Withdrawn: 8302872: Speed up StringLatin1.regionMatchesCI_UTF16

2023-05-09 Thread duke
On Sat, 18 Feb 2023 18:22:49 GMT, Eirik Bjorsnos  wrote:

> This PR continues the efforts from #12632 to speed up case-insensitive string 
> matching.
> 
> We now tackle case-insensitive comparison of mixed-coder strings, implemented 
> in `StringLatin1.regionMatchesCI_UTF16`
> 
> Key insights:
> 
> - If the UTF16 code point is also in latin1 range, we can leverage 
> improvements from 12632 directly by calling 
> `CharacterDataLatin1.equalsIgnoreCase`
> - There are exactly 7 non-latin1 Unicode code points which case fold into the 
> latin1 range. We can special-case our comparison of these code points by 
> adding the method `CharacterDataLatin1.latin1CaseFold`.
> - To avoid checking `a == b` twice, this check is lifted out of 
> `CharacterDataLatin1.equalsIgnoreCase` and the two callers are updated to 
> check that `a != b` before calling the method.
> 
> For completeness, the RegionMatches test is updated to also compare Turkic 
> dotted/dotless 'i's against the uppercase ASCII 'I', not just the lowercase 
> one. This is not strictly related to the purpose of this PR, but it did help 
> catch a regression introduced in an earlier iteration of the PR.
> 
> To guard against regressions caused by future changes to the set of Unicode 
> code points folding into latin1, a new test is added to `EqualsIgnoreCase` 
> which identifies all such code points and verifies they are compared correctly.
> 
> Performance is tested for matching and mismatching cases of selected code 
> point pairs picked from the ASCII letter, ASCII number, latin1 letter and 
> non-latin Unicode letter ranges. Results in the first comment below.

This pull request has been closed without being integrated.

-

PR: https://git.openjdk.org/jdk/pull/12637


Re: RFR: 8302872: Speed up StringLatin1.regionMatchesCI_UTF16 [v2]

2023-03-14 Thread Eirik Bjorsnos
> This PR continues the efforts from #12632 to speed up case-insensitive string 
> matching.
> 
> We now tackle case-insensitive comparison of mixed-coder strings, implemented 
> in `StringLatin1.regionMatchesCI_UTF16`
> 
> Key insights:
> 
> - If the UTF16 code point is also in latin1 range, we can leverage 
> improvements from 12632 directly by calling 
> `CharacterDataLatin1.equalsIgnoreCase`
> - There are exactly 7 non-latin1 Unicode code points which case fold into the 
> latin1 range. We can special-case our comparison of these code points by 
> adding the method `CharacterDataLatin1.latin1CaseFold`.
> - To avoid checking `a == b` twice, this check is lifted out of 
> `CharacterDataLatin1.equalsIgnoreCase` and the two callers are updated to 
> check that `a != b` before calling the method.
> 
> For completeness, the RegionMatches test is updated to also compare Turkic 
> dotted/dotless 'i's against the uppercase ASCII 'I', not just the lowercase 
> one. This is not strictly related to the purpose of this PR, but it did help 
> catch a regression introduced in an earlier iteration of the PR.
> 
> To guard against regressions caused by future changes to the set of Unicode 
> code points folding into latin1, a new test is added to `EqualsIgnoreCase` 
> which identifies all such code points and verifies they are compared correctly.
> 
> Performance is tested for matching and mismatching cases of selected code 
> point pairs picked from the ASCII letter, ASCII number, latin1 letter and 
> non-latin Unicode letter ranges. Results in the first comment below.

Eirik Bjorsnos has updated the pull request with a new target base due to a 
merge or a rebase. The pull request now contains 24 commits:

 - Merge branch 'master' into regionmatches-mixed-speedup
 - Inline local variable
 - latin1CaseFold was moved to CharacterDataLatin1
 - Move latin1CaseFold to CharacterDataLatin1
 - Improve latin1CaseFold javadocs
 - Simplify comments
 - Prefer fast matching by comparing for equality before checking latin1 range
 - Improve Javadocs of latin1CaseFold
 - Be consistent in comments
 - CharacterData.latin1LowerCase was renamed to latin1CaseFold
 - ... and 14 more: https://git.openjdk.org/jdk/compare/6d30bbe6...2340f8b5

-

Changes: https://git.openjdk.org/jdk/pull/12637/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12637&range=01
  Stats: 169 lines in 5 files changed: 155 ins; 2 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/12637.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12637/head:pull/12637

PR: https://git.openjdk.org/jdk/pull/12637


Re: RFR: 8302872: Speed up StringLatin1.regionMatchesCI_UTF16

2023-02-28 Thread Eirik Bjorsnos
On Sat, 18 Feb 2023 18:22:49 GMT, Eirik Bjorsnos  wrote:

> This PR continues the efforts from #12632 to speed up case-insensitive string 
> matching.
> 
> We now tackle case-insensitive comparison of mixed-coder strings, implemented 
> in `StringLatin1.regionMatchesCI_UTF16`
> 
> Key insights:
> 
> - If the UTF16 code point is also in latin1 range, we can leverage 
> improvements from 12632 directly by calling 
> `CharacterDataLatin1.equalsIgnoreCase`
> - There are exactly 7 non-latin1 Unicode code points which case fold into the 
> latin1 range. We can special-case our comparison of these code points by 
> adding the method `CharacterDataLatin1.latin1CaseFold`.
> - To avoid checking `a == b` twice, this check is lifted out of 
> `CharacterDataLatin1.equalsIgnoreCase` and the two callers are updated to 
> check that `a != b` before calling the method.
> 
> For completeness, the RegionMatches test is updated to also compare Turkic 
> dotted/dotless 'i's against the uppercase ASCII 'I', not just the lowercase 
> one. This is not strictly related to the purpose of this PR, but it did help 
> catch a regression introduced in an earlier iteration of the PR.
> 
> To guard against regressions caused by future changes to the set of Unicode 
> code points folding into latin1, a new test is added to `EqualsIgnoreCase` 
> which identifies all such code points and verifies they are compared correctly.
> 
> Performance is tested for matching and mismatching cases of selected code 
> point pairs picked from the ASCII letter, ASCII number, latin1 letter and 
> non-latin Unicode letter ranges. Results in the first comment below.

Benchmark results:

Baseline:


Benchmark                                 (codePoints)  (size)  Mode  Cnt      Score     Error  Units
RegionMatchesIC.Mixed.regionMatchesIC      ascii-match    1024  avgt   15   1497.391 ±  22.350  ns/op
RegionMatchesIC.Mixed.regionMatchesIC   ascii-mismatch    1024  avgt   15      5.346 ±   0.165  ns/op
RegionMatchesIC.Mixed.regionMatchesIC     number-match    1024  avgt   15    364.034 ±   5.561  ns/op
RegionMatchesIC.Mixed.regionMatchesIC  number-mismatch    1024  avgt   15      4.036 ±   0.171  ns/op
RegionMatchesIC.Mixed.regionMatchesIC       lat1-match    1024  avgt   15   2674.043 ± 174.669  ns/op
RegionMatchesIC.Mixed.regionMatchesIC    lat1-mismatch    1024  avgt   15      6.493 ±   0.230  ns/op
RegionMatchesIC.Mixed.regionMatchesIC      utf16-match    1024  avgt   15  12630.314 ± 212.472  ns/op
RegionMatchesIC.Mixed.regionMatchesIC   utf16-mismatch    1024  avgt   15     14.796 ±   0.359  ns/op



PR:


Benchmark                                 (codePoints)  (size)  Mode  Cnt      Score     Error  Units
RegionMatchesIC.Mixed.regionMatchesIC      ascii-match    1024  avgt   15   1449.499 ±  14.350  ns/op
RegionMatchesIC.Mixed.regionMatchesIC   ascii-mismatch    1024  avgt   15      3.450 ±   0.082  ns/op
RegionMatchesIC.Mixed.regionMatchesIC     number-match    1024  avgt   15    362.582 ±   2.963  ns/op
RegionMatchesIC.Mixed.regionMatchesIC  number-mismatch    1024  avgt   15      3.259 ±   0.021  ns/op
RegionMatchesIC.Mixed.regionMatchesIC       lat1-match    1024  avgt   15   1625.513 ±  14.305  ns/op
RegionMatchesIC.Mixed.regionMatchesIC    lat1-mismatch    1024  avgt   15      3.858 ±   0.027  ns/op
RegionMatchesIC.Mixed.regionMatchesIC      utf16-match    1024  avgt   15   1422.722 ±  85.581  ns/op
RegionMatchesIC.Mixed.regionMatchesIC   utf16-mismatch    1024  avgt   15      3.756 ±   0.089  ns/op

-

PR: https://git.openjdk.org/jdk/pull/12637


RFR: 8302872: Speed up StringLatin1.regionMatchesCI_UTF16

2023-02-28 Thread Eirik Bjorsnos
This PR continues the efforts from #12632 to speed up case-insensitive string 
matching.

We now tackle case-insensitive comparison of mixed-coder strings, implemented 
in `StringLatin1.regionMatchesCI_UTF16`

Key insights:

- If the UTF16 code point is also in latin1 range, we can leverage improvements 
from 12632 directly by calling `CharacterDataLatin1.equalsIgnoreCase`
- There are exactly 7 non-latin1 Unicode code points which case fold into the 
latin1 range. We can special-case our comparison of these code points by adding 
the method `CharacterDataLatin1.latin1CaseFold`.
- To avoid checking `a == b` twice, this check is lifted out of 
`CharacterDataLatin1.equalsIgnoreCase` and the two callers are updated to check 
that `a != b` before calling the method.
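
The three points above can be sketched roughly as follows. This is only an illustration of the control flow, not the code in the PR: `MixedCoderSketch`, `fold`, `latin1EqualsIgnoreCase` and `matches` are hypothetical names, and the fold is computed with `Character.toUpperCase`/`Character.toLowerCase` (the per-char rule documented for `String.regionMatches`) instead of the specialized lookups the PR adds in `CharacterDataLatin1`.

```java
import java.nio.charset.StandardCharsets;

public class MixedCoderSketch {

    // Per-char rule documented for String.regionMatches with ignoreCase == true.
    static char fold(char c) {
        return Character.toLowerCase(Character.toUpperCase(c));
    }

    // Hypothetical stand-in for CharacterDataLatin1.equalsIgnoreCase.
    // Precondition (hoisted to the caller): a != b, both in latin1 range.
    static boolean latin1EqualsIgnoreCase(char a, char b) {
        return fold(a) == fold(b);
    }

    // Compares a latin1-coded region against a UTF16-coded region of equal length.
    static boolean regionMatchesCI(byte[] latin1, char[] utf16) {
        if (latin1.length != utf16.length) {
            return false;
        }
        for (int i = 0; i < latin1.length; i++) {
            char a = (char) (latin1[i] & 0xFF);
            char b = utf16[i];
            if (a == b) {
                continue;                       // equality is checked once, here
            }
            if (b <= 0xFF) {
                if (!latin1EqualsIgnoreCase(a, b)) {
                    return false;               // both chars in latin1 range
                }
            } else if (fold(a) != fold(b)) {
                return false;                   // non-latin1 char that does not fold onto a
            }
        }
        return true;
    }

    // Convenience wrapper for experiments; assumes the first argument is latin1-encodable.
    static boolean matches(String latin1, String utf16) {
        return regionMatchesCI(latin1.getBytes(StandardCharsets.ISO_8859_1),
                               utf16.toCharArray());
    }

    public static void main(String[] args) {
        // KELVIN SIGN (U+212A) lowercases to ASCII 'k'
        System.out.println(matches("kelvin", "\u212AELVIN")); // true
        // GREEK CAPITAL DELTA has no latin1 counterpart
        System.out.println(matches("abc", "ab\u0394"));       // false
    }
}
```

In this sketch both branches end up calling the same `fold`; the branch is kept only to show where a latin1-only fast path and a special case for the few non-latin1 code points would sit.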
 
For completeness, the RegionMatches test is updated to also compare Turkic 
dotted/dotless 'i's against the uppercase ASCII 'I', not just the lowercase 
one. This is not strictly related to the purpose of this PR, but it did help 
catch a regression introduced in an earlier iteration of the PR.

To guard against regressions caused by future changes to the set of Unicode 
code points folding into latin1, a new test is added to `EqualsIgnoreCase` 
which identifies all such code points and verifies they are compared correctly.
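
A scan of this kind could look roughly like the sketch below. It is not the test added to `EqualsIgnoreCase`; `Latin1FoldScan`, `fold`, `scan` and `latin1Partner` are hypothetical names, and the sketch only covers the BMP (`char` values). Which code points it reports depends on the Unicode data of the running JDK, so it prints what it finds rather than assuming a fixed set.

```java
import java.util.ArrayList;
import java.util.List;

public class Latin1FoldScan {

    // Per-char rule documented for String.regionMatches with ignoreCase == true.
    static char fold(char c) {
        return Character.toLowerCase(Character.toUpperCase(c));
    }

    // Returns every char above 0xFF that equals some latin1 char ignoring case.
    static List<Character> scan() {
        // Mark every fold value that is reachable from a latin1 char.
        boolean[] reachable = new boolean[Character.MAX_VALUE + 1];
        for (int c = 0; c <= 0xFF; c++) {
            reachable[fold((char) c)] = true;
        }
        List<Character> found = new ArrayList<>();
        for (int c = 0x100; c <= Character.MAX_VALUE; c++) {
            if (reachable[fold((char) c)]) {
                found.add((char) c);
            }
        }
        return found;
    }

    // Returns one latin1 char with the same fold as c, or -1 if there is none.
    static int latin1Partner(char c) {
        for (int p = 0; p <= 0xFF; p++) {
            if (fold((char) p) == fold(c)) {
                return p;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        for (char c : scan()) {
            int p = latin1Partner(c);
            // Cross-check against the public API on a mixed-coder pair.
            boolean same = String.valueOf(c).equalsIgnoreCase(String.valueOf((char) p));
            System.out.printf("U+%04X matches U+%04X (String.equalsIgnoreCase: %b)%n",
                              (int) c, p, same);
        }
    }
}
```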

Performance is tested for matching and mismatching cases of selected code point 
pairs picked from the ASCII letter, ASCII number, latin1 letter and non-latin 
Unicode letter ranges. Results in the first comment below.

-

Commit messages:
 - Inline local variable
 - latin1CaseFold was moved to CharacterDataLatin1
 - Move latin1CaseFold to CharacterDataLatin1
 - Improve latin1CaseFold javadocs
 - Simplify comments
 - Prefer fast matching by comparing for equality before checking latin1 range
 - Improve Javadocs of latin1CaseFold
 - Be consistent in comments
 - CharacterData.latin1LowerCase was renamed to latin1CaseFold
 - Hoist equality check out of CharacterDataLatin1.equalsIgnoreCase
 - ... and 13 more: https://git.openjdk.org/jdk/compare/f2b03f9a...92755920

Changes: https://git.openjdk.org/jdk/pull/12637/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12637&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8302872
  Stats: 169 lines in 5 files changed: 155 ins; 2 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/12637.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12637/head:pull/12637

PR: https://git.openjdk.org/jdk/pull/12637


Re: Speed up StringLatin1.regionMatchesCI_UTF16

2023-02-20 Thread Claes Redestad
RFE filed: https://bugs.openjdk.org/browse/JDK-8302872

/Claes

On 18 Feb 2023, at 19:58, Eirik Bjørsnøs <eir...@gmail.com> wrote:

Hi,

This PR continues the effort to speed up case-insensitive string comparisons, 
this time tackling comparison of latin1-coded strings with utf16-coded strings:

https://github.com/openjdk/jdk/pull/12637

This builds on top of #12632, so it makes sense to review that one first.

Thanks,
Eirik.



Speed up StringLatin1.regionMatchesCI_UTF16

2023-02-18 Thread Eirik Bjørsnøs
Hi,

This PR continues the effort to speed up case-insensitive string
comparisons, this time tackling comparison of latin1-coded strings with
utf16-coded strings:

https://github.com/openjdk/jdk/pull/12637

This builds on top of #12632, so it makes sense to review that one first.

Thanks,
Eirik.