Re: RFR: 8302871: Speed up StringLatin1.regionMatchesCI [v10]
On Wed, 22 Feb 2023 16:25:41 GMT, Martin Buchholz wrote:

>> Eirik Bjorsnos has updated the pull request incrementally with two
>> additional commits since the last revision:
>>
>>  - Replace 'oldest ASCII trick in the book' use in toUpperCase, toLowerCase
>>    with "by removing (setting) a single bit"
>>  - Align local variable naming in toLowerCase, toUpperCase with
>>    equalsIgnoreCase by using 'lower' and 'upper'
>
> test/jdk/java/lang/String/CompactString/EqualsIgnoreCase.java line 89:
>
>> 87: for (int ab = 0; ab < 256; ab++) {
>> 88:     for (int bb = 0; bb < 256; bb++) {
>> 89:         char a = (char) ab, b = (char) bb;
>
> char is an unsigned numeric type, so cleaner is
>
>     for (char a = 0; a < 256; a++)
>         for (char b = 0; b < 256; b++)

Thanks, fixed. Might have been copied over from processing of code points in
the higher planes. Not needed here.

PR: https://git.openjdk.org/jdk/pull/12632
Re: RFR: 8302871: Speed up StringLatin1.regionMatchesCI [v10]
On Wed, 22 Feb 2023 07:11:16 GMT, Eirik Bjorsnos wrote:

>> This PR suggests we can speed up `StringLatin1.regionMatchesCI` by applying
>> 'the oldest ASCII trick in the book'.
>>
>> The new static method `CharacterDataLatin1.equalsIgnoreCase` compares two
>> latin1 bytes for equality ignoring case. `StringLatin1.regionMatchesCI` is
>> updated to use `equalsIgnoreCase`
>>
>> To verify the correctness of `equalsIgnoreCase`, a new test is added to
>> `EqualsIgnoreCase` with an exhaustive verification that all 256x256 latin1
>> code point pairs have an `equalsIgnoreCase` consistent with
>> Character.toUpperCase, Character.toLowerCase.
>>
>> Performance is tested for matching and mismatching cases of code point pairs
>> picked from the ASCII letter, ASCII number and latin1 letter ranges. Results
>> in the first comment below.
>
> Eirik Bjorsnos has updated the pull request incrementally with two additional
> commits since the last revision:
>
>  - Replace 'oldest ASCII trick in the book' use in toUpperCase, toLowerCase
>    with "by removing (setting) a single bit"
>  - Align local variable naming in toLowerCase, toUpperCase with
>    equalsIgnoreCase by using 'lower' and 'upper'

Marked as reviewed by martin (Reviewer).

test/jdk/java/lang/String/CompactString/EqualsIgnoreCase.java line 89:

> 87: for (int ab = 0; ab < 256; ab++) {
> 88:     for (int bb = 0; bb < 256; bb++) {
> 89:         char a = (char) ab, b = (char) bb;

char is an unsigned numeric type, so cleaner is

    for (char a = 0; a < 256; a++)
        for (char b = 0; b < 256; b++)

PR: https://git.openjdk.org/jdk/pull/12632
Re: RFR: 8302871: Speed up StringLatin1.regionMatchesCI [v10]
On Wed, 22 Feb 2023 07:11:16 GMT, Eirik Bjorsnos wrote:

>> This PR suggests we can speed up `StringLatin1.regionMatchesCI` by applying
>> 'the oldest ASCII trick in the book'.
>>
>> The new static method `CharacterDataLatin1.equalsIgnoreCase` compares two
>> latin1 bytes for equality ignoring case. `StringLatin1.regionMatchesCI` is
>> updated to use `equalsIgnoreCase`
>>
>> To verify the correctness of `equalsIgnoreCase`, a new test is added to
>> `EqualsIgnoreCase` with an exhaustive verification that all 256x256 latin1
>> code point pairs have an `equalsIgnoreCase` consistent with
>> Character.toUpperCase, Character.toLowerCase.
>>
>> Performance is tested for matching and mismatching cases of code point pairs
>> picked from the ASCII letter, ASCII number and latin1 letter ranges. Results
>> in the first comment below.
>
> Eirik Bjorsnos has updated the pull request incrementally with two additional
> commits since the last revision:
>
>  - Replace 'oldest ASCII trick in the book' use in toUpperCase, toLowerCase
>    with "by removing (setting) a single bit"
>  - Align local variable naming in toLowerCase, toUpperCase with
>    equalsIgnoreCase by using 'lower' and 'upper'

Marked as reviewed by redestad (Reviewer).

PR: https://git.openjdk.org/jdk/pull/12632
Re: RFR: 8302871: Speed up StringLatin1.regionMatchesCI [v10]
> This PR suggests we can speed up `StringLatin1.regionMatchesCI` by applying
> 'the oldest ASCII trick in the book'.
>
> The new static method `CharacterDataLatin1.equalsIgnoreCase` compares two
> latin1 bytes for equality ignoring case. `StringLatin1.regionMatchesCI` is
> updated to use `equalsIgnoreCase`
>
> To verify the correctness of `equalsIgnoreCase`, a new test is added to
> `EqualsIgnoreCase` with an exhaustive verification that all 256x256 latin1
> code point pairs have an `equalsIgnoreCase` consistent with
> Character.toUpperCase, Character.toLowerCase.
>
> Performance is tested for matching and mismatching cases of code point pairs
> picked from the ASCII letter, ASCII number and latin1 letter ranges. Results
> in the first comment below.

Eirik Bjorsnos has updated the pull request incrementally with two additional
commits since the last revision:

 - Replace 'oldest ASCII trick in the book' use in toUpperCase, toLowerCase
   with "by removing (setting) a single bit"
 - Align local variable naming in toLowerCase, toUpperCase with
   equalsIgnoreCase by using 'lower' and 'upper'

Changes:
  - all: https://git.openjdk.org/jdk/pull/12632/files
  - new: https://git.openjdk.org/jdk/pull/12632/files/6588ab0f..44d91544

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=12632=09
 - incr: https://webrevs.openjdk.org/?repo=jdk=12632=08-09

  Stats: 11 lines in 1 file changed: 2 ins; 0 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/12632.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12632/head:pull/12632

PR: https://git.openjdk.org/jdk/pull/12632
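
For readers following the thread without the diff open, here is a minimal sketch of the single-bit trick and the exhaustive 256x256 check described above. It is not the code from the PR: the class `Latin1CaseSketch` and the helpers `referenceEqualsIgnoreCase` and `countMismatches` are hypothetical names chosen for illustration, and the reference comparison is only an assumption about what "consistent with Character.toUpperCase, Character.toLowerCase" means. The loops use `char` counters, as suggested in the review.

```java
// Illustrative sketch only; names are hypothetical and not taken from the JDK sources.
final class Latin1CaseSketch {

    /** Compares two Latin-1 bytes for equality, ignoring case. */
    static boolean equalsIgnoreCase(byte b1, byte b2) {
        if (b1 == b2) {
            return true;
        }
        // Clearing bit 0x20 maps a-z onto A-Z and 0xE0-0xFE onto 0xC0-0xDE.
        int upper = b1 & 0xDF;
        if (upper < 'A') {
            return false; // below the letter ranges: never a case pair
        }
        // Letters are A-Z or 0xC0-0xDE, excluding 0xD7 (the multiplication sign,
        // whose 0x20 counterpart 0xF7 is the division sign).
        boolean isLetter = upper <= 'Z'
                || (upper >= 0xC0 && upper <= 0xDE && upper != 0xD7);
        return isLetter && upper == (b2 & 0xDF);
    }

    /** Assumed reference semantics, built on the java.lang.Character mappings. */
    static boolean referenceEqualsIgnoreCase(char a, char b) {
        return a == b
                || Character.toUpperCase(a) == Character.toUpperCase(b)
                || Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    /** Counts Latin-1 pairs where the bit trick disagrees with the reference. */
    static int countMismatches() {
        int mismatches = 0;
        // char is unsigned, so it can serve directly as the loop counter.
        for (char a = 0; a < 256; a++) {
            for (char b = 0; b < 256; b++) {
                boolean expected = referenceEqualsIgnoreCase(a, b);
                boolean actual = equalsIgnoreCase((byte) a, (byte) b);
                if (expected != actual) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

The early `upper < 'A'` return keeps the common non-letter case cheap. The 0xD7 exclusion and the 0xDE upper bound are what keep the pairs 0xD7/0xF7 and 0xDF/0xFF from being reported as equal.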