Suspicious duplicate condition in java.util.regex.Grapheme#isExcludedSpacingMark
Hello. I found suspicious condition in the method java.util.regex.Grapheme#isExcludedSpacingMark It's detected by IntelliJ IDEA inspection 'Condition is covered by further condition' https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/regex/Grapheme.java#L157 ``` private static boolean isExcludedSpacingMark(int cp) { return cp == 0x102B || cp == 0x102C || cp == 0x1038 || cp >= 0x1062 && cp <= 0x1064 || cp >= 0x1062 && cp <= 0x106D || // <== here is the warning cp == 0x1083 || cp >= 0x1087 && cp <= 0x108C || cp == 0x108F || cp >= 0x109A && cp <= 0x109C || cp == 0x1A61 || cp == 0x1A63 || cp == 0x1A64 || cp == 0xAA7B || cp == 0xAA7D; } ``` There are 2 sub-conditions in this complex condition: cp >= 0x1062 && cp <= 0x1064 || cp >= 0x1062 && cp <= 0x106D || The second condition is _wider_ than the first one. I believe it's a bug. The second condition (according to https://www.compart.com/en/unicode/category/Mc) should look like this: cp >= 0x1067 && cp <= 0x106D || 0x1065, 0x1066 are not from the Spacing_Mark category. Andrey Turbanov
Re: Suspicious duplicate condition in java.util.regex.Grapheme#isExcludedSpacingMark
Bug submitted on your behalf. https://bugs.openjdk.java.net/browse/JDK-8273430 > On Sep 6, 2021, at 4:16 AM, Andrey Turbanov wrote: > > Hello. > I found suspicious condition in the method > java.util.regex.Grapheme#isExcludedSpacingMark > It's detected by IntelliJ IDEA inspection 'Condition is covered by > further condition' > https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/regex/Grapheme.java#L157 > > ``` > private static boolean isExcludedSpacingMark(int cp) { > return cp == 0x102B || cp == 0x102C || cp == 0x1038 || > cp >= 0x1062 && cp <= 0x1064 || > cp >= 0x1062 && cp <= 0x106D || // <== here is the warning > cp == 0x1083 || > cp >= 0x1087 && cp <= 0x108C || > cp == 0x108F || > cp >= 0x109A && cp <= 0x109C || > cp == 0x1A61 || cp == 0x1A63 || cp == 0x1A64 || > cp == 0xAA7B || cp == 0xAA7D; > } > ``` > There are 2 sub-conditions in this complex condition: > cp >= 0x1062 && cp <= 0x1064 || > cp >= 0x1062 && cp <= 0x106D || > > The second condition is _wider_ than the first one. > I believe it's a bug. The second condition (according to > https://www.compart.com/en/unicode/category/Mc) should look like this: > > cp >= 0x1067 && cp <= 0x106D || > > 0x1065, 0x1066 are not from the Spacing_Mark category. > > > Andrey Turbanov
Re: Suspicious duplicate condition in java.util.regex.Grapheme#isExcludedSpacingMark
It does look incorrect. I will take a look. Naoto On 9/6/21 12:16 AM, Andrey Turbanov wrote: Hello. I found suspicious condition in the method java.util.regex.Grapheme#isExcludedSpacingMark It's detected by IntelliJ IDEA inspection 'Condition is covered by further condition' https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/regex/Grapheme.java#L157 ``` private static boolean isExcludedSpacingMark(int cp) { return cp == 0x102B || cp == 0x102C || cp == 0x1038 || cp >= 0x1062 && cp <= 0x1064 || cp >= 0x1062 && cp <= 0x106D || // <== here is the warning cp == 0x1083 || cp >= 0x1087 && cp <= 0x108C || cp == 0x108F || cp >= 0x109A && cp <= 0x109C || cp == 0x1A61 || cp == 0x1A63 || cp == 0x1A64 || cp == 0xAA7B || cp == 0xAA7D; } ``` There are 2 sub-conditions in this complex condition: cp >= 0x1062 && cp <= 0x1064 || cp >= 0x1062 && cp <= 0x106D || The second condition is _wider_ than the first one. I believe it's a bug. The second condition (according to https://www.compart.com/en/unicode/category/Mc) should look like this: cp >= 0x1067 && cp <= 0x106D || 0x1065, 0x1066 are not from the Spacing_Mark category. Andrey Turbanov
RFR: 8273430: Suspicious duplicate condition in java.util.regex.Grapheme#isExcludedSpacingMark
The duplicate condition in this chain of expressions needs to be shrunk to drop a couple of character that are not excluded spacing marks. - Commit messages: - 8273430: Suspicious duplicate condition in java.util.regex.Grapheme#isExcludedSpacingMark Changes: https://git.openjdk.java.net/jdk/pull/5425/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5425&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273430 Stats: 10 lines in 2 files changed: 7 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5425.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5425/head:pull/5425 PR: https://git.openjdk.java.net/jdk/pull/5425
Integrated: 8273430: Suspicious duplicate condition in java.util.regex.Grapheme#isExcludedSpacingMark
On Wed, 8 Sep 2021 20:24:31 GMT, Ian Graves wrote: > The duplicate condition in this chain of expressions needs to be shrunk to > drop a couple of character that are not excluded spacing marks. This pull request has now been integrated. Changeset: 3d9dc8f8 Author:Ian Graves URL: https://git.openjdk.java.net/jdk/commit/3d9dc8f824abf597d9b28f456cfeb5af927221b8 Stats: 539 lines in 4 files changed: 147 ins; 388 del; 4 mod 8273430: Suspicious duplicate condition in java.util.regex.Grapheme#isExcludedSpacingMark Reviewed-by: naoto - PR: https://git.openjdk.java.net/jdk/pull/5425
Re: RFR: 8273430: Suspicious duplicate condition in java.util.regex.Grapheme#isExcludedSpacingMark
On Wed, 8 Sep 2021 20:24:31 GMT, Ian Graves wrote: > The duplicate condition in this chain of expressions needs to be shrunk to > drop a couple of character that are not excluded spacing marks. The copyright year in Grapheme.java should be 2021, otherwise looks good. - Marked as reviewed by naoto (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5425
Re: RFR: 8273430: Suspicious duplicate condition in java.util.regex.Grapheme#isExcludedSpacingMark [v2]
> The duplicate condition in this chain of expressions needs to be shrunk to > drop a couple of character that are not excluded spacing marks. Ian Graves has updated the pull request incrementally with one additional commit since the last revision: Refactoring test to whitebox - Changes: - all: https://git.openjdk.java.net/jdk/pull/5425/files - new: https://git.openjdk.java.net/jdk/pull/5425/files/ed2e4d2a..c60523af Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5425&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5425&range=00-01 Stats: 545 lines in 4 files changed: 147 ins; 395 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5425.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5425/head:pull/5425 PR: https://git.openjdk.java.net/jdk/pull/5425
Re: RFR: 8273430: Suspicious duplicate condition in java.util.regex.Grapheme#isExcludedSpacingMark [v2]
On Fri, 10 Sep 2021 20:57:34 GMT, Ian Graves wrote: >> The duplicate condition in this chain of expressions needs to be shrunk to >> drop a couple of character that are not excluded spacing marks. > > Ian Graves has updated the pull request incrementally with one additional > commit since the last revision: > > Refactoring test to whitebox Updating the tests here. Existing ones included code copy/pasted directly from the unit under test. Refactored to a whitebox-style testing that injects accessors into `java.util.regex` so we don't have to modify code in two places to keep it in sync. Also relaxed two private static methods in `java.util.regex.Grapheme` to package-private so the injected accessor class can reach them and expose them for the test. - PR: https://git.openjdk.java.net/jdk/pull/5425
Re: RFR: 8273430: Suspicious duplicate condition in java.util.regex.Grapheme#isExcludedSpacingMark [v2]
On Fri, 10 Sep 2021 20:57:34 GMT, Ian Graves wrote: >> The duplicate condition in this chain of expressions needs to be shrunk to >> drop a couple of character that are not excluded spacing marks. > > Ian Graves has updated the pull request incrementally with one additional > commit since the last revision: > > Refactoring test to whitebox Looks good. Nice cleanup! - Marked as reviewed by naoto (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5425