Re: RFR: 8247546: Pattern matching does not skip correctly over supplementary characters

Joe Wang Tue, 28 Jul 2020 10:22:14 -0700

Hi Naoto,

The new webrev looks good.


Best,
Joe

On 7/27/20 7:11 PM, [email protected] wrote:

Hi Joe,

Thank you for the review!

On 7/27/20 6:04 PM, Joe Wang wrote:
Hi Naoto,
Looks good to me, using the correct supplementary-awareimplementation is an improvement for the impl to handle the situation.
A note (probably a workaround) for the user's case, to replaceinvalid surrogate characters, he could have used Unicode escapesequences [\ud800-\udfff] instead.
Yes.
For the test, I wonder if it'd be better to combine the 1st and 2nd,3rd and 4th cases into: xxx\udca9\ud83d\udca9\ud83dyyy to representtwo lone surrogates plus a valid pair.
If we combine test strings into the one mentioned above, I think itcannot distinguish the case we wanted to test. They all match with thefirst unpaired surrogate, i.e., \udca9.
A minor comment on the comments of the test cases (inSupplementaryTestCases.txt): instead of repeating the bug id (thatis: // @bug 8247546), it may be good to provide a short note (e.g.match lone/invalid surrogates).
Right, there is no reason to leave bug id since it can be found fromhg log. Instead, I replaced them with more useful short notes.
Revised webrev:
https://cr.openjdk.java.net/~naoto/8247546/webrev.02/

Naoto
Regards,
Joe

On 7/27/20 2:18 PM, [email protected] wrote:
On 7/27/20 11:51 AM, [email protected] wrote:
Apart from the original issue, there was a bug in Range() method(Pattern.java:5795), so it is fixed along.
Created a test case for this:

--- a/test/jdk/java/util/regex/SupplementaryTestCases.txt
+++ b/test/jdk/java/util/regex/SupplementaryTestCases.txt
@@ -149,6 +149,11 @@
 \ud83d\udca9
 false 0

+// @bug 8247546
+[\x{dc00}-\x{dfff}]
+\ud83d\udca9
+false 0
+
 // use of x modifier
 \ud800\udc61bc(?x)bl\ud800\udc61h
 \ud800\udc61bcbl\ud800\udc61h
Low surrogate range check falls into using BmpCharPredicate, whichresults in the same bug. The entire webrev is also revised:
http://cr.openjdk.java.net/~naoto/8247546/webrev.01/

Naoto

Re: RFR: 8247546: Pattern matching does not skip correctly over supplementary characters

Reply via email to