On Wed, 16 Feb 2022 18:45:29 GMT, Ian Graves <igra...@openjdk.org> wrote:
> This fixes a bug in the way CIBackRef traverses Unicode characters that
> can be variable-length. Originally it followed the approach that BackRef
> takes, but it failed to account for Unicode characters that are two chars
> long. The upper bound (groupSize) for the traversing loop is set by the
> difference between the group start and stop indexes. This works for
> single-char characters, and it also works for case-sensitive comparisons
> because char-by-char comparisons are acceptable there, but it doesn't work
> for a comparison where some kind of normalization (e.g. case folding) is
> required. This fix adjusts the upper bound of the loop that traverses the
> characters whenever a two-char character is encountered.
>
> An alternative was to determine the group size by scanning the group in
> advance and converting it to code points, but this could potentially
> result in multiple scans and code point conversions of the same matcher
> group, which could be long. The solution that adjusts the loop bounds on
> the fly avoids this case.

This pull request has now been integrated.

Changeset: 3cb38678
Author:    Ian Graves <igra...@openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/3cb38678aa7f03356421f5a17c1de4156e206d68
Stats:     25 lines in 2 files changed: 21 ins; 0 del; 4 mod

8281315: Unicode, (?i) flag and backreference throwing IndexOutOfBounds Exception

Reviewed-by: naoto

-------------

PR: https://git.openjdk.java.net/jdk/pull/7501
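
For readers who want to see the idea in isolation, here is a rough sketch of a case-insensitive back-reference comparison that shrinks its loop bound on the fly when it meets a two-char code point. It is not the code from the changeset: the class name BackRefSketch, the method matchIgnoreCase and its parameters are all made up for illustration, and the case comparison is a simplified stand-in for whatever the regex engine actually does.

```java
// Hypothetical sketch, NOT the JDK implementation. All names here are
// assumptions made for illustration only.
public final class BackRefSketch {

    /**
     * Compares the group seq[groupStart, groupEnd) against the text starting
     * at index 'at', ignoring case. Returns the index just past the matched
     * text, or -1 if the text does not match.
     */
    static int matchIgnoreCase(CharSequence seq, int groupStart, int groupEnd, int at) {
        // Initial upper bound, measured in chars (not code points).
        int groupSize = groupEnd - groupStart;
        if (at + groupSize > seq.length()) {
            return -1; // not enough input left
        }
        int x = at;          // cursor in the text being matched
        int j = groupStart;  // cursor in the captured group
        for (int index = 0; index < groupSize; index++) {
            if (x >= seq.length()) {
                return -1; // defensive guard against running off the input
            }
            int c1 = Character.codePointAt(seq, x);
            int c2 = Character.codePointAt(seq, j);
            if (c1 != c2) {
                // Simplified case-insensitive comparison on code points.
                int u1 = Character.toUpperCase(c1);
                int u2 = Character.toUpperCase(c2);
                if (u1 != u2 && Character.toLowerCase(u1) != Character.toLowerCase(u2)) {
                    return -1;
                }
            }
            int n2 = Character.charCount(c2);
            x += Character.charCount(c1);
            j += n2;
            // A two-char code point consumed two chars of the group in one
            // iteration, so one fewer iteration is needed. Without this
            // adjustment the loop would read past the end of the group.
            if (n2 > 1) {
                groupSize--;
            }
        }
        return x;
    }
}
```

With the input "\uD801\uDC00\uD801\uDC28" (DESERET CAPITAL LETTER LONG I followed by its lowercase form), group [0, 2) and a match position of 2, the unadjusted bound of 2 would attempt a second iteration at index 4 and run off the end of the input; with the adjustment the loop stops after one code point.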