On Fri, 26 Jan 2024 15:06:50 GMT, Jim Laskey <jlas...@openjdk.org> wrote:
>> Currently String::translateEscapes does not support unicode escapes,
>> reported as a IllegalArgumentException("Invalid escape sequence: ...").
>> String::translateEscapes should translate unicode escape sequences to
>> provide full coverage,
>
> Jim Laskey has updated the pull request with a new target base due to a merge
> or a rebase. The incremental webrev excludes the unrelated changes brought in
> by the merge/rebase. The pull request contains 12 additional commits since
> the last revision:
>
>  - Merge remote-tracking branch 'upstream/master' into 8263261
>  - Update unicode to Unicode
>  - Requested changes
>  - Update String.java
>  - Requested changes
>  - Update Copyright
>  - Update copyright year of test
>  - Add JLS Unicode Escapes reference
>  - Update comment
>  - Update copyright year
>  - ... and 2 more: https://git.openjdk.org/jdk/compare/b94b04ff...040bda82

src/java.base/share/classes/java/lang/String.java line 4229:

> 4227:      *     <th scope="row">{@code \u005Cu...uXXXX}</th>
> 4228:      *     <td>Unicode escape</td>
> 4229:      *     <td>single UTF-16 code unit equivalent</td>

The `...` makes it less clear what is being shown.
It might be clearer to include the XXXX in the resulting value and drop the multiple `u` case.

src/java.base/share/classes/java/lang/String.java line 4245:

> 4243:      * escape sequences and Unicode escapes are translated as encountered in one pass and
> 4244:      * <strong>not</strong> done as an Unicode escapes pass followed by an escape sequences
> 4245:      * pass.

I would move the description of the compiler behavior to the end and remove "also".
For example,

Suggestion:

     * @implNote As a convenience for use with constructed
     * strings, this method translates Unicode escapes. For example, this
     * method could be used when ASCII encoded text files need to maintain Unicode
     * content. The translation is done in a single pass and is non-recursive. That is,
     * escape sequences and Unicode escapes are translated as encountered in one pass and
     * <strong>not</strong> done as an Unicode escapes pass followed by an escape sequences
     * pass. By comparison, the compiler translates all Unicode escapes before string
     * literals are translated.

test/jdk/java/lang/String/TranslateEscapes.java line 97:

> 95:         verifyUnicodeEscape("\\u2022", "\u2022");
> 96:         verifyUnicodeEscape("\\ud83c\\udf09", "\ud83c\udf09");
> 97:         verifyUnicodeEscape("\\uuuuu2022", "\uuuuu2022");

Include the code from the example as a test case too.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17491#discussion_r1467892757
PR Review Comment: https://git.openjdk.org/jdk/pull/17491#discussion_r1467895901
PR Review Comment: https://git.openjdk.org/jdk/pull/17491#discussion_r1467900516
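
To make the single-pass versus two-pass distinction in the suggested @implNote concrete, here is a minimal sketch. It is not the JDK implementation: the class `EscapePassSketch` and the methods `unicodePass` and `escapePass` are hypothetical names invented for illustration, and only a small subset of escapes (newline, tab, doubled backslash, Unicode escapes) is handled. The input is the seven characters `\u005Cn`. A single pass turns the Unicode escape into a backslash and then copies the `n`, while a Unicode pass followed by an escape pass ends up with a newline.

```java
public class EscapePassSketch {

    // Reads four hex digits starting at index i and returns the UTF-16 code unit.
    private static char hex4(String s, int i) {
        if (i + 4 > s.length()) {
            throw new IllegalArgumentException("Truncated Unicode escape");
        }
        int value = 0;
        for (int k = 0; k < 4; k++) {
            int digit = Character.digit(s.charAt(i + k), 16);
            if (digit < 0) {
                throw new IllegalArgumentException("Invalid Unicode escape");
            }
            value = (value << 4) | digit;
        }
        return (char) value;
    }

    // Skips any additional 'u' characters and returns the index of the first hex digit.
    private static int skipUs(String s, int i) {
        while (i < s.length() && s.charAt(i) == 'u') {
            i++;
        }
        return i;
    }

    // Unicode-only pass: Unicode escapes become code units; everything else,
    // including other backslash pairs, is copied through unchanged.
    static String unicodePass(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i++);
            if (c == '\\' && i < s.length()) {
                char next = s.charAt(i++);
                if (next == 'u') {
                    i = skipUs(s, i);
                    out.append(hex4(s, i));
                    i += 4;
                } else {
                    out.append(c).append(next);
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Escape pass: handles newline, tab and doubled-backslash escapes.
    // Unicode escapes are translated only when unicodeToo is true.
    static String escapePass(String s, boolean unicodeToo) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i++);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i == s.length()) {
                throw new IllegalArgumentException("Invalid escape sequence: \\");
            }
            char next = s.charAt(i++);
            if (next == 'n') {
                out.append('\n');
            } else if (next == 't') {
                out.append('\t');
            } else if (next == '\\') {
                out.append('\\');
            } else if (next == 'u' && unicodeToo) {
                i = skipUs(s, i);
                out.append(hex4(s, i));
                i += 4;
            } else {
                throw new IllegalArgumentException("Invalid escape sequence: \\" + next);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "\\u005Cn"; // seven characters: backslash, u, 0, 0, 5, C, n
        String onePass = escapePass(input, true);
        String twoPass = escapePass(unicodePass(input), false);
        System.out.println(onePass.equals("\\n")); // true: backslash followed by n
        System.out.println(twoPass.equals("\n"));  // true: a single newline
    }
}
```

The single-pass result matches the behavior the note describes for the method, and the two-pass result mirrors what the note says about the compiler, under the simplifying assumptions of this sketch.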