> On Sep 19, 2018, at 7:21 PM, Stuart Marks <stuart.ma...@oracle.com> wrote:
>
>
>
> On 9/18/18 10:51 AM, Jim Laskey wrote:
>> Please review the code for String::unescape. Used to translate escape
>> sequences in a string, typically in a raw string literal, into characters
>> represented by those escapes.
>> webrev: http://cr.openjdk.java.net/~jlaskey/8202442/webrev/index.html
>> jbs: https://bugs.openjdk.java.net/browse/JDK-8202442
>> csr: https://bugs.openjdk.java.net/browse/JDK-8202443
>
> Hi Jim,
>
> For citing the JLS, there's a @jls javadoc tag that you might want to use.
> There are a couple usages elsewhere in String.java already.
Will add.
>
> Is there going to be an escape() method that does the inverse of this? I
> thought that this was part of your original suite of string enhancements.
> Will this be proposed separately, or is it unnecessary?
The general feeling is that it is unnecessary. The inverse method is also
fraught with danger; there are too many decision points for various
characters. For example, does '\r' translate to '\r', '\015', or '\u000D'?
Does '\0' translate to '\0' or '\u0000'?
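
To make the ambiguity concrete, here is a rough sketch that lists the
spellings an escape() would have to choose between for a single char. It is
not proposed API: the class and method names are made up for illustration,
and it only covers the named, octal, and Unicode escape forms.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeChoices {
    // Hypothetical helper: returns every escape spelling this sketch knows
    // for one char, in the order named, octal, Unicode escape.
    static List<String> spellings(char c) {
        List<String> out = new ArrayList<>();
        // Named escapes exist only for a handful of control characters.
        switch (c) {
            case '\b': out.add("\\b"); break;
            case '\t': out.add("\\t"); break;
            case '\n': out.add("\\n"); break;
            case '\f': out.add("\\f"); break;
            case '\r': out.add("\\r"); break;
            default: break;
        }
        // Octal escapes cover only the values 0 through 0377 (255 decimal).
        if (c <= 0377) {
            out.add("\\" + Integer.toOctalString(c));
        }
        // The four-hex-digit Unicode escape form can spell any char value.
        out.add(String.format("\\u%04X", (int) c));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(spellings('\r'));
        System.out.println(spellings('\0'));
    }
}
```

For '\r' this lists \r, \15 and \u000D; for '\0' it lists \0 and \u0000.
Every one of those round-trips through unescape to the same char, so the
inverse would have to pick one by fiat.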
>
>
>  * Each unicode escape in the form \unnnn is translated to the
>  * unicode character whose code point is {@code 0xnnnn}. Care should be
>  * taken when using UTF-16 surrogate pairs to ensure that the high
>  * surrogate (U+D800..U+DBFF) is immediately followed by a low surrogate
>  * (U+DC00..U+DFFF) otherwise a
>  * {@link java.nio.charset.CharacterCodingException} may occur during UTF-8
>  * decoding.
>
>
> I know you're going to update this based on Naoto's comments, but I'd suggest
> rethinking this section. The \unnnn construct is called a "Unicode escape"
> per JLS 3.3, but how it's handled has little to do with Unicode. The nnnn
> digits are simply translated into a 16-bit 'char' value. Any such value will
> work, even if it's an invalid UTF-16 code unit (such as 0xFFF0) or an
> unpaired surrogate.
>
> I believe this is consistent with the JLS treatment of \unnnn.
>
> It might be sufficient to say that \unnnn is translated into a 16-bit 'char'
> value, and leave it at that.
Sure.
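
For reference, a minimal sketch of the "just a 16-bit char" reading. This is
not the code in the webrev; the class and helper names are hypothetical, and
the helper takes only the four hex digits that follow the backslash and 'u'.

```java
public class UnicodeEscapeSketch {
    // Hypothetical helper: folds four hex digits into a single char value.
    // No Unicode validation is done, so surrogates pass through untouched.
    static char fromHexDigits(String nnnn) {
        if (nnnn.length() != 4) {
            throw new IllegalArgumentException("expected four hex digits: " + nnnn);
        }
        int value = 0;
        for (int i = 0; i < 4; i++) {
            // Character.digit returns -1 when the char is not a hex digit.
            int digit = Character.digit(nnnn.charAt(i), 16);
            if (digit < 0) {
                throw new IllegalArgumentException("not a hex digit: " + nnnn.charAt(i));
            }
            value = (value << 4) | digit;
        }
        // Four hex digits are at most 0xFFFF, so the cast never truncates.
        return (char) value;
    }

    public static void main(String[] args) {
        // An unpaired high surrogate is still a perfectly storable char.
        char lone = fromHexDigits("D800");
        System.out.println(Character.isHighSurrogate(lone));
        System.out.println(String.valueOf(lone).length());
        System.out.println(Integer.toHexString(fromHexDigits("FFF0")));
    }
}
```

Under this reading the javadoc has nothing to say about surrogate pairing;
whether a later encoding step objects to an unpaired surrogate is a separate
matter from what unescape produces.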
>
> s'marks