Re: RFR - JDK-8202442 - String::unescape (Code Review)

Jim Laskey Thu, 20 Sep 2018 05:52:01 -0700

> On Sep 19, 2018, at 7:21 PM, Stuart Marks <stuart.ma...@oracle.com> wrote:
> 
> 
> 
> On 9/18/18 10:51 AM, Jim Laskey wrote:
>> Please review the code for String::unescape. It is used to translate
>> escape sequences in a string, typically in a raw string literal, into the
>> characters represented by those escapes.
>> webrev: http://cr.openjdk.java.net/~jlaskey/8202442/webrev/index.html
>> jbs: https://bugs.openjdk.java.net/browse/JDK-8202442
>> csr: https://bugs.openjdk.java.net/browse/JDK-8202443
> 
> Hi Jim,
> 
> For citing the JLS, there's a @jls javadoc tag that you might want to use. 
> There are a couple usages elsewhere in String.java already.

Will add.

> 
> Is there going to be an escape() method that does the inverse of this? I 
> thought that this was part of your original suite of string enhancements. 
> Will this be proposed separately, or is it unnecessary?

The general feeling is that it is unnecessary. The inverse method is also
fraught with danger; there are too many decision points for various
characters. For example, does '\r' translate to '\r', '\015', or '\u000D'?
Does '\0' translate to '\0' or '\u0000'?
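
To make the ambiguity concrete, here is a minimal sketch. The class and
method names are made up for illustration and are not part of the proposed
API. It shows three source spellings that all denote the same carriage
return character, so an escape() method would have to pick one arbitrarily.

```java
import java.util.List;

public class EscapeChoices {
    // Hypothetical helper: candidate spellings an inverse method could emit
    // for the single carriage return character (code unit 0x000D).
    static List<String> spellingsOfCarriageReturn() {
        return List.of("\\r", "\\015", "\\u000D");
    }

    // The simple escape, the octal escape, and the raw 16-bit value all
    // denote the same char, so nothing in the char itself favors one spelling.
    static boolean allDenoteSameChar() {
        char fromSimpleEscape = '\r';
        char fromOctalEscape = '\015';
        char fromCodeUnit = (char) 0x000D;
        return fromSimpleEscape == fromOctalEscape
                && fromOctalEscape == fromCodeUnit;
    }

    public static void main(String[] args) {
        System.out.println(allDenoteSameChar());
        System.out.println(spellingsOfCarriageReturn());
    }
}
```

The same question arises for the NUL character and for most control
characters, which is why the choice would need to be a documented policy
rather than an obvious default.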

> 
> 
>      * Each unicode escape in the form \unnnn is translated to the
>      * unicode character whose code point is {@code 0xnnnn}. Care should be
>      * taken when using UTF-16 surrogate pairs to ensure that the high
>      * surrogate (U+D800..U+DBFF) is immediately followed by a low surrogate
>      * (U+DC00..U+DFFF) otherwise a
>      * {@link java.nio.charset.CharacterCodingException} may occur during UTF-8
>      * decoding.
> 
> 
> I know you're going to update this based on Naoto's comments, but I'd suggest 
> rethinking this section. The \unnnn construct is called a "Unicode escape" 
> per JLS 3.3, but how it's handled has little to do with Unicode. The nnnn 
> digits are simply translated into a 16-bit 'char' value. Any such value will 
> work, even if it's an invalid UTF-16 code unit (such as 0xFFF0) or an 
> unpaired surrogate.
> 
> I believe this is consistent with the JLS treatment of \unnnn.
> 
> It might be sufficient to say that \unnnn is translated into a 16-bit 'char' 
> value, and leave it at that.

Sure.
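
A minimal sketch of that reading follows. It assumes the four hex digits
are simply parsed into one 16-bit char, and the helper name toChar is
hypothetical, not taken from the webrev. An unpaired high surrogate goes
through unchanged and occupies one char in the resulting string.

```java
public class UnicodeEscapeSketch {
    // Hypothetical helper: converts the four hex digits of a Unicode escape
    // into a single 16-bit char. No surrogate pairing is checked.
    static char toChar(String fourHexDigits) {
        return (char) Integer.parseInt(fourHexDigits, 16);
    }

    public static void main(String[] args) {
        // D800 is a high surrogate; here it is not followed by a low one.
        char unpairedHigh = toChar("D800");
        String s = "a" + unpairedHigh + "b";

        // The string holds three 16-bit chars; the middle one is the
        // unpaired surrogate, stored as-is.
        System.out.println(s.length());
        System.out.println(Character.isHighSurrogate(s.charAt(1)));
    }
}
```

Any problem with such a value would only surface later, for example when
the string is encoded to UTF-8, not at the point of translation.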

> 
> s'marks