RE: invalid escape sequences

Dave Fugate Wed, 01 Jun 2011 09:11:02 -0700

Results for IE9 ("IE9 standards" mode) given the snippet below:
        "\r" : "ERROR"
        "\\u" : "ERROR"
        "\\x" : "ERROR"
        "\\8" : "8"
        "\\28" : "\u00028"
        "\\228" : "\u00128"
        "\\3778" : "ÿ8"
        "\\478" : "'8"
        "\\778" : "?8"

My best,

Dave

-----Original Message-----
From: [email protected] [mailto:[email protected]] On 
Behalf Of Mike Samuel
Sent: Tuesday, May 31, 2011 6:34 PM
To: es-discuss
Subject: invalid escape sequences

During the last meeting, the semantics of "\z" came up.  Specifically, what 
does \ followed by a character not in the set with a specified escape expand to?

From 7.8.4 StringLiteral

    "
    EscapeSequence :: CharacterEscapeSequence
    "

leads to

    "
    CharacterEscapeSequence :: ...
        NonEscapeCharacter

    NonEscapeCharacter :: SourceCharacter but not one of EscapeCharacter or
        LineTerminator
    "

and the semantics of NonEscapeCharacter is given thus

    "
    The CV of CharacterEscapeSequence :: NonEscapeCharacter is the CV of the
    NonEscapeCharacter.
    "
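
A quick check of that rule, as a minimal sketch (the sample characters "z" and "q" are my own choice, not from the spec text):

```javascript
// "z" and "q" are NonEscapeCharacters: neither an EscapeCharacter nor a
// LineTerminator. Under the quoted rule the CV of \z is the CV of z, so the
// backslash simply drops out of the string value.
var fromEscapes = "\z\q";
var plain = "zq";
var sameValue = (fromEscapes === plain);
```
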

so are the following assertions true?

(1)

The only SourceCharacter sequences that do not match ( DoubleStringCharacter | 
SingleStringCharacter ) applied one or more times contain one of: a 
LineTerminator not preceded by an odd number of backslashes; a "u" preceded by 
an odd number of backslashes and not followed by 4 valid hex digits; an "x" 
preceded by an odd number of backslashes and not followed by 2 valid hex 
digits; or a decimal digit preceded by an odd number of backslashes.
I.e. 
/(?:^|[^\\])(?:\\\\)*([\r\n\u2028\u2029]|\\u(?![0-9A-Fa-f]{4})|\\x(?![0-9A-Fa-f]{2})|\\[0-9])/
matches when a sequence of SourceCharacters fails to match zero or more ( 
DoubleStringCharacter | SingleStringCharacter ).
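
To make assertion (1) concrete, here is a rough sketch that applies the pattern to a few literal bodies. The pattern is written with its capture group closed, and the helper name passesAssertion1 is hypothetical.

```javascript
// Pattern from assertion (1). It matches when the body contains something
// that is not a valid run of string characters under 7.8.4 alone.
var invalidPerAssertion1 =
  /(?:^|[^\\])(?:\\\\)*([\r\n\u2028\u2029]|\\u(?![0-9A-Fa-f]{4})|\\x(?![0-9A-Fa-f]{2})|\\[0-9])/;

// Hypothetical helper: true when the literal body passes the test.
function passesAssertion1(body) {
  return !invalidPerAssertion1.test(body);
}

passesAssertion1("abc");       // true
passesAssertion1("\\n");       // true: \n is a SingleEscapeCharacter
passesAssertion1("\\u0041");   // true: \u followed by 4 hex digits
passesAssertion1("\\\\u");     // true: escaped backslash, then a plain u
passesAssertion1("\\u");       // false
passesAssertion1("\\x4");      // false: only 1 hex digit
passesAssertion1("\\8");       // false
passesAssertion1("a\rb");      // false: unescaped line terminator
```

As written, the pattern also flags \0, which 7.8.4 allows when no decimal digit follows, so a stricter version would probably need a carve-out for that case.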

(2)

The B.1.2 additional octal syntax, quoted below, does change the validity of 
the test above.
    "
    OctalEscapeSequence ::
        OctalDigit [lookahead not in DecimalDigit]
        ZeroToThree OctalDigit [lookahead not in DecimalDigit]
        FourToSeven OctalDigit
        ZeroToThree OctalDigit OctalDigit
    "

NonEscapeCharacter excludes DecimalDigit through EscapeCharacter, but 
OctalEscapeSequence allows [0-7].  So under B.1.2, 
/(?:^|[^\\])(?:\\\\)*([\r\n\u2028\u2029]|\\u(?![0-9A-Fa-f]{4})|\\x(?![0-9A-Fa-f]{2})|\\[89]|\\[0-3][0-7]?(?=[89])|\\[4-7](?=[89]))/
matches when a sequence of SourceCharacters fails to match zero or more ( 
DoubleStringCharacter | SingleStringCharacter ).
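
The four B.1.2 alternatives can also be checked directly, without the combined pattern. The following is a minimal sketch under my reading of the quoted grammar; the function name octalEscapeLength is hypothetical, and its argument is the text immediately after the backslash.

```javascript
// One anchored pattern per B.1.2 alternative, longest forms first.
var OCTAL_ALTERNATIVES = [
  /^[0-3][0-7][0-7]/,       // ZeroToThree OctalDigit OctalDigit
  /^[4-7][0-7]/,            // FourToSeven OctalDigit
  /^[0-3][0-7](?![0-9])/,   // ZeroToThree OctalDigit [lookahead not in DecimalDigit]
  /^[0-7](?![0-9])/         // OctalDigit [lookahead not in DecimalDigit]
];

// Returns how many characters form an OctalEscapeSequence at the start of
// afterBackslash, or 0 if no alternative applies.
function octalEscapeLength(afterBackslash) {
  for (var i = 0; i < OCTAL_ALTERNATIVES.length; ++i) {
    var m = OCTAL_ALTERNATIVES[i].exec(afterBackslash);
    if (m) { return m[0].length; }
  }
  return 0;
}

octalEscapeLength("3778");  // 3: "377" is consumed, "8" is an ordinary character
octalEscapeLength("478");   // 2
octalEscapeLength("778");   // 2
octalEscapeLength("28");    // 0: "2" is followed by a decimal digit
octalEscapeLength("228");   // 0: "22" is followed by a decimal digit
octalEscapeLength("8");     // 0: not an OctalDigit
```
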

I did some empirical testing to see what is actually allowed by running the 
below in a variety of browsers in the squarefree shell.

// Candidate string-literal bodies; each one is wrapped in quotes and eval'ed.
var notStringLiterals = [
  "\r", "\\u", "\\x", "\\8", "\\28", "\\228", "\\3778", "\\478", "\\778"
];
for (var i = 0; i < notStringLiterals.length; ++i) {
  var result;
  try {
    result = eval('"' + notStringLiterals[i] + '"');
  } catch (ex) {
    result = "ERROR";  // the interpreter rejected the literal
  }
  print(JSON.stringify(notStringLiterals[i]) + " : " + JSON.stringify(result));
}

All are invalid absent B.1.2 if the assertions above are true.  With B.1.2, 
"\3778", "\478", and "\778" are valid.

I'm having trouble running IE today, but on other browsers, in alphabetical 
order:

Chrome
"\r" : "ERROR"
"\\u" : "u"
"\\x" : "x"
"\\8" : "8"
"\\28" : "\u00028"
"\\228" : "\u00128"
"\\3778" : "ÿ8"
"\\478" : "'8"
"\\778" : "?8"

FF3
"\u000d" : "ERROR"
"\\u" : "u"
"\\x" : "x"
"\\8" : "8"
"\\28" : "\u00028"
"\\228" : "\u00128"
"\\3778" : "ÿ8"
"\\478" : "'8"
"\\778" : "?8"

Safari
"\r" : "ERROR"
"\\u" : "u"
"\\x" : "x"
"\\8" : "8"
"\\28" : "\u00028"
"\\228" : "\u00128"
"\\3778" : "ÿ8"
"\\478" : "'8"
"\\778" : "?8"

So at least 3 different interpreter strains evaluate "\u" to "u", "\x" to "x", 
and "\8" to "8", and don't care whether there is a decimal digit after an 
octal escape sequence.  All reject an unescaped carriage return in a string 
literal.
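
The behavior shared by Chrome, FF3, and Safari above can be approximated as follows. This is a sketch inferred from the nine test results only, not taken from any engine's source; the names decodeLenient, SINGLE_ESCAPES, and ESCAPE_OR_NEWLINE are hypothetical.

```javascript
var SINGLE_ESCAPES = { b: "\b", t: "\t", n: "\n", v: "\v", f: "\f", r: "\r" };

// Either a backslash followed by something, or a bare line terminator.
// Octal escapes are taken greedily, with no lookahead restriction.
var ESCAPE_OR_NEWLINE =
  /\\(?:u([0-9A-Fa-f]{4})|x([0-9A-Fa-f]{2})|([0-3][0-7]{0,2}|[4-7][0-7]?)|(\r\n|[\r\n\u2028\u2029])|([\s\S]))|([\r\n\u2028\u2029])/g;

function decodeLenient(body) {
  return body.replace(ESCAPE_OR_NEWLINE,
    function (match, unicodeHex, hex, octal, continuation, other, bareNewline) {
      if (bareNewline) { throw new SyntaxError("unescaped line terminator"); }
      if (unicodeHex) { return String.fromCharCode(parseInt(unicodeHex, 16)); }
      if (hex) { return String.fromCharCode(parseInt(hex, 16)); }
      if (octal) { return String.fromCharCode(parseInt(octal, 8)); }
      if (continuation) { return ""; }  // backslash + line terminator
      // Known single escapes decode; anything else (including a "u" or "x"
      // without enough hex digits, or "8" and "9") stands for itself.
      return SINGLE_ESCAPES.hasOwnProperty(other) ? SINGLE_ESCAPES[other] : other;
    });
}

decodeLenient("\\u");     // "u"
decodeLenient("\\28");    // "\u0002" followed by "8"
decodeLenient("\\3778");  // "ÿ8"
```

A trailing lone backslash is left untouched by this sketch; a fuller version would reject it.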

I would like to be able to specify quasiliteral literal part decoding in terms 
of the SV defined in 7.8.4.  If user code is going to have decoded literal 
parts available when they validly decode, but at least have access to the raw 
literal parts otherwise, then it would be good for the decoded parts to be 
consistently available across interpreters.  Would it be worthwhile having the 
SV and CV in 7.8.4 specify the decoding of some SourceCharacter sequences that 
can't actually reach the SV or CV via the StringLiteral production?
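
To illustrate the raw/decoded split I have in mind, here is a rough sketch. The helper literalPart and the field names raw and cooked are hypothetical, and decoding is delegated to the host engine's eval purely for illustration.

```javascript
// Pairs the raw source of a literal part with its decoded value, or with
// undefined when the host engine rejects the part as a string literal body.
function literalPart(raw) {
  // Escape bare double quotes so they cannot end the wrapping literal;
  // existing backslash pairs are passed through unchanged.
  var quoted = '"' + raw.replace(/\\[\s\S]|"/g, function (m) {
    return m === '"' ? '\\"' : m;
  }) + '"';
  var cooked;
  try {
    cooked = eval(quoted);
  } catch (ex) {
    cooked = undefined;  // only the raw text is available
  }
  return { raw: raw, cooked: cooked };
}

var ok = literalPart("a\\tb");    // ok.cooked is "a", TAB, "b"
var bad = literalPart("a\rb");    // bad.cooked is undefined, bad.raw is kept
```

For a part such as \u, the value of cooked depends on which engine runs the sketch, which is exactly the inconsistency described above.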
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss