According to the ES5 spec, a regular expression such as /[\w-_]/ should generate
a syntax error.  Unfortunately, there appears to be a significant quantity of
existing code that will break if this behavior is implemented (I have been
experimenting with bringing WebKit's RegExp implementation into closer
conformance to the spec), and other implementations commonly appear to ignore
this error.

The parsing of this expression matches a single NonemptyClassRanges of the form 
"ClassAtom - ClassAtom", where the first ClassAtom is a CharacterClassEscape 
and the second a SourceCharacter.  Per section 15.10.2.15 of the spec this 
calls CharacterRange, resulting in this syntax error:

        1. If A does not contain exactly one character or B does not contain
           exactly one character then throw a SyntaxError exception.

I'd like to propose a minimal change to hopefully allow implementations to come 
into line with the spec, without breaking the web.  I'd suggest changing the 
first step of CharacterRange to instead read:

        1. If A does not contain exactly one character or B does not contain
           exactly one character then create a CharSet AB containing the union
           of the CharSets A and B, and return the union of CharSet AB and the
           CharSet containing the one character -.

This is roughly equivalent to implicitly escaping the hyphen in any invalid 
range*, so /[\w-_]/ is treated as /[\w\-_]/.  I believe this change would have 
a low impact on the spec, that it should be feasible for implementors to easily 
adopt this behavior, and that this should commonly be compatible with existing 
code that is currently not spec compliant.
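
To make the proposed step concrete, here is a rough sketch in JavaScript.  It
is only an illustration, not engine code: CharSets are modelled as Sets of
one-character strings, the names charSetUnion, inclusiveRange,
characterRangeES5, characterRangeProposed and wordChars are made up for this
example, and it uses Set and spread syntax that are newer than ES5.

```javascript
// Assumption of this sketch: a CharSet is a Set of one-character strings.
function charSetUnion(a, b) {
  return new Set([...a, ...b]);
}

// Remaining steps of CharacterRange: the inclusive range between the single
// characters held by A and B.
function inclusiveRange(A, B) {
  const i = [...A][0].charCodeAt(0);
  const j = [...B][0].charCodeAt(0);
  if (i > j) throw new SyntaxError("range out of order in character class");
  const result = new Set();
  for (let code = i; code <= j; code++) {
    result.add(String.fromCharCode(code));
  }
  return result;
}

// Step 1 as currently written in 15.10.2.15: an invalid range is an error.
function characterRangeES5(A, B) {
  if (A.size !== 1 || B.size !== 1) {
    throw new SyntaxError("invalid range in character class");
  }
  return inclusiveRange(A, B);
}

// Step 1 as proposed: an invalid range becomes the union of A, B and "-".
function characterRangeProposed(A, B) {
  if (A.size !== 1 || B.size !== 1) {
    const AB = charSetUnion(A, B);
    return charSetUnion(AB, new Set(["-"]));
  }
  return inclusiveRange(A, B);
}

// Stand-in for the CharSet produced by \w (ASCII letters, digits, underscore).
const wordChars = new Set(
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"
);

// The range in /[\w-_]/: A is the \w CharSet, B is the CharSet {"_"}.
const proposed = characterRangeProposed(wordChars, new Set(["_"]));
console.log(proposed.has("-")); // true
console.log(proposed.has("a")); // true
```

With the same arguments characterRangeES5 throws a SyntaxError, while a valid
range such as a-c yields the same CharSet from both functions.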

many thanks,
Gavin



[ * However, this is not exactly equivalent to treating the hyphen in an invalid
range as having been escaped.  Consider /[\d-a-z]/.  Escaping the hyphen in
the invalid range would give the expression /[\d\-a-z]/, in which case a-z 
would be matched as a CharacterRange.  This would arguably be a more intuitive 
interpretation of the expression, but changing the language to match this would 
require a more intrusive change to the grammar, which I'm assuming would not be 
desirable. ]
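
The difference described in the footnote can be seen with two expressions that
are already valid today.  This is a small sketch; the variable names are
illustrative only, and the second expression is my spelling-out of what the
proposed step 1 would produce for /[\d-a-z]/ (\d, a, "-" and z as individual
members, with no a-z range).

```javascript
// Hyphen in the invalid range escaped: a-z is still matched as a CharacterRange.
const escapedHyphen = /[\d\-a-z]/;

// Reading under the proposed step 1: digits, "a", "-" and "z" only.
const proposedReading = /[\d\-a\-z]/;

const samples = ["5", "-", "a", "z", "m"];
const differing = samples.filter(function (s) {
  return escapedHyphen.test(s) !== proposedReading.test(s);
});

console.log(differing); // ["m"]
```

Only characters strictly between a and z, such as "m", tell the two
interpretations apart.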

_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss