Fantastic! Thanks for the analysis. Looks like we should be able to compile the JavaScript regexp to a Java regexp and then use java.util.regex. I'll put that on my queue :-)
--Norris

On Apr 22, 12:32 pm, "John Cowan" <[EMAIL PROTECTED]> wrote:
> On Tue, Apr 22, 2008 at 9:52 AM, Norris Boyd <[EMAIL PROTECTED]> wrote:
> > We welcome contributions and contributors; see
> > http://developer.mozilla.org/en/docs/Rhino_Wish_List.
>
> Your list asks about ECMAscript regular expressions. As far as I can
> tell by closely comparing the 3rd Edition with the Javadoc for
> java.util.regex.Pattern (supplemented by a few experiments), they are
> a proper subset of Java regular expressions with the following three
> exceptions:
>
> Java does not support the \v escape: use \ck instead.
>
> Java does not support the \0 escape: use \x00 instead.
>
> Java does not support the \b escape within character classes: for
> [...\b...] read [...\ch...].
>
> Java also provides the following extensions over ECMAscript:
>
> Octal escapes (\0d, \0dd, \01dd)
> \a (same as \cg) and \e (same as \x1b)
> Posix, Unicode, and Java-specific character classes with \p and \P
> \A (beginning of input), \z (end of input), and \Z (end of input
> except for final line terminator)
> Possessive quantifiers ?+, *+, ++ (match as much as possible even if
> other parts fail as a result)
> \Q and \E (force all characters in between to be escaped)
> (?<=X) and (?<!X) for positive and negative lookbehind
> (?idmnsux) Turn on special matching flags
> (?idmnsux:X) Turn on special matching flags in this group
> Character class union (by concatenation) and intersection (with &&)
>
> The Java syntax for character class union and intersection provokes
> incompatible interpretations in certain cases: for example,
> [a-z&&[^d-f]] is the same as [a-cg-z] in Java (modulo locale issues),
> but in ECMAscript it should match any of a-z&^[ followed by ].
> However, this is a very improbable way of writing that regular
> expression in ECMAscript (or any non-Java regular expression
> language), so the syntax is *in effect* backward compatible.
> Likewise, [a-z[] is invalid in Java (erroneous nested character class)
> but should match any of a-z or [ in ECMAscript.
>
> --
> GMail doesn't have rotating .sigs, but you can see mine
> at http://www.ccil.org/~cowan/signatures
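
A minimal sketch of the translation step discussed in this thread, in Java. It is not Rhino code: the class name JsRegexToJava and the method translate are hypothetical, chosen only for illustration. It rewrites the three escapes listed above as unsupported, and also escapes [ and & inside a character class so that Java does not read them as nested-class or intersection syntax. One deliberate difference from the message: it emits the hex escapes \x0B and \x08 rather than \ck and \ch, on the assumption that hex escapes sidestep any sensitivity to the case of the control letter. Edge cases such as [] and [^], flags, and octal escapes are not handled.

```java
import java.util.regex.Pattern;

class JsRegexToJava {
    // Hypothetical helper, not part of Rhino or java.util.regex.
    // Rewrites an ECMAScript 3 pattern into one java.util.regex accepts.
    static String translate(String js) {
        StringBuilder out = new StringBuilder();
        boolean inClass = false; // true while inside [...]
        for (int i = 0; i < js.length(); i++) {
            char c = js.charAt(i);
            if (c == '\\' && i + 1 < js.length()) {
                char next = js.charAt(++i); // consume the escaped character
                boolean digitFollows = i + 1 < js.length()
                        && Character.isDigit(js.charAt(i + 1));
                if (next == 'v') {
                    out.append("\\x0B");        // vertical tab
                } else if (next == '0' && !digitFollows) {
                    out.append("\\x00");        // NUL
                } else if (next == 'b' && inClass) {
                    out.append("\\x08");        // backspace, only inside a class
                } else {
                    out.append('\\').append(next); // pass other escapes through
                }
            } else if (inClass && (c == '[' || c == '&')) {
                out.append('\\').append(c);     // literal in ECMAScript classes
            } else {
                if (c == '[') {
                    inClass = true;
                } else if (c == ']') {
                    inClass = false;
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] samples = { "a\\vb", "\\0", "[\\b]", "\\bword\\b", "[a-z[]" };
        for (String js : samples) {
            String java = translate(js);
            Pattern.compile(java); // throws PatternSyntaxException if invalid
            System.out.println(js + "  ->  " + java);
        }
    }
}
```

The translated string can be handed straight to Pattern.compile. A word-boundary \b outside a character class is passed through unchanged, since both languages agree on it there.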
