2010/11/16 Erik Corry <[email protected]>:
> 2010/11/15 Marc Harter <[email protected]>:
>> On Mon, 2010-11-15 at 14:06 +0100, Erik Corry wrote:
>>
>>> Your proposal seems to allow variable length lookbehind. This isn't
>>> allowed in perl as far as I know. I just tried the following:
>>
>>> perl -e '"foobarbaz" =~ /a(?<=(ob|bab))/;'
>>
>>> which gives an error on perl5. I think if we are going to allow
>>> variable length lookbehind we should first find out why they don't
>>> have it in perl. I think the implementation is a little tricky if you
>>> want to support the full regexp language in lookbehinds.
>>
>> This was not my intention. I am proposing zero-width lookbehind, which
>> would not allow for the case you specified above. I will update the
>
> The issue is not with the number of characters consumed by the
> assertion. This is indeed zero. The issue is with the width of the
> text matched by the disjunction inside the brackets. This is not any
> disjunction, but rather a restricted part of the regexp language that
> can only match a particular number of characters.
>
> It seems the .Net regexp library is able to handle arbitrary content
> in a lookbehind. It is almost the only one.
>
> See http://www.regular-expressions.info/lookaround.html#lookbehind for
> more details.
>
> We could add this feature to JS. As far as I can work out it
> presupposes the ability to reverse an arbitrary regexp and run it
> backwards (stepping back and backtracking forwards). I don't think we
> should add it accidentally though, and perhaps the proposer should be
> the first to implement it.
Don't you already have to do that to efficiently handle a regexp that
ends at the end of the input (in JS, a non-multiline $, or \z in
java.util.regex parlance)? If you have the whole input string
available in memory, and are trying to figure out whether a lookbehind
(?<=x) matches at position p, can't you just test /(?:x)$/ against the
prefix of the input of length p?

>> proposal. It is my understanding that lookahead as implemented in
>> ECMAScript also is zero-width and not variable. This is also how Perl has
>> implemented lookbehind.
>>
>> http://perldoc.perl.org/perlre.html#Extended-Patterns
>>
>> Updated Proposal:
>> https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM
>
> The issue is not that the regexp doesn't match in perl. The issue is
> that it is not compiled at all.
>
>>
>> Is there an example of a language that supports the full regexp power
>> in lookbehinds so we can look at their experiences with implementing
>> it?
>>
>> As far as I know Perl is the de facto standard.
>>
>> 2010/11/15 Marc Harter <[email protected]>:
>>> Brendan et al.,
>>>
>>> I have created a proposal for look-behind provided at this link:
>>>
>>> https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM
>>>
>>> I hope it is a format that will be helpful for discussion with TC39.
>>> Admittedly, I have never written one of these before so am completely open
>>> to any feedback or ways to improve the document from yourself or anyone
>>> else on this list.
>>>
>>> Marc
>>>
>>> On Sat, 2010-11-13 at 09:32 -0600, Marc Harter wrote:
>>>
>>> I would be game to write up a proposal for this. When would you need
>>> this by to discuss w/ TC39?
>>>
>>> Thanks for your consideration,
>>> Marc
>>>
>>> On Nov 12, 2010, at 5:04 PM, Brendan Eich <[email protected]> wrote:
>>>
>>>> On Nov 12, 2010, at 2:52 PM, Marc Harter wrote:
>>>>
>>>>> After considering all the breadth this discussion could take maybe it
>>>>> would be wise to just focus on one issue at a time. For me, the biggest
>>>>> missing feature is lookbehind. It's common to most languages
>>>>> implementing the Perl-RegExp-syntax, and it is very useful when looking
>>>>> for patterns that follow or don't follow a particular pattern. I guess
>>>>> I'm confused why lookahead made it in but not lookbehind.
>>>>
>>>> This was 1998, Netscape 4 work I did in '97 was based on Perl 4(!), but we
>>>> proposed to ECMA TC39 TG1 (the JS group -- things were different then,
>>>> including capitalization) something based on Perl 5. We didn't get
>>>> everything, and we had to rationalize some obvious quirks.
>>>>
>>>> I don't remember lookbehind (which emerged in Perl 5.005 in July '98)
>>>> being left out on purpose. Waldemar may recall more, I'd handed him the JS
>>>> keys inside netscape.com to go do mozilla.org.
>>>>
>>>> If you are game to write a proposal or mini-spec (in the style of ES5
>>>> even), let me know. I'll chat with other TC39'ers next week about this.
>>>>
>>>> /be
>>>>
>>>>> What do people
>>>>> think about including this feature?
>>>>>
>>>>> Marc
>>>>>
>>>>> On Fri, 2010-11-12 at 16:20 -0600, Marc Harter wrote:
>>>>>> I will start out with a disclaimer. I have not read both ECMAScript
>>>>>> specifications for 3 and now 5, so I admit that I am not an expert in
>>>>>> the spec itself but as a user of JavaScript, I would like to get some
>>>>>> expert discussion over this topic as proposed enhancements to the
>>>>>> RegExp engine for Harmony.
>>>>>>
>>>>>> I will start with a list of lacking features in JS as compared to Perl
>>>>>> provided by (http://www.regular-expressions.info/javascript.html):
>>>>>>
>>>>>>   * No \A or \Z anchors to match the start or end of the string.
>>>>>>     Use a caret or dollar instead.
>>>>>>   * Lookbehind is not supported at all. Lookahead is fully
>>>>>>     supported.
>>>>>>   * No atomic grouping or possessive quantifiers
>>>>>>   * No Unicode support, except for matching single characters with
>>>>>>     \uFFFF
>>>>>>   * No named capturing groups. Use numbered capturing groups
>>>>>>     instead.
>>>>>>   * No mode modifiers to set matching options within the regular
>>>>>>     expression.
>>>>>>   * No conditionals.
>>>>>>   * No regular expression comments. Describe your regular
>>>>>>     expression with JavaScript // comments instead, outside the
>>>>>>     regular expression string.
>>>>>>
>>>>>> I don't know if all of these "need" to be in the language but there
>>>>>> have been some that I have personally wanted to use:
>>>>>>
>>>>>>   * Lookbehind! ECMAScript fully supports lookahead, why not
>>>>>>     lookbehind? Seems like a big hole to me.
>>>>>>   * Named capturing groups and comments (e.g.
>>>>>>     http://xregexp.com/syntax/). Mostly I argue for this because
>>>>>>     it makes RegExp matches more self-documenting. Regular
>>>>>>     Expressions are already cryptic as it is.
>>>>>>
>>>>>> I do like some of the new flags proposed in
>>>>>> (http://xregexp.com/flags/) but personally haven't used them but maybe
>>>>>> that is something also for discussion.
>>>>>>
>>>>>> Marc Harter
>>>>>
>>>>> _______________________________________________
>>>>> es-discuss mailing list
>>>>> [email protected]
>>>>> https://mail.mozilla.org/listinfo/es-discuss
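
For concreteness, the prefix test suggested in the reply at the top of this message could be sketched roughly as follows. `lookbehindMatchesAt` is a hypothetical helper name, not part of any proposal or engine. The sketch assumes the lookbehind body is supplied as pattern source text, with no flags and no backreferences to groups outside it. It copies the prefix on every call, so it only illustrates the semantics and says nothing about how an engine would do this efficiently.

```javascript
// Hypothetical helper (an assumption for illustration only): reports whether
// the lookbehind (?<=source) would succeed at position p of input, by testing
// /(?:source)$/ against the prefix of input of length p.
function lookbehindMatchesAt(source, input, p) {
  // The non-capturing group keeps a top-level disjunction such as "ob|bab"
  // together, so that "$" applies to every alternative.
  // No "m" flag is used: "$" must mean end of input, not end of line.
  const anchored = new RegExp("(?:" + source + ")$");
  return anchored.test(input.slice(0, p));
}

// The lookbehind body from the perl example quoted above.
const input = "foobarbaz";

// Position 4: the prefix "foob" ends in "ob".
console.log(lookbehindMatchesAt("ob|bab", input, 4)); // true

// Position 5, just after the first "a": the prefix is "fooba".
console.log(lookbehindMatchesAt("ob|bab", input, 5)); // false

// Position 8, just after the second "a": the prefix is "foobarba".
console.log(lookbehindMatchesAt("ob|bab", input, 8)); // false
```

With the pattern /a(?<=(ob|bab))/, the assertion would be checked just after each "a". Neither of those prefixes ends in "ob" or "bab", so under this reading the pattern as a whole would simply fail to match "foobarbaz", which is a different outcome from perl refusing to compile it.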

