2010/11/16 Mike Samuel <[email protected]>:
> 2010/11/16 Erik Corry <[email protected]>:
>> 2010/11/15 Marc Harter <[email protected]>:
>>> On Mon, 2010-11-15 at 14:06 +0100, Erik Corry wrote:
>>>
>>>> Your proposal seems to allow variable length lookbehind. This isn't
>>>> allowed in perl as far as I know. I just tried the following:
>>>
>>>> perl -e '"foobarbaz" =~ /a(?<=(ob|bab))/;'
>>>
>>>> which gives an error on perl5. I think if we are going to allow
>>>> variable length lookbehind we should first find out why they don't
>>>> have it in perl. I think the implementation is a little tricky if you
>>>> want to support the full regexp language in lookbehinds.
>>>
>>> This was not my intention. I am proposing zero-width lookbehind, which
>>> would not allow for the case you specified above. I will update the
>>
>> The issue is not with the number of characters consumed by the
>> assertion. This is indeed zero. The issue is with the width of the
>> text matched by the disjunction inside the brackets. This is not any
>> disjunction, but rather a restricted part of the regexp language that
>> can only match a particular number of characters.
>>
>> It seems the .Net regexp library is able to handle arbitrary content
>> in a lookbehind. It is almost the only one.
>>
>> See http://www.regular-expressions.info/lookaround.html#lookbehind for
>> more details.
>>
>> We could add this feature to JS. As far as I can work out it
>> presupposes the ability to reverse an arbitrary regexp and run it
>> backwards (stepping back and backtracking forwards). I don't think we
>> should add it accidentally though, and perhaps the proposer should be
>> the first to implement it.
>
> Don't you already have to do that to efficiently handle a regexp that
> ends at the end of the input (in JS, a non multiline $, or \z in
> java.util.regex parlance)?

V8 doesn't have a general form of that optimization. Do the others?

> If you have the whole input string available in memory, and are trying
> to figure out whether a lookbehind (?<=x) matches at position p, can't
> you just test /(?:x)$/ against the prefix of the input of length p.
>
>
>>> proposal. It is my understanding that lookahead as implemented in
>>> ECMAScript also is zero-width and not variable. This is also how Perl has
>>> implemented lookbehind.
>>>
>>> http://perldoc.perl.org/perlre.html#Extended-Patterns
>>>
>>> Updated Proposal:
>>> https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM
>>
>> The issue is not that the regexp doesn't match in perl. The issue is
>> that it is not compiled at all.
>>
>>>
>>> Is there an example of a language that supports the full regexp power
>>> in lookbehinds so we can look at their experiences with implementing
>>> it?
>>>
>>> As far as I know Perl is the de facto standard.
>>>
>>>
>>>
>>> 2010/11/15 Marc Harter <[email protected]>:
>>>> Brendan et al.,
>>>>
>>>> I have created a proposal for look-behind provided at this link:
>>>>
>>>>
>>>> https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM
>>>>
>>>> I hope it is a format that will be helpful for discussion with TC39.
>>>> Admittedly, I have never written one of these before so am completely open
>>>> to any feedback or ways to improve the document from yourself or anyone else
>>>> on this list.
>>>>
>>>> Marc
>>>>
>>>> On Sat, 2010-11-13 at 09:32 -0600, Marc Harter wrote:
>>>>
>>>> I would be game to write up a proposal for this. When would you need
>>>> this by to discuss w/ TC39?
>>>>
>>>> Thanks for your consideration,
>>>> Marc
>>>>
>>>> On Nov 12, 2010, at 5:04 PM, Brendan Eich <[email protected]> wrote:
>>>>
>>>>> On Nov 12, 2010, at 2:52 PM, Marc Harter wrote:
>>>>>
>>>>>> After considering all the breadth this discussion could take maybe it
>>>>>> would be wise to just focus on one issue at a time. For me, the biggest
>>>>>> missing feature is lookbehind. It's common to most languages
>>>>>> implementing the Perl-RegExp-syntax, it is very useful when looking for
>>>>>> patterns that follow or don't follow a particular pattern. I guess I'm
>>>>>> confused why lookahead made it in but not lookbehind.
>>>>>
>>>>> This was 1998, Netscape 4 work I did in '97 was based on Perl 4(!), but we
>>>>> proposed to ECMA TC39 TG1 (the JS group -- things were different then,
>>>>> including capitalization) something based on Perl 5. We didn't get
>>>>> everything, and we had to rationalize some obvious quirks.
>>>>>
>>>>> I don't remember lookbehind (which emerged in Perl 5.005 in July '98)
>>>>> being left out on purpose. Waldemar may recall more, I'd handed him the JS
>>>>> keys inside netscape.com to go do mozilla.org.
>>>>>
>>>>> If you are game to write a proposal or mini-spec (in the style of ES5
>>>>> even), let me know. I'll chat with other TC39'ers next week about this.
>>>>>
>>>>> /be
>>>>>
>>>>>
>>>>>> What do people
>>>>>> think about including this feature?
>>>>>>
>>>>>> Marc
>>>>>>
>>>>>> On Fri, 2010-11-12 at 16:20 -0600, Marc Harter wrote:
>>>>>>> I will start out with a disclaimer. I have not read both ECMAScript
>>>>>>> specifications for 3 and now 5, so I admit that I am not an expert in
>>>>>>> the spec itself but as a user of JavaScript, I would like to get some
>>>>>>> expert discussion over this topic as proposed enhancements to the
>>>>>>> RegExp engine for Harmony.
>>>>>>>
>>>>>>> I will start with a list of lacking features in JS as compared to Perl
>>>>>>> provided by (http://www.regular-expressions.info/javascript.html):
>>>>>>>
>>>>>>> * No \A or \Z anchors to match the start or end of the string.
>>>>>>>   Use a caret or dollar instead.
>>>>>>> * Lookbehind is not supported at all. Lookahead is fully
>>>>>>>   supported.
>>>>>>> * No atomic grouping or possessive quantifiers
>>>>>>> * No Unicode support, except for matching single characters with
>>>>>>>   \uFFFF
>>>>>>> * No named capturing groups. Use numbered capturing groups
>>>>>>>   instead.
>>>>>>> * No mode modifiers to set matching options within the regular
>>>>>>>   expression.
>>>>>>> * No conditionals.
>>>>>>> * No regular expression comments. Describe your regular
>>>>>>>   expression with JavaScript // comments instead, outside the
>>>>>>>   regular expression string.
>>>>>>>
>>>>>>> I don't know if all of these "need" to be in the language but there
>>>>>>> have been some that I have personally wanted to use:
>>>>>>>
>>>>>>> * Lookbehind! ECMAScript fully supports lookahead, why not
>>>>>>>   lookbehind? Seems like a big hole to me.
>>>>>>> * Named capturing groups and comments (e.g.
>>>>>>>   http://xregexp.com/syntax/). Mostly I argue for this because
>>>>>>>   it makes RegExp matches more self-documenting. Regular
>>>>>>>   Expressions are already cryptic as it is.
>>>>>>>
>>>>>>> I do like some of the new flags proposed in
>>>>>>> (http://xregexp.com/flags/) but personally haven't used them but maybe
>>>>>>> that is something also for discussion.
>>>>>>>
>>>>>>> Marc Harter
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
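
For reference, the prefix test Mike describes above (checking /(?:x)$/ against the first p characters of the input) can be sketched in JavaScript. This is only an illustration under assumptions: lookbehindAt is a hypothetical helper name, not part of V8 or of the proposal, and the sketch ignores capture groups and backreferences that reach outside the lookbehind. It also says nothing about cost, since the engine still scans the prefix from the left, which is the optimization question raised in the reply.

```javascript
// Hypothetical helper for illustration only; not an engine API.
// Decides whether a lookbehind (?<=x) would succeed at position p
// by testing /(?:x)$/ against the prefix of the input of length p.
function lookbehindAt(input, x, p) {
  // Without the "m" flag, $ matches only at the very end of the prefix.
  const anchored = new RegExp("(?:" + x + ")$");
  return anchored.test(input.slice(0, p));
}

// The alternation from Erik's perl example, on the same string.
console.log(lookbehindAt("foobarbaz", "ob|bab", 4)); // true  ("foob" ends in "ob")
console.log(lookbehindAt("foobarbaz", "ob|bab", 5)); // false ("fooba" ends in neither)
```

Because the alternation is tried against the end of the prefix, the two branches may have different lengths, which is the variable length case perl refuses to compile.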

