I was thinking in terms of what hooks need overridden (none for the base proposal), not where the method itself lived. -----
Isiah Meadows cont...@isiahmeadows.com www.isiahmeadows.com On Mon, Jul 30, 2018 at 5:44 PM, Jordan Harband <ljh...@gmail.com> wrote: > (It'd have to be a subclass for `RegExp.prototype.whatever.call` to do the > right thing for your alternative regex) > > On Mon, Jul 30, 2018 at 12:36 PM, Isiah Meadows <isiahmead...@gmail.com> > wrote: >> >> Not yet, but if I were, it wouldn't be a subclass (it's not >> necessary). But the key trouble implementing this is that I'd have to >> reimplement the regexp matching logic entirely from scratch if I want >> remotely reasonable runtime complexity. As for why: >> >> - If you ignore backreferences (which IIUC makes JS regexps go from >> regular to context-sensitive recognizers), it's *possible* to >> implement partial matching using regexp rewriting, but only because JS >> doesn't have any other arcane enough features to make it infeasible. >> - There is no possible way to extract duplicate matches without >> rewriting the regexp entirely and using `regexp.exec`, and the >> complexity of the logic to do this generally I suspect is NP-complete, >> maybe PSPACE-complete, and in either case, certainly infeasible when >> backreferences enter the picture. >> - It's technically possible to refactor a string for streaming, but I >> then lose the ability to discern a match from a non-match. I also have >> the same issue as per above WRT duplicate matches and partial matching >> with complexity issues. >> - If you were to combine the three, rewriting the regexp generally to >> support streaming matches, including duplicates, I suspect would >> likely be PSPACE-complete because it seems *very* similar to the first >> problem listed here [1]. >> >> \* Backreferences bring the grammatical complexity of JS regexps from >> I'm pretty sure regular to context-sensitive. >> >> Or in other words, either you control the raw matching logic yourself, >> or the polyfill runtime complexity could get absurd for this proposal. >> And implementing a pushdown state machine-based regexp engine + parser >> in JS isn't exactly something I'm willing to prototype for a strawman. >> >> [1]: >> https://en.wikipedia.org/wiki/PSPACE-complete#Regular_expressions_and_automata >> >> ----- >> >> Isiah Meadows >> cont...@isiahmeadows.com >> www.isiahmeadows.com >> >> >> On Mon, Jul 30, 2018 at 2:44 PM, Jordan Harband <ljh...@gmail.com> wrote: >> > Have you tried to implement this as a RegExp subclass, overriding all >> > the >> > necessary extension points? >> > >> > On Mon, Jul 30, 2018 at 11:39 AM, Isiah Meadows <isiahmead...@gmail.com> >> > wrote: >> >> >> >> There's two things I've found that need suspendable matching: >> >> >> >> 1. Partially matching against a string, which is useful with >> >> interactive form validation and such. >> >> 2. Pattern matching and replacement over a character stream, which is >> >> useful for things like matching against files without loading the >> >> entire thing into memory or easier filtering of requests. >> >> >> >> Also, it'd be nice if there was a facility to get *all* matches, >> >> including duplicate group matches. This is often useful for simple >> >> parsing, where if such support existed, you could just use a Kleene >> >> star instead of the standard `exec` loops (which admittedly get old). >> >> >> >> And finally, we could avoid setting regexp globals here. That would >> >> speed up the matcher quite a bit. >> >> >> >> So, here's my proposal: >> >> >> >> - `regexp.matcher() -> matcher` - Create a streaming regexp matcher. >> >> - `matcher.consume(codePoint, charSize?) -> result | undefined` - >> >> Consume a Unicode code point or `-1` if no more characters exist, and >> >> return a match result, `undefined` if no match occurred. `charSize` is >> >> the number of bytes represented by `codePoint` (default: 1-2 if `/u` >> >> is set, 1 otherwise), so it can work with other encodings flexibly. >> >> - `matcher.nextPossibleStart -> number` - The next possible start the >> >> matcher could have, for more effective buffering and stream >> >> management. This is implementation-defined, but it *must* be be `-1` >> >> after the matcher completes, and it *must* be within [0, N) otherwise, >> >> where N is the next returned match. >> >> - `result.group -> string | number | undefined` - Return the group >> >> index/name of the current match, or `undefined` if it's just issuing a >> >> match of the global regexp. >> >> - `result.start -> number` - Return the matched value's start index. >> >> - `result.end -> number` - Return the matched value's end index. >> >> - This does *not* modify any globals or regexp instance members. It >> >> only reads `regexp.lastIndex` on creation. (It doesn't operate on >> >> strings, so it shouldn't return any it doesn't already have.) >> >> >> >> Most RegExp methods could similarly be built using this as a base: if >> >> they work on strings, they can iterate their code points. >> >> >> >> As for the various concerns: >> >> >> >> - Partial matching is just iterating a string's character codes and >> >> seeing if the matcher ever returned non-`undefined`. >> >> - Streaming pattern matching is pretty obvious from just reading the >> >> API. >> >> - Getting all matches is just iterating the string and returning an >> >> object with all the groups + strings it matched. >> >> >> >> So WDYT? >> >> >> >> /cc Mathias Bynens, since I know you're involved in this kind of >> >> text-heavy stuff. >> >> >> >> ----- >> >> >> >> Isiah Meadows >> >> cont...@isiahmeadows.com >> >> www.isiahmeadows.com >> >> _______________________________________________ >> >> es-discuss mailing list >> >> es-discuss@mozilla.org >> >> https://mail.mozilla.org/listinfo/es-discuss >> > >> > > > _______________________________________________ es-discuss mailing list es-discuss@mozilla.org https://mail.mozilla.org/listinfo/es-discuss