I'll say that I have, at least a few times, wanted a way to stream input to a RegExp, to the point that I fiddled around with implementing a finite state machine.
My use cases:

1. short circuiting on nth capture
2. parsing streams

On Mon, Jul 30, 2018 at 2:39 PM, Isiah Meadows <isiahmead...@gmail.com> wrote:

> There's two things I've found that need suspendable matching:
>
> 1. Partially matching against a string, which is useful with
>    interactive form validation and such.
> 2. Pattern matching and replacement over a character stream, which is
>    useful for things like matching against files without loading the
>    entire thing into memory or easier filtering of requests.
>
> Also, it'd be nice if there was a facility to get *all* matches,
> including duplicate group matches. This is often useful for simple
> parsing, where if such support existed, you could just use a Kleene
> star instead of the standard `exec` loops (which admittedly get old).
>
> And finally, we could avoid setting regexp globals here. That would
> speed up the matcher quite a bit.
>
> So, here's my proposal:
>
> - `regexp.matcher() -> matcher` - Create a streaming regexp matcher.
> - `matcher.consume(codePoint, charSize?) -> result | undefined` -
>   Consume a Unicode code point or `-1` if no more characters exist, and
>   return a match result, `undefined` if no match occurred. `charSize` is
>   the number of bytes represented by `codePoint` (default: 1-2 if `/u`
>   is set, 1 otherwise), so it can work with other encodings flexibly.
> - `matcher.nextPossibleStart -> number` - The next possible start the
>   matcher could have, for more effective buffering and stream
>   management. This is implementation-defined, but it *must* be `-1`
>   after the matcher completes, and it *must* be within [0, N) otherwise,
>   where N is the next returned match.
> - `result.group -> string | number | undefined` - Return the group
>   index/name of the current match, or `undefined` if it's just issuing a
>   match of the global regexp.
> - `result.start -> number` - Return the matched value's start index.
> - `result.end -> number` - Return the matched value's end index.
> - This does *not* modify any globals or regexp instance members. It
>   only reads `regexp.lastIndex` on creation. (It doesn't operate on
>   strings, so it shouldn't return any it doesn't already have.)
>
> Most RegExp methods could similarly be built using this as a base: if
> they work on strings, they can iterate their code points.
>
> As for the various concerns:
>
> - Partial matching is just iterating a string's character codes and
>   seeing if the matcher ever returned non-`undefined`.
> - Streaming pattern matching is pretty obvious from just reading the API.
> - Getting all matches is just iterating the string and returning an
>   object with all the groups + strings it matched.
>
> So WDYT?
>
> /cc Mathias Bynens, since I know you're involved in this kind of
> text-heavy stuff.
>
> -----
>
> Isiah Meadows
> cont...@isiahmeadows.com
> www.isiahmeadows.com
> _______________________________________________
> es-discuss mailing list
> es-discuss@mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>
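To make the shape of the quoted proposal a bit more concrete, here is a rough sketch of how a caller might drive such a matcher over a chunked stream. Nothing here is a real API: `regexp.matcher()` does not exist today, so `createLiteralMatcher` is a hypothetical stand-in that only handles a literal pattern (via a small KMP-style state machine), and `matchChunks` is likewise made up for illustration. The sketch counts positions in code points, ignores the proposed `charSize` argument, reports non-overlapping matches, and takes one possible reading of `nextPossibleStart`.

```javascript
// Hypothetical stand-in for the proposed `regexp.matcher()`.
// Assumption: literal patterns only; a real matcher would cover full regexp syntax.
function createLiteralMatcher(literal) {
  const pattern = Array.from(literal, (ch) => ch.codePointAt(0));
  if (pattern.length === 0) throw new RangeError("literal must be non-empty");

  // KMP failure table: fail[i] is the length of the longest proper prefix
  // of pattern[0..i] that is also a suffix of it.
  const fail = new Array(pattern.length).fill(0);
  for (let i = 1, k = 0; i < pattern.length; i++) {
    while (k > 0 && pattern[i] !== pattern[k]) k = fail[k - 1];
    if (pattern[i] === pattern[k]) k++;
    fail[i] = k;
  }

  let matched = 0; // pattern code points matched so far
  let position = 0; // code points consumed so far
  let done = false;

  return {
    // Feed one code point, or -1 to signal the end of input.
    consume(codePoint) {
      if (done) return undefined;
      if (codePoint === -1) {
        done = true;
        return undefined;
      }
      while (matched > 0 && codePoint !== pattern[matched]) {
        matched = fail[matched - 1];
      }
      if (codePoint === pattern[matched]) matched++;
      position++;
      if (matched === pattern.length) {
        const result = { group: undefined, start: position - matched, end: position };
        matched = 0; // restart, so matches do not overlap
        return result;
      }
      return undefined;
    },
    // Assumed reading: earliest position a pending match could start at,
    // so a caller may discard everything buffered before it.
    get nextPossibleStart() {
      return done ? -1 : position - matched;
    },
  };
}

// Hypothetical driver: feeds chunks to the matcher and yields each match.
// Assumes chunks are split on code point boundaries.
function* matchChunks(matcher, chunks) {
  for (const chunk of chunks) {
    for (const ch of chunk) {
      const result = matcher.consume(ch.codePointAt(0));
      if (result !== undefined) yield result;
    }
  }
  matcher.consume(-1); // no more input
}

const hits = [...matchChunks(createLiteralMatcher("abab"), ["xxab", "abab", "ab"])];
// hits: [{ group: undefined, start: 2, end: 6 }, { group: undefined, start: 6, end: 10 }]
```

The first match straddles the boundary between the first two chunks, which is the case that a plain `exec` loop over individual chunks would miss. Short circuiting on the nth match falls out of the generator: the caller can simply stop pulling from `matchChunks` once it has seen enough.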