There are two things I've found that need suspendable matching:

1. Partially matching against a string, which is useful for interactive form validation and such.
2. Pattern matching and replacement over a character stream, which is useful for things like matching against files without loading the entire thing into memory, or for easier filtering of requests.
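Here's a rough sketch of how those two cases could sit on top of the `consume`/`nextPossibleStart` interface I propose below. It rests on big assumptions: `LiteralMatcher` is a hypothetical stand-in that only matches a fixed literal string (no regexp syntax, no groups), `hasMatch` and `matchChunks` are illustrative helper names, and the default `charSize` assumes UTF-16 code units as with `/u`. None of this exists in any engine today.

```javascript
// HYPOTHETICAL: a toy matcher for a *literal* pattern, written only to
// illustrate the proposed consume()/nextPossibleStart protocol. It is not
// a regexp engine.
class LiteralMatcher {
  constructor(literal, startIndex = 0) {
    this.needle = Array.from(literal, (ch) => ch.codePointAt(0));
    this.index = startIndex; // index of the next code point to be consumed
    this.candidates = [];    // in-progress partial matches, ordered by start
    this.done = false;
  }

  // Earliest start still in play, or -1 once the input has ended.
  get nextPossibleStart() {
    if (this.done) return -1;
    return this.candidates.length > 0 ? this.candidates[0].start : this.index;
  }

  // Feed one code point (or -1 for end of input). Returns a result or undefined.
  // Assumption: default charSize counts UTF-16 code units, as with /u.
  consume(codePoint, charSize = codePoint > 0xffff ? 2 : 1) {
    if (this.done) return undefined;
    if (codePoint === -1) {
      this.done = true;
      this.candidates = [];
      return undefined;
    }
    const start = this.index;
    this.index += charSize;
    this.candidates.push({ start, matched: 0 }); // any position may begin a match
    const survivors = [];
    for (const c of this.candidates) {
      if (this.needle[c.matched] !== codePoint) continue;
      c.matched += 1;
      if (c.matched === this.needle.length) {
        // All candidates have the same length, so the first to finish is the
        // leftmost; dropping the rest keeps matches non-overlapping.
        this.candidates = [];
        return { group: undefined, start: c.start, end: this.index };
      }
      survivors.push(c);
    }
    this.candidates = survivors;
    return undefined;
  }
}

// Partial matching: iterate a string's code points and report whether the
// matcher ever returned a result.
function hasMatch(matcher, str) {
  for (const ch of str) {
    if (matcher.consume(ch.codePointAt(0)) !== undefined) return true;
  }
  return matcher.consume(-1) !== undefined;
}

// Streaming: feed chunks one at a time without joining them.
// Assumption: no chunk boundary splits a surrogate pair.
function matchChunks(matcher, chunks) {
  const results = [];
  for (const chunk of chunks) {
    for (const ch of chunk) {
      const r = matcher.consume(ch.codePointAt(0));
      if (r !== undefined) results.push(r);
    }
  }
  const last = matcher.consume(-1);
  if (last !== undefined) results.push(last);
  return results;
}

console.log(hasMatch(new LiteralMatcher("needle"), "haystack with a needle")); // true
console.log(matchChunks(new LiteralMatcher("ab"), ["xa", "bya", "b"]).map((r) => [r.start, r.end])); // [ [ 1, 3 ], [ 4, 6 ] ]
```

Note that a match spanning two chunks ("a" + "b" above) is still found, since the matcher only carries its own state between calls. A real streaming consumer would also check `nextPossibleStart` to decide how much already-read input it can discard.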
Also, it'd be nice if there were a facility to get *all* matches, including duplicate group matches. This is often useful for simple parsing: if such support existed, you could just use a Kleene star instead of the standard `exec` loops (which admittedly get old). And finally, we could avoid setting the regexp globals here, which would speed up the matcher quite a bit.

So, here's my proposal:

- `regexp.matcher() -> matcher`
  - Create a streaming regexp matcher.
- `matcher.consume(codePoint, charSize?) -> result | undefined`
  - Consume a Unicode code point, or `-1` if no more characters exist, and return a match result, or `undefined` if no match occurred.
  - `charSize` is the size of `codePoint` in the input's own units, such as bytes or UTF-16 code units (default: 1 or 2 if `/u` is set, 1 otherwise), so it can work flexibly with other encodings.
- `matcher.nextPossibleStart -> number`
  - The next possible start the matcher could have, for more effective buffering and stream management. This is implementation-defined, but it *must* be `-1` after the matcher completes, and it *must* be within [0, N) otherwise, where N is the next returned match.
- `result.group -> string | number | undefined`
  - The group index/name of the current match, or `undefined` if it's just issuing a match of the whole regexp.
- `result.start -> number`
  - The matched value's start index.
- `result.end -> number`
  - The matched value's end index.
- This does *not* modify any globals or regexp instance members. It only reads `regexp.lastIndex` on creation. (It doesn't operate on strings, so it shouldn't return any strings it doesn't already have.)

Most RegExp methods could similarly be built using this as a base: if they work on strings, they can iterate their code points.

As for the various concerns:

- Partial matching is just iterating a string's code points and seeing if the matcher ever returned non-`undefined`.
- Streaming pattern matching is pretty obvious from just reading the API.
- Getting all matches is just iterating the string and returning an object with all the groups + strings it matched.

So WDYT?

/cc Mathias Bynens, since I know you're involved in this kind of text-heavy stuff.

-----

Isiah Meadows
cont...@isiahmeadows.com
www.isiahmeadows.com

_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss