> On 24 Sep 2024, at 00:39, Steffen Nurpmeso <[email protected]> wrote: > > Hans Åberg wrote in > <[email protected]>: > |I made the most efficient regular expression algorithm possible that \ > |catches all called for sub-matches, described here: > |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472 > > You have not posted an URL to your implementation in this thread?
It is not published yet. I wrote it back then, and is now seeing if I can integrate into a program I am writing. > |In short, backtracking is avoided by caches DFA states during the parsing, \ > |though all can be computed in advance, and transverse in a loop, like \ > |in Flex and Bison, with no stacking or function calls. DFA minimalization \ > |proved to not work well with actions of subexpressions: > | > |Some sub-NFAs, corresponding to sub-regexes, are marked with labels, \ > |and when transitioning out of one or more, all labeled matches are \ > |computed using a partial RNFA, reverse NFA. > | > |Unicode UTF-8 and UTF-32 are treated by generating byte regular expressi\ > |ons. > | > |In the context of the discussion here, the task is to find a suitable \ > |selection of found matches. For example, backreferences would typically \ > |take the last computed value of a labeled match. > | > |My implementation is oriented towards use with a parser, such as Bison. > > I am too stupid for yacc/bison lex/flex. I write parsers by hand. In the context of discussions here, features such as backreferences might be suitable as a replacement for a parser generator, like in the example of scanning HTML code given here: https://www.regular-expressions.info/backref.html There I would prefer a parser generator that admits more advanced grammars than regular expressions, though I am not sure how well such works with HTML. Handwritten parsers might be preferred if the grammar is well known, such in a compiler for a well-known programming language. > |Tim Shen said he would make an implementation along these lines. So \ > |you might check with him for a reference implementation. > > Oh! It is not as if *i* would be writing a regular expression > engine any time soon. I still remember, however, being a little > bit strained once a QT release came over with a ~30KB (iirc) > regular expression implementation. One always has an option to > replace a self-written suboptimal thing with a really thought > through optimized variant in the Open Source World! > (Pointers to Russ Cox and Plan9 regex i have seen in the issue.) There is an interesting one here: https://www.genivia.com/doc/reflex/html/ https://en.wikipedia.org/wiki/RE/flex Not to be confused with: https://invisible-island.net/reflex/
