Hans Åberg wrote in <32c01fe0-b715-412c-809a-0581bf1b8...@telia.com>: |I made the most efficient regular expression algorithm possible that \ |catches all called for sub-matches, described here: |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472
You have not posted an URL to your implementation in this thread? |In short, backtracking is avoided by caches DFA states during the parsing, \ |though all can be computed in advance, and transverse in a loop, like \ |in Flex and Bison, with no stacking or function calls. DFA minimalization \ |proved to not work well with actions of subexpressions: | |Some sub-NFAs, corresponding to sub-regexes, are marked with labels, \ |and when transitioning out of one or more, all labeled matches are \ |computed using a partial RNFA, reverse NFA. | |Unicode UTF-8 and UTF-32 are treated by generating byte regular expressi\ |ons. | |In the context of the discussion here, the task is to find a suitable \ |selection of found matches. For example, backreferences would typically \ |take the last computed value of a labeled match. | |My implementation is oriented towards use with a parser, such as Bison. I am too stupid for yacc/bison lex/flex. I write parsers by hand. |Tim Shen said he would make an implementation along these lines. So \ |you might check with him for a reference implementation. Oh! It is not as if *i* would be writing a regular expression engine any time soon. I still remember, however, being a little bit strained once a QT release came over with a ~30KB (iirc) regular expression implementation. One always has an option to replace a self-written suboptimal thing with a really thought through optimized variant in the Open Source World! (Pointers to Russ Cox and Plan9 regex i have seen in the issue.) --steffen | |Der Kragenbaer, The moon bear, |der holt sich munter he cheerfully and one by one |einen nach dem anderen runter wa.ks himself off |(By Robert Gernhardt)