Hans Åberg wrote in <33614aed-35b0-4589-9d13-8c0d42afd...@telia.com>: | |> On 24 Sep 2024, at 00:39, Steffen Nurpmeso <stef...@sdaoden.eu> wrote: |> |> Hans Åberg wrote in |> <32c01fe0-b715-412c-809a-0581bf1b8...@telia.com>: |>|I made the most efficient regular expression algorithm possible that \ |>|catches all called for sub-matches, described here: |>|https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472 |> |> You have not posted an URL to your implementation in this thread? | |It is not published yet. I wrote it back then, and is now seeing if \ |I can integrate into a program I am writing.
Well it seems that thread caused some performance boost if i understood that correctly, so.. |>|In short, backtracking is avoided by caches DFA states during the \ |>|parsing, \ |>|though all can be computed in advance, and transverse in a loop, like \ |>|in Flex and Bison, with no stacking or function calls. DFA minimalizatio\ |>|n \ |>|proved to not work well with actions of subexpressions: |>| |>|Some sub-NFAs, corresponding to sub-regexes, are marked with labels, \ |>|and when transitioning out of one or more, all labeled matches are \ |>|computed using a partial RNFA, reverse NFA. |>| |>|Unicode UTF-8 and UTF-32 are treated by generating byte regular expressi\ |>|ons. |>| |>|In the context of the discussion here, the task is to find a suitable \ |>|selection of found matches. For example, backreferences would typically \ |>|take the last computed value of a labeled match. |>| |>|My implementation is oriented towards use with a parser, such as Bison. |> |> I am too stupid for yacc/bison lex/flex. I write parsers by hand. | |In the context of discussions here, features such as backreferences \ |might be suitable as a replacement for a parser generator, like in \ |the example of scanning HTML code given here: |https://www.regular-expressions.info/backref.html | |There I would prefer a parser generator that admits more advanced grammars \ |than regular expressions, though I am not sure how well such works \ |with HTML. | |Handwritten parsers might be preferred if the grammar is well known, \ |such in a compiler for a well-known programming language. | |>|Tim Shen said he would make an implementation along these lines. So \ |>|you might check with him for a reference implementation. |> |> Oh! It is not as if *i* would be writing a regular expression |> engine any time soon. I still remember, however, being a little |> bit strained once a QT release came over with a ~30KB (iirc) |> regular expression implementation. One always has an option to |> replace a self-written suboptimal thing with a really thought |> through optimized variant in the Open Source World! |> (Pointers to Russ Cox and Plan9 regex i have seen in the issue.) | |There is an interesting one here: |https://www.genivia.com/doc/reflex/html/ |https://en.wikipedia.org/wiki/RE/flex That is an interesting thing, thanks for this link. (Pretty academical read, like the above thread of yours.) |Not to be confused with: |https://invisible-island.net/reflex/ --End of <33614aed-35b0-4589-9d13-8c0d42afd...@telia.com> --steffen | |Der Kragenbaer, The moon bear, |der holt sich munter he cheerfully and one by one |einen nach dem anderen runter wa.ks himself off |(By Robert Gernhardt)