Re: [1003.1(2024)/Issue8 0001857]: Several problems with the new "lazy" regex quantifier.

Hans Åberg via austin-group-l at The Open Group Tue, 24 Sep 2024 00:37:54 -0700


> On 24 Sep 2024, at 00:39, Steffen Nurpmeso <[email protected]> wrote:
> 
> Hans Åberg wrote in
> <[email protected]>:
> |I made the most efficient regular expression algorithm possible that \
> |catches all called for sub-matches, described here:
> |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472
> 
> You have not posted an URL to your implementation in this thread?


It is not published yet. I wrote it back then, and is now seeing if I can 
integrate into a program I am writing.

> |In short, backtracking is avoided by caches DFA states during the parsing, \
> |though all can be computed in advance, and transverse in a loop, like \
> |in Flex and Bison, with no stacking or function calls. DFA minimalization \
> |proved to not work well with actions of subexpressions:
> |
> |Some sub-NFAs, corresponding to sub-regexes, are marked with labels, \
> |and when transitioning out of one or more, all labeled matches are \
> |computed using a partial RNFA, reverse NFA.
> |
> |Unicode UTF-8 and UTF-32 are treated by generating byte regular expressi\
> |ons.
> |
> |In the context of the discussion here, the task is to find a suitable \
> |selection of found matches. For example, backreferences would typically \
> |take the last computed value of a labeled match.
> |
> |My implementation is oriented towards use with a parser, such as Bison. 
> 
> I am too stupid for yacc/bison lex/flex.  I write parsers by hand.

In the context of discussions here, features such as backreferences might be 
suitable as a replacement for a parser generator, like in the example of 
scanning HTML code given here:
https://www.regular-expressions.info/backref.html

There I would prefer a parser generator that admits more advanced grammars than 
regular expressions, though I am not sure how well such works with HTML.

Handwritten parsers might be preferred if the grammar is well known, such in a 
compiler for a well-known programming language.

> |Tim Shen said he would make an implementation along these lines. So \
> |you might check with him for a reference implementation.
> 
> Oh!  It is not as if *i* would be writing a regular expression
> engine any time soon.  I still remember, however, being a little
> bit strained once a QT release came over with a ~30KB (iirc)
> regular expression implementation.  One always has an option to
> replace a self-written suboptimal thing with a really thought
> through optimized variant in the Open Source World!
> (Pointers to Russ Cox and Plan9 regex i have seen in the issue.)

There is an interesting one here:
https://www.genivia.com/doc/reflex/html/
https://en.wikipedia.org/wiki/RE/flex

Not to be confused with:
https://invisible-island.net/reflex/

Re: [1003.1(2024)/Issue8 0001857]: Several problems with the new "lazy" regex quantifier.

Reply via email to