Re: [1003.1(2024)/Issue8 0001857]: Several problems with the new "lazy" regex quantifier.

Steffen Nurpmeso via austin-group-l at The Open Group Mon, 23 Sep 2024 15:46:07 -0700

Hans Åberg wrote in
 <32c01fe0-b715-412c-809a-0581bf1b8...@telia.com>:
 |I made the most efficient regular expression algorithm possible that \
 |catches all called for sub-matches, described here:
 |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472


You have not posted an URL to your implementation in this thread?

 |In short, backtracking is avoided by caches DFA states during the parsing, \
 |though all can be computed in advance, and transverse in a loop, like \
 |in Flex and Bison, with no stacking or function calls. DFA minimalization \
 |proved to not work well with actions of subexpressions:
 |
 |Some sub-NFAs, corresponding to sub-regexes, are marked with labels, \
 |and when transitioning out of one or more, all labeled matches are \
 |computed using a partial RNFA, reverse NFA.
 |
 |Unicode UTF-8 and UTF-32 are treated by generating byte regular expressi\
 |ons.
 |
 |In the context of the discussion here, the task is to find a suitable \
 |selection of found matches. For example, backreferences would typically \
 |take the last computed value of a labeled match.
 |
 |My implementation is oriented towards use with a parser, such as Bison. 

I am too stupid for yacc/bison lex/flex.  I write parsers by hand.

 |Tim Shen said he would make an implementation along these lines. So \
 |you might check with him for a reference implementation.

Oh!  It is not as if *i* would be writing a regular expression
engine any time soon.  I still remember, however, being a little
bit strained once a QT release came over with a ~30KB (iirc)
regular expression implementation.  One always has an option to
replace a self-written suboptimal thing with a really thought
through optimized variant in the Open Source World!
(Pointers to Russ Cox and Plan9 regex i have seen in the issue.)

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

Re: [1003.1(2024)/Issue8 0001857]: Several problems with the new "lazy" regex quantifier.

Reply via email to