> On 15 Sep 2018, at 07:02, Akim Demaille <a...@lrde.epita.fr> wrote: > >> Le 29 août 2018 à 15:56, Hans Åberg <haber...@telia.com> a écrit : >> >> I did that, too: I wrote some DFA/NFA code, and incidentally found the most >> efficient method make action matches via a reverse NFA lookup, cf. [1-3]. >> Also, I have made UTF-8/32 to octet character class translations. >> >> 1. https://gcc.gnu.org/ml/libstdc++/2018-04/msg00032.html >> 2. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472 >> 3. https://gcc.gnu.org/ml/libstdc++/2018-05/msg00015.html > > That was interesting.
Thanks. I wanted a dynamic lexer and and at least a partially dynamic parser so users define their own operators. One thing that remains with the lexer is the backreferenses (see below). > I found that Tim Shen exposed his work on > <regex> https://www.youtube.com/watch?v=N_rkHzhXueo. I haven't seen all. > When it comes to conversion from expressions to automaton, I’m > a big fan of Brzozozski’s derivatives, that, in addition, easily > supported complement and intersection. I just have a C++ automaton class that builds the NFA directly through operators corresponding to those of a regular expression. The NFA then has no empty transitions, and a set of start states, which correspond to the singel DFA start state, in the subset construction. For example, for alteration just take the union of both the NFAs and their start state sets. A working regex implementation then has some additions, such as loops for count matches. > No idea about group > captures, and certainly not backward references. Then when building the NFA, its start and end states form a group, which can be identified with unique number, if you so will. Backreferences I think of working so that when it appearing, one makes a lookup of its value by the reverse NFA method I give, and then inserting it as a dynamic NFA. Strictly, the value of the backreference may then change as one comes to a new one, but I suspect those that invented the concept have not considered that. > redgrep implements this approach. This talks touches the case > of capturing groups. > > https://www.youtube.com/watch?v=CMhqlRBfVX4&t=8s&frags=pl%2Cwn The method I give in effect the sub-NFA that the input string uses, so the group capture is automatic. Then working together towards DFA minimalization, it turns out that one cannot even use the DFA, because different group markers may be merged in to the same DFA state. Any DF minimalization must then keep track of that. So it may be similar to LALR state merging, where one must to keep track of the whole rules.