Re: Bison lexer

Hans Åberg Sat, 15 Sep 2018 02:31:05 -0700


> On 15 Sep 2018, at 07:02, Akim Demaille <[email protected]> wrote:
> 
>> Le 29 août 2018 à 15:56, Hans Åberg <[email protected]> a écrit :
>> 
>> I did that, too: I wrote some DFA/NFA code, and incidentally found the most 
>> efficient method make action matches via a reverse NFA lookup, cf. [1-3]. 
>> Also, I have made UTF-8/32 to octet character class translations. 
>> 
>> 1. https://gcc.gnu.org/ml/libstdc++/2018-04/msg00032.html
>> 2. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472
>> 3. https://gcc.gnu.org/ml/libstdc++/2018-05/msg00015.html
> 
> That was interesting.


Thanks. I wanted a dynamic lexer and and at least a partially dynamic parser so 
users define their own operators. One thing that remains with the lexer is the 
backreferenses (see below).

> I found that Tim Shen exposed his work on
> <regex> https://www.youtube.com/watch?v=N_rkHzhXueo.

I haven't seen all.

> When it comes to conversion from expressions to automaton, I’m
> a big fan of Brzozozski’s derivatives, that, in addition, easily
> supported complement and intersection.  

I just have a C++ automaton class that builds the NFA directly through 
operators corresponding to those of a regular expression. The NFA then has no 
empty transitions, and a set of start states, which correspond to the singel 
DFA start state, in the subset construction. For example, for alteration just 
take the union of both the NFAs and their start state sets. A working regex 
implementation then has some additions, such as loops for count matches.

> No idea about group
> captures, and certainly not backward references.

Then when building the NFA, its start and end states form a group, which can be 
identified with unique number, if you so will. Backreferences I think of 
working so that when it appearing, one makes a lookup of its value by the 
reverse NFA method I give, and then inserting it as a dynamic NFA. Strictly, 
the value of the backreference may then change as one comes to a new one, but I 
suspect those that invented the concept have not considered that.

> redgrep implements this approach.  This talks touches the case
> of capturing groups.
> 
> https://www.youtube.com/watch?v=CMhqlRBfVX4&t=8s&frags=pl%2Cwn

The method I give in effect the sub-NFA that the input string uses, so the 
group capture is automatic. Then working together towards DFA minimalization, 
it turns out that one cannot even use the DFA, because different  group markers 
may be merged in to the same DFA state. Any DF minimalization must then keep 
track of that. So it may be similar to LALR state merging, where one must to 
keep track of the whole rules.

Re: Bison lexer

Reply via email to