[Bug libstdc++/85472] Regex match bug

haberg-1 at telia dot com Wed, 25 Apr 2018 03:57:27 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472


--- Comment #7 from Hans Åberg <haberg-1 at telia dot com> ---
(In reply to Tim Shen from comment #5)
> (In reply to Hans Åberg from comment #4)
> > I wrote an NFA/DFA that computes all matches, it seems, in an efficient
> > manner: Action numbers are marked on the states of the sub-NFAs one is
> > interested in. While lexing a string, the partial reverse NFA is recorded,
> > which is searched when a final state is encountered. The string position
> > action numbers, as marked on the corresponding states, are recorded, and for
> > each action number, the contiguous string sequences are the possible
> > submatches.
> 
> I'm not following the meaning of "action number"...

Arbitrary numbers used identify the sub-automatons; in the C++ library, the
number of left parentheses in the regex expression from which it derives. 

> ...and "the partial reverse
> NFA is recorded".

The lexing states are sets of NFA states, which are also the DFA states. If a
lexing character c in the state p = {p_0, ...} has transition p_i -> {q_0,
...}, the reverse state q_0 -> {r_0, ...} gets p_i added at this position in
the string. This is to record the states which c actually uses.

> How many actions numbers are recorded?

They are sets, both in the states and the string matches. Each final state
position in the string produces a separate match, containing information about
the submatches via the action numbers.

> for regex_match(s, regex(re)), is it
> on the order of O(s.size()), or O(re.size())? The goal is to allocate
> O(re.size()) of memory exactly once.

The action numbers are just stamped onto the NFA states when it is created, and
then carried along. So it is the same as the number of states in the NFA. When
a string is lexed, one just gets the action numbers for each position that
produced a character match.

[Bug libstdc++/85472] Regex match bug

Reply via email to