https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472
--- Comment #11 from Tim Shen <timshen at gcc dot gnu.org> --- > The problem is that #4 has an earlier capture, making it impossible to see > that it is left undefined later. I wouldn't say it's impossible. libc++ implements it correctly at a cost. Specifically, see https://github.com/llvm-mirror/libcxx/blob/e54a22f809bc741760e39ac78444f57f06117652/include/regex#L5566. > So it would be better to capture it as an > empty string here: > > In the example I gave, if the transitions are marked (ignoring #2) > 1: {a ↦ {1#3, 2#3, 3#3$4}}; > 2: {b ↦ {2#4, 3#4}}; > 3: {c ↦ {3#5, 2#5, 1#5, ∅#2}. > Then in the "ac" part, 'a' in state 1 goes to 3 and gets a #3 match 'a' plus > an empty which I have written $4. That is the idea, though. I'm feeling that this is going to a quadratic space complexity here. Does "(((...((a_0)|(a_1)|(a_2)|...(a_n))...)))" if the regex expands to n surrounding parenthesis, and n characters in the middle? Would this regex match taking up to n^2 time, given a n-sized input string? If you think it's time and space efficiently doable, maybe you can develop a prototype first. > I have though not implemented the counted repetitions A{m, n}, which I think > usually are implemented as loops, so I do not know how that fits into the > picture. It might also be helpful to think about back-references like "(a+)\1".