On 2025-12-05 20:38, Collin Funk wrote:

    a{3,4}{6,10}  -> a{3,10}

That's not equivalent. Equivalent would be:

  a{3,4}{6,10} -> a{18,40}

There's a similar problem with one of your other examples.

These are nontrivial waters. For example, you can't simplify a{3}{6,10} or a{3,4}{6} in similar ways. More generally, one can use GNU EREs to decide linear diophantine equations or even full Büchi arithmetic, and it's hard to match such EREs efficiently.

If you're interested in this stuff, here's a modern intro to the theory; look for its discussion of automata:

Chistikov D. An introduction to the theory of linear integer arithmetic. FSTTCS 2024, 323 1:1-1:36. <https://doi.org/10.4230/LIPIcs.FSTTCS.2024.1> <https://wrap.warwick.ac.uk/id/eprint/188924/1/LIPIcs.FSTTCS.2024.1.pdf>

I haven't read that intro or pursued the basic (decades-old) idea, though, as I doubt whether implementing regex optimizations along these lines would interest anybody other than automata theorists.


It would also be nice to add the '?' operator added by POSIX.1-2024 to
get the leftmost shortest match [2]. But my impression is that few
people understand the regex code enough to add it.

Yup. Some years ago I wrote to that code's author but got no answer. It might be wise, at some point, to ditch the code entirely and use something more understandable and maintainable.

Reply via email to