Bruno Haible <[email protected]> writes:
> Collin Funk wrote:
>> I don't blame you for not wanting to touch regex.
>
> :)
I had a recent patch in glibc's regex that Florian noticed was
incorrect [1]. It was an attempt to avoid OOM with many adjacent
repetition operators, but it did not cover all cases.
I guess the correct way to do it is to merge them all into one operator
that takes the lowest minimum and the highest maximum of the adjacent
ones, e.g.:
    a+++++++      -> a+
    a+{3,4}{6,10} -> a+
    a{3,4}{6,10}  -> a{3,10}
    a+*           -> a*
But understanding that code (and adding tests) is quite a bit of work.
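As a rough sketch of that merging rule (this is not glibc's actual regex
code; 'struct rep', 'REP_INF', and 'merge_reps' are names made up for
illustration), collapsing a run of adjacent operators could look like:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical representation of one repetition operator {min,max}.
   max == REP_INF means "unbounded", so '+' is {1,REP_INF} and
   '*' is {0,REP_INF}.  */
#define REP_INF INT_MAX

struct rep
{
  int min;
  int max;
};

/* Merge N adjacent repetition operators (N > 0) into a single one,
   keeping the lowest minimum and the highest maximum.  */
static struct rep
merge_reps (const struct rep *ops, int n)
{
  assert (n > 0);
  struct rep r = ops[0];
  for (int i = 1; i < n; i++)
    {
      if (ops[i].min < r.min)
        r.min = ops[i].min;
      if (ops[i].max > r.max)
        r.max = ops[i].max;
    }
  return r;
}
```

With this, {1,REP_INF} followed by {0,REP_INF} (i.e. a+*) merges to
{0,REP_INF} (a*), and {3,4}{6,10} merges to {3,10}. The hard part in the
real code is finding the right place in the parser to do it.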
It would also be nice to add the non-greedy '?' modifier for repetition
operators (e.g. '*?', '+?') that POSIX.1-2024 added to get the leftmost
shortest match [2]. But my impression is that few people understand the
regex code well enough to add it.
>> Maybe it is best to leave out all the files shared with glibc?
>
> That was my original plan. But when I saw how much the readability of
> the code was improved by this refactoring, I applied it also to files
> shared with glibc. (Take a look e.g. at lib/str-two-way.h, function
> 'critical_factorization'. IMO it was impenetrable before, due to 7 local
> variables declared upfront. Now this function has 3 local variables
> at function scope.)
>
> I think it will be worth the effort to push these refactorings (in
> argp, glob, getopt etc.) into glibc.
Okay, sounds like a good plan to me. Glibc uses C99 declarations in many
places, so it shouldn't be a problem.
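For anyone following along, here is a small made-up example of the kind
of change being discussed (the functions 'count_char_c89' and
'count_char_c99' are hypothetical and are not taken from
lib/str-two-way.h): the same function written with all locals declared
upfront, and then with C99 declarations at the point of first use.

```c
#include <stddef.h>

/* C89 style: every local variable is declared at the top of the
   function, away from where it is used.  */
size_t
count_char_c89 (const char *s, char c)
{
  size_t i;
  size_t n;

  n = 0;
  for (i = 0; s[i] != '\0'; i++)
    if (s[i] == c)
      n++;
  return n;
}

/* C99 style: the loop index is declared in the 'for' statement, so
   only one variable remains at function scope.  */
size_t
count_char_c99 (const char *s, char c)
{
  size_t n = 0;
  for (size_t i = 0; s[i] != '\0'; i++)
    if (s[i] == c)
      n++;
  return n;
}
```

Both versions behave identically; the second just keeps each variable's
scope as narrow as possible.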
> Yes. Paul warned me [1], and despite that, I made the first mistake two or
> three times and the second mistake once. It's a pitfall for those of us who
> use gcc 15 with the default options and the default AC_PROG_CC.
Yep, that is the same reason I missed it.
Collin
[1] https://inbox.sourceware.org/libc-alpha/99787786690d4e5ae1faad9501d6a8aadb7189c5.1760680824.git.collin.fu...@gmail.com/
[2] https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_04_06