Bruno Haible writes:

> Collin Funk wrote:
>> I don't blame you for not wanting to touch regex.
>
> :)

I had a recent patch in glibc's regex that Florian noticed was
incorrect [1]. It was an attempt to avoid OOM with many adjacent
repetition operators, but it did not cover all cases.

I guess the correct way to do it is to merge them all into one operator
with the lowest minimum and the highest maximum of the adjacent ones, i.e.

   a+++++++      -> a+
   a+{3,4}{6,10} -> a+
   a{3,4}{6,10}  -> a{3,10}
   a+*           -> a*

But understanding that code (and adding tests) is quite a bit of work.
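
To make the merge rule concrete, here is a minimal standalone sketch. It
is not glibc's regex code: 'struct rep', 'REP_INF' and 'merge_reps' are
hypothetical names made up for illustration, and the sketch assumes each
operator has already been parsed into a {min,max} pair ('+' as
{1,REP_INF}, '*' as {0,REP_INF}).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker for "no upper bound" (not a glibc identifier).  */
#define REP_INF (-1)

/* Hypothetical representation of one repetition operator.  */
struct rep
{
  int min;
  int max;   /* REP_INF means unbounded.  */
};

/* Merge N adjacent repetition operators into a single one, taking the
   lowest minimum and the highest maximum.  N must be at least 1.  */
static struct rep
merge_reps (const struct rep *ops, size_t n)
{
  assert (n > 0);
  struct rep r = ops[0];
  for (size_t i = 1; i < n; i++)
    {
      if (ops[i].min < r.min)
        r.min = ops[i].min;
      /* Once the maximum is unbounded it stays unbounded.  */
      if (r.max != REP_INF
          && (ops[i].max == REP_INF || ops[i].max > r.max))
        r.max = ops[i].max;
    }
  return r;
}
```

With this, {3,4} followed by {6,10} merges to {3,10}, and '+' followed
by '*' merges to {0,REP_INF}, i.e. '*', matching the examples above.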

It would also be nice to add the '?' operator added by POSIX.1-2024 to
get the leftmost shortest match [2]. But my impression is that few
people understand the regex code enough to add it.

>> Maybe it is best to leave out all the files shared with glibc?
>
> That was my original plan. But when I saw how much the readability of
> the code was improved by this refactoring, I applied it also to files
> shared with glibc. (Take a look e.g. at lib/str-two-way.h, function
> 'critical_factorization'. IMO it was impenetrable before, due to 7 local
> variables declared upfront. Now this function has 3 local variables
> at function scope.)
>
> I think it will be worth the effort to push these refactorings (in
> argp, glob, getopt etc.) into glibc.

Okay, sounds like a good plan to me. Glibc uses C99 declarations in many
places, so it shouldn't be a problem.

> Yes. Paul warned me [1], and despite that, I made the first mistake two or
> three times and the second mistake once. It's a pitfall for those of us who
> use gcc 15 with the default options and the default AC_PROG_CC.

Yep, that is the same reason I missed it.

Collin

[1] https://inbox.sourceware.org/libc-alpha/99787786690d4e5ae1faad9501d6a8aadb7189c5.1760680824.git.collin.fu...@gmail.com/
[2] https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_04_06
