Niu Danny wrote, on 04 Mar 2025:
>
> In https://www.austingroupbugs.net/view.php?id=1857#c6898 ,
> Geoff responded to my torture testing case with a step-by-step
> broken-down analysis. However, I have some doubts:
>
> > (([0-9][a-z]+[0-9])+?)+ matches 2abc3 with 1 repetition
> > as that's the longest match for which the minimal repetition
> > has the shortest match
>
> Why doesn't the outer-most "+" quantifier involve "4def5"
> in its match? The subexpression `([0-9][a-z]+[0-9])` can
> totally match both "2abc3" and "4def5", and the greedy "+"
> instruct the regex engine to repeat the immediately preceding
> match.

Having written in my previous email about pathological cases, I
believe this to be one too. I.e. it is simultaneously asking for
both the longest and shortest match for the SAME part of the string.

>
> Here's my terminal interaction:
>
> ```
> // Portable Home on External Drive /
> $ echo 12abc34def56 | grep -E -o '(([0-9][a-z]+[0-9])+?)+'
> 2abc34def5

It looks like on macOS the conflict is resolved in favour of the
last type of repetition. Adding another non-greedy one outside
does this:

$ echo 12abc34def56 | grep -E -o '((([0-9][a-z]+[0-9])+?)+)+?'
2abc3
4def5

However, it would probably be best to make these pathological cases
unspecified. (As per my previous email, they ought not to occur in
any real-world uses.)

-- 
Geoff Clare <[email protected]>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
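
For comparison only, here is a minimal sketch that runs the same two
patterns through Python's re module. This is an assumption-laden
stand-in: re is a backtracking engine with Perl-style lazy
quantifiers, not a POSIX ERE implementation, so its results say
nothing about what the standard should require. The helper name
all_matches is hypothetical and simply mimics what grep -o prints.
It is only meant to show that another engine can resolve the same
greedy/non-greedy conflict differently.

```python
import re

# Same input string as in the grep commands quoted in this thread.
SUBJECT = "12abc34def56"

# The two patterns from the thread: a non-greedy repetition wrapped in a
# greedy one, and the same wrapped in a further non-greedy repetition.
PATTERNS = [
    r"(([0-9][a-z]+[0-9])+?)+",
    r"((([0-9][a-z]+[0-9])+?)+)+?",
]


def all_matches(pattern, subject):
    """Return the full text of each non-overlapping match, like grep -o."""
    return [m.group(0) for m in re.finditer(pattern, subject)]


for pattern in PATTERNS:
    print(pattern, all_matches(pattern, SUBJECT))
```

With CPython's re, both patterns should yield the single match
2abc34def5: the outermost non-greedy repetition is satisfied after one
iteration, but that iteration is the greedy inner repetition, which has
already consumed both 2abc3 and 4def5. That differs from the macOS
grep output for the second pattern shown above, which is consistent
with the point that such cases are poor candidates for a specified
result.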
