Geoff Clare via austin-group-l at The Open Group wrote in
 <ZvJ-YjQWnGfP9u7T@localhost>:
|Steffen Nurpmeso wrote, on 24 Sep 2024:
|> Geoff Clare wrote in
|>  <ZvGKHOb0E3IZ5Y4Q@localhost>:
|>
|>|I think this is required by the normative text (elsewhere than the
|>|grammar), not assumed by the example as Mike says.  The relevant text
|>|is in the definition of "matched" in 9.1:
|>|
|>|  Consistent with the whole match being the longest of the leftmost
|>|  matches, each subpattern, from left to right, shall match the
|>|  longest possible string.
|>
|> Yes, that is good.
|>
|>|and it goes on to give an example:
|>|
|>|  For example, matching the BRE "\(.*\).*" against "abcdef", the
|>|  subexpression "(\1)" is "abcdef"
|>
|> And really in that paragraph there are only successful matches,
|> even 'and matching the BRE "\(a*\)*" against "bc", the
|> subexpression "(\1)" is the null string' is so.  This text is,
|> like shell field splitting etc, nothing for the occasional
|> "standard text hopper", but can truly be read in full context
|> only.
|
|Thinking some more about that text, I see a problem.  Since it
|specifically talks about subpatterns, it could be read as implying
|that subpatterns are maximised at the expense of parts that are not
|in subpatterns.  Modifying the example to matching ".*\(.*\)" against

Well .. "no" i say now and today.  Maybe the paragraph is just
fine, and always has been (in this form).

|"abcdef", this interpretation would mean that the subpattern matches
|the longest possible string, which is "abcdef", with the initial ".*"
|matching nothing.  However, all the implementations I tried give the
|expected null match for the subpattern.

Modified for REG_EXTENDED, yes:

  #?0|kent:tmp$ ./p-c '.*(.*)' abcdef
  0: 0/6 <abcdef>
  1: 6/6 <>
  #?0|kent:tmp$ ./p-pcre2 '.*(.*)' abcdef
  0: 0/6 <abcdef>
  1: 6/6 <>
  #?0|kent:tmp$ ./p-tre '.*(.*)' abcdef
  MINIINININI 0 mini=0
  MINIINININI 0 mini=0
  HAHAHAH
  0: 0/6 <abcdef>
  1: 6/6 <>

Also, compare for example this snippet of "man perlre"

  Alternatives are tried from left to right, so the first alternative
  found for which the entire expression matches, is the one that is
  chosen.  This means that alternatives are not necessarily greedy.
  For example: when matching "foo|foot" against "barefoot", only the
  "foo" part will match, as that is the first alternative tried, and
  it successfully matches the target string.  (This might not seem
  important, but it is important when you are capturing matched text
  using parentheses.)

with 9.1 (p 179 bottom / 180 top),

  If the pattern permits a variable number of matching characters and
  thus there is more than one such sequence starting at that point,
  the longest such sequence is matched.  For example, the BRE "bb*"
  matches the second to fourth characters of the string "abbbc", and
  the ERE "(wee|week)(knights|night)" matches all ten characters of
  the string "weeknights".

What happens is

  $ ./p-c '(wee|week)(knights|night)' weeknights
  0: 0/10 <weeknights>
  1: 0/3 <wee>
  2: 3/10 <knights>
  $ ./p-tre '(wee|week)(knights|night)' weeknights
  HAHAHAH
  0: 0/10 <weeknights>
  1: 0/3 <wee>
  2: 3/10 <knights>
  $ ./p-pcre2 '(wee|week)(knights|night)' weeknights
  0: 0/10 <weeknights>
  1: 0/3 <wee>
  2: 3/10 <knights>

That i would not truly expect from "matches all ten characters" in
respect to "match".
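
For anyone who wants to reproduce the shape of these results without
the p-c / p-pcre2 / p-tre helpers (which are not shown here), a
minimal sketch in Python follows.  Mind the assumption: Python's re
module is a backtracking matcher in the Perl tradition, not a POSIX
regcomp() implementation, so it illustrates only the "first
alternative that lets the whole expression match" behaviour of the
perlre excerpt.  The show() helper and its start/end <text> output
are made up for illustration, modelled on the output above.

```python
import re


def show(pattern, subject):
    """Return (start, end, text) for the whole match and each group.

    Hypothetical helper, not one of the p-* programs used above.
    """
    m = re.search(pattern, subject)
    if m is None:
        return None
    return [(m.start(i), m.end(i), m.group(i))
            for i in range(m.re.groups + 1)]


cases = [
    (r".*(.*)", "abcdef"),                         # leading .* takes all
    (r"(wee|week)(knights|night)", "weeknights"),  # \1 is not the longest
    (r"foo|foot", "barefoot"),                     # perlre example
]

for pattern, subject in cases:
    print(pattern, subject)
    for i, (start, end, text) in enumerate(show(pattern, subject)):
        print(f"  {i}: {start}/{end} <{text}>")
```

With this engine the second case reports "wee" for group 1 and
"knights" for group 2, while the whole match still covers all ten
characters; whether a given regcomp() must agree on the submatches is
exactly the question here.
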
If i have subpatterns "matching" means "i have data", whatever.
Maybe it would make sense to especially refer to \1 being "wee",
as that is not "the longest possible" match here.
Other than that, you know, that is a large field.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)