Niu Danny via austin-group-l at The Open Group wrote in
<[email protected]>:
|Now I'm completely lost.
Me too.
|So how about the informative text from "precedence of construct" perspective?
|I discussed it a few days earlier with Steffen:
I seem to have lost track of which discussions are going on here.
But i do recall Geoff Clare's mail quoted below, and i found it
logical.  Before that he had already said

  What I meant, when I said it is not recursive, is something like:

    ([0-9]+[a-z]*)+?

  where the inner + and * are individually greedy; they don't inherit
  the outer repetition's non-greediness.

which i also found logical.
Mike Haertel's MinRX (cool thing, my old 1998 C++ knowledge feels
a bit lost with all the "auto &X =" things), with the patch
diff --git a/tryit.c b/tryit.c
index 67bfbfb9ec..82123254d2 100644
--- a/tryit.c
+++ b/tryit.c
@@ -26,6 +26,8 @@ main(int argc, char *argv[])
eflags |= MINRX_REG_FIRSTSUB;
else if (strcmp(argv[1], "-i") == 0)
cflags |= MINRX_REG_ICASE;
+ else if (strcmp(argv[1], "-m") == 0)
+ cflags |= MINRX_REG_MINIMAL;
else if (strcmp(argv[1], "-n") == 0)
cflags |= MINRX_REG_NEWLINE;
else if (strcmp(argv[1], "-r") == 0)
@@ -66,7 +68,9 @@ main(int argc, char *argv[])
break;
for (int i = 0; i <= j; ++i)
if (rm[i].rm_so != -1)
- printf("(%d,%d)", (int) rm[i].rm_so, (int) rm[i].rm_eo);
+ printf("(%d,%d=%.*s)",
+ (int) rm[i].rm_so, (int) rm[i].rm_eo,
+ (int) (rm[i].rm_eo-rm[i].rm_so), &argv[2][rm[i].rm_so]);
else
printf("(?,?)");
putchar('\n');
spits out
#?0|kent:minrx.git$ ./tryit 'X(([0-9a-zA-Z]+)([a-zA-Z]*))+Y' 'X000aaaYbbbY'
(0,12=X000aaaYbbbY)(1,11=000aaaYbbb)(1,11=000aaaYbbb)(11,11=)
#?0|kent:minrx.git$ ./tryit 'X(([0-9a-zA-Z]+)([a-zA-Z]*))+?Y' 'X000aaaYbbbY'
(0,8=X000aaaY)(1,7=000aaa)(1,7=000aaa)(7,7=)
#?0|kent:minrx.git$ ./tryit 'X(([0-9a-zA-Z]+?)([a-zA-Z]*))+?Y' 'X000aaaYbbbY'
(0,8=X000aaaY)(1,7=000aaa)(1,4=000)(4,7=aaa)
#?0|kent:minrx.git$ ./tryit 'X(([0-9a-zA-Z]+?)([a-zA-Z]*))+Y' 'X000aaaYbbbY'
(0,12=X000aaaYbbbY)(1,11=000aaaYbbb)(1,4=000)(4,11=aaaYbbb)
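The printf change in the patch amounts to the formatting step below.
This is only a sketch: it uses POSIX regmatch_t from <regex.h> rather
than MinRX's own header (an assumption, so that it compiles on its
own), and format_matches is a made-up name that is not part of
tryit.c.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: render n match slots as "(so,eo=text)" into out.
 * Unset slots (rm_so == -1) are rendered as "(?,?)", like tryit does.
 * Returns the number of characters written, or -1 if out is too small. */
int
format_matches(const char *subject, const regmatch_t *rm, size_t n,
    char *out, size_t outsz)
{
    size_t used = 0;

    assert(subject != NULL && rm != NULL && out != NULL);
    if (outsz == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < n; ++i) {
        int w;

        if (rm[i].rm_so != -1)
            w = snprintf(out + used, outsz - used, "(%d,%d=%.*s)",
                (int)rm[i].rm_so, (int)rm[i].rm_eo,
                (int)(rm[i].rm_eo - rm[i].rm_so),
                &subject[rm[i].rm_so]);
        else
            w = snprintf(out + used, outsz - used, "(?,?)");

        /* snprintf reports the length it wanted; treat truncation as error */
        if (w < 0 || (size_t)w >= outsz - used)
            return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

Fed the offsets of the second run above, (0,8) (1,7) (1,7) (7,7), with
the subject 'X000aaaYbbbY', it reproduces the line
(0,8=X000aaaY)(1,7=000aaa)(1,7=000aaa)(7,7=).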
And perl says:
#?0|kent:nail.git$ perl -e '$i="X000aaaYbbbY"; if($i =~
"X(([0-9a-zA-Z]+)([a-zA-Z]*))+Y"){print ",$&,$1,$2,$3,\n"}'
,X000aaaYbbbY,000aaaYbbb,000aaaYbbb,,
#?0|kent:nail.git$ perl -e '$i="X000aaaYbbbY"; if($i =~
"X(([0-9a-zA-Z]+)([a-zA-Z]*))+?Y"){print ",$&,$1,$2,$3,\n"}'
,X000aaaYbbbY,000aaaYbbb,000aaaYbbb,,
#?0|kent:nail.git$ perl -e '$i="X000aaaYbbbY"; if($i =~
"X(([0-9a-zA-Z]+?)([a-zA-Z]*))+?Y"){print ",$&,$1,$2,$3,\n"}'
,X000aaaYbbbY,0aaaYbbb,0,aaaYbbb,
#?0|kent:nail.git$ perl -e '$i="X000aaaYbbbY"; if($i =~
"X(([0-9a-zA-Z]+?)([a-zA-Z]*))+Y"){print ",$&,$1,$2,$3,\n"}'
,X000aaaYbbbY,0aaaYbbb,0,aaaYbbb,
Now i need to think about that.
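To put the system regcomp()/regexec() next to MinRX and perl, something
like the following could be used.  It is a rough sketch that only
relies on the standard <regex.h> interface; run_ere is a hypothetical
name, and the helper does nothing beyond compile, execute once, free.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile pattern as an ERE, run it once against
 * subject and fill rm[0..n-1].  Returns 0 on a match, otherwise the
 * nonzero regcomp()/regexec() error code (REG_NOMATCH for no match). */
int
run_ere(const char *pattern, const char *subject, regmatch_t *rm, size_t n)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && subject != NULL && rm != NULL);

    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;              /* nothing to free if compilation failed */

    rc = regexec(&re, subject, n, rm, 0);
    regfree(&re);
    return rc;
}
```

With the plain greedy ERE 'X(([0-9a-zA-Z]+)([a-zA-Z]*))+Y' and the
subject 'X000aaaYbbbY' a conforming regexec() has to report the
leftmost-longest overall match, so rm[0] covers offsets 0 to 12.  What
an implementation does with +? is exactly the open question here, and
older ones may not treat it as a minimal repetition at all.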
Ciao already here,
|> I was thinking about something like this:
|>
|> The precedence of quantifiers is as follows (from highest to lowest):
|>
|> 1. The length of any subexpression modified by a minimal quantifier
|> shall be such that it matches the shortest substring of the
|> subject string, in descending priority from left to right.
|>
|> 2. Consistent with rule 1, the length of the overall match shall
|> be the longest possible.
|>
|> 3. The length of any subexpression modified by a greedy quantifier
|> shall be such that it matches the longest substring of the
|> subject string, in descending priority from left to right.
|
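One way to read the three rules is as a lexicographic ordering over
candidate matches.  The sketch below encodes that reading; struct
candidate, MAXSUB and compare_candidates are hypothetical names, and
taking "length" to mean the total span of each quantified
subexpression is an assumption, not something the proposal spells out.

```c
#include <assert.h>
#include <stddef.h>

#define MAXSUB 4    /* arbitrary cap for this sketch */

/* Hypothetical description of one candidate match. */
struct candidate {
    size_t nminimal;            /* number of minimal-quantified subexpressions */
    size_t minimal_len[MAXSUB]; /* their lengths, left to right */
    size_t total_len;           /* length of the overall match */
    size_t ngreedy;             /* number of greedy-quantified subexpressions */
    size_t greedy_len[MAXSUB];  /* their lengths, left to right */
};

/* Returns <0 if a is preferred, >0 if b is preferred, 0 on a tie. */
int
compare_candidates(const struct candidate *a, const struct candidate *b)
{
    assert(a->nminimal == b->nminimal && a->ngreedy == b->ngreedy);
    assert(a->nminimal <= MAXSUB && a->ngreedy <= MAXSUB);

    /* Rule 1: shorter minimal subexpressions win, leftmost first. */
    for (size_t i = 0; i < a->nminimal; ++i)
        if (a->minimal_len[i] != b->minimal_len[i])
            return a->minimal_len[i] < b->minimal_len[i] ? -1 : 1;

    /* Rule 2: among those, the longer overall match wins. */
    if (a->total_len != b->total_len)
        return a->total_len > b->total_len ? -1 : 1;

    /* Rule 3: longer greedy subexpressions win, leftmost first. */
    for (size_t i = 0; i < a->ngreedy; ++i)
        if (a->greedy_len[i] != b->greedy_len[i])
            return a->greedy_len[i] > b->greedy_len[i] ? -1 : 1;

    return 0;
}
```

For 'X(([0-9a-zA-Z]+)([a-zA-Z]*))+?Y' on 'X000aaaYbbbY' this prefers
the candidate whose outer group spans 000aaa (6) over the one spanning
000aaaYbbb (10) by rule 1, and among the 6-character candidates rule 3
prefers [0-9a-zA-Z]+ taking all 6 over a 3+3 split, which agrees with
the 1 7 / 1 7 / 7 7 result quoted below.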
|> On 12 Mar 2025, at 00:51, Geoff Clare via austin-group-l at The Open Group \
|> <[email protected]> wrote:
|>
|> Niu Danny wrote, on 07 Mar 2025:
|>>
|>>> On 6 Mar 2025, at 23:06, Geoff Clare via austin-group-l at The Open \
|>>> Group <[email protected]> wrote:
|>>>>
|>>>>
|>>>>
|>>>> e.g. `([0-9]+)+?`
|>>>
|>>> This is a pathological case because you are simultaneously asking for
|>>> both the longest and shortest match for the SAME part of the string.
|>>> Such cases ought not to occur in real-world use.
|>>>
|>>> What I meant, when I said it is not recursive, is something like:
|>>>
|>>> ([0-9]+[a-z]*)+?
|>>>
|>>> where the inner + and * are individually greedy; they don't inherit
|>>> the outer repetition's non-greediness.
|>>
|>> But greedy quantifiers are still nested in a non-greedy one.
|>> Would you say that this is also pathological, or do you
|>> have something else on your mind?
|>
|> Here's a better example. This uses the code from bug note 7094
|> modified to take the ERE and string as arguments:
|>
|> $ ./a.out 'X(([0-9a-zA-Z]+)([a-zA-Z]*))+?Y' 'X000aaaYbbbY'
|> regcomp returned: 0
|> regexec returned: 0
|> 0 8
|> 1 7
|> 1 7
|> 7 7
|> -1 -1
|>
|> Because [0-9a-zA-Z]+ is greedy it matched 000aaa and [a-zA-Z]* was
|> left matching the empty string. If the ? modifier was recursive then
|> [0-9a-zA-Z]+ would have matched just the first 0 and [a-zA-Z]* would
|> have matched 00aaa.
|>
|> --
|> Geoff Clare <[email protected]>
|> The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
--End of <[email protected]>
--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)