Geoff Clare wrote in
 <ZvPGnLMvGkqL0a3U@localhost>:
 |Steffen Nurpmeso wrote, on 24 Sep 2024:
 |> Austin Group Bug Tracker via austin-group-l at The Open Group wrote in
 |>  <ea7880ce68ba63cac427845dc4029...@austingroupbugs.net>:
 |>  ...
 |>|https://austingroupbugs.net/view.php?id=1857 
 |>  ...
 |>| (0006881) geoffclare (manager) - 2024-09-24 10:46
 |>| https://austingroupbugs.net/view.php?id=1857#c6881 
 |>  ...
 |> I have not yet read this completely, but from a glance i see
 |> 
 |>   the ERE "(aaa??)*" matches only the first four characters of the
 |>   string "aaaaa", not all five, because in order to match all
 |>   five, "a??" would match with length one instead of zero
 |> 
 |> and that felt not right:
 |> 
 |>   echo 'aaaaa' |
 |>     perl -e '$i=<STDIN>;if($i =~ "(aaa??)*"){print "i<$i>; 1<$1> \
 |>     2<$2> 3<$3>\n"}else{print "no match\n"}'
 |>   i<aaaaa
 |>>; 1<aa> 2<> 3<>
 |> 
 |> It matches only two.
 |
 |Looks like a bug in perl.  Each repetition of "aaa??" matches "aa"
 |because the "??" is non-greedy, but the "*" is greedy so should
 |match the longest string of repeated "aa" as possible.
 |
 |With regcomp() on macOS it matches "aaaa", as expected.  (That example
 |is straight from the macOS re_format(7) man page, but I also tested it
 |to make sure.)

So ok, then i really had to look, and all i can say is "yes!", there
are plenty of bugs everywhere, as can be verified with the
following code snippet:

  /* gcc -W -Wall -o p-tre preu.c -DXTRE -ltre */
  #ifdef XTRE
  # define X(Y) tre_ ## Y
  # include <tre.h>
  /* gcc -W -Wall -o p-pcre2 preu.c -DXPCRE2 -lpcre2-posix */
  #elif defined XPCRE2
  # define X(Y) pcre2_ ## Y
  # include <pcre2posix.h>
  /* gcc -W -Wall -o p-c preu.c */
  #else
  # define X(Y) Y
  # include <regex.h>
  #endif
  #include <stdio.h>
  int main(int argc, char **argv){
        regmatch_t remt[8];
        regex_t ret;
        if(argc != 3)
                return 64;
        if(X(regcomp)(&ret, argv[1], REG_EXTENDED))
                return 65;
        if(X(regexec)(&ret, argv[2], sizeof(remt)/sizeof(remt[0]), &remt[0], 0))
                printf("no match\n");
        else{
                size_t i;
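                /* Never index past remt[], even for patterns with many groups */
                for(i = 0; i <= ret.re_nsub &&
                                i < sizeof(remt)/sizeof(remt[0]); ++i){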
                        printf("%zu: %ld/%ld\n", i, (long)remt[i].rm_so,
                                (long)remt[i].rm_eo);
                        if(remt[i].rm_so != -1)
                                printf("\t<%.*s>\n",
                                        (int)(remt[i].rm_eo - remt[i].rm_so),
                                        &argv[2][(unsigned long)remt[i].rm_so]);
                }
        }
        X(regfree)(&ret);
        return 0;
  }

Doing that reveals that the tre library (as of HEAD of [1]) is
*completely* broken with respect to (at least) _UNGREEDY matching,
and that libpcre2 10.44 as of [2] behaves like so:

  $ ./p-pcre2 '(aaa??)*' 'aaaaac'
  0: 0/4
          <aaaa>
  1: 2/4
          <aa>

which exactly mirrors the perl(1) outcome, and please look at the
indices, too.  Your usage of the asterisk ("star") outside of the
parentheses does not actually multiply the content of the
match group, as Harald van Dijk has stated in another message
i have already seen.
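
The point about the indices can be isolated with a much smaller case.
What follows is only a minimal sketch, not part of the test program
above: the helper name starred_group_span() is made up for
illustration, and it uses nothing but the plain <regex.h> interface.
It assumes the usual POSIX behaviour that a group which takes part in
a match several times reports only its last iteration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (name invented for this sketch): compile the ERE
 * `pat', match it against `subj', and store the overall match in out[0]
 * and capture group 1 in out[1].
 * Returns 0 on a match, REG_NOMATCH if nothing matched, or the nonzero
 * regcomp() error code if the pattern did not compile. */
int starred_group_span(const char *pat, const char *subj, regmatch_t out[2]){
        regex_t re;
        int rv;

        assert(pat != NULL && subj != NULL && out != NULL);

        if((rv = regcomp(&re, pat, REG_EXTENDED)) != 0)
                return rv;
        /* Ask for exactly two slots: whole match and group 1 */
        rv = regexec(&re, subj, 2, out, 0);
        regfree(&re);
        return rv;
}
```

Called with the pattern "(ab|cd)*" and the string "abcdx", the whole
match should come back as 0/4 while group 1 comes back as 2/4, that is
only the final "cd": the star repeats the group, but the group itself
keeps just one span.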

  $ ./p-pcre2 '(aaa??)*' 'aaaaaac'
  0: 0/6
          <aaaaaa>
  1: 4/6
          <aa>
  $ ./p-tre '(aaa??)*' 'aaaaaac'
  MINIINININI  0 mini=0
  MINIINININI 1 mini=1 rest:?
  MINIINININI  0 mini=0
  HAHAHAH
  0: 0/0
          <>
  1: -1/-1

^ Totally borked somewhere below tre_ast_new_iter(), i have not
looked further.  But it seems Dag-Erling Smørgrav of FreeBSD
actually started to have a look into tre, just a couple of
months ago!!  .. And two reported UNGREEDY issues have already
been marked by him (after a decade of existence) as bugs,
back in July.  I want to reiterate that i opened the POSIX issue
in 2013, by then the sun must have been shining.  Anyhow, i think
Dag-Erling is also listening here.

  $ ./p-c '(aaa??)*' 'aaaaaac'

^ Shouldn't this report an error?

  0: 0/6
          <aaaaaa>
  1: 3/6
          <aaa>

  $ ./p-c '(aaa?)*' 'aaaaaac'
  0: 0/6
          <aaaaaa>
  1: 3/6
          <aaa>
  $ ./p-tre '(aaa?)*' 'aaaaaac'
  MINIINININI  0 mini=0
  MINIINININI  0 mini=0
  HAHAHAH
  0: 0/6
          <aaaaaa>
  1: 3/6
          <aaa>

^ Works without mini(mal)==UNGREEDY.  (tre has only very few tests.)

  $ ./p-pcre2 '(aaa?)*' 'aaaaaac'
  0: 0/6
          <aaaaaa>
  1: 3/6
          <aaa>

I am stunned, but not surprised, actually.  Ha-ha.
Anyhow, Apple is of course wrong when they do it like that, and
perl and libpcre2 are right.

Regarding all the other stuff, *my* opinion is that *if* i as
a user explicitly attach ? as an ungreedy/minimalizing modifier to
a regular expression, then i want it to be honoured.  The same if
i set REG_MINIMAL (REG_UNGREEDY) and suffix ? for the opposite.
And if that counteracts some other thing, then because of the
nature of regular expressions the explicit "rule-changing" modifier
has to have preference over the default, because there is no other
way to adjust the default behaviour.

  [1] https://github.com/laurikari/tre.git
  [2] https://www.pcre.org

 |Geoff Clare <g.cl...@opengroup.org>
 |The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
 --End of <ZvPGnLMvGkqL0a3U@localhost>

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
