Re: Regexps using 'after' and 'before' like ^ and $

Brad Gilbert Tue, 26 May 2020 16:24:24 -0700

I'm not sure that is the best way to look at ｢<before>｣ and ｢<after>｣.

    > 'abcd123abcd' ~~ / <?before <digit>> .+ <?after <digit>> /
    ｢123｣

In the above code, ｢<?before <digit>>｣ makes sure that the first thing that
｢.+｣ matches is a ｢<digit>｣.
And ｢<?after <digit>>｣ makes sure that the last thing ｢.+｣ matches is also
a ｢<digit>｣.

The ｢<?before <digit>>｣ is written in front of the ｢.+｣, so it starts at
that position.

It does the thing that ｢<digit>｣ would normally do.

    ' a b c d 1 2 3 a b c d '
    ' _ _ _ _^1^_ _ _ _ _ _ '

The thing is, ｢<before>｣ resets the position to what it was immediately
before the successful ｢<digit>｣ match.

    ' a b c d 1 2 3 a b c d '
    ' _ _ _ _^_ _ _ _ _ _ _ '

The ｢.+｣ then tries to grab everything

    ' a b c d 1 2 3 a b c d '
    ' _ _ _ _^1 2 3 a b c d^'

Then ｢<?after <digit>>｣ gets to tell it that it can't do that.

The reason is that ｢<after>｣ looks backwards from the current position. The
current position is at the very end.
The character just before it obviously isn't a ｢<digit>｣, so ｢.+｣ has to keep
giving up characters until the last one it matched is a ｢<digit>｣.

    ' a b c d 1 2 3 a b c d '
    ' _ _ _ _^1 2 3^_ _ _ _ '
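
As a rough check of the walk-through above, this sketch prints where the
match starts and ends. The variable name ｢$m｣ and the second test string are
made up for illustration.

```raku
my $m = 'abcd123abcd' ~~ / <?before <digit>> .+ <?after <digit>> /;
say $m;        # ｢123｣
say $m.from;   # 4
say $m.to;     # 7

# Both assertions are zero-width, so they can inspect the same character.
say 'abcd1abcd' ~~ / <?before <digit>> .+ <?after <digit>> /;   # ｢1｣
```

The positions 4 and 7 correspond to the two ｢^｣ markers in the last diagram.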

---

You can use ｢<after>｣ to check that the current position is at the beginning.

     'abc' ~~ / <!after .> b /
     Nil

The reason is that if the current position is anywhere other than the
beginning ｢.｣ would match.
Since we used ｢!｣ that won't fly.
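
A small side-by-side sketch, using the same ｢'abc'｣ string, of how
｢<!after .>｣ accepts only the start of the string here, and how ｢^｣ gives
the same results on these inputs:

```raku
say 'abc' ~~ / <!after .> a /;   # ｢a｣  (nothing precedes position 0)
say 'abc' ~~ / <!after .> b /;   # Nil  (｢a｣ precedes ｢b｣)

say 'abc' ~~ / ^ a /;            # ｢a｣
say 'abc' ~~ / ^ b /;            # Nil
```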

｢<!before .>｣ should probably likewise prevent a match when the position is
anywhere other than the end.

It does work if you write it differently

    'abc' ~~ / b <!before( /./ )> /
    Nil

Note that ｢<before>｣ and ｢<after>｣ are really just function calls.

It does seem like there could be a bug here.

---

All of that said, I don't think it is useful to tell new Raku programmers
that you can use those features that way.

It makes them think that these two regexes are doing something similar.

    / ^ ... /
    / <!after .> ... /

They match the same three characters, but for entirely different reasons.

The ｢^｣ version is basically the same as:

    / <?{ $/.pos == 0 }> ... /

While the other one is something like:

    / <!{ try $/.orig.substr( $/.pos - 1, 1 ) ~~ /./ }> ... /

(The ｢try｣ is needed because ｢.substr( -1 )｣ is a Failure.)

So then these:

    / ... $ /
    / ... <!before .> /

Would be

    / ... <?{ $/.pos == $/.orig.chars }> /
    / ... <!{ try $/.orig.substr( $/.pos, 1 ) ~~ /./ }> /

---

What I think is happening is that the ｢<!after .>｣ works because the
｢.substr( -1, 1)｣ creates a Failure.

The thing is that ｢'abc'.substr( 3, 1 )｣ doesn't create a Failure, it just
gives you an empty Str.

(The second argument is the maximum number of characters to return.)
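
The difference between the two ｢.substr｣ calls can be seen directly. This is
only a sketch; the variable names ｢$f｣ and ｢$s｣ are made up for illustration.

```raku
# A start position before the beginning of the string yields a Failure.
my $f = 'abc'.substr( -1, 1 );
say $f ~~ Failure;     # True
say $f.defined;        # False (checking it also marks the Failure as handled)

# A start position exactly at the end yields an empty Str instead.
my $s = 'abc'.substr( 3, 1 );
say $s.raku;           # ""
say so $s ~~ /./;      # False, there is no character for ｢.｣ to match
```

In both cases there is no character for ｢.｣ to match; only the way that is
reported differs.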

On Mon, May 25, 2020 at 4:10 PM Joseph Brenner <[email protected]> wrote:

> Given this string:
>    my $str = "Romp romp ROMP";
>
> We can match just the first or last by using the usual pinning
> features, '^' or '$':
>
>    say $str ~~ m:i:g/^romp/;               ## (｢Romp｣)
>    say $str ~~ m:i:g/romp$/;               ## (｢ROMP｣)
>
> Moritz Lenz (Section 3.8 of 'Parsing', p32) makes the point you
> can use 'after' to do something like '^' pinning:
>
>    say $str ~~ m:i:g/ <!after .> romp /;   ## (｢Romp｣)
>
> That makes sense:  the BOL is "not after any character"
> So: I wondered if there was a way to use 'before' to do
> something like '$' pinning:
>
>   say $str ~~ m:i:g/ romp <!before .> /;  ## (｢Romp｣ ｢romp｣)
>
> That was unexpected: it filters out the one I was trying to
> match for, though the logic seemed reasonable: the EOL is "not
> before any character".
>
> What if we flip this and do a positive before match?
>
>   say $str ~~ m:i:g/ romp <?before .> /;  ## (｢Romp｣ ｢romp｣)
>
> That does exactly the same thing, but here the logic makes
> sense to me: the first two are "before some character",
> but the last one isn't.
>
