Re: Regexps using 'after' and 'before' like ^ and $

Joseph Brenner Tue, 26 May 2020 17:12:25 -0700

Hey Brad, thanks much for the explication:

> ｢<!before .>｣ should probably also prevent the position from being at the end.


> It does work if you write it differently

>     'abc' ~~ / b <!before( /./ )> /
>     Nil

That's pretty interesting, though I can't say I understand at all
what's going on there.

> It does seem like there could be a bug here.

That was my suspicion.  I'll probably open an issue on it soon.

> All of that said, I don't think it is useful to tell new Raku programmers 
> that you can use those features that way.

Yes, certainly not.  Just to be clear, I'm just messing around
with after/before to get a better sense of what they do.

I tried to avoid saying the two forms are equivalent; they just
do roughly similar things.
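
For anyone following along, here is a minimal Raku sketch of two points from Brad's message quoted below: that the lookarounds are zero-width, so only what ｢.+｣ consumes ends up in the match, and that ｢.substr｣ behaves differently at the two edges of a string. The variable names are made up for illustration, and this only restates what the quoted message describes; it does not settle whether ｢<!before .>｣ has a bug.

```raku
# Sketch only: all variable names here are hypothetical.
my $text = 'abcd123abcd';

# Both lookarounds are zero-width, so only the characters
# consumed by .+ end up in the match.
my $m = $text ~~ / <?before <digit>> .+ <?after <digit>> /;
say $m.Str;     # 123
say $m.from;    # 4  (just before the '1')
say $m.to;      # 7  (just after the '3')

# The substr behaviour at the two edges of a string:
my $s = 'abc';

# Asking for one character starting at the very end is allowed
# and yields an empty string rather than a Failure.
my $at-end = $s.substr( $s.chars, 1 );
say $at-end.chars;           # 0

# A negative start position is out of range and yields a Failure.
# Calling .defined marks the Failure as handled, so it never throws.
my $before-start = $s.substr( -1, 1 );
say $before-start.defined;   # False
```

The asymmetry in the last two calls is the one Brad points to as a possible explanation for why the ｢<!after .>｣ form behaves like ｢^｣ while the ｢<!before .>｣ form does not behave like ｢$｣.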



On 5/26/20, Brad Gilbert <b2gi...@gmail.com> wrote:
> I'm not sure that is the best way to look at ｢<before>｣ and ｢<after>｣.
>
>     > 'abcd123abcd' ~~ / <?before <digit>> .+ <?after <digit>> /
>     ｢123｣
>
> In the above code ｢<?before <digit>>｣ makes sure that the first thing that
> ｢.+｣ matches is a ｢<digit>｣
> And ｢<?after <digit>>｣ makes sure that the last thing ｢.+｣ matches is also
> a ｢<digit>｣
>
> The ｢<?before <digit>>｣ is written in front of the ｢.+｣ so it starts at
> that position
>
> It does the thing that ｢<digit>｣ would normally do.
>
>     ' a b c d 1 2 3 a b c d '
>     ' _ _ _ _^1^_ _ _ _ _ _ '
>
> The thing is, ｢<before>｣ resets the position to what it was immediately
> before the successful ｢<digit>｣ match.
>
>     ' a b c d 1 2 3 a b c d '
>     ' _ _ _ _^_ _ _ _ _ _ _ '
>
> The ｢.+｣ then tries to grab everything
>
>     ' a b c d 1 2 3 a b c d '
>     ' _ _ _ _^1 2 3 a b c d^'
>
> Then ｢<?after <digit>>｣ gets to tell it that it can't do that.
>
> The reason is that ｢<after>｣ looks backwards from the current position. The
> current position is at the very end.
> It obviously isn't a ｢<digit>｣, so ｢.+｣ has to keep giving up characters
> until its last value is a ｢<digit>｣.
>
>     ' a b c d 1 2 3 a b c d '
>     ' _ _ _ _^1 2 3^_ _ _ _ '
>
> ---
>
> You can use ｢<after>｣ to check that the current position is at the beginning.
>
>      'abc' ~~ / <!after .> b /
>      Nil
>
> The reason is that if the current position is anywhere other than the
> beginning ｢.｣ would match.
> Since we used ｢!｣ that won't fly.
>
> ｢<!before .>｣ should probably also prevent the position from being at the
> end.
>
> It does work if you write it differently
>
>     'abc' ~~ / b <!before( /./ )> /
>     Nil
>
> Note that ｢<before>｣ and ｢<after>｣ are really just function calls.
>
> It does seem like there could be a bug here.
>
> ---
>
> All of that said, I don't think it is useful to tell new Raku programmers
> that you can use those features that way.
>
> It makes them think that these two regexes are doing something similar.
>
>     / ^ ... /
>     / <!after .> ... /
>
> They match the same three characters, but for entirely different reasons.
>
> The ｢^｣ version is basically the same as:
>
>     / <?{ $/.pos == 0 }> ... /
>
> While the other one is something like:
>
>     / <!{ try $/.orig.substr( $/.pos - 1, 1 ) ~~ /./ }> ... /
>
> (The ｢try｣ is needed because ｢.substr( -1 )｣ is a Failure.)
>
> So then these:
>
>     / ... $ /
>     / ... <!before .> /
>
> Would be
>
>     / ... <?{ $/.pos == $/.orig.chars }> /
>     / ... <!{ try $/.orig.substr( $/.pos, 1 ) ~~ /./ }> /
>
> ---
>
> What I think is happening is that the ｢<!after .>｣ works because the
> ｢.substr( -1, 1)｣ creates a Failure.
>
> The thing is that ｢'abc'.substr( 3, 1 )｣ doesn't create a Failure, it just
> gives you an empty Str.
>
> (The second argument is the maximum number of characters to return.)
>
> On Mon, May 25, 2020 at 4:10 PM Joseph Brenner <doom...@gmail.com> wrote:
>
>> Given this string:
>>    my $str = "Romp romp ROMP";
>>
>> We can match just the first or last by using the usual pinning
>> features, '^' or '$':
>>
>>    say $str ~~ m:i:g/^romp/;               ## (｢Romp｣)
>>    say $str ~~ m:i:g/romp$/;               ## (｢ROMP｣)
>>
>> Moritz Lenz (Section 3.8 of 'Parsing', p32) makes the point you
>> can use 'after' to do something like '^' pinning:
>>
>>    say $str ~~ m:i:g/ <!after .> romp /;   ## (｢Romp｣)
>>
>> That makes sense:  the BOL is "not after any character".
>> So: I wondered if there was a way to use 'before' to do
>> something like '$' pinning:
>>
>>   say $str ~~ m:i:g/ romp <!before .> /;  ## (｢Romp｣ ｢romp｣)
>>
>> That was unexpected: it filters out the one I was trying to
>> match for, though the logic seemed reasonable: the EOL is "not
>> before any character".
>>
>> What if we flip this and do a positive before match?
>>
>>   say $str ~~ m:i:g/ romp <?before .> /;  ## (｢Romp｣ ｢romp｣)
>>
>> That does exactly the same thing, but here the logic makes
>> sense to me: the first two are "before some character",
>> but the last one isn't.
>>
>
