Re: Working with a regex using positional captures stored in a variable

Ralph Mellor Wed, 17 Mar 2021 15:33:02 -0700

Works for me in Rakudo 2020.12.


On Wed, Mar 17, 2021 at 9:33 PM yary <not....@gmail.com> wrote:
>
> The "Interpolation" section of the Raku docs uses strings as the elements for 
> building up a larger regex from smaller pieces, but the example that looks 
> fruitful isn't working in my Raku. This is taken from 
> https://docs.raku.org/language/regexes#Regex_interpolation
>
> > my $string   = 'Is this a regex or a string: 123\w+False$pattern1 ?';
>
> Is this a regex or a string: 123\w+False$pattern1 ?
>
> > my $regex    = /\w+/;
>
> /\w+/
>
> > say $string.match: / $regex /;
>
> Regex object coerced to string (please use .gist or .raku to do that)
>
>  ... and more error lines, and no result when the docs show matching '123':
>
> ｢｣
>
>
> $ raku -v
>
> Welcome to 𝐑𝐚𝐤𝐮𝐝𝐨™ v2020.10.
>
> Implementing the 𝐑𝐚𝐤𝐮™ programming language v6.d.
>
> Built on MoarVM version 2020.10.
>
>
>
> -y
>
>
> On Wed, Mar 17, 2021 at 3:17 PM William Michels via perl6-users 
> <perl6-us...@perl.org> wrote:
>>
>> Dear Brad,
>>
>> 1. The list you posted is fantastic ("If the first character inside is 
>> anything other than an alpha it doesn't capture"). It should be added to the 
>> Raku Docs ASAP.
>>
>> 2. There are some shortcuts that don't seem to follow a set pattern. For 
>> example, a named capture can be accessed using $<myname> instead of 
>> $/<myname>; the "/" can be elided. Do you have a method you can share for 
>> remembering these sorts of shortcuts? Or are they disfavored?
>>
>> > say ~$<myname> if 'abc' ~~ / $<myname> = [ \w+ ] /;
>> abc
>> >
>> [ Above from the example at https://docs.raku.org/syntax/Named%20captures ].
>>
>> 3. Finally, I've never seen in the Perl6/Raku literature the motto you cite: 
>> "One of the mottos of Raku, is that it is ok to confuse a new programmer, it 
>> is not ok to confuse an expert." Do you have a citation?
>>
>> [ The motto I prefer is from Larry Wall: "...easy things should stay easy, 
>> hard things should get easier, and impossible things should get hard... ." 
>> Citation: https://www.perl.com/pub/2000/10/23/soto2000.html/ ].
>>
>> Best Regards,
>>
>> Bill.
>>
>>
>>
>> On Sat, Mar 13, 2021 at 4:47 PM Brad Gilbert <b2gi...@gmail.com> wrote:
>>>
>>> It makes <…> more consistent precisely because <$pattern> doesn't capture.
>>>
>>> If the first character inside is anything other than an alpha it doesn't 
>>> capture.
>>> Which is a very simple description of when it captures.
>>>
>>>     <?before …> doesn't capture because of the ｢?｣
>>>     <!before …> doesn't capture because of the ｢!｣
>>>     <.ws> doesn't capture because of the ｢.｣
>>>     <&ws> doesn't capture because of the ｢&｣
>>>     <$pattern> doesn't capture because of the ｢$｣
>>>     <$0> doesn't capture because of the ｢$｣
>>>     <@a> doesn't capture because of the ｢@｣
>>>     <[…]> doesn't capture because of the ｢[｣
>>>     <-[…]> doesn't capture because of the ｢-｣
>>>     <:Ll> doesn't capture because of the ｢:｣
>>>
>>> For most of those, you don't actually want it to capture.
>>> With ｢.｣ the whole point is that it doesn't capture.
>>>
>>>     <digit> does capture because it starts with an alpha
>>>     <pattern=$pattern> does capture because it starts with an alpha
>>>
>>>     $0 = <$pattern> doesn't capture to $<pattern>, but does capture to $0
>>>     $<pattern> = <$pattern> captures because of $<pattern> =
>>>
>>> It would be a mistake to just make <$pattern> capture.
>>> Consistency is perhaps Raku's most important feature.
>>>
>>> One of the mottos of Raku, is that it is ok to confuse a new programmer, it 
>>> is not ok to confuse an expert.
>>> An expert in Raku understands the deep fundamental ways that Raku is 
>>> consistent.
>>> So breaking consistency should be very carefully considered.
>>>
>>> In this case, there is very little benefit.
>>> Even worse, you then have to come up with some new syntax to prevent it 
>>> from capturing when you don't want it to.
>>> That new syntax wouldn't be as guessable as it currently is, which again 
>>> would confuse experts.
>>>
>>> If anyone seriously suggests such a change, I will vehemently fight to 
>>> prevent it from happening.
>>>
>>> I would be more likely to accept <=$pattern> being added as a synonym to 
>>> <pattern=$pattern>.
>>>
>>> On Sat, Mar 13, 2021 at 3:30 PM Joseph Brenner <doom...@gmail.com> wrote:
>>>>
>>>> Thanks much for your answer on this.  I think this is the sort of
>>>> trick I was looking for:
>>>>
>>>> Brad Gilbert<b2gi...@gmail.com> wrote:
>>>>
>>>> > You can put it back in as a named capture:
>>>>
>>>> >     > $input ~~ / <pattern=$pattern> /
>>>> >     ｢9 million｣
>>>> >      pattern => ｢9 million｣
>>>> >       0 => ｢9｣
>>>> >       1 => ｢million｣
>>>>
>>>> That's good enough, I guess, though you need to know about the
>>>> issue... is there some reason it shouldn't happen automatically,
>>>> using the variable name to label the captures?
>>>>
>>>> I don't think this particular gotcha is all that well
>>>> documented, though I guess there's a reference to this being a
>>>> "known trap" in the documentation under "Regex interpolation"--
>>>> but that's the sort of remark that makes sense only after you know
>>>> what it's talking about.
>>>>
>>>> I have to say, my first reaction was something like "if they
>>>> couldn't get this working right, why did they put it in?"
>>>>
>>>>
>>>> On 3/11/21, Brad Gilbert <b2gi...@gmail.com> wrote:
>>>> > If you interpolate a regex, it is a sub regex.
>>>> >
>>>> > If you have something like a sigil, then the match data structure gets
>>>> > thrown away.
>>>> >
>>>> > You can put it back in as a named capture:
>>>> >
>>>> >     > $input ~~ / <pattern=$pattern> /
>>>> >     ｢9 million｣
>>>> >      pattern => ｢9 million｣
>>>> >       0 => ｢9｣
>>>> >       1 => ｢million｣
>>>> >
>>>> > Or as a numbered capture:
>>>> >
>>>> >     > $input ~~ / $0 = <$pattern> /
>>>> >     ｢9 million｣
>>>> >      0 => ｢9 million｣
>>>> >       0 => ｢9｣
>>>> >       1 => ｢million｣
>>>> >
>>>> > Or put it in as a lexical regex
>>>> >
>>>> >     > my regex pattern { (\d+) \s+ (\w+) }
>>>> >     > $input ~~ / <pattern>  /
>>>> >     ｢9 million｣
>>>> >      pattern => ｢9 million｣
>>>> >       0 => ｢9｣
>>>> >       1 => ｢million｣
>>>> >
>>>> > Or just use it as the whole regex
>>>> >
>>>> >     > $input ~~ $pattern # variable
>>>> >     ｢9 million｣
>>>> >      0 => ｢9｣
>>>> >      1 => ｢million｣
>>>> >
>>>> >     > $input ~~ &pattern # my regex pattern /…/
>>>> >     ｢9 million｣
>>>> >      0 => ｢9｣
>>>> >      1 => ｢million｣
>>>> >
>>>> > On Thu, Mar 11, 2021 at 2:29 AM Joseph Brenner <doom...@gmail.com> wrote:
>>>> >
>>>> >> Does this behavior make sense to anyone?  When you've got a regex
>>>> >> with captures in it, the captures don't work if the regex is
>>>> >> stashed in a variable and then interpolated into a regex.
>>>> >>
>>>> >> Do capture groups need to be defined at the top level where the
>>>> >> regex is used?
>>>> >>
>>>> >> { # From a code example in the "Parsing" book by Moritz Lenz, p. 48, section 5.2
>>>> >>    my $input = 'There are 9 million bicycles in beijing.';
>>>> >>    if $input ~~ / (\d+) \s+ (\w+) / {
>>>> >>        say $0.^name;  # Match
>>>> >>        say $0;        # ｢9｣
>>>> >>        say $1.^name;  # Match
>>>> >>        say $1;        # ｢million｣
>>>> >>        say $/;
>>>> >>         # ｢9 million｣
>>>> >>         #  0 => ｢9｣
>>>> >>         #  1 => ｢million｣
>>>> >>    }
>>>> >> }
>>>> >>
>>>> >> say '---';
>>>> >>
>>>> >> { # Moving the pattern to var which we interpolate into match
>>>> >>    my $input = 'There are 9 million bicycles in beijing.';
>>>> >>    my $pattern = rx{ (\d+) \s+ (\w+) };
>>>> >>    if $input ~~ / <$pattern> / {
>>>> >>        say $0.^name;  # Nil
>>>> >>        say $0;        # Nil
>>>> >>        say $1.^name;  # Nil
>>>> >>        say $1;        # Nil
>>>> >>        say $/;        # ｢9 million｣
>>>> >>    }
>>>> >> }
>>>> >>
>>>> >> In the second case, the match clearly works, but it behaves as
>>>> >> though the capture groups aren't there.
>>>> >>
>>>> >>
>>>> >>    raku --version
>>>> >>
>>>> >>    Welcome to 𝐑𝐚𝐤𝐮𝐝𝐨™ v2020.10.
>>>> >>    Implementing the 𝐑𝐚𝐤𝐮™ programming language v6.d.
>>>> >>
>>>> >
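
A minimal sketch pulling together the behaviour discussed in the quoted thread, assuming the same $input and $pattern as in the original question. The variable names $bare, $named and $numbered are introduced here for illustration only, and the comments restate the results reported in the thread rather than anything newly measured.

```raku
my $input   = 'There are 9 million bicycles in beijing.';
my $pattern = rx{ (\d+) \s+ (\w+) };

# Bare interpolation: the match succeeds, but the inner captures are dropped.
my $bare = $input ~~ / <$pattern> /;
say ~$bare;               # 9 million
say $bare[0].defined;     # False

# Named alias: the sub-match and its positional captures are kept.
my $named = $input ~~ / <pattern=$pattern> /;
say ~$named<pattern>;     # 9 million
say ~$named<pattern>[0];  # 9
say ~$named<pattern>[1];  # million

# Numbered alias: the sub-match is stored as $0, with its captures nested inside.
my $numbered = $input ~~ / $0 = <$pattern> /;
say ~$numbered[0];        # 9 million
say ~$numbered[0][1];     # million
```

Holding each Match in its own variable, instead of reading $/ after every match, is only a convenience for comparing the three forms side by side.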
