Re: Need regex in the middle wildcard help

Richard Hainsworth Mon, 19 Jun 2023 07:39:44 -0700

Hi Todd,

Some more clean up:


On 19/06/2023 12:41, ToddAndMargo via perl6-users wrote:
<snip>

This is my test program:

<RegexTest.pl6>
#!/bin/raku

print "\n";
my Str $x = Q[<ahref="wike-2.0.1-1.fc38.noarch.rpm">wike-2.0.1-1.fc38.noarch.rpm</a>27-Apr-2023 01:53 143K] ~ Q[<ahref="wine-8.6-1.fc38.i686.rpm">wine-8.6-1.fc38.i686.rpm</a>19-Apr-2023 21:48 11K] ~ Q[<ahref="wine-8.6-1.fc38.x86_64.rpm">wine-8.6-1.fc38.x86_64.rpm</a> 19-Apr-2023 21:48 11K] ~ Q[<ahref="wine-alsa-8.6-1.fc38.i686.rpm">wine-alsa-8.6-1.fc38.i686.rpm</a> 19-Apr-2023 21:48 223K];
$x~~m:i/ .*? ("wine") (.*?) $(Q[">] ) .*? $( Q[a href="] ) (.*?) ( $(Q[">] ) ) /;
print "0 = <$0>\n1 = <$1>\n2 = <$2>\n\n";

my Str $y = $0 ~ $1 ~ " " ~ $2;
print "$y\n\n";
</RegexTest.pl6>


$ RegexTest.pl6

0 = <wine>
1 = <-8.6-1.fc38.i686.rpm>
2 = <wine-8.6-1.fc38.x86_64.rpm>

wine-8.6-1.fc38.i686.rpm wine-8.6-1.fc38.x86_64.rpm

<snip>

After Joseph's help:
      $SysRev  = $WebPage;
$SysRev~~m:i/ .*? ("wine") (.*?) $(Q[">] ) .*? $( Q[ahref="] ) (.*?) ( $(Q[">] ) ) /;
      $SysRev = $0 ~ $1 ~ "   " ~ $2;

Maybe the following would be a bit more Raku-ish:

[in file called todd-test.raku]

$=finish ~~ /:i ['href="' ~ \" $<ww> = ( 'wine-' \d .+? ) .*? ]+ $ /;
say $/<ww>.join(' ');

=finish
<a href="wike-2.0.1-1.fc38.noarch.rpm">wike-2.0.1-1.fc38.noarch.rpm</a> 27-Apr-2023 01:53  143K
<a href="wine-8.6-1.fc38.i686.rpm">wine-8.6-1.fc38.i686.rpm</a> 19-Apr-2023 21:48  11K
<a href="wine-8.6-1.fc38.x86_64.rpm">wine-8.6-1.fc38.x86_64.rpm</a> 19-Apr-2023 21:48  11K
<a href="wine-alsa-8.6-1.fc38.i686.rpm">wine-alsa-8.6-1.fc38.i686.rpm</a> 19-Apr-2023 21:48  223K

[end of todd-test.raku]
Test it in a terminal:

$ raku todd-test.raku
wine-8.6-1.fc38.i686.rpm wine-8.6-1.fc38.x86_64.rpm

Some comments.
1) `=finish` is part of the POD6 specification but is currently undocumented
(I only discovered it recently). It will be documented soon.
Anything after `=finish` is put in a string that can be pulled into a Raku
program with `$=finish` (also undocumented).
`=finish` was introduced in place of Perl's `__DATA__`.
It is useful because, if you have a lot of text to experiment on, you can just
attach the text to the bottom of the program after a `=finish`.
2) `~~` does not need an `m` (you only need `m` if you want to match a regex
against the topic, e.g. `$_`).
3) The /  'begin' ~ 'end' 'regex'  / syntax means: match 'regex' between
'begin' and 'end'.
4) The final output has a 'wine' in it, so why search for it separately? Just
include it in the search.
5) You seem to be looking for a 'wine-' followed by a digit, so as to eliminate
the 'wine-alsa-' line, so look for exactly that.
6) '$<ww>=' places the match into $/<ww> of the whole match. Multiple matches
create an array.
7) `$/<ww>.join` takes an array and joins it with a separator.
8) In the original code, all the `$()` and `Q[]` constructs add noise without
disambiguating anything.
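
Comments 2, 3, 6 and 7 can be tried out on a much smaller string. The following
is only a sketch: the sample text and the variable names ($text, $found, $inner,
$joined) are made up for illustration and are not part of Todd's program.

```raku
# Hypothetical sample text, standing in for the web page.
my $text = 'href="wine-1.0.rpm" href="wine-alsa-1.0.rpm" href="wine-2.0.rpm"';

# Comment 2: with `~~` a bare /.../ is enough; no `m` is needed.
my $found = so $text ~~ / 'wine-' \d /;
say $found;

# Comment 3: 'open' ~ 'close' inner  matches `inner` between the two delimiters.
my $inner = '';
if 'x="abc"' ~~ / '="' ~ \" $<inner> = ( .+? ) / {
    $inner = ~$<inner>;
}
say $inner;

# Comments 6 and 7: a named capture inside a repeated group collects every
# match into a list, which `join` then turns into a single string.
my $joined = '';
if $text ~~ / [ 'href="' ~ \" $<ww> = ( 'wine-' \d .+? ) .*? ]+ $ / {
    $joined = $<ww>.join(' ');
}
say $joined;
```

The 'wine-alsa-' entry should be skipped here for the same reason as in the
full example: 'wine-' must be followed by a digit.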

But what we really want is to pull out the interesting bits; we are not
interested in the rest. So `comb` is a better fit.

[start of test-2.raku]

$=finish.comb(/ <?after \">'wine-' \d .+? <?before \"> /).join(' ').say;

=finish
<a href="wike-2.0.1-1.fc38.noarch.rpm">wike-2.0.1-1.fc38.noarch.rpm</a> 27-Apr-2023 01:53  143K
<a href="wine-8.6-1.fc38.i686.rpm">wine-8.6-1.fc38.i686.rpm</a> 19-Apr-2023 21:48  11K
<a href="wine-8.6-1.fc38.x86_64.rpm">wine-8.6-1.fc38.x86_64.rpm</a> 19-Apr-2023 21:48  11K
<a href="wine-alsa-8.6-1.fc38.i686.rpm">wine-alsa-8.6-1.fc38.i686.rpm</a> 19-Apr-2023 21:48  223K

[end of test-2.raku]


Same output.

Notes:
1) `comb` looks for all matches in a string, so there is no need for the
repetition (`[ ... ]+`) or the end-of-string anchor (`$`) in the regex.
2) We are looking for something 'after' a ｢"｣ and 'before' a second ｢"｣, so we
can use the <?after regex> and <?before regex> zero-width matchers.
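
To see what the zero-width matchers buy us, here is a small sketch comparing
`comb` with and without them. The sample string and the names @loose and
@strict are assumptions for illustration only.

```raku
# Hypothetical sample: each file name appears twice, once inside the href
# quotes and once as the link text.
my $text = 'href="wine-1.0.rpm">wine-1.0.rpm</a> '
         ~ 'href="wine-alsa-1.0.rpm">wine-alsa-1.0.rpm</a>';

# Without the assertions, the link text (preceded by '>') is picked up as well.
my @loose = $text.comb(/ 'wine-' \d <-["<]>+ /);

# With them, only the text sitting between two double quotes is kept, and the
# quotes themselves are not part of the match.
my @strict = $text.comb(/ <?after \">'wine-' \d .+? <?before \"> /);

say @loose.elems;
say @strict.join(' ');
```

Because the assertions consume nothing, the result needs no trimming of quote
characters afterwards.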

Richard, aka finanalyst
