Hi Marc (and Bruce)!
I'm adapting a "word frequency" answer posted by Sean McAfee on this list.
The key seems to be adding the `:exhaustive` adverb to the `match` call.
AFAIK `comb` will not accept this adverb, so `match` will have to do for now:
Sample Input (including quotes): “A horse, a horse, my kingdom for a
horse!”
~$ raku -e '++(my %digraphs){$_} for slurp.lc.match(:global, :exhaustive,
/<[a..z]>**2/); .say for %digraphs.sort(-*.value);'
Sample Output:
or => 4
se => 3
rs => 3
ho => 3
in => 1
my => 1
om => 1
ki => 1
ng => 1
do => 1
gd => 1
fo => 1
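
For anyone comparing the two approaches, here is a minimal sketch (my own illustration, not taken from Sean's or Bruce's posts) of how `comb` and `match` with `:exhaustive` differ on a short string. The `digraphs` sub is a hypothetical helper name, introduced only for this example:

```raku
# Hypothetical helper (not from the thread): return every overlapping
# two-letter run in a string, using match with the :exhaustive adverb.
sub digraphs(Str $text) {
    $text.lc.match(/<[a..z]> ** 2/, :exhaustive)».Str
}

# comb only returns non-overlapping matches, so "or" and "se" are skipped.
say 'horse'.comb(/<[a..z]> ** 2/);    # (ho rs)

# match with :exhaustive also reports the matches that overlap.
say digraphs('horse');                # (ho or rs se)

# Tallying is then a matter of coercing the list to a Bag.
my %counts = digraphs('A horse, a horse, my kingdom for a horse!').Bag;
say %counts<or>;                      # 4
```

Since the regex always matches exactly two letters, `:exhaustive` should find at most one match per starting position here, which is why it behaves like a simple overlapping scan in this case.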
HTH, Bill.
On Sat, Aug 27, 2022 at 10:25 AM Bruce Gray <[email protected]> wrote:
>
>
> > On Aug 27, 2022, at 10:56 AM, Marc Chantreux <[email protected]> wrote:
>
> --snip--
>
> > but I think it is possible to move the cursor backward in the comb regex.
>
> --snip--
>
> I do *not* think you can ("move the cursor backward in the comb regex");
> See https://docs.raku.org/routine/comb :
> ... "returns a Seq of non-overlapping matches" ...
> The "non-overlapping" nature is the problem.
> (Please let me know if this turns out to be incorrect!)
>
> With foresight, Raku added an optional `:exhaustive` adverb to regex
> matching, and that will do what you want.
> This Raku code:
>
> my %digraphs = slurp.lc.match(:exhaustive, /(<[a..z]> ** 2)/)».Str.Bag;
> .say for %digraphs.sort({ -.value, ~.key });
>
> , produces output identical to this Perl code:
>
> perl -lnE '
> END { say "$_ => $digraph{$_}" for
> sort { $digraph{$b} <=> $digraph{$a} || $a cmp $b }
> keys %digraph
> }
> $_=lc; while (/([a-z]{2})/g) {++$digraph{$1}; --pos; }
> ' Camelia.svg
>
> , when run against a downloaded copy of our mascot:
> https://upload.wikimedia.org/wikipedia/commons/8/85/Camelia.svg
>
> --
> Hope this helps,
> Bruce Gray (Util of PerlMonks)
>
>