Heads-up: the code in my last post was correct, but the sample output I
pasted was not. The actual output is as follows (Rakudo v2021.06):
~$ raku -e '++(my %digraphs){$_} for slurp.lc.match(:global, :exhaustive,
/<[a..z]>**2/); .say for %digraphs.sort(-*.value);' richard3.txt
or => 4
rs => 3
ho => 3
se => 3
gd => 1
in => 1
fo => 1
om => 1
do => 1
ng => 1
ki => 1
my => 1
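
For anyone skimming the thread, here is a minimal sketch of why `match` with `:exhaustive` is needed instead of `comb`, using the single word "horse" as input. The variable names are mine and purely illustrative. As I understand it, `comb` should yield only the two non-overlapping pairs ("ho", "rs"), while exhaustive matching should also pick up the pairs that overlap them ("or", "se"):

```raku
my $text = 'horse';

# comb returns non-overlapping matches, so each letter is consumed once.
my @non-overlapping = $text.comb(/<[a..z]> ** 2/);

# :exhaustive also returns matches that overlap earlier ones,
# which is what counting digraphs requires.
my @overlapping = $text.match(:exhaustive, /<[a..z]> ** 2/)».Str;

say @non-overlapping;
say @overlapping;
```

That overlap is why "or" reaches a count of 4 in the output above: it occurs in "horse" three times and once in "for".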
On Sat, Aug 27, 2022 at 10:45 AM William Michels <[email protected]>
wrote:
> Hi Marc (and Bruce)!
>
> I'm adapting a "word frequency" answer posted by Sean McAfee on this list.
> The key seems to be adding the `:exhaustive` adverb to the `match` call.
> AFAIK `comb` will not accept this adverb, so `match` will have to do for now:
>
> Sample Input (including quotes): “A horse, a horse, my kingdom for a
> horse!”
>
> ~$ raku -e '++(my %digraphs){$_} for slurp.lc.match(:global, :exhaustive,
> /<[a..z]>**2/); .say for %digraphs.sort(-*.value);'
>
> Sample Output:
>
> or 1 => 4
> se 1 => 3
> rs 1 => 3
> ho 1 => 3
> in 1 => 1
> my 1 => 1
> om 1 => 1
> ki 1 => 1
> ng 1 => 1
> do 1 => 1
> gd 1 => 1
> fo 1 => 1
>
> HTH, Bill.
>
>
> On Sat, Aug 27, 2022 at 10:25 AM Bruce Gray <[email protected]> wrote:
>
>>
>>
>> > On Aug 27, 2022, at 10:56 AM, Marc Chantreux <[email protected]> wrote:
>>
>> --snip--
>>
>> > but I think it is possible to move the cursor backward in the comb
>> regex.
>>
>> --snip--
>>
>> I do *not* think you can ("move the cursor backward in the comb regex");
>> See https://docs.raku.org/routine/comb :
>> ... "returns a Seq of non-overlapping matches" ...
>> The "non-overlapping" nature is the problem.
>> (Please let me know if this turns out to be incorrect!)
>>
>> With foresight, Raku provides an optional `:exhaustive` adverb for regex
>> matching, and that will do what you want.
>> This Raku code:
>>
>> my %digraphs = slurp.lc.match(:exhaustive, /(<[a..z]> ** 2)/)».Str.Bag;
>> .say for %digraphs.sort({ -.value, ~.key });
>>
>> , produces output identical to this Perl code:
>>
>> perl -lnE '
>> END { say "$_ => $digraph{$_}" for
>> sort { $digraph{$b} <=> $digraph{$a} || $a cmp $b }
>> keys %digraph
>> }
>> $_=lc; while (/([a-z]{2})/g) {++$digraph{$1}; --pos; }
>> ' Camelia.svg
>>
>> , when run against a downloaded copy of our mascot:
>> https://upload.wikimedia.org/wikipedia/commons/8/85/Camelia.svg
>>
>> --
>> Hope this helps,
>> Bruce Gray (Util of PerlMonks)
>>
>>