Thank you, Samantha! One outstanding question, posed by Joseph Brenner, is which version of the Unicode standard Raku supports. I grepped through two files, "unicode.c" and "unicode_db.c". Both are located in the Rakudo source tree at: /rakudo/rakudo-2020.06/nqp/MoarVM/src/strings/
Below are the first 4 lines of my grep results. As you can see, rakudo-2020.06 supports Unicode 12.1.0:

    ~$ raku -ne '.say if .grep(/unicode/)' ~/rakudo/rakudo-2020.06/nqp/MoarVM/src/strings/unicode_db.c
    # For terms of use, see http://www.unicode.org/terms_of_use.html
    # The UAXes can be accessed at http://www.unicode.org/versions/Unicode12.1.0/
    From http://unicode.org/copyright.html#Exhibit1 on 2017-11-28:
    Distributed under the Terms of Use in http://www.unicode.org/copyright.html.
    <TRUNCATED>

It would be really interesting to follow your Unicode work, Samantha. The ideas you propose are interesting, and everyone hopes for speed improvements. Is there any place Raku-uns can go to read updates, such as a grant report, blog, or GitHub issue? Or maybe right here, on the Perl6-Users mailing list?

Thanks in advance.

Best, Bill.

W. Michels, Ph.D.

On Sun, Sep 27, 2020 at 4:03 AM Samantha McVey <samant...@posteo.net> wrote:
>
> So MoarVM uses its own database of the UCD. One nice thing is this can
> probably be faster than calling to the ICU to look up information of each
> codepoint in a long string. Secondly it implements its own text data
> structures, so the nice features of the UCD to do that would be difficult to
> use.
>
> In my opinion, it could make sense to use ICU for things like localized
> collation (sorting). It also could make sense to use ICU for unicode
> properties lookup for properties that don't have to do with grapheme
> segmentation or casing. This would be a lot of work but if something like this
> were implemented it would probably happen in the context of a larger
> rethinking of how we use unicode. Though everything is complicated by that we
> support lots of complicated regular expressions on different unicode
> properties. I guess first I'd start by benchmarking the speed of ICU and
> comparing to the current implementation.
>