Re: "ICU - International Components for Unicode"

Samantha McVey Sun, 27 Sep 2020 04:09:03 -0700

So MoarVM uses its own database built from the UCD. One nice thing is that 
this is probably faster than calling into ICU to look up information for each 
codepoint in a long string. Secondly, MoarVM implements its own text data 
structures, so the nice features ICU offers for working with text would be 
difficult to use.


In my opinion, it could make sense to use ICU for things like localized 
collation (sorting). It could also make sense to use ICU for Unicode property 
lookups, for properties that don't have to do with grapheme segmentation or 
casing. This would be a lot of work, but if something like this were 
implemented, it would probably happen in the context of a larger rethinking of 
how we use Unicode. Everything is complicated, though, by the fact that we 
support lots of complicated regular expressions on different Unicode 
properties. I guess I'd start by benchmarking the speed of ICU and comparing 
it to the current implementation.
