On Sunday, 3 July 2022 at 21:06:40 UTC, rikki cattermole wrote:
We have a perfectly good Unicode handling library already.
(Okay, little out of date and doesn't handle Turkic stuff, but
fixable).
The standard one is called ICU.
Yes, that is a common one that is maintained, but maybe there are
Boost-licensed implementations too? One can run an exhaustive test
of, say, two-character normalization against ICU to see whether
they are compliant.
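A rough sketch of what such a comparison could look like. This uses Python's stdlib `unicodedata` as a stand-in reference instead of ICU (with ICU bindings you would swap in its normalizer as `reference`), and `find_mismatches` is a hypothetical helper name, not part of any library. The code point range is kept tiny here; a real run would cover far more.

```python
import itertools
import unicodedata


def reference_nfc(s: str) -> str:
    # Stand-in reference; an ICU-backed normalizer would go here instead.
    return unicodedata.normalize("NFC", s)


def find_mismatches(code_points, reference, candidate):
    """Return every two-character string where candidate and reference disagree."""
    mismatches = []
    for a, b in itertools.product(code_points, repeat=2):
        s = chr(a) + chr(b)
        if reference(s) != candidate(s):
            mismatches.append(s)
    return mismatches


# Small illustrative sample: Latin capitals plus a few combining marks.
sample = list(range(0x41, 0x5B)) + list(range(0x300, 0x310))

# A deliberately broken "implementation" that does no normalization at all.
broken = lambda s: s
bad = find_mismatches(sample, reference_nfc, broken)
```

Any pair that ends up in `bad` is a string the candidate gets wrong relative to the reference, which is the kind of report you would want before trusting an alternative implementation.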
Anyway, normalization should not happen behind your back in a
system level language. You might want to treat different
encodings of the same string differently when comparing.
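To make that concrete, here is a small sketch in Python (chosen only because its stdlib ships a normalizer; `equal_normalized` is a hypothetical helper name). The two strings render the same but are different code point sequences, and the caller decides explicitly whether normalization takes part in the comparison.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# Without normalization the two are different strings and different bytes.
raw_equal = composed == decomposed
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")


def equal_normalized(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings only after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


normalized_equal = equal_normalized(composed, decomposed)
```

Here `raw_equal` is false and `normalized_equal` is true, so both behaviours stay available and neither happens implicitly.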
Anyway, we are straying from my original point, that limiting
ourselves to the string alias and not supporting wstring or
dstring in Phobos is going to bite us.
I guess some Windows programmers want 16 bit… but I don't think
the conversion matters all that much in that context?
There better be a good reason for this that isn't just removing
templates.
The good reason would be that you can focus on fast SIMD-optimized
algorithms that make sense for the byte encoding of UTF-8, and get
something competitive.
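As an illustration of an algorithm that only makes sense on the UTF-8 byte encoding: counting code points by counting the bytes that are not continuation bytes (`10xxxxxx`). The sketch below is plain Python, so it is not SIMD itself; it only shows the per-byte test, which a SIMD version could apply to 16 or 32 bytes at a time. It assumes the input is already valid UTF-8, and `count_code_points` is a hypothetical name.

```python
def count_code_points(data: bytes) -> int:
    """Count code points in valid UTF-8 by skipping continuation bytes."""
    if data.isascii():
        # Fast path: every ASCII byte is exactly one code point.
        return len(data)
    # Continuation bytes match the bit pattern 10xxxxxx.
    return sum((b & 0xC0) != 0x80 for b in data)


text = "héllo €😀"
n = count_code_points(text.encode("utf-8"))
```

The same mask-and-compare has no direct equivalent for UTF-16 or UTF-32, which is why specializing on one encoding lets you tune this kind of loop.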