On 2014-01-13 17:15:21 +0000, "Dominikus Dittes Scherkl"
<dominikus.sche...@continental-corporation.com> said:
On Sunday, 12 January 2014 at 12:48:05 UTC, Tobias Pankrath wrote:
On Saturday, 11 January 2014 at 21:42:46 UTC, Dmitry Olshansky wrote:
12-Jan-2014 01:22, monarch_dodra wrote:
And it's indeed quite high, the number of "bad sheep" that get
longer/shorter across the whole of Unicode is around 5-10 code points, IIRC.
More important than the absolute number of "bad sheep" is their frequency
in your input :-)
In German the frequency of "ß" is 0.31%, and the mess of getting a longer
result ("SS") only arises for toUpper().
I think Greek has a similar problem, but I don't know the frequency there...
The funny thing about "ß" is that in UTF-8 it's two bytes (0xC3 0x9F)
and you replace it with "SS" which is two bytes too (0x53 0x53). So
with some cleverness it can be done in place for char[], but not for
wchar[] or dchar[]. :-)
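
To make the byte-count argument concrete, here is a minimal sketch in Python (not D, and not how any library's toUpper() actually works). The function name upper_eszett_in_place is made up for illustration, and it handles only "ß", not upper-casing in general. It relies on 0xC3 being usable only as a lead byte in valid UTF-8, so a plain byte scan cannot match in the middle of another character.

```python
def upper_eszett_in_place(buf: bytearray) -> None:
    """Overwrite every UTF-8 encoded 'ß' (0xC3 0x9F) with 'SS' (0x53 0x53).

    Hypothetical helper for illustration only: it does not upper-case
    any other character and assumes buf holds valid UTF-8.
    """
    i = 0
    while i + 1 < len(buf):
        if buf[i] == 0xC3 and buf[i + 1] == 0x9F:
            # Two bytes out, two bytes in: the buffer never changes size.
            buf[i] = 0x53
            buf[i + 1] = 0x53
            i += 2
        else:
            i += 1


buf = bytearray("Straße".encode("utf-8"))
size_before = len(buf)
upper_eszett_in_place(buf)
size_after = len(buf)

# In UTF-16 the same trick fails: "ß" is one 16-bit code unit, "SS" is two.
utf16_eszett_units = len("ß".encode("utf-16-le")) // 2
utf16_ss_units = len("SS".encode("utf-16-le")) // 2
```

The same comparison holds for dchar[] (UTF-32): one code unit for "ß" against two for "SS", so the result has to grow.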
--
Michel Fortin
michel.for...@michelf.ca
http://michelf.ca