On 2014-01-13 17:15:21 +0000, "Dominikus Dittes Scherkl" <dominikus.sche...@continental-corporation.com> said:

On Sunday, 12 January 2014 at 12:48:05 UTC, Tobias Pankrath wrote:
On Saturday, 11 January 2014 at 21:42:46 UTC, Dmitry Olshansky wrote:
12-Jan-2014 01:22, monarch_dodra wrote:
And it's indeed quite high, the number of "bad sheep" that get longer/shorter across the whole of Unicode is around 5-10 code points, IIRC.

More important than the absolute number of "bad sheep" is their frequency in your input :-)

In German the frequency of "ß" is 0.31%, and the problem of getting a longer
result ("SS") arises only for toUpper().
I think Greek has a similar problem, but I don't know the frequency there...

The funny thing about "ß" is that in UTF-8 it's two bytes (0xC3 0x9F) and you replace it with "SS" which is two bytes too (0x53 0x53). So with some cleverness it can be done in place for char[], but not for wchar[] or dchar[]. :-)
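
To make the byte arithmetic concrete, here is a rough sketch in Python (not D, and not how any toUpper() in Phobos actually works). The function name `upper_eszett_in_place` is made up for illustration, it assumes the buffer holds valid UTF-8, and it handles only "ß", not the rest of the case mapping:

```python
def upper_eszett_in_place(buf: bytearray) -> None:
    """Overwrite every UTF-8 encoded 'ß' (0xC3 0x9F) with 'SS' (0x53 0x53).

    Hypothetical helper for illustration only. The buffer length never
    changes, because both sequences are exactly two bytes long.
    Assumes buf contains valid UTF-8.
    """
    i = 0
    while i + 1 < len(buf):
        # In valid UTF-8, 0xC3 can only be the lead byte of a two-byte
        # sequence, so a plain byte scan cannot match mid-character.
        if buf[i] == 0xC3 and buf[i + 1] == 0x9F:
            buf[i] = 0x53      # 'S'
            buf[i + 1] = 0x53  # 'S'
            i += 2
        else:
            i += 1


buf = bytearray("Straße".encode("utf-8"))
size_before = len(buf)
upper_eszett_in_place(buf)
result = buf.decode("utf-8")
same_size = len(buf) == size_before

# In UTF-16 the same replacement grows: one code unit becomes two.
utf16_units_before = len("ß".encode("utf-16-le")) // 2
utf16_units_after = len("SS".encode("utf-16-le")) // 2
```

The last two lines show why the same trick fails for wchar[]: "ß" is a single UTF-16 code unit, while "SS" needs two, so the buffer would have to grow.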

--
Michel Fortin
michel.for...@michelf.ca
http://michelf.ca
