On Wednesday, 16 October 2013 at 12:18:40 UTC, Jacob Carlborg wrote:
On 2013-10-16 10:03, qznc wrote:
Most code might be buggy then.
An issue that often comes up is file names. A file called "bär"
will be normalized differently depending on the operating system.
In both cases it is one grapheme. However, on Linux it is one code
point, but on OS X it is two code points.
Why would it require two code points?
The "ä" is either [U+00E4] as one code point or [a,U+0308] as two
code points. U+0308 is the "combining diaeresis" [0]. The two code
point form is not required, but possible. Those combining
characters [1] can be stacked onto a base character in a
practically unlimited number of combinations. You can go crazy with it:
http://stackoverflow.com/questions/6579844/how-does-zalgo-text-work
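
To make the two encodings of "ä" concrete, here is a minimal sketch in Python (not D), using only the standard unicodedata module. The helper name same_text is made up for illustration, and normalizing both sides to NFC is just one reasonable way to compare such strings, not the only one.

```python
import unicodedata

precomposed = "\u00e4"   # "ä" as a single code point
decomposed = "a\u0308"   # "a" followed by the combining diaeresis


def same_text(a, b):
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Both strings render as one grapheme, but the code point counts differ.
print(len(precomposed), len(decomposed))

# A naive comparison treats them as different strings.
print(precomposed == decomposed)

# After normalization they compare equal.
print(same_text(precomposed, decomposed))

# The second code point of the decomposed form is the combining mark.
print(unicodedata.name(decomposed[1]))
```

The same applies to a whole file name: normalizing "bär" to NFD gives the four code point form, and normalizing that back to NFC gives the three code point form again.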
[0] http://www.fileformat.info/info/unicode/char/0308/index.htm
[1] http://en.wikipedia.org/wiki/Combining_character