On Wednesday, 16 October 2013 at 12:18:40 UTC, Jacob Carlborg wrote:
On 2013-10-16 10:03, qznc wrote:

Most code might be buggy then.

An issue that often comes up is file names. A file called "bär" will be normalized differently depending on the operating system. In both cases it is one grapheme. However, on Linux it is one code point, but on OS X it is two code points.
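A minimal sketch of why this bites, assuming the Linux name was typed in precomposed form and the OS X name came back in decomposed form (HFS+ stores names in a variant of NFD, while Linux file systems typically store whatever bytes they are given). The helper `same_filename` is hypothetical, and normalizing both sides to NFC is just one reasonable choice for comparing names:

```python
import unicodedata

def same_filename(a: str, b: str) -> bool:
    """Hypothetical helper: compare two file names after normalizing both
    to NFC. Real file systems may apply their own, slightly different rules."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

linux_name = "b\u00e4r"   # precomposed: U+00E4 is a single code point
osx_name = "ba\u0308r"    # decomposed: 'a' followed by U+0308 COMBINING DIAERESIS

print(linux_name == osx_name)               # False: code point sequences differ
print(same_filename(linux_name, osx_name))  # True: equal after normalization
print(len(linux_name), len(osx_name))       # 3 4
```

Code that compares names code point by code point (or byte by byte) treats these as two different files even though a user sees the same "bär".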

Why would it require two code points?

It is either [U+00E4] as one code point or [a, U+0308] as two code points. The second is "combining diaeresis" [0]. Not required, but possible. Those combining characters [1] provide a nearly infinite number of combinations. You can go crazy with it: http://stackoverflow.com/questions/6579844/how-does-zalgo-text-work
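The two encodings can be inspected with Python's standard `unicodedata` module, which also shows how stacked combining marks inflate the code point count while the visible character count stays the same. The helper `approx_grapheme_count` is hypothetical and only a rough approximation: it treats every code point outside the Unicode "Mark" categories as the start of a new user-perceived character and ignores the full segmentation rules of UAX #29 (Hangul, emoji, ZWJ sequences and so on):

```python
import unicodedata

precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # 'a' followed by COMBINING DIAERESIS

for s in (precomposed, decomposed):
    print([f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s])

# NFD splits the precomposed form apart; NFC puts it back together.
assert unicodedata.normalize("NFD", precomposed) == decomposed
assert unicodedata.normalize("NFC", decomposed) == precomposed

def approx_grapheme_count(s: str) -> int:
    """Rough count: a code point whose general category is not a Mark (M*)
    starts a new user-perceived character. Not a full UAX #29 implementation."""
    return sum(1 for c in s if not unicodedata.category(c).startswith("M"))

# Zalgo-style text: one base letter with 15 combining marks stacked on it.
zalgo = "a" + "\u0308\u0301\u0323" * 5
print(len(zalgo), approx_grapheme_count(zalgo))   # 16 1
```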

[0] http://www.fileformat.info/info/unicode/char/0308/index.htm
[1] http://en.wikipedia.org/wiki/Combining_character
