> So... if you want to help make people more aware of the grapheme_*
> functions, one place to start would be editing the documentation for the
> various string, mbstring, and grapheme functions to use consistent
> terminology, and sign-post each other more clearly.
>
> http://doc.php.net/tutorial/
Yes, I agree. I've also edited the documentation before, back in the svn days, and I was already planning to read up on how that works nowadays.

I'm also planning an outline for a conference talk on the subject. I've educated people on Unicode-related subjects before, and I think I have a few very good stories that can give unsuspecting developers some insight into this.

I love an analogy that most Europeans understand. The city of Cologne has two equally valid ways to write its German name: Köln and Koeln (the latter is used when hindered by technical limitations, or maybe in informal conversation). Every German can extra_e_decode() and extra_e_encode(). The same goes for Straße and Strasse.

Ligatures in fonts make it harder, though: sometimes they intentionally obfuscate what's happening in the Unicode layer. You might know this from special programming fonts with glyphs for ===, <> and such. Some Dutch fonts have a special ligature that combines ij, which was in the Dutch alphabet when I was a kid; 'y' was not. Unicode U+0132 and U+0133 describe this symbol, but I've never seen them used. Fonts combining ij into one visual entity is more common.

I imagine most languages and cultures have these kinds of edge cases.