On Mon, 30 May 2016 17:35:36 +0000, Chris <wend...@tcd.ie> wrote:
> I was actually talking about ICU with a colleague today. Could it
> be that Unicode itself is broken? I've often heard criticism of
> Unicode but never looked into it.
You have to compare it to the situation before Unicode, when every operating system, in every localization, had its own encoding. Have a text file with ASCII art in a DOS code page? It doesn't render correctly on Windows, even with the same locale. Open Cyrillic text on a Latin system? Indigestible. Someone wrote a website on Windows and incorrectly tagged it with an ISO charset? The browser has to fix it up for them.

One objection I remember was Han unification: https://en.wikipedia.org/wiki/Han_unification
Not everyone liked how Chinese, Japanese and Korean were represented with a common set of ideographs. At the time, Unicode was still 16-bit, and the unified symbols would already make up 32% of all code points.

In my eyes, many of the perceived problems of Unicode stem from the fact that it raises awareness of different writing systems all over the globe in a way we never had to deal with back when software was developed locally instead of globally on GitHub, when the target was Windows instead of cross-platform and mobile, and when we were lucky if we localized for a couple of Latin-script languages while Asia was a real barrier.

I don't know what you and your colleague discussed about ICU, but it was likely whether you should add another dependency and what alternatives there are. In Linux user space, almost everything is an outside project, an extra library, and most of them have alternatives. My own research led me to conclude that there is one set of libraries without real alternatives:

ICU -> HarfBuzz -> Pango

That's the go-to chain for Unicode text, covering everything from text processing to rendering and layout. Moreover, many successful open-source projects make use of it: LibreOffice, SQLite, Qt, libxml2 and WebKit, to name a few.

Unicode is here to stay, no matter what could have been done better in the past, and I think it is perfectly safe to bet on ICU on Linux for what, e.g., Windows has built in. Otherwise, just do as Adam Ruppe said:

> Don't mess with strings.
> Get them from the user, store them
> without modification, spit them back out again. :p

--
Marco