On Sunday, 19 April 2015 at 02:20:01 UTC, Shachar Shemesh wrote:
U0065+U0301 rather than U00e9. Because of legacy systems, and because they would rather have the ISO-8509 code pages be 1:1 mappings, rather than 1:n mappings, they introduced code points they really would rather do without.
That's probably right. It is in fact a major feat to have the world adopt a new standard wholesale, but there are also difficult "semiotic" issues when you encode symbols and different languages view symbols differently (e.g. is "ä" an "a" or do you have two unique letters in the alphabet?)
Take "å", it can represent a unit (ångström) or a letter with a circle above it, or a unique letter in the alphabet. The letter "æ" can be seen as a combination of "ae" or a unique letter.
And we can expect languages, signs and practices to evolve over time too. How can you normalize encodings without normalizing writing practice and natural language development? That would be beyond the mandate of a unicode standard organization...