On Sun, 01 Dec 2013 11:37:30 +1300, Gregory Ewing wrote:

> Which makes it even sillier to have an 'ffi' character in this day and
> age, when you can simply space the characters so that they overlap.

It's in Unicode to support legacy character sets that included it[1].
There are a bunch of similar cases:

* LATIN CAPITAL LETTER A WITH RING ABOVE versus ANGSTROM SIGN
* KELVIN SIGN versus LATIN CAPITAL LETTER K
* DEGREE CELSIUS and DEGREE FAHRENHEIT
* the whole set of full-width and half-width forms

On the other hand, there are cases which to a naive reader might look
like needless duplication but actually aren't. For example, there are a
bunch of visually indistinguishable characters[2] in European languages,
like AΑА and BΒВ. The reason for this becomes more obvious[3] when you
lowercase them:

py> 'AΑА BΒВ'.lower()
'aαа bβв'

Sorting and case-conversion rules would become insanely complicated, and
context-sensitive, if Unicode only included a single code point per
thing-that-looks-the-same.

The rules for deciding what is and what isn't a distinct character can
be quite complex, and often politically charged. There's a lot of
opposition to Unicode in East Asian countries because it unifies Han
ideograms that look and behave the same in Chinese, Japanese and Korean.
Unicode does this for the same reason that it doesn't distinguish
between (say) English A, German A and French A. Some East Asians want
the distinction anyway, for much the same reason you or I might wish to
flag one section of text as English and another as German, and have them
displayed in slightly different typefaces and spell-checked with
different dictionaries. The Unicode Consortium's answer is that this is
beyond the remit of a character set, and is best handled by markup or
higher-level formatting.

(Another reason for opposing Han unification is, let's be frank, pure
nationalism.)


[1] As far as I can tell, the only character supported by legacy
character sets which is not included in Unicode is the Apple logo from
Mac charsets.

[2] The actual glyphs depend on the typeface used.

[3] Again, modulo the typeface you're using to view them.
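
To make the two kinds of case above concrete, here is a small sketch
using the standard library's unicodedata module. The choice of sample
characters and the variable names (lookalikes, compat, folded) are mine,
not from the original post, and using NFKC normalization to show how the
legacy-compatibility characters relate to ordinary ones is my own
illustration rather than something the post itself describes.

```python
import unicodedata

# Latin, Greek and Cyrillic capitals that look alike in many typefaces.
lookalikes = '\u0041\u0391\u0410'
names = [unicodedata.name(ch) for ch in lookalikes]
lower_names = [unicodedata.name(ch.lower()) for ch in lookalikes]
for upper, lower in zip(names, lower_names):
    print(upper, '->', lower)

# Characters kept for legacy character sets: ffi ligature, ANGSTROM SIGN,
# KELVIN SIGN, DEGREE CELSIUS and a full-width A. Under NFKC
# normalization each one folds to ordinary characters.
compat = ['\ufb03', '\u212b', '\u212a', '\u2103', '\uff21']
folded = {unicodedata.name(ch): unicodedata.normalize('NFKC', ch)
          for ch in compat}
for name, plain in folded.items():
    print(name, '->', ascii(plain))
```

The look-alike capitals keep distinct names and lowercase to distinct
letters, whereas the compatibility characters collapse to plain
equivalents once normalized, which is roughly the difference between
"genuinely distinct" and "only there for round-tripping".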
--
Steven