If I'm reading you right, you're saying it might be easier if everything were encoded as combining (or maybe more aptly non-combining) codes, regardless of language?
So, we might encode 'Waffles' as w+upper a f f l e s and let the renderer (if there is one) handle the presentation of the case shift and the potential ligature, but things like grep get noticeably easier with no overlap of ö and o+umlaut. Again, oversimplified, with no real understanding on my part of the depth or breadth of the problem space. If this is the case, could it be handled by pushing everything into a subset of Unicode rather than using the unallocated space to create a superset?

-J

On 7/26/09, erik quanstrom <quans...@quanstro.net> wrote:
>> to be fair to the unicode people, this decoupling of glyphs and codepoints
>> is (i think) the most straightforward way to implement some languages like
>> arabic, where the glyphs for characters depend on their position within a
>> word. that is, a letter at the beginning of a word looks different from
>> what it would look like if it was in the middle.
>
> my opinion (not that i'm entitled to one here) is
> that the unicode guys screwed up. unicode is not
> consistent. explain why there are two code points for sigma.
>     03c3 greek small letter sigma
>     03c2 greek small letter final sigma
> why does german get ä, ö, ü? if you want to take
> this further, why are there capital forms of latin letters?
> can't that also be inferred by the font?
>
> what's called a ligature in one language is a character
> in another. i see no consistency. it seems like the
> unicode committee had a problem with too much
> knowledge of the specific problems and few actual
> unifying (sorry) concepts.
>
> i think it would make much more sense to put this logic
> in editors. this would also allow the freedom to use a
> capital, ligature, final form in the wrong place.
> like say studlyCaps. i can't imagine english is the only
> language in the world that gets abused.
>
> - erik
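
To make the grep point concrete, here is a rough Python sketch of the overlap between precomposed ö (U+00F6) and o followed by a combining diaeresis (U+0308), and of how normalizing both sides to one decomposed form removes it. This is only an illustration using Python's standard unicodedata module, not a claim about how grep or any Plan 9 tool behaves. The helper names nfd and contains are made up for the example. The last lines show that normalization alone leaves the two sigmas distinct, while case folding (a separate, existing mechanism) maps them together.

```python
import unicodedata

precomposed = "\u00f6"   # ö as a single code point
combining = "o\u0308"    # o followed by COMBINING DIAERESIS

# The two spellings render alike but differ as raw code point sequences.
raw_equal = precomposed == combining


def nfd(s):
    """Hypothetical helper: decompose s into base letters plus combining marks."""
    return unicodedata.normalize("NFD", s)


# Once both sides are in the same normalization form, the overlap is gone.
normalized_equal = nfd(precomposed) == nfd(combining)


def contains(haystack, needle):
    """Hypothetical grep-like substring test that ignores which spelling was used."""
    return nfd(needle) in nfd(haystack)


print(raw_equal, normalized_equal)          # False True
print(contains("K\u00f6ln", "o\u0308"))     # True

# Normalization does not merge final sigma (03c2) with sigma (03c3) ...
sigma_same_after_nfd = nfd("\u03c2") == nfd("\u03c3")
# ... but case folding does, much as it merges capital and small letters.
sigma_same_after_casefold = "\u03c2".casefold() == "\u03c3".casefold()
```

So a "subset" of sorts already exists as a convention: pick one normalization form and apply it before comparing. Whether that is simpler than a different encoding design is exactly the open question in this thread.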