Not bucky bits, but one potential example of utilizing more bits per character.
If I'm writing an editor for multilingual text, I need to attach language attributes to regions of the buffer, since the same character may need to be rendered differently. (For CJK unified ideographs, the glyphs differ by language; using a font designed for another language makes them look very weird. And I want one level of indirection, e.g. specifying a language and then selecting fonts for that language, instead of directly associating fonts with each region.)

Suppose I want to use Scheme as the extension language of the editor. It will have an operation to extract a region of the buffer as a Scheme string, and it will be useful if the extracted string carries the language information as well, for I might want to do language-specific operations on it. Using 32 bits per character and putting auxiliary language info into the top 11 bits could be a plausible implementation. (At least it looks better than using the "strongly discouraged" Unicode language tag characters.) R6RS string/character operations may just ignore those aux bits, which is fine, but I'd like the standard to allow me to add extensions that deal with them. I suspect locking into UTF-16 prohibits such an extension.

(Of course, in the editor buffer things get more complicated because of combining characters, but I'm thinking of the case where I extract a part of it into Scheme strings.)

(I think Emacs treats characters of different languages by adding a leading octet unique to each language. With that it can properly distinguish Japanese and Chinese characters even if they are mixed in a single document. I doubt most Unicode plaintext editors can do that.)

--shiro

_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
