Shiro Kawai wrote:
> Suppose I want to use Scheme as the extension language of
> the editor. It will have an operation to extract a region
> of the buffer as a Scheme string. And it will be useful
> if the extracted string contains language information as
> well, for I might want to do language-specific operations.

Associating arbitrary "properties" with a character or a
run of characters in a string is a very useful operation.
Emacs has this:
    Each character position in a buffer or a string can have a "text
    property list", much like the property list of a symbol.
Java Swing text "Document" objects provide something similar.
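
As a rough illustration of attaching properties to runs of characters, here is a minimal Python sketch. The `PropertyString` type and its method names are hypothetical, invented for this example; they are not the Emacs or Swing API, and a real implementation would likely use a more efficient structure than a flat list of runs.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyString:
    """A string plus property runs (hypothetical type, for illustration only)."""
    text: str
    # Each run is (start, end, properties) and covers text[start:end].
    runs: list = field(default_factory=list)

    def put(self, start, end, **props):
        """Attach properties to the half-open range [start, end)."""
        if not 0 <= start <= end <= len(self.text):
            raise IndexError("range outside string")
        self.runs.append((start, end, dict(props)))

    def properties_at(self, index):
        """Merge the properties of every run covering index; later runs win."""
        if not 0 <= index < len(self.text):
            raise IndexError("index outside string")
        merged = {}
        for start, end, props in self.runs:
            if start <= index < end:
                merged.update(props)
        return merged

    def substring(self, start, end):
        """Extract text[start:end], clipping and shifting the overlapping runs."""
        piece = PropertyString(self.text[start:end])
        for s, e, props in self.runs:
            lo, hi = max(s, start), min(e, end)
            if lo < hi:
                piece.runs.append((lo - start, hi - start, dict(props)))
        return piece


buf = PropertyString("hello 世界")
buf.put(0, 5, language="en")
buf.put(6, 8, language="ja", font="mincho")
region = buf.substring(4, 8)  # the extracted region keeps its properties
```

Because the properties live beside the text rather than inside each character, adding a font property next to the language property costs nothing in the character representation.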

> Using 32 bits per character and putting auxiliary language info
> into the top 11 bits can be a plausible implementation.

For some applications 11 bits may be enough. But if you want a
language property as well as a font property, why then you're
already out of bits.
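
To make the arithmetic concrete: a Unicode code point needs 21 bits (the highest is U+10FFFF), which leaves 11 bits in a 32-bit cell. The sketch below packs a character and a tag into one integer. The function names and the 6/5 split between language and font are assumptions made up for this example, not anything proposed in the thread.

```python
CODE_POINT_BITS = 21                      # U+10FFFF fits in 21 bits
TAG_BITS = 32 - CODE_POINT_BITS           # 11 bits remain in a 32-bit cell
CODE_POINT_MASK = (1 << CODE_POINT_BITS) - 1
MAX_TAG = (1 << TAG_BITS) - 1             # largest tag value: 2047


def pack(ch, tag):
    """Store the tag in the top 11 bits and the code point in the low 21."""
    if not 0 <= tag <= MAX_TAG:
        raise ValueError("tag does not fit in 11 bits")
    return (tag << CODE_POINT_BITS) | ord(ch)


def unpack(cell):
    """Recover (character, tag) from a packed 32-bit cell."""
    return chr(cell & CODE_POINT_MASK), cell >> CODE_POINT_BITS


# Hypothetical split of the 11 tag bits: 6 for language, 5 for font.
LANGUAGE_BITS, FONT_BITS = 6, 5


def make_tag(language_id, font_id):
    """Combine a language id and a font id into one 11-bit tag."""
    if not 0 <= language_id < (1 << LANGUAGE_BITS):
        raise ValueError("language id does not fit in 6 bits")
    if not 0 <= font_id < (1 << FONT_BITS):
        raise ValueError("font id does not fit in 5 bits")
    return (language_id << FONT_BITS) | font_id


cell = pack("界", make_tag(language_id=3, font_id=7))
```

Under this assumed split there is room for 64 languages and 32 fonts, and no bits remain for a third property.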

> (I think Emacs treats characters of different languages by
> adding a leading octet unique to each language.

Not quite. It can represent simultaneously different encodings
in the same buffer, but encoding isn't the same as language.
This "feature" is a holdover from the pre-Unicode (or rather
anti-Unicode) days: "Mule" was developed in Japan where there
was a lot of anti-Unicode sentiment, but I think that war is over.
--
--Per Bothner
http://per.bothner.com/
_______________________________________________
r6rs-discuss mailing list
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss