In article <[EMAIL PROTECTED]>, Jan Djärv <[EMAIL PROTECTED]> writes:

> > AFAIK, only when TEXT is requested, an selection owner can
> > choose the returning type from STRING, COMPOUND_TEXT, or
> > UTF8_STRING.  When UTF8_STRING is requested, we should
> > return it or return nothing.
> > 
> > And, if Emacs owns a unibyte string, perhaps the right thing
> > is to make it multibyte according to the current
> > lang. env. (by string-make-multibyte) at first, then encode
> > it by utf-8.

> What would that do to illegal UTF-8 sequences in the original unibyte string? 

The original unibyte string won't be in UTF-8 format.  But,
string-make-multibyte will convert it to a correct multibyte
string, thus encoding that multibyte string by UTF-8 will
produce a correct UTF-8 string ... usually.

>   I.e. will this procedure always produce valid UTF-8 data?

No.  If a byte in the original unibyte string is not a valid
code point of the primary charset of the current lang. env.,
string-make-unibyte will produce a multibyte string that
contains eight-bit-control or eight-bit-graphic character.
Then, encoding it by UTF-8 will results in incorrect UTF-8
sequence.  So, for safely, we must delete such eight-bit
characters or replace them with U+FFFD (REPLACEMENT
CHARACTER) before encoding by UTF-8.

Or, in such a case, don't return anything (which means Emacs
doesn't hold a requested data).

---
Kenichi Handa
[EMAIL PROTECTED]


_______________________________________________
emacs-pretest-bug mailing list
emacs-pretest-bug@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-pretest-bug

Reply via email to