In article <[EMAIL PROTECTED]>, Jan Djärv <[EMAIL PROTECTED]> writes:
> > AFAIK, only when TEXT is requested, a selection owner can
> > choose the returning type from STRING, COMPOUND_TEXT, or
> > UTF8_STRING.  When UTF8_STRING is requested, we should
> > return it or return nothing.
> >
> > And, if Emacs owns a unibyte string, perhaps the right thing
> > is to make it multibyte according to the current
> > lang. env. (by string-make-multibyte) at first, then encode
> > it by utf-8.

> What would that do to illegal UTF-8 sequences in the original
> unibyte string?

The original unibyte string won't be in UTF-8 format.  But
string-make-multibyte will convert it to a correct multibyte
string, so encoding that multibyte string in UTF-8 will
produce a correct UTF-8 string ... usually.

> I.e. will this procedure always produce valid UTF-8 data?

No.  If a byte in the original unibyte string is not a valid
code point of the primary charset of the current lang. env.,
string-make-multibyte will produce a multibyte string that
contains an eight-bit-control or eight-bit-graphic character.
Encoding that string in UTF-8 then results in an invalid UTF-8
sequence.  So, to be safe, we must delete such eight-bit
characters or replace them with U+FFFD (REPLACEMENT CHARACTER)
before encoding in UTF-8.  Alternatively, in such a case,
don't return anything (which means Emacs doesn't hold the
requested data).

---
Kenichi Handa
[EMAIL PROTECTED]

_______________________________________________
emacs-pretest-bug mailing list
emacs-pretest-bug@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-pretest-bug
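The three safe options discussed above (replace eight-bit characters with U+FFFD, delete them, or return nothing) can be illustrated outside Emacs with a minimal Python sketch. Assumptions, none of which come from the thread: Python's codec machinery stands in for string-make-multibyte and the current language environment, the `charset` argument stands in for that environment's primary charset, and the function name `unibyte_to_utf8` and the policy names "replace", "delete" and "refuse" are hypothetical. Bytes the charset cannot decode play the role of eight-bit-control/eight-bit-graphic characters.

```python
from typing import Literal, Optional

# Hypothetical policy names for the three options in the thread.
Policy = Literal["replace", "delete", "refuse"]

# Map each policy onto a standard Python decode error handler.
_ERROR_HANDLERS = {"replace": "replace", "delete": "ignore", "refuse": "strict"}


def unibyte_to_utf8(data: bytes, charset: str,
                    policy: Policy = "replace") -> Optional[bytes]:
    """Convert raw bytes to UTF-8 by first decoding them with `charset`.

    `charset` is an assumed stand-in for the primary charset of the
    current language environment. Bytes invalid in `charset` are:
      - "replace": turned into U+FFFD REPLACEMENT CHARACTER
      - "delete":  dropped
      - "refuse":  cause None to be returned (no data for the request)
    """
    try:
        text = data.decode(charset, errors=_ERROR_HANDLERS[policy])
    except UnicodeDecodeError:
        return None  # only reachable with policy="refuse"
    # `text` now holds only decoded characters, so for the standard
    # codecs this encoding step yields valid UTF-8.
    return text.encode("utf-8")


if __name__ == "__main__":
    print(unibyte_to_utf8(b"caf\xe9", "latin-1"))           # b'caf\xc3\xa9'
    print(unibyte_to_utf8(b"ab\x80c", "ascii"))             # b'ab\xef\xbf\xbdc'
    print(unibyte_to_utf8(b"ab\x80c", "ascii", "delete"))   # b'abc'
    print(unibyte_to_utf8(b"ab\x80c", "ascii", "refuse"))   # None
```

The "refuse" policy corresponds to answering a UTF8_STRING request with no data. Replacing with U+FFFD leaves a visible marker where bytes could not be converted, whereas deleting drops them silently.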