Dave Love <[EMAIL PROTECTED]> writes:

>>>>>> Kenichi Handa writes:

>> It seems that it doesn't have a major problem,

> I hope not, because I've basically used your customization hooks or
> similar ones and done the sort of things you'd talked about at some
> time!

I may have written various ideas but please don't assume that they
all work fine. :-p

>> but I found one problem related to handling unibyte case.

> I didn't expect it to do anything sensible with unibyte, but if
> there's an easy to improve it, that would be fine.

>> If unify-8859-on-decoding-mode is on, for instance, in
>> latin-2 lang. env., 8859-2 characters files are decoded into
>> latin-iso8859-1 and mule-unicode-0100-24ff. But, C-q XXX
>> still inserts latin-iso8859-2 characters.

> Yes. I'm not sure that should change, but the relevant primitives
> could now use `translation-table-for-input'. It wasn't the sort of
> thing I could control in user-level customization anyway, without
> kludging it with a post-command hook.

C-q (quoted-insert) assumes that a code in the range 0240..0377 is a
code in a single-byte charset, and converts it to a multibyte code by
calling unibyte-char-to-multibyte. This function does the conversion
using nonascii-translation-table or nonascii-insert-offset. So, once
they are set correctly, C-q XXX can insert an appropriate character.

>> And, when we paste mule-unicode-0100-24ff characters into
>> unibyte buffer, or paste unibyte string into a multibyte
>> buffer, they are not correctly converted.

> What would be correct?

As long as the characters are in the range that the current language
environment supports in unibyte mode, they should be converted
correctly. For instance, in the Latin-2 lang. env., all characters in
the latin-2 charset should be handled correctly, i.e. A-ogonek in
mule-unicode-0100-24ff <=> 0xA1. And that kind of conversion can be
done only by setting nonascii-translation-table correctly in each
language environment.

> Is general Unicode text any different to, say, JISX-based
> Japanese in that respect?

Of course, in Japanese lang. env. or in UTF-8 lang.
env., such a conversion between unibyte and multibyte is meaningless
and doesn't work. And anyway, in such a lang. env., people don't
expect it to work well. What we need is to make it work well only in
single-byte-charset-based lang. envs.

---
Ken'ichi HANDA
[EMAIL PROTECTED]

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
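
To make the Latin-2 correspondence above (A-ogonek <=> 0xA1) concrete, here is a minimal sketch in Python rather than Emacs Lisp. It is only an illustration: Python's iso8859_2 codec stands in for a correctly set per-language-environment table, and the function names unibyte_to_multibyte and multibyte_to_unibyte are hypothetical, not Emacs primitives.

```python
# Illustration only: a Python codec plays the role that a correctly set
# nonascii-translation-table would play for one language environment.
# Both function names are hypothetical helpers, not Emacs functions.

def unibyte_to_multibyte(code: int, charset: str = "iso8859_2") -> str:
    """Map an 8-bit code to the character it denotes in `charset`."""
    return bytes([code]).decode(charset)


def multibyte_to_unibyte(char: str, charset: str = "iso8859_2") -> int:
    """Map a character back to its 8-bit code in `charset`.

    Raises UnicodeEncodeError when the charset has no such character.
    """
    return char.encode(charset)[0]


# Latin-2 environment: 0xA1 <=> A-ogonek (U+0104).
a_ogonek = unibyte_to_multibyte(0xA1)
print(f"0xA1 -> U+{ord(a_ogonek):04X}")
print(f"U+0104 -> {multibyte_to_unibyte(a_ogonek):#04x}")

# The same byte under a Latin-1 table is a different character (U+00A1),
# which is why the table has to follow the language environment.
print(f"0xA1 (Latin-1) -> U+{ord(unibyte_to_multibyte(0xA1, 'iso8859_1')):04X}")

# A character outside the single-byte charset has no unibyte code at all.
try:
    multibyte_to_unibyte("\u3042")  # HIRAGANA LETTER A
except UnicodeEncodeError:
    print("U+3042 has no Latin-2 code")
```

The round trip only works for characters the chosen single-byte charset contains, which matches the point that this kind of conversion is meaningful only in single-byte-charset-based language environments.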