Bram Moolenaar wrote on 2000-08-03 12:10 UTC:
> Takuhiro Nishioka <[EMAIL PROTECTED]>:
> > I've tested xterm with utf-8 patch. It seems almost OK,
> > but not all scripts are classified well. For example,
> > here is a WORD:
> >
> > XXXYYY
> >
> > where "XXX" is a sequence of Hiragana characters and "YYY"
> > is a sequence of Katakana characters. But double-clicking
> > with the left mouse button selects "XXXYYY".
That one is quite easy to fix. Add the following lines to the
routine init_classtab():
@@ -104,6 +104,11 @@
SetCharacterClassRange(0x2080, 0x208f, 0x2080); /* subscript */
SetCharacterClassRange(0x3000, 0x3000, 32); /* ideographic space */
SetCharacterClassRange(0x3001, 0x3020, -1); /* ideographic punctuation */
+ SetCharacterClassRange(0x3040, 0x309f, 0x3040); /* Hiragana */
+ SetCharacterClassRange(0x30a0, 0x30ff, 0x30a0); /* Katakana */
+ SetCharacterClassRange(0x3300, 0x9fff, -1); /* CJK Ideographs */
+ SetCharacterClassRange(0xac00, 0xd7a3, 0xac00); /* Hangul Syllables */
+ SetCharacterClassRange(0xf900, 0xfaff, -1); /* CJK Ideographs */
SetCharacterClassRange(0xfe30, 0xfe6b, -1); /* punctuation forms */
SetCharacterClassRange(0xff00, 0xff0f, -1); /* half/fullwidth ASCII */
SetCharacterClassRange(0xff1a, 0xff20, -1); /* half/fullwidth ASCII */
Are there any scripts, other than the ones handled above, in which
words are not separated by spaces or punctuation?
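Given the limitations of the mechanism, I guess it is best to treat each
Kanji character as a word on its own. That is what the class value -1
does for the CJK ideograph ranges in the patch: every character in such
a range gets its own code number as its class.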
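
To illustrate why the patch has this effect, here is a minimal sketch of
class-based word selection. It is not the actual xterm code: the names
struct class_range, char_class() and select_word() are hypothetical, and
the sketch assumes that a class of -1 means "use the character's own code
number", that later ranges override earlier ones, and that characters not
listed in the table fall back to their own code number as well.

```c
#include <assert.h>
#include <stddef.h>

/* One entry per SetCharacterClassRange() call (hypothetical layout).
 * cls == -1 is assumed to mean "the class is the code number itself". */
struct class_range { int first, last, cls; };

/* Excerpt of the ranges from the patch, in the same order. */
static const struct class_range ranges[] = {
    { 0x3000, 0x3000, 32 },      /* ideographic space */
    { 0x3001, 0x3020, -1 },      /* ideographic punctuation */
    { 0x3040, 0x309f, 0x3040 },  /* Hiragana */
    { 0x30a0, 0x30ff, 0x30a0 },  /* Katakana */
    { 0x3300, 0x9fff, -1 },      /* CJK Ideographs */
    { 0xac00, 0xd7a3, 0xac00 },  /* Hangul Syllables */
};

/* Return the class of code point c. The table is searched backwards so
 * that later entries win; unlisted characters are their own class
 * (an assumption of this sketch). */
static int char_class(int c)
{
    size_t n = sizeof ranges / sizeof ranges[0];
    while (n-- > 0)
        if (c >= ranges[n].first && c <= ranges[n].last)
            return ranges[n].cls < 0 ? c : ranges[n].cls;
    return c;
}

/* A "word" is the longest run of neighbouring characters that share the
 * class of the character at pos. Stores the inclusive bounds of that
 * run in *start and *end. */
static void select_word(const int *text, size_t len, size_t pos,
                        size_t *start, size_t *end)
{
    assert(pos < len);
    int cls = char_class(text[pos]);
    size_t s = pos, e = pos;
    while (s > 0 && char_class(text[s - 1]) == cls)
        s--;
    while (e + 1 < len && char_class(text[e + 1]) == cls)
        e++;
    *start = s;
    *end = e;
}
```

In this model, a double click on a Hiragana character in the sequence
U+3042 U+3044 U+30A2 U+30A4 U+6F22 U+5B57 selects only the two Hiragana
characters, a click on a Katakana character selects only the two
Katakana characters, and a click on either Kanji selects that single
character.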
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/