Re: Unicode and the Linux console (again)

Christopher Fynn Mon, 10 Jan 2005 19:54:22 -0800

Edward, in the murky past (maybe 17-18 years ago) I saw console / terminal type applications and utilities that worked with CJK scripts, Indic scripts, Tibetan and Arabic, running under both PC DOS and Xenix. (The systems for Indic scripts & Tibetan were made by CDAC in Pune, India.) Most of these systems required some sort of card with the characters in ROM, or a special terminal. Given the amount of memory and the graphics cards in modern PCs, it shouldn't be beyond the wit of man to get something like this working without special hardware.

How about using vector fonts in the console? At one time there were a few MS DOS applications that could use TrueType fonts without Windows or any other GUI.

- Chris

Edward H. Trager wrote:

Hi, Simos,
Sorry that I have probably not given this thread as much attention as it
deserves due to limitations on time and being too busy at work.  Nevertheless,
at the risk of possibly repeating some things others may have mentioned, I
will put forward a few comments:
First, I think the Linux developer community needs to think very *broadly* to include all scripts defined in Unicode 4.1. It is not good enough to only be able to handle Latin, Greek, and Cyrillic, even if one can solve the problem with accented characters for Latin/Greek/Cyrillic. Ideally the console would be able to handle CJK, Arabic, Syriac, Devanagari, Bengali, Myanmar, Tibetan, and Mongolian UTF-8 as deftly as it can handle Latin. So anyone who understands the issues surrounding the console should also spend enough time to understand the issues of input methods and complex text layout, especially for scripts like Myanmar that depend on complex text layout.

Some months ago I had the idea of trying to fill out the missing parts of the GNU Unifont bitmap font. When one looks at a script like Myanmar, it is not at all obvious how one should try to "squish" the various glyphs into one cell or two cells. Some characters, like MYANMAR LETTER KA U+1000, clearly look like they should take up two console character cells, just like Han Chinese characters do. Others, like MYANMAR LETTER KHA U+1001, clearly need only one character cell. Other letters like MYANMAR LETTER II U+1024 ought to use up *THREE CONSOLE CHARACTER CELLS* and MYANMAR LETTER AU U+102A should have *FOUR CONSOLE CHARACTER CELLS*. Has anyone ever thought about this before? So, if you ask me, having the option of "single width" vs. "double width" vs. "zero-width" (i.e., accent marks or other diacritics that combine with a previous character but don't take up any additional console character cells) is not enough. There has to be a system that would allow for zero, one, two, three, and four character cell widths. Maybe even more--I'd have to look more carefully to know the answer. One can envision a similar problem for other Indic and Indic-derived scripts, like Devanagari.
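
To make the gap concrete, here is a minimal Python sketch (an illustration only, not existing console code). The function cell_width approximates the conventional zero/one/two-cell rule from Unicode properties via the standard unicodedata module. PROPOSED_WIDTHS and proposed_width are hypothetical names; the table simply records the visual estimates given above for four Myanmar letters, which are not values from any standard.

```python
import unicodedata

# Widths suggested in the message above for four Myanmar letters.
# These are visual estimates from the discussion, not standard values.
PROPOSED_WIDTHS = {
    "\u1000": 2,  # MYANMAR LETTER KA
    "\u1001": 1,  # MYANMAR LETTER KHA
    "\u1024": 3,  # MYANMAR LETTER II
    "\u102A": 4,  # MYANMAR LETTER AU
}


def cell_width(ch: str) -> int:
    """Conventional 0/1/2 cell width, approximated from Unicode properties."""
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0  # non-spacing or enclosing mark: drawn over the previous cell
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # East Asian Wide or Fullwidth
    return 1


def proposed_width(ch: str) -> int:
    """Hypothetical lookup: per-character override first, then the 0/1/2 rule."""
    return PROPOSED_WIDTHS.get(ch, cell_width(ch))


def string_width(text: str, width_of=cell_width) -> int:
    """Total number of cells a string occupies under a given width rule."""
    return sum(width_of(ch) for ch in text)


if __name__ == "__main__":
    for ch in "a\u4e2d\u0301\u1000\u1001\u1024\u102A":
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
              f"conventional={cell_width(ch)} proposed={proposed_width(ch)}")
```

Under the conventional rule all four Myanmar letters come out as one cell each, because none of them is a combining mark or an East Asian Wide character. The override table is the simplest possible way to express wider glyphs; it says nothing about conjuncts or shaping, which would need a real layout engine.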

Now, even if I had a system which allowed me to use up to four character cells just for one glyph (the glyph itself representing one or more unicode characters -- think about all of the consonant conjuncts in Devanagari, Myanmar, and other Indic and Indic-derived scripts), I still have to have smart input method support and smart text layout support ...  I doubt if just one person can implement all of this and claim a bounty (if there was a bounty to be claimed).

Just my 2 cent opinion.  The console is not an area I know very much about--If I have time, I'll try to understand more about the actual implementation issues.

-- Ed Trager

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
