Re: Unicode and the Linux console (again)

Edward H. Trager Mon, 10 Jan 2005 13:11:21 -0800

Hi, Simos,

Sorry that I have not given this thread as much attention as it deserves; I
have been short on time and busy at work.  Nevertheless, at the risk of
repeating some things others may have mentioned, I will put forward a few
comments:

First, I think the Linux developer community needs to think very *broadly*
and include all scripts defined in Unicode 4.1.  It is not good enough to
handle only Latin, Greek, and Cyrillic, even if one can solve the problem
of accented characters for those scripts.  Ideally the console would be able
to handle CJK, Arabic, Syriac, Devanagari, Bengali, Myanmar, Tibetan, and
Mongolian UTF-8 as deftly as it handles Latin.  So anyone who understands
the issues surrounding the console should also spend enough time to
understand the issues of input methods and complex text layout, especially
for scripts like Myanmar.

Some months ago I had the idea of trying to fill out the missing parts of
the GNU Unifont bitmap font.  When one looks at a script like Myanmar, it is
not at all obvious how one should try to "squish" the various glyphs into
one cell or two cells.  Some characters, like MYANMAR LETTER KA U+1000,
clearly look like they should take up two console character cells, just as
Han Chinese characters do.  Others, like MYANMAR LETTER KHA U+1001, clearly
need only one character cell.  Other letters, like MYANMAR LETTER II U+1024,
ought to use *THREE CONSOLE CHARACTER CELLS*, and MYANMAR LETTER AU U+102A
should have *FOUR CONSOLE CHARACTER CELLS*.  Has anyone ever thought about
this before?

So, if you ask me, having the options of "single width" vs. "double width"
vs. "zero-width" (i.e., accent marks or other diacritics that combine with a
previous character but don't take up any additional console character cells)
is not enough.  There has to be a system that allows for zero, one, two,
three, and four character cell widths.  Maybe even more; I'd have to look
more carefully to know the answer.  One can envision a similar problem for
other Indic and Indic-derived scripts, like Devanagari.
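
To make the zero/one/two limitation concrete, here is a minimal sketch in
Python of a wcwidth-style width rule.  The function name cell_width is my own
invention, and the rule is a simplification built only on the standard
unicodedata module, not the console's actual logic.  Under this kind of rule
every Myanmar letter mentioned above comes out as one cell, whatever its
glyph looks like:

```python
import unicodedata

def cell_width(ch: str) -> int:
    """Rough wcwidth-style width of one character: 0, 1, or 2 console cells.

    Hypothetical helper for illustration; a simplification, not the kernel's rule.
    """
    # Non-spacing marks, enclosing marks and format characters occupy no cell.
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide and Fullwidth characters occupy two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else is assumed to fit in one cell.
    return 1

if __name__ == "__main__":
    for cp in (0x1000, 0x1001, 0x1024, 0x102A, 0x4E2D, 0x0301):
        ch = chr(cp)
        print(f"U+{cp:04X} {unicodedata.name(ch)}: {cell_width(ch)} cell(s)")
```

There is simply no value in this model that could express "three cells" or
"four cells" for a single letter.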

Now, even if I had a system which allowed me to use up to four character
cells for just one glyph (the glyph itself representing one or more Unicode
characters; think of all the consonant conjuncts in Devanagari, Myanmar, and
other Indic and Indic-derived scripts), I would still need smart input
method support and smart text layout support.  I doubt that one person alone
can implement all of this and claim a bounty (if there were a bounty to be
claimed).
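
As a rough sketch of what "one glyph, several characters, up to four cells"
might look like, the Python below groups characters into naive clusters and
looks up a per-glyph cell count.  Everything here is an assumption for
illustration: the GLYPH_CELLS table just hard-codes my guesses from above, the
names clusters and cluster_cells are made up, and the clustering rule is far
simpler than real complex text layout.

```python
import unicodedata

# Hypothetical per-glyph widths in console cells, copied from the guesses
# above. A real table would have to come from the font, not be hard-coded.
GLYPH_CELLS = {"\u1000": 2, "\u1001": 1, "\u1024": 3, "\u102A": 4}

MYANMAR_VIRAMA = "\u1039"  # MYANMAR SIGN VIRAMA, used to stack consonants

def clusters(text: str) -> list[str]:
    """Split text into naive clusters: a base character plus following marks.

    A character that follows a Myanmar virama stays in the same cluster,
    which approximates a stacked consonant conjunct.
    """
    out: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if out and (is_mark or out[-1].endswith(MYANMAR_VIRAMA)):
            out[-1] += ch
        else:
            out.append(ch)
    return out

def cluster_cells(cluster: str) -> int:
    """Cells used by a cluster: the width of its base glyph (default 1).

    Simplification: marks and stacked consonants add no extra cells here.
    """
    return GLYPH_CELLS.get(cluster[0], 1)

if __name__ == "__main__":
    text = "\u1000\u1039\u1000\u1001"  # KA + VIRAMA + KA, then KHA
    for c in clusters(text):
        print([f"U+{ord(ch):04X}" for ch in c], cluster_cells(c), "cell(s)")
```

Even this toy version shows that the width decision has to be made per
cluster and per font, not per code point, which is where the input method
and layout work comes in.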

Just my two cents.  The console is not an area I know very much about.  If
I have time, I'll try to understand more about the actual implementation
issues.

-- Ed Trager

On Monday 2005.01.10 19:35:23 +0000, Simon wrote:
> 
> Hi All,
> I suppose this is a long-established issue that would make quite a few
> people happy when it is resolved.
> Currently, one can read Unicode on the Linux console. In addition, one can
> write Unicode on the Linux console with the exception of accents; you
> cannot use dead keys.
> (Of course I am referring to languages without combining characters, etc.)
> The issue with dead keys is described at
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=143014
> 
> There was a discussion on LKML about this issue with the aim of getting the
> patch by Chris Heath included
> (http://www.ussg.iu.edu/hypermail/linux/kernel/0412.1/1039.html).
> The reaction was not positive (not complaining), see
> http://www.ussg.iu.edu/hypermail/linux/kernel/0412.1/1575.html
> and there was no further push to get it accepted.
> 
> However, this issue has to be resolved, and at least basic Unicode
> (+accents) should be supported on the console. Ideally, the console would
> also handle a few more languages, such as Arabic.
> There are quite a few projects to enable the use of old computers in the
> developing world, and there is interest in using the Linux console
> (http://lists.debian.org/debian-l10n-greek/2004/12/msg00020.html).
> 
> The way I see it, the next step is to write a description of the task of
> enabling better Unicode on the Linux console (use framebuffer?) and post it
> as a "bounty" for whoever is able to implement it.
> 
> - Could you have a look at
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=143014
> and post any missing information or corrections?
> - What do you envision should change on the Linux console to support
> better Unicode?
> - Could this be written up as a task (so that it can be posted as a bounty)?
> 
> Cheers,
> Simos
> --
> Linux-UTF8:   i18n of Linux on all levels
> Archive:      http://mail.nl.linux.org/linux-utf8/