Hi Rich Felker,

I find your work to provide support for Indic text on console/terminal to be 
admirable, and yes, any kind of display is far better than none at all (and I 
do not consider your statement insulting) :)

What I was referring to was a comment along the lines of "... have a set of 
wcwidth classes (say, 1, 2, and 3) and assign - glyphs - to one of those 
classes... ". (Please forgive me if I misunderstood the last few posts.) The 
word to note is "glyph". What I'm saying is you cannot in advance specify the 
width of any given conjunct. It may be different in different fonts.
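
To make the font dependence concrete, here is a minimal sketch in Python. The
advance widths, the cell size and the font labels are made-up illustrative
numbers, not measurements of any real font, and `width_class` is a
hypothetical helper, not part of any existing library.

```python
import math

def width_class(advance, cell, max_class=3):
    """Smallest number of cells that holds the glyph, clamped to 1..max_class."""
    return min(max_class, max(1, math.ceil(advance / cell)))

# Hypothetical advances (in font units) for the SAME conjunct in two fonts:
# one stacks the consonants (subjoining), the other sets them side by side.
CELL = 600
advances = {"stacking font": 580, "adjoining font": 1150}

for font, advance in advances.items():
    print(font, "->", width_class(advance, CELL))
```

With these assumed numbers the same character sequence falls into class 1 in
one font and class 2 in the other, so a class fixed in advance can only match
one of them.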

I suppose we need to develop console-specific fonts that make proper use of
the available width classes (or the structure you propose). However, I don't
think any research has been done in this regard.

Malayalam typography died in the 70s as a result of disastrous script reforms.
(The peak was the SPCS press, which produced many beautiful types for its
publications. SPCS, btw, is supposed to be the world's first co-operative of
authors.) Most artists/graphic designers do not use the stock fonts for any
kind of artistic work, other than in running text where they have no choice. A
"theory of style" doesn't exist for Malayalam (or, afaik, for any Indic language).

So a proper answer to your question (how many width classes?) really needs a
lot of work, both artistic and technical. (Malayalam has about 950
conjuncts, so it remains to be seen how they can fit into those classes.)


According to my older colleague, who is a linguist and lexicologist in Dravidian
languages, Kannada has pretty much the same structure as Malayalam with regard
to conjuncts.


Speaking of curses, doesn't Debian/(K)ubuntu use curses for its installer? I
remember telling the Kubuntu devels to remove Hindi from the list of languages,
because the rendering looks really horrible (misplaced vowels, and so
many other things, unrelated to spacing/width).

It is unfortunate that many developers think that using wide strings for
each character is equivalent to supporting all languages under Unicode. I
guess some even think that the dotted circle is a part of the script ;)

Regards,
Rajeev J Sebastian

----- Original Message ----
From: Rich Felker <[EMAIL PROTECTED]>
To: linux-utf8@nl.linux.org
Sent: Monday, October 30, 2006 7:02:04 PM
Subject: Re: Proposed fix for Malayalam (& other Indic?) chars and wcwidth

On Mon, Oct 30, 2006 at 04:17:54AM -0800, rajeev joseph sebastian wrote:
> Hello Rich Felker,
> 
> It is impossible to fit Malayalam "glyphs" into a given width class,
> if you want even barely aesthetic text. This is because a given
> sequence of Unicode characters may map into somewhat different
> conjunct styles depending on the font: either proper top to bottom
> (subjoining), or left to right (adjoining) or something in between
> as well :)

Yes, I'm aware of the aesthetic considerations but between the choice
of seeing nothing at all and seeing something with excessive spacing
(still correctly subjoining, but with extra width/spacing to make up
for the second character not using horizontal space), wouldn't the
latter be preferable? I don't claim it will be pretty but I believe
one can put together something which at least avoids being hideously
ugly. I also don't mean to insult your script by presenting it in an
ugly way (even having "i" and "m" the same width is ugly although much
less severely so), but a terminal and the apps that can be run on it
are quite useful IMO and it seems a shame for many people to be unable
to use them on account of language.

BTW the situation for Kannada seems much less severe... do you know
enough about the script to confirm this?

Thanks for the comments.

Rich


P.S. There's also the possibility of treating syllable clusters as the
fundamental unit of display and requiring a context-sensitive function
rather than wcwidth to measure width; however from my experience
getting application maintainers just to fix their handling of
nonspacing characters is difficult enough without asking them to add
script-specific processing. Also the curses library (which is a bad
library anyway but many apps use it) doesn't support this model. :(
IMO the best long-term solution is to support both, with a terminal
escape to switch the terminal between "dumb" wcwidth-based spacing for
compatibility with apps that are not specifically Indic-script aware,
and "smart" context-sensitive spacing.
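
The two spacing models can be sketched in Python as below. This is only an
illustration under stated assumptions: `naive_width` is a rough stand-in for
wcwidth built on the standard `unicodedata` module, `clusters` uses a
deliberately simplified syllable rule (it ignores ZWJ/ZWNJ and other special
cases), and the `cluster_cells` table is hypothetical data that a terminal or
font would have to supply.

```python
import unicodedata

VIRAMA_CCC = 9  # canonical combining class assigned to Indic viramas

def naive_width(ch):
    """Rough per-code-point width in the spirit of wcwidth (approximation)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # nonspacing marks and format characters
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # East Asian wide / fullwidth
    return 1

def clusters(text):
    """Split text into simplified syllable clusters.

    A code point joins the current cluster if it is a mark, or if it is a
    letter (category Lo) that directly follows a virama.
    """
    out = []
    for ch in text:
        cat = unicodedata.category(ch)
        joins = out and (
            cat.startswith("M")
            or (cat == "Lo" and unicodedata.combining(out[-1][-1]) == VIRAMA_CCC)
        )
        if joins:
            out[-1] += ch
        else:
            out.append(ch)
    return out

def dumb_width(text):
    """'Dumb' mode: sum of independent per-code-point widths."""
    return sum(naive_width(ch) for ch in text)

def smart_width(text, cluster_cells):
    """'Smart' mode: width per cluster, falling back to the dumb sum."""
    return sum(cluster_cells.get(c, dumb_width(c)) for c in clusters(text))

# KA + VIRAMA + KA + vowel sign AA, followed by MA.
sample = "\u0d15\u0d4d\u0d15\u0d3e\u0d2e"
hypothetical_cells = {"\u0d15\u0d4d\u0d15\u0d3e": 2}  # assumed, font-specific
print(clusters(sample))
print(dumb_width(sample), smart_width(sample, hypothetical_cells))
```

The dumb model needs nothing beyond the code points themselves, which is why
existing applications can use it; the smart model cannot produce a width
without the per-cluster table, so the application and the terminal would have
to agree on that table.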


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/






