Hi Rich Felker,

I find your work on supporting Indic text on the console/terminal admirable, and yes, any kind of display is far better than none at all (and I do not consider your statement insulting) :)
What I was referring to was a comment along the lines of "... have a set of wcwidth classes (say, 1, 2, and 3) and assign - glyphs - to one of those classes...". (Please forgive me if I misunderstood the last few posts.) The word to note is "glyph". What I'm saying is that you cannot specify the width of any given conjunct in advance, because it may differ from font to font. I suppose we would need to develop console-specific fonts that make proper use of the available width classes (or the structure you propose); however, I don't think any research has been done in this regard.

Malayalam typography died in the 70s as a result of disastrous script reforms. (Its peak was the SPCS press, which produced many beautiful typefaces for its publications; SPCS, by the way, is said to be the world's first co-operative of authors.) Most artists and graphic designers do not use the stock fonts for any kind of artistic work, other than running text, where they have no choice. A "theory of style" doesn't exist for Malayalam (or, AFAIK, for any Indic language). So a proper answer to your question of how many width classes are needed really requires a lot of work, both artistic and technical. (Malayalam has about 950 conjuncts, so it remains to be seen how they can fit into those classes.)

According to an older colleague of mine, a linguist and lexicologist of the Dravidian languages, Kannada has much the same structure as Malayalam with regard to conjuncts.

Speaking of curses, doesn't Debian/(K)ubuntu use curses for its installer? I remember telling the Kubuntu developers to remove Hindi from the list of languages, because the rendering is really horrible (misplaced vowels and many other problems unrelated to spacing/width). It is unfortunate that many developers think that storing each character as a wide character is equivalent to supporting all languages in Unicode.
I guess some even think that the dotted circle is part of the script ;)

Regards,
Rajeev J Sebastian

----- Original Message ----
From: Rich Felker <[EMAIL PROTECTED]>
To: linux-utf8@nl.linux.org
Sent: Monday, October 30, 2006 7:02:04 PM
Subject: Re: Proposed fix for Malayalam (& other Indic?) chars and wcwidth

On Mon, Oct 30, 2006 at 04:17:54AM -0800, rajeev joseph sebastian wrote:
> Hello Rich Felker,
>
> It is impossible to fit Malayalam "glyphs" into a given width class,
> if you want even barely aesthetic text. This is because a given
> sequence of Unicode characters may map into somewhat different
> conjunct styles depending on the font: either proper top to bottom
> (subjoining), or left to right (adjoining) or something in between
> as well :)

Yes, I'm aware of the aesthetic considerations, but given the choice between seeing nothing at all and seeing something with excessive spacing (still correctly subjoining, but with extra width/spacing to make up for the second character not using horizontal space), wouldn't the latter be preferable? I don't claim it will be pretty, but I believe one can put together something that at least avoids being hideously ugly. I also don't mean to insult your script by presenting it in an ugly way (even having "i" and "m" the same width is ugly, although much less severely so), but a terminal and the apps that can be run on it are quite useful IMO, and it seems a shame for many people to be unable to use them on account of language.

BTW the situation for Kannada seems much less severe... do you know enough about the script to confirm this?

Thanks for the comments.

Rich

P.S. There's also the possibility of treating syllable clusters as the fundamental unit of display and requiring a context-sensitive function rather than wcwidth to measure width; however, from my experience, getting application maintainers just to fix their handling of nonspacing characters is difficult enough without asking them to add script-specific processing.
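To make the two measuring models discussed here concrete, here is a minimal C sketch. It does not call the real wcwidth(); instead, a hypothetical per_char_width() assumes the virama (U+0D4D) is a zero-width nonspacing mark and every other character takes one column, roughly the per-character model Rich describes. A hypothetical cluster_width() groups consonant + virama + consonant chains plus trailing vowel signs into one syllable cluster and asks a font-supplied callback for a width class of 1, 2 or 3. The two example "fonts" (stacking_font for subjoining conjuncts, adjoining_font for left-to-right ones) and their width rules are invented for illustration, and the character ranges are a simplification of the Unicode Malayalam block.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified Malayalam classes for this sketch (not a full Unicode table). */
static int is_consonant(unsigned cp)  { return cp >= 0x0D15 && cp <= 0x0D39; }
static int is_virama(unsigned cp)     { return cp == 0x0D4D; }
static int is_vowel_sign(unsigned cp)
{
    return (cp >= 0x0D3E && cp <= 0x0D4C) || cp == 0x0D57;
}

/* Per-character model: sum one width per code point. Assumed here: the
   virama is a zero-width nonspacing mark, everything else is one column. */
int per_char_width(const unsigned *s, size_t n)
{
    int w = 0;
    for (size_t i = 0; i < n; i++)
        w += is_virama(s[i]) ? 0 : 1;
    return w;
}

/* Length of the syllable cluster starting at s[0] (requires n > 0):
   consonant (virama consonant)* [virama] vowel-sign*.
   Anything that does not start with a consonant is a one-character cluster. */
static size_t cluster_len(const unsigned *s, size_t n)
{
    size_t i = 1;
    if (!is_consonant(s[0]))
        return 1;
    while (i < n && is_virama(s[i])) {
        i++;                             /* the virama itself */
        if (i < n && is_consonant(s[i]))
            i++;                         /* conjoined consonant, keep going */
        else
            break;                       /* explicit virama ends the chain */
    }
    while (i < n && is_vowel_sign(s[i]))
        i++;
    return i;
}

/* Hypothetical font-supplied rule mapping a cluster to a width class 1..3. */
typedef int (*width_class_fn)(size_t consonants, size_t vowel_signs);

static int cap3(size_t w) { return w > 3 ? 3 : (int)w; }

/* Hypothetical font that stacks conjuncts top to bottom (subjoining). */
int stacking_font(size_t consonants, size_t vowel_signs)
{
    (void)consonants;                    /* the whole stack fits in one cell */
    return cap3(1 + vowel_signs);
}

/* Hypothetical font that sets conjuncts left to right (adjoining). */
int adjoining_font(size_t consonants, size_t vowel_signs)
{
    return cap3((consonants > 0 ? consonants : 1) + vowel_signs);
}

/* Context-sensitive model: measure cluster by cluster, asking the font. */
int cluster_width(const unsigned *s, size_t n, width_class_fn width_class)
{
    int w = 0;
    assert(width_class != NULL);
    while (n > 0) {
        size_t len = cluster_len(s, n), consonants = 0, vowel_signs = 0;
        for (size_t i = 0; i < len; i++) {
            consonants += is_consonant(s[i]);
            vowel_signs += is_vowel_sign(s[i]);
        }
        w += width_class(consonants, vowel_signs);
        s += len;
        n -= len;
    }
    return w;
}
```

For ക്ക (KA, VIRAMA, KA), per_char_width() gives 2 columns, while cluster_width() gives 1 with stacking_font and 2 with adjoining_font. Under these assumptions the same code point sequence needs a different width depending on how the font draws the conjunct, which is why a fixed table of glyph widths is hard to define.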
Also the curses library (which is a bad library anyway, but many apps use it) doesn't support this model. :( IMO the best long-term solution is to support both, with a terminal escape to switch the terminal between "dumb" wcwidth-based spacing for compatibility with apps that are not specifically Indic-script aware, and "smart" context-sensitive spacing.

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
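Rich's proposed terminal escape could plausibly follow the existing DEC private-mode convention, in which ESC [ ? Pn h sets a mode and ESC [ ? Pn l resets it (ESC [ ? 25 h / l, which shows and hides the cursor, is a familiar example). The sketch below assumes such a mode exists; the mode number 9999, the struct term type and the term_feed() function are hypothetical placeholders, not part of any existing terminal or standard. It only tracks which spacing policy is active and leaves rendering out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical private-mode number, an arbitrary placeholder for this sketch. */
#define SMART_SPACING_MODE 9999L

enum spacing {
    SPACING_DUMB,   /* per-character wcwidth-style spacing (the default) */
    SPACING_SMART   /* context-sensitive, syllable-cluster spacing */
};

struct term {
    enum spacing spacing;
};

/* Scan a buffer for single-parameter private-mode sequences ESC [ ? Pn h|l.
   If Pn equals SMART_SPACING_MODE, 'h' selects smart spacing and 'l'
   returns to dumb spacing. All other bytes are ignored here; a real
   terminal would also render text and handle sequences split across reads. */
void term_feed(struct term *t, const char *buf, size_t n)
{
    assert(t != NULL && (buf != NULL || n == 0));
    for (size_t i = 0; i + 2 < n; i++) {
        if (buf[i] != '\x1b' || buf[i + 1] != '[' || buf[i + 2] != '?')
            continue;
        size_t j = i + 3;
        long pn = 0;
        while (j < n && buf[j] >= '0' && buf[j] <= '9') {
            if (pn < 100000L)            /* stop growing on absurd input */
                pn = pn * 10 + (buf[j] - '0');
            j++;
        }
        if (j < n && pn == SMART_SPACING_MODE) {
            if (buf[j] == 'h')
                t->spacing = SPACING_SMART;
            else if (buf[j] == 'l')
                t->spacing = SPACING_DUMB;
        }
        i = j - 1;   /* the loop's i++ resumes scanning at buf[j] */
    }
}
```

Under this scheme, an application that knows how to lay out Indic syllable clusters would send the "set" sequence on startup and the "reset" sequence on exit, so programs unaware of the extension keep the default wcwidth-based spacing.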