Re: Shaping/Joining within fvwm

Mikhael Goikhman Thu, 19 Dec 2002 04:07:12 -0600 (CST)

On 18 Dec 2002 18:57:32 -0800, Nadim Shaikli wrote:
> 
> > > With regard to your 'naskhi' font: if it contains the required
> > > Form-B glyphs (U+FE70-U+FEFF), then the following ought to work:
> > > 
> > >   Style Arabic Font *-naskhi-medium-*-iso8859-6/iso10646-1
> > 
> > If it is an iso8859-6 font (not unicode), it can't be promoted to iso10646-1.
> 
> iso8859-6 is a subset of iso10646-1. Again, iso8859-6 alone is
> simply not usable: it's visually incorrect without shaping, and one
> cannot shape without Form-B glyphs (there is a plethora of posts on
> this topic on the 'net; I can certainly send you the links if you
> like, so as not to go off on a tangent on this forum).
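For reference, each Form-B glyph is recorded in the Unicode character data as a compatibility decomposition of a base letter; U+FE91, for instance, decomposes as "<initial> 0628", the initial form of BEH. The short Python sketch below (a hypothetical helper built only on the standard unicodedata module, not anything in fvwm) lists which Form-B characters a font would need in order to shape one base letter.

```python
import unicodedata

FORM_B_START, FORM_B_END = 0xFE70, 0xFEFF  # Arabic Presentation Forms-B


def form_b_glyphs(letter: str) -> dict:
    """Map shaping form (isolated/initial/medial/final) -> Form-B character
    for one base Arabic letter, using Unicode decomposition data."""
    base = f"{ord(letter):04X}"
    forms = {}
    for cp in range(FORM_B_START, FORM_B_END + 1):
        parts = unicodedata.decomposition(chr(cp)).split()  # e.g. ["<initial>", "0628"]
        # Keep single-letter forms only; this skips LAM-ALEF ligatures
        # ("<isolated> 0644 0627"), harakat sequences and code points
        # with no decomposition (empty string).
        if len(parts) == 2 and parts[0].startswith("<") and parts[1] == base:
            forms[parts[0].strip("<>")] = chr(cp)
    return forms


if __name__ == "__main__":
    for name, ch in form_b_glyphs("\u0628").items():  # ARABIC LETTER BEH
        print(f"{name:9} U+{ord(ch):04X} {unicodedata.name(ch)}")
```

BEH needs four glyphs (isolated, final, initial, medial), while a non-joining-on-the-left letter such as ALEF needs only two.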


I know the theory well now. But you miss my point. I want everything to
work, including iso8859-6. You can't deny that such a charset (and
encoding) exists, and, as you said, it loses nothing for Arabic. So if a
user requested iso8859-6 fonts (with no characters outside iso8859-6, of
course), he doesn't want to see question marks for valid iso8859-6
characters.

Don't worry about this; I may fix one-byte Arabic charsets myself later.
Or not, if you are against supporting all existing Arabic fonts. :)

> > Nadim, you seem to imply that the only valid way to write Arabic is
> > unicode. But this is not correct. Here is valid Arabic that is not
> > unicode: env LANG=ar_JO.iso8859-6 date
> 
> I don't have any Arabic locale on this machine, sorry. But I do
> indeed imply and state that Arabic should be used with UTF-8 and
> nothing else (not even CP-1256 :-). I'm actually curious as to why
> fvwm doesn't simply default to UTF-8 at all times?

If you use iso10646-1 fonts, it defaults to utf-8, doesn't it?
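An XLFD font name ends with two fields, CHARSET_REGISTRY and CHARSET_ENCODING, which together name the font's charset (for example iso10646-1 or iso8859-6). The sketch below assumes that a default string encoding is picked from that charset, so an iso10646-1 font implies UTF-8. The mapping table, the fallback and the function names are hypothetical illustrations, not fvwm's actual code.

```python
# Hypothetical table: XLFD charset -> encoding assumed for strings drawn
# with that font when no StringEncoding is given. Not fvwm's real table.
CHARSET_TO_ENCODING = {
    "iso10646-1": "utf-8",
    "iso8859-6": "iso8859-6",
}
FALLBACK_ENCODING = "iso8859-1"  # hypothetical default for other charsets


def xlfd_charset(xlfd: str) -> str:
    """Return 'REGISTRY-ENCODING' (the last two fields) of a full XLFD name."""
    fields = xlfd.split("-")
    # A full XLFD has 14 fields and starts with '-', so split() yields
    # 15 items whose first item is empty.
    if len(fields) != 15 or fields[0] != "":
        raise ValueError(f"not a full XLFD name: {xlfd!r}")
    return f"{fields[13]}-{fields[14]}".lower()


def default_encoding(xlfd: str) -> str:
    """Pick a string encoding from the font's charset."""
    return CHARSET_TO_ENCODING.get(xlfd_charset(xlfd), FALLBACK_ENCODING)


if __name__ == "__main__":
    font = "-misc-fixed-medium-r-normal--13-120-75-75-c-70-iso10646-1"
    print(xlfd_charset(font), default_encoding(font))  # iso10646-1 utf-8
```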

> > We have supported all iso encodings. I see no valid reason to stop
> > supporting iso8859-6. I think the problem is that once shaping is
> > applied, fribidi (or is it iconv?) can't convert back to iso8859-6 and
> > substitutes question marks, so we should only apply shaping when the
> > original strings are in a unicode encoding.
> 
> I don't think it's a question of support. Fvwm is doing the right
> thing. I view this as a "faulty/missing font" issue. The font file
> you were using simply doesn't have the _required_ Form-B glyphs, and
> thus Arabic can't be displayed properly. It's like wanting to display
> Chinese without having the correct Chinese glyphs and getting question
> marks instead.

What you are saying is that all existing CP1256 and iso8859-6 one-byte
fonts should show question marks and never the Arabic glyphs they
contain. I don't know; it is not hard to fix this situation.
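The question marks can be reproduced outside fvwm. Shaping replaces base letters from the U+0600 block with Presentation Forms-B characters, none of which exist in ISO-8859-6, so encoding the shaped string back to that one-byte charset has to substitute them. The Python sketch below (standard library only; the helper names are hypothetical and this is not fvwm's code) shows the failure and one possible workaround: folding presentation forms back to base letters with NFKC normalization before encoding.

```python
import unicodedata

# BEH + ALEF as stored: base letters from the U+0600 block.
LOGICAL = "\u0628\u0627"

# The same word after contextual shaping: BEH initial form (U+FE91)
# followed by ALEF final form (U+FE8E), both in Presentation Forms-B.
SHAPED = "\ufe91\ufe8e"


def encode_naively(text: str) -> bytes:
    """Encode to ISO-8859-6; unencodable characters become b'?'."""
    return text.encode("iso8859_6", errors="replace")


def encode_unshaped(text: str) -> bytes:
    """Hypothetical workaround: fold presentation forms back to base
    letters (NFKC) so a one-byte Arabic font gets characters it has."""
    return unicodedata.normalize("NFKC", text).encode("iso8859_6", errors="replace")


if __name__ == "__main__":
    print(encode_naively(LOGICAL))  # b'\xc8\xc7'
    print(encode_naively(SHAPED))   # b'??'
    print(encode_unshaped(SHAPED))  # b'\xc8\xc7'
```

The workaround only avoids the question marks; the result is unshaped again, and NFKC also rewrites other compatibility characters, so a real fix would more likely skip shaping for one-byte charsets, as suggested earlier in the thread.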

> Out of curiosity, how do
> 'env LANG=ar_JO.iso8859-6 date' and
> 'env LANG=ar_JO.UTF-8     date' differ ?

The first returns regular one-byte Arabic and English characters, 40
bytes in total. The second returns the same text, but in utf-8: 56 bytes.
Only ASCII characters (the first 128, codes 0-127) are encoded the same
way in both encodings. Try:

  cat Arabic+English-utf8-encoded-file | iconv -f utf-8 -t iso8859-6
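The size difference follows from how each encoding stores Arabic letters: ISO-8859-6 uses one byte per character, while UTF-8 uses one byte for ASCII and two bytes for every character in the U+0600 Arabic block. Assuming all non-ASCII characters in the date output are such letters, the 16 extra bytes (56 - 40) correspond to 16 Arabic characters. A minimal Python sketch of the comparison and of the iconv conversion above (the sample string is made up for illustration):

```python
SAMPLE = "\u0627\u0644\u0633\u0628\u062a 21"  # 5 Arabic letters + " 21"


def sizes(text: str) -> tuple:
    """Byte length of text in ISO-8859-6 and in UTF-8."""
    return len(text.encode("iso8859_6")), len(text.encode("utf-8"))


def utf8_to_iso8859_6(data: bytes) -> bytes:
    """Rough equivalent of `iconv -f utf-8 -t iso8859-6`; raises
    UnicodeEncodeError for characters outside ISO-8859-6."""
    return data.decode("utf-8").encode("iso8859_6")


if __name__ == "__main__":
    one_byte, utf8 = sizes(SAMPLE)
    print(one_byte, utf8)  # 8 13
    # ASCII bytes are identical in both encodings.
    print("21".encode("iso8859_6") == "21".encode("utf-8"))  # True
    print(utf8_to_iso8859_6(SAMPLE.encode("utf-8")) == SAMPLE.encode("iso8859_6"))  # True
```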

By the way, as far as I can see, FVWM supports CP-1256 encoding without
problems when I set a CP-1256 encoded title using:

  env LANG=ar_JO.iso8859-6 date | iconv -f iso8859-6 -t cp1256

I even see it correctly shaped (I think) if I use a unicode font, like:

  Style Arabic Font StringEncoding=CP1256:*-arabeyes-*/iso10646-1

The arabeyes font is not recognized as iso10646-1 because of bugs in the font itself.
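The iconv pipeline turns one-byte ISO-8859-6 text into one-byte CP1256 text; the two charsets assign different byte values to some letters (TAH, for example). StringEncoding=CP1256 then tells fvwm how to decode those bytes into Unicode for the iso10646-1 font. A Python sketch of both steps, assuming fvwm does something equivalent internally (the function names are hypothetical):

```python
def iso8859_6_to_cp1256(data: bytes) -> bytes:
    """Rough equivalent of `iconv -f iso8859-6 -t cp1256`."""
    return data.decode("iso8859_6").encode("cp1256")


def cp1256_to_utf8(data: bytes) -> bytes:
    """Decode CP1256 bytes to Unicode and re-encode as UTF-8, roughly what
    a CP1256 string encoding implies before drawing with a unicode font."""
    return data.decode("cp1256").encode("utf-8")


if __name__ == "__main__":
    word = "\u0637\u0627\u0644\u0628"  # TAH ALEF LAM BEH
    iso = word.encode("iso8859_6")
    cp = iso8859_6_to_cp1256(iso)
    print(iso != cp)                                   # True: TAH at least differs
    print(cp1256_to_utf8(cp) == word.encode("utf-8"))  # True
```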

> iso8859-6 is a code-table representation (i.e. an assignment of
> integer numbers to characters), whereas UTF-8 is a representation
> format (a sequence of bytes). So I'm not sure what you mean above
> by "support all iso encodings".

Actually, here I meant something closer to "support all charsets, i.e. fonts".

> In other words, being able to set both StringEncoding=iso8859-6 and
> StringEncoding=UTF-8 seems a bit like comparing apples to oranges.

No, they are not apples and oranges. iso8859-6 is both a charset and an
encoding, like any other one-byte charset where one character is one
byte. There are iso8859-6 encoded texts (shorter) and utf-8 encoded texts
(which use more bytes). Such text can always be converted from iso8859-6
to utf-8, but not always in the other direction. It is always possible
to convert between iso8859-6 and cp1256 texts, except maybe for some
funky characters, I guess.
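The asymmetry is easy to check: any valid ISO-8859-6 text decodes to Unicode and therefore has a UTF-8 form, but UTF-8 text converts back only if every character is in the ISO-8859-6 repertoire. The Python sketch below uses a hypothetical convertible() helper and an accented Latin letter as an example of a character ISO-8859-6 lacks but CP1256 has.

```python
def convertible(text: str, encoding: str) -> bool:
    """True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    arabic = "\u0645\u0631\u062d\u0628\u0627"  # MEEM REH HAH BEH ALEF
    mixed = arabic + " caf\u00e9"              # "é" is not in ISO-8859-6

    # iso8859-6 -> utf-8: decoded ISO-8859-6 text is Unicode, so it
    # always has a UTF-8 form.
    utf8 = arabic.encode("iso8859_6").decode("iso8859_6").encode("utf-8")
    print(utf8.decode("utf-8") == arabic)   # True

    # utf-8 -> iso8859-6: fails once a character is outside the charset.
    print(convertible(mixed, "iso8859_6"))  # False

    # CP1256 does have "é", one of the "funky" characters that keep
    # cp1256 -> iso8859-6 conversion from always succeeding.
    print(convertible(mixed, "cp1256"))     # True
```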

> I can understand the following encodings: UTF-8, UCS-2, UCS-4 and
> UTF-16, but I don't quite understand a setting akin to
> 'StringEncoding=iso8859-6' (unless fvwm is mapping names to
> encodings, which is what I thought it did: "convenience magic").

If you have text stored in some encoding (iso8859-6 or cp1256), you may
find it useful to be able to convert it to something else, like utf-8,
for use with unicode fonts. FVWM allows this via StringEncoding.

Out of curiosity, do you have Arabic text files? Are they all in one
unicode encoding or another? I read literature in several languages, but
I have yet to encounter utf-8 text. If a one-byte encoding exists and the
text has only one language (besides English), unicode is a waste of
space. :)

Of course, to see one-byte encoded text, you should replace the "set
encoding=utf-8" line in the .vimrc that I know you have.

Regards,
Mikhael.
--
Visit the official FVWM web page at <URL:http://www.fvwm.org/>.
To unsubscribe from the list, send "unsubscribe fvwm-workers" in the
body of a message to [EMAIL PROTECTED]
To report problems, send mail to [EMAIL PROTECTED]
