Hi all,

Since I am quite a heavy user of UTF-8 as the webmaster of a
multilingual web site, http://JV.Gilead.org.il, let me summarize what I
see as the page title when I use Firefox to browse different pages of
my site. Of course, if I use tabbed browsing, the tab title is always
correct.

1) If the title has non-Latin-1 letters, no problem. E.g.,
<http://JV.Gilead.org.il/hebrew/>. In this case xprop gives

WM_NAME(COMPOUND_TEXT) = "Les Voyages extraordinaires en Hébreu — המסעות
המופלאים - Mozilla Firefox"
_NET_WM_NAME(UTF8_STRING) = 0x4c, 0x65, 0x73, 0x20, 0x56, 0x6f, 0x79,
0x61, 0x67, 0x65, 0x73, 0x20, 0x65, 0x78, 0x74, 0x72, 0x61, 0x6f, 0x72,
0x64, 0x69, 0x6e, 0x61, 0x69, 0x72, 0x65, 0x73, 0x20, 0x65, 0x6e, 0x20,
0x48, 0xc3, 0xa9, 0x62, 0x72, 0x65, 0x75, 0x20, 0xe2, 0x80, 0x94, 0x20,
0xd7, 0x94, 0xd7, 0x9e, 0xd7, 0xa1, 0xd7, 0xa2, 0xd7, 0x95, 0xd7, 0xaa,
0x20, 0xd7, 0x94, 0xd7, 0x9e, 0xd7, 0x95, 0xd7, 0xa4, 0xd7, 0x9c, 0xd7,
0x90, 0xd7, 0x99, 0xd7, 0x9d, 0x20, 0x2d, 0x20, 0x4d, 0x6f, 0x7a, 0x69,
0x6c, 0x6c, 0x61, 0x20, 0x46, 0x69, 0x72, 0x65, 0x66, 0x6f, 0x78

And this is exactly what the title shows (f.identify in ctwm reports
the same). Even the non-ASCII Latin-1 characters are
translated to two bytes by Xlib.

You can also look at the Cyrillic page at
<http://jv.gilead.org.il/FAQ/index.ru.html>. The title is displayed
correctly there as well.
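
To check that a _NET_WM_NAME dump like the one above really is valid
UTF-8, the hex list printed by xprop can be turned back into text. This is
only a minimal sketch: the helper name decode_xprop_hex is made up for
illustration, and the sample covers just the bytes for "Hébreu", the dash,
and the first Hebrew word.

```python
import re


def decode_xprop_hex(dump: str) -> str:
    """Turn an xprop-style list such as '0x48, 0xc3, 0xa9' into text.

    Hypothetical helper, not part of xprop or ctwm.
    """
    # Collect every 0xNN token, ignoring commas, spaces and line breaks.
    tokens = re.findall(r"0x[0-9a-fA-F]{1,2}", dump)
    raw = bytes(int(tok, 16) for tok in tokens)
    # Raises UnicodeDecodeError if the property is not valid UTF-8.
    return raw.decode("utf-8")


# Bytes copied from the middle of the _NET_WM_NAME dump for the Hebrew page.
sample = """0x48, 0xc3, 0xa9, 0x62, 0x72, 0x65, 0x75, 0x20, 0xe2, 0x80, 0x94, 0x20,
0xd7, 0x94, 0xd7, 0x9e, 0xd7, 0xa1, 0xd7, 0xa2, 0xd7, 0x95, 0xd7, 0xaa"""

decoded = decode_xprop_hex(sample)
print(decoded)
```

The same helper can be fed the full dump of any of the pages mentioned in
this message.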

2) If the title has only Latin-1 letters, but some of them are non-ASCII
(represented in UTF-8 by TWO bytes, but by only ONE byte in
ISO-8859-1), there is a problem. E.g., http://jv.gilead.org.il/sjv.html.
In this case xprop gives

WM_NAME(STRING) = "Société Jules Verne - Mozilla Firefox"
_NET_WM_NAME(UTF8_STRING) = 0x53, 0x6f, 0x63, 0x69, 0xc3, 0xa9, 0x74,
0xc3, 0xa9, 0x20, 0x4a, 0x75, 0x6c, 0x65, 0x73, 0x20, 0x56, 0x65, 0x72,
0x6e, 0x65, 0x20, 0x2d, 0x20, 0x4d, 0x6f, 0x7a, 0x69, 0x6c, 0x6c, 0x61,
0x20, 0x46, 0x69, 0x72, 0x65, 0x66, 0x6f, 0x78

However, the actual title (and f.identify) shows

Socit Jules Verne - Mozilla Firefox

That is, the non-ASCII characters disappear. Note that the property
WM_NAME is now a STRING rather than COMPOUND_TEXT.

You can also look at the Turkish page at
<http://jv.gilead.org.il/FAQ/index.tu.html>. Some non-ASCII characters
disappear there as well.
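
The byte counts in case 2 can be checked with a few lines of Python. This
is only an illustration of the encodings involved; I am not claiming that
this is what ctwm does internally. It merely shows that the title I see is
the same string one gets by throwing away every non-ASCII character.

```python
# The page title from case 2, with e-acute written as an escape (U+00E9).
title = "Soci\u00e9t\u00e9 Jules Verne - Mozilla Firefox"

utf8 = title.encode("utf-8")
latin1 = title.encode("iso-8859-1")

# 37 characters: one byte each in ISO-8859-1, two extra bytes in UTF-8.
print(len(title), len(latin1), len(utf8))

# The first e-acute: two bytes in UTF-8, one byte in ISO-8859-1.
print(utf8[4:6].hex(), latin1[4:5].hex())

# Dropping every non-ASCII character reproduces the title shown on screen.
stripped = title.encode("ascii", errors="ignore").decode("ascii")
print(stripped)
```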

3) If the title has punctuation such as the right single quotation mark
(U+2019), which lies in the U+2000 range of Unicode and is represented in
UTF-8 by THREE bytes, you get extra spaces. E.g.,
http://jv.gilead.org.il/. In this case, xprop gives

WM_NAME(COMPOUND_TEXT) = "Zvi Har’El’s Jules Verne Collection - Mozilla
Firefox"
_NET_WM_NAME(UTF8_STRING) = 0x5a, 0x76, 0x69, 0x20, 0x48, 0x61, 0x72,
0xe2, 0x80, 0x99, 0x45, 0x6c, 0xe2, 0x80, 0x99, 0x73, 0x20, 0x4a, 0x75,
0x6c, 0x65, 0x73, 0x20, 0x56, 0x65, 0x72, 0x6e, 0x65, 0x20, 0x43, 0x6f,
0x6c, 0x6c, 0x65, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x20, 0x2d, 0x20, 0x4d,
0x6f, 0x7a, 0x69, 0x6c, 0x6c, 0x61, 0x20, 0x46, 0x69, 0x72, 0x65, 0x66,
0x6f, 0x78

The actual title (and f.identify) shows

Zvi Har’ El’ s Jules Verne Collection - Mozilla Firefox

with spaces after the apostrophe (right single quote).
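
Again as a small Python sketch (an illustration only, not a statement
about the ctwm code): U+2019 takes three bytes in UTF-8, matching the
0xe2, 0x80, 0x99 triples in the xprop dump, and the title I see differs
from the real one by exactly one space after each such quote.

```python
quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# Three bytes in UTF-8: e2 80 99, as in the _NET_WM_NAME dump.
encoded = quote.encode("utf-8")
print(len(encoded), encoded.hex())

# The real title, and the title as it appears in the window decoration.
title = "Zvi Har\u2019El\u2019s Jules Verne Collection - Mozilla Firefox"
observed = "Zvi Har\u2019 El\u2019 s Jules Verne Collection - Mozilla Firefox"

# Adding one space after every U+2019 turns the real title into the observed one.
with_spaces = title.replace(quote, quote + " ")
print(with_spaces == observed)
```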

As you can see from the examples, the generated X windows always have
the right properties (in my locale, en_US.UTF8), but CTWM's
interpretation of them needs to be improved.

Zvi.

-- 
Dr. Zvi Har'El      mailto:[EMAIL PROTECTED]    Department of Mathematics
tel:+972-54-4227607 icq:179294841    Technion - Israel Institute of Technology
fax:+972-4-8293388  http://www.math.technion.ac.il/~rl/    Haifa 32000, ISRAEL
"If you can't say somethin' nice, don't say nothin' at all." -- Thumper (1942)