Re: [Fonts]Re: [I18n]Unicode coverage for languages
On Mon, Jul 08, 2002, Jungshik Shin wrote:
> On Mon, 8 Jul 2002, Edward Lee wrote:
> > There are `two' Traditional Chinese fonts here. In zh-tw the radicals/strokes of some glyphs are different from the TC glyphs in GB18030 fonts.
>
> Could you give Unicode code points of a few of those characters? Have you checked them out at your own government's Han character variant dictionary at http://140.111.1.40?

Yes, I know the site. The examples are U+89D2 (Big5 0xa8a4), U+904E (Big5 0xb94c), U+9AA8 (Big5 0xb0a9), U+5433 (Big5 0xa764), ... and, of course, every glyph that contains those radicals. Try comparing them with the following fonts:

ftp://cle.linux.org.tw/pub/fonts/fonts/twmoefont/ttf/

Some of the Arphic font (bsmi00lp.ttf) uses the GB18030 font convention.

> > So if we (zh-TW) use GB18030 fonts, it will confuse our schools (teachers and students) and/or government, because we can't find those glyphs in our dictionary.
>
> By 'our dictionary', did you mean all dictionaries used in Taiwan or just some small (not so extensive) dictionaries supposedly used by (elementary) school children?

I have the Kang-Shi Chinese dictionary, The New Yutang Chinese-English Dictionary (The Chinese University of Hong Kong) and three other Chinese dictionaries (not so small, 2464 pages).

The `two' Traditional Chinese fonts follow different writing conventions, so if possible zh-TW should use the Ming typeface, especially for school (education) use. Some of the differences are variants, but some are not.

Edward G.J. Lee
___
Fonts mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/fonts
Re: [Fonts]Re: [I18n]Unicode coverage for languages
Edward Lee wrote:
> On Mon, Jul 08, 2002, Jungshik Shin wrote:
> > On Mon, 8 Jul 2002, Edward Lee wrote:
> > > There are `two' Traditional Chinese fonts here. In zh-tw the radicals/strokes of some glyphs are different from the TC glyphs in GB18030 fonts.
> >
> > Could you give Unicode code points of a few of those characters? Have you checked them out at your own government's Han character variant dictionary at http://140.111.1.40?
>
> Yes, I know the site. The examples are U+89D2 (Big5 0xa8a4), U+904E (Big5 0xb94c), U+9AA8 (Big5 0xb0a9), U+5433 (Big5 0xa764),

Yes, you are right. There are differences in how a Chinese character is written. The examples you mentioned above are well documented in almost all the Chinese dictionaries published in mainland China. Those dictionaries always include a Cross Reference Table for New and Old Glyph Forms.

What happened was this: because so many small variants of Chinese character forms accumulated over thousands of years, there was an effort to standardize those forms, at least in printing. For one reason or another, each region picked a particular form, which resulted in the differences we see today.

Now, back to Unicode fonts that cover CJK Unified Ideographs and Extension A. One such example is SimSun18030.ttc. Its OS/2 table indicates it is intended for traditional Chinese. Is that correct? Of course, yes! Traditional Chinese is also used in mainland China, and SimSun18030.ttc provides those traditional characters, some of them in a form that people in HK and TW may not be used to.

Another font covering CJK Unified Ideographs and Extension A is MING_UNI.TTF. Its OS/2 table also indicates it is intended for both simplified and traditional Chinese, but its glyphs are in the forms people in HK and TW are used to.

These two fonts have similar coverage of Chinese characters (except that MING_UNI.TTF has some unique Cantonese characters in the PUA). So from coverage one cannot tell whether a font is for zh_CN or zh_HK/zh_TW. Their OS/2 tables correctly state that both support simplified and traditional Chinese. So from the OS/2 table one still cannot tell whether a font is intended for zh_CN or zh_HK/zh_TW.

I know most people think zh_CN = simplified Chinese and zh_TW = traditional Chinese. That is mostly true but not exactly true. For fonts covering CJK Unified Ideographs and Extension A in particular, it is wrong.

With the rapid adoption of Unicode, we are going to see more fonts from different regions covering CJK Unified Ideographs and Extension A. The only way to find out which Han variant a font has is to look at it. Coverage doesn't help us here, and the OS/2 table won't either.

As long as we can configure it properly, zh_CN, zh_HK, zh_TW etc. really do not matter.

Regards,
Yao Zhang
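[Editor's sketch] The point about the OS/2 table can be illustrated with a small C fragment. It assumes that "the OS/2 table indicates" refers to the code page bits in ulCodePageRange1, where bit 18 stands for simplified Chinese (code page 936) and bit 20 for traditional Chinese (code page 950). The helper names and the sample values in the self-check are hypothetical; they are not read from SimSun18030.ttc or MING_UNI.TTF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in ulCodePageRange1 of the OpenType OS/2 table. */
#define OS2_CP_CHINESE_SIMPLIFIED  (UINT32_C(1) << 18)  /* code page 936 */
#define OS2_CP_CHINESE_TRADITIONAL (UINT32_C(1) << 20)  /* code page 950 */

/* Hypothetical helper: does the font claim simplified Chinese support? */
static bool
claims_simplified (uint32_t code_page_range1)
{
    return (code_page_range1 & OS2_CP_CHINESE_SIMPLIFIED) != 0;
}

/* Hypothetical helper: does the font claim traditional Chinese support? */
static bool
claims_traditional (uint32_t code_page_range1)
{
    return (code_page_range1 & OS2_CP_CHINESE_TRADITIONAL) != 0;
}

/* True when the two Chinese bits are the same in both fonts, i.e. when
 * the OS/2 table gives no way to tell the fonts apart by region. */
static bool
chinese_bits_identical (uint32_t a, uint32_t b)
{
    const uint32_t mask = OS2_CP_CHINESE_SIMPLIFIED | OS2_CP_CHINESE_TRADITIONAL;

    return (a & mask) == (b & mask);
}

/* Self-check with made-up values: two fonts that both set both bits. */
static void
check_os2_example (void)
{
    const uint32_t mainland_style = OS2_CP_CHINESE_SIMPLIFIED | OS2_CP_CHINESE_TRADITIONAL;
    const uint32_t hk_tw_style    = OS2_CP_CHINESE_SIMPLIFIED | OS2_CP_CHINESE_TRADITIONAL;

    assert (claims_simplified (mainland_style) && claims_traditional (mainland_style));
    assert (chinese_bits_identical (mainland_style, hk_tw_style));
}
```

Under that assumption, two fonts that set both bits look identical to any code that only inspects the table, whichever regional glyph forms they actually draw.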
[Fonts]Re: [I18n]Using current locale in font selection
Keith Packard [EMAIL PROTECTED] writes:
> Much as I hate the C locale model, I'm wondering if I shouldn't use the current locale as a language hint where applications don't provide explicit language information when selecting fonts. This would make the generic aliases (like sans-serif) pick a font appropriate for the locale instead of some random font most likely suitable for Latin languages.
>
> Or would this only lead to confusion and chaos?

I think this is right; it's the only way that we'll get reasonably consistent font choice between places where there is a heavyweight system on top of Xft (Pango, Mozilla, etc.) and places that are using Xft directly.

My language hack did this; feel free to use the code below if it helps.

Regards,
Owen

--- fontconfig/src/fccfg.c.langtag	Fri Jun 21 02:14:45 2002
+++ fontconfig/src/fccfg.c	Mon Jun 24 14:09:35 2002
@@ -23,6 +23,7 @@
  */
 
 #include "fcint.h"
+#include <locale.h>
 
 FcConfig    *_fcConfig;
 
@@ -1059,6 +1060,50 @@
     FcPatternDel (p, object);
 }
 
+static const FcChar8 canon_map[256] = {
+    0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
+    0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
+    0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   '-', 0,   0,
+    0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
+    0,   'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
+    'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 0,   0,   0,   0,   '-',
+    0,   'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
+    'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 0,   0,   0,   0,   0
+};
+
+static FcChar8 *
+FcGetDefaultLanguage (void)
+{
+    FcChar8 *locale;
+    FcChar8 *result;
+    FcChar8 *p, *q;
+
+    locale = (FcChar8 *)setlocale (LC_CTYPE, NULL);
+    result = malloc (strlen (locale + 1));
+    if (!result)
+        return 0;
+
+    p = locale;
+    q = result;
+    while (*p)
+    {
+        FcChar8 value;
+
+        if (*p == '.' || *p == '@')
+            break;
+
+        value = canon_map[*p];
+        if (value)
+            *(q++) = value;
+
+        p++;
+    }
+
+    *q = 0;
+
+    return result;
+}
+
 FcBool
 FcConfigSubstitute (FcConfig *config,
                     FcPattern *p,
@@ -1070,7 +1115,8 @@
     FcTest      *t;
     FcEdit      *e;
     FcValueList *l;
-
+    FcValue     v;
+
     if (!config)
     {
         config = FcConfigGetCurrent ();
@@ -1078,6 +1124,18 @@
             return FcFalse;
     }
 
+    if (FcPatternGet (p, "language", 0, &v) == FcResultNoMatch)
+    {
+        FcChar8 *language;
+
+        language = FcGetDefaultLanguage ();
+        if (!language)
+            return FcFalse;
+
+        FcPatternAddString (p, "language", language);
+        free (language);
+    }
+
     st = (FcSubState *) malloc (config->maxObjects * sizeof (FcSubState));
     if (!st && config->maxObjects)
         return FcFalse;
Re: [Fonts]Re: [I18n]Using current locale in font selection
Owen Taylor wrote:
> +locale = (FcChar8 *)setlocale (LC_CTYPE, NULL);
> +result = malloc (strlen (locale + 1));

Should be strlen(locale) + 1.

Erik
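[Editor's sketch] Erik's correction matters because `strlen (locale + 1)` measures the string starting at its second character, so the buffer can come out up to two bytes too small for the copied tag plus its terminating NUL. The standalone C fragment below shows the corrected allocation. It is only a sketch and not the fontconfig code: the function name is hypothetical, it takes the locale name as a parameter instead of calling setlocale(), and it replaces the 256-entry lookup table with explicit range checks that are meant to behave the same way (letters lowered, '_' and '-' mapped to '-', everything else dropped, stopping at '.' or '@').

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: turn a POSIX locale name such as "ja_JP.UTF-8"
 * into a lower-case language tag such as "ja-jp".
 * The caller owns the returned string and must free() it. */
static char *
language_from_locale (const char *locale)
{
    const unsigned char *p;
    char *result;
    char *q;

    assert (locale != NULL);

    /* Room for every character of the input plus the terminating NUL. */
    result = malloc (strlen (locale) + 1);
    if (!result)
        return NULL;

    q = result;
    for (p = (const unsigned char *) locale; *p; p++)
    {
        /* Codeset (".UTF-8") and modifier ("@euro") are not part of the tag. */
        if (*p == '.' || *p == '@')
            break;

        if (*p >= 'A' && *p <= 'Z')
            *q++ = (char) (*p - 'A' + 'a');   /* fold to lower case */
        else if (*p >= 'a' && *p <= 'z')
            *q++ = (char) *p;
        else if (*p == '_' || *p == '-')
            *q++ = '-';                       /* unify the separator */
        /* any other character is dropped */
    }
    *q = '\0';

    return result;
}
```

With this sketch, `language_from_locale ("zh_TW.Big5")` yields "zh-tw". A real caller would also have to handle a NULL return from setlocale() before passing the value on.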
[Fonts]Re: [I18n]Using current locale in font selection
Around 14 o'clock on Jul 8, Owen Taylor wrote:
> +locale = (FcChar8 *)setlocale (LC_CTYPE, NULL);

Don't you mean LC_MESSAGES?

If so, I think we should be able to use this return value almost raw; stripping out the language and territory codes and passing them in as FC_LANG, right?

(no case conversion is necessary, FC_LANG comparisons are already case insensitive).

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab
Re: [Fonts] [I18n] language tags in fontconfig
On Sun, 2002-07-07 at 21:59, Keith Packard wrote:
> While I've never seen ñ in my limited exposure to French, I don't find it impossible to believe that it occurs in some limited contexts, perhaps for place names along the border with Spain?

My French peers say: Nope, never, ever. No doubt.

Cheers,

--
Joaquín Cuenca Abela
[EMAIL PROTECTED]
[Fonts]Re: [I18n]Using current locale in font selection
On Mon, 8 Jul 2002, Keith Packard wrote:
> Around 14 o'clock on Jul 8, Owen Taylor wrote:
> > +locale = (FcChar8 *)setlocale (LC_CTYPE, NULL);
>
> Don't you mean LC_MESSAGES?

I believe it should be LC_CTYPE. Some people, like me, use settings such as the following, because English menus and (error) messages are easier to understand than a not-so-good translation:

LC_CTYPE=ko_KR.eucKR
LC_MESSAGES=C
LC_PAPER=en_US   # because the US doesn't use ISO std. paper size
.

or

LC_CTYPE=ko_KR.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8
.

> If so, I think we should be able to use this return value almost raw; stripping out the language and territory codes and passing them in as FC_LANG, right?

Did you mean that only the codeset part is relevant here and that we can go without relying on the language and territory codes? The codeset doesn't carry any language-specific information if a UTF-8 locale is used.

Jungshik
[Fonts]Re: [I18n]Using current locale in font selection
Keith Packard [EMAIL PROTECTED] writes:
> Around 14 o'clock on Jul 8, Owen Taylor wrote:
> > +locale = (FcChar8 *)setlocale (LC_CTYPE, NULL);
>
> Don't you mean LC_MESSAGES?
>
> If so, I think we should be able to use this return value almost raw; stripping out the language and territory codes and passing them in as FC_LANG, right?

Interesting question... My opinion is that LC_CTYPE is right... if someone does something like:

LANG=ja_JP.UTF-8
LC_MESSAGES=en_US.UTF-8

they typically mean "I want to process Japanese text, but show me messages in English." We should then be picking fonts that work well for Japanese. It's very seldom that anybody would have an LC_MESSAGES value that wasn't displayable in their LC_CTYPE; that would be nonsensical...

LC_CTYPE is defined by POSIX to specify "Character classification and case conversion", while LC_MESSAGES is "Formats of informative and diagnostic messages and interactive responses". Neither is an exact match here, but I'd argue that the typical use of mixed locales makes LC_CTYPE more useful.

> (no case conversion is necessary, FC_LANG comparisons are already case insensitive).

This is code taken from GTK+ ... the case conversions were mostly for elegance, so that the form returned by get_default_lang() matched the canonical form of the Pango language tags.

Regards,
Owen
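
[Editor's sketch] The mixed-locale setups quoted in this thread rely on how POSIX resolves each locale category from the environment: LC_ALL wins if set, then the category's own variable, then LANG. The C fragment below models that lookup as a pure function so the two examples from the thread can be checked without touching the process environment. The function name is hypothetical, and the "C" fallback for a completely unset environment is an assumption; POSIX leaves that default to the implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pick the locale name that applies to one category.
 * Each argument is the value of the corresponding environment variable,
 * or NULL if it is unset. Empty strings count as unset. */
static const char *
effective_locale (const char *lc_all, const char *lc_category, const char *lang)
{
    if (lc_all && *lc_all)
        return lc_all;          /* LC_ALL overrides everything */
    if (lc_category && *lc_category)
        return lc_category;     /* e.g. the value of LC_CTYPE or LC_MESSAGES */
    if (lang && *lang)
        return lang;            /* LANG is the general fallback */
    return "C";                 /* assumed default when nothing is set */
}

/* Self-check using the two setups described in the thread. */
static void
check_thread_examples (void)
{
    /* LANG=ja_JP.UTF-8 LC_MESSAGES=en_US.UTF-8: LC_CTYPE falls back to LANG. */
    assert (strcmp (effective_locale (NULL, NULL, "ja_JP.UTF-8"), "ja_JP.UTF-8") == 0);
    assert (strcmp (effective_locale (NULL, "en_US.UTF-8", "ja_JP.UTF-8"), "en_US.UTF-8") == 0);

    /* LC_CTYPE=ko_KR.UTF-8 LC_MESSAGES=en_US.UTF-8, LANG unset. */
    assert (strcmp (effective_locale (NULL, "ko_KR.UTF-8", NULL), "ko_KR.UTF-8") == 0);
    assert (strcmp (effective_locale (NULL, "en_US.UTF-8", NULL), "en_US.UTF-8") == 0);
}
```

In both setups the LC_CTYPE lookup yields the Japanese or Korean locale while the LC_MESSAGES lookup yields English, which is why a font-selection hint taken from LC_MESSAGES would steer these users toward Latin fonts.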