Re: [Fonts]Re: [I18n]Unicode coverage for languages

2002-07-08 Thread Edward Lee

On Mon, Jul 08, 2002, Jungshik Shin wrote:
 On Mon, 8 Jul 2002, Edward Lee wrote:
 
There are `two' Traditional Chinese fonts here. In zh-tw the
radical/stroke of some glyphs are different from the TC glyphs
in GB18030 fonts.
 
   Could you give Unicode code points of a few of those characters?
 Have you checked them out at your own government's Han character variant
 dictionary at http://140.111.1.40?

  Yes, I know the site.

  The examples are U+89D2(Big5 0xa8a4), U+904E(Big5 0xb94c),
   U+9AA8(Big5 0xb0a9), U+5433(Big5 0xa764),
   ...

  Of course, this includes the glyphs that contain those radicals.

  And try to compare them with the following fonts:

  ftp://cle.linux.org.tw/pub/fonts/fonts/twmoefont/ttf/

  Some of the Arphic fonts (bsmi00lp.ttf) use the GB18030 font convention.
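
[Editor's sketch] To compare how the code points listed above are drawn by
different fonts, they first have to reach a renderer as text. A minimal
sketch follows, assuming the renderer accepts UTF-8; the helper name
utf8_encode_bmp and the array example_code_points are hypothetical and only
handle code points in the Basic Multilingual Plane, which is enough for the
four examples in this message.

```c
#include <stddef.h>

/* Encode a BMP code point (U+0000..U+FFFF, surrogates not checked) as
 * UTF-8.  Writes up to 3 bytes plus a terminating NUL into out and
 * returns the number of bytes written, not counting the NUL. */
static size_t
utf8_encode_bmp (unsigned int cp, char out[4])
{
    size_t n = 0;

    if (cp < 0x80)
        out[n++] = (char) cp;
    else if (cp < 0x800)
    {
        out[n++] = (char) (0xC0 | (cp >> 6));
        out[n++] = (char) (0x80 | (cp & 0x3F));
    }
    else
    {
        out[n++] = (char) (0xE0 | (cp >> 12));
        out[n++] = (char) (0x80 | ((cp >> 6) & 0x3F));
        out[n++] = (char) (0x80 | (cp & 0x3F));
    }
    out[n] = '\0';
    return n;
}

/* The four code points cited in the message above. */
static const unsigned int example_code_points[] = {
    0x89D2, 0x904E, 0x9AA8, 0x5433
};
```

Feeding each encoded string to the same text widget with two different
fonts selected is one way to see the radical/stroke differences side by side.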

So if we (zh-TW) use GB18030 fonts, it will confuse our schools
(teachers and students) and/or government, because we can't find those
glyphs in our dictionary.
 
   By 'our dictionary', did you mean all dictionaries used in Taiwan
 or just some small (not so extensive) dictionaries supposedly used by
 (elementary) school children?

  I have the Kang-Shi Chinese dictionary, The New Yutang Chinese-English
  Dictionary (The Chinese University of Hong Kong) and three other
  Chinese dictionaries (not so small, 2464 pages).

  The `two' Traditional Chinese fonts follow different writing
  conventions, so if possible zh-TW should use the Ming typeface,
  especially for school (education) use. Some of the differences are
  variants, but some are not.



Edward G.J. Lee
___
Fonts mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/fonts



Re: [Fonts]Re: [I18n]Unicode coverage for languages

2002-07-08 Thread Yao Zhang

Edward Lee wrote:

 On Mon, Jul 08, 2002, Jungshik Shin wrote:
  On Mon, 8 Jul 2002, Edward Lee wrote:
  
 There are `two' Traditional Chinese fonts here. In zh-tw the
 radical/stroke of some glyphs are different from the TC glyphs
 in GB18030 fonts.
  
Could you give Unicode code points of a few of those characters?
  Have you checked them out at your own government's Han character variant
  dictionary at http://140.111.1.40?
 
   Yes, I know the site.
 
   The examples are U+89D2(Big5 0xa8a4), U+904E(Big5 0xb94c),
U+9AA8(Big5 0xb0a9), U+5433(Big5 0xa764),

Yes, you are right.  There are differences in how a Chinese character
is written.  The examples you mentioned above are well documented in
almost all the Chinese dictionaries published in mainland China.  There
is always a Cross Reference Table for New and Old Glyph Forms in
those dictionaries.

What happened was: Because there were so many small variants in
Chinese character forms over the thousands of years, there was an
effort going on to standardize those forms at least in printing.  For
one reason or another, a particular form was picked, which resulted
in the differences we see today.

Now, back to a Unicode font which covers CJK Unified Ideographs and
Extension A.  One such example is SimSun18030.ttc.  Its OS/2 table
indicates it is intended for traditional Chinese.  Is it correct?
Of course, yes!  Because traditional Chinese is also used in mainland
China, and SimSun18030.ttc provides those traditional characters in
forms, some of which people in HK and TW may not be used to.

Now another font covering CJK Unified Ideographs and its Extension A,
MING_UNI.TTF.  Its OS/2 table also indicates it is intended for both
simplified and traditional Chinese.  But those glyph forms are in the
form people in HK and TW are used to.

These two fonts have similar coverage for Chinese characters (except
MING_UNI.TTF has some unique Cantonese characters in PUA).  So from
coverage one cannot tell if it is for zh_CN or zh_HK/zh_TW.  Their
OS/2 tables correctly state that both support simplified and
traditional Chinese.  So from the OS/2 table one still cannot tell if
a font is intended for zh_CN or zh_HK/zh_TW.
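
[Editor's sketch] The OS/2 argument above can be made concrete. In the
OpenType OS/2 table, the ulCodePageRange1 field uses bit 18 for Simplified
Chinese (code page 936) and bit 20 for Traditional Chinese (code page 950).
The sketch below assumes the 32-bit field has already been read from the
font; how it is read is left out, and the enum and function names are
hypothetical.

```c
#include <stdint.h>

/* Bit positions in the OS/2 table's ulCodePageRange1 field. */
#define OS2_CPR1_CHINESE_SIMPLIFIED   (UINT32_C(1) << 18)  /* code page 936 */
#define OS2_CPR1_CHINESE_TRADITIONAL  (UINT32_C(1) << 20)  /* code page 950 */

enum han_claim
{
    HAN_NONE,
    HAN_SIMPLIFIED_ONLY,
    HAN_TRADITIONAL_ONLY,
    HAN_BOTH
};

/* Report which Chinese code pages a font claims to support.  This says
 * nothing about which regional glyph forms the font actually draws. */
static enum han_claim
classify_han_claim (uint32_t code_page_range1)
{
    int simp = (code_page_range1 & OS2_CPR1_CHINESE_SIMPLIFIED) != 0;
    int trad = (code_page_range1 & OS2_CPR1_CHINESE_TRADITIONAL) != 0;

    if (simp && trad)
        return HAN_BOTH;
    if (simp)
        return HAN_SIMPLIFIED_ONLY;
    if (trad)
        return HAN_TRADITIONAL_ONLY;
    return HAN_NONE;
}
```

A font that sets both bits yields HAN_BOTH whichever regional glyph
convention it follows, which is why this field alone cannot separate a
zh_CN font from a zh_HK/zh_TW one.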

I know most people think
zh_CN = simplified Chinese
and
zh_TW = traditional Chinese
This is mostly true but not exactly true.  For fonts covering CJK
Unified Ideographs and Extension A in particular, it is wrong.

With the rapid adoption of Unicode, we are going to see more fonts
from different regions covering CJK Unified Ideographs and its Extension A.
The only way to find out which Han variant a font has is by looking at
it.  Coverage doesn't help us here, and the OS/2 table won't either.  As
long as we can configure it properly, zh_CN, zh_HK and zh_TW etc. really
do not matter.

Regards,

Yao Zhang



[Fonts]Re: [I18n]Using current locale in font selection

2002-07-08 Thread Owen Taylor


Keith Packard [EMAIL PROTECTED] writes:

 Much as I hate the C locale model, I'm wondering if I shouldn't use the 
 current locale as a language hint where applications don't provide 
 explicit language information when selecting fonts.  This would make
 the generic aliases (like sans-serif) pick a font appropriate for the 
 locale instead of some random font most likely suitable for Latin 
 languages.
 
 Or would this only lead to confusion and chaos?

I think this is right; it's the only way that we'll get reasonably
consistent font choice between places where there is a heavyweight
system on top of Xft (Pango, Mozilla, etc.) and places that are
using Xft directly.

My language hack did this; feel free to use the code below if
it helps. 

Regards,
Owen


--- fontconfig/src/fccfg.c.langtag  Fri Jun 21 02:14:45 2002
+++ fontconfig/src/fccfg.c  Mon Jun 24 14:09:35 2002
@@ -23,6 +23,7 @@
  */
 
 #include "fcint.h"
+#include <locale.h>
 
 FcConfig*_fcConfig;
 
@@ -1059,6 +1060,50 @@
FcPatternDel (p, object);
 }
 
+static const FcChar8 canon_map[256] = {
+   0,   0,   0,   0,   0,   0,   0,   0,0,   0,   0,   0,   0,   0,   0,   0, 
+   0,   0,   0,   0,   0,   0,   0,   0,0,   0,   0,   0,   0,   0,   0,   0, 
+   0,   0,   0,   0,   0,   0,   0,   0,0,   0,   0,   0,   0,  '-',  0,   0, 
+   0,   0,   0,   0,   0,   0,   0,   0,0,   0,   0,   0,   0,   0,   0,   0, 
+   0,  'a', 'b', 'c', 'd', 'e', 'f', 'g',  'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
+  'p', 'q', 'r', 's', 't', 'u', 'v', 'w',  'x', 'y', 'z',  0,   0,   0,   0,  '-',
+   0,  'a', 'b', 'c', 'd', 'e', 'f', 'g',  'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
+  'p', 'q', 'r', 's', 't', 'u', 'v', 'w',  'x', 'y', 'z',  0,   0,   0,   0,   0
+};
+   
+static FcChar8 *
+FcGetDefaultLanguage (void)
+{
+FcChar8 *locale;
+FcChar8 *result;
+FcChar8 *p, *q;
+
+locale = (FcChar8 *)setlocale (LC_CTYPE, NULL);
+result = malloc (strlen (locale + 1));
+if (!result)
+   return 0;
+
+p = locale;
+q = result;
+while (*p)
+{
+   FcChar8 value;
+
+   if (*p == '.' || *p == '@')
+   break;
+
+   value = canon_map[*p];
+   if (value)
+   *(q++) = value;
+
+   p++;
+}
+
+*q = 0;
+
+return result;
+}
+
 FcBool
 FcConfigSubstitute (FcConfig   *config,
FcPattern   *p,
@@ -1070,7 +1115,8 @@
 FcTest *t;
 FcEdit *e;
 FcValueList*l;
-
+FcValue v;
+
 if (!config)
 {
config = FcConfigGetCurrent ();
@@ -1078,6 +1124,18 @@
return FcFalse;
 }
 
+if (FcPatternGet (p, "language", 0, &v) == FcResultNoMatch)
+{
+   FcChar8 *language;
+
+   language = FcGetDefaultLanguage ();
+   if (!language)
+   return FcFalse;
+   
+   FcPatternAddString (p, "language", language);
+   free (language);
+}
+
 st = (FcSubState *) malloc (config->maxObjects * sizeof (FcSubState));
 if (!st && config->maxObjects)
return FcFalse;



Re: [Fonts]Re: [I18n]Using current locale in font selection

2002-07-08 Thread Erik van der Poel

Owen Taylor wrote:
 
 +locale = (FcChar8 *)setlocale (LC_CTYPE, NULL);
 +result = malloc (strlen (locale + 1));

Should be strlen(locale) + 1.
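
[Editor's sketch] The difference matters because strlen (locale + 1)
measures the string starting at its second character, so the buffer comes
out two bytes too small for a full-length copy plus the terminating NUL.
Below is a standalone version of the routine with the allocation corrected.
It is a sketch, not the fontconfig code: it uses plain char instead of
FcChar8, takes the locale name as a parameter so it can be exercised
without calling setlocale, and replaces the canon_map table with explicit
ASCII tests that are meant to have the same effect. The function name is
hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated, canonicalized language tag for a locale
 * name, e.g. "ja_JP.UTF-8" becomes "ja-jp".  The caller frees the
 * result.  Assumes an ASCII-compatible execution character set. */
static char *
default_language_from_locale (const char *locale)
{
    const unsigned char *p;
    char *result, *q;

    if (!locale)
        return NULL;

    /* Corrected size: length of the whole string plus the NUL. */
    result = malloc (strlen (locale) + 1);
    if (!result)
        return NULL;

    q = result;
    for (p = (const unsigned char *) locale; *p; p++)
    {
        if (*p == '.' || *p == '@')      /* stop at codeset or modifier */
            break;

        if (*p >= 'A' && *p <= 'Z')
            *q++ = (char) (*p - 'A' + 'a');
        else if (*p >= 'a' && *p <= 'z')
            *q++ = (char) *p;
        else if (*p == '-' || *p == '_')
            *q++ = '-';
        /* every other byte is dropped, as with a 0 entry in canon_map */
    }
    *q = '\0';

    return result;
}
```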

Erik



[Fonts]Re: [I18n]Using current locale in font selection

2002-07-08 Thread Keith Packard


Around 14 o'clock on Jul 8, Owen Taylor wrote:

 +locale = (FcChar8 *)setlocale (LC_CTYPE, NULL);

Don't you mean LC_MESSAGES?  If so, I think we should be able to use this 
return value almost raw; stripping out the language and territory codes and
passing them in as FC_LANG, right?

(no case conversion is necessary, FC_LANG comparisons are already case 
insensitive).
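
[Editor's sketch] A rough illustration of the "almost raw" idea: compare the
language and territory part of a locale name against a language tag without
first building a lowercased copy. This is not fontconfig's comparison
routine. The function names are hypothetical, and treating '_' and '-' as
equivalent is an assumption made here so that "ja_JP" and "ja-jp" compare
equal.

```c
#include <ctype.h>

/* Fold one byte for comparison: '_' is treated like '-', letters are
 * lowercased.  tolower() is locale dependent; ASCII input is assumed. */
static int
fold_lang_char (unsigned char c)
{
    return c == '_' ? '-' : tolower (c);
}

/* Return 1 if the language[_territory] part of a locale name such as
 * "ja_JP.UTF-8" equals the given tag, ignoring case; 0 otherwise.
 * Anything from '.' or '@' onwards in the locale name is ignored. */
static int
locale_matches_lang (const char *locale, const char *tag)
{
    const unsigned char *p = (const unsigned char *) locale;
    const unsigned char *t = (const unsigned char *) tag;

    while (*p && *p != '.' && *p != '@' && *t)
    {
        if (fold_lang_char (*p) != fold_lang_char (*t))
            return 0;
        p++;
        t++;
    }

    /* Both sides must be exhausted for an exact match. */
    return (*p == '\0' || *p == '.' || *p == '@') && *t == '\0';
}
```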

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab





Re: [Fonts] [I18n] language tags in fontconfig

2002-07-08 Thread Joaquín Cuenca Abela

On Sun, 2002-07-07 at 21:59, Keith Packard wrote:
 
 While I've never seen ñ in my limited exposure to French, I don't find it 
 impossible to believe that it occurs in some limited contexts, perhaps for 
 place names along the border with Spain?

My French peers say: Nope, never, ever.  No doubt.  

Cheers,

-- 
Joaquín Cuenca Abela
[EMAIL PROTECTED]




[Fonts]Re: [I18n]Using current locale in font selection

2002-07-08 Thread Jungshik Shin




On Mon, 8 Jul 2002, Keith Packard wrote:


 Around 14 o'clock on Jul 8, Owen Taylor wrote:

  +locale = (FcChar8 *)setlocale (LC_CTYPE, NULL);

 Don't you mean LC_MESSAGES?

  I believe it should be LC_CTYPE. Some people like me
have the following because English menus and (error) messages are easier
to understand than a not-so-good translation.


  LC_CTYPE=ko_KR.eucKR
  LC_MESSAGES=C
  LC_PAPER=en_US   # because the US doesn't use ISO std. paper size
  .

  or

  LC_CTYPE=ko_KR.UTF-8
  LC_MESSAGES=en_US.UTF-8
  LC_PAPER=en_US.UTF-8
  .


 If so, I think we should be able to use this
 return value almost raw; stripping out the language and territory codes and
 passing them in as FC_LANG, right?

  Did you mean that only the codeset part is relevant here and we can
go without relying on the lang and territory codes? The codeset doesn't
carry any lang-specific information if a UTF-8 locale is used.
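
[Editor's sketch] The point about the codeset can be seen by splitting a
locale name of the usual language[_territory][.codeset][@modifier] shape
into its parts. The struct and function names below are hypothetical and
the fixed buffer sizes are an arbitrary assumption; over-long components
are truncated.

```c
#include <string.h>

struct locale_parts
{
    char language[16];
    char territory[16];
    char codeset[32];
};

/* Copy the n bytes starting at begin into dst, truncating to fit. */
static void
copy_span (char *dst, size_t dstsize, const char *begin, size_t n)
{
    if (n >= dstsize)
        n = dstsize - 1;
    memcpy (dst, begin, n);
    dst[n] = '\0';
}

/* Split "ko_KR.eucKR" into "ko", "KR" and "eucKR".  Missing parts are
 * returned as empty strings; an @modifier is ignored. */
static void
split_locale_name (const char *name, struct locale_parts *out)
{
    size_t total     = strcspn (name, "@");    /* everything before @modifier */
    size_t lang_terr = strcspn (name, ".@");   /* language[_territory]        */
    size_t lang      = strcspn (name, "_.@");  /* language only               */

    copy_span (out->language, sizeof out->language, name, lang);

    if (name[lang] == '_')
        copy_span (out->territory, sizeof out->territory,
                   name + lang + 1, lang_terr - lang - 1);
    else
        out->territory[0] = '\0';

    if (name[lang_terr] == '.')
        copy_span (out->codeset, sizeof out->codeset,
                   name + lang_terr + 1, total - lang_terr - 1);
    else
        out->codeset[0] = '\0';
}
```

Splitting ko_KR.UTF-8 and ja_JP.UTF-8 this way gives the same codeset for
both, so only the language and territory parts can tell them apart.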

   Jungshik




[Fonts]Re: [I18n]Using current locale in font selection

2002-07-08 Thread Owen Taylor


Keith Packard [EMAIL PROTECTED] writes:

 Around 14 o'clock on Jul 8, Owen Taylor wrote:
 
  +locale = (FcChar8 *)setlocale (LC_CTYPE, NULL);
 
 Don't you mean LC_MESSAGES?  If so, I think we should be able to use this 
 return value almost raw; stripping out the language and territory codes and
 passing them in as FC_LANG, right?

Interesting question... My opinion is that LC_CTYPE is right... if
someone does something like:

 LANG=ja_JP.UTF-8 LC_MESSAGES=en_US.UTF-8 

They typically mean "I want to process Japanese text, but show me
messages in English."

We should then be picking fonts that work well for Japanese.

It's very seldom that anybody would have a LC_MESSAGES value that
wasn't displayable in their LC_CTYPE; that would be nonsensical...

LC_CTYPE is defined by POSIX to specify "Character classification and
case conversion", while LC_MESSAGES is "Formats of informative and
diagnostic messages and interactive responses". Neither is an exact
match here, but I'd argue that the typical use of mixed locales
makes LC_CTYPE more useful.
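
[Editor's sketch] The mixed-locale case depends on how POSIX resolves each
category from the environment: LC_ALL wins if set and non-empty, then the
category's own variable, then LANG. The sketch below reproduces that lookup
by hand so the two categories can be compared; in a real program,
setlocale (LC_ALL, "") followed by setlocale (LC_CTYPE, NULL) performs the
resolution. The function names are hypothetical, and falling back to "C" is
an assumption, since POSIX leaves the default implementation-defined.

```c
#include <stdlib.h>

/* Return the value of an environment variable, or NULL if it is unset
 * or empty (POSIX treats empty locale variables as unset). */
static const char *
nonempty_env (const char *name)
{
    const char *v = getenv (name);

    return (v && *v) ? v : NULL;
}

/* Resolve the locale name for one category, given the name of its
 * environment variable, e.g. "LC_CTYPE" or "LC_MESSAGES". */
static const char *
effective_locale_name (const char *category_var)
{
    const char *v;

    if ((v = nonempty_env ("LC_ALL")) != NULL)
        return v;
    if ((v = nonempty_env (category_var)) != NULL)
        return v;
    if ((v = nonempty_env ("LANG")) != NULL)
        return v;

    return "C";   /* assumed default; implementation-defined in POSIX */
}
```

With LANG=ja_JP.UTF-8 and LC_MESSAGES=en_US.UTF-8 and nothing else set,
the lookup yields ja_JP.UTF-8 for LC_CTYPE and en_US.UTF-8 for LC_MESSAGES,
which is the situation described above.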

 (no case conversion is necessary, FC_LANG comparisons are already case 
 insensitive).

This is code taken from GTK+ ... case conversions were mostly for
elegance, so that the form returned by get_default_lang() matched the
canonical form of the Pango language tags.

Regards,
Owen