Re: [Fonts]fontconfig peculiarity(??)

2002-10-18 Thread Jungshik Shin



On Fri, 18 Oct 2002, Keith Packard wrote:

 Around 7 o'clock on Oct 18, Jungshik Shin wrote:

  For some unknown reason, 'New Gulim' is picked up by 'fontconfig' or 'Xft'
  for certain characters when CODE2000 is explicitly requested by
  applications like Mozilla and gedit (via Pango). More specifically, those
  characters are U+115F (Hangul leading consonant filler) and
  U+1160 (Hangul medial vowel filler).

 Fontconfig has a kludge to weed out fonts with broken encoding tables;
 such fonts often have encoding table entries pointing at blank glyphs
 which aren't supposed to be blank.  It checks each glyph in the encoding
 and ignores those which are inappropriately blank.

 which are expected to be blank, that list was derived from a similar table
 in Mozilla.  Blank glyphs not in the table are assumed to represent broken

  Keith, we talked about this a month ago (Sep. 7th) on this very list
:-) You came up with a much more extensive list of characters than
Mozilla's blank glyph list. I also filed a bug for Mozilla-Windows
(http://bugzilla.mozilla.org/show_bug.cgi?id=167136). You must have
forgotten about it. :-) I added those two characters to the blank glyph
list in /etc/fonts/fonts.conf then. In addition, both Ngulim and Code2000
have blank glyphs for both characters. The only difference is that in
Ngulim they're both *spacing* (width > 0), while in Code2000 only U+115F
is spacing and U+1160 is non-spacing (width = 0). So, even if my blank
glyph list didn't have them, there's no reason I can think of why Ngulim
would be preferred over Code2000 for those characters. If they're equal on
this count, the explicit request ought to take precedence, shouldn't it?
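
The blank-glyph check described in the quoted text can be sketched roughly as follows. This is only an illustration of the idea, not fontconfig's code: the function name, the argument names, and the toy cmap are all assumptions made up for the example.

```python
def usable_coverage(cmap, blank_glyphs, allowed_blank):
    """Return the code points a font is treated as covering.

    cmap          -- dict mapping code point -> glyph id (hypothetical input)
    blank_glyphs  -- set of glyph ids that have no visible outline
    allowed_blank -- code points that are expected to be blank
    """
    covered = set()
    for code_point, glyph in cmap.items():
        if glyph in blank_glyphs and code_point not in allowed_blank:
            # Blank glyph where ink is expected: assume a broken encoding
            # entry and leave the code point out of the coverage.
            continue
        covered.add(code_point)
    return covered


# Toy font: SPACE, the two Hangul fillers, and one syllable.
cmap = {0x0020: 1, 0x115F: 2, 0x1160: 3, 0xAC00: 4}
blank = {1, 2, 3}

# With only SPACE in the list, both fillers are dropped from the coverage.
print(sorted(hex(c) for c in usable_coverage(cmap, blank, {0x0020})))
# With the fillers added to the list, they count as covered.
print(sorted(hex(c) for c in usable_coverage(cmap, blank, {0x0020, 0x115F, 0x1160})))
```

Under this reading, a filler that is missing from the blank list would be dropped from every font's coverage, whatever its advance width.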

 One possible explanation is that Code2000 isn't marked as supporting
'ko' in font-cache for some reason while Ngulim is. However, both fonts
have more or less similar coverage of Korean characters (the full set
of precomposed syllables and Hangul Conjoining Jamos and other symbols
in KS X 1001). So, this is a bit of a mystery, too.

 weren't included in the table.   This means that no font will ever be
 listed as supporting these glyphs, so Mozilla will pick the first font in
 the match list to draw them with, expecting that this will produce a
 missing glyph indication.

  BTW, could it be possible to 'deceive' or 'force' fontconfig
to believe that a certain font covers a certain range of Unicode
even if it doesn't appear to? I guess it's not possible at the moment,
but wouldn't it be nice to add it? What I'm thinking of is something
like this:

<match target="font">
  <test qual="any" name="family"><string>Gulim Old Hangul Jamo</string></test>
  <edit name="coverage" mode="assign" binding="strong">
  <coderanges>.</coderanges></edit>
</match>

where coderanges is a comma-separated list of Unicode code points
(integers) or code ranges (something like [0x-0x]).

I found in the font cache file that the 'charset' property does exactly
the thing I want to do with 'coderanges'. If so, would it be possible
to use 'charset' to achieve what I described above? Well, I still have
to figure out how 'charset' represents Unicode ranges.

Some fonts have a hack encoding (although advertised as Unicode),
and their apparent Unicode coverage cannot be guessed at all by
fontconfig based on the Unicode cmap. An application or library aware of
this hack encoding can still make use of them, though. However, fontconfig
does not appear to return the requested font, even when it is explicitly
specified, and instead comes up with a fallback after an 'intelligent
guess', because what it thinks a font with a hack encoding can cover does
not match at all the range of Unicode the application wants to draw with
the font.

It'd also be nice to be able to do something similar with the 'lang' tag.
I thought the following lines would *make* fontconfig *believe* (*ignoring*
what it finds out with the OS lang tag and orthography map) that 'Gulim Old
Hangul Jamo' is suitable for Korean, but it doesn't seem to work. Did
I do anything wrong?

---
<match target="font">
<test qual="any" name="family"><string>Gulim Old Hangul Jamo</string></test>
<edit name="lang" mode="assign" binding="strong"><string>ko</string></edit>
</match>
---

  Both of these certainly look like hacks, but some applications
(perhaps MathML, Indic script handling, Korean alphabet handling...)
need them until OTFs are widely available. Related problems are discussed
at

  http://bugzilla.mozilla.org/show_bug.cgi?id=126919#c315 and comments
referenced therein.

  http://bugzilla.mozilla.org/show_bug.cgi?id=95708


   Jungshik

___
Fonts mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/fonts



Re: [Fonts]fontconfig peculiarity(??)

2002-10-18 Thread Keith Packard

Around 12 o'clock on Oct 18, Jungshik Shin wrote:

 Keith, we talked about this a month ago (Sep. 7th) on this very list
 :-)

Sorry, I didn't look at the email address from your previous message.

 One possible explanation is that Code2000 isn't marked as supporting 'ko'
 in font-cache for some reason while Ngulim is.

If your font specification includes language, this would cause Ngulim to 
be preferred over Code2000 if both are added to the pattern in the config 
file.  If the application explicitly names 'Code2000' as a family name, 
then the language shouldn't matter.

Code2000 isn't marked as supporting Korean as it is missing a large number
of Han glyphs, totalling some 3136 characters from the KSC 5601-1992
encoding.  Many Korean documents will not be completely covered by this
font.  It also isn't marked as supporting Japanese or any of the Chinese
languages.
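
A rough sketch of the kind of check described here, assuming language support is decided by counting how many characters of the language's required set a font lacks. The function names and the `tolerance` parameter are illustrative assumptions rather than fontconfig's API, and the data is a toy.

```python
def missing_chars(required, coverage):
    """Code points the language needs that the font does not cover."""
    return required - coverage


def supports_language(required, coverage, tolerance=0):
    """True if the font lacks at most `tolerance` required code points.

    `tolerance` is a hypothetical knob; the real cutoff is not stated here.
    """
    return len(missing_chars(required, coverage)) <= tolerance


# Toy data: a "language" needing 100 code points, a font covering 60 of them.
required = set(range(0x4E00, 0x4E64))
font = set(range(0x4E00, 0x4E3C))

print(len(missing_chars(required, font)))   # 40
print(supports_language(required, font))    # False
```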

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab





Re: [Fonts]fontconfig peculiarity(??)

2002-10-18 Thread Jungshik Shin

On Fri, 18 Oct 2002, Keith Packard wrote:

 Around 12 o'clock on Oct 18, Jungshik Shin wrote:

  One possible explanation is that Code2000 isn't marked as supporting 'ko'
  in font-cache for some reason while Ngulim is.

  This explanation only makes sense when those two chars are NOT
included in the blank glyph list, doesn't it?  As I wrote, they have
been in the blank glyph list in my fonts.conf since early September.

  Hmm, things are getting more interesting. After I removed Ngulim.ttf
from my font path and then put it back (I ran fc-cache before testing),
suddenly Mozilla picks up U+1160 glyph from Code2000. The same is true of
'gedit' when Code2000 is specified as a font to use. Is it at the
whim of electrons whirling around inside my computer :-) ?


 If your font specification includes language, this would cause Ngulim to
 be preferred over Code2000 if both are added to the pattern in the config
 file.  If the application explicitly names 'Code2000' as a family name,
 then the language shouldn't matter.

  The page in question (http://jshin.net/i18n/korean/hunmin.html
and http://jshin.net/i18n/korean/hunmin_comp.html) specifies font-family
to be CODE2000 explicitly with CSS. I assume this will make Mozilla with
Xft enabled ask fontconfig for that font explicitly.

  As for Pango (gedit), I'm less certain because I don't know whether
Pango specifies a language when sending font requests down (or up) the
road.

  Therefore, my original mystery still remains a mystery :-)

 Code2000 isn't marked as supporting Korean as it is missing a large number
 of Han glyphs, totalling some 3136 characters from the KSC 5601-1992
 encoding.  Many Korean documents will not be completely covered by this

  Sorry, I didn't check the Han glyphs; I only checked that it has the
full set of precomposed Hangul syllables (11,172 of them). As I suggested
before, a kind of multi-level orthography check may be necessary to cope
with situations like this. Or, would it be possible for users to manually
override what fontconfig *detects* (both code range coverage and lang) in
fonts.conf, as suggested in my previous email?

   Jungshik




Re: [Fonts]fontconfig peculiarity(??)

2002-10-18 Thread Keith Packard

Around 16 o'clock on Oct 18, Jungshik Shin wrote:

   Hmm, things are getting more interesting. After I removed Ngulim.ttf
 from my font path and then put it back (I ran fc-cache before testing),
 suddenly Mozilla picks up U+1160 glyph from Code2000. The same is true of
 'gedit' when Code2000 is specified as a font to use. Is it at the
 whim of electrons whirling around inside my computer :-) ?

fc-cache ignores directories which are older than the associated cache 
file; you have to use the '-f' option to force it to rescan the files.  
The cache holds the list of available characters in each font, so a 
failure to update the cache could easily have been the source of this 
problem.

Note that fc-cache doesn't rescan directories when the configuration 
changes; the only configuration option which affects the resulting cache 
file is the blank glyph list, which isn't expected to (ever) change aside 
from bug fixes.
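
The staleness rule described above can be sketched like this. It is a minimal illustration, assuming the decision is based purely on modification times; `needs_rescan` and its parameters are hypothetical names, not part of fc-cache.

```python
import os


def needs_rescan(font_dir, cache_file, force=False):
    """Decide whether a font directory should be rescanned.

    Hypothetical helper: only timestamps are compared, so editing the
    configuration alone never triggers a rescan. `force` plays the role
    of the '-f' option.
    """
    if force or not os.path.exists(cache_file):
        return True
    # A directory newer than its cache means fonts were added or removed.
    return os.path.getmtime(font_dir) > os.path.getmtime(cache_file)
```

With a rule like this, any change that leaves the directory's timestamp older than the cache keeps the old cache in use until a forced run.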

  The page in question (http://jshin.net/i18n/korean/hunmin.html
 and http://jshin.net/i18n/korean/hunmin_comp.html) specifies font-family
 to be CODE2000 explicitly with CSS. I assume this will make Mozilla with
 Xft enabled ask fontconfig for that font explicitly.

Yes it does.

  As for Pango (gedit), I'm less certain because I don't know whether
 Pango specifies a language when sending font requests down (or up) the
 road.

I don't know either, but fontconfig will pick up the current locale and 
convert that to a language if Pango doesn't explicitly set one.

 As I suggested before, a kind of multi-level orthography check may be
 necessary to cope with situations like this. Or, would it be possible for
 users to override manually what fontconfig *detects* (both code range
 coverage and lang) in fonts.conf as suggested in my prev. email?

I believe Korean may be unique in this regard; I don't know of other 
languages with multiple common character sets which are essentially 
independently usable.  Japanese has kana and kanji, but there are strong 
conventions on which words are spelled in each set.

As per my comment above, I strongly prefer to make the contents of the 
cache files independent of the configuration so that multiple 
configurations can share the same cache files without difficulty.

Remember that the language name is just a shorthand notation for a
unicode coverage table; if you want to identify fonts with Hangul 
syllables, you can easily build a charset encompassing those and ask for 
the font covering the greatest number.
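
A minimal sketch of that idea, using plain Python sets in place of fontconfig charsets. The font names and their coverage are made up for illustration, and `best_coverage` is a hypothetical helper; only the Hangul syllable range (U+AC00 to U+D7A3) is real.

```python
# The 11,172 precomposed Hangul syllables occupy U+AC00..U+D7A3.
HANGUL_SYLLABLES = frozenset(range(0xAC00, 0xD7A4))


def best_coverage(wanted, fonts):
    """Return the family whose coverage shares the most code points with wanted.

    fonts maps a family name to the set of code points it covers.
    """
    return max(fonts, key=lambda family: len(fonts[family] & wanted))


# Made-up coverage data, for illustration only.
fonts = {
    "Font A": set(range(0xAC00, 0xB000)),   # part of the syllable block
    "Font B": set(range(0xAC00, 0xD7A4)),   # all syllables
    "Font C": set(range(0x0020, 0x007F)),   # ASCII only
}

print(len(HANGUL_SYLLABLES))                    # 11172
print(best_coverage(HANGUL_SYLLABLES, fonts))   # Font B
```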

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab


