FreeType, fontconfig, Xft, Mozilla and non-BMP character support

Jungshik Shin, Sun, 01 Dec 2002 05:27:29 -0800

On Thu, 28 Nov 2002, Jungshik Shin wrote:

> On Thu, 28 Nov 2002, Owen Taylor wrote:
>
> > The path to adding full beyond-the-BMP support to Pango is
> > pretty straightforward. (I'm a little surprised that it doesn't
> > sort of work now for TrueType fonts, but I haven't tested
> > it at all.)
>
>  So, what I wrote about 'UTF-32 cleanness' was not the case. There are
> some libraries that support BMP only for the moment. As for Pango,


   It turned out that Pango/Glib are not the only libraries that
need to be modified a bit to be UTF-32-clean. Xft and FreeType2 also
have a small problem with UTF-32 support, although all the data structures
used in both have been UTF-32-clean from the beginning.

  Because MS IE 5.5 and later can render non-BMP characters well with
fonts like Code2001, I decided to bring Mozilla on par with it.
It was relatively easy (for Mozilla-Xft), and I went so far as to get
Mozilla to draw nice 6-hex-digit unknown-character glyphs for unknown
non-BMP characters (for unknown characters in the BMP, it still draws
4-hex-digit unknown-character glyphs). It's good to see the 6-digit glyphs
work, but it's disappointing to see them show up even for characters
covered by a font (CODE2001.TTF) that is on my fontconfig search path and
that I explicitly specified via CSS.
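
The only difference between the two kinds of unknown-character box is how many hex digits are shown. Mozilla's actual drawing code is not part of this message, so the following is only a minimal sketch of that choice; the helper name unknown_glyph_label() is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not Mozilla's code): format the code point shown
 * inside an unknown-character box. BMP characters get 4 hex digits;
 * characters beyond the BMP (U+10000..U+10FFFF) get 6.
 * Returns the number of characters written, as snprintf() does. */
int unknown_glyph_label(uint32_t cp, char *buf, size_t size)
{
    int digits = (cp > 0xFFFF) ? 6 : 4;
    return snprintf(buf, size, "%0*X", digits, (unsigned int)cp);
}
```

For example, U+10331 is labelled "010331" and U+4E00 is labelled "4E00".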

  Why? It's simple: FcCharSetHasChar returns false for non-BMP
characters (e.g. U+10331) even though Code2001 has them. I wrote
a couple of test programs, one to test fontconfig and the other to test
FreeType2 (2.1.3, the latest stable release, released a couple of weeks
ago) [1]. I found the cause, and I'm enclosing my patch.

  Why does FcCharSetHasChar fail for non-BMP characters? Because
fontconfig calls FT_Select_Charmap() instead of FT_Set_Charmap().
Apparently fontconfig doesn't use the latter because Keith didn't want
to deal with encodings other than Unicode, AppleRoman and AdobeSymbol
(to stay portable while handling legacy multibyte encodings, fontconfig
would need a built-in conversion routine, which would bloat its size).

 When FT_Select_Charmap() is called with FT_ENCODING_UNICODE (or the
deprecated ft_encoding_unicode), FreeType activates the first cmap with
Unicode encoding for subsequent operations on the face, until another
cmap is activated. That's not a problem for fonts covering only the BMP.
However, fonts like Code2001 have multiple cmaps, all with the identical
symbolic FT encoding FT_ENCODING_UNICODE but with different character
coverage. Code2001 has 4 cmaps: pid=0,eid=0 (Unicode), pid=1,eid=1
(AppleRoman), pid=3 (MS),eid=1 (Unicode) and pid=3 (MS),eid=10 (Unicode).
Only the last cmap has non-BMP characters, although the first and the
third are also Unicode cmaps; they're actually UCS-2 cmaps. As mentioned
above, FreeType2 activates the first cmap matching the symbolic encoding
name, and unfortunately that happens to be one that doesn't cover non-BMP
characters.
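
Why only the pid=3/eid=10 cmap can reach U+10331: per the OpenType cmap specification, that cmap is meant to use a format 12 subtable, whose character codes are 32-bit, while the format 4 subtable normally used for pid=3/eid=1 stores 16-bit codes and so stops at U+FFFF. The sketch below shows a format 12 lookup over its sequential map groups; the type and function names are illustrative, and the group values in the usage note are made up, not read from CODE2001.TTF.

```c
#include <stddef.h>
#include <stdint.h>

/* One SequentialMapGroup of a cmap format 12 subtable. All three fields
 * are 32-bit, so a group can describe characters beyond U+FFFF. */
typedef struct {
    uint32_t start_char_code;
    uint32_t end_char_code;
    uint32_t start_glyph_id;
} map_group;

/* Binary search over groups sorted by start_char_code.
 * Returns the glyph index, or 0 (.notdef) if the code is not covered. */
uint32_t format12_lookup(const map_group *groups, size_t n, uint32_t cp)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < groups[mid].start_char_code)
            hi = mid;
        else if (cp > groups[mid].end_char_code)
            lo = mid + 1;
        else /* consecutive codes map to consecutive glyph IDs */
            return groups[mid].start_glyph_id
                   + (cp - groups[mid].start_char_code);
    }
    return 0;
}
```

With made-up groups {0x20..0x7E -> glyph 1} and {0x10330..0x1034A -> glyph 500}, U+10331 resolves to glyph 501, while U+10000 falls through to .notdef.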

  One may say that the font (CODE2001) is to blame and that the pid=0/eid=0
and pid=3/eid=1 cmaps should cover non-BMP characters as well. However,
it's not very clear from the MS document at
<http://www.microsoft.com/typography/otspec/cmap.htm> that they have to.
Even if they do, I think FreeType2 has to be defensive and provide a
workaround, because there may be other fonts lying around with similar
problems.

  One possible solution is not to return the first cmap table matching
the symbolic encoding name FT_ENCODING_UNICODE, but to keep looking to
see whether a pid=3/eid=10 cmap is also present. If it is, it should be
activated instead of the first Unicode cmap found.
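
The selection rule of this first solution can be modelled without FreeType itself. FreeType's FT_CharMapRec does expose platform_id and encoding_id; the cmap_rec struct, is_unicode_cmap() and select_unicode_cmap() below are hypothetical stand-ins, and which (pid, eid) pairs count as Unicode is a simplifying assumption of this sketch, not FreeType's exact table. A real implementation would walk face->charmaps and pass the chosen entry to FT_Set_Charmap().

```c
#include <stddef.h>

/* Hypothetical stand-in for a cmap record: only the (platform ID,
 * encoding ID) pair matters for this sketch. */
typedef struct {
    int platform_id;
    int encoding_id;
} cmap_rec;

/* Assumption of this sketch: pid=0 (any eid), pid=3/eid=1 and
 * pid=3/eid=10 are the cmaps reported as Unicode. */
static int is_unicode_cmap(const cmap_rec *c)
{
    return c->platform_id == 0 ||
           (c->platform_id == 3 &&
            (c->encoding_id == 1 || c->encoding_id == 10));
}

/* Instead of stopping at the first Unicode cmap, keep scanning for a
 * pid=3/eid=10 (UCS-4) cmap and prefer it; otherwise fall back to the
 * first Unicode cmap seen. Returns an index into cmaps, or -1. */
int select_unicode_cmap(const cmap_rec *cmaps, size_t n)
{
    int first = -1;
    for (size_t i = 0; i < n; i++) {
        if (!is_unicode_cmap(&cmaps[i]))
            continue;
        if (cmaps[i].platform_id == 3 && cmaps[i].encoding_id == 10)
            return (int)i;   /* UCS-4 cmap wins */
        if (first < 0)
            first = (int)i;  /* remember the first UCS-2 fallback */
    }
    return first;
}
```

For Code2001's cmap list {(0,0), (1,1), (3,1), (3,10)}, this picks index 3 instead of index 0. The extra scan past the first match is the small performance cost of this approach.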

  An alternative is to introduce a new symbolic encoding name,
'FT_ENCODING_UCS4' (or UTF32), to distinguish the pid=3/eid=10 cmap from
other Unicode cmaps (pid=0/eid=0, pid=1/eid=?, pid=3/eid=1), which appear
to be UCS-2 only in most cases. In that case, consumers of the FT2 library
(e.g. fontconfig) would have to be modified as well: when they deal with
non-BMP characters, they would have to request the FT_ENCODING_UCS4 cmap
instead of FT_ENCODING_UNICODE.
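
Under this alternative, first-match selection stays as it is and the burden moves to the consumer. The sketch below models that split; ENC_UCS4 stands for the proposed FT_ENCODING_UCS4, which is not an existing FreeType constant, and every name here (sym_encoding, cmap_ids, classify_cmap(), find_cmap(), find_cmap_for_non_bmp()) is hypothetical.

```c
#include <stddef.h>

/* Hypothetical symbolic encodings; ENC_UCS4 models the proposed
 * FT_ENCODING_UCS4. */
typedef enum { ENC_NONE, ENC_UNICODE, ENC_UCS4 } sym_encoding;

typedef struct {
    int platform_id;
    int encoding_id;
} cmap_ids;

/* pid=3/eid=10 gets its own symbolic name; the other Unicode cmaps
 * (simplified here to pid=0 and pid=3/eid=1) stay ENC_UNICODE. */
sym_encoding classify_cmap(cmap_ids c)
{
    if (c.platform_id == 3 && c.encoding_id == 10)
        return ENC_UCS4;
    if (c.platform_id == 0 || (c.platform_id == 3 && c.encoding_id == 1))
        return ENC_UNICODE;
    return ENC_NONE;
}

/* Library side: unchanged first-match selection by symbolic name. */
int find_cmap(const cmap_ids *cmaps, size_t n, sym_encoding want)
{
    for (size_t i = 0; i < n; i++)
        if (classify_cmap(cmaps[i]) == want)
            return (int)i;
    return -1;
}

/* Consumer side (e.g. what fontconfig would have to do): ask for the
 * UCS-4 cmap first and fall back to the plain Unicode one. */
int find_cmap_for_non_bmp(const cmap_ids *cmaps, size_t n)
{
    int idx = find_cmap(cmaps, n, ENC_UCS4);
    return idx >= 0 ? idx : find_cmap(cmaps, n, ENC_UNICODE);
}
```

Compared with the first solution, this keeps the library's lookup cheap but requires every consumer that wants non-BMP coverage to change its request.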

  I thought the first was better (at the cost of a small performance
penalty from having to keep looking after hitting the first Unicode
cmap), and it worked well with Mozilla (see
<http://bugzilla.mozilla.org/show_bug.cgi?id=182877>).

  I also had to extend XftTextExtents16() in fcpackage-2.1 to deal with
UTF-16 (instead of UCS-2). Xft2 has XftDrawStringUtf16() in addition to
XftDrawString16() (the latter is for UCS-2). I thought about adding
XftTextExtentsUtf16(), but for programs like Mozilla, which use UTF-16
as their internal string representation, it appears more convenient to
extend XftTextExtents16() to support UTF-16. Again, there's a small
speed penalty.
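
A UTF-16-aware XftTextExtents16() has to combine each surrogate pair into one code point before looking up glyphs, which is where the extra per-unit check comes from. The sketch shows only that decoding step; utf16_next() and utf16_to_ucs4() are hypothetical names, not code from the Xft patch, and returning an unpaired surrogate unchanged is just one possible policy (a real implementation might substitute U+FFFD).

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from s starting at *i (caller ensures *i < len)
 * and advance *i past it. A high surrogate followed by a low surrogate
 * yields a supplementary code point. */
uint32_t utf16_next(const uint16_t *s, size_t len, size_t *i)
{
    uint32_t c = s[(*i)++];
    if (c >= 0xD800 && c <= 0xDBFF && *i < len) {
        uint32_t c2 = s[*i];
        if (c2 >= 0xDC00 && c2 <= 0xDFFF) {
            (*i)++;
            return 0x10000 + ((c - 0xD800) << 10) + (c2 - 0xDC00);
        }
    }
    return c; /* BMP character or unpaired surrogate */
}

/* Convert a UTF-16 string to UCS-4; out needs room for len entries.
 * Returns the number of code points written. */
size_t utf16_to_ucs4(const uint16_t *s, size_t len, uint32_t *out)
{
    size_t i = 0, n = 0;
    while (i < len)
        out[n++] = utf16_next(s, len, &i);
    return n;
}
```

For example, the four UTF-16 units 0x0041 0xD800 0xDF31 0x4E00 decode to three code points: U+0041, U+10331 and U+4E00.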

  Below are links to the FT2 patch (against 2.1.3) and the Xft patch
  (against fcpackage 2.1):

 http://bugzilla.mozilla.org/attachment.cgi?id=107852 : FT2 patch
 http://bugzilla.mozilla.org/attachment.cgi?id=107858 : Xft patch

There are a couple of screenshots, along with the Mozilla patch and
a couple of sample pages with non-BMP characters, at

 http://bugzilla.mozilla.org/show_bug.cgi?id=182877

 I believe Werner is on this list, so I won't write to him separately
for now. Werner, if you find that my patch makes sense, it'd be nice to
apply it to FreeType2. BTW, it just occurred to me that the routine
that sets the default cmap for a newly opened FT_Face has to be modified
in a similar manner. (Currently, it makes the first-found Unicode cmap
the default, but, as I explained above, the first-matched Unicode cmap
may not be the most extensive one.)

   Jungshik


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
