[ft-devel] Incomplete cmap table for platform 0 (Apple Unicode)

2007-09-30 Thread Dmitry Timoshkov

Hello,

http://bugs.winehq.org/show_bug.cgi?id=9840 has a TTF font attached
to it which displays perfectly in Windows, but Wine is not able
to show any character using this font; only 'c' is displayed.
That's because FreeType selects the first Unicode cmap table, which happens
to have platform id 0 (Apple Unicode), and that cmap table is incomplete
(or truncated). The cmap tables for the other platforms (1 and 3) are good, and
making FreeType ignore the cmap with platform id 0 makes the font display
properly in Wine's notepad.

Attached is a hack that makes FreeType ignore cmaps with platform id 0.

What do the FreeType developers think about this problem?

--
Dmitry.
diff -uprN freetype2/src/sfnt/ttcmap.c freetype2/src/sfnt/ttcmap.c
--- freetype2/src/sfnt/ttcmap.c 2007-06-13 17:05:55.0 +0900
+++ freetype2/src/sfnt/ttcmap.c 2007-09-30 20:45:39.0 +0900
@@ -2284,6 +2284,20 @@
   charmap.encoding = FT_ENCODING_NONE;  /* will be filled later */
   offset           = TT_NEXT_ULONG( p );

+  FT_TRACE2(( "found cmap platform_id %u, encoding_id %u\n",
+              charmap.platform_id, charmap.encoding_id ));
+
+  /* cmap tables with platform_id == Apple Unicode sometimes are
+   * incomplete in comparison to other tables.
+   * FIXME: perhaps fallback to this table if no other table exists.
+   */
+  if ( charmap.platform_id == 0 ) /* Apple Unicode */
+  {
+    FT_TRACE2(( "ignoring Apple Unicode encoding\n" ));
+    continue;
+  }
+
+

   if ( offset && offset <= face->cmap_size - 2 )
   {
     FT_Byte* volatile  cmap   = table + offset;
___
Freetype-devel mailing list
Freetype-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/freetype-devel


Re: [ft-devel] Incomplete cmap table for platform 0 (Apple Unicode)

2007-09-30 Thread mpsuzuki
Hi,

On Sun, 30 Sep 2007 21:13:00 +0900
"Dmitry Timoshkov" <[EMAIL PROTECTED]> wrote:
>http://bugs.winehq.org/show_bug.cgi?id=9840 has a TTF font attached
>to it which displays perfectly in Windows, but Wine is not able
>to show any character using this font; only 'c' is displayed.

Thank you for providing the sample font. It seems that
the first cmap, (platformID, platformSpecificID) = (0, 0)
= (Unicode, Default Semantics), is not broken, but is designed
to ignore most of the glyphs. Its contents are as follows:

[cmap]format = 4, length=0x007a, languageID=0x(unknown)
  startCode=0x0022, endCode=0x0022, idDelta=0x0036, idRangeOffset=0x
  startCode=0x0026, endCode=0x0027, idDelta=0x, idRangeOffset=0x0012
  startCode=0x002c, endCode=0x002c, idDelta=0x0025, idRangeOffset=0x
  startCode=0x002e, endCode=0x002f, idDelta=0x, idRangeOffset=0x0012
  startCode=0x003b, endCode=0x003c, idDelta=0x, idRangeOffset=0x0014
  startCode=0x003e, endCode=0x003e, idDelta=0x0021, idRangeOffset=0x
  startCode=0x005b, endCode=0x005d, idDelta=0x, idRangeOffset=0x0014
  startCode=0x0060, endCode=0x0060, idDelta=0xfff5, idRangeOffset=0x
  startCode=0x007b, endCode=0x007e, idDelta=0x, idRangeOffset=0x0016
  startCode=0x, endCode=0x, idDelta=0x0001, idRangeOffset=0x

  subtable0 cmap fmt4 : code 0x0022 -> gid 0x0058(88) glyfLength=22
  subtable0 cmap fmt4 : code 0x0026 -> gid 0x0053(83 = 83 + 0) glyfLength=0
  subtable0 cmap fmt4 : code 0x0027 -> gid 0x0052(82 = 82 + 0) glyfLength=18
  subtable0 cmap fmt4 : code 0x002c -> gid 0x0051(81) glyfLength=20
  subtable0 cmap fmt4 : code 0x002e -> gid 0x0054(84 = 84 + 0) glyfLength=14
  subtable0 cmap fmt4 : code 0x002f -> gid 0x0056(86 = 86 + 0) glyfLength=16
  subtable0 cmap fmt4 : code 0x003b -> gid 0x0061(97 = 97 + 0) glyfLength=24
  subtable0 cmap fmt4 : code 0x003c -> gid 0x0060(96 = 96 + 0) glyfLength=22
  subtable0 cmap fmt4 : code 0x003e -> gid 0x005f(95) glyfLength=22
  subtable0 cmap fmt4 : code 0x005b -> gid 0x0059(89 = 89 + 0) glyfLength=20
  subtable0 cmap fmt4 : code 0x005c -> gid 0x0057(87 = 87 + 0) glyfLength=16
  subtable0 cmap fmt4 : code 0x005d -> gid 0x005a(90 = 90 + 0) glyfLength=20
  subtable0 cmap fmt4 : code 0x0060 -> gid 0x0055(85) glyfLength=14
  subtable0 cmap fmt4 : code 0x007b -> gid 0x005e(94 = 94 + 0) glyfLength=52
  subtable0 cmap fmt4 : code 0x007c -> gid 0x005d(93 = 93 + 0) glyfLength=14
  subtable0 cmap fmt4 : code 0x007d -> gid 0x005c(92 = 92 + 0) glyfLength=52
  subtable0 cmap fmt4 : code 0x007e -> gid 0x005b(91 = 91 + 0) glyfLength=0
  subtable0 cmap fmt4 : code 0x -> gid 0x(0) glyfLength=0
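
As a hedged illustration of how the dump above is read: for a format 4
segment whose idRangeOffset is zero, the glyph index is the character code
plus idDelta, modulo 65536. The sketch below is a minimal stand-alone C
version of that one rule. The Fmt4Segment struct and fmt4_delta_lookup()
are hypothetical names, not FreeType API, and segments with a non-zero
idRangeOffset (which go through glyphIdArray) are deliberately not handled.

```c
#include <stdint.h>

/* One segment of a format 4 cmap subtable.  The field names follow the
   dump above; the struct itself is a hypothetical illustration. */
typedef struct {
    uint16_t start_code;
    uint16_t end_code;
    uint16_t id_delta;
} Fmt4Segment;

/* Map a character code to a glyph index for a segment whose
   idRangeOffset is zero.  Returns 0 (the missing glyph) when the code
   lies outside the segment.  The cast makes the sum wrap modulo 65536. */
static uint16_t
fmt4_delta_lookup(const Fmt4Segment *seg, uint16_t code)
{
    if (code < seg->start_code || code > seg->end_code)
        return 0;
    return (uint16_t)(code + seg->id_delta);
}
```

With the values from the dump, code 0x0022 with idDelta 0x0036 yields
gid 0x0058, and code 0x0060 with idDelta 0xfff5 wraps around to gid 0x0055.
A code such as 0x0041 ('A') falls in none of the listed segments, so it
maps to glyph 0.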

In HUDfont.ttf, most alphabetic glyphs have glyphIndex < 81,
so this cmap cannot access them. One of the problems
is that the cmap is NOT broken from the viewpoint of
the data structure, so a simple validator
(one that does not compare the coverage of accessible glyphs
with the coverage of included glyphs) cannot reject it.

Your patch assumes that the Apple Unicode cmap is often
broken and that the others are more reliable, but I'm afraid
that this is not a safe general assumption.

>That's because FreeType selects the first Unicode cmap table, which happens
>to have platform id 0 (Apple Unicode), and that cmap table is incomplete
>(or truncated). The cmap tables for the other platforms (1 and 3) are good, and
>making FreeType ignore the cmap with platform id 0 makes the font display
>properly in Wine's notepad.
>
>Attached is a hack that makes FreeType ignore cmaps with platform id 0.
>
>What do the FreeType developers think about this problem?

Excuse me, do you think selecting the best cmap is
the role of FreeType? I think FreeType2 provides an
API for users to select a cmap subtable by the pair of
platformID & platform-specificID.
http://www.freetype.org/freetype2/docs/reference/ft2-base_interface.html

I think it's better for Wine to have internal priorities
of "preferred" cmaps and try to load from the best to
the worst. For example, consider UCS-4 capable fonts
(like SURSONG.TTF or SIMSUN-EXTB.TTF). Such fonts have
cmap subtables for Microsoft UCS2 and Microsoft UCS4.
Usually the Microsoft UCS4 cmap subtable appears after the MS
UCS2 cmap subtable. So, if we let FreeType choose the
cmap subtable automatically, we cannot reach the Microsoft
UCS4 cmap subtable, even if we ignore Apple Unicode.
From the viewpoint of compatibility with Microsoft products,
that's not a good idea.

So I think it's better for Wine to have internal priorities
of "preferred" cmap subtables and choose the best cmap subtable
by itself. What do you think?
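
A minimal sketch of the priority idea described above, assuming the
application keeps its own ordered list of (platform ID, encoding ID)
pairs. The CmapId struct and pick_preferred_cmap() are hypothetical
stand-ins so the example compiles without FreeType; they are not part of
any library.

```c
#include <stddef.h>

/* Hypothetical stand-in for the (platform_id, encoding_id) pair that
   identifies a cmap subtable; not a FreeType type. */
typedef struct {
    unsigned short platform_id;
    unsigned short encoding_id;
} CmapId;

/* Return the index in maps[] of the subtable whose ID pair appears
   earliest in prefs[], or -1 if none of the preferred pairs is present.
   The position of a subtable inside the font does not matter. */
static int
pick_preferred_cmap(const CmapId *maps, size_t num_maps,
                    const CmapId *prefs, size_t num_prefs)
{
    size_t p, m;

    for (p = 0; p < num_prefs; p++)
        for (m = 0; m < num_maps; m++)
            if (maps[m].platform_id == prefs[p].platform_id &&
                maps[m].encoding_id == prefs[p].encoding_id)
                return (int)m;
    return -1;
}
```

With a preference list of (3, 10), (3, 1), (0, 3), that is Microsoft UCS4,
then Microsoft UCS2, then Apple Unicode 2.0, a font that carries both
Microsoft subtables gets the UCS4 one even though it comes later in the
font. In a real application the pairs would presumably be read from
face->charmaps[ i ]->platform_id and ->encoding_id, and the chosen entry
passed to FT_Set_Charmap().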

Regards,
mpsuzuki





Re: [ft-devel] Incomplete cmap table for platform 0 (Apple Unicode)

2007-09-30 Thread Dmitry Timoshkov

<[EMAIL PROTECTED]> wrote:


> Your patch assumes that the Apple Unicode cmap is often
> broken and that the others are more reliable, but I'm afraid
> that this is not a safe general assumption.


That was really just a hack to show that the other cmap tables actually
work better for that font. Of course, in other fonts it may be the cmap
tables for platforms other than 0 that are "broken".


> Excuse me, do you think selecting the best cmap is
> the role of FreeType? I think FreeType2 provides an
> API for users to select a cmap subtable by the pair of
> platformID & platform-specificID.
> http://www.freetype.org/freetype2/docs/reference/ft2-base_interface.html
>
> I think it's better for Wine to have internal priorities
> of "preferred" cmaps and try to load from the best to
> the worst. For example, consider UCS-4 capable fonts
> (like SURSONG.TTF or SIMSUN-EXTB.TTF). Such fonts have
> cmap subtables for Microsoft UCS2 and Microsoft UCS4.
> Usually the Microsoft UCS4 cmap subtable appears after the MS
> UCS2 cmap subtable. So, if we let FreeType choose the
> cmap subtable automatically, we cannot reach the Microsoft
> UCS4 cmap subtable, even if we ignore Apple Unicode.
> From the viewpoint of compatibility with Microsoft products,
> that's not a good idea.
>
> So I think it's better for Wine to have internal priorities
> of "preferred" cmap subtables and choose the best cmap subtable
> by itself. What do you think?


Wine uses the FT_Select_Charmap API to select either FT_ENCODING_UNICODE,
FT_ENCODING_MS_SYMBOL or FT_ENCODING_APPLE_ROMAN as appropriate.
So it's actually FreeType's responsibility to choose the best/correct/
working charmap table in that case.

Yes, Wine can arrange some kind of cmap priorities, but what if one of the
"preferred" cmap tables is broken? How can an application decide which cmap
table is better without digging into internal cmap data? Shouldn't it be the
responsibility of FreeType to ignore incomplete/broken cmaps, especially since
it already parses cmap tables and can easily decide which one is better?

--
Dmitry.




Re: [ft-devel] Incomplete cmap table for platform 0 (Apple Unicode)

2007-09-30 Thread mpsuzuki
Dear Sir,

On Mon, 1 Oct 2007 11:52:19 +0900
"Dmitry Timoshkov" <[EMAIL PROTECTED]> wrote:
><[EMAIL PROTECTED]> wrote:
>> So I think it's better for Wine to have internal priorities
>> of "preferred" cmap subtables and choose the best cmap subtable
>> by itself. What do you think?
>
>Wine uses the FT_Select_Charmap API to select either FT_ENCODING_UNICODE,
>FT_ENCODING_MS_SYMBOL or FT_ENCODING_APPLE_ROMAN as appropriate.
>So it's actually FreeType's responsibility to choose the best/correct/
>working charmap table in that case.

I see, I misunderstood. Now I think your report
says that FT_ENCODING_UNICODE is too coarse to choose
the best cmap subtable for the Unicode encoding. I guess
that if you could explicitly specify the Microsoft platform
and a UCS2 or UCS4 cmap subtable, it would serve your purpose.
Am I understanding correctly? I think it's a reasonable
request.

>Yes, Wine can arrange some kind of cmap priorities, but what if one of the
>"preferred" cmap tables is broken? How can an application decide which cmap
>table is better without digging into internal cmap data? Shouldn't it be the
>responsibility of FreeType to ignore incomplete/broken cmaps, especially since
>it already parses cmap tables and can easily decide which one is better?

As I showed in my previous post, the Apple Unicode cmap in the
sample font is NOT broken from the viewpoint of data
structure, I think. To detect a "broken" cmap in your sense,
it is necessary to investigate the contents of the cmap, loca
and glyf tables semantically. Only after checking all of the cmap,
loca and glyf tables could the best cmap subtable be chosen.
I don't deny that such a feature would be convenient,
but I doubt whether it should be a built-in feature of
FreeType and enabled by default.
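
To make the cost of such a semantic check concrete, here is a rough
sketch of one possible measure: the number of character codes that a
cmap subtable maps to a glyph with actual outline data. Everything in it
is an assumption for illustration. count_drawable_codes() is a
hypothetical helper, and the caller is assumed to have already extracted
the mapped glyph indices from cmap and the per-glyph lengths from loca.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the character codes whose glyph has outline data.
   gids[i] is the glyph index that a cmap subtable assigns to the i-th
   character code; glyf_len[g] is the length of glyph g as derived from
   the loca table.  Both arrays are hypothetical inputs prepared by the
   caller.  Glyph 0 and out-of-range indices are not counted. */
static size_t
count_drawable_codes(const uint16_t *gids, size_t num_codes,
                     const uint32_t *glyf_len, size_t num_glyphs)
{
    size_t i, n = 0;

    for (i = 0; i < num_codes; i++)
        if (gids[i] != 0 && gids[i] < num_glyphs && glyf_len[gids[i]] > 0)
            n++;
    return n;
}
```

Comparing this count across subtables would only be a heuristic. A
zero-length glyph can be legitimate (a space, for example), and a small
count does not by itself prove that a subtable is wrong.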

Regards,
mpsuzuki




Re: [ft-devel] Incomplete cmap table for platform 0 (Apple Unicode)

2007-09-30 Thread George Williams
On Sun, 2007-09-30 at 20:16, [EMAIL PROTECTED] wrote:
> As I showed in my previous post, the Apple Unicode cmap in the
> sample font is NOT broken from the viewpoint of data
> structure, I think. To detect a "broken" cmap in your sense,
> it is necessary to investigate the contents of the cmap, loca
> and glyf tables semantically. Only after checking all of the cmap,
> loca and glyf tables could the best cmap subtable be chosen.
> I don't deny that such a feature would be convenient,
> but I doubt whether it should be a built-in feature of
> FreeType and enabled by default.

I'm not sure how you could even do it. It is perfectly valid to have
unencoded glyphs, desirable even. How on earth could you determine that
one subtable was "better"?

In ttc files many glyphs won't be encoded for a given font, nor will
they be the result of GSUB transformations.

I can't think of an algorithm that could produce reasonable results.





Re: [ft-devel] Incomplete cmap table for platform 0 (Apple Unicode)

2007-09-30 Thread Dmitry Timoshkov

"George Williams" <[EMAIL PROTECTED]> wrote:


> On Sun, 2007-09-30 at 20:16, [EMAIL PROTECTED] wrote:
>> As I've shown in previous post, Apple Unicode cmap in the
>> sample font is NOT broken from the viewpoint of data
>> structure, I think. To detect "broken" cmap as you say,
>> it's required to investigate cmap, loca, glyf subtables
>> content semantically. After checking the all cmap, loca,
>> glyf subtables, the best cmap subtable would be chosen.
>> I'm not against the fact that such feature is convenient,
>> but I'm questionable if it should be built-in feature of
>> FreeType and should be enabled by default.
>
> I'm not sure how you could even do it. It is perfectly valid to have
> unencoded glyphs, desirable even. How on earth could you determine that
> one subtable was "better"?


http://www.freetype.org/freetype2/docs/reference/ft2-base_interface.html#FT_Select_Charmap

says:
"Because many fonts contain more than a single cmap for Unicode encoding, this
function has some special code to select the one which covers Unicode best."

So it looks like either the documentation should be changed so that it no
longer claims that FT_Select_Charmap makes the best choice, or the behaviour
of FT_Select_Charmap should be changed to actually "select the one which
covers Unicode best".

--
Dmitry.




Re: [ft-devel] Incomplete cmap table for platform 0 (Apple Unicode)

2007-09-30 Thread mpsuzuki
Dear Dmitry,

The following is a function whose API is similar to
FT_Select_Charmap() but which ignores non-Microsoft
cmap subtables. Does it serve your purpose?


#include <ft2build.h>
#include FT_FREETYPE_H
#include FT_TRUETYPE_IDS_H

/* Select the first cmap subtable for the Microsoft platform that
 * matches the specified encoding.  If encoding is FT_ENCODING_UNICODE,
 * a UCS4 subtable is preferred over a UCS2 one.
 */
FT_Error
FT_Select_Charmap_Microsoft( FT_Face      face,
                             FT_Encoding  encoding )
{
    FT_Int  i, chosen_cmap_idx;


    chosen_cmap_idx = -1; /* -1 means not found */

    for ( i = 0; i < face->num_charmaps; i++ )
    {
        FT_CharMap  cmap = face->charmaps[ i ];

        if ( cmap->platform_id != TT_PLATFORM_MICROSOFT )
            continue;

        if ( encoding != FT_ENCODING_UNICODE )
        {
            /* non-Unicode request: take the first exact match */
            if ( cmap->encoding == encoding )
            {
                chosen_cmap_idx = i;
                break;
            }
        }
        else if ( cmap->encoding_id == TT_MS_ID_UCS_4 )
        {
            /* UCS4 is the best Unicode choice; stop searching */
            chosen_cmap_idx = i;
            break;
        }
        else if ( cmap->encoding_id == TT_MS_ID_UNICODE_CS )
        {
            /* remember the first UCS2 subtable as a fallback */
            if ( chosen_cmap_idx < 0 )
                chosen_cmap_idx = i;
        }
    }

    if ( chosen_cmap_idx < 0 )
        return FT_Err_Invalid_CharMap_Handle;

    return FT_Set_Charmap( face, face->charmaps[ chosen_cmap_idx ] );
}

