I'm now trying to use the FreeType API to access the font's cmap table. To
create a font face, I need either a font file name or a buffer
containing the font data. How do I get hold of one of these?
If I understood correctly, the font file is embedded in the PDF, but how do
I get to it?
In PoDoFo, only PdfFontMetricsFreetype seems to create new font faces, and
it does so from the font filename or font data buffer passed in as
constructor arguments. Since I don't have the filename, I guess that I
need an in-memory buffer of the font data, but PdfFontMetricsFreetype gets
that from the GetWin32Font function, which seems to deal only with fonts
known to Windows. My font name is something like "TT1.1" and the BaseFont
value is "KAIXMV+Calibri-Bold", so GetWin32Font doesn't return anything.
Can anyone help me out with this? I'm completely stuck. I just can't figure
out how to create a FreeType font face from the data in the PDF using
PoDoFo.
Filip
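
For reference: in PDF, an embedded TrueType font program is normally stored
in the FontFile2 stream of the FontDescriptor dictionary, and FreeType can
open such a decoded buffer with FT_New_Memory_Face. The sketch below leaves
both libraries out and only shows what sits inside that buffer: it walks
the table directory, finds the "cmap" table and lists its (platform ID,
encoding ID) subtables. It assumes the decoded stream is already in a
std::vector and is a plain single-font file, not a TrueType Collection.
ListCmapSubtables, CmapSubtable, ReadU16 and ReadU32 are hypothetical names
made up for this example, not PoDoFo or FreeType API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// One encoding record from the header of the "cmap" table.
struct CmapSubtable {
    std::uint16_t platformId;
    std::uint16_t encodingId;
    std::uint32_t offset;  // relative to the start of the "cmap" table
};

// Bounds-checked big-endian readers (TrueType data is big-endian).
static std::uint16_t ReadU16(const std::vector<unsigned char>& d, std::size_t pos) {
    if (d.size() < 2 || pos > d.size() - 2) throw std::out_of_range("truncated font data");
    return static_cast<std::uint16_t>((d[pos] << 8) | d[pos + 1]);
}

static std::uint32_t ReadU32(const std::vector<unsigned char>& d, std::size_t pos) {
    if (d.size() < 4 || pos > d.size() - 4) throw std::out_of_range("truncated font data");
    return (static_cast<std::uint32_t>(d[pos]) << 24) |
           (static_cast<std::uint32_t>(d[pos + 1]) << 16) |
           (static_cast<std::uint32_t>(d[pos + 2]) << 8) |
           static_cast<std::uint32_t>(d[pos + 3]);
}

// Returns the encoding records of the "cmap" table, or an empty vector if
// the font has no "cmap" table. Throws std::out_of_range on truncated data.
std::vector<CmapSubtable> ListCmapSubtables(const std::vector<unsigned char>& font) {
    // Offset table: 4-byte version, then numTables at byte 4; 12 bytes total.
    const std::uint16_t numTables = ReadU16(font, 4);
    for (std::uint16_t i = 0; i < numTables; ++i) {
        // Each table record is 16 bytes: tag, checksum, offset, length.
        const std::size_t rec = 12 + 16 * static_cast<std::size_t>(i);
        if (font.size() < 16 || rec > font.size() - 16)
            throw std::out_of_range("truncated table directory");
        if (std::memcmp(&font[rec], "cmap", 4) != 0) continue;

        const std::size_t cmap = ReadU32(font, rec + 8);
        // "cmap" header: version (2 bytes), number of encoding records (2 bytes).
        const std::uint16_t count = ReadU16(font, cmap + 2);
        std::vector<CmapSubtable> out;
        for (std::uint16_t j = 0; j < count; ++j) {
            const std::size_t e = cmap + 4 + 8 * static_cast<std::size_t>(j);
            out.push_back(CmapSubtable{ReadU16(font, e), ReadU16(font, e + 2),
                                       ReadU32(font, e + 4)});
        }
        return out;
    }
    return {};
}
```

If the returned list contains a record with platformId 3 and encodingId 0,
or platformId 1 and encodingId 0, that is the subtable the rule quoted
further down in this thread refers to.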
On Wed, Jul 31, 2013 at 7:25 PM, Filip Djumic <theprop...@gmail.com> wrote:
> Thank you for your reply.
>
> If I understood correctly, I need to use the currently unused FT_Library*
> parameter of the PdfFontFactory::CreateFont function to access the FreeType
> API for that font.
> The FreeType API should then provide me with all the data needed to encode
> and extract the text in this font.
> Is this a correct outline of how it should be done?
>
> F.
>
>
> On Thu, Jul 18, 2013 at 3:08 AM, Leonard Rosenthol <lrose...@adobe.com> wrote:
>
>> You need to dig into the font data/format itself. Since you have access
>> to FreeType, you should be able to use its public APIs to get what you
>> need.
>>
>> Leonard
>>
>> From: Filip Djumic <theprop...@gmail.com>
>> Date: Wednesday, July 17, 2013 9:02 PM
>> To: "podofo-users@lists.sourceforge.net" <
>> podofo-users@lists.sourceforge.net>
>> Subject: [Podofo-users] Text extraction for TrueType fonts without
>> encoding entry
>>
>> I'm trying to extract the plain text with PoDoFo from a PDF that uses a
>> TrueType font; I attached a sample document. The font dictionary has no
>> Encoding entry. Here is an excerpt from Adobe's PDF ISO document about
>> this case:
>> *
>> A TrueType font program’s built-in encoding maps directly from character
>> codes to glyph descriptions by means
>> of an internal data structure called a “cmap” (not to be confused with
>> the CMap described in 9.7.5, "CMaps").
>> ...
>> A “cmap” table may contain one or more subtables that represent multiple
>> encodings intended for use on
>> different platforms (such as Mac OS and Windows). Each subtable shall be
>> identified by the two numbers, such
>> as (3, 1), that represent a combination of a platform ID and a
>> platform-specific encoding ID, respectively. *
>> ...
>> *When the font has no Encoding entry, or the font descriptor’s Symbolic
>> flag is set (in which case the Encoding
>> entry is ignored), this shall occur:
>> • If the font contains a (3, 0) subtable, the range of character codes
>> shall be one of these: 0x0000 - 0x00FF,
>> 0xF000 - 0xF0FF, 0xF100 - 0xF1FF, or 0xF200 - 0xF2FF. Depending on the
>> range of codes, each byte
>> from the string shall be prepended with the high byte of the range, to
>> form a two-byte character, which shall
>> be used to select the associated glyph description from the subtable.
>> • Otherwise, if the font contains a (1, 0) subtable, single bytes from
>> the string shall be used to look up the
>> associated glyph descriptions from the subtable.*
>>
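
The two rules quoted above can be sketched as follows. This is only an
illustration under assumptions: the codes present in the (3, 0) subtable
are taken as already extracted into a vector, and all of them are expected
to lie in a single one of the four ranges. SymbolicHighByte and
ToSubtableCodes are hypothetical helper names, not PoDoFo or FreeType API.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Determine the high byte implied by the codes found in a (3, 0) subtable.
// The allowed ranges are 0x0000-0x00FF, 0xF000-0xF0FF, 0xF100-0xF1FF and
// 0xF200-0xF2FF, so the high byte must be 0x00, 0xF0, 0xF1 or 0xF2.
// Returns -1 if the list is empty or the codes do not share one such range.
int SymbolicHighByte(const std::vector<std::uint16_t>& subtableCodes) {
    if (subtableCodes.empty()) return -1;
    const int hi = subtableCodes.front() >> 8;
    if (hi != 0x00 && hi != 0xF0 && hi != 0xF1 && hi != 0xF2) return -1;
    for (std::uint16_t code : subtableCodes) {
        if ((code >> 8) != hi) return -1;
    }
    return hi;
}

// Turn each byte of a PDF string into the two-byte code used for the
// subtable lookup by prepending the high byte of the range. For a (1, 0)
// subtable pass highByte = 0, so the single bytes are used unchanged.
std::vector<std::uint16_t> ToSubtableCodes(const std::string& pdfString, int highByte) {
    std::vector<std::uint16_t> out;
    out.reserve(pdfString.size());
    for (unsigned char b : pdfString) {
        out.push_back(static_cast<std::uint16_t>((highByte << 8) | b));
    }
    return out;
}
```

With a subtable whose codes sit in 0xF000 - 0xF0FF, the string bytes 33 and
59 (the FirstChar and LastChar values of the sample font) would be looked up
as 0xF021 and 0xF03B.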
>> The PdfFontFactory::CreateFont method does not handle this case, since
>> both a font descriptor and an encoding are required to create a TrueType
>> font. I would like to try doing this myself, but I'm not sure where to
>> start. Obviously I need to get to the cmap table somehow first, but I have
>> no idea how. In the attached PDF, each text block's font dictionary has
>> these entries:
>>
>> BaseFont=KAIXMV+Calibri-Bold
>> FirstChar=33
>> FontDescriptor dictionary
>> LastChar=59
>> Subtype=TrueType
>> ToUnicode dictionary
>> Type=Font
>> Widths array
>>
>> ToUnicode dictionary has these entries:
>> Filter=FlateDecode
>> Length reference
>>
>> The cmap doesn't seem to be there, and the PDF ISO document doesn't
>> provide any useful details. Does anyone have any hints on this?
>>
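
A note on the ToUnicode entry listed above: it is a stream (Filter and
Length are its stream dictionary keys), and once the FlateDecode filter is
undone the content is a CMap that maps character codes to Unicode values.
For plain text extraction that mapping may be all that is needed. Below is
a minimal sketch of reading it, assuming the stream has already been
decoded into a std::string, that tokens are separated by whitespace, and
that every destination is a single value of at most 8 hex digits. The array
form of bfrange is not handled. ParseHexToken and ParseToUnicode are
hypothetical names, not PoDoFo API.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Parse a token such as "<0041>" into an integer. Returns false if the
// token is not a hex string in angle brackets.
static bool ParseHexToken(const std::string& tok, std::uint32_t& value) {
    if (tok.size() < 3 || tok.size() > 10 || tok.front() != '<' || tok.back() != '>')
        return false;
    value = 0;
    for (std::size_t i = 1; i + 1 < tok.size(); ++i) {
        const char c = tok[i];
        int digit;
        if (c >= '0' && c <= '9') digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return false;
        value = (value << 4) | static_cast<std::uint32_t>(digit);
    }
    return true;
}

// Collect "character code -> Unicode value" pairs from the bfchar and
// simple bfrange sections of a decoded ToUnicode CMap.
std::map<std::uint32_t, std::uint32_t> ParseToUnicode(const std::string& cmapText) {
    enum Mode { kNone, kBfChar, kBfRange };
    std::map<std::uint32_t, std::uint32_t> out;
    std::istringstream in(cmapText);
    std::string tok;
    Mode mode = kNone;
    while (in >> tok) {
        if (tok == "beginbfchar") { mode = kBfChar; continue; }
        if (tok == "beginbfrange") { mode = kBfRange; continue; }
        if (tok == "endbfchar" || tok == "endbfrange") { mode = kNone; continue; }
        if (mode == kNone) continue;

        std::uint32_t first = 0, second = 0, dst = 0;
        if (mode == kBfChar) {
            // Entry form: <srcCode> <dstUnicode>
            std::string dstTok;
            if (ParseHexToken(tok, first) && (in >> dstTok) && ParseHexToken(dstTok, dst))
                out[first] = dst;
        } else {
            // Entry form: <srcLow> <srcHigh> <dstStart>
            std::string hiTok, dstTok;
            if (ParseHexToken(tok, first) && (in >> hiTok >> dstTok) &&
                ParseHexToken(hiTok, second) && ParseHexToken(dstTok, dst) &&
                first <= second && second - first <= 0xFFFF) {
                for (std::uint32_t code = first; code <= second; ++code)
                    out[code] = dst + (code - first);
            }
        }
    }
    return out;
}
```

The resulting map can then be applied to each byte of the strings shown by
the text operators in the content stream.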
>
>