Thanks for the suggestion.
The problem is that I can't get the stream of the PdfFont (obtained via
PdfDocument->GetFont(), as in the TextExtractor example): GetStream() doesn't
return anything, and PdfFont->GetObject()->HasStream() returns false.
The same seems to be true of objects fetched with
PdfPage->GetFromResources().
Is there something I'm missing here?
I apologize if the solution is trivial, but I wasn't able to find it.
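
A possible explanation, offered as a sketch rather than a confirmed answer: a
font dictionary is a plain dictionary with no stream of its own, which would
match HasStream() returning false. Per the PDF specification, an embedded
TrueType font program is a stream referenced by the FontFile2 entry of the
FontDescriptor dictionary, so that is the object whose stream would need to be
decoded. Once the decoded bytes are in memory, the cmap subtables can be listed
even before involving FreeType. The stdlib-only C++ below illustrates that
step; ListCmapSubtables and the two read helpers are hypothetical names and are
not part of podofo or FreeType.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// sfnt (TrueType) files store every multi-byte value in big-endian order.
static std::uint16_t ReadU16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

static std::uint32_t ReadU32(const std::uint8_t* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
            static_cast<std::uint32_t>(p[3]);
}

// Hypothetical helper: returns the (platformID, encodingID) pair of every
// cmap subtable found in an in-memory TrueType font program. Returns an
// empty vector if the buffer is truncated or has no cmap table.
std::vector<std::pair<int, int>> ListCmapSubtables(const std::uint8_t* data,
                                                   std::size_t size) {
    std::vector<std::pair<int, int>> result;
    if (size < 12) return result;                 // 12-byte offset table
    const std::size_t numTables = ReadU16(data + 4);

    for (std::size_t i = 0; i < numTables; ++i) {
        const std::size_t rec = 12 + i * 16;      // 16-byte table records
        if (rec + 16 > size) return result;
        if (std::memcmp(data + rec, "cmap", 4) != 0) continue;

        const std::size_t off = ReadU32(data + rec + 8);  // table offset
        if (off > size || size - off < 4) return result;
        const std::size_t count = ReadU16(data + off + 2);

        for (std::size_t j = 0; j < count; ++j) {
            const std::size_t enc = off + 4 + j * 8;      // 8-byte records
            if (enc + 8 > size) break;
            result.emplace_back(ReadU16(data + enc), ReadU16(data + enc + 2));
        }
        return result;
    }
    return result;
}
```

The same buffer could presumably also be passed to FreeType's
FT_New_Memory_Face to create a face, assuming the embedded (possibly subsetted)
font program is complete enough for FreeType to open.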


On Thu, Sep 5, 2013 at 8:11 AM, Dominik Seichter <domseich...@googlemail.com> wrote:

> See PdfFont::GetObject()->GetStream()->GetData() and similar methods
>
> Cheers
> On 05.09.2013 at 01:34, "Filip Djumic" <theprop...@gmail.com> wrote:
>
>> I'm now trying to use the FreeType API to access the font's cmap table. To
>> create a font face, I need either a font file name or a buffer
>> containing the font data. How do I get hold of one of these?
>> If I understood correctly, the font file is embedded in the PDF, but how do
>> I get to it?
>> In podofo, only PdfFontMetricsFreetype seems to create new font faces,
>> and it does so using the font filename or font data buffer
>> passed as constructor arguments. Since I don't have the filename, I guess
>> that I need an in-memory buffer of the font data, but
>> PdfFontMetricsFreetype gets that from the GetWin32Font function, which seems
>> to deal only with fonts known to Windows. My font name is something like
>> "TT1.1" and the BaseFont value is "KAIXMV+Calibri-Bold", so GetWin32Font
>> doesn't return anything.
>> Can anyone help me out with this? I'm completely stuck. I just can't
>> figure out how to create a FreeType font face using data from the PDF
>> and podofo.
>>
>> Filip
>>
>>
>> On Wed, Jul 31, 2013 at 7:25 PM, Filip Djumic <theprop...@gmail.com> wrote:
>>
>>> Thank you for your reply.
>>>
>>> If I understood correctly, I need to use the currently unused
>>> FT_Library* parameter of the PdfFontFactory::CreateFont function to access
>>> the FreeType API for that font.
>>> The FreeType API should then provide me with all the data needed to encode
>>> and extract the text in this font.
>>> Is this a correct outline of how it should be done?
>>>
>>> F.
>>>
>>>
>>> On Thu, Jul 18, 2013 at 3:08 AM, Leonard Rosenthol
>>> <lrose...@adobe.com> wrote:
>>>
>>>> You need to dig into the font data/format itself. Since you have
>>>> access to FreeType, you should be able to use its public APIs to get what
>>>> you need.
>>>>
>>>> Leonard
>>>>
>>>> From: Filip Djumic <theprop...@gmail.com>
>>>> Date: Wednesday, July 17, 2013 9:02 PM
>>>> To: "podofo-users@lists.sourceforge.net" <
>>>> podofo-users@lists.sourceforge.net>
>>>> Subject: [Podofo-users] Text extraction for TrueType fonts without
>>>> encoding entry
>>>>
>>>> I'm trying to extract the plain text with podofo from a PDF that uses
>>>> a TrueType font; I attached a sample document. The font dictionary has no
>>>> Encoding entry. Here is an excerpt from Adobe's PDF ISO document about this
>>>> case:
>>>> *
>>>> A TrueType font program’s built-in encoding maps directly from
>>>> character codes to glyph descriptions by means
>>>> of an internal data structure called a “cmap” (not to be confused with
>>>> the CMap described in 9.7.5, "CMaps").
>>>> ...
>>>> A “cmap” table may contain one or more subtables that represent
>>>> multiple encodings intended for use on
>>>> different platforms (such as Mac OS and Windows). Each subtable shall
>>>> be identified by the two numbers, such
>>>> as (3, 1), that represent a combination of a platform ID and a
>>>> platform-specific encoding ID, respectively. *
>>>> ...
>>>> *When the font has no Encoding entry, or the font descriptor’s
>>>> Symbolic flag is set (in which case the Encoding
>>>> entry is ignored), this shall occur:
>>>> • If the font contains a (3, 0) subtable, the range of character codes
>>>> shall be one of these: 0x0000 - 0x00FF,
>>>> 0xF000 - 0xF0FF, 0xF100 - 0xF1FF, or 0xF200 - 0xF2FF. Depending on the
>>>> range of codes, each byte
>>>> from the string shall be prepended with the high byte of the range, to
>>>> form a two-byte character, which shall
>>>> be used to select the associated glyph description from the subtable.
>>>> • Otherwise, if the font contains a (1, 0) subtable, single bytes from
>>>> the string shall be used to look up the
>>>> associated glyph descriptions from the subtable.*
>>>>
>>>> The PdfFontFactory::CreateFont method does not handle this case, since
>>>> both a font descriptor and an encoding are required to create a TrueType
>>>> font. I would like to try doing this myself, but I'm not sure where to
>>>> start. Obviously I need to get to the cmap table somehow first, but I have
>>>> no idea how. In the attached PDF, each text block's font dictionary has
>>>> these entries:
>>>>
>>>> BaseFont=KAIXMV+Calibri-Bold
>>>> FirstChar=33
>>>> FontDescriptor dictionary
>>>> LastChar=59
>>>> Subtype=TrueType
>>>> ToUnicode dictionary
>>>> Type=Font
>>>> Widths array
>>>>
>>>> ToUnicode dictionary has these entries:
>>>> Filter=FlateDecode
>>>> Length reference
>>>>
>>>> The cmap doesn't seem to be there, and the PDF ISO document doesn't
>>>> provide any useful details. Does anyone have any hints on this?
>>>>
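
The (3, 0) rule in the specification excerpt quoted above can be sketched in a
few lines of C++. This is only an illustration under stated assumptions:
ToSymbolCmapCodes is a hypothetical name, and the caller is assumed to have
already determined which of the four ranges (high byte 0x00, 0xF0, 0xF1 or
0xF2) the font's (3, 0) subtable uses, which this sketch does not do.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: turns each single-byte character code from a PDF
// string into the two-byte code used to look up a glyph in a (3, 0) cmap
// subtable, by prepending the high byte of the subtable's code range.
std::vector<std::uint16_t> ToSymbolCmapCodes(const std::string& bytes,
                                             std::uint8_t highByte) {
    std::vector<std::uint16_t> codes;
    codes.reserve(bytes.size());
    for (unsigned char b : bytes) {
        // highByte becomes the upper 8 bits, the string byte the lower 8.
        codes.push_back(static_cast<std::uint16_t>((highByte << 8) | b));
    }
    return codes;
}
```

With the 0xF000 - 0xF0FF range, the FirstChar (33) and LastChar (59) codes of
the font above would become 0xF021 and 0xF03B. For a (1, 0) subtable the
quoted rule uses the single bytes unchanged, so no such step is needed.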
>>>
>>>
>>
>>
>> ------------------------------------------------------------------------------
>> Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more!
>> Discover the easy way to master current and previous Microsoft
>> technologies
>> and advance your career. Get an incredible 1,500+ hours of step-by-step
>> tutorial videos with LearnDevNow. Subscribe today and save!
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=58041391&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Podofo-users mailing list
>> Podofo-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/podofo-users
>>
>>