Re: [Podofo-users] Problem parsing font CMAP for Type0 fonts and TrueType fonts.

Hugues LEFEBVRE Mon, 28 Sep 2015 02:01:38 -0700

Hi

I’ve checked the current trunk.


CMap parsing has been moved and should be OK now (the '<' is now '<='
for ranges, and the 'loop' variable is properly reset).
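
To illustrate the two fixes, here is a minimal standalone sketch. It is
not the actual PoDoFo code: BfRange, Section and BuildToUnicode are
hypothetical names invented for this example, and a real CMap parser
works on tokens rather than on pre-built structs. It only shows that the
last code of a range is inclusive and that the per-section counter
starts from zero for every section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical types for illustration only; this is not PoDoFo's API.
struct BfRange
{
    std::uint16_t first;     // first source code of the range
    std::uint16_t last;      // last source code of the range (inclusive)
    std::uint16_t dstStart;  // Unicode value mapped to 'first'
};

// One "N beginbfrange ... endbfrange" block of a ToUnicode CMap.
struct Section
{
    std::vector<BfRange> ranges;
};

std::map<std::uint16_t, std::uint16_t> BuildToUnicode( const std::vector<Section>& sections )
{
    std::map<std::uint16_t, std::uint16_t> table;

    for ( const Section& section : sections )
    {
        // The counter is reset for every section. If it kept its value
        // from the previous section, later sections would be cut short.
        std::size_t loop = 0;

        while ( loop < section.ranges.size() )
        {
            const BfRange& r = section.ranges[loop];
            assert( r.first <= r.last );

            // '<=' rather than '<': the last code belongs to the range.
            // A 32-bit counter avoids wrap-around when r.last is 0xFFFF.
            for ( std::uint32_t code = r.first; code <= r.last; ++code )
            {
                table[static_cast<std::uint16_t>( code )] =
                    static_cast<std::uint16_t>( r.dstStart + ( code - r.first ) );
            }
            ++loop;
        }
    }
    return table;
}
```

With one section holding the range 0x0001..0x0003 mapped from 0x0041,
and a second section holding the single code 0x0010, the table should
end up with four entries, including 0x0003 and 0x0010.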

It might be good to merge the code from PdfEncoding and PdfCMapEncoding
to avoid having it duplicated (both versions look fine now).


My TrueType problem, however, is not fixed.

I have a PDF which contains a TrueType font without the Encoding key in
the font dictionary but with a ToUnicode reference.

With the current code, the check if (pEncoding && pDescriptor) fails and
the font is not loaded.

Assigning ToUnicode to pEncoding when it is NULL makes the pointer
non-NULL if ToUnicode is present, and PdfEncodingObjectFactory will then
create a PdfCMapEncoding (hence the need to pass ToUnicode again as the
second argument).

Not really clean, but it works for me: the font is loaded and glyphs are
mapped correctly using the ToUnicode map.

In patch format:

|File: PdfFontFactory.cpp

289a291,292
>         if (!pEncoding)
>             pEncoding = pObject->GetIndirectKey( "ToUnicode" );
294c297
<                PdfEncodingObjectFactory::CreateEncoding( pEncoding );
---
>                PdfEncodingObjectFactory::CreateEncoding( pEncoding, pObject->GetIndirectKey( "ToUnicode" ) );
|

As a unified diff:

|File: PdfFontFactory.cpp

@@ -287,11 +288,13 @@
     {
         pDescriptor = pObject->GetIndirectKey( "FontDescriptor" );
         pEncoding   = pObject->GetIndirectKey( "Encoding" );
+        if (!pEncoding)
+            pEncoding = pObject->GetIndirectKey( "ToUnicode" );

         if ( pEncoding && pDescriptor ) // OC 18.08.2010: Avoid sigsegv
         {
            const PdfEncoding* const pPdfEncoding = 
-               PdfEncodingObjectFactory::CreateEncoding( pEncoding );
+               PdfEncodingObjectFactory::CreateEncoding( pEncoding, pObject->GetIndirectKey( "ToUnicode" ) );

            // OC 15.08.2010 BugFix: Parameter pObject added:
            pMetrics    = new PdfFontMetricsObject( pObject, pDescriptor, pPdfEncoding );
|


Regards,

Hugues


On 09/27/15 23:00, Palmer Zent wrote:

> Hi,
>
> It happens that I just committed a patch that addresses similar issues
> a few minutes ago. Would you mind checking the current trunk to see if
> your approach is the same as mine?
>
> You can email your patches directly to this list.
>
> Merci
>
> -- 
> Palmer Zent
>
> On September 27, 2015 at 10:14:37 AM, Hugues LEFEBVRE
> ([email protected] <mailto:[email protected]>) wrote:
>
>>
>> Hi,
>>
>> I was using PoDoFo to extract text from a PDF.
>> When trying to get the Unicode characters from glyphs (i.e. for the Tj
>> command), it was not working in some cases.
>>
>> When a TrueType font has no Encoding but a ToUnicode map, then it's
>> not read.
>> The ToUnicode CMap parser (from PdfIdentityEncoding and PdfCMapEncoding)
>> also has some bugs (like the value of the
>> loop variable not being reset between sections), so only partial
>> information is loaded from the CMap.
>>
>> I've fixed these points and now I'm able to get all the text in the PDF
>> file with PoDoFo.
>>
>> I'm new to PoDoFo and I don't know how to submit a patch for these
>> corrections (if there is a way), in case it helps other people having the
>> same problems.
>> Meanwhile, or if patches are not accepted/reviewed, people having the same
>> issue can ask me for the patch.
>>
>>
>> Regards,
>>
>> Hugues
>>
>>
>> ------------------------------------------------------------------------------
>>
>> _______________________________________________
>> Podofo-users mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/podofo-users

