Hello All,

    I recently noticed that PoDoFo (svn rev 1642) was unable to parse
several older PDFs (all obtained from the USA IRS for tax years 2011 and
earlier). These PDFs were made with professional Adobe products, so I expect
them to be conformant.

    I narrowed down the revision of PoDoFo that introduced the failure, but I
have not analyzed the source code diff yet. These PDFs parsed without
error under PoDoFo svn rev 1586, but fail starting with rev 1587 (2014-04-01,
a change to PdfParser.cpp). Attempting to open any of them with
PoDoFo::PdfMemDocument() throws "ePdfError_NoNumber".

    I have a total of 6 IRS tax forms from various years that all fail to
open in PoDoFo (they all throw the same exception [2]), but for now I'll
focus on just one. This PDF [1] was created with "Adobe LiveCycle Designer
ES 8.2" on 2010-11-22 (it is the October 2010 revision of the 941 tax form).

    I suspect that the PDFs are conformant (an unproven hunch) and that
PoDoFo rev 1587 and later is buggy.

    Thoughts?  Analysis?
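    One possible starting point: frames #3 to #5 of the trace in [2] suggest
that the parser followed a /XRefStm trailer entry and, at the offset it
names, read the keyword "xref" where it expected an object header such as
"7 0 obj". The Python sketch below is not part of PoDoFo, and its script and
function names are hypothetical. It lists every /XRefStm offset in a file
and shows which bytes actually sit there. It assumes the offsets are
absolute from the start of the file and that the trailers are uncompressed
text, which holds for classic and hybrid-reference trailers but not for
entries inside compressed streams.

```python
from __future__ import annotations

import re
import sys

# Matches a trailer entry such as "/XRefStm 12345" and captures the offset.
XREFSTM_ENTRY = re.compile(rb"/XRefStm\s+(\d+)")
# Matches an indirect-object header such as "7 0 obj" (leading whitespace allowed).
OBJECT_HEADER = re.compile(rb"\s*\d+\s+\d+\s+obj\b")


def find_xrefstm_targets(data: bytes, context: int = 32) -> list[tuple[int, bytes]]:
    """Return (offset, first `context` bytes at that offset) for every /XRefStm entry."""
    targets = []
    for match in XREFSTM_ENTRY.finditer(data):
        offset = int(match.group(1))
        targets.append((offset, data[offset:offset + context]))
    return targets


def looks_like_object_header(snippet: bytes) -> bool:
    """True if the bytes start with 'N G obj', as an xref stream object should."""
    return OBJECT_HEADER.match(snippet) is not None


def report(path: str) -> None:
    """Print what lies at each /XRefStm offset in the PDF at `path`."""
    with open(path, "rb") as f:
        data = f.read()
    targets = find_xrefstm_targets(data)
    if not targets:
        print("no /XRefStm entries found in uncompressed text")
    for offset, snippet in targets:
        verdict = "object header" if looks_like_object_header(snippet) else "NOT an object header"
        print(f"/XRefStm {offset}: {verdict}: {snippet!r}")


# Hypothetical usage: python check_xrefstm.py f941--2010.pdf
if __name__ == "__main__" and len(sys.argv) == 2:
    report(sys.argv[1])
```

    If the output for f941--2010.pdf reports "NOT an object header" with
bytes beginning with "xref", that would be consistent with frame #5 of the
trace, and the question becomes whether the file's offset or PoDoFo's
handling of it is at fault.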


[1]   http://www.irs.gov/pub/irs-prior/f941--2010.pdf

[2]  The following stack trace is from PoDoFo rev 1587:
PoDoFo encounter an error. Error: 14 ePdfError_NoNumber
        Error Description: A number was expected but not found.
        Callstack:
        #0 Error Source: /tmp/podofo/src/src/base/PdfParser.cpp:226
                Information: Unable to load objects from file.
        #1 Error Source: /tmp/podofo/src/src/base/PdfParser.cpp:289
                Information: Unable to skip xref dictionary.
        #2 Error Source: /tmp/podofo/src/src/base/PdfParser.cpp:738
        #3 Error Source: /tmp/podofo/src/src/base/PdfParser.cpp:551
                Information: Unable to load /XRefStm xref stream.
        #4 Error Source: /tmp/podofo/src/src/base/PdfParserObject.cpp:109
                Information: Object and generation number cannot be read.
        #5 Error Source: /tmp/podofo/src/src/base/PdfTokenizer.cpp:365
                Information: xref
_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users