Re: Problem to parse a PDF document

Timo Boehme Wed, 13 Jun 2012 07:30:11 -0700

Hi,

On 13.06.2012 14:29, Dave Smith wrote:

Bug
https://issues.apache.org/jira/browse/PDFBOX-1067

As I see it, this bug has nothing to do with PDFBOX-1067 but relates to PDFBOX-1099. The PDF in question was modified, so it has 2 XREF tables and 2 object streams. The pages object (object number 2) is in both streams (first with 1 page, second with 2 pages). The first stream is parsed first and the second after it, and objects that already exist are skipped, which is wrong in this case. For correct handling, the XREF information must be used.
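The effect described above can be reproduced with a small, self-contained sketch. It does not use PDFBox and is not the parser's actual code: the class name, the method names, the string stand-ins for objects, and the placement of objects 8 and 19 in particular streams are all assumptions made for illustration. It only contrasts "keep the first copy seen" with "take the copy the XREF table points to".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: each object stream maps an object number to a
// string that stands in for the parsed object.
public class ObjectStreamMerge {

    // Faulty strategy: walk the streams in file order and keep the first
    // copy of every object, skipping any object number already present.
    static Map<Integer, String> mergeSkipExisting(List<Map<Integer, String>> streams) {
        Map<Integer, String> pool = new LinkedHashMap<Integer, String>();
        for (Map<Integer, String> stream : streams) {
            for (Map.Entry<Integer, String> e : stream.entrySet()) {
                if (!pool.containsKey(e.getKey())) {
                    pool.put(e.getKey(), e.getValue());
                }
            }
        }
        return pool;
    }

    // XREF-driven strategy: the (assumed) xref map names, per object number,
    // the index of the stream that holds the current version.
    static Map<Integer, String> resolveViaXref(List<Map<Integer, String>> streams,
                                               Map<Integer, Integer> xref) {
        Map<Integer, String> pool = new LinkedHashMap<Integer, String>();
        for (Map.Entry<Integer, Integer> e : xref.entrySet()) {
            pool.put(e.getKey(), streams.get(e.getValue()).get(e.getKey()));
        }
        return pool;
    }

    // Illustrative data: object 2 (the pages object) exists in both streams.
    static List<Map<Integer, String>> sampleStreams() {
        Map<Integer, String> first = new LinkedHashMap<Integer, String>();
        first.put(2, "Pages /Count 1");
        first.put(8, "Page");
        Map<Integer, String> second = new LinkedHashMap<Integer, String>();
        second.put(2, "Pages /Count 2");
        second.put(19, "Page");
        return Arrays.asList(first, second);
    }

    // Illustrative XREF information: object 2 is current in the second stream.
    static Map<Integer, Integer> sampleXref() {
        Map<Integer, Integer> xref = new LinkedHashMap<Integer, Integer>();
        xref.put(2, 1);
        xref.put(8, 0);
        xref.put(19, 1);
        return xref;
    }

    public static void main(String[] args) {
        List<Map<Integer, String>> streams = sampleStreams();
        System.out.println("skip existing: " + mergeSkipExisting(streams).get(2));
        System.out.println("via xref:      " + resolveViaXref(streams, sampleXref()).get(2));
    }
}
```

With this sample data the first strategy keeps the stale one-page version of object 2, while the XREF-driven lookup returns the two-page version.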

However, there is a workaround: use NonSequentialPDFParser. Load your document with PDDocument.loadNonSeq() and you are fine.



Best regards,
Timo

On Wed, Jun 13, 2012 at 8:02 AM, <[email protected]> wrote:

Sorry,

Apparently the PDF was not correctly attached to the previous mail; I have
just zipped it and re-attached it.

Pierre Huttin

On Wed, 13 Jun 2012 13:56:50 +0200, <[email protected]> wrote:

Hello,

I have some trouble with documents: the library is not able to
retrieve the number of pages and load them into the list using the
PDDocument.getDocumentCatalog().getAllPages() method.

The PDF file and the Java code to retrieve the number of pages are
attached to this mail. It looks like the PDFParser does not read the
/Pages object correctly; the page references are "8 0" and "19 0".

I can open the document correctly with Adobe Reader and iText RUPS; both
retrieve the correct number of pages: 2.

I ran my code using version 1.7.0 of PDFBox.

Thanks in advance for your help.

Best regards

Pierre Huttin



--

 Timo Boehme
 OntoChem GmbH
 H.-Damerow-Str. 4
 06120 Halle/Saale
 T: +49 345 4780474
 F: +49 345 4780471
 [email protected]

_____________________________________________________________________

 OntoChem GmbH
 Geschäftsführer: Dr. Lutz Weber
 Sitz: Halle / Saale
 Registergericht: Stendal
 Registernummer: HRB 215461
_____________________________________________________________________
