Hi,
On 13.06.2012 14:29, Dave Smith wrote:
Bug
https://issues.apache.org/jira/browse/PDFBOX-1067
As I see it, this bug has nothing to do with PDFBOX-1067 but relates to
PDFBOX-1099. The PDF in question was changed, and we have 2 XREF tables
and 2 object streams. The pages object (object number 2) is in both
streams (in the first with 1 page, in the second with 2 pages). The first
stream is parsed first and the second after it, and objects that already
exist are skipped, which is wrong in this case. For correct handling, the
XREF information must be used.
However, there is a workaround: use NonSequentialPDFParser. Load your
document with PDDocument.loadNonSeq() and you should be fine.
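
To illustrate why skipping existing objects gives the wrong page count here,
the following is a minimal sketch of the two merge strategies. It is a
simplified model and not PDFBox code: the class RevisionMergeSketch, its
methods and the string contents are hypothetical, and a real parser resolves
objects through the XREF entries rather than by simply overwriting a map.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RevisionMergeSketch {

    /** Hypothetical stand-in for one object stream: object number -> content. */
    static Map<Integer, String> stream(int objNr, String content) {
        Map<Integer, String> objects = new LinkedHashMap<Integer, String>();
        objects.put(objNr, content);
        return objects;
    }

    /** The two revisions described above: object 2 with 1 page, then with 2 pages. */
    static List<Map<Integer, String>> example() {
        return Arrays.asList(
                stream(2, "/Pages /Count 1"),
                stream(2, "/Pages /Count 2"));
    }

    /** Streams are read in file order; an object number already seen is skipped. */
    static Map<Integer, String> firstWins(List<Map<Integer, String>> streams) {
        Map<Integer, String> result = new LinkedHashMap<Integer, String>();
        for (Map<Integer, String> s : streams) {
            for (Map.Entry<Integer, String> e : s.entrySet()) {
                if (!result.containsKey(e.getKey())) {
                    result.put(e.getKey(), e.getValue());
                }
            }
        }
        return result;
    }

    /** A later definition replaces an earlier one, as the newest XREF data would dictate. */
    static Map<Integer, String> latestWins(List<Map<Integer, String>> streams) {
        Map<Integer, String> result = new LinkedHashMap<Integer, String>();
        for (Map<Integer, String> s : streams) {
            result.putAll(s);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("skip existing: " + firstWins(example()).get(2));
        System.out.println("latest wins:   " + latestWins(example()).get(2));
    }
}
```

In this model the "skip existing" strategy keeps the stale pages object with
1 page, while the "latest wins" strategy ends up with the 2-page version.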
Best regards,
Timo
On Wed, Jun 13, 2012 at 8:02 AM,<[email protected]> wrote:
Sorry,
apparently the PDF was not correctly attached to the previous mail. I
have zipped it and re-attached it.
Pierre Huttin
On Wed, 13 Jun 2012 13:56:50 +0200,<[email protected]> wrote:
Hello,
I have some trouble with documents for which the library is not able to
retrieve the number of pages and load them into the list using the
PDDocument.getDocumentCatalog().getAllPages() method.
The PDF file and the Java code to retrieve the number of pages are
attached to this mail. It looks like the PDFParser does not read the
/Pages object correctly; the page references are "8 0" and "19 0".
I can open the document correctly with Adobe Reader and iText RUPS; both
retrieve the correct number of pages: 2.
I ran my code using version 1.7.0 of PDFBox.
Thanks in advance for your help.
Best regards
Pierre Huttin
--
Timo Boehme
OntoChem GmbH
H.-Damerow-Str. 4
06120 Halle/Saale
T: +49 345 4780474
F: +49 345 4780471
[email protected]
_____________________________________________________________________
OntoChem GmbH
Geschäftsführer: Dr. Lutz Weber
Sitz: Halle / Saale
Registergericht: Stendal
Registernummer: HRB 215461
_____________________________________________________________________