Re: IOException when merging PDF after increasing pushBackSize

Timo Boehme Wed, 05 Mar 2014 07:05:26 -0800

Hi,

please try the solution I proposed in my last email to this list (subject: Re: [jira] [Commented] (PDFBOX-1920) Buffer Error when trying to run node).

I wrote:

according to the error message the stream is not properly terminated
by the token 'endstream'. While it might be a broken PDF it could
also be a valid one but the sequential parser you are using might be
processing junk data within the PDF.


I would recommend using the non-sequential parser via
PDDocument.loadNonSeq() instead of PDDocument.load(). Since you are
using PDFMergerUtility, which currently does not provide an option
to choose the other parser, you could create your own class by copying
PDFMergerUtility and replacing the relevant calls (the scratchFile
parameter may be set to null or to a memory- or file-based instance of
RandomAccess).
You could file a JIRA feature request to add such an option to
PDFMergerUtility, preferably with a patch.
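
As a rough illustration of the suggestion above, here is a minimal sketch, assuming PDFBox 1.8.x on the classpath. Instead of copying PDFMergerUtility, it loads each source itself with PDDocument.loadNonSeq() (scratchFile set to null) and appends it with PDFMergerUtility.appendDocument(). The class name NonSeqMerge and the merge() helper are hypothetical, and this is only one possible way to swap in the other parser.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFMergerUtility;

// Hypothetical helper, not part of PDFBox: merges PDFs that are
// loaded with the non-sequential parser (PDFBox 1.8.x API assumed).
public class NonSeqMerge {

    public static void merge(List<File> sources, File target)
            throws IOException, COSVisitorException {
        PDFMergerUtility merger = new PDFMergerUtility();
        List<PDDocument> opened = new ArrayList<PDDocument>();
        PDDocument destination = new PDDocument();
        try {
            for (File source : sources) {
                // null scratchFile: parsed data is kept in memory
                PDDocument doc = PDDocument.loadNonSeq(source, null);
                opened.add(doc);
                merger.appendDocument(destination, doc);
            }
            destination.save(target.getAbsolutePath());
        } finally {
            // Sources stay open until the merged document is saved,
            // because the destination may still reference their streams.
            destination.close();
            for (PDDocument doc : opened) {
                doc.close();
            }
        }
    }
}
```

A call such as NonSeqMerge.merge(Arrays.asList(fileA, fileB), outFile) would then take the place of PDFMergerUtility.mergeDocuments().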

If the error still occurs, then the PDF is broken and cannot be read
by PDFBox (some more healing mechanisms might be added in version
2.0).



Best,
Timo

On 05.03.2014 15:21, James Carter wrote:

When attempting to merge the attached PDF with several other documents,
PDFBox throws the following exception: Could not push back 328764 bytes in
order to reparse stream. Try increasing push back buffer using system
property org.apache.pdfbox.baseParser.pushBackSize

The discussion on the JIRA ticket (PDFBOX-1920) mentioned that the PDF
is not well formed. Upon increasing the pushBackSize, the following
error is seen:

Exception in thread "main" java.io.IOException: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@45cb0cdc
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:609)
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:605)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:194)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1219)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1186)
at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:196)
at com.acme.MergePDF.runSmartService(MergePDF.java:52)
at com.acme.MergePDF.main(MergePDF.java:68)

Is this something that PDFBox could reasonably handle, or does the
ill-formed nature of the PDF put it outside of what PDFBox would support?

Thanks,
James



--

 Timo Boehme
 OntoChem GmbH
 H.-Damerow-Str. 4
 06120 Halle/Saale
 T: +49 345 4780474
 F: +49 345 4780471
 timo.boe...@ontochem.com

_____________________________________________________________________

 OntoChem GmbH
 Geschäftsführer: Dr. Lutz Weber
 Sitz: Halle / Saale
 Registergericht: Stendal
 Registernummer: HRB 215461
_____________________________________________________________________
