[ https://issues.apache.org/jira/browse/PDFBOX-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13542886#comment-13542886 ]
Øyvind Stegard commented on PDFBOX-1417:
----------------------------------------
PDFBox 1.7.1 no longer causes a stack overflow error, so I'd call it fixed.
The following exception is thrown, though:
[13-01-03 13:21:53.944] {resin-port-8983-41} Specified stream length 1559572 is wrong. Fall back to reading stream until 'endstream'.
[13-01-03 13:21:53.950] {resin-port-8983-41} Parsing Error, Skipping Object
org.apache.pdfbox.exceptions.WrappedIOException: Could not push back 1559572 bytes in order to reparse stream. Try increasing push back buffer using system property org.apache.pdfbox.baseParser.pushBackSize
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:546)
    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1090)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1055)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:123)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
Still, failing fast is a lot better, and it does not cause problems with Solr.
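
The exception message above suggests raising the push-back buffer through the system property org.apache.pdfbox.baseParser.pushBackSize. Below is a minimal sketch of setting it from Java before the document is loaded. It assumes the property is read when the parser is created, which I have not verified against the 1.7.1 source. The class and method names are hypothetical, and 2 MiB is an arbitrary value chosen only because it exceeds the 1559572 bytes reported in the log.

```java
// Hypothetical helper; only the property name comes from the exception message.
class PushBackSizeSketch {
    static final String PROP = "org.apache.pdfbox.baseParser.pushBackSize";

    /**
     * Raises the push-back size property to at least minBytes.
     * Returns the value in effect afterwards; a larger existing value is kept.
     */
    static int ensurePushBackSize(int minBytes) {
        // Integer.getInteger reads a system property and parses it as an int,
        // returning the default (0 here) when it is unset or not a number.
        int current = Integer.getInteger(PROP, 0);
        if (current < minBytes) {
            System.setProperty(PROP, Integer.toString(minBytes));
            current = minBytes;
        }
        return current;
    }

    public static void main(String[] args) {
        // Assumed value: 2 MiB, comfortably above the 1559572-byte stream.
        int size = ensurePushBackSize(2 * 1024 * 1024);
        System.out.println(PROP + "=" + size);
        // The PDFBox or Tika call that loads the document would follow here.
    }
}
```

The same setting can be passed on the JVM command line as -Dorg.apache.pdfbox.baseParser.pushBackSize=2097152, which is probably the easier route when PDFBox runs inside Solr and the calling code is not under your control.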
> StackOverflowError [COSDictionary.toString(COSDictionary.java:1418)]
> --------------------------------------------------------------------
>
> Key: PDFBOX-1417
> URL: https://issues.apache.org/jira/browse/PDFBOX-1417
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: Øyvind Stegard
> Attachments: haugesundsavis.pdf
>
>
> The attached PDF document causes PDFBox 1.6.0 to fail with StackOverflowError.
> Issue discovered with Solr 3.6.1 (using Tika->PDFBox to extract text from
> document).