[ 
https://issues.apache.org/jira/browse/PDFBOX-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13542886#comment-13542886
 ] 

Øyvind Stegard commented on PDFBOX-1417:
----------------------------------------

PDFBox 1.7.1 no longer causes a stack overflow error, so I'd call it fixed.

The following exception is thrown, though:
[13-01-03 13:21:53.944] {resin-port-8983-41} Specified stream length 1559572 is wrong. Fall back to reading stream until 'endstream'.
[13-01-03 13:21:53.950] {resin-port-8983-41} Parsing Error, Skipping Object
org.apache.pdfbox.exceptions.WrappedIOException: Could not push back 1559572 bytes in order to reparse stream. Try increasing push back buffer using system property org.apache.pdfbox.baseParser.pushBackSize
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:546)
    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1090)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1055)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:123)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)


Still, failing fast is a lot better than a stack overflow, and it does not cause problems with Solr.
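
For anyone hitting the same exception, here is a minimal sketch of following the hint in the message by raising the push back buffer via the system property it names. The property name and the stream length 1559572 come from the log above. The class name, the helper method and the 64 KiB of headroom are hypothetical choices for illustration. I have not verified when PDFBox reads the property, so the sketch assumes it has to be set before any PDFBox parsing takes place.

```java
public class PushBackSizeExample {

    // Property name as printed in the WrappedIOException message above.
    static final String PUSH_BACK_PROPERTY = "org.apache.pdfbox.baseParser.pushBackSize";

    // Hypothetical safety margin on top of the reported stream length.
    static final int HEADROOM_BYTES = 64 * 1024;

    // Sets the property to the reported stream length plus headroom
    // and returns the value that was set.
    static int configurePushBackSize(int reportedStreamLength) {
        if (reportedStreamLength < 0) {
            throw new IllegalArgumentException("stream length must not be negative");
        }
        // addExact throws ArithmeticException instead of silently overflowing.
        int size = Math.addExact(reportedStreamLength, HEADROOM_BYTES);
        System.setProperty(PUSH_BACK_PROPERTY, Integer.toString(size));
        return size;
    }

    public static void main(String[] args) {
        // 1559572 is the stream length reported in the log above.
        configurePushBackSize(1559572);
        System.out.println(PUSH_BACK_PROPERTY + "="
                + System.getProperty(PUSH_BACK_PROPERTY));
        // Assumption: the document would be loaded only after this point,
        // e.g. through Tika or PDDocument.load(...), so that the parser
        // sees the larger buffer size.
    }
}
```

The same value can be passed on the JVM command line with the standard -D option (-Dorg.apache.pdfbox.baseParser.pushBackSize=...), which is probably the easier route for a servlet container running Solr. Whether a larger buffer makes this particular PDF parse cleanly is untested.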
                
> StackOverflowError [COSDictionary.toString(COSDictionary.java:1418)]
> --------------------------------------------------------------------
>
>                 Key: PDFBOX-1417
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1417
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Øyvind Stegard
>         Attachments: haugesundsavis.pdf
>
>
> The attached PDF document causes PDFBox 1.6.0 to fail with StackOverflowError.
> Issue discovered with Solr 3.6.1 (using Tika->PDFBox to extract text from 
> document).

