Privezentsev Konstantin created TIKA-1038:
---------------------------------------------

             Summary: Parsing PDF fails with StackOverflowError
                 Key: TIKA-1038
                 URL: https://issues.apache.org/jira/browse/TIKA-1038
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.2
            Reporter: Privezentsev Konstantin


Tika crashes with a StackOverflowError on some PDF documents:
http://www.ellipse-labo.com/fiches/1303214351.pdf
http://downloads.joomlacode.org/frsrelease/5/4/0/54089/handbuch_ckforms-DE-1.3.2.pdf

Code:
{code:java}

// contentStream is assumed to be an InputStream over one of the PDFs above.
AutoDetectParser parser = new AutoDetectParser(
        new TypeDetector(),
        new PDFParser(),
        new OfficeParser(),
        new HtmlParser(),
        new RTFParser(),
        new OOXMLParser());

WriteOutContentHandler contentHandler = new WriteOutContentHandler();
Metadata metadata = new Metadata();

parser.parse(contentStream, new BodyContentHandler(contentHandler), metadata,
        new ParseContext());
{code}

Stack trace:
{code}
java.lang.StackOverflowError
        at java.util.LinkedHashMap$LinkedHashIterator.<init>(LinkedHashMap.java:345)
        at java.util.LinkedHashMap$LinkedHashIterator.<init>(LinkedHashMap.java:345)
        at java.util.LinkedHashMap$KeyIterator.<init>(LinkedHashMap.java:383)
        at java.util.LinkedHashMap$KeyIterator.<init>(LinkedHashMap.java:383)
        at java.util.LinkedHashMap.newKeyIterator(LinkedHashMap.java:396)
        at java.util.HashMap$KeySet.iterator(HashMap.java:874)
        at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1416)
        at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1421)
        at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1421)
        at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1421)
        at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1421)
...
{code}
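
The repeating COSDictionary.toString frames would be consistent with a dictionary that contains itself, directly or through a nested dictionary, so that printing it recurses until the stack is exhausted. That reading is an assumption based only on the trace above; it has not been checked against the PDFBox source. The sketch below reproduces the pattern with a hypothetical Dict class (a stand-in, not PDFBox code) and shows one possible way to make such a toString cycle-safe by tracking the objects currently being printed by identity.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a dictionary-like object; NOT the PDFBox COSDictionary.
final class Dict {
    final Map<String, Object> items = new LinkedHashMap<String, Object>();

    // Naive version: recurses into nested Dict values, so a reference
    // cycle keeps recursing until a StackOverflowError is thrown.
    String naiveToString() {
        StringBuilder sb = new StringBuilder("Dict{");
        for (String key : items.keySet()) {
            Object value = items.get(key);
            sb.append(key).append(':');
            sb.append(value instanceof Dict
                    ? ((Dict) value).naiveToString()
                    : String.valueOf(value));
            sb.append(';');
        }
        return sb.append('}').toString();
    }

    // Cycle-safe version: starts with an empty identity-based set of the
    // Dict instances that are on the current printing path.
    String safeToString() {
        Set<Dict> inProgress =
                Collections.newSetFromMap(new IdentityHashMap<Dict, Boolean>());
        return safeToString(inProgress);
    }

    private String safeToString(Set<Dict> inProgress) {
        if (!inProgress.add(this)) {
            return "<cycle>"; // already on the current path: stop recursing
        }
        StringBuilder sb = new StringBuilder("Dict{");
        for (Map.Entry<String, Object> entry : items.entrySet()) {
            Object value = entry.getValue();
            sb.append(entry.getKey()).append(':');
            sb.append(value instanceof Dict
                    ? ((Dict) value).safeToString(inProgress)
                    : String.valueOf(value));
            sb.append(';');
        }
        inProgress.remove(this); // leaving this Dict: it may be printed again elsewhere
        return sb.append('}').toString();
    }
}
```

With two Dict objects that point at each other, naiveToString() overflows the stack in the same way as the trace, while safeToString() prints a <cycle> marker at the point where the structure loops back.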


 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
