Privezentsev Konstantin created TIKA-1038:
---------------------------------------------
Summary: Parsing PDF with StackOverflowError
Key: TIKA-1038
URL: https://issues.apache.org/jira/browse/TIKA-1038
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.2
Reporter: Privezentsev Konstantin

Tika fails with a StackOverflowError when parsing some PDF documents:
http://www.ellipse-labo.com/fiches/1303214351.pdf
http://downloads.joomlacode.org/frsrelease/5/4/0/54089/handbuch_ckforms-DE-1.3.2.pdf

Code:
{code:java}
// Parser restricted to an explicit set of formats
AutoDetectParser parser = new AutoDetectParser(
        new TypeDetector(),
        new PDFParser(),
        new OfficeParser(),
        new HtmlParser(),
        new RTFParser(),
        new OOXMLParser());
WriteOutContentHandler contentHandler = new WriteOutContentHandler();
Metadata metadata = new Metadata();
// contentStream is the input stream of one of the PDF documents above
parser.parse(contentStream, new BodyContentHandler(contentHandler), metadata, new ParseContext());
{code}

Stack trace:
{code}
java.lang.StackOverflowError
    at java.util.LinkedHashMap$LinkedHashIterator.<init>(LinkedHashMap.java:345)
    at java.util.LinkedHashMap$LinkedHashIterator.<init>(LinkedHashMap.java:345)
    at java.util.LinkedHashMap$KeyIterator.<init>(LinkedHashMap.java:383)
    at java.util.LinkedHashMap$KeyIterator.<init>(LinkedHashMap.java:383)
    at java.util.LinkedHashMap.newKeyIterator(LinkedHashMap.java:396)
    at java.util.HashMap$KeySet.iterator(HashMap.java:874)
    at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1416)
    at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1421)
    at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1421)
    at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1421)
    at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1421)
    ...
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
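
Illustration of the failure mode: the repeated COSDictionary.toString frames in the stack trace above are consistent with unbounded recursion while rendering nested dictionaries as text. The sketch below is not PDFBox or Tika code. It assumes, without confirmation from this report, that the recursion is caused by a dictionary that refers back to one of its own ancestors. The names CyclicToStringDemo, Dict, naiveToString, safeToString and cyclicPair are hypothetical and exist only for this example. It shows a naive recursive rendering overflowing the stack on such a cycle, and a variant that tracks the dictionaries already on the current path.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a dictionary-like object; NOT the PDFBox COSDictionary. */
public class CyclicToStringDemo {

    /** Minimal dictionary whose values may themselves be Dict instances. */
    static final class Dict {
        final Map<String, Object> items = new LinkedHashMap<String, Object>();

        /** Naive rendering: recurses into every nested Dict, so a cycle never terminates. */
        String naiveToString() {
            StringBuilder sb = new StringBuilder("{");
            for (String key : items.keySet()) {
                Object value = items.get(key);
                String rendered = (value instanceof Dict)
                        ? ((Dict) value).naiveToString()
                        : String.valueOf(value);
                sb.append(key).append(':').append(rendered).append(';');
            }
            return sb.append('}').toString();
        }

        /** Cycle-aware rendering: tracks (by identity) the dictionaries on the current path. */
        String safeToString() {
            Set<Dict> inProgress =
                    Collections.newSetFromMap(new IdentityHashMap<Dict, Boolean>());
            return render(inProgress);
        }

        private String render(Set<Dict> inProgress) {
            if (!inProgress.add(this)) {
                return "{...}"; // already being rendered higher up: stop here
            }
            StringBuilder sb = new StringBuilder("{");
            for (Map.Entry<String, Object> entry : items.entrySet()) {
                Object value = entry.getValue();
                String rendered = (value instanceof Dict)
                        ? ((Dict) value).render(inProgress)
                        : String.valueOf(value);
                sb.append(entry.getKey()).append(':').append(rendered).append(';');
            }
            inProgress.remove(this); // leaving this dictionary's subtree
            return sb.append('}').toString();
        }
    }

    /** Builds parent -> child -> parent, the kind of back-reference assumed here. */
    static Dict cyclicPair() {
        Dict parent = new Dict();
        Dict child = new Dict();
        parent.items.put("Type", "Page");
        parent.items.put("Kid", child);
        child.items.put("Parent", parent);
        return parent;
    }

    /** Returns true if the naive rendering exhausts the stack for the given dictionary. */
    static boolean naiveOverflows(Dict dict) {
        try {
            dict.naiveToString();
            return false;
        } catch (StackOverflowError expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        Dict dict = cyclicPair();
        System.out.println("naive rendering overflows: " + naiveOverflows(dict));
        System.out.println("cycle-aware rendering: " + dict.safeToString());
    }
}
```

Whether the affected PDFs actually contain such a back-reference would need to be checked against the documents linked above. The identity-based set is used so that two distinct dictionaries with equal contents are not mistaken for a cycle.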