Privezentsev Konstantin created TIKA-1038:
---------------------------------------------
Summary: Parsing PDF fails with StackOverflowError
Key: TIKA-1038
URL: https://issues.apache.org/jira/browse/TIKA-1038
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.2
Reporter: Privezentsev Konstantin
Tika crashes with a StackOverflowError on some PDF documents:
http://www.ellipse-labo.com/fiches/1303214351.pdf
http://downloads.joomlacode.org/frsrelease/5/4/0/54089/handbuch_ckforms-DE-1.3.2.pdf
Code:
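{code:java}
import org.apache.tika.detect.TypeDetector;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.HtmlParser;
import org.apache.tika.parser.microsoft.OfficeParser;
import org.apache.tika.parser.microsoft.ooxml.OOXMLParser;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.parser.rtf.RTFParser;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.sax.WriteOutContentHandler;

// contentStream: an InputStream opened on one of the PDFs above; parse() throws StackOverflowError
AutoDetectParser parser = new AutoDetectParser(
        new TypeDetector(),
        new PDFParser(),
        new OfficeParser(),
        new HtmlParser(),
        new RTFParser(),
        new OOXMLParser());
WriteOutContentHandler contentHandler = new WriteOutContentHandler();
Metadata metadata = new Metadata();
parser.parse(contentStream, new BodyContentHandler(contentHandler), metadata,
        new ParseContext());
{code}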
Stack trace:
{code}
java.lang.StackOverflowError
at java.util.LinkedHashMap$LinkedHashIterator.<init>(LinkedHashMap.java:345)
at java.util.LinkedHashMap$LinkedHashIterator.<init>(LinkedHashMap.java:345)
at java.util.LinkedHashMap$KeyIterator.<init>(LinkedHashMap.java:383)
at java.util.LinkedHashMap$KeyIterator.<init>(LinkedHashMap.java:383)
at java.util.LinkedHashMap.newKeyIterator(LinkedHashMap.java:396)
at java.util.HashMap$KeySet.iterator(HashMap.java:874)
at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1416)
at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1421)
at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1421)
at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1421)
at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1421)
...
{code}
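The repeated COSDictionary.toString frames at line 1421 suggest that toString is recursing without bound. One plausible cause (an assumption, not confirmed in this report) is a dictionary in the PDF that directly or indirectly references itself. The sketch below uses a hypothetical Dict class, not PDFBox's COSDictionary, to show how a naive recursive toString overflows the stack on such a cycle, and how tracking the dictionaries on the current path (by identity) stops the recursion.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class CyclicDictDemo {
    // Hypothetical stand-in for a PDF dictionary (NOT PDFBox's COSDictionary):
    // values may be plain objects or other Dicts.
    static final class Dict {
        final Map<String, Object> entries = new LinkedHashMap<>();

        // Naive recursion over keySet(), similar in shape to the repeated
        // toString frames in the trace: a reference cycle never terminates.
        String naiveToString() {
            StringBuilder sb = new StringBuilder("{");
            for (String key : entries.keySet()) {
                Object v = entries.get(key);
                sb.append(key).append(':')
                  .append(v instanceof Dict ? ((Dict) v).naiveToString() : String.valueOf(v))
                  .append(';');
            }
            return sb.append('}').toString();
        }

        // Guarded version: remembers which Dicts are on the current path
        // (compared by identity) and prints a placeholder on re-entry.
        String safeToString() {
            Set<Dict> onPath = Collections.newSetFromMap(new IdentityHashMap<Dict, Boolean>());
            return safeToString(onPath);
        }

        private String safeToString(Set<Dict> onPath) {
            if (!onPath.add(this)) {
                return "<cycle>"; // already being printed higher up the stack
            }
            StringBuilder sb = new StringBuilder("{");
            for (String key : entries.keySet()) {
                Object v = entries.get(key);
                sb.append(key).append(':')
                  .append(v instanceof Dict ? ((Dict) v).safeToString(onPath) : String.valueOf(v))
                  .append(';');
            }
            // Leave the path so a Dict shared by siblings (not a cycle) prints in full.
            onPath.remove(this);
            return sb.append('}').toString();
        }
    }

    public static void main(String[] args) {
        Dict page = new Dict();
        Dict parent = new Dict();
        page.entries.put("Type", "Page");
        page.entries.put("Parent", parent);
        parent.entries.put("Kids", page); // cycle: page -> parent -> page

        System.out.println(page.safeToString());
        try {
            page.naiveToString();
        } catch (StackOverflowError err) {
            System.out.println("naive toString: StackOverflowError");
        }
    }
}
```

Running main prints `{Type:Page;Parent:{Kids:<cycle>;};}` and then reports that the naive version overflowed. An identity-based set is used so the guard never depends on equals/hashCode of the dictionaries themselves, which could recurse in the same way.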