Amit Kumar created TIKA-2271:
--------------------------------
Summary: Tika parsing gives maximum limit reached error
Key: TIKA-2271
URL: https://issues.apache.org/jira/browse/TIKA-2271
Project: Tika
Issue Type: Bug
Reporter: Amit Kumar
I am using Apache Tika to extract content from PDF files. When I run it, I get
the error below. I don't see this error documented anywhere, and it came as a
bad surprise.
org.apache.tika.sax.WriteOutContentHandler$WriteLimitReachedException: Your document contained more than 100000 characters, and so your requested limit has been reached. To receive the full text of the document, increase your limit. (Text up to the limit is however available).
    at org.apache.tika.sax.WriteOutContentHandler.characters(WriteOutContentHandler.java:141)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
    at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
    at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
    at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
    at org.apache.tika.parser.pdf.PDF2XHTML.writeWordSeparator(PDF2XHTML.java:318)
    at org.apache.pdfbox.text.PDFTextStripper.writeLine(PDFTextStripper.java:1741)
    at org.apache.pdfbox.text.PDFTextStripper.writePage(PDFTextStripper.java:672)
    at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:392)
    at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:141)
    at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
    at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
    at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:111)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:150)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:136)
I just want to know how to get around this error and parse the files again, or
how to make this limit unlimited.
This question was also raised on Stack Overflow:
http://stackoverflow.com/questions/42392145/tika-parsing-gives-maximum-limit-reached-error
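A minimal sketch of raising the limit, assuming the calling code builds its handler with the no-argument `new BodyContentHandler()`. The report does not show the calling code, but the 100000-character figure and the MatchingContentHandler/WriteOutContentHandler frames in the trace are consistent with it. That constructor caps output at 100,000 characters. Passing -1 to `BodyContentHandler(int writeLimit)` disables the cap, and passing a larger positive value raises it instead. The class name `FullTextExtractor` and the helper `extractAllText` are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of Tika.
public class FullTextExtractor {

    // Parses the stream with AutoDetectParser (the entry point seen in the
    // stack trace) and returns the extracted body text. The -1 write limit
    // disables the character cap that the no-argument BodyContentHandler
    // constructor applies (100,000 characters).
    static String extractAllText(InputStream input)
            throws IOException, SAXException, TikaException {
        AutoDetectParser parser = new AutoDetectParser();
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        parser.parse(input, handler, metadata, new ParseContext());
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java FullTextExtractor path/to/file.pdf
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String text = extractAllText(in);
            System.out.println("Extracted " + text.length() + " characters");
        }
    }
}
```

With no limit, the whole extracted text is held in memory. For very large documents, a bounded positive limit may therefore be the safer choice.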