[ https://issues.apache.org/jira/browse/TIKA-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878174#comment-15878174 ]
Nick Burch commented on TIKA-2271:
----------------------------------
Why are you setting a character limit on your ContentHandler if you don't want
one? Why not do what the error message suggests, and pass either a higher
limit, or no limit?
(Without knowing how you're calling Apache Tika, which you've not mentioned
either here or on SO, we can't tell you exactly what changes to make to remove
the limit you've set.)
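
As a rough illustration of the "higher limit, or no limit" suggestion: if the
handler in question is a BodyContentHandler built with its no-argument
constructor (an assumption, since the calling code isn't shown), its default
write limit is 100000 characters, and `new BodyContentHandler(-1)` disables it.
The sketch below uses only the JDK's SAX classes to show the behaviour;
`LimitedTextHandler` and `LimitDemo` are hypothetical names for this example,
not Tika classes.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical stand-in for a write-limited text handler; NOT a Tika class.
 * A negative limit disables the check (Tika's own convention is -1, as in
 * new BodyContentHandler(-1)).
 */
class LimitedTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final int writeLimit;

    LimitedTextHandler(int writeLimit) {
        this.writeLimit = writeLimit;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (writeLimit < 0 || text.length() + length <= writeLimit) {
            text.append(ch, start, length);
            return;
        }
        // Keep the text up to the limit, then signal that the limit was hit.
        text.append(ch, start, writeLimit - text.length());
        throw new SAXException("write limit of " + writeLimit + " characters reached");
    }

    @Override
    public String toString() {
        return text.toString();
    }
}

class LimitDemo {
    /** Feeds the whole input to the handler as a single characters() event. */
    static String extract(String input, int writeLimit) {
        LimitedTextHandler handler = new LimitedTextHandler(writeLimit);
        try {
            char[] chars = input.toCharArray();
            handler.characters(chars, 0, chars.length);
        } catch (SAXException e) {
            // Limit reached: the partial text is still available from the handler.
        }
        return handler.toString();
    }

    public static void main(String[] args) {
        System.out.println(extract("hello world", 5));   // limited to 5 characters
        System.out.println(extract("hello world", -1));  // no limit
    }
}
```

With a limit of 5 the handler keeps only the first five characters and raises
an exception, much like the WriteLimitReachedException in the report; with -1
the full text comes through.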
> Tika parsing gives maximum limit reached error
> ----------------------------------------------
>
> Key: TIKA-2271
> URL: https://issues.apache.org/jira/browse/TIKA-2271
> Project: Tika
> Issue Type: Bug
> Reporter: Amit Kumar
>
> I am using Apache Tika to extract content from PDF files. When I run it I
> get the error below. I don't see this error documented anywhere, and it came
> as an unpleasant surprise.
> org.apache.tika.sax.WriteOutContentHandler$WriteLimitReachedException: Your
> document contained more than 100000 characters, and so your requested limit
> has been reached. To receive the full text of the document, increase your
> limit. (Text up to the limit is however available).
> at org.apache.tika.sax.WriteOutContentHandler.characters(WriteOutContentHandler.java:141)
> at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
> at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
> at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
> at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
> at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
> at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
> at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
> at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
> at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
> at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
> at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
> at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
> at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
> at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
> at org.apache.tika.parser.pdf.PDF2XHTML.writeWordSeparator(PDF2XHTML.java:318)
> at org.apache.pdfbox.text.PDFTextStripper.writeLine(PDFTextStripper.java:1741)
> at org.apache.pdfbox.text.PDFTextStripper.writePage(PDFTextStripper.java:672)
> at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:392)
> at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:141)
> at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
> at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
> at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:111)
> at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:150)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:136)
> I just want to know how to get rid of this error and be able to parse files
> again, or how to remove the limit altogether.
> This question has also been asked on SO:
> http://stackoverflow.com/questions/42392145/tika-parsing-gives-maximum-limit-reached-error
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)