[
https://issues.apache.org/jira/browse/TIKA-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Amit Kumar closed TIKA-2271.
----------------------------
Resolution: Not A Problem
You can raise the limit, or disable it entirely, through the writeLimit
argument of this constructor:
public BodyContentHandler(int writeLimit)
The docs say the following:
writeLimit - maximum number of characters to include in the string, or -1 to
disable the write limit
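
To illustrate the semantics the docs describe, here is a minimal sketch of a
character-limited SAX handler. It is not Tika's implementation:
LimitedTextHandler, LimitReachedException and collect are hypothetical names
built only on the JDK's org.xml.sax classes, and treating every negative value
(not just -1) as "no limit" is an assumption of this sketch. With Tika itself
the equivalent is passing -1, or a larger limit, to the BodyContentHandler
constructor quoted above.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in, NOT Tika's code: collects character events up to
// writeLimit characters; a negative writeLimit (e.g. -1) disables the limit.
class LimitedTextHandler extends DefaultHandler {

    // Hypothetical exception type signalling that the limit was hit.
    static class LimitReachedException extends SAXException {
        LimitReachedException(int limit) {
            super("Document contained more than " + limit + " characters");
        }
    }

    private final StringBuilder text = new StringBuilder();
    private final int writeLimit;

    LimitedTextHandler(int writeLimit) {
        this.writeLimit = writeLimit;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (writeLimit < 0 || text.length() + length <= writeLimit) {
            // Limit disabled, or the whole chunk still fits.
            text.append(ch, start, length);
        } else {
            // Keep the part that fits so text up to the limit stays available,
            // then signal that the limit was reached.
            text.append(ch, start, writeLimit - text.length());
            throw new LimitReachedException(writeLimit);
        }
    }

    @Override
    public String toString() {
        return text.toString();
    }

    // Feeds the input to a fresh handler and returns whatever was collected,
    // whether or not the limit was reached.
    static String collect(String input, int writeLimit) {
        LimitedTextHandler handler = new LimitedTextHandler(writeLimit);
        try {
            handler.characters(input.toCharArray(), 0, input.length());
        } catch (SAXException e) {
            // Limit reached: the truncated text is still held by the handler.
        }
        return handler.toString();
    }

    public static void main(String[] args) {
        System.out.println(collect("hello world", 5));   // prints "hello"
        System.out.println(collect("hello world", -1));  // prints "hello world"
    }
}
```

In the sketch, a limit of 5 keeps only the first five characters and raises
the exception, while -1 lets the full text through. That mirrors the message
in the reported trace, which notes that text up to the limit is still
available when the exception is thrown.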
> Tika parsing gives maximum limit reached error
> ----------------------------------------------
>
> Key: TIKA-2271
> URL: https://issues.apache.org/jira/browse/TIKA-2271
> Project: Tika
> Issue Type: Bug
> Reporter: Amit Kumar
>
> I am using Apache Tika to extract content from PDF files. When I run it, I
> get the error below. I don't see this error documented anywhere, and it came
> as an unpleasant surprise.
> org.apache.tika.sax.WriteOutContentHandler$WriteLimitReachedException: Your
> document contained more than 100000 characters, and so your requested limit
> has been reached. To receive the full text of the document, increase your
> limit. (Text up to the limit is however available).
> at org.apache.tika.sax.WriteOutContentHandler.characters(WriteOutContentHandler.java:141)
> at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
> at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
> at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
> at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
> at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
> at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
> at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
> at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
> at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
> at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
> at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
> at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
> at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
> at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
> at org.apache.tika.parser.pdf.PDF2XHTML.writeWordSeparator(PDF2XHTML.java:318)
> at org.apache.pdfbox.text.PDFTextStripper.writeLine(PDFTextStripper.java:1741)
> at org.apache.pdfbox.text.PDFTextStripper.writePage(PDFTextStripper.java:672)
> at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:392)
> at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:141)
> at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
> at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
> at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:111)
> at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:150)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:136)
> I just want to know how to get past this error and parse files again, or
> how to remove the limit altogether.
> This question has also been raised on Stack Overflow:
> http://stackoverflow.com/questions/42392145/tika-parsing-gives-maximum-limit-reached-error