[ https://issues.apache.org/jira/browse/TIKA-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amit Kumar closed TIKA-2271.
----------------------------
    Resolution: Not A Problem

You can raise the write limit, or disable it entirely, by passing writeLimit to this constructor:

public BodyContentHandler(int writeLimit)

The docs say the following:
writeLimit - maximum number of characters to include in the string, or -1 to 
disable the write limit
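
As a minimal sketch of the above, the following passes -1 to the
BodyContentHandler constructor so that no write limit applies. The class name
TikaWriteLimitExample and the helper extractText are hypothetical names chosen
for illustration, and the sketch assumes the Tika parser modules are on the
classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical example class; not part of Tika.
public class TikaWriteLimitExample {

    // Extracts the body text of a document.
    // writeLimit is the maximum number of characters to keep, or -1 to disable the limit.
    static String extractText(InputStream stream, int writeLimit)
            throws IOException, SAXException, TikaException {
        BodyContentHandler handler = new BodyContentHandler(writeLimit);
        new AutoDetectParser().parse(stream, handler, new Metadata(), new ParseContext());
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0] is assumed to be the path of the file to parse, e.g. a large PDF.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(extractText(in, -1));
        }
    }
}
```

Disabling the limit means the whole document text is held in memory, so for
very large files a finite but larger limit may be the safer choice. If a limit
is kept and it is reached, the exception message in the report indicates that
the text up to the limit is still available from the handler.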

> Tika parsing gives maximum limit reached error
> ----------------------------------------------
>
>                 Key: TIKA-2271
>                 URL: https://issues.apache.org/jira/browse/TIKA-2271
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Amit Kumar
>
> I am using Apache Tika to extract content from PDF files. When I run it, I 
> get the error below. I don't see this error documented anywhere, and it came 
> as an unpleasant surprise.
> org.apache.tika.sax.WriteOutContentHandler$WriteLimitReachedException: Your 
> document contained more than 100000 characters, and so your requested limit 
> has been reached. To receive the full text of the document, increase your 
> limit. (Text up to the limit is however available).
>     at org.apache.tika.sax.WriteOutContentHandler.characters(WriteOutContentHandler.java:141)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>     at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>     at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>     at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>     at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>     at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>     at org.apache.tika.parser.pdf.PDF2XHTML.writeWordSeparator(PDF2XHTML.java:318)
>     at org.apache.pdfbox.text.PDFTextStripper.writeLine(PDFTextStripper.java:1741)
>     at org.apache.pdfbox.text.PDFTextStripper.writePage(PDFTextStripper.java:672)
>     at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:392)
>     at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:141)
>     at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
>     at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:111)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:150)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:136)
> I just want to know how to get rid of this error and be able to parse files 
> again, or how to remove the limit entirely.
> This question has also been raised on Stack Overflow:
> http://stackoverflow.com/questions/42392145/tika-parsing-gives-maximum-limit-reached-error



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
