Amit Kumar created TIKA-2271:
--------------------------------

             Summary: Tika parsing gives maximum limit reached error
                 Key: TIKA-2271
                 URL: https://issues.apache.org/jira/browse/TIKA-2271
             Project: Tika
          Issue Type: Bug
            Reporter: Amit Kumar




I am using Apache Tika to extract content from PDF files. When I run it, I get 
the error below. I can't find this error documented anywhere, so it came as an 
unpleasant surprise.

org.apache.tika.sax.WriteOutContentHandler$WriteLimitReachedException: Your 
document contained more than 100000 characters, and so your requested limit has 
been reached. To receive the full text of the document, increase your limit. 
(Text up to the limit is however available).
    at org.apache.tika.sax.WriteOutContentHandler.characters(WriteOutContentHandler.java:141)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
    at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
    at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
    at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
    at org.apache.tika.parser.pdf.PDF2XHTML.writeWordSeparator(PDF2XHTML.java:318)
    at org.apache.pdfbox.text.PDFTextStripper.writeLine(PDFTextStripper.java:1741)
    at org.apache.pdfbox.text.PDFTextStripper.writePage(PDFTextStripper.java:672)
    at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:392)
    at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:141)
    at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
    at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
    at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:111)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:150)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:136)

I would like to know how to get rid of this error so that I can parse these 
files again, or how to remove the limit altogether.
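
For context, the exception message and the WriteOutContentHandler frame at the 
top of the trace point to a write limit on the content handler rather than a 
parser failure. The 100000-character figure is consistent with a 
BodyContentHandler created with its default limit, though the calling code is 
not shown, so that is an assumption. The following is a minimal JDK-only 
sketch of how such a limit behaves, not Tika's implementation; the names 
WriteLimitSketch, LimitedTextHandler and feed are hypothetical and exist only 
for illustration.

```java
import java.util.Arrays;

import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WriteLimitSketch {

    /** Hypothetical stand-in for a write-limited handler; this is NOT Tika's class. */
    static class LimitedTextHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private final int writeLimit; // -1 means "no limit" (assumed convention)

        LimitedTextHandler(int writeLimit) {
            this.writeLimit = writeLimit;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (writeLimit == -1 || text.length() + length <= writeLimit) {
                text.append(ch, start, length);
                return;
            }
            // Keep whatever still fits, then signal that the limit was reached.
            text.append(ch, start, writeLimit - text.length());
            throw new SAXException("write limit of " + writeLimit + " characters reached");
        }

        String text() {
            return text.toString();
        }
    }

    /** Sends 'chunks' blocks of 'chunkSize' characters and returns how many were kept. */
    static int feed(LimitedTextHandler handler, int chunks, int chunkSize) {
        char[] chunk = new char[chunkSize];
        Arrays.fill(chunk, 'x');
        try {
            for (int i = 0; i < chunks; i++) {
                handler.characters(chunk, 0, chunkSize);
            }
        } catch (SAXException e) {
            // Limit reached: the text up to the limit is still held by the handler.
        }
        return handler.text().length();
    }

    public static void main(String[] args) {
        // 150,000 characters against a 100,000-character limit: output is truncated.
        System.out.println(feed(new LimitedTextHandler(100000), 150, 1000));
        // Same input with the limit disabled: everything is kept.
        System.out.println(feed(new LimitedTextHandler(-1), 150, 1000));
    }
}
```

In Tika itself, BodyContentHandler has a constructor that takes an int write 
limit, and passing -1 (new BodyContentHandler(-1)) disables the limit. If the 
Tika facade class is used instead, setMaxStringLength(-1) should have the same 
effect. Whether either applies here depends on how the handler is created in 
the calling code.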

This question has also been asked on Stack Overflow:
http://stackoverflow.com/questions/42392145/tika-parsing-gives-maximum-limit-reached-error



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
