[ https://issues.apache.org/jira/browse/TIKA-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070455#comment-16070455 ]
Jorge Spinsanti commented on TIKA-2406:
---------------------------------------

IMHO, bad input (corrupt files) should be reported with something more specific than a generic TikaException: perhaps a subclass of TikaException carrying the current message (e.g. CorruptFileException). An application that consumes the service could then catch CorruptFileException and follow a different flow than it would for a generic TikaException. Does that make sense?

> IllegalArgumentException in text extraction from PDF file
> ---------------------------------------------------------
>
>                 Key: TIKA-2406
>                 URL: https://issues.apache.org/jira/browse/TIKA-2406
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.15
>            Reporter: Jorge Spinsanti
>         Attachments: IllegalArgumentException.pdf
>
>
> I got an IllegalArgumentException in text extraction from PDF file (attached):
> {code}
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@d71dc5e
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     ... 16 more
> Caused by: java.lang.IllegalArgumentException: root cannot be null
>     at org.apache.pdfbox.pdmodel.PDPageTree.<init>(PDPageTree.java:75)
>     at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getPages(PDDocumentCatalog.java:129)
>     at org.apache.pdfbox.pdmodel.PDDocument.getNumberOfPages(PDDocument.java:1381)
>     at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:235)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:146)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     ... 23 more
> {code}
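
To make the suggestion in the comment concrete, here is a minimal sketch of how a dedicated subclass might let callers branch on corrupt input. It is only an illustration under stated assumptions: CorruptFileException is the hypothetical class proposed in the comment and does not exist in Tika 1.15, the nested TikaException is a stand-in for org.apache.tika.exception.TikaException so the example compiles without Tika on the classpath, and extractText and handle are made-up helper names.

```java
public class CorruptFileSketch {

    // Stand-in for org.apache.tika.exception.TikaException (assumption:
    // used here only so the sketch compiles without Tika on the classpath).
    static class TikaException extends Exception {
        TikaException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    // Hypothetical subclass proposed in the comment; not an existing Tika class.
    static class CorruptFileException extends TikaException {
        CorruptFileException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    // Hypothetical parser call: simulates wrapping the RuntimeException
    // thrown by the underlying PDF library when the input is corrupt.
    static String extractText(boolean corrupt) throws TikaException {
        if (corrupt) {
            RuntimeException cause =
                    new IllegalArgumentException("root cannot be null");
            throw new CorruptFileException("Corrupt or unreadable input", cause);
        }
        return "extracted text";
    }

    // Caller side: corrupt input takes a different path than other failures.
    static String handle(boolean corrupt) {
        try {
            return extractText(corrupt);
        } catch (CorruptFileException e) {
            // Bad input: skip the file and continue with the rest of the batch.
            return "skipped: " + e.getCause().getMessage();
        } catch (TikaException e) {
            // Any other parser failure: report it as a generic error.
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(false));
        System.out.println(handle(true));
    }
}
```

Because CorruptFileException extends TikaException in this sketch, existing callers that catch only TikaException would keep working unchanged; only callers that want the finer distinction need the extra catch clause, and it has to come before the more general one.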