[ https://issues.apache.org/jira/browse/TIKA-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison closed TIKA-1372.
-----------------------------
    Resolution: Fixed

[~tilman], thank you for notifying us. Y, that was Tika's (well, my) fault. I fixed that thanks to a doc in govdocs1 and work on TIKA-1302. Tika SNAPSHOT works on the file submitted with PDFBOX-2218, and we should have TIKA 1.6 out shortly.

[~mdhussain], thank you for submitting the issue to PDFBOX. Let us know if you are having problems with Tika trunk and your file.

> PDCheckbox NPE
> --------------
>
>                 Key: TIKA-1372
>                 URL: https://issues.apache.org/jira/browse/TIKA-1372
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tilman Hausherr
>
> One of your users, [~mdhussain], opened PDFBOX-2218:
> PDF parsing fails for attached PDF.
> Stack trace of failure:
> {code}
> Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@1747c
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> 	at com.sabax.extraction.FileExtractionHandler.getFileData(FileExtractionHandler.java:145)
> 	at GenerateIndex.main(GenerateIndex.java:59)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> 	at java.lang.reflect.Method.invoke(Unknown Source)
> 	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> Caused by: java.lang.NullPointerException
> 	at org.apache.pdfbox.pdmodel.interactive.form.PDCheckbox.getOnValue(PDCheckbox.java:141)
> 	at org.apache.pdfbox.pdmodel.interactive.form.PDCheckbox.isChecked(PDCheckbox.java:79)
> 	at org.apache.pdfbox.pdmodel.interactive.form.PDRadioCollection.getValue(PDRadioCollection.java:128)
> 	at org.apache.tika.parser.pdf.PDF2XHTML.addFieldString(PDF2XHTML.java:507)
> 	at org.apache.tika.parser.pdf.PDF2XHTML.processAcroField(PDF2XHTML.java:461)
> 	at org.apache.tika.parser.pdf.PDF2XHTML.processAcroField(PDF2XHTML.java:479)
> 	at org.apache.tika.parser.pdf.PDF2XHTML.extractAcroForm(PDF2XHTML.java:447)
> 	at org.apache.tika.parser.pdf.PDF2XHTML.endDocument(PDF2XHTML.java:195)
> 	at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:341)
> 	at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:106)
> 	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:143)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 	... 9 more
> {code}
> Sample code use to parse
> {code}
> TikaInputStream tikaStream = TikaInputStream.get(stream);
> TikaResultWrapper result;
> try {
>     long streamSize = tikaStream.getLength();
>     Metadata metadata = constructMetadata(fileName, mimeType, streamSize);
>     if (streamSize < maxFileSize) {
>         SamplingSaxHandler handler = new SamplingSaxHandler(samplingSize, metadata);
>         handler.setBufferLimit(bufferSize);
>         parser.parse(tikaStream, handler, metadata, new ParseContext());
>         result = handler.getResult();
>     } else {
>         result = new TikaResultWrapper(null, metadata);
>     }
> } finally {
>     tikaStream.close();
> }
> return result;
> {code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)