Reinhard,

The root element in your PDF references object 1554 as the object that describes the pages in this document. That object does not seem to exist in the PDF, which is a violation of the PDF spec and is why PDFBox is unable to parse the file. If you open the PDF in a decent text editor and search for 1554, you'll see the Pages entry that references this object, but that is the only place the number appears. There is no object definition for it.
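The text-editor search described above can also be done programmatically. What follows is only a rough sketch, not part of PDFBox: the class and method names (`PdfObjectScan`, `countDefinitions`, `countReferences`) are made up for illustration. It assumes the object would appear in the file as a plain `1554 0 obj` definition. Objects stored inside compressed object streams (PDF 1.5 and later) would not show up in such a scan, so a count of zero is a strong hint rather than proof.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a PDFBox class): scans the raw text of a PDF
// for definitions of, and references to, a given object number.
class PdfObjectScan {

    // Counts definitions of the form "<objNum> <generation> obj".
    static int countDefinitions(String raw, int objNum) {
        return count(raw, "(?<![0-9])" + objNum + "\\s+\\d+\\s+obj\\b");
    }

    // Counts indirect references of the form "<objNum> <generation> R".
    static int countReferences(String raw, int objNum) {
        return count(raw, "(?<![0-9])" + objNum + "\\s+\\d+\\s+R\\b");
    }

    private static int count(String raw, String regex) {
        Matcher m = Pattern.compile(regex).matcher(raw);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: PdfObjectScan <file.pdf> <objectNumber>");
            return;
        }
        // ISO-8859-1 maps every byte to one char, so binary data is not mangled.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        String raw = new String(bytes, StandardCharsets.ISO_8859_1);
        int objNum = Integer.parseInt(args[1]);
        System.out.println("definitions: " + countDefinitions(raw, objNum));
        System.out.println("references:  " + countReferences(raw, objNum));
    }
}
```

Run against the file from this thread with `1554` as the object number, at least one reference and no definitions would match what the manual search showed.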
Now, having said that, if we can find a reliable way to parse files like these, we can update the code. Do you know what program was used to create this PDF? Would it be possible for you to remove the encryption on this file and try again? That would make it much easier to debug, assuming it still crashes without the encryption, which it might not. I also encourage you to create an issue in JIRA and upload this file there, in case the link dies in the future.

https://issues.apache.org/jira

----
Thanks,
Adam

From: reinhard schwab <[email protected]>
To: [email protected]
Date: 08/21/2010 11:42
Subject: NPE in PDPageNode

i get a nullpointer exception when parsing a pdf with tika.
http://www.awsg.at/portal/media/4218.pdf

java.lang.NullPointerException
    at org.apache.pdfbox.pdmodel.PDPageNode.getCount(PDPageNode.java:109)
    at org.apache.pdfbox.pdmodel.PDDocument.getNumberOfPages(PDDocument.java:943)
    at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:105)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:86)

regards
reinhard
