Hello,
I'm trying to understand why I'm getting a NullPointerException when I
merely try to load this one particular PDF and then getNumberOfPages().
The core problem seems to be that the "Pages" entry in the document catalog
references an object that doesn't seem to exist. Here's the catalog
dictionary from the PDF:
<</Metadata 178 0 R/PageLayout/OneColumn/Pages 186 0 R/Type/Catalog>>
I searched for "186" with a text editor and it doesn't appear anywhere
else in the PDF. This explains why cat.getPages() (in
PDDocument.getNumberOfPages()) returns null, which then causes the NPE.
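One caveat on the text-editor search: since PDF 1.5, objects can also be stored inside compressed object streams (dictionaries marked /Type /ObjStm). Their "N 0 obj" headers then never appear as plain text. The second stacktrace below goes through COSDocument.dereferenceObjectStreams, so this file may use them. Here is a rough, stdlib-only Java sketch (it uses Java 7+ APIs, unlike the 1.5 runtime mentioned here) that repeats the search for an actual object definition and also counts object streams. The class and method names are hypothetical, and it is a byte-level heuristic, not a PDF parser:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindObject {
    // True if an uncompressed "N G obj" header for objNum appears in the raw bytes.
    // A reference such as "186 0 R" does not count, only a definition.
    static boolean hasPlainObjectHeader(byte[] pdf, int objNum) {
        // ISO-8859-1 maps every byte to exactly one char, so binary stream data is harmless.
        String raw = new String(pdf, StandardCharsets.ISO_8859_1);
        // The lookbehind stops "86" from matching inside "186"; \s+ allows the usual whitespace.
        Pattern header = Pattern.compile("(?<![0-9])" + objNum + "\\s+\\d+\\s+obj\\b");
        return header.matcher(raw).find();
    }

    // Counts dictionaries marked /Type /ObjStm; objects stored inside these compressed
    // streams will not show up in a plain text search.
    static int countObjectStreams(byte[] pdf) {
        String raw = new String(pdf, StandardCharsets.ISO_8859_1);
        Matcher m = Pattern.compile("/Type\\s*/ObjStm\\b").matcher(raw);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java FindObject <file.pdf> <objectNumber>");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        int objNum = Integer.parseInt(args[1]);
        System.out.println("plain \"" + objNum + " G obj\" header found: " + hasPlainObjectHeader(pdf, objNum));
        System.out.println("object streams (/Type /ObjStm): " + countObjectStreams(pdf));
    }
}
```

Run it as `java FindObject file.pdf 186`. If no plain header turns up but the object stream count is nonzero, the text search alone can't settle whether object 186 exists.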
Code:
PDDocument doc = PDDocument.load(inputFile);
System.out.println("Number of pages = " + doc.getNumberOfPages());
Stacktrace:
java.lang.NullPointerException
    at org.apache.pdfbox.pdmodel.PDPageNode.getCount(PDPageNode.java:102)
    at org.apache.pdfbox.pdmodel.PDDocument.getNumberOfPages(PDDocument.java:931)
    at com.xldynamics.common.PdfBoxTest.main(PdfBoxTest.java:30)
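For what it's worth, the PDF specification (ISO 32000-1, section 7.3.10) says a reference to an undefined object is not an error and should be treated as a reference to the null object, which may be part of why Acrobat tolerates the file. The sketch below is a toy model of the chain described above, using a hypothetical class with plain Java collections instead of PDFBox's COS classes: the dangling /Pages reference resolves to null, and a page count has to check for that instead of dereferencing it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

public class DanglingRefSketch {
    // Hypothetical stand-in for a cross-reference table: object number -> parsed dictionary.
    private final Map<Integer, Map<String, Object>> objects = new HashMap<>();

    void define(int objNum, Map<String, Object> dict) {
        objects.put(objNum, dict);
    }

    // A reference to an object that was never defined resolves to null (the PDF null
    // object) instead of throwing, so the failure only surfaces later when it is used.
    Map<String, Object> resolve(int objNum) {
        return objects.get(objNum);
    }

    // Null-checked page count: an empty result instead of an NPE when the /Pages
    // object or its /Count entry is missing.
    OptionalInt pageCount(int pagesObjNum) {
        Map<String, Object> pages = resolve(pagesObjNum);
        if (pages == null) {
            return OptionalInt.empty();
        }
        Object count = pages.get("Count");
        return (count instanceof Integer) ? OptionalInt.of((Integer) count) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        DanglingRefSketch doc = new DanglingRefSketch();
        Map<String, Object> pages = new HashMap<>();
        pages.put("Type", "Pages");
        pages.put("Count", 3);
        doc.define(2, pages);
        System.out.println(doc.pageCount(2));   // OptionalInt[3]
        System.out.println(doc.pageCount(186)); // OptionalInt.empty, like "/Pages 186 0 R" here
    }
}
```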
I can open this same file in Adobe Acrobat and Adobe Reader without any
problem, so I think PDFBox should be able to handle it as well. I'm using
HEAD (revision 937546) on Windows Vista 32-bit with Java 1.5.0_06.
I suspect this may be caused by the owner password (which, by the way, I
don't know), but I didn't think an owner password would prevent something
as simple as getting a page count. So my questions are:
1.) Is this NullPointerException caused by the owner password?
2.) How can I process this file (or any file with an owner password, if
that is the issue)?
3.) I'm not sure whether this is a bug in the library, but should I open a
JIRA ticket anyway so that I can attach the PDF for reference (since I
can't attach it to the mailing list)?
I remember someone suggesting, at some point in the past, decrypting PDFs
with a null or empty-string password to work around an encryption-related
problem. I'm not sure that makes sense in my case, but I tried it anyway.
That produced a different stacktrace, though I may be heading in
completely the wrong direction here. The cause of this one is that
lastByte (BaseParser.java line 1254) was -1 on the first iteration of the
loop, which left intBuffer empty. Integer.parseInt() then fails on the
empty string, resulting in the following stacktrace:
java.io.IOException: Error: Expected an integer type, actual=''
    at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1275)
    at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:81)
    at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:449)
    at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1100)
    at com.xldynamics.common.PdfBoxTest.main(PdfBoxTest.java:32)
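The failure described above can be reproduced in isolation with a simplified integer reader. This is a hypothetical sketch loosely modeled on the description of BaseParser.readInt(), not PDFBox's actual code: it collects digits until a non-digit or end of stream, so when the very first read() returns -1, the buffer stays empty and Integer.parseInt("") throws.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadIntSketch {
    // Hypothetical, simplified integer reader: collects an optional leading sign and
    // ASCII digits until a non-digit byte or end of stream (read() == -1).
    static int readInt(InputStream in) throws IOException {
        StringBuilder intBuffer = new StringBuilder();
        int lastByte;
        while ((lastByte = in.read()) != -1) {
            char c = (char) lastByte;
            boolean digit = c >= '0' && c <= '9';
            boolean sign = intBuffer.length() == 0 && (c == '-' || c == '+');
            if (digit || sign) {
                intBuffer.append(c);
            } else {
                // The terminating byte is simply consumed here; a real parser would push it back.
                break;
            }
        }
        try {
            // If read() returned -1 on the first iteration, intBuffer is still empty and
            // Integer.parseInt("") throws NumberFormatException.
            return Integer.parseInt(intBuffer.toString());
        } catch (NumberFormatException e) {
            throw new IOException("Error: Expected an integer type, actual='" + intBuffer + "'", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Normal case: digits followed by a space.
        System.out.println(readInt(new ByteArrayInputStream("186 0".getBytes(StandardCharsets.US_ASCII))));
        // Failure case: an empty stream, so the first read() returns -1.
        try {
            readInt(new ByteArrayInputStream(new byte[0]));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints 186 and then the same "Error: Expected an integer type, actual=''" message seen in the stacktrace.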
If anyone has any suggestions on where I should go next, I'd be most
grateful. For the record, this issue is unrelated to PDFBOX-699 and
PDFBOX-700, which I opened yesterday.
Thanks,
Adam
This email and any content within or attached hereto from Sun West Mortgage
Company, Inc. is confidential and/or legally privileged. The information is
intended only for the use of the individual or entity named on this email. If
you are not the intended recipient, you are hereby notified that any
disclosure, copying, distribution or the taking of any action in reliance on
the contents of this email information is strictly prohibited, and that the
documents should be returned to this office immediately by email. Receipt by
anyone other than the intended recipient is not a waiver of any privilege.
Please do not include your social security number, account number, or any other
personal or financial information in the content of the email. Should you have
any questions, please call (800) 453 7884.