[ https://issues.apache.org/jira/browse/PDFBOX-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr closed PDFBOX-5815.
-----------------------------------
    Fix Version/s:     (was: 3.0.3 PDFBox)
       Resolution: Duplicate

Duplicate of PDFBOX-5792. You can test with a snapshot:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.3-SNAPSHOT/

> Can't split the document into individual pages
> ----------------------------------------------
>
>                 Key: PDFBOX-5815
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5815
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 3.0.2 PDFBox
>            Reporter: Nicolò Rossi
>            Priority: Critical
>              Labels: Links, java.lang.nullPointerException, link, split
>         Attachments: CTU.pdf
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> If I try to split a document that contains links to internal pages into
> single pages, the Splitter class throws an {*}NPE{*}.
>
> This is our code:
>
> {code:java}
> PDDocument pdfDocument = Loader.loadPDF(new File("path/to/file.pdf"));
> Splitter splitter = new Splitter();
> List<PDDocument> splitted = splitter.split(pdfDocument);
> {code}
>
> This is the exception:
>
> {code:java}
> java.lang.NullPointerException: Cannot invoke
> "org.apache.pdfbox.pdmodel.PDPage.getCOSObject()" because the return value of
> "org.apache.pdfbox.pdmodel.interactive.documentnavigation.destination.PDPageDestination.getPage()"
> is null
>     at org.apache.pdfbox.multipdf.Splitter.fixDestinations(Splitter.java:153)
>     at org.apache.pdfbox.multipdf.Splitter.split(Splitter.java:136)
> {code}
>
> I searched for the error and saw that it breaks in the Splitter class, in the
> +_fixDestinations_+ method.
>
> I report the method definition here:
> {code:java}
> private void fixDestinations(PDDocument destinationDocument)
> {
>     PDPageTree pageTree = destinationDocument.getPages();
>     for (PDPageDestination pageDestination : destToFixSet)
>     {
>         COSDictionary srcPageDict = pageDestination.getPage().getCOSObject();
>         COSDictionary dstPageDict = pageDictMap.get(srcPageDict);
>         PDPage dstPage = new PDPage(dstPageDict);
>         // Find whether destination is inside or outside
>         if (pageTree.indexOf(dstPage) >= 0)
>         {
>             pageDestination.setPage(dstPage);
>         }
>         else
>         {
>             pageDestination.setPage(null);
>         }
>     }
> }
> {code}
> h2. What's the problem:
> _+pageDestination.getPage()+_ returns null because the document contains
> links to internal pages, so after splitting by page there is no longer a
> valid page to link to in the resulting split document.
>
> h2. Possible solution:
> Check the returned page and, if it is null, set the page of
> +_pageDestination_+ to null. I would suggest something like this:
>
> {code:java}
> private void fixDestinations(PDDocument destinationDocument)
> {
>     PDPageTree pageTree = destinationDocument.getPages();
>     for (PDPageDestination pageDestination : destToFixSet)
>     {
>         PDPage srcPage = pageDestination.getPage();
>         if (srcPage != null)
>         {
>             COSDictionary srcPageDict = srcPage.getCOSObject();
>             COSDictionary dstPageDict = pageDictMap.get(srcPageDict);
>             PDPage dstPage = new PDPage(dstPageDict);
>             // Find whether destination is inside or outside
>             if (pageTree.indexOf(dstPage) >= 0)
>             {
>                 pageDestination.setPage(dstPage);
>             }
>             else
>             {
>                 pageDestination.setPage(null);
>             }
>         }
>         else
>         {
>             pageDestination.setPage(null);
>         }
>     }
> }
> {code}
>
> I've attached an example file, thanks.
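
The null guard suggested in the issue can be tried in isolation. What follows is only a minimal sketch under stated assumptions, not the PDFBox implementation and not necessarily what the fix for PDFBOX-5792 does: Page and Destination are hypothetical stand-ins for PDPage and PDPageDestination, and the map and list stand in for pageDictMap and the page tree of the split result.

```java
import java.util.List;
import java.util.Map;

// Sketch only: hypothetical stand-in types, no PDFBox classes involved.
class DestinationFixSketch {

    /** Hypothetical stand-in for a page; only object identity matters here. */
    static final class Page {
        final String name;
        Page(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for a page destination whose page may be null. */
    static final class Destination {
        private Page page;
        Destination(Page page) { this.page = page; }
        Page getPage() { return page; }
        void setPage(Page page) { this.page = page; }
    }

    /**
     * Remaps each destination from its source page to the copied page.
     * A destination is cleared (page set to null) when it has no source
     * page, when the source page was never copied, or when the copy is
     * not part of the result document.
     */
    static void fixDestinations(List<Destination> toFix,
                                Map<Page, Page> srcToDst,
                                List<Page> resultPages) {
        for (Destination destination : toFix) {
            Page src = destination.getPage();
            // Guard first: a dangling destination has no page to look up.
            Page dst = (src == null) ? null : srcToDst.get(src);
            boolean inside = dst != null && resultPages.contains(dst);
            destination.setPage(inside ? dst : null);
        }
    }
}
```

With this guard, a destination whose page is already null is simply left cleared instead of triggering the NullPointerException, and it is treated the same way as a destination that points outside the split result.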