Nicolò Rossi created PDFBOX-5815:
------------------------------------

             Summary: Can't split the document into individual pages
                 Key: PDFBOX-5815
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5815
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 3.0.2 PDFBox
            Reporter: Nicolò Rossi
             Fix For: 3.0.3 PDFBox
         Attachments: CTU.pdf

If I try to split a document that contains links to internal pages into 
single pages, the Splitter class throws an {*}NPE{*}.

 

This is our code:

 
{code:java}
// A default Splitter splits the document into single pages
Splitter splitter = new Splitter();
PDDocument pdfDocument = Loader.loadPDF(new File("path/to/file.pdf"));
List<PDDocument> splitted = splitter.split(pdfDocument); {code}
 

This is the exception:

 
{code:java}
java.lang.NullPointerException: Cannot invoke "org.apache.pdfbox.pdmodel.PDPage.getCOSObject()" because the return value of "org.apache.pdfbox.pdmodel.interactive.documentnavigation.destination.PDPageDestination.getPage()" is null
    at org.apache.pdfbox.multipdf.Splitter.fixDestinations(Splitter.java:153)
    at org.apache.pdfbox.multipdf.Splitter.split(Splitter.java:136){code}
 

I searched for the error and found that it fails in the 
+_fixDestinations_+ method of the Splitter class.

Here is the method definition:
{code:java}
private void fixDestinations(PDDocument destinationDocument)
{
    PDPageTree pageTree = destinationDocument.getPages();
    for (PDPageDestination pageDestination : destToFixSet)
    {
        COSDictionary srcPageDict = pageDestination.getPage().getCOSObject();
        COSDictionary dstPageDict = pageDictMap.get(srcPageDict);
        PDPage dstPage = new PDPage(dstPageDict);
        // Find whether destination is inside or outside
        if (pageTree.indexOf(dstPage) >= 0)
        {
            pageDestination.setPage(dstPage);
        }
        else
        {
            pageDestination.setPage(null);
        }
    }
} {code}
h2. What's the problem:

_+pageDestination.getPage()+_ returns null because the document contains links 
to internal pages: when splitting by page, there is no longer a valid page to 
link to in the resulting split document.

 
h2. Possible solution:

Check the returned page and, if it is null, set the page of 
+_pageDestination_+ to null. I would suggest something like this:

 
{code:java}
private void fixDestinations(PDDocument destinationDocument)
{
    PDPageTree pageTree = destinationDocument.getPages();
    for (PDPageDestination pageDestination : destToFixSet)
    {
        PDPage srcPage = pageDestination.getPage();
        if (srcPage != null)
        {
            COSDictionary srcPageDict = srcPage.getCOSObject();
            COSDictionary dstPageDict = pageDictMap.get(srcPageDict);
            PDPage dstPage = new PDPage(dstPageDict);
            // Find whether destination is inside or outside
            if (pageTree.indexOf(dstPage) >= 0)
            {
                pageDestination.setPage(dstPage);
            }
            else
            {
                pageDestination.setPage(null);
            }
        }
        else
        {
            pageDestination.setPage(null);
        }
    }
} {code}
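
The null guard suggested above can also be tried without PDFBox. The following is only a minimal sketch, not the library's code: {{Page}}, {{Destination}} and {{DestinationFixSketch}} are hypothetical stand-ins for the PDFBox classes, and the map and list parameters stand in for {{pageDictMap}} and the page tree. As an extra assumption that goes beyond the suggested patch, a source page with no copy in the map is also treated as lying outside the result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Library-independent sketch: Page and Destination are hypothetical stand-ins,
// not PDFBox classes.
class DestinationFixSketch
{
    /** Stand-in for a page; only object identity matters here. */
    static final class Page
    {
        final String name;

        Page(String name)
        {
            this.name = name;
        }
    }

    /** Stand-in for a link destination whose target page may be missing. */
    static final class Destination
    {
        private Page page;

        Destination(Page page)
        {
            this.page = page;
        }

        Page getPage()
        {
            return page;
        }

        void setPage(Page page)
        {
            this.page = page;
        }
    }

    /**
     * Re-points each destination at the copy of its page if that copy is part
     * of the split result; otherwise clears the target. Null-safe at each step.
     */
    static void fixDestinations(List<Destination> destinations,
                                Map<Page, Page> sourceToCopy,
                                List<Page> resultPages)
    {
        for (Destination destination : destinations)
        {
            Page source = destination.getPage();
            // Guard: a destination without a page must not be dereferenced.
            Page copy = (source == null) ? null : sourceToCopy.get(source);
            // Assumption: a page that was never copied counts as "outside".
            boolean inside = copy != null && resultPages.contains(copy);
            destination.setPage(inside ? copy : null);
        }
    }

    public static void main(String[] args)
    {
        Page source1 = new Page("source-1");
        Page copy1 = new Page("copy-1");
        Page source2 = new Page("source-2");
        Page copy2 = new Page("copy-2");

        Map<Page, Page> sourceToCopy = new HashMap<>();
        sourceToCopy.put(source1, copy1);
        sourceToCopy.put(source2, copy2);

        List<Page> resultPages = new ArrayList<>();
        resultPages.add(copy1); // the split result holds only the first page

        List<Destination> destinations = new ArrayList<>();
        destinations.add(new Destination(source1)); // target inside the result
        destinations.add(new Destination(source2)); // target outside the result
        destinations.add(new Destination(null));    // no target page at all

        fixDestinations(destinations, sourceToCopy, resultPages);
        for (Destination d : destinations)
        {
            System.out.println(d.getPage() == null ? "cleared" : d.getPage().name);
        }
    }
}
```

Running {{main}} prints {{copy-1}} followed by {{cleared}} twice: the destination with no page is cleared instead of causing an exception.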
 

I've attached an example file. Thanks.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
