It's still impossible to get a page number for a given bookmark without 
the toList() method below.  In addition to that accessor, I've added a 
couple of methods to PDDocument that build a map from object IDs to 
page numbers.  This allows getting a bookmark and then getting the page 
number very quickly.  Here's an example usage:

COSObject targetPageRef =
    (COSObject)((COSArray)item.getAction().getCOSDictionary().getDictionaryObject("D")).get(0);
doc.generatePageMap();
String objStr = String.valueOf(targetPageRef.getObjectNumber().intValue());
String genStr = String.valueOf(targetPageRef.getGenerationNumber().intValue());
// the map is untyped, so the value has to be cast back to Integer
Integer pageNumber = (Integer) doc.getPageMap().get(objStr+","+genStr);
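
The key format in the example ("object number,generation number") can be tried out without PDFBox. What follows is only a minimal sketch: PageKeySketch, key() and lookup() are hypothetical names that are not part of PDFBox or of the patch, and it uses generics, so it needs Java 5 or later.

```java
import java.util.Map;

// Hypothetical helper, not part of PDFBox or of the posted patch.
class PageKeySketch {
    // Builds the "objectNumber,generationNumber" key used in the example above.
    static String key(long objectNumber, long generationNumber) {
        return objectNumber + "," + generationNumber;
    }

    // Returns the 1-based page number, or -1 when the reference is not in the map
    // (for example, a destination that does not point at a page object).
    static int lookup(Map<String, Integer> pageMap, long objectNumber, long generationNumber) {
        Integer page = pageMap.get(key(objectNumber, generationNumber));
        return page == null ? -1 : page.intValue();
    }
}
```

Returning -1 for a missing key is one possible convention; the example above would yield null in that case instead.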

I didn't change the load method to automatically generate this because I 
feel this map should only be generated if necessary, not done by default. 
Here are the additional methods I added:

    /**
     * This associates object IDs with a page number.  It's used to determine
     * the page number for bookmarks (or page numbers for anything else for
     * which you have an object ID, for that matter).
     */
    private Map pageMap = null;


    public void generatePageMap() {
        this.pageMap = new HashMap();
 
        // these page nodes could be references to pages, or to arrays
        COSArray pageNodes =
            ((COSArrayList)(this.getDocumentCatalog().getPages().getKids())).toList();

        for(int arrayCounter=0; arrayCounter < pageNodes.size(); ++arrayCounter) {
            parseCatalogObject((COSObject)pageNodes.get(arrayCounter));
        }
    }

    private void parseCatalogObject(COSObject thePageOrArrayObject) {
        COSBase arrayCountBase = thePageOrArrayObject.getItem(COSName.COUNT);
        int arrayCount = -1;
        if(arrayCountBase instanceof COSInteger)
            arrayCount = ((COSInteger)arrayCountBase).intValue();

        COSBase kidsBase = thePageOrArrayObject.getItem(COSName.KIDS);
        int kidsCount = -1;
        if(kidsBase instanceof COSArray)
            kidsCount = ((COSArray)kidsBase).size();

        if(arrayCount == -1 || kidsCount == -1) {
            // these cases occur when we have a page, not an array of pages
            String objStr = String.valueOf(thePageOrArrayObject.getObjectNumber().intValue());
            String genStr = String.valueOf(thePageOrArrayObject.getGenerationNumber().intValue());
            // wrap the page number explicitly; Java 1.4 has no autoboxing
            this.getPageMap().put(objStr+","+genStr, new Integer(this.getPageMap().size()+1));
        } else {
            // we either have an array of page pointers, or an array of arrays
            if(arrayCount == kidsCount) {
                // process the kids... they're all references to pages
                COSArray kidsArray = ((COSArray)kidsBase);
                for(int i=0; i<kidsArray.size(); ++i) {
                    COSObject thisObject = (COSObject)kidsArray.get(i);
                    String objStr = String.valueOf(thisObject.getObjectNumber().intValue());
                    String genStr = String.valueOf(thisObject.getGenerationNumber().intValue());
                    // wrap the page number explicitly; Java 1.4 has no autoboxing
                    this.getPageMap().put(objStr+","+genStr, new Integer(this.getPageMap().size()+1));
                }
            } else {
                // this object is an array of references to other arrays
                COSArray list = null;
                if(kidsBase instanceof COSArray)
                    list = ((COSArray)kidsBase);
                if(list != null) {
                    for(int arrayCounter=0; arrayCounter < list.size(); ++arrayCounter) {
                        parseCatalogObject((COSObject)list.get(arrayCounter));
                    }
                }
            }
        }
    }
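
The traversal above can also be written as a plain depth-first walk. The sketch below is a simplified variant, not the patch itself: PageNodeSketch and PageTreeSketch are hypothetical stand-ins rather than PDFBox classes, and it assumes a node is a page exactly when it has no kids instead of comparing Count with the size of Kids. It uses generics, so it needs Java 5 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a page-tree node; not a PDFBox class.
class PageNodeSketch {
    final int objectNumber;
    final int generationNumber;
    final List<PageNodeSketch> kids = new ArrayList<PageNodeSketch>(); // empty for a page

    PageNodeSketch(int objectNumber, int generationNumber) {
        this.objectNumber = objectNumber;
        this.generationNumber = generationNumber;
    }

    // Appends a child and returns this node so trees can be built inline.
    PageNodeSketch add(PageNodeSketch kid) {
        kids.add(kid);
        return this;
    }
}

class PageTreeSketch {
    // Maps "objectNumber,generationNumber" to a 1-based page number.
    static Map<String, Integer> buildPageMap(PageNodeSketch root) {
        Map<String, Integer> pageMap = new LinkedHashMap<String, Integer>();
        walk(root, pageMap);
        return pageMap;
    }

    // Depth-first, left-to-right walk; pages are numbered in the order visited.
    private static void walk(PageNodeSketch node, Map<String, Integer> pageMap) {
        if (node.kids.isEmpty()) {
            // Assumption: a node without kids is a page, not an empty intermediate node.
            String key = node.objectNumber + "," + node.generationNumber;
            pageMap.put(key, Integer.valueOf(pageMap.size() + 1));
            return;
        }
        for (PageNodeSketch kid : node.kids) {
            walk(kid, pageMap);
        }
    }
}
```

A real implementation would probably check the node's Type entry (Page versus Pages) rather than rely on an empty kids list, since an intermediate node with no kids would be miscounted as a page here.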

I'm not sure how to contribute it back to the project other than by posting 
to this list.  As I understand it, JIRA is for bugs, and this is not a bug 
but a feature addition.  I've tested this code with a few PDFs from 
various sources and it's working fine.  I removed the generics so it's 
compatible with Java 1.4.  Hopefully this code will be accepted and 
committed to HEAD so it can help other people.

If there are any questions as to the logic which aren't answered in the 
Adobe Specification for PDF, let me know and I'll explain.

--Adam




Adam Nichols/UR/CER/XLDynamics
06/02/2009 11:24
To: pdfbox-dev@incubator.apache.org
Subject: Re: Get page number for bookmark





toArray returns an array of PDPage objects.  Since the PDPage objects do 
not contain the page ID (as far as I can tell), this will not suffice. 
Likewise, iterator() and listIterator() both iterate over collections of 
PDPages.  What I need is a list of object IDs for the pages; I don't want 
the actual data in the pages.

--Adam




Andreas Lehmkühler <andr...@lehmi.de>
05/31/2009 10:25
Reply-To: pdfbox-dev@incubator.apache.org
To: pdfbox-dev@incubator.apache.org
Subject: Re: Get page number for bookmark






Hi Adam,

Did you ever try using one of the following methods?

COSArrayList.iterator
COSArrayList.listIterator
COSArrayList.toArray

All of them should return the data you're looking for.

Andreas Lehmkühler

Adam Nichols schrieb:
> I propose that the following method be added to COSArrayList.
> 
>     public COSArray toList() 
>     {
>         COSArray copy = new COSArray();
>         for(int i=0; i < array.size(); ++i)
>             copy.add(array.get(i));
>         return copy;
>     }
> 
> This returns a copy of the array to ensure the callers can't modify the
> internal array variable.  Callers already have access to this information,
> as it is the same data as is returned by toString(), just in a more useful
> format.  If it's OK to give the callers the ability to modify array
> directly, a reference could simply be returned.
>
> I needed this because I got the "indirect reference to a page object" from
> the target listed in a GoTo action and needed the page number.  By using
> the method above I was able to get the page references in the catalog and
> thus was able to determine the page number based on the page object ID I
> had.
> 
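
The copy-versus-reference trade-off described in the quoted proposal can be illustrated without PDFBox. This is only a sketch: RefListSketch is a hypothetical wrapper around a plain java.util.List of strings, not COSArrayList, and it uses generics (Java 5 or later).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper used only to illustrate the defensive copy; not COSArrayList.
class RefListSketch {
    private final List<String> array = new ArrayList<String>();

    void add(String ref) {
        array.add(ref);
    }

    int size() {
        return array.size();
    }

    // Returns a shallow copy, so adding to or removing from the result
    // leaves the internal list untouched.
    List<String> toList() {
        return new ArrayList<String>(array);
    }
}
```

The copy is shallow: callers cannot change which elements the wrapper holds, but if the elements themselves were mutable they would still be shared.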

