It's still impossible to get the page number for a given bookmark without the toList() method quoted below. In addition to that accessor, I've added a couple of methods to PDDocument that build a map from page object IDs to page numbers. With the map in place, you can take a bookmark and look up its page number very quickly. Here's an example usage:

    COSObject targetPageRef = (COSObject) ((COSArray) item.getAction()
            .getCOSDictionary().getDictionaryObject("D")).get(0);
    doc.generatePageMap();
    String objStr = String.valueOf(targetPageRef.getObjectNumber().intValue());
    String genStr = String.valueOf(targetPageRef.getGenerationNumber().intValue());
    // the map is a raw Map (no generics), so the value has to be cast
    Integer pageNumber = (Integer) doc.getPageMap().get(objStr + "," + genStr);

I didn't change the load method to generate this automatically because I feel this map should only be generated if necessary, not by default. Here are the additional methods I added:

    /**
     * This associates object ids with a page number. It's used to determine
     * the page number for bookmarks (or page numbers for anything else for
     * which you have an object id for that matter).
     */
    private Map pageMap = null;

    public void generatePageMap()
    {
        this.pageMap = new HashMap();
        // these page nodes could be references to pages, or to arrays
        COSArray pageNodes = ((COSArrayList)(this.getDocumentCatalog().getPages().getKids())).toList();
        for(int arrayCounter=0; arrayCounter < pageNodes.size(); ++arrayCounter)
        {
            parseCatalogObject((COSObject)pageNodes.get(arrayCounter));
        }
    }

    private void parseCatalogObject(COSObject thePageOrArrayObject)
    {
        COSBase arrayCountBase = thePageOrArrayObject.getItem(COSName.COUNT);
        int arrayCount = -1;
        if(arrayCountBase instanceof COSInteger)
            arrayCount = ((COSInteger)arrayCountBase).intValue();

        COSBase kidsBase = thePageOrArrayObject.getItem(COSName.KIDS);
        int kidsCount = -1;
        if(kidsBase instanceof COSArray)
            kidsCount = ((COSArray)kidsBase).size();

        if(arrayCount == -1 || kidsCount == -1)
        {
            // these cases occur when we have a page, not an array of pages
            String objStr = String.valueOf(thePageOrArrayObject.getObjectNumber().intValue());
            String genStr = String.valueOf(thePageOrArrayObject.getGenerationNumber().intValue());
            // explicit boxing: Java 1.4 has no autoboxing
            this.getPageMap().put(objStr+","+genStr, new Integer(this.getPageMap().size()+1));
        }
        else
        {
            // we either have an array of page pointers, or an array of arrays
            if(arrayCount == kidsCount)
            {
                // process the kids... they're all references to pages
                COSArray kidsArray = ((COSArray)kidsBase);
                for(int i=0; i<kidsArray.size(); ++i)
                {
                    COSObject thisObject = (COSObject)kidsArray.get(i);
                    String objStr = String.valueOf(thisObject.getObjectNumber().intValue());
                    String genStr = String.valueOf(thisObject.getGenerationNumber().intValue());
                    this.getPageMap().put(objStr+","+genStr, new Integer(this.getPageMap().size()+1));
                }
            }
            else
            {
                // this object is an array of references to other arrays
                COSArray list = null;
                if(kidsBase instanceof COSArray)
                    list = ((COSArray)kidsBase);
                if(list != null)
                {
                    for(int arrayCounter=0; arrayCounter < list.size(); ++arrayCounter)
                    {
                        parseCatalogObject((COSObject)list.get(arrayCounter));
                    }
                }
            }
        }
    }

I'm not sure how to contribute it back to the project other than posting to this list. As I understand it, JIRA is for bugs, and this is not a bug; it's a feature addition. I've tested this code with a few PDFs from various sources and it's working fine. I removed the generics so it's compatible with Java 1.4. Hopefully this code will be accepted and put into HEAD so it can help other people. If there are any questions about the logic that aren't answered by the Adobe PDF specification, let me know and I'll explain.

--Adam


Adam Nichols/UR/CER/XLDynamics
06/02/2009 11:24

To pdfbox-dev@incubator.apache.org
cc
Subject Re: Get page number for bookmark

toArray returns an array of PDPage objects. Since the PDPage objects do not contain the page ID (as far as I can tell), this will not suffice. Likewise, iterator() and listIterator() both iterate over collections of PDPages. What I need is a list of object IDs for the pages; I don't want the actual data in the pages.

--Adam


Andreas Lehmkühler <andr...@lehmi.de>
05/31/2009 10:25
Please respond to pdfbox-dev@incubator.apache.org

To pdfbox-dev@incubator.apache.org
cc
Subject Re: Get page number for bookmark

Hi Adam,

did you ever try to use one of the following methods:

COSArrayList.iterator
COSArrayList.listIterator
COSArrayList.toArray

All of them should return the data you're looking for.

Andreas Lehmkühler

Adam Nichols wrote:
> I propose the following function is added to COSArrayList.
>
> public COSArray toList()
> {
>     COSArray copy = new COSArray();
>     for(int i=0; i < array.size(); ++i)
>         copy.add(array.get(i));
>     return copy;
> }
>
> This returns a copy of the array to ensure the callers can't modify the
> internal array variable. Callers already have access to this information
> as it is the same data as is returned by toString() just in a more useful
> format. If it's OK to give the callers the ability to modify array
> directly, a reference could simply be returned.
>
> I needed this because I got the "indirect reference to a page object" from
> the target listed in a GoTo action and needed the page number. By using
> the method above I was able to get the page references in the catalog and
> thus was able to determine the page number based on the page object id I
> had.
>
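
For readers who want to try the idea discussed in this thread without a PDFBox checkout, here is a minimal, self-contained sketch of the same page-map technique: walk the page tree depth-first and number the leaf pages in the order they are reached, keyed by "objectNumber,generationNumber". Everything in it is an assumption made for illustration. PageMapSketch, Node, isPage, and buildPageMap are hypothetical names, not PDFBox API. It also swaps one detail: a page is recognised by an explicit flag (standing in for /Type /Page versus /Type /Pages) rather than by comparing Count against the size of Kids. It uses generics for brevity, so it is not Java 1.4 compatible.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a PDF page tree; none of these names are PDFBox API. */
final class PageMapSketch {

    /** A page-tree node: either a leaf page or an intermediate node with kids. */
    static final class Node {
        final int objectNumber;
        final int generationNumber;
        final boolean isPage; // stands in for /Type /Page vs. /Type /Pages
        final List<Node> kids = new ArrayList<>();

        Node(int objectNumber, int generationNumber, boolean isPage) {
            this.objectNumber = objectNumber;
            this.generationNumber = generationNumber;
            this.isPage = isPage;
        }

        /** Appends a kid and returns this node so trees can be built inline. */
        Node add(Node kid) {
            kids.add(kid);
            return this;
        }

        /** Same "obj,gen" key format as the code posted in the thread. */
        String key() {
            return objectNumber + "," + generationNumber;
        }
    }

    /** Walks the tree depth-first and numbers the leaf pages starting at 1. */
    static Map<String, Integer> buildPageMap(Node root) {
        Map<String, Integer> pageMap = new LinkedHashMap<>();
        visit(root, pageMap);
        return pageMap;
    }

    private static void visit(Node node, Map<String, Integer> pageMap) {
        if (node.isPage) {
            // the next page number is one more than the pages seen so far
            pageMap.put(node.key(), pageMap.size() + 1);
            return;
        }
        for (Node kid : node.kids) {
            visit(kid, pageMap);
        }
    }

    public static void main(String[] args) {
        // root -> [ node 2 -> [page 4], node 3 -> [page 5, page 6] ]
        Node root = new Node(1, 0, false)
                .add(new Node(2, 0, false).add(new Node(4, 0, true)))
                .add(new Node(3, 0, false)
                        .add(new Node(5, 0, true))
                        .add(new Node(6, 0, true)));
        Map<String, Integer> pageMap = buildPageMap(root);
        System.out.println(pageMap.get("6,0")); // prints 3
    }
}
```

The explicit flag is a design choice for the sketch, not a claim about how the posted patch should be written. As far as I can tell, a node whose kids are all single-page subtrees would also have Count equal to the number of Kids, so checking the node type may be the more robust test.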