Hi,

Sent: Thu, 22 Oct 2009
From: Robin Diederen <[email protected]>
> Hi,
>
> Thanks for looking into the code; I'm a bit confused, though. I guess it's
> your suggestion to inspect the three locations for metadata "by hand"? What
> would be the best way to proceed?

As I've already said, I'm not an XMP expert; I'm just trying to find the possible locations where metadata are used within pdfbox.

PDPage metadata:
- load the document
- get all pages by calling document.getDocumentCatalog().getAllPages()
- iterate through all pages and check each one for metadata by calling getMetadata()

PDXObject metadata:
- load the document
- get all pages by calling document.getDocumentCatalog().getAllPages()
- iterate through all pages and get all XObjects from the page resources by calling getXObjects()
- iterate through all XObjects and check each one for metadata by calling getMetadata()

I don't know if that really works, but give it a try.

BR
Andreas Lehmkühler

>
> Best, Robin
>
> -----Original message-----
> From: Andreas Lehmkühler <[email protected]>
> Sent: Thu 22-10-2009 22:36
> To: [email protected];
> Subject: Re: Question XMP metadata extraction
>
>
> Robin Diederen wrote:
> > Andreas,
> >
> > According to the JavaDoc
> > (http://www.pdfbox.org/javadoc/org/pdfbox/pdmodel/common/PDMetadata.html#PDMetadata%28org.pdfbox.pdmodel.PDDocument%29),
> > the exportXMPMetadata method should be able to do this. Or am I missing something?
> OK, I had a deeper look, and it seems there are three supported
> locations for metadata within pdfbox: PDDocumentCatalog, PDPage and
> PDXObject. The "classic" metadata are located in the catalog. Perhaps
> you will find the metadata you are looking for in the two other objects?
>
> BR
> Andreas Lehmkühler
>
> > Thanks for your help, greatly appreciated!
> >
> > Best, Robin
> >
> > -----Original message-----
> > From: Andreas Lehmkühler <[email protected]>
> > Sent: Thu 22-10-2009 22:09
> > To: [email protected];
> > Subject: Re: Question XMP metadata extraction
> >
> > Hi,
> >
> > Robin Diederen wrote:
> >> Hello Andreas,
> >>
> >> I did have a look at the PrintDocumentMetaData.java file; there I see
> >> that the metadata is extracted using PDDocumentInformation. This code is
> >> useful for PDF files with "classic" metadata, but not for PDF files only
> >> carrying XMP metadata, right?
> > OK, I see. I'm not that familiar with the XMP stuff, but I think I
> > understand your problem.
> >
> >> That's my issue: as soon as I have a PDF file with only XMP metadata, I
> >> need some other way to extract this metadata.
> > I'm afraid that pdfbox is still limited to the handling of "classic"
> > metadata.
> >
> >> Best, Robin
> >>
> >> -----Original message-----
> >> From: Andreas Lehmkühler <[email protected]>
> >> Sent: Thu 22-10-2009 21:47
> >> To: [email protected];
> >> Subject: Re: Question XMP metadata extraction
> >>
> >> Hi,
> >>
> >> Robin Diederen wrote:
> >>> Hello all,
> >>>
> >>> I'm quite new to PDFBox and currently figuring out how to extract
> >>> metadata in XMP format from PDF files.
> >>>
> >>> I have a few files containing XMP metadata, but I cannot get any of
> >>> them to work, and I can't figure out where I am failing.
> >>>
> >>> A code snippet (all non-relevant code was deleted):
> >>>
> >>> String inputFile = "/some/file.pdf";
> >>>
> >>> PDDocument pdfDocument = null;
> >>> pdfDocument = new PDDocument();
> >>> pdfDocument = PDDocument.load(inputFile);
> >>> PDMetadata pdfMetaData = new PDMetadata(pdfDocument);
> >>>
> >>> int metadataLength = pdfMetaData.getLength();
> >>> System.out.println(pdfMetaData.getLength());
> >>>
> >>> pdfMetaData.exportXMPMetadata();
> >>>
> >>> The getLength call always returns 0; the exportXMPMetadata call throws
> >>> an error:
> >>>
> >>> [Fatal Error] :-1:-1: Premature end of file.
> >>> Exception in thread "main" java.io.IOException: Premature end of file.
> >>>     at org.apache.jempbox.impl.XMLUtil.parse(XMLUtil.java:78)
> >>>     at org.apache.jempbox.xmp.XMPMetadata.load(XMPMetadata.java:554)
> >>>     at org.apache.pdfbox.pdmodel.common.PDMetadata.exportXMPMetadata(PDMetadata.java:86)
> >>>     at com.robindiederen.pdf.Extractor.extractMetaDataFromXMP(Extractor.java:124)
> >>>     at com.robindiederen.pdf.Extractor.main(Extractor.java:90)
> >>>
> >>> This happens for every PDF I test. Extracting metadata from the
> >>> DocumentInformation table works like a charm. I'm using PDFBox 0.8.0 on
> >>> Java 1.5.
> >> Have a look at PrintDocumentMetaData as an example of how to extract the
> >> document's metadata.
> >>
> >> HTH
> >> Andreas Lehmkühler
> >>
> > BR
> > Andreas Lehmkühler

--- end of original message ---
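A minimal sketch of the three lookups discussed in this thread (document catalog, each page, each page's XObjects), assuming the PDFBox 0.8.x/1.x API used here; getAllPages() and findResources() no longer exist in PDFBox 2.x. It reads the existing /Metadata streams through getMetadata() instead of constructing one: the PDMetadata(PDDocument) constructor in the snippet above creates a new, empty metadata stream, which is consistent with getLength() returning 0 and the XML parser reporting "Premature end of file". The class name XmpMetadataFinder and its collect() helper are hypothetical, and the sketch only returns the raw XMP packets without parsing them.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObject;

/**
 * Hypothetical helper (sketch for the PDFBox 0.8.x / 1.x API): collects every
 * XMP stream PDFBox exposes on the catalog, the pages and the pages' XObjects.
 */
public class XmpMetadataFinder {

    public static List<String> collect(PDDocument document) throws IOException {
        List<String> packets = new ArrayList<String>();

        // 1. Document-level XMP: read the catalog's existing /Metadata stream
        //    (new PDMetadata(document) would create an empty stream instead).
        addIfPresent(packets, document.getDocumentCatalog().getMetadata());

        // 2. Page-level metadata; getAllPages() returns a raw List in this API generation.
        for (Object pageObj : document.getDocumentCatalog().getAllPages()) {
            PDPage page = (PDPage) pageObj;
            addIfPresent(packets, page.getMetadata());

            // 3. XObject-level metadata (images, forms) from the page's (possibly inherited) resources.
            PDResources resources = page.findResources();
            if (resources == null) {
                continue; // page without any resources
            }
            Map<?, ?> xobjects = resources.getXObjects();
            if (xobjects == null) {
                continue;
            }
            for (Object xobj : xobjects.values()) {
                addIfPresent(packets, ((PDXObject) xobj).getMetadata());
            }
        }
        return packets;
    }

    // getMetadata() returns null when an object has no /Metadata entry.
    private static void addIfPresent(List<String> packets, PDMetadata metadata) throws IOException {
        if (metadata != null) {
            packets.add(metadata.getInputStreamAsString());
        }
    }

    public static void main(String[] args) throws IOException {
        PDDocument document = PDDocument.load(args[0]);
        try {
            for (String xmp : collect(document)) {
                System.out.println(xmp);
            }
        } finally {
            document.close();
        }
    }
}
```

Each returned string is the raw XMP packet; if a parsed JempBox object is needed instead, the same PDMetadata instances can be passed through exportXMPMetadata().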
