Got it. Thank you. I wanted to confirm that nothing had changed since last summer (PDFBOX-2855).
Are you taking bug reports for jempbox or is that entirely eol'd? Any recommendations for a somewhat lenient, Apache license-compatible XMP parser?

Might it make sense to include in the README or in the package javadocs something about the goals for XmpBox? It is entirely possible that I missed the warning. ;)

Thank you, again.

Best,

Tim

-----Original Message-----
From: Tilman Hausherr [mailto:thaush...@t-online.de]
Sent: Tuesday, March 08, 2016 12:13 PM
To: dev@pdfbox.apache.org
Subject: Re: roadmap for XMPBox?

I think the problem is that XmpBox was written for PDF/A checking, so it fails with XMPs that are not PDF/A. For example, file 000142.pdf has the schema http://ns.adobe.com/pdfx/1.3/ which is not allowed for PDF/A:

http://www.pdfa.org/wp-content/uploads/2011/08/tn0008_predefined_xmp_properties_in_pdfa-1_2008-03-20.pdf

And no, there are no plans for anything on XMP at this time...

Tilman

On 07.03.2016 at 19:31, Allison, Timothy B. wrote:
> All,
>
> When we migrate to PDFBox 2.x over on Tika, I'd much prefer to switch
> from our current reliance on jempbox to XMPBox. I recently extracted ~70k
> XMPs from PDFs with PDFBox 2.0.0-SNAPSHOT, and when I ran XMPBox's parser,
> there were exceptions on roughly 40% of the XMPs.
>
> I’m including a table below of the counts of exception messages. Are
> there any plans to make XMPBox more lenient or is this what we can expect
> going forward?
>
> As always, I’m more than happy to help with files and tests. Let me know
> what I can do.
>
> Cheers,
>
> Tim
>
> No XmpParsingException on 42,022 files.
>
> Exceptions:
>
>  Count  Message
>  -----  -------
>  13403  Cannot find a definition for the namespace http://ns.adobe.com/pdfx/1.3/
>   3710  Type 'originalDocumentID' not defined in http://ns.adobe.com/xap/1.0/sType/ResourceRef#
>   3113  Missing pdfaSchema:property in type definition
>   2867  Expecting namespace 'adobe:ns:meta/' and found 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
>    927  Invalid array type, expecting Seq and found Bag [prefix=dc; name=creator]
>    723  Invalid array type, expecting Alt and found Seq [prefix=dc; name=description]
>    710  Cannot find a definition for the namespace http://ns.adobe.com/xmp/InDesign/private
>    654  Invalid array type, expecting Bag and found Seq [prefix=dc; name=subject]
>    522  Cannot find a definition for the namespace http://ns.adobe.com/AcrobatAdhocWorkflow/1.0/
>    492  Failed to parse
>    370  Invalid array definition, expecting Seq and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=date]
>    262  Cannot find a definition for the namespace http://ns.adobe.com/illustrator/1.0/
>    188  Cannot find a definition for the namespace http://ns.adobe.com/xfa/promoted-desc/
>    144  Failed to instanciate property in xmp:CreateDate
>    125  Schema is not set in this document : http://www.w3.org/1999/02/22-rdf-syntax-ns#
>     94  Expecting local name 'xmpmeta' and found 'xapmeta'
>     84  Cannot find a definition for the namespace http://www.rwjf.org/rwjf/1.0
>     74  Failed to instanciate property in xap:CreateDate
>     68  Invalid array definition, expecting Bag and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=language]
>     49  Invalid array definition, expecting Alt and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=title]
>     46  Cannot find a definition for the namespace http://www.sap.com
>     33  Failed to instanciate property in exif:ColorSpace
>     28  Failed to instanciate property in xmpMM:History
>     26  xmp should start with a processing instruction
>     24  Cannot find a definition for the namespace http://prismstandard.org/namespaces/basic/2.0/
>     21  Cannot find a definition for the namespace http://www.npes.org/pdfx/ns/id/
>     14  Cannot find a definition for the namespace http://ns.InsiderSoftware.com/fontlist/1.0/
>     14  Invalid array definition, expecting Seq and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=creator]
>     12  Failed to instanciate property in xmp:MetadataDate
>     10  Cannot find a definition for the namespace http://ns.xinet.com/webnative/private/1.0/
>     10  Failed to instanciate property in xap:ModifyDate
>     10  Failed to instanciate property in xmp:ModifyDate
>      9  Type 'params' not defined in http://ns.adobe.com/xap/1.0/sType/ResourceEvent#
>      8  Invalid array type, expecting Seq and found Bag [prefix=xmpMM; name=History]
>      8  Type 'documentName' not defined in http://ns.adobe.com/xap/1.0/sType/ResourceRef#
>      7  Cannot find a definition for the namespace http://www.day.com/dam/1.0
>      7  Cannot find a definition for the namespace ptc
>      6  Failed to instanciate property in xapMM:History
>      5  Invalid array definition, expecting Seq and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=tiff; name=YCbCrPositioning]
>      5  Schema is not set in this document : http://purl.org/dc/elements/1.1/
>      4  Cannot find a definition for the namespace http://www.extensis.com/meta/FontSense/
>      4  Excepted xpacket 'end' attribute (must be present and placed in first)
>      3  Invalid array type, expecting Seq and found Bag [prefix=photoshop; name=TextLayers]
>      3  Schema is not set in this document : http://ns.adobe.com/xap/1.0/
>      2  no message (NPE)
>      2  Cannot find a definition for the namespace http://laserfiche.com/xmp/schema/1.0/
>      2  Cannot find a definition for the namespace http://ns.adobe.com/AdobeFormsCentralWorkflow/1.0/
>      2  Cannot find a definition for the namespace http://ns.adobe.com/camera-raw-settings/1.0/
>      2  Failed to instanciate property in xapRights:Marked
>      2  Invalid array type, expecting Alt and found Bag [prefix=dc; name=title]
>      2  Invalid array type, expecting Alt and found Seq [prefix=dc; name=title]
>      2  Invalid array type, expecting Seq and found Alt [prefix=dc; name=creator]
>      1  Cannot find a definition for the namespace http://ns.cambridgeassociates.com/status/1.0/
>      1  Cannot find a definition for the namespace http://ns.computershare.com.au/ccs/1.0/
>      1  Cannot find a definition for the namespace http://ns.esko-graphics.com/grinfo/1.0/
>      1  Cannot find a definition for the namespace http://ns.tripletriangle.com/ns/tripletri/
>      1  Cannot find a definition for the namespace http://prismstandard.org/namespaces/basic/2.1/
>      1  Cannot find a definition for the namespace http://www.aiim.org/pdfa/ns/id.html
>      1  Cannot find a definition for the namespace http://www.aiim.org/pdfe/ns/id/
>      1  Cannot find a definition for the namespace http://www.enfocus.com/ns/CertifiedPDF/2.0/
>      1  Cannot find a definition for the namespace http://www.northplains.com/xmpnps/cov/1.0/
>      1  Failed to instanciate property in xmpRights:Marked
>      1  Invalid array type, expecting Seq and found Bag [prefix=dc; name=date]
>      1  This namespace is not a schema or a structured type : http://ns.adobe.com/xap/1.0/sType/Job#
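
For anyone who wants to reproduce a tally like the one quoted above on their own corpus, here is a minimal sketch of the counting step. It is not the code used for the original run. The `ExceptionTally` class and its `Parser` interface are hypothetical names introduced here so the example stays self-contained; in practice the parser argument would presumably wrap a call such as XMPBox's `DomXmpParser.parse(byte[])`, which throws `XmpParsingException`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExceptionTally {

    /** Hypothetical stand-in for an XMP parser; not part of XMPBox. */
    @FunctionalInterface
    public interface Parser {
        void parse(byte[] xmp) throws Exception;
    }

    /** Key used for inputs that parse without an exception. */
    public static final String OK = "No exception";

    /**
     * Runs the parser over every XMP packet and counts how often each
     * exception message occurs. The result is ordered by descending
     * count, with ties broken alphabetically by message.
     */
    public static Map<String, Integer> tally(List<byte[]> xmps, Parser parser) {
        Map<String, Integer> counts = new HashMap<>();
        for (byte[] xmp : xmps) {
            String key = OK;
            try {
                parser.parse(xmp);
            } catch (Exception e) {
                String msg = e.getMessage();
                // Exceptions without a message (e.g. an NPE) get a label
                // built from the exception class name instead.
                key = (msg == null)
                        ? "no message (" + e.getClass().getSimpleName() + ")"
                        : msg;
            }
            counts.merge(key, 1, Integer::sum);
        }

        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.<Integer>reverseOrder())
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));

        // LinkedHashMap preserves the sorted order for printing.
        Map<String, Integer> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : entries) {
            sorted.put(entry.getKey(), entry.getValue());
        }
        return sorted;
    }
}
```

Grouping by the raw message keeps the file-specific details (prefix, property name, namespace URI) in the key, which is why the table has so many rows that differ only in the namespace. Stripping the URI before counting would give a coarser picture if that is more useful.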