Got it.  Thank you.  I wanted to confirm that nothing had changed since last 
summer (PDFBOX-2855).  

Are you still taking bug reports for jempbox, or has it been entirely EOL'd?  

Any recommendations for a somewhat lenient, Apache license-compatible XMP 
parser?

Might it make sense to include in the README or in the package javadocs 
something about the goals for XmpBox?  It is entirely possible that I missed 
the warning. ;)

Thank you, again.

        Best,

                  Tim

-----Original Message-----
From: Tilman Hausherr [mailto:thaush...@t-online.de] 
Sent: Tuesday, March 08, 2016 12:13 PM
To: dev@pdfbox.apache.org
Subject: Re: roadmap for XMPBox?

I think the problem is that XmpBox was written for PDF/A checking, so it fails 
with XMPs that are not PDF/A. For example, file 000142.pdf has the schema 
http://ns.adobe.com/pdfx/1.3/ which is not allowed for PDF/A:
http://www.pdfa.org/wp-content/uploads/2011/08/tn0008_predefined_xmp_properties_in_pdfa-1_2008-03-20.pdf
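
To illustrate the point, here is a minimal sketch that lists the namespaces an XMP packet uses and flags the ones outside an allow-list. It uses only the JDK's DOM parser, not XmpBox, and it is not how XmpBox itself decides. The class name XmpNamespaceCheck, the KNOWN set and the SAMPLE packet are all made up for illustration, and the allow-list is an incomplete approximation of the predefined PDF/A-1 schemas, not the authoritative list from the technical note.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class XmpNamespaceCheck {
    // Assumption: an illustrative, incomplete allow-list. The last four entries
    // are XMP/RDF/XML plumbing namespaces, not metadata schemas.
    static final Set<String> KNOWN = new HashSet<>(Arrays.asList(
        "http://purl.org/dc/elements/1.1/",
        "http://ns.adobe.com/xap/1.0/",
        "http://ns.adobe.com/xap/1.0/mm/",
        "http://ns.adobe.com/xap/1.0/rights/",
        "http://ns.adobe.com/pdf/1.3/",
        "http://www.aiim.org/pdfa/ns/id/",
        "adobe:ns:meta/",
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "http://www.w3.org/2000/xmlns/",
        "http://www.w3.org/XML/1998/namespace"));

    // Hypothetical sample packet: one Dublin Core property, one pdfx property.
    static final String SAMPLE =
        "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
      + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
      + "<rdf:Description rdf:about=\"\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
      + "<dc:format>application/pdf</dc:format>"
      + "</rdf:Description>"
      + "<rdf:Description rdf:about=\"\" xmlns:pdfx=\"http://ns.adobe.com/pdfx/1.3/\">"
      + "<pdfx:Company>Example</pdfx:Company>"
      + "</rdf:Description>"
      + "</rdf:RDF>"
      + "</x:xmpmeta>";

    // Returns every namespace URI used by an element or attribute that is not in KNOWN.
    static Set<String> unknownNamespaces(String xmp) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations so untrusted XMP cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)));

        Set<String> unknown = new TreeSet<>();
        NodeList elements = doc.getElementsByTagNameNS("*", "*");
        for (int i = 0; i < elements.getLength(); i++) {
            Node element = elements.item(i);
            collect(element.getNamespaceURI(), unknown);
            NamedNodeMap attributes = element.getAttributes();
            for (int j = 0; j < attributes.getLength(); j++) {
                collect(attributes.item(j).getNamespaceURI(), unknown);
            }
        }
        return unknown;
    }

    // Records a namespace URI if it is present and not on the allow-list.
    private static void collect(String namespaceUri, Set<String> unknown) {
        if (namespaceUri != null && !KNOWN.contains(namespaceUri)) {
            unknown.add(namespaceUri);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(unknownNamespaces(SAMPLE));
    }
}
```

A lenient consumer could log the flagged namespaces and carry on with the properties it understands, where a PDF/A validator has to treat them as errors.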

And no, there are no plans for anything on XMP at this time...

Tilman


Am 07.03.2016 um 19:31 schrieb Allison, Timothy B.:
> All,
>
>    When we migrate to PDFBox 2.x over on Tika, I'd much prefer to switch 
> from our current reliance on jempbox to XMPBox.  I recently extracted ~70k 
> XMPs from PDFs with PDFBox 2.0.0-SNAPSHOT, and when I ran XMPBox's parser, 
> there were exceptions on roughly 40% of the XMPs.
>
>    I’m including a table below of the counts of exception messages.  Are 
> there any plans to make XMPBox more lenient or is this what we can expect 
> going forward?
>
>    As always, I’m more than happy to help with files and tests.  Let me know 
> what I can do.
>
>               Cheers,
>
>                        Tim
>
> No XmpParsingException on 42,022 files.
>
> Exceptions:
>
>  Count  Exception message
>  -----  -----------------
>  13403  Cannot find a definition for the namespace http://ns.adobe.com/pdfx/1.3/
>   3710  Type 'originalDocumentID' not defined in http://ns.adobe.com/xap/1.0/sType/ResourceRef#
>   3113  Missing pdfaSchema:property in type definition
>   2867  Expecting namespace 'adobe:ns:meta/' and found 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
>    927  Invalid array type, expecting Seq and found Bag [prefix=dc; name=creator]
>    723  Invalid array type, expecting Alt and found Seq [prefix=dc; name=description]
>    710  Cannot find a definition for the namespace http://ns.adobe.com/xmp/InDesign/private
>    654  Invalid array type, expecting Bag and found Seq [prefix=dc; name=subject]
>    522  Cannot find a definition for the namespace http://ns.adobe.com/AcrobatAdhocWorkflow/1.0/
>    492  Failed to parse
>    370  Invalid array definition, expecting Seq and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=date]
>    262  Cannot find a definition for the namespace http://ns.adobe.com/illustrator/1.0/
>    188  Cannot find a definition for the namespace http://ns.adobe.com/xfa/promoted-desc/
>    144  Failed to instanciate property in xmp:CreateDate
>    125  Schema is not set in this document : http://www.w3.org/1999/02/22-rdf-syntax-ns#
>     94  Expecting local name 'xmpmeta' and found 'xapmeta'
>     84  Cannot find a definition for the namespace http://www.rwjf.org/rwjf/1.0
>     74  Failed to instanciate property in xap:CreateDate
>     68  Invalid array definition, expecting Bag and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=language]
>     49  Invalid array definition, expecting Alt and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=title]
>     46  Cannot find a definition for the namespace http://www.sap.com
>     33  Failed to instanciate property in exif:ColorSpace
>     28  Failed to instanciate property in xmpMM:History
>     26  xmp should start with a processing instruction
>     24  Cannot find a definition for the namespace http://prismstandard.org/namespaces/basic/2.0/
>     21  Cannot find a definition for the namespace http://www.npes.org/pdfx/ns/id/
>     14  Cannot find a definition for the namespace http://ns.InsiderSoftware.com/fontlist/1.0/
>     14  Invalid array definition, expecting Seq and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=creator]
>     12  Failed to instanciate property in xmp:MetadataDate
>     10  Cannot find a definition for the namespace http://ns.xinet.com/webnative/private/1.0/
>     10  Failed to instanciate property in xap:ModifyDate
>     10  Failed to instanciate property in xmp:ModifyDate
>      9  Type 'params' not defined in http://ns.adobe.com/xap/1.0/sType/ResourceEvent#
>      8  Invalid array type, expecting Seq and found Bag [prefix=xmpMM; name=History]
>      8  Type 'documentName' not defined in http://ns.adobe.com/xap/1.0/sType/ResourceRef#
>      7  Cannot find a definition for the namespace http://www.day.com/dam/1.0
>      7  Cannot find a definition for the namespace ptc
>      6  Failed to instanciate property in xapMM:History
>      5  Invalid array definition, expecting Seq and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=tiff; name=YCbCrPositioning]
>      5  Schema is not set in this document : http://purl.org/dc/elements/1.1/
>      4  Cannot find a definition for the namespace http://www.extensis.com/meta/FontSense/
>      4  Excepted xpacket 'end' attribute (must be present and placed in first)
>      3  Invalid array type, expecting Seq and found Bag [prefix=photoshop; name=TextLayers]
>      3  Schema is not set in this document : http://ns.adobe.com/xap/1.0/
>      2  no message (NPE)
>      2  Cannot find a definition for the namespace http://laserfiche.com/xmp/schema/1.0/
>      2  Cannot find a definition for the namespace http://ns.adobe.com/AdobeFormsCentralWorkflow/1.0/
>      2  Cannot find a definition for the namespace http://ns.adobe.com/camera-raw-settings/1.0/
>      2  Failed to instanciate property in xapRights:Marked
>      2  Invalid array type, expecting Alt and found Bag [prefix=dc; name=title]
>      2  Invalid array type, expecting Alt and found Seq [prefix=dc; name=title]
>      2  Invalid array type, expecting Seq and found Alt [prefix=dc; name=creator]
>      1  Cannot find a definition for the namespace http://ns.cambridgeassociates.com/status/1.0/
>      1  Cannot find a definition for the namespace http://ns.computershare.com.au/ccs/1.0/
>      1  Cannot find a definition for the namespace http://ns.esko-graphics.com/grinfo/1.0/
>      1  Cannot find a definition for the namespace http://ns.tripletriangle.com/ns/tripletri/
>      1  Cannot find a definition for the namespace http://prismstandard.org/namespaces/basic/2.1/
>      1  Cannot find a definition for the namespace http://www.aiim.org/pdfa/ns/id.html
>      1  Cannot find a definition for the namespace http://www.aiim.org/pdfe/ns/id/
>      1  Cannot find a definition for the namespace http://www.enfocus.com/ns/CertifiedPDF/2.0/
>      1  Cannot find a definition for the namespace http://www.northplains.com/xmpnps/cov/1.0/
>      1  Failed to instanciate property in xmpRights:Marked
>      1  Invalid array type, expecting Seq and found Bag [prefix=dc; name=date]
>      1  This namespace is not a schema or a structured type : http://ns.adobe.com/xap/1.0/sType/Job#
>

