Hi all

As a third option, what about the BSD-licensed Adobe XMP Toolkit? veraPDF, at least, seems to use a fork of it: https://github.com/veraPDF/veraPDF-xmp

Cheers, beat


On 07.03.2016 at 19:31, Allison, Timothy B. wrote:
All,
When we migrate to PDFBox 2.x over on Tika, I'd much prefer to switch from
our current reliance on JempBox to XMPBox. I recently extracted ~70k XMP
packets from PDFs with PDFBox 2.0.0-SNAPSHOT, and when I ran XMPBox's parser
on them, roughly 40% of them threw exceptions.
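
A minimal Java sketch of how such per-message counts could be gathered,
assuming the extracted XMP packets are already in memory as byte arrays. The
real XMPBox call is hidden behind a hypothetical `XmpParse` interface, and
`XmpExceptionTally` / `tally` are illustrative names, not part of PDFBox or
Tika.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch: run a parser over many XMP packets and count failures by exception
 * message. The parser itself is a hypothetical stand-in supplied by the caller.
 */
final class XmpExceptionTally {

    /** Hypothetical stand-in for the real parse call (e.g. a wrapper around XMPBox). */
    @FunctionalInterface
    public interface XmpParse {
        void parse(byte[] xmp) throws Exception;
    }

    /**
     * Parses every packet and returns exception message -> number of packets
     * that failed with it, ordered from most to least frequent.
     */
    public static Map<String, Integer> tally(List<byte[]> packets, XmpParse parser) {
        Map<String, Integer> counts = new HashMap<>();
        for (byte[] packet : packets) {
            try {
                parser.parse(packet);
            } catch (Exception e) {
                // Exceptions without a message (e.g. a bare NPE) get a placeholder key.
                String key = e.getMessage() != null
                        ? e.getMessage()
                        : "no message (" + e.getClass().getSimpleName() + ")";
                counts.merge(key, 1, Integer::sum);
            }
        }
        // Copy into a LinkedHashMap so iteration order is by descending count.
        Map<String, Integer> sorted = new LinkedHashMap<>();
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .forEachOrdered(en -> sorted.put(en.getKey(), en.getValue()));
        return sorted;
    }

    private XmpExceptionTally() {
    }
}
```

With XMPBox 2.x the parser argument might be something like
`xmp -> new DomXmpParser().parse(xmp)` (org.apache.xmpbox.xml.DomXmpParser);
that wiring is an assumption to check against the version in use. Files that
parse cleanly are simply not counted, so the success count is the number of
packets minus the sum of the map's values.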

Below is a table of the exception messages and how often each occurred. Are
there any plans to make XMPBox more lenient, or is this what we can expect
going forward?

As always, I'm more than happy to help with files and tests. Let me know what
I can do.

Cheers,

Tim

No XmpParsingException on 42,022 files.

Exceptions:

  Count  Message
  -----  -------
  13403  Cannot find a definition for the namespace http://ns.adobe.com/pdfx/1.3/
   3710  Type 'originalDocumentID' not defined in http://ns.adobe.com/xap/1.0/sType/ResourceRef#
   3113  Missing pdfaSchema:property in type definition
   2867  Expecting namespace 'adobe:ns:meta/' and found 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
    927  Invalid array type, expecting Seq and found Bag [prefix=dc; name=creator]
    723  Invalid array type, expecting Alt and found Seq [prefix=dc; name=description]
    710  Cannot find a definition for the namespace http://ns.adobe.com/xmp/InDesign/private
    654  Invalid array type, expecting Bag and found Seq [prefix=dc; name=subject]
    522  Cannot find a definition for the namespace http://ns.adobe.com/AcrobatAdhocWorkflow/1.0/
    492  Failed to parse
    370  Invalid array definition, expecting Seq and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=date]
    262  Cannot find a definition for the namespace http://ns.adobe.com/illustrator/1.0/
    188  Cannot find a definition for the namespace http://ns.adobe.com/xfa/promoted-desc/
    144  Failed to instanciate property in xmp:CreateDate
    125  Schema is not set in this document : http://www.w3.org/1999/02/22-rdf-syntax-ns#
     94  Expecting local name 'xmpmeta' and found 'xapmeta'
     84  Cannot find a definition for the namespace http://www.rwjf.org/rwjf/1.0
     74  Failed to instanciate property in xap:CreateDate
     68  Invalid array definition, expecting Bag and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=language]
     49  Invalid array definition, expecting Alt and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=title]
     46  Cannot find a definition for the namespace http://www.sap.com
     33  Failed to instanciate property in exif:ColorSpace
     28  Failed to instanciate property in xmpMM:History
     26  xmp should start with a processing instruction
     24  Cannot find a definition for the namespace http://prismstandard.org/namespaces/basic/2.0/
     21  Cannot find a definition for the namespace http://www.npes.org/pdfx/ns/id/
     14  Cannot find a definition for the namespace http://ns.InsiderSoftware.com/fontlist/1.0/
     14  Invalid array definition, expecting Seq and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=creator]
     12  Failed to instanciate property in xmp:MetadataDate
     10  Cannot find a definition for the namespace http://ns.xinet.com/webnative/private/1.0/
     10  Failed to instanciate property in xap:ModifyDate
     10  Failed to instanciate property in xmp:ModifyDate
      9  Type 'params' not defined in http://ns.adobe.com/xap/1.0/sType/ResourceEvent#
      8  Invalid array type, expecting Seq and found Bag [prefix=xmpMM; name=History]
      8  Type 'documentName' not defined in http://ns.adobe.com/xap/1.0/sType/ResourceRef#
      7  Cannot find a definition for the namespace http://www.day.com/dam/1.0
      7  Cannot find a definition for the namespace ptc
      6  Failed to instanciate property in xapMM:History
      5  Invalid array definition, expecting Seq and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=tiff; name=YCbCrPositioning]
      5  Schema is not set in this document : http://purl.org/dc/elements/1.1/
      4  Cannot find a definition for the namespace http://www.extensis.com/meta/FontSense/
      4  Excepted xpacket 'end' attribute (must be present and placed in first)
      3  Invalid array type, expecting Seq and found Bag [prefix=photoshop; name=TextLayers]
      3  Schema is not set in this document : http://ns.adobe.com/xap/1.0/
      2  no message (NPE)
      2  Cannot find a definition for the namespace http://laserfiche.com/xmp/schema/1.0/
      2  Cannot find a definition for the namespace http://ns.adobe.com/AdobeFormsCentralWorkflow/1.0/
      2  Cannot find a definition for the namespace http://ns.adobe.com/camera-raw-settings/1.0/
      2  Failed to instanciate property in xapRights:Marked
      2  Invalid array type, expecting Alt and found Bag [prefix=dc; name=title]
      2  Invalid array type, expecting Alt and found Seq [prefix=dc; name=title]
      2  Invalid array type, expecting Seq and found Alt [prefix=dc; name=creator]
      1  Cannot find a definition for the namespace http://ns.cambridgeassociates.com/status/1.0/
      1  Cannot find a definition for the namespace http://ns.computershare.com.au/ccs/1.0/
      1  Cannot find a definition for the namespace http://ns.esko-graphics.com/grinfo/1.0/
      1  Cannot find a definition for the namespace http://ns.tripletriangle.com/ns/tripletri/
      1  Cannot find a definition for the namespace http://prismstandard.org/namespaces/basic/2.1/
      1  Cannot find a definition for the namespace http://www.aiim.org/pdfa/ns/id.html
      1  Cannot find a definition for the namespace http://www.aiim.org/pdfe/ns/id/
      1  Cannot find a definition for the namespace http://www.enfocus.com/ns/CertifiedPDF/2.0/
      1  Cannot find a definition for the namespace http://www.northplains.com/xmpnps/cov/1.0/
      1  Failed to instanciate property in xmpRights:Marked
      1  Invalid array type, expecting Seq and found Bag [prefix=dc; name=date]
      1  This namespace is not a schema or a structured type : http://ns.adobe.com/xap/1.0/sType/Job#
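
As a rough cross-check of the "roughly 40%" figure: the exception counts in
the table sum to 28,924. Assuming each failing file contributes exactly one
message (i.e. parsing stops at the first error), the failure rate is:

```latex
\frac{28\,924}{42\,022 + 28\,924} = \frac{28\,924}{70\,946} \approx 0.408 \;(\approx 40.8\%)
```

The denominator of 70,946 files also matches the "~70k XMPs" mentioned above.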

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org