Dear list members,
I need to create PDF/A-3B compliant documents with special references to an
embedded document (ZUGFeRD compliant invoices). Currently, validation with
veraPDF fails with some rather vague and misleading errors:
* "Specification: ISO 19005-3:2012, Clause: 6.6.2.1, Test number: 5
All metadata streams present in the PDF shall conform to the XMP Specification.
The XMP package must be encoded as UTF-8"
* "Specification: ISO 19005-3:2012, Clause: 6.6.4, Test number: 1
The PDF/A version and conformance level of a file shall be specified using the
PDF/A Identification extension schema"
* "Specification: ISO 19005-3:2012, Clause: 6.6.2.1, Test number: 4
All metadata streams present in the PDF shall conform to the XMP Specification.
All content of all XMP packets shall be well-formed, as defined by Extensible
Markup Language (XML) 1.0 (Third Edition), 2.1, and the RDF/XML Syntax
Specification (Revised)"
I finally tracked this down to FOP, which drops some required namespace
declarations from the XMP stream it creates. This can be reproduced with the
corresponding example from the FOP homepage
(https://xmlgraphics.apache.org/fop/2.10/metadata.html).
For example, here is the FO input:
<fo:simple-page-master master-name="simple">
<fo:region-body/>
<pdf:page page-numbers="*">
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:abc="http://www.abc.de/abc/">
<rdf:Description rdf:about="" abc:def="val"/>
<rdf:Description rdf:about=""
xmlns:pdfaExtension="http://www.aiim.org/pdfa/ns/extension/"
xmlns:pdfaSchema="http://www.aiim.org/pdfa/ns/schema#"
xmlns:pdfaProperty="http://www.aiim.org/pdfa/ns/property#">
<pdfaExtension:schemas>
<rdf:Bag>
<rdf:li rdf:parseType="Resource">
<pdfaSchema:property>
<rdf:Seq>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:name>split</pdfaProperty:name>
</rdf:li>
</rdf:Seq>
</pdfaSchema:property>
</rdf:li>
</rdf:Bag>
</pdfaExtension:schemas>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
</pdf:page>
</fo:simple-page-master>
And this is returned when I run pdfinfo -meta example.pdf:
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?><x:xmpmeta
xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:RDF xmlns:abc="http://www.abc.de/abc/" abc:def="val" rdf:about=""/>
<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="">
<dc:format>application/pdf</dc:format>
<dc:language>
<rdf:Bag>
<rdf:li>x-unknown</rdf:li>
</rdf:Bag>
</dc:language>
<dc:date>
<rdf:Seq>
<rdf:li>2024-11-05T11:46:27+01:00</rdf:li>
</rdf:Seq>
</dc:date>
</rdf:RDF>
<rdf:RDF xmlns:pdf="http://ns.adobe.com/pdf/1.3/" rdf:about="">
<pdf:Producer>Apache FOP Version 2.10</pdf:Producer>
<pdf:PDFVersion>1.4</pdf:PDFVersion>
</rdf:RDF>
<rdf:RDF xmlns:pdfaExtension="http://www.aiim.org/pdfa/ns/extension/"
rdf:about="">
<pdfaExtension:schemas>
<rdf:Bag>
<rdf:li rdf:parseType="Resource">
<pdfaSchema:property>
<rdf:Seq>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:name>split</pdfaProperty:name>
</rdf:li>
</rdf:Seq>
</pdfaSchema:property>
</rdf:li>
</rdf:Bag>
</pdfaExtension:schemas>
</rdf:RDF>
<rdf:RDF xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" rdf:about="">
<pdfaid:conformance>B</pdfaid:conformance>
<pdfaid:part>3</pdfaid:part>
</rdf:RDF>
<rdf:RDF xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="">
<xmp:MetadataDate>2024-11-05T11:46:27+01:00</xmp:MetadataDate>
<xmp:CreateDate>2024-11-05T11:46:27+01:00</xmp:CreateDate>
</rdf:RDF>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="r"?>
As you can see, the two namespaces
xmlns:pdfaSchema="http://www.aiim.org/pdfa/ns/schema#" and
xmlns:pdfaProperty="http://www.aiim.org/pdfa/ns/property#"
are dropped from the metadata, while the prefixes pdfaSchema: and pdfaProperty:
are still used. Thus, the XML is no longer well-formed with respect to
namespaces, and so the PDF/A is invalid.
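To confirm the problem independently of veraPDF, the extracted XMP packet can
be fed to any namespace-aware XML parser. Below is a minimal sketch in Python,
assuming only the standard library; the helper name xmp_namespace_error and the
BROKEN/FIXED samples are made up for illustration (the sample is a reduced copy
of the pdfinfo output above), and this is not part of FOP or veraPDF.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper: returns None when the packet parses with namespace
# processing enabled, otherwise the parser's error message as a string.
def xmp_namespace_error(xmp_text):
    try:
        ET.fromstring(xmp_text)
    except ET.ParseError as exc:
        return str(exc)
    return None

EXT_DECL = 'xmlns:pdfaExtension="http://www.aiim.org/pdfa/ns/extension/"'

# Reduced copy of the output above: the pdfaSchema and pdfaProperty prefixes
# are used, but their xmlns declarations are missing.
BROKEN = f"""<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:RDF {EXT_DECL} rdf:about="">
<pdfaExtension:schemas><rdf:Bag><rdf:li rdf:parseType="Resource">
<pdfaSchema:property><rdf:Seq><rdf:li rdf:parseType="Resource">
<pdfaProperty:name>split</pdfaProperty:name>
</rdf:li></rdf:Seq></pdfaSchema:property>
</rdf:li></rdf:Bag></pdfaExtension:schemas>
</rdf:RDF>
</rdf:RDF>
</x:xmpmeta>"""

# Same packet with the two dropped declarations put back next to the
# pdfaExtension declaration, as in the FO input.
FIXED = BROKEN.replace(
    EXT_DECL,
    EXT_DECL
    + ' xmlns:pdfaSchema="http://www.aiim.org/pdfa/ns/schema#"'
    + ' xmlns:pdfaProperty="http://www.aiim.org/pdfa/ns/property#"',
)

if __name__ == "__main__":
    print("broken:", xmp_namespace_error(BROKEN))
    print("fixed: ", xmp_namespace_error(FIXED))
```

For the broken packet the parser reports an "unbound prefix" error, while the
packet with the two declarations restored parses cleanly. This points at the
missing declarations rather than at the encoding, despite what the first
veraPDF message suggests.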
This is pretty unfortunate, as I don't have any workaround for this. I am using
Apache FOP 2.10 in the context of Apache Camel 4.8.1. Any help would be greatly
appreciated.
With kind regards,
Jörn Willhöft