[jira] [Commented] (PDFBOX-5976) DomXmpParser incorrectly expects namespaces on attribute level

Tilman Hausherr (Jira) Wed, 31 Dec 2025 09:02:16 -0800


    [ 
https://issues.apache.org/jira/browse/PDFBOX-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048589#comment-18048589
 ]


Tilman Hausherr commented on PDFBOX-5976:
-----------------------------------------

I've reverted the change made here (but kept the test) because it is no 
longer needed. The real cause of the problem, namely that the namespace 
attributes in rdf:RDF were ignored, is fixed in PDFBOX-6099. The change here 
wasn't wrong, but it wasn't optimal. My understanding of the code is better 
now. Many changes have been made in the last few weeks, and the parser has 
very high test coverage. Make sure that you have plenty of tests on your side 
and run them regularly before releases.
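
To illustrate the scoping rule involved, here is a minimal sketch that uses 
only the JDK's DOM API. It is not XmpBox code; the class name 
NamespaceScopeDemo and its helper methods are made up for this example. It 
parses a shortened form of the failing metadata and shows that a prefix 
declared on rdf:RDF is in scope for rdf:Description, although the xmlns 
attribute itself exists only on the root element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; plain JDK DOM, no PDFBox/XmpBox involved.
public class NamespaceScopeDemo {
    public static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    public static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";

    // Shortened form of the metadata from the report:
    // all prefixes are declared on rdf:RDF, none on rdf:Description.
    public static final String XMP =
        "<rdf:RDF xmlns:pdf=\"http://ns.adobe.com/pdf/1.3/\""
        + " xmlns:pdfaid=\"" + PDFAID_NS + "\""
        + " xmlns:rdf=\"" + RDF_NS + "\">"
        + "<rdf:Description rdf:about=\"\" pdfaid:part=\"3\" pdfaid:conformance=\"B\"/>"
        + "</rdf:RDF>";

    // Parses the XML namespace-aware and returns the first rdf:Description.
    static Element firstDescription(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return (Element) doc.getElementsByTagNameNS(RDF_NS, "Description").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XMP", e);
        }
    }

    // Resolves a prefix as seen from rdf:Description (walks up to ancestors).
    public static String resolvePrefix(String xml, String prefix) {
        return firstDescription(xml).lookupNamespaceURI(prefix);
    }

    // True only if the xmlns:prefix attribute sits on rdf:Description itself.
    public static boolean declaredOnDescription(String xml, String prefix) {
        return firstDescription(xml).hasAttribute("xmlns:" + prefix);
    }

    // Reads pdfaid:part through its namespace URI, independent of the prefix.
    public static String pdfaPart(String xml) {
        return firstDescription(xml).getAttributeNS(PDFAID_NS, "part");
    }

    public static void main(String[] args) {
        System.out.println("pdfaid resolves to: " + resolvePrefix(XMP, "pdfaid"));
        System.out.println("declared on rdf:Description: "
            + declaredOnDescription(XMP, "pdfaid"));
        System.out.println("pdfaid:part = " + pdfaPart(XMP));
    }
}
```

A parser that collects namespace declarations only from the attributes of 
rdf:Description would not see pdfaid in this input, while a lookup that also 
considers the ancestors resolves it.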

> DomXmpParser incorrectly expects namespaces on attribute level
> --------------------------------------------------------------
>
>                 Key: PDFBOX-5976
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5976
>             Project: PDFBox
>          Issue Type: Bug
>          Components: XmpBox
>    Affects Versions: 2.0.33, 3.0.4 PDFBox
>            Reporter: Jochen Stärk
>            Assignee: Tilman Hausherr
>            Priority: Major
>              Labels: xml
>             Fix For: 2.0.34, 3.0.5 PDFBox, 4.0.0
>
>         Attachments: AN-10005_v28_2025-03-19-2.pdf, 
> AN-10005_v28_2025-03-19x-1.pdf
>
>
> When trying to determine the PDF/A version like this:
> {{PDDocument document = null;}}
> {{try {}}
> {{    document = Loader.loadPDF(new File("AN-10005_v28_2025-03-19.pdf"));}}
> {{    PDDocumentCatalog catalog = document.getDocumentCatalog();}}
> {{    PDMetadata metadata = catalog.getMetadata();}}
> {{    DomXmpParser xmpParser = new DomXmpParser();}}
> {{    XMPMetadata xmp = xmpParser.parse(metadata.createInputStream());}}
> {{    PDFAIdentificationSchema pdfaSchema = xmp.getPDFAIdentificationSchema();}}
> {{    if (pdfaSchema != null) {}}
> {{        System.out.println("It's a PDF A-" + pdfaSchema.getPart());}}
> {{    }}}
> {{    document.close();}}
> {{} catch (XmpParsingException e) {}}
> {{    e.printStackTrace();}}
> {{} catch (IOException e) {}}
> {{    e.printStackTrace();}}
> {{}}}
> on the attached (and valid) PDF A-3b AN-10005_v28_2025-03-19-2.pdf, PDFBox
> incorrectly fails with a 
>  
> {{org.apache.xmpbox.xml.XmpParsingException: Schema is not set in this 
> document : http://www.aiim.org/pdfa/ns/id/}}
> {{    at 
> org.apache.xmpbox.xml.DomXmpParser.checkPropertyDefinition(DomXmpParser.java:920)}}
> {{    at 
> org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRootAttr(DomXmpParser.java:276)}}
> {{    at 
> org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:247)}}
> {{    at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201)}}
> {{    at de.usegroup.Main.main(Main.java:25)}}
>  
> After manipulating the metadata stream with itext RuPS from 
> {{<rdf:RDF xmlns:pdf="http://ns.adobe.com/pdf/1.3/" 
> xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" 
> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><rdf:Description 
> rdf:about="" pdfaid:part="3" pdfaid:conformance="B" /><rdf:Description 
> rdf:about="" pdf:Producer="WeasyPrint 64.1" /></rdf:RDF>}}
> to
> {{  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">}}
> {{    <rdf:Description rdf:about=""}}
> {{        xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"}}
> {{        xmlns:pdf="http://ns.adobe.com/pdf/1.3/"}}
> {{        xmlns:xmp="http://ns.adobe.com/xap/1.0/"}}
> {{      pdfaid:conformance="B"}}
> {{      pdfaid:part="3"}}
> {{      pdf:Producer="WeasyPrint 64.1; modified using iText® Core 7.2.5 
> (AGPL version) ©2000-2023 iText Group NV"}}
> {{      xmp:ModifyDate="2025-03-21T08:16:58+01:00"/>}}
> {{  </rdf:RDF>}}
> putting the namespace definition in the rdf:Description 
> (AN-10005_v28_2025-03-19x-1.pdf) it works. 
> The issue is: it should be sufficient to put the namespace definitions in the 
> root element, "RDF", i.e. the first example should also work.
>  
> When searching for similar issues I had the impression this may be similar to 
> PDFBOX-2913.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
