[ 
https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854315#comment-17854315
 ] 

Maruan Sahyoun commented on PDFBOX-5835:
----------------------------------------

I think the main problem is the name: DomXmpParser suggests a parser for all 
valid XMP, but the lib really targets PDF/A-1 XMP, with a specific definition 
of what valid PDF/A-1 XMP should look like. PDF/A-1 XMP has since seen some 
revisions/clarifications, and I don't know whether the lib has been updated to 
handle them.
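
For illustration, here is a minimal sketch of the failure mode using only JDK 
classes, not XmpBox code. The reported XMP declares its extension schema 
elements with a default namespace (xmlns="...") instead of a prefix. A 
namespace-aware DOM parser reports a null prefix for such elements, and the 
three-argument QName constructor rejects null, which is consistent with the 
stack trace in the report. The class name NullPrefixDemo and the helper 
toQName are hypothetical; the guard is one possible approach, not necessarily 
the fix that belongs in DomHelper.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class NullPrefixDemo
{
    // Parses an XML snippet with a namespace-aware DOM parser and returns the root element.
    public static Element parseRoot(String xml)
    {
        try
        {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement();
        }
        catch (Exception e)
        {
            throw new IllegalStateException("could not parse snippet", e);
        }
    }

    // Hypothetical guard (an assumption, not XmpBox code): map a missing DOM
    // prefix to the empty default prefix before building the QName.
    public static QName toQName(Element element)
    {
        String prefix = element.getPrefix();
        if (prefix == null)
        {
            prefix = XMLConstants.DEFAULT_NS_PREFIX;
        }
        return new QName(element.getNamespaceURI(), element.getLocalName(), prefix);
    }

    public static void main(String[] args)
    {
        // Same shape as the <schemas> element in the reported XMP: default namespace, no prefix.
        Element schemas = parseRoot("<schemas xmlns=\"http://www.aiim.org/pdfa/ns/extension/\"/>");
        System.out.println("DOM prefix: " + schemas.getPrefix());
        try
        {
            // Passing the DOM prefix straight through fails when it is null.
            new QName(schemas.getNamespaceURI(), schemas.getLocalName(), schemas.getPrefix());
        }
        catch (IllegalArgumentException e)
        {
            System.out.println("QName rejected the prefix: " + e.getMessage());
        }
        System.out.println("Guarded QName: " + toQName(schemas));
    }
}
```

Whether XmpBox should accept such unprefixed elements at all is a separate 
question from avoiding the unchecked exception; the sketch only shows where 
the null comes from.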

> DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
> --------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-5835
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5835
>             Project: PDFBox
>          Issue Type: Bug
>          Components: XmpBox
>    Affects Versions: 3.0.2 PDFBox
>            Reporter: Oliver Schmidtmer
>            Priority: Major
>
> I've got a PDF where parsing the metadata fails with an 
> IllegalArgumentException:
> {code:java}
> java.lang.IllegalArgumentException: prefix cannot be "null" when creating a QName
>       at java.xml/javax.xml.namespace.QName.<init>(QName.java:192)
>       at org.apache.xmpbox.xml.DomHelper.getQName(DomHelper.java:99)
>       at org.apache.xmpbox.xml.DomXmpParser.parseChildrenAsProperties(DomXmpParser.java:306)
>       at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:250)
>       at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201)
>       at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:112)
> {code}
> This can be reproduced with a simple test, using the extracted metadata:
> {code:java}
>     @Test
>     void testDomXmpParser() throws XmpParsingException
>     {
>         // taken from file test-landscape2.pdf
>         String xmpmeta = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n" +
>                 "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?><x:xmpmeta xmlns:x=\"adobe:ns:meta/\" x:xmptk=\"FIS/xee\">\n" +
>                 " <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n" +
>                 " <rdf:Description xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\">\n" +
>                 "   <pdfaid:part>3</pdfaid:part>\n" +
>                 "   <pdfaid:conformance>A</pdfaid:conformance>\n" +
>                 "  </rdf:Description>\n" +
>                 "  <rdf:Description xmlns:pdfaExtension=\"http://www.aiim.org/pdfa/ns/extension/\" xmlns:pdfaField=\"http://www.aiim.org/pdfa/ns/field#\" xmlns:pdfaProperty=\"http://www.aiim.org/pdfa/ns/property#\" xmlns:pdfaSchema=\"http://www.aiim.org/pdfa/ns/schema#\" xmlns:pdfaType=\"http://www.aiim.org/pdfa/ns/type#\" rdf:about=\"\"/>\n" +
>                 "  <rdf:Description>\n" +
>                 "   <schemas xmlns=\"http://www.aiim.org/pdfa/ns/extension/\">\n" +
>                 "    <rdf:Bag>\n" +
>                 "     <rdf:li rdf:parseType=\"Resource\">\n" +
>                 "      <schema xmlns=\"http://www.aiim.org/pdfa/ns/schema#\">ZUGFeRD PDFA Extension Schema</schema>\n" +
>                 "      <namespaceURI xmlns=\"http://www.aiim.org/pdfa/ns/schema#\">urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#</namespaceURI>\n" +
>                 "      <prefix xmlns=\"http://www.aiim.org/pdfa/ns/schema#\">zf</prefix>\n" +
>                 "      <property xmlns=\"http://www.aiim.org/pdfa/ns/schema#\">\n" +
>                 "       <rdf:Seq>\n" +
>                 "        <rdf:li rdf:parseType=\"Resource\">\n" +
>                 "         <name xmlns=\"http://www.aiim.org/pdfa/ns/property#\">DocumentFileName</name>\n" +
>                 "         <valueType xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Text</valueType>\n" +
>                 "         <category xmlns=\"http://www.aiim.org/pdfa/ns/property#\">external</category>\n" +
>                 "         <description xmlns=\"http://www.aiim.org/pdfa/ns/property#\">name of the embedded XML invoice file</description>\n" +
>                 "        </rdf:li>\n" +
>                 "        <rdf:li rdf:parseType=\"Resource\">\n" +
>                 "         <name xmlns=\"http://www.aiim.org/pdfa/ns/property#\">DocumentType</name>\n" +
>                 "         <valueType xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Text</valueType>\n" +
>                 "         <category xmlns=\"http://www.aiim.org/pdfa/ns/property#\">external</category>\n" +
>                 "         <description xmlns=\"http://www.aiim.org/pdfa/ns/property#\">INVOICE</description>\n" +
>                 "        </rdf:li>\n" +
>                 "        <rdf:li rdf:parseType=\"Resource\">\n" +
>                 "         <name xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Version</name>\n" +
>                 "         <valueType xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Text</valueType>\n" +
>                 "         <category xmlns=\"http://www.aiim.org/pdfa/ns/property#\">external</category>\n" +
>                 "         <description xmlns=\"http://www.aiim.org/pdfa/ns/property#\">The actual version of the ZUGFeRD data</description>\n" +
>                 "        </rdf:li>\n" +
>                 "        <rdf:li rdf:parseType=\"Resource\">\n" +
>                 "         <name xmlns=\"http://www.aiim.org/pdfa/ns/property#\">ConformanceLevel</name>\n" +
>                 "         <valueType xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Text</valueType>\n" +
>                 "         <category xmlns=\"http://www.aiim.org/pdfa/ns/property#\">external</category>\n" +
>                 "         <description xmlns=\"http://www.aiim.org/pdfa/ns/property#\">The conformance level of the ZUGFeRD data</description>\n" +
>                 "        </rdf:li>\n" +
>                 "       </rdf:Seq>\n" +
>                 "      </property>\n" +
>                 "     </rdf:li>\n" +
>                 "    </rdf:Bag>\n" +
>                 "   </schemas>\n" +
>                 "  </rdf:Description>\n" +
>                 "  <rdf:Description xmlns:zf=\"urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#\" rdf:about=\"\" zf:ConformanceLevel=\"EXTENDED\" zf:DocumentFileName=\"ZUGFeRD-invoice.xml\" zf:DocumentType=\"INVOICE\" zf:Version=\"1.0\"/>\n" +
>                 " </rdf:RDF>\n" +
>                 "</x:xmpmeta><?xpacket end=\"w\"?>\n";
>         DomXmpParser xmpParser = new DomXmpParser();
>         xmpParser.setStrictParsing(false);
>         XMPMetadata xmp = xmpParser.parse(xmpmeta.getBytes());
>     }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
