[ https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853759#comment-17853759 ]
Tilman Hausherr commented on PDFBOX-5835:
-----------------------------------------

I can avoid the IllegalArgumentException, but now you'll get XmpParsingException: Schema is not set in this document : http://www.aiim.org/pdfa/ns/extension/ which is a 9-year-old unfixed bug (PDFBOX-2913). Would this be helpful?

> DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
> --------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-5835
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5835
>             Project: PDFBox
>          Issue Type: Bug
>          Components: XmpBox
>    Affects Versions: 3.0.2 PDFBox
>            Reporter: Oliver Schmidtmer
>            Priority: Major
>
> I've got a PDF where parsing the metadata fails with an IllegalArgumentException:
> {code:java}
> java.lang.IllegalArgumentException: prefix cannot be "null" when creating a QName
>     at java.xml/javax.xml.namespace.QName.<init>(QName.java:192)
>     at org.apache.xmpbox.xml.DomHelper.getQName(DomHelper.java:99)
>     at org.apache.xmpbox.xml.DomXmpParser.parseChildrenAsProperties(DomXmpParser.java:306)
>     at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:250)
>     at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201)
>     at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:112)
> {code}
> This can be reproduced with a simple test using the extracted metadata:
> {code:java}
> import java.nio.charset.StandardCharsets;
>
> import org.apache.xmpbox.XMPMetadata;
> import org.apache.xmpbox.xml.DomXmpParser;
> import org.apache.xmpbox.xml.XmpParsingException;
> import org.junit.jupiter.api.Test;
>
> @Test
> void testDomXmpParser() throws XmpParsingException
> {
>     // taken from file test-landscape2.pdf
>     String xmpmeta = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n" +
>             "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?><x:xmpmeta xmlns:x=\"adobe:ns:meta/\" x:xmptk=\"FIS/xee\">\n" +
>             " <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n" +
>             " <rdf:Description xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\">\n" +
>             " <pdfaid:part>3</pdfaid:part>\n" +
>             " <pdfaid:conformance>A</pdfaid:conformance>\n" +
>             " </rdf:Description>\n" +
>             " <rdf:Description xmlns:pdfaExtension=\"http://www.aiim.org/pdfa/ns/extension/\" xmlns:pdfaField=\"http://www.aiim.org/pdfa/ns/field#\" xmlns:pdfaProperty=\"http://www.aiim.org/pdfa/ns/property#\" xmlns:pdfaSchema=\"http://www.aiim.org/pdfa/ns/schema#\" xmlns:pdfaType=\"http://www.aiim.org/pdfa/ns/type#\" rdf:about=\"\"/>\n" +
>             " <rdf:Description>\n" +
>             " <schemas xmlns=\"http://www.aiim.org/pdfa/ns/extension/\">\n" +
>             " <rdf:Bag>\n" +
>             " <rdf:li rdf:parseType=\"Resource\">\n" +
>             " <schema xmlns=\"http://www.aiim.org/pdfa/ns/schema#\">ZUGFeRD PDFA Extension Schema</schema>\n" +
>             " <namespaceURI xmlns=\"http://www.aiim.org/pdfa/ns/schema#\">urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#</namespaceURI>\n" +
>             " <prefix xmlns=\"http://www.aiim.org/pdfa/ns/schema#\">zf</prefix>\n" +
>             " <property xmlns=\"http://www.aiim.org/pdfa/ns/schema#\">\n" +
>             " <rdf:Seq>\n" +
>             " <rdf:li rdf:parseType=\"Resource\">\n" +
>             " <name xmlns=\"http://www.aiim.org/pdfa/ns/property#\">DocumentFileName</name>\n" +
>             " <valueType xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Text</valueType>\n" +
>             " <category xmlns=\"http://www.aiim.org/pdfa/ns/property#\">external</category>\n" +
>             " <description xmlns=\"http://www.aiim.org/pdfa/ns/property#\">name of the embedded XML invoice file</description>\n" +
>             " </rdf:li>\n" +
>             " <rdf:li rdf:parseType=\"Resource\">\n" +
>             " <name xmlns=\"http://www.aiim.org/pdfa/ns/property#\">DocumentType</name>\n" +
>             " <valueType xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Text</valueType>\n" +
>             " <category xmlns=\"http://www.aiim.org/pdfa/ns/property#\">external</category>\n" +
>             " <description xmlns=\"http://www.aiim.org/pdfa/ns/property#\">INVOICE</description>\n" +
>             " </rdf:li>\n" +
>             " <rdf:li rdf:parseType=\"Resource\">\n" +
>             " <name xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Version</name>\n" +
>             " <valueType xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Text</valueType>\n" +
>             " <category xmlns=\"http://www.aiim.org/pdfa/ns/property#\">external</category>\n" +
>             " <description xmlns=\"http://www.aiim.org/pdfa/ns/property#\">The actual version of the ZUGFeRD data</description>\n" +
>             " </rdf:li>\n" +
>             " <rdf:li rdf:parseType=\"Resource\">\n" +
>             " <name xmlns=\"http://www.aiim.org/pdfa/ns/property#\">ConformanceLevel</name>\n" +
>             " <valueType xmlns=\"http://www.aiim.org/pdfa/ns/property#\">Text</valueType>\n" +
>             " <category xmlns=\"http://www.aiim.org/pdfa/ns/property#\">external</category>\n" +
>             " <description xmlns=\"http://www.aiim.org/pdfa/ns/property#\">The conformance level of the ZUGFeRD data</description>\n" +
>             " </rdf:li>\n" +
>             " </rdf:Seq>\n" +
>             " </property>\n" +
>             " </rdf:li>\n" +
>             " </rdf:Bag>\n" +
>             " </schemas>\n" +
>             " </rdf:Description>\n" +
>             " <rdf:Description xmlns:zf=\"urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#\" rdf:about=\"\" zf:ConformanceLevel=\"EXTENDED\" zf:DocumentFileName=\"ZUGFeRD-invoice.xml\" zf:DocumentType=\"INVOICE\" zf:Version=\"1.0\"/>\n" +
>             " </rdf:RDF>\n" +
>             "</x:xmpmeta><?xpacket end=\"w\"?>\n";
>     DomXmpParser xmpParser = new DomXmpParser();
>     xmpParser.setStrictParsing(false);
>     // Encode explicitly as UTF-8 to match the XML declaration and the \uFEFF BOM character
>     XMPMetadata xmp = xmpParser.parse(xmpmeta.getBytes(StandardCharsets.UTF_8));
> }
> {code}

-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)