[
https://issues.apache.org/jira/browse/PDFBOX-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18047569#comment-18047569
]
Tilman Hausherr edited comment on PDFBOX-6133 at 12/26/25 5:03 AM:
-------------------------------------------------------------------
I was able to get this to work yesterday, based on what I wrote initially
above:
- changed the flow of {{getSpecifiedPropertyType()}} so that a namespace URI
can be from a schema and from a pdfaType:type.
- added another mapping, named {{definedStructuredNamespaces2}} for now, which
maps namespaces to a *list* of pdfaType:type.
- added another method, named {{getDefinedDescriptionByNamespace2}} for now,
which uses the new mapping to get the correct PropertiesDescription based on
the namespace and the property name / pdfaField:name.
- parsing in {{testPropertyNotDefined()}} now succeeds instead of failing
(PDFLib claims that it isn't valid), and I'm able to retrieve the property.
This is weird but I'll accept that.
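The namespace-to-list mapping and the lookup by namespace and property name described above could be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the actual xmpbox code: the class {{NamespaceTypeLookup}}, the nested {{TypeDescription}} and both method names are hypothetical, and a plain list of field names stands in for PropertiesDescription.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NamespaceTypeLookup
{
    /** Hypothetical stand-in for one pdfaType:type and its pdfaField:name entries. */
    public static class TypeDescription
    {
        public final String typeName;
        public final List<String> fieldNames;

        public TypeDescription(String typeName, List<String> fieldNames)
        {
            this.typeName = typeName;
            this.fieldNames = fieldNames;
        }
    }

    // namespace URI -> every type declared for that namespace (a list, not a single value)
    private final Map<String, List<TypeDescription>> typesByNamespace = new HashMap<>();

    /** Registers a type; a second type with the same namespace is appended, not overwritten. */
    public void addType(String namespace, TypeDescription type)
    {
        typesByNamespace.computeIfAbsent(namespace, k -> new ArrayList<>()).add(type);
    }

    /** Returns the first type in the namespace that declares the field, or null if none does. */
    public TypeDescription findByNamespaceAndField(String namespace, String fieldName)
    {
        List<TypeDescription> candidates =
                typesByNamespace.getOrDefault(namespace, Collections.emptyList());
        for (TypeDescription type : candidates)
        {
            if (type.fieldNames.contains(fieldName))
            {
                return type;
            }
        }
        return null;
    }

    public static void main(String[] args)
    {
        String ns = "http://www.epo.org/patent-bibliographic-data/1.0/";
        NamespaceTypeLookup lookup = new NamespaceTypeLookup();
        lookup.addType(ns, new TypeDescription("DocId",
                Arrays.asList("Number", "CorrectionCode", "CountryCode", "KindCode", "Date")));
        lookup.addType(ns, new TypeDescription("Bookmark",
                Arrays.asList("StartPage", "DocumentSection", "NumberOfPages")));

        // Both types share one namespace, so the property name picks the right one.
        System.out.println(lookup.findByNamespaceAndField(ns, "CountryCode").typeName);
        System.out.println(lookup.findByNamespaceAndField(ns, "StartPage").typeName);
    }
}
```

With the two types from the issue description registered under the same namespace URI, {{CountryCode}} resolves to DocId and {{StartPage}} to Bookmark.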
TODOs:
- make mass tests
- investigate whether it would be more useful to have
{{definedStructuredNamespaces}} with different content, e.g. namespace to
PropertiesDescription, i.e. have ONE PropertiesDescription even though there
are TWO types? Maybe not, because what if they'd have different namespaces in
the future?!
- investigate whether {{getDefinedDescriptionByNamespace}} could just be
replaced by the content of the new method, after looking at the other usages
- maybe add some javadocs in the hope of clarifying this a bit
- check that preflight tests in PDFBox 3 still work
- create another test now that testPropertyNotDefined() parses
> XmpParsingException: Property 'CountryCode' not defined in
> http://www.epo.org/patent-bibliographic-data/1.0/
> ------------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-6133
> URL: https://issues.apache.org/jira/browse/PDFBOX-6133
> Project: PDFBox
> Issue Type: Bug
> Components: XmpBox
> Reporter: Tilman Hausherr
> Priority: Major
> Attachments: PDFBOX-6131-0064638.pdf, PDFBOX-6131-0064638.xml
>
>
> The attached file (which I had thought to be a test case for PDFBOX-6131)
> validates with PDFLib and VeraPDF but fails in xmpbox. I suspect that this is
> because the URI is used several times.
> Some debug output in checkPropertyDefinition() for later:
> {noformat}
> t1 prop (prefix, local, ns) : patent, CountryCode,
> http://www.epo.org/patent-bibliographic-data/1.0/
> t1 isDefinedTypeNamespace : true
> t1 isDefinedSchema : true
> t1 isDefinedNamespace : true
> t1 containsNamespace : true
> {noformat}
> In {{getSpecifiedPropertyType()}}, {{getSchemaFactory()}} returns something,
> but {{factory.getPropertyType(name.getLocalPart())}} doesn't.
> Changing that doesn't make it better, because definedStructuredNamespaces has
> only 1 entry with the prefix "Bookmark", but there are more and the type is
> nested.
> {{tm.addToDefinedStructuredTypes()}} in {{populatePDFAType()}} is called
> twice, so the second overwrites the first:
> {noformat}
> ttype: DocId, tns: http://www.epo.org/patent-bibliographic-data/1.0/, pm: [Number, CorrectionCode, CountryCode, KindCode, Date]
> ttype: Bookmark, tns: http://www.epo.org/patent-bibliographic-data/1.0/, pm: [StartPage, DocumentSection, NumberOfPages]
> {noformat}
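
The overwrite described in the issue can be reproduced in isolation with a plain map keyed by namespace URI. This is only an illustration: the class name {{OverwriteDemo}} is hypothetical, and the key and value types of the real {{definedStructuredNamespaces}} map are assumed, not taken from the xmpbox source.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OverwriteDemo
{
    static final String NS = "http://www.epo.org/patent-bibliographic-data/1.0/";

    /** Registers two types under the same namespace key, as in the debug output. */
    public static Map<String, List<String>> register()
    {
        // Assumed shape: one value per namespace URI.
        Map<String, List<String>> byNamespace = new HashMap<>();
        // First call: the DocId type.
        byNamespace.put(NS,
                Arrays.asList("Number", "CorrectionCode", "CountryCode", "KindCode", "Date"));
        // Second call: the Bookmark type, same key, so it replaces the DocId entry.
        byNamespace.put(NS,
                Arrays.asList("StartPage", "DocumentSection", "NumberOfPages"));
        return byNamespace;
    }

    public static void main(String[] args)
    {
        Map<String, List<String>> byNamespace = register();
        System.out.println(byNamespace.size());                         // prints 1
        System.out.println(byNamespace.get(NS).contains("CountryCode")); // prints false
    }
}
```

Only the Bookmark fields survive, which matches the single "Bookmark" entry seen in the debugger and explains why {{CountryCode}} cannot be found.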