[ 
https://issues.apache.org/jira/browse/PDFBOX-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18047569#comment-18047569
 ] 

Tilman Hausherr edited comment on PDFBOX-6133 at 12/26/25 5:02 AM:
-------------------------------------------------------------------

I was able to get this to work yesterday based on what I wrote initially 
above:
- changing the flow of {{getSpecifiedPropertyType()}} so that a namespace URI 
can come from both a schema and a pdfaType:type.
- adding another mapping, named {{definedStructuredNamespaces2}} for now, 
which maps each namespace to a *list* of pdfaType:type.
- adding another method, {{getDefinedDescriptionByNamespace2}} for now, which 
uses the new mapping to get the correct PropertiesDescription based on the 
namespace and the property name / pdfaField:name.
- parsing in {{testPropertyNotDefined()}} now succeeds instead of failing, and 
I'm able to retrieve the property, although PDFLib claims that the file isn't 
valid. This is weird but I'll accept that.
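
The second and third points can be sketched roughly as follows. This is only an illustration under assumptions, not the xmpbox code: {{StructuredTypeIndex}}, {{TypeEntry}}, {{add()}} and {{find()}} are hypothetical names, and a plain set of field names stands in for PropertiesDescription.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Illustrative only: these classes are hypothetical and not part of xmpbox.
class StructuredTypeIndex {

    // One pdfaType:type together with the pdfaField:name values it declares.
    static final class TypeEntry {
        final String typeName;
        final Set<String> fieldNames;

        TypeEntry(String typeName, String... fieldNames) {
            this.typeName = typeName;
            this.fieldNames = new LinkedHashSet<>(Arrays.asList(fieldNames));
        }
    }

    // Namespace URI -> every type registered under it (a list, not a single value).
    private final Map<String, List<TypeEntry>> typesByNamespace = new HashMap<>();

    // Appends the type to the namespace's list instead of replacing an earlier one.
    void add(String namespaceUri, TypeEntry entry) {
        typesByNamespace.computeIfAbsent(namespaceUri, k -> new ArrayList<>()).add(entry);
    }

    // Returns the first type in the namespace that declares the given field name.
    Optional<TypeEntry> find(String namespaceUri, String fieldName) {
        return typesByNamespace.getOrDefault(namespaceUri, Collections.emptyList())
                .stream()
                .filter(entry -> entry.fieldNames.contains(fieldName))
                .findFirst();
    }

    public static void main(String[] args) {
        String ns = "http://www.epo.org/patent-bibliographic-data/1.0/";
        StructuredTypeIndex index = new StructuredTypeIndex();
        index.add(ns, new TypeEntry("DocId",
                "Number", "CorrectionCode", "CountryCode", "KindCode", "Date"));
        index.add(ns, new TypeEntry("Bookmark",
                "StartPage", "DocumentSection", "NumberOfPages"));

        // Both types share one namespace; the field name decides which one matches.
        System.out.println(index.find(ns, "CountryCode").map(e -> e.typeName).orElse("<none>"));
        System.out.println(index.find(ns, "StartPage").map(e -> e.typeName).orElse("<none>"));
    }
}
```

With the two types from the issue registered under the same namespace, the lookup is disambiguated by the field name alone, which is why the sketch needs a list per namespace.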

TODOs:
- make mass tests
- investigate whether it would be more useful to have 
{{definedStructuredNamespaces}} with different content, e.g. namespace to 
PropertiesDescription, i.e. have ONE PropertiesDescription even though there 
are TWO types? Maybe not, because what if they had different namespaces in 
the future?!
- investigate whether {{getDefinedDescriptionByNamespace}} could just be 
replaced by the content of the new method, after looking at the other usages
- maybe add some javadocs in the hope of clarifying this a bit
- check that preflight tests in PDFBox 3 still work
- create another test now that testPropertyNotDefined() parses


> XmpParsingException: Property 'CountryCode' not defined in 
> http://www.epo.org/patent-bibliographic-data/1.0/
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-6133
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6133
>             Project: PDFBox
>          Issue Type: Bug
>          Components: XmpBox
>            Reporter: Tilman Hausherr
>            Priority: Major
>         Attachments: PDFBOX-6131-0064638.pdf, PDFBOX-6131-0064638.xml
>
>
> The attached file (which I had thought to be a test case for PDFBOX-6131) 
> validates with PDFLib and VeraPDF but fails in xmpbox. I suspect that this is 
> because the URI is used several times.
> Some debug output in checkPropertyDefinition() for later:
> {noformat}
> t1 prop (prefix, local, ns) : patent, CountryCode, 
> http://www.epo.org/patent-bibliographic-data/1.0/
> t1 isDefinedTypeNamespace   : true
> t1 isDefinedSchema          : true
> t1 isDefinedNamespace       : true
> t1 containsNamespace        : true
> {noformat}
> In {{getSpecifiedPropertyType()}}, {{getSchemaFactory()}} returns something, 
> but {{factory.getPropertyType(name.getLocalPart())}} doesn't.
> Changing that doesn't make it better, because definedStructuredNamespaces has 
> only 1 entry with the prefix "Bookmark", but there are more and the type is 
> nested.
> {{tm.addToDefinedStructuredTypes()}} in {{populatePDFAType()}} is called 
> twice, so the second overwrites the first:
> ttype: DocId, tns: http://www.epo.org/patent-bibliographic-data/1.0/, pm: 
> [Number, CorrectionCode, CountryCode, KindCode, Date]
> ttype: Bookmark, tns: http://www.epo.org/patent-bibliographic-data/1.0/, pm: 
> [StartPage, DocumentSection, NumberOfPages]
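
The overwrite described at the end of the issue text can be reproduced in isolation. The sketch below assumes that the structured types are stored in a map keyed by namespace URI with one value per key; {{SingleValueMappingDemo}} and {{registerAll()}} are hypothetical names, not xmpbox code.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a plain map keyed by namespace URI, not the xmpbox code.
class SingleValueMappingDemo {

    // Registers each {namespace, type} pair in a map that holds one type per namespace.
    static Map<String, String> registerAll(String[][] namespaceAndType) {
        Map<String, String> typeByNamespace = new HashMap<>();
        for (String[] pair : namespaceAndType) {
            // put() replaces any value already stored under the same key.
            typeByNamespace.put(pair[0], pair[1]);
        }
        return typeByNamespace;
    }

    public static void main(String[] args) {
        String ns = "http://www.epo.org/patent-bibliographic-data/1.0/";
        Map<String, String> result = registerAll(new String[][] {
            {ns, "DocId"},
            {ns, "Bookmark"}
        });
        // Only the type registered last survives for the shared namespace.
        System.out.println(result.size() + " entry: " + result.get(ns));
    }
}
```

After both registrations the map holds a single entry for the shared namespace, so a property that belongs to the first type, such as CountryCode in DocId, can no longer be resolved through it.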



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
