[
https://issues.apache.org/jira/browse/TIKA-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895528#comment-13895528
]
Thomas Ledoux commented on TIKA-1232:
-------------------------------------
Regarding XMP output from Tika and the inclusion of the version: in the case of PDF,
special ontologies are defined.
Namely, in the http://ns.adobe.com/pdf/1.3/ namespace, there is a
pdf:PDFVersion property.
It can even be refined in the case of PDF/A where the conformance level can be
given using the http://www.aiim.org/pdfa/ns/id/ namespace in the property
pdfaid:conformance (see TN0008). There are similar properties
pdfx:GTS_PDFXVersion and pdfx:GTS_PDFXConformance in the
http://ns.adobe.com/pdfx/1.3 namespace for PDF/X files.
However, all these properties are only available for PDF formats and would break
the idea of having a generic metadata map exposed by Tika.
So I agree with Andrew's proposal of using a "version" parameter in the mimetype,
which is allowed in XMP.
Indeed, the XMP definition of the value of dc:format is a MIMEType following
IETF RFC 2045 section 5.1.
Finally, to avoid the confusion for client code that Andrew raises, we could
take advantage of the repeatability of the dc:format attribute and output two
dc:format values: the first being the "normal" Content-Type and the second
being the Extended-Content-Type.
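To make the two-value idea concrete, here is a rough sketch in plain Java. It
does not use the Tika Metadata class; the class DcFormatSketch and its helper
methods are hypothetical names invented for illustration, and it assumes that
dc:format may be repeated as suggested above. It only shows the plain type being
emitted first and the type with a "version" parameter second.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a stand-in for a multi-valued metadata map, not Tika's API.
public class DcFormatSketch {

    // Hypothetical helper: append a "version" parameter to a base media type,
    // in the "type/subtype; name=value" form described by RFC 2045 section 5.1.
    static String withVersion(String baseType, String version) {
        return baseType + "; version=" + version;
    }

    // Hypothetical helper: emit the plain Content-Type first, then the
    // extended one, both under the same dc:format key.
    static Map<String, List<String>> describe(String baseType, String version) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        List<String> formats = new ArrayList<>();
        formats.add(baseType);
        formats.add(withVersion(baseType, version));
        metadata.put("dc:format", formats);
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = describe("application/pdf", "1.4");
        for (String value : metadata.get("dc:format")) {
            System.out.println("dc:format = " + value);
        }
    }
}
```

A client that only understands plain media types could read the first value and
ignore the second, which is the point of ordering them this way.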
> Add PDF version to PDFParser output
> -----------------------------------
>
> Key: TIKA-1232
> URL: https://issues.apache.org/jira/browse/TIKA-1232
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.5
> Environment: JDK6
> Reporter: William Palmer
> Assignee: Tim Allison
> Priority: Minor
> Attachments: pdfversion.patch
>
>
> I'd like to identify the PDF version of files; this is not currently reported
> by the PDFParser although the information is available via PDFBox. I have
> attached a patch that adds the format version to the Metadata object.
> However, I am not familiar enough with the Tika source to know if an
> alternative metadata key should be used, or this new one added.
> Comments welcome.