[ https://issues.apache.org/jira/browse/TIKA-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895528#comment-13895528 ]
Thomas Ledoux commented on TIKA-1232:
-------------------------------------

Regarding XMP output from Tika and the inclusion of the version: in the case of PDF, special ontologies are defined. Namely, in the http://ns.adobe.com/pdf/1.3/ namespace, there is a pdf:PDFVersion property. It can even be refined in the case of PDF/A, where the conformance level can be given using the http://www.aiim.org/pdfa/ns/id/ namespace in the property pdfaid:conformance (see TN0008). There are similar properties, pdfx:GTS_PDFXVersion and pdfx:GTS_PDFXConformance, in the http://ns.adobe.com/pdfx/1.3 namespace for PDF/X files.

However, all these properties are only available for PDF formats and would break the idea of having a generic metadata map exposed by Tika. So I agree with Andrew's proposal of using a "version" parameter in the MIME type, which is allowed in XMP. Indeed, the XMP definition of the value of dc:format is a MIMEType following IETF RFC 2045 section 5.1.

Finally, to prevent the client-code confusion that Andrew raises, we could take advantage of the repeatability of the dc:format attribute and output two dc:format values: the first being the "normal" Content-Type and the second being the Extended-Content-Type.

> Add PDF version to PDFParser output
> -----------------------------------
>
>                 Key: TIKA-1232
>                 URL: https://issues.apache.org/jira/browse/TIKA-1232
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.5
>         Environment: JDK6
>            Reporter: William Palmer
>            Assignee: Tim Allison
>            Priority: Minor
>         Attachments: pdfversion.patch
>
>
> I'd like to identify the PDF version of files, this is not currently reported
> by the PDFParser although the information is available via PDFBox. I have
> attached a patch that adds the format version to the Metadata object.
> However, I am not familiar enough with the Tika source to know if an
> alternative metadata key should be used, or this new one added.
> Comments welcome.
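
As a rough illustration of the dc:format suggestion in the comment above, here is a minimal sketch of emitting two dc:format values, the plain type first and the versioned type second. It deliberately uses only the JDK: the class DcFormatSketch, the helpers withVersion and describePdf, and the Map-of-lists stand-in for a multi-valued metadata object are all hypothetical names chosen for this example, not Tika API and not the attached patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DcFormatSketch {

    // Hypothetical helper: append a "version" parameter to a base MIME type,
    // using the "type/subtype; attribute=value" form of RFC 2045 section 5.1.
    static String withVersion(String baseType, String version) {
        return baseType + "; version=" + version;
    }

    // Hypothetical stand-in for a multi-valued metadata map.
    // dc:format is repeated: the "normal" type first, the extended type second.
    static Map<String, List<String>> describePdf(String pdfVersion) {
        List<String> formats = new ArrayList<>();
        formats.add("application/pdf");
        formats.add(withVersion("application/pdf", pdfVersion));

        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("dc:format", formats);
        return metadata;
    }

    public static void main(String[] args) {
        // prints {dc:format=[application/pdf, application/pdf; version=1.4]}
        System.out.println(describePdf("1.4"));
    }
}
```

Under this ordering, client code that only reads the first dc:format value would still see the unparameterised type, while version-aware clients could look at the second one.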
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)