[ https://issues.apache.org/jira/browse/TIKA-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann updated TIKA-1109:
------------------------------------
    Fix Version/s: 1.5 (was: 1.4)

- push to 1.5, get ready for 1.4 RC #1.

> Metadata not extracted before the context in OOXML (pptx)
> ---------------------------------------------------------
>
>                 Key: TIKA-1109
>                 URL: https://issues.apache.org/jira/browse/TIKA-1109
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Daniel Bonniot de Ruisselet
>            Priority: Critical
>             Fix For: 1.5
>
>
> It seems that when processing OOXML documents, the metadata is only read
> after the text. This means it's impossible to use the metadata while
> processing the text. I think it would be more useful to have the metadata
> populated first.
> As a symptom:
> java -jar tika-app-1.3.jar test-classes/test-documents/testPPT.pptx
> outputs only as metadata:
> <meta name="Content-Length" content="36518"/>
> <meta name="Content-Type" content="application/vnd.openxmlformats-officedocument.presentationml.presentation"/>
> <meta name="resourceName" content="testPPT.pptx"/>
> while there is more metadata in the file (e.g.
> <dc:title>Attachment Test</dc:title>).
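
To confirm that a .pptx file really carries the extra metadata mentioned in the report, independently of Tika, the package can be opened as a plain ZIP archive and the title read from the core properties part. The following is only a minimal sketch under stated assumptions: it assumes the core properties live at docProps/core.xml (the conventional location, though a package may place them elsewhere), and the class name CoreTitle and method readTitle are hypothetical names chosen for illustration. It uses only the JDK and does not show how Tika itself extracts metadata.

```java
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Tika: reads dc:title straight from the OOXML package.
public class CoreTitle {
    // Dublin Core element namespace used by OOXML core properties.
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Returns the dc:title text, or null if the part or the element is missing. */
    static String readTitle(String path) throws Exception {
        try (ZipFile zip = new ZipFile(path)) {
            // Assumption: core properties are stored at the conventional part name.
            ZipEntry entry = zip.getEntry("docProps/core.xml");
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true); // needed for the namespace lookup below
                Document doc = factory.newDocumentBuilder().parse(in);
                NodeList titles = doc.getElementsByTagNameNS(DC_NS, "title");
                return titles.getLength() == 0 ? null : titles.item(0).getTextContent();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java CoreTitle <file.pptx>");
            return;
        }
        System.out.println(readTitle(args[0]));
    }
}
```

Running it against test-classes/test-documents/testPPT.pptx would show whether a title is present in the file even though it is missing from the tika-app output quoted above.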