[jira] [Updated] (TIKA-1109) Metadata not extracted before the content in OOXML (pptx)
[ https://issues.apache.org/jira/browse/TIKA-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Bonniot de Ruisselet updated TIKA-1109:
--
    Attachment: TIKA-1109.patch

> Metadata not extracted before the content in OOXML (pptx)
> -
>
>                 Key: TIKA-1109
>                 URL: https://issues.apache.org/jira/browse/TIKA-1109
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Daniel Bonniot de Ruisselet
>            Priority: Critical
>             Fix For: 1.5
>
>         Attachments: TIKA-1109.patch
>
>
> It seems that when processing OOXML documents, the metadata is only read
> after the text. This means it's impossible to use the metadata while processing
> the text. I think it would be more useful to have the metadata populated
> first.
> As a symptom:
> java -jar tika-app-1.3.jar test-classes/test-documents/testPPT.pptx
> outputs only as metadata:
>
> content="application/vnd.openxmlformats-officedocument.presentationml.presentation"/>
>
> while there is more metadata in the file (e.g. Attachment Test).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
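To illustrate why the ordering described in this issue matters, here is a minimal, self-contained Java sketch. It does not use Tika at all: `ToyParser`, `TEXT_FIRST`, `METADATA_FIRST` and the `"title"` key are hypothetical stand-ins, with a plain `Map` in place of Tika's `Metadata` class and the JDK's SAX `DefaultHandler` as the content handler. A handler that consults metadata when text starts arriving can only see a property if the parser populated it before emitting that text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class MetadataOrderingSketch {

    // Hypothetical stand-in for a parser: it pushes text to a SAX handler and
    // fills a metadata map. Only the ORDER of those two steps differs below.
    interface ToyParser {
        void parse(DefaultHandler handler, Map<String, String> metadata) throws SAXException;
    }

    // Mimics the reported behaviour: text is emitted first, properties are set afterwards.
    static final ToyParser TEXT_FIRST = (handler, metadata) -> {
        emitText(handler, "slide text");
        metadata.put("title", "Attachment Test"); // hypothetical key/value
    };

    // The behaviour the issue asks for: properties are populated before any text.
    static final ToyParser METADATA_FIRST = (handler, metadata) -> {
        metadata.put("title", "Attachment Test"); // hypothetical key/value
        emitText(handler, "slide text");
    };

    // Sends a tiny document consisting of a single run of characters.
    static void emitText(DefaultHandler handler, String text) throws SAXException {
        handler.startDocument();
        char[] chars = text.toCharArray();
        handler.characters(chars, 0, chars.length);
        handler.endDocument();
    }

    // Returns the title that was visible when the first characters arrived.
    static String titleSeenDuringText(ToyParser parser) throws SAXException {
        Map<String, String> metadata = new LinkedHashMap<>();
        String[] seen = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                if (seen[0] == null) {
                    seen[0] = metadata.getOrDefault("title", "<none>");
                }
            }
        };
        parser.parse(handler, metadata);
        return seen[0];
    }

    public static void main(String[] args) throws SAXException {
        System.out.println("text first:     " + titleSeenDuringText(TEXT_FIRST));
        System.out.println("metadata first: " + titleSeenDuringText(METADATA_FIRST));
    }
}
```

Run as is, the text-first ordering reports `<none>` while the metadata-first ordering reports `Attachment Test`, which is the difference the reporter is asking to change for OOXML documents.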
[jira] [Updated] (TIKA-1109) Metadata not extracted before the content in OOXML (pptx)
[ https://issues.apache.org/jira/browse/TIKA-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Bonniot de Ruisselet updated TIKA-1109:
--
    Summary: Metadata not extracted before the content in OOXML (pptx)  (was: Metadata not extracted before the context in OOXML (pptx))