[ https://issues.apache.org/jira/browse/TIKA-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann updated TIKA-1840:
------------------------------------
    Fix Version/s:     (was: 1.15)
                   1.16

> No way to link slide notes to slide in PPT output.
> --------------------------------------------------
>
>                 Key: TIKA-1840
>                 URL: https://issues.apache.org/jira/browse/TIKA-1840
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.11
>            Reporter: Sam H
>            Assignee: Chris A. Mattmann
>             Fix For: 1.16
>
>
> I'm integrating Apache Tika into my project, and I want to extract (text)
> information from PowerPoint slides, in both PPT and PPTX formats.
> I've noticed that with the PPT format, the slide notes are all aggregated at
> the end of the XML output, and there is no way to identify which note belongs
> to which slide.
> I began looking at the code and found the following:
> {code}
> // TODO Find the Notes for this slide and extract inline
> {code}
> in
> [HSLFExtractor.java|https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java]
> on line 140.
> I would like to implement this part and contribute it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)