[ https://issues.apache.org/jira/browse/TIKA-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114576#comment-15114576 ]

ASF GitHub Bot commented on TIKA-1840:
--------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/tika/pull/72


> No way to link slide notes to slide in PPT output.
> --------------------------------------------------
>
>                 Key: TIKA-1840
>                 URL: https://issues.apache.org/jira/browse/TIKA-1840
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.11
>            Reporter: Sam H
>            Assignee: Chris A. Mattmann
>             Fix For: 1.12
>
>
> I'm integrating Apache Tika into my project, and I want to extract text 
> from PowerPoint slides, in both PPT and PPTX formats.
> I've noticed that with the PPT format, the slide notes are all aggregated at the 
> end of the XML output, and there is no way to identify which note belongs to 
> which slide.
> I began looking at the code and found the following:
> {code}
> // TODO Find the Notes for this slide and extract inline
> {code}
> in
> [HSLFExtractor.java|https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java]
> on line 140.
> I would like to implement this part and contribute it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
