[ 
https://issues.apache.org/jira/browse/TIKA-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117311#comment-13117311
 ] 

Nick Burch commented on TIKA-727:
---------------------------------

Thanks for the patch, applied with a few tweaks in r1177313.

For any NPEs in getShapes, could you open POI bugs for the ones you come 
across? We shouldn't have them, so it would be good to fix them there.
                
> Improve the outputed XHTML by HSLFExtractor
> -------------------------------------------
>
>                 Key: TIKA-727
>                 URL: https://issues.apache.org/jira/browse/TIKA-727
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 0.10
>            Reporter: Pablo Queixalos
>            Priority: Minor
>         Attachments: HSLFExtractor.java, HSLFExtractor.patch
>
>
> The XHTML output of the HSLFExtractor parser is not proper XHTML: it only 
> inserts the full text into a single P[aragraph] tag (including non-HTML 
> carriage returns). This behavior comes from the limited capabilities that 
> the POI PowerPointExtractor offers.
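
To illustrate the problem described in the issue, here is a minimal sketch of 
turning raw slide text into one P element per paragraph instead of a single 
P element holding everything. This is not the code from the attached patch or 
from r1177313, and it uses neither the Tika nor the POI API. The class and 
method names (ParagraphSplitter, splitParagraphs, escape, toXhtmlParagraphs) 
are hypothetical, and treating the vertical tab as a soft line break is an 
assumption about how the slide text is encoded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika or POI.
public class ParagraphSplitter {

    // Split raw text on CRLF, CR, LF, or vertical tab (assumed soft break),
    // dropping empty fragments.
    static List<String> splitParagraphs(String text) {
        List<String> paragraphs = new ArrayList<>();
        for (String part : text.split("\\r\\n|[\\r\\n\\u000B]")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paragraphs.add(trimmed);
            }
        }
        return paragraphs;
    }

    // Escape the characters that are unsafe in XHTML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // Wrap each paragraph in its own <p> element.
    static String toXhtmlParagraphs(String text) {
        StringBuilder sb = new StringBuilder();
        for (String paragraph : splitParagraphs(text)) {
            sb.append("<p>").append(escape(paragraph)).append("</p>");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXhtmlParagraphs("Title\rFirst point\rA & B"));
    }
}
```

A real parser would emit SAX events through its XHTML content handler rather 
than build a string; the string form is used here only to keep the sketch 
self-contained.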

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
