[ 
https://issues.apache.org/jira/browse/TIKA-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169130#comment-16169130
 ] 

ASF GitHub Bot commented on TIKA-2347:
--------------------------------------

darkdreamingdan commented on issue #173: Fix for TIKA-2347 Adds underline 
extraction from word documents
URL: https://github.com/apache/tika/pull/173#issuecomment-330000650
 
 
   Could you also add strikethrough support?  It's just the same thing but 
using the <strike> XHTML element.  We have our own branch for this code but it 
would be good to unify our PRs.
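
   To illustrate why the two decorations can share one code path, here is a
minimal sketch. It is not Tika's parser code: the class `DecorationSketch`,
the `Run` type, and `toXhtml` are hypothetical names, and escaping of the text
content is deliberately left out.

```java
/** Hypothetical sketch only; names here are not part of Tika or POI. */
public class DecorationSketch {

    /** A run of text plus the two decoration flags discussed above. */
    public static final class Run {
        final String text;
        final boolean underline;
        final boolean strike;

        public Run(String text, boolean underline, boolean strike) {
            this.text = text;
            this.underline = underline;
            this.strike = strike;
        }
    }

    /**
     * Wraps the run's text in <u> and/or <strike> elements.
     * Assumption: the text needs no XML escaping (a real handler would escape it).
     */
    public static String toXhtml(Run run) {
        StringBuilder sb = new StringBuilder();
        if (run.underline) sb.append("<u>");
        if (run.strike) sb.append("<strike>");
        sb.append(run.text);
        // Close in reverse order so the elements stay properly nested.
        if (run.strike) sb.append("</strike>");
        if (run.underline) sb.append("</u>");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXhtml(new Run("underline", true, false)));
        System.out.println(toXhtml(new Run("strikethrough", false, true)));
    }
}
```

   With this shape, a strikethrough assertion would mirror the underline one in
the test case quoted below, checking for a `<strike>...</strike>` element
instead of `<u>...</u>`.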
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Underlined text is not decorated as such when extracting from word documents
> ----------------------------------------------------------------------------
>
>                 Key: TIKA-2347
>                 URL: https://issues.apache.org/jira/browse/TIKA-2347
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.0, 1.14
>            Reporter: Stuart Hendren
>
> When extracting from doc and docx, bold and italic text decoration is 
> extracted; however, underlining is not.  This can be demonstrated in 
> WordParserTest or OOXMLParserTest (change the file to docx) with the 
> following test case.
> {code:title=WordParserTest.java|borderStyle=solid}
>     @Test
>     public void testTextDecoration() throws Exception {
>       XMLResult result = getXML("testWORD_various.doc");
>       String xml = result.xml;
>       assertTrue(xml.contains("<b>Bold</b>"));
>       assertTrue(xml.contains("<i>italic</i>"));
>       assertTrue(xml.contains("<u>underline</u>"));
>     }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
