[ https://issues.apache.org/jira/browse/TIKA-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Gribov reopened TIKA-2601:
-------------------------------------
    Assignee: Konstantin Gribov

> Invalid XHTML output (overlapping a and formatting tags) for some WORD documents
> --------------------------------------------------------------------------------
>
>                 Key: TIKA-2601
>                 URL: https://issues.apache.org/jira/browse/TIKA-2601
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.17
>         Environment: Linked is a sample document with its corresponding output.
>            Reporter: Filip
>            Assignee: Konstantin Gribov
>            Priority: Major
>         Attachments: Invalid-XML.doc, Test.doc, test.html
>
>
> In some WORD (.doc, .docx) documents the XHTML elements are not closed properly. This usually happens when there are link elements (<a>) as well as italic or bold elements (<b><i>).
>
> Fix should be done in [https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
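
For illustration only, the kind of overlap described in the issue (a link closed while a bold or italic element is still open) can be detected with a small stack-based check. This is a minimal sketch and not code from Tika or from WordExtractor.java: the class name NestingCheck, the method isWellNested, and the sample markup are all hypothetical, and the regular expression only understands plain start, end, and self-closing tags.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: reports whether tags are properly nested.
public class NestingCheck {
    // Matches simple tags such as <a href="...">, <b>, </i> or <br/>.
    // Comments, CDATA sections and '>' inside attribute values are not handled.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    public static boolean isWellNested(String markup) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(markup);
        while (m.find()) {
            if (m.group().endsWith("/>")) {
                continue; // self-closing element, nothing to track
            }
            boolean closing = !m.group(1).isEmpty();
            String name = m.group(2);
            if (!closing) {
                open.push(name);
            } else if (open.isEmpty() || !open.pop().equals(name)) {
                // The end tag does not match the most recently opened element.
                return false;
            }
        }
        // Any element still open at the end was never closed.
        return open.isEmpty();
    }

    public static void main(String[] args) {
        // Properly nested: <b> is closed before <a>.
        System.out.println(isWellNested("<p><a href=\"#\"><b>link</b></a></p>")); // true
        // Overlapping: <a> is closed while <b> is still open.
        System.out.println(isWellNested("<p><a href=\"#\"><b>link</a></b></p>")); // false
    }
}
```

A check like this only flags the symptom. One plausible way to avoid the overlap on the producer side would be to close any open formatting elements before closing the link and reopen them afterwards, but whether that fits the parser's design is an assumption here.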