[ https://issues.apache.org/jira/browse/TIKA-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371939#comment-14371939 ]
Tyler Palsulich commented on TIKA-1194:
---------------------------------------

Thank you, [~tssk]! Is there any way you can create a patch from {{svn diff}}, instead of (I think) just regular {{diff}}? Then, we can hopefully integrate this into trunk. :)

> Missing text from MS Word (DOC) file
> ------------------------------------
>
>                 Key: TIKA-1194
>                 URL: https://issues.apache.org/jira/browse/TIKA-1194
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.4
>            Reporter: Tomas Safarik
>            Priority: Critical
>         Attachments: OP-06-015.doc, apache-tika-1.5.patch
>
>
> Hello,
> we noticed that filtered text from some MS Word DOC files is missing one line
> (in table cell) in the original document.
> - If you add or remove one character anywhere before the problematic
> line/cell then the filtered text is correct. If you get the text back to
> original the filtering problem is back.
> - If the file is resaved as DOCX filtering works fine.
> I will provide sample document. And please let me know if more information is
> needed.
> Regards,
> Tomas