[ https://issues.apache.org/jira/browse/TIKA-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Theodor Sjöstedt updated TIKA-1428:
-----------------------------------
    Attachment: TIKA-doc-footnotes-issue.png

Original document on the left, Tika 1.4 in the center, Tika 1.6 on the right.

> Microsoft Word 97 - 2003 (.doc) footnote references are Unicode Replacement Character
> -------------------------------------------------------------------------------------
>
>                 Key: TIKA-1428
>                 URL: https://issues.apache.org/jira/browse/TIKA-1428
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.4, 1.6
>            Reporter: Theodor Sjöstedt
>            Priority: Minor
>         Attachments: TIKA-doc-footnotes-issue.png
>
>
> Footnotes from {{.doc}} documents are extracted, but the references to the footnotes are replaced by the Unicode Replacement Character (�).
> I have tried this in 1.4 and 1.6.
> In 1.4, both the reference in the text and the reference at the footnote are replaced.
> In 1.6, the reference in the text has disappeared completely.
> See the attached image for the original document, the 1.4 formatted text, and the 1.6 formatted text.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)