Theodor Sjöstedt created TIKA-1428:
--------------------------------------
Summary: Microsoft Word 97 - 2003 (.doc) footnote references are Unicode Replacement Character
Key: TIKA-1428
URL: https://issues.apache.org/jira/browse/TIKA-1428
Project: Tika
Issue Type: Bug
Affects Versions: 1.6, 1.4
Reporter: Theodor Sjöstedt
Priority: Minor

Footnotes from {{.doc}} documents are extracted, but the references to the footnotes are replaced by the Unicode Replacement Character (�).

I have tried this in 1.4 and 1.6. In 1.4, both the reference in the text and the reference at the footnote are replaced. In 1.6, the reference in the text has disappeared completely.

See the attached image for the original document, the 1.4 Formatted text, and the 1.6 Formatted text.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)