Theodor Sjöstedt created TIKA-1428:
--------------------------------------

             Summary: Microsoft Word 97 - 2003 (.doc) footnote references are Unicode Replacement Character
                 Key: TIKA-1428
                 URL: https://issues.apache.org/jira/browse/TIKA-1428
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.6, 1.4
            Reporter: Theodor Sjöstedt
            Priority: Minor


Footnotes from {{.doc}} documents are extracted, but the references to the 
footnotes are replaced by the Unicode Replacement Character (�).

I have tried this in 1.4 and 1.6.

In 1.4, both the reference in the text and the reference at the footnote are 
replaced.
In 1.6, the reference in the text disappears completely.
See the attached image for the original document, the 1.4 formatted text, and 
the 1.6 formatted text.
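
To check extracted text for this symptom without comparing it by eye, a small helper can report where U+FFFD occurs. This is a minimal sketch, not part of Tika: the class name {{ReplacementCharScan}}, the method {{findReplacementChars}}, and the sample string are made up for illustration. In practice the input would be the text extracted from the {{.doc}} file, for example the string returned by the {{Tika}} facade's {{parseToString}}.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Tika class): finds U+FFFD in extracted text.
public class ReplacementCharScan {
    // The Unicode Replacement Character that shows up in place of footnote references.
    static final char REPLACEMENT = '\uFFFD';

    // Returns the char offsets of every U+FFFD occurrence in the given text.
    static List<Integer> findReplacementChars(String text) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == REPLACEMENT) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Invented sample standing in for extracted text; a real check would
        // pass the string obtained from the .doc file instead.
        String sample = "Body text\uFFFD more text.\n\uFFFD Footnote text.";
        List<Integer> hits = findReplacementChars(sample);
        System.out.println("U+FFFD count: " + hits.size() + ", offsets: " + hits);
    }
}
```

An empty result for a document that does contain footnotes would match the 1.6 behaviour described above only for the in-text reference, so the offsets are best read together with the surrounding text.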



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
