[ https://issues.apache.org/jira/browse/TIKA-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Theodor Sjöstedt updated TIKA-1428:
-----------------------------------
    Attachment: TIKA-doc-footnotes-issue.png

Original document on the left, Tika 1.4 output in the center, Tika 1.6 output on the right.

> Microsoft Word 97 - 2003 (.doc) footnote references are Unicode Replacement 
> Character
> -------------------------------------------------------------------------------------
>
>                 Key: TIKA-1428
>                 URL: https://issues.apache.org/jira/browse/TIKA-1428
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.4, 1.6
>            Reporter: Theodor Sjöstedt
>            Priority: Minor
>         Attachments: TIKA-doc-footnotes-issue.png
>
>
> Footnotes from {{.doc}} documents are extracted, but the references to the 
> footnotes are replaced by the Unicode Replacement Character (�).
> I have tried this in 1.4 and 1.6.
> In 1.4, both the reference in the text and the reference at the footnote are replaced.
> In 1.6, the reference in the text disappears completely.
> See the attached image for the original document and the formatted text
> produced by 1.4 and 1.6.
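One way to confirm the 1.4 symptom programmatically is to scan the extracted text for U+FFFD. The following is a minimal sketch, assuming the text has already been extracted (for instance with the `org.apache.tika.Tika` facade's `parseToString(File)`). The class name, method name and sample string are hypothetical; the sample only mimics the output described in the report.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplacementCharScan {
    // U+FFFD, the Unicode Replacement Character reported in place of footnote references
    public static final char REPLACEMENT = '\uFFFD';

    // Returns the char offsets at which U+FFFD occurs in the extracted text.
    public static List<Integer> findReplacementChars(String text) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == REPLACEMENT) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Hypothetical sample mimicking the 1.4 output: one replaced reference in the
        // body text and one at the start of the footnote. In practice this string
        // would come from the extractor (assumption: e.g. new Tika().parseToString(file)).
        String extracted = "Body text\uFFFD more text.\n\uFFFD Footnote text.";
        List<Integer> hits = findReplacementChars(extracted);
        System.out.println("U+FFFD at offsets " + hits);
    }
}
```

The scan only detects references that were replaced (the 1.4 behaviour). A reference that is dropped entirely, as in 1.6, leaves no character behind, so that case has to be checked against the original document instead.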



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
