[ 
https://issues.apache.org/jira/browse/TIKA-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147880#comment-14147880
 ] 

Hong-Thai Nguyen commented on TIKA-1428:
----------------------------------------

Thanks [~theoettheo]. Any chance of getting a patch with a test case for this 
problem?

> Microsoft Word 97 - 2003 (.doc) footnote references are Unicode Replacement 
> Character
> -------------------------------------------------------------------------------------
>
>                 Key: TIKA-1428
>                 URL: https://issues.apache.org/jira/browse/TIKA-1428
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.4, 1.6
>            Reporter: Theodor Sjöstedt
>            Priority: Minor
>         Attachments: TIKA-doc-footnotes-issue.png
>
>
> Footnotes from {{.doc}} documents are extracted, but the references to the 
> footnotes are replaced by the Unicode Replacement Character (�).
> I have tried this in 1.4 and 1.6.
> In 1.4, both the reference in the text and the reference at the footnote are 
> replaced.
> In 1.6, the reference in the text disappears completely.
> See the attached image for the original document, the 1.4 formatted text, and 
> the 1.6 formatted text.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
