[ https://issues.apache.org/jira/browse/TIKA-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575891#comment-17575891 ]

Tim Allison commented on TIKA-3827:
-----------------------------------

For now, I've added a media type hint that marks these bytes as
{{image/x-rtf-raw-bitmap}}. This prevents parsers from being applied to them.
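To make the effect of such a hint concrete, here is a toy Java model of type-based parser dispatch. It is a sketch under stated assumptions, not Tika's API: {{HintDispatch}}, {{PARSERS}}, {{sniff}} and {{dispatch}} are hypothetical names, and the sniffer only checks the "BM" BMP signature and the 11-bit MPEG audio frame sync. A caller-supplied hint replaces sniffing, and because no parser is registered for {{image/x-rtf-raw-bitmap}}, the bytes are left unparsed. The sample bytes are made up; whether the attached files were misdetected in exactly this way is not established here.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/**
 * Toy model (NOT Tika's real API) of type-based parser dispatch:
 * a caller-supplied media type hint, when present, is used instead of
 * sniffing the leading bytes, and a type with no registered parser
 * is left unparsed.
 */
public class HintDispatch {

    // Hypothetical registry: media type -> "parser" that returns a description.
    static final Map<String, Function<byte[], String>> PARSERS = Map.of(
            "audio/mpeg", bytes -> "parsed as MPEG audio (" + bytes.length + " bytes)",
            "image/bmp", bytes -> "parsed as BMP image (" + bytes.length + " bytes)");

    // Hypothetical sniffer: looks only at the first two bytes.
    static String sniff(byte[] bytes) {
        // BMP files start with the ASCII signature "BM".
        if (bytes.length >= 2 && bytes[0] == 'B' && bytes[1] == 'M') {
            return "image/bmp";
        }
        // MPEG audio frames start with 11 set bits (0xFF, then top 3 bits of the next byte).
        if (bytes.length >= 2 && (bytes[0] & 0xFF) == 0xFF && (bytes[1] & 0xE0) == 0xE0) {
            return "audio/mpeg";
        }
        return "application/octet-stream";
    }

    /** Returns the parser output, or empty if no parser is registered for the chosen type. */
    static Optional<String> dispatch(byte[] bytes, String hint) {
        String type = (hint != null) ? hint : sniff(bytes);
        Function<byte[], String> parser = PARSERS.get(type);
        return parser == null ? Optional.empty() : Optional.of(parser.apply(bytes));
    }

    public static void main(String[] args) {
        // Made-up bytes that happen to begin with an MPEG frame sync pattern.
        byte[] raw = {(byte) 0xFF, (byte) 0xFB, 0x00, 0x10};
        System.out.println(dispatch(raw, null));
        System.out.println(dispatch(raw, "image/x-rtf-raw-bitmap"));
    }
}
```

Running {{main}} prints {{Optional[parsed as MPEG audio (4 bytes)]}} and then {{Optional.empty}}: without the hint the bytes are routed to an audio "parser", with it they are passed through untouched.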

The correct solution would be to figure out the algorithm for converting these
bytes into an actual image file, but that is beyond my reach at the moment.

> Word Document extracted mpga file extension instead of bitmap 
> --------------------------------------------------------------
>
>                 Key: TIKA-3827
>                 URL: https://issues.apache.org/jira/browse/TIKA-3827
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Tika User
>            Priority: Major
>         Attachments: Screenshot from 2022-08-04 06-05-09.png, example.DOC, 
> example.zip, file_1.bmp, file_2.bmp, image-2022-08-04-10-52-44-800.png, 
> image-2022-08-04-10-53-48-894.png, image-2022-08-04-15-44-48-396.png, 
> image-2022-08-04-15-45-10-892.png
>
>
> When we tried to parse the .doc document, it extracted two mpga files that
> cannot be opened or played. We suspect they should be bitmap image files.
> The Tika version we are using is 2.4.1.
> [^example.DOC]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)