[ 
https://issues.apache.org/jira/browse/TIKA-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481533#comment-17481533
 ] 

Ajesh edited comment on TIKA-3656 at 1/25/22, 4:30 AM:
-------------------------------------------------------

[~nick] But it does detect the right content type for a docx file:
{code:java}
application/vnd.openxmlformats-officedocument.wordprocessingml.document{code}
 when both the content and the file extension are docx.

But the issue arises when we change the extension from docx to pdf: Tika then 
detects the file type as application/zip


> Tika returns wrong content type for docx types.
> -----------------------------------------------
>
>                 Key: TIKA-3656
>                 URL: https://issues.apache.org/jira/browse/TIKA-3656
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>         Environment: Windows 10, Java 1.8
>            Reporter: Ajesh
>            Priority: Major
>
> Steps to reproduce
>  # Select a DOCX file, say example.docx
>  # Rename the DOCX file with a PDF extension, say example.pdf
>  # Use Tika to detect the content type of the example.pdf file
>  # Tika returns application/zip instead of 
> application/vnd.openxmlformats-officedocument.wordprocessingml.document
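
One possible explanation for the steps above, not confirmed in this thread: a DOCX file is a ZIP container, so a check that reads only the leading bytes sees ZIP, and the .docx file name is no longer available as a hint once the file is renamed. The sketch below illustrates that idea using only the JDK. It does not call Tika; the class and method names (RenamedDocxSketch, writeFakeDocx, detectShallow, detectContainerAware) are hypothetical, and whether Tika itself looks inside the container may depend on which detector modules are on the classpath, which the report does not state.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Illustration only: hypothetical names, no Tika classes involved.
public class RenamedDocxSketch {

    static final String ZIP = "application/zip";
    static final String DOCX =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

    // Writes a tiny ZIP with the two entries this sketch treats as the
    // signature of a DOCX package. It is not a valid Word document.
    static void writeFakeDocx(Path target) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(target))) {
            zip.putNextEntry(new ZipEntry("[Content_Types].xml"));
            zip.write("<Types/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write("<document/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
    }

    // True if the data starts with the ZIP local-file-header signature "PK\3\4".
    static boolean hasZipMagic(byte[] data) {
        return data.length >= 4
            && data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4;
    }

    // Shallow detection: looks at the leading bytes only, ignores the file name.
    static String detectShallow(Path file) throws IOException {
        return hasZipMagic(Files.readAllBytes(file)) ? ZIP : "application/octet-stream";
    }

    // Container-aware detection: if the file is a ZIP, open it and look for
    // the entries that mark it as a word-processing package.
    static String detectContainerAware(Path file) throws IOException {
        String shallow = detectShallow(file);
        if (!ZIP.equals(shallow)) {
            return shallow;
        }
        try (ZipFile zip = new ZipFile(file.toFile())) {
            boolean looksLikeDocx = zip.getEntry("[Content_Types].xml") != null
                && zip.getEntry("word/document.xml") != null;
            return looksLikeDocx ? DOCX : ZIP;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("tika-3656");
        Path renamed = dir.resolve("example.pdf"); // DOCX-like content, PDF name
        writeFakeDocx(renamed);
        System.out.println(detectShallow(renamed));
        System.out.println(detectContainerAware(renamed));
    }
}
```

Run as written, the first line printed is application/zip and the second is the DOCX type, even though the file is named example.pdf. The extension plays no part in either check, which is the point of the sketch: only a look inside the container separates a DOCX from a generic ZIP.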



