[ 
https://issues.apache.org/jira/browse/TIKA-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018339#comment-17018339
 ] 

Andrey Nizienko commented on TIKA-2294:
---------------------------------------

Hi [~tallison], thanks for your quick reply.

I've tried versions 1.2 and 1.23 in my Maven project.

{code:xml}
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>1.23</version>
</dependency>
{code}


Here is the code snippet:
{code:java}
import org.apache.tika.Tika;
import org.apache.tika.mime.MimeType;
import org.apache.tika.mime.MimeTypeException;
import org.apache.tika.mime.MimeTypes;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TikaFileCheck {
    private static final Tika tika = new Tika();

    public static void main(String[] args) {
        try {
            // Read the whole file into memory.
            byte[] fileContent = Files.readAllBytes(Paths.get("D:/google_doc.docx"));
            // Detect from the bytes alone (no file name hint), then look up the MimeType.
            MimeType mimeType = MimeTypes.getDefaultMimeTypes().forName(tika.detect(fileContent));
            System.out.println(mimeType);
        } catch (MimeTypeException | IOException e) {
            System.out.println(e);
        }
    }
}
{code}

The output is: application/zip

Regards,
Andrii

> Tika inconsistently detects ooxml files as zip file sometimes
> -------------------------------------------------------------
>
>                 Key: TIKA-2294
>                 URL: https://issues.apache.org/jira/browse/TIKA-2294
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.11
>         Environment: linux
>            Reporter: chanchal
>            Assignee: Tim Allison
>            Priority: Major
>         Attachments: google_doc.docx
>
>
> Tika sometimes incorrectly detects an OOXML file as zip and sometimes correctly
> detects it as docx/pptx/xlsx.
> Is it possible for this to happen, and if so, how?
> I cannot share the file as it has sensitive content.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
