[ https://issues.apache.org/jira/browse/TIKA-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018339#comment-17018339 ]
Andrey Nizienko commented on TIKA-2294:
---------------------------------------
Hi [~tallison], thanks for your quick reply.
I've tried versions 1.2 and 1.23 in my Maven project, using this dependency:
{code:xml}
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>1.23</version>
</dependency>
{code}
Here is the code snippet:
{code:java}
import org.apache.tika.Tika;
import org.apache.tika.mime.MimeType;
import org.apache.tika.mime.MimeTypeException;
import org.apache.tika.mime.MimeTypes;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TikaFileCheck {
    private static Tika tika = new Tika();

    public static void main(String[] args) {
        try {
            // Read the whole file; only the bytes (no file name) are passed to Tika.
            byte[] fileContent = Files.readAllBytes(Paths.get("D:/google_doc.docx"));
            // Detect the type name, then look up the matching MimeType.
            MimeType mimeType = MimeTypes.getDefaultMimeTypes().forName(tika.detect(fileContent));
            System.out.println(mimeType);
        } catch (MimeTypeException | IOException e) {
            System.out.println(e);
        }
    }
}
{code}
The output is: application/zip
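For context, a .docx file is itself a ZIP package, so any check that looks only at the leading magic bytes sees the same ZIP signature for both. Below is a minimal sketch of telling the two apart by entry names, using only java.util.zip. The class name OoxmlSniffer, its methods, and the choice of word/document.xml as the marker entry are assumptions made for illustration; this is not Tika's implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class OoxmlSniffer {

    static final String DOCX =
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

    /** Returns a coarse type guess based only on the names of the ZIP entries. */
    static String sniff(byte[] data) throws IOException {
        Set<String> names = new HashSet<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        if (names.isEmpty()) {
            // No readable ZIP entries at all.
            return "application/octet-stream";
        }
        // Assumption: a Word document carries these two conventional entries.
        if (names.contains("[Content_Types].xml") && names.contains("word/document.xml")) {
            return DOCX;
        }
        return "application/zip";
    }

    /** Hypothetical helper: builds an in-memory ZIP with one-byte entries. */
    static byte[] buildZip(String... entryNames) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String name : entryNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("x".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A plain archive and a docx-like archive share the same ZIP signature.
        System.out.println(sniff(buildZip("readme.txt")));
        System.out.println(sniff(buildZip("[Content_Types].xml", "word/document.xml")));
    }
}
```

Both archives built in main start with the same ZIP header, and only the entry names separate them: the first is reported as application/zip and the second as the docx type.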
Regards,
Andrii
> Tika inconsistently detects ooxml files as zip file sometimes
> -------------------------------------------------------------
>
> Key: TIKA-2294
> URL: https://issues.apache.org/jira/browse/TIKA-2294
> Project: Tika
> Issue Type: Bug
> Components: mime
> Affects Versions: 1.11
> Environment: linux
> Reporter: chanchal
> Assignee: Tim Allison
> Priority: Major
> Attachments: google_doc.docx
>
>
> Tika sometimes incorrectly detects OOXML files as zip, and at other times correctly
> detects them as docx/pptx/xlsx.
> Is it possible for this to happen, and if so, how?
> I cannot share the file as it has sensitive content.