Tung Nguyen created TIKA-3034:
---------------------------------

             Summary: Detector always returns text/plain when scanning 
Mathematica files
                 Key: TIKA-3034
                 URL: https://issues.apache.org/jira/browse/TIKA-3034
             Project: Tika
          Issue Type: Bug
          Components: detector
    Affects Versions: 1.23
            Reporter: Tung Nguyen
             Fix For: 1.24


We are using Tika to implement our MIME type detection module. The library 
does not seem to detect Mathematica files, although the documentation lists 
the format as supported [1]. The Tika detector always returns `text/plain` 
instead of `application/mathematica`, the type given in the documentation and 
expected by the unit tests [2].

The following Python code performs the equivalent lookup and returns the 
expected MIME type for any Mathematica file downloaded from the Wolfram 
Library Archive [3]. 
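{code:python}
#!/usr/bin/python3
import mimetypes
import sys

# Path of the Mathematica file to check, given as the first argument.
test_file = sys.argv[1]
# guess_type() returns (type, encoding); print only the type.
print(mimetypes.MimeTypes().guess_type(test_file)[0])
{code}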
Therefore, we suspect there is a bug in how the Tika detector guesses the MIME 
type of Mathematica files.
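
One caveat on the Python comparison: the `mimetypes` module guesses from the 
file name alone and never opens the file, so a match there may not say much 
about content-based detection. The sketch below illustrates this. It assumes 
the notebooks use the `.nb` extension and registers that mapping explicitly, 
because the default table varies between systems; the file names are 
hypothetical.

```python
import mimetypes

# Assumption: Mathematica notebooks carry the ".nb" extension. The mapping is
# registered explicitly because the default table varies between systems.
detector = mimetypes.MimeTypes()
detector.add_type("application/mathematica", ".nb")


def guess_by_name(path):
    """Return the MIME type guessed from the file name alone, or None."""
    return detector.guess_type(path)[0]


# The file does not need to exist: only the name is inspected.
notebook_guess = guess_by_name("does-not-exist.nb")
# A notebook saved under another extension is guessed from that extension.
renamed_guess = guess_by_name("notebook-saved-as.txt")
print(notebook_guess, renamed_guess)
```

If the Tika call in question is handed a stream without a file name, only the 
content is available to it, which could explain a different result for the 
same file.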

References:

[1] [https://tika.apache.org/1.23/formats.html]

[2] [https://github.com/apache/tika/blob/master/tika-core/src/test/java/org/apache/tika/TikaDetectionTest.java#L64]

[3] [https://library.wolfram.com/infocenter/Courseware/4706/]

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
