[ https://issues.apache.org/jira/browse/TIKA-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029796#comment-17029796 ]
Hudson commented on TIKA-3034:
------------------------------

UNSTABLE: Integrated in Jenkins build Tika-trunk #1767 (See [https://builds.apache.org/job/Tika-trunk/1767/])
TIKA-3034 Mathematica files don't have a unique magic, but try to detect (nick: [https://github.com/apache/tika/commit/f5571fa99ef6f178a16bd1bd3a3cded83c7b0013])
* (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml


> Detector always returns text/plain when scanning Mathematica files
> ------------------------------------------------------------------
>
>                 Key: TIKA-3034
>                 URL: https://issues.apache.org/jira/browse/TIKA-3034
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 1.23
>            Reporter: Tung Nguyen
>            Priority: Blocker
>             Fix For: 1.23
>
>
> We are using Tika to implement our mime type detection module. The
> library apparently cannot detect Mathematica files, although the
> documentation states that it can [1]. The Tika detector always returns
> `text/plain` instead of `application/mathematica`, contrary to both the
> documentation and the unit tests [2].
> Doing the same thing with the Python code below, we obtain the right
> mime type for any Mathematica file downloaded from the Wolfram Library
> Archive [3].
> {code:java}
> #!/usr/bin/python3
> import mimetypes, os, sys
> test_file = sys.argv[1]
> print(mimetypes.MimeTypes().guess_type(test_file)[0])
> {code}
> Therefore, we suspect there is a bug in the Tika detector when it tries
> to guess mime types for Mathematica files.
> References:
> [1] [https://tika.apache.org/1.23/formats.html]
> [2] [https://github.com/apache/tika/blob/master/tika-core/src/test/java/org/apache/tika/TikaDetectionTest.java#L64]
> [3] [https://library.wolfram.com/infocenter/Courseware/4706/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)