[ https://issues.apache.org/jira/browse/TIKA-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332210#comment-14332210 ]
Aakarsh Medleri Hire Math commented on TIKA-1532:
-------------------------------------------------

Hi Nick,

Sorry for the delayed response. There does not appear to be a unique MIME type associated with GCMD .dif files. We crawled around 8000 files from the ACADIS website (https://www.aoncadis.org), and all of them had their content type set to text/plain. However, the data itself is represented in XML. Does that mean Tika should detect it as application/xml or text/xml?

Here is one such example: https://www.aoncadis.org/dataset/Zamora2010.dif

You can find the rest of the crawled links at: https://raw.githubusercontent.com/shekarprashant/TikaDirectedResearch/master/Acadis%20Complete%20Crawl%20Raw%20Results.csv

Looking forward to your input.

Thanks,
Aakarsh

> DIF Parser
> ----------
>
> Key: TIKA-1532
> URL: https://issues.apache.org/jira/browse/TIKA-1532
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Aakarsh Medleri Hire Math
>
> MIME Type detection & content parser for .dif format

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
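
Since the servers report text/plain for content that is actually XML, one way to tell DIF records apart is to look at the root element rather than the HTTP header. The following is a minimal sketch using only the JDK's StAX API, not Tika's own detector. The class name `DifSniffer`, the media type string `application/dif+xml`, and the assumption that DIF records have a root element named `DIF` are illustrative assumptions and should be checked against the crawled files.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: classifies content by its XML root element.
final class DifSniffer {
    // Assumed media type name, used here for illustration only.
    static final String DIF_TYPE = "application/dif+xml";

    static String sniff(String content) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Crawled input is untrusted: disable DTDs and external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(content));
            while (reader.hasNext()) {
                // Stop at the first start tag: that is the root element.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // Assumption: DIF records use a root element named "DIF".
                    return "DIF".equals(reader.getLocalName())
                            ? DIF_TYPE
                            : "application/xml";
                }
            }
        } catch (XMLStreamException e) {
            // No parsable root element: fall through and treat as plain text.
        }
        return "text/plain";
    }
}
```

Calling `DifSniffer.sniff(body)` on a downloaded file body returns the assumed DIF type when the root element matches, `application/xml` for other XML, and `text/plain` otherwise. Within Tika itself, the equivalent rule would more likely be declared as a root-element match in the MIME types configuration than written by hand.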