[ 
https://issues.apache.org/jira/browse/TIKA-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332210#comment-14332210
 ] 

Aakarsh Medleri Hire Math commented on TIKA-1532:
-------------------------------------------------

Hi Nick,

Sorry for the delayed response.
It seems there is no unique MIME type associated with GCMD .dif files. We
crawled around 8000 files from the ACADIS website (https://www.aoncadis.org),
and all of them were served with their content type set to text/plain. However,
the data itself is XML. Does that mean Tika should detect these files as
application/xml or text/xml?
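
To make the question concrete, here is a rough sketch of content-based
detection that ignores the served content type and looks at the XML root
element instead. It uses only the JDK's StAX API, not Tika's own detector API.
The class name DifRootSniffer, the root element name "DIF", and the type name
application/dif+xml are assumptions for illustration, not something Tika
defines in this thread.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Tika.
class DifRootSniffer {
    // Assumption: GCMD DIF records use a root element named "DIF".
    static final String ASSUMED_DIF_ROOT = "DIF";
    // Assumption: illustrative type name only.
    static final String ASSUMED_DIF_TYPE = "application/dif+xml";

    /** Returns the local name of the first element, or null if the input is not well-formed XML. */
    static String rootLocalName(InputStream in) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not load DTDs or external entities from crawled, untrusted files.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Not parseable as XML; fall through and report null.
        }
        return null;
    }

    /** Guesses a type from the content alone, ignoring any served Content-Type header. */
    static String guessType(InputStream in) {
        String root = rootLocalName(in);
        if (root == null) {
            return "text/plain";
        }
        return ASSUMED_DIF_ROOT.equals(root) ? ASSUMED_DIF_TYPE : "application/xml";
    }
}
```

Calling DifRootSniffer.guessType(stream) on a crawled file would then report
the assumed DIF type only when the root element matches, generic XML for other
well-formed documents, and text/plain otherwise.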

Here is one such example: https://www.aoncadis.org/dataset/Zamora2010.dif

You can find the rest of the crawled links at:
https://raw.githubusercontent.com/shekarprashant/TikaDirectedResearch/master/Acadis%20Complete%20Crawl%20Raw%20Results.csv

Looking forward to your input.

Thanks,
Aakarsh

> DIF Parser
> ----------
>
>                 Key: TIKA-1532
>                 URL: https://issues.apache.org/jira/browse/TIKA-1532
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Aakarsh Medleri Hire Math
>
> MIME Type detection & content parser for .dif format



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
