[ 
https://issues.apache.org/jira/browse/TIKA-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356859#comment-14356859
 ] 

Nick Burch commented on TIKA-1286:
----------------------------------

Any chance you could create very small sample files for as many of these as 
possible? We can then use them as unit tests for the detection once we 
implement the definitions, and possibly also to update the OOXML 
container-aware detector so that it matches them without filenames as well.
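
To illustrate what such a sample and a filename-free check could look like, 
here is a rough sketch in Python (standard library only). It is not Tika 
code and not how the container-aware detector is implemented. It assumes 
the main Visio part is declared as an Override in [Content_Types].xml using 
the content types listed in the issue below; the part name 
/visio/document.xml and the helper names make_sample and detect_visio_type 
are assumptions made for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace of [Content_Types].xml in an OPC (zip-based OOXML) package.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"

# Content types proposed in the issue; assumed to appear in [Content_Types].xml.
VISIO_MAIN_TYPES = {
    "application/vnd.ms-visio.drawing.main+xml",
    "application/vnd.ms-visio.template.main+xml",
    "application/vnd.ms-visio.stencil.main+xml",
    "application/vnd.ms-visio.drawing.macroEnabled.main+xml",
    "application/vnd.ms-visio.template.macroEnabled.main+xml",
    "application/vnd.ms-visio.stencil.macroEnabled.main+xml",
}


def make_sample(main_type: str) -> bytes:
    """Build a tiny OPC-like zip in memory (hypothetical helper).

    This is NOT a valid Visio file: it only carries [Content_Types].xml
    and a stub document part, which is enough to exercise detection.
    """
    content_types = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        '<Default Extension="rels" '
        'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        # Part name is an assumption about where the main Visio part lives.
        f'<Override PartName="/visio/document.xml" ContentType="{main_type}"/>'
        "</Types>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", content_types)
        zf.writestr("visio/document.xml", "<VisioDocument/>")
    return buf.getvalue()


def detect_visio_type(data: bytes):
    """Return the Visio content type declared in the package, or None.

    Looks only inside the container, so no filename or extension is needed.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        if "[Content_Types].xml" not in zf.namelist():
            return None
        root = ET.fromstring(zf.read("[Content_Types].xml"))
    for override in root.iter(CT_NS + "Override"):
        content_type = override.get("ContentType")
        if content_type in VISIO_MAIN_TYPES:
            return content_type
    return None
```

Real test files would still need to come from Visio itself; a stub like 
this only checks the lookup logic.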

> Adding MS Visio VSDX to mime-types detection
> --------------------------------------------
>
>                 Key: TIKA-1286
>                 URL: https://issues.apache.org/jira/browse/TIKA-1286
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.5
>         Environment: Any
>            Reporter: Pascal Essiembre
>            Priority: Minor
>              Labels: easyfix
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> Visio files in the Office Open XML (OOXML) format are not recognized by 
> the mime-type detector, which always returns the family mime-type instead: 
> {{application/x-tika-ooxml}}.
> It turns out most Microsoft OOXML file formats are defined in 
> tika-mimetypes.xml, but not Visio.  I have created the list for someone 
> to add:
> {code:xml}
>   <mime-type type="application/vnd.ms-visio.drawing.main+xml">
>     <_comment>Office Open XML Visio Drawing (macro-free)</_comment>
>     <glob pattern="*.vsdx"/>
>     <sub-class-of type="application/x-tika-ooxml"/>
>   </mime-type>
>   <mime-type type="application/vnd.ms-visio.template.main+xml">
>     <_comment>Office Open XML Visio Template (macro-free)</_comment>
>     <glob pattern="*.vstx"/>
>     <sub-class-of type="application/x-tika-ooxml"/>
>   </mime-type>
>   <mime-type type="application/vnd.ms-visio.stencil.main+xml">
>     <_comment>Office Open XML Visio Stencil (macro-free)</_comment>
>     <glob pattern="*.vssx"/>
>     <sub-class-of type="application/x-tika-ooxml"/>
>   </mime-type>
>   <mime-type type="application/vnd.ms-visio.drawing.macroEnabled.main+xml">
>     <_comment>Office Open XML Visio Drawing (macro-enabled)</_comment>
>     <glob pattern="*.vsdm"/>
>     <sub-class-of type="application/x-tika-ooxml"/>
>   </mime-type>
>   <mime-type type="application/vnd.ms-visio.template.macroEnabled.main+xml">
>     <_comment>Office Open XML Visio Template (macro-enabled)</_comment>
>     <glob pattern="*.vstm"/>
>     <sub-class-of type="application/x-tika-ooxml"/>
>   </mime-type>
>   <mime-type type="application/vnd.ms-visio.stencil.macroEnabled.main+xml">
>     <_comment>Office Open XML Visio Stencil (macro-enabled)</_comment>
>     <glob pattern="*.vssm"/>
>     <sub-class-of type="application/x-tika-ooxml"/>
>   </mime-type>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
