[ https://issues.apache.org/jira/browse/TIKA-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luca Moretti updated TIKA-1823:
-------------------------------
    Affects Version/s:     (was: 1.11)

> Support detecting DWF format
> ----------------------------
>
>                 Key: TIKA-1823
>                 URL: https://issues.apache.org/jira/browse/TIKA-1823
>             Project: Tika
>          Issue Type: Improvement
>          Components: detector, mime
>            Reporter: Luca Moretti
>            Priority: Minor
>              Labels: detection, dwf, mime
>
> Tika currently detects DWF files as application/octet-stream.
> For Tika's MIME magic detector to recognize DWF files correctly, the following
> fragment should be added to the _tika-mimetypes.xml_ registry:
> {code:xml}
> <mime-type type="model/vnd.dwf">
>   <acronym>dwf</acronym>
>   <_comment>Design Web Format</_comment>
>   <magic priority="50">
>     <match type="string" offset="0" value="(DWF V">
>       <match type="string" offset="8" value=".">
>         <match type="string" offset="11" value=")" />
>       </match>
>     </match>
>   </magic>
>   <glob pattern="*.dwf" />
> </mime-type>
> {code}
> \\
> In the current version (DWF 6.0), a DWF file is a ZIP-compressed container for
> vector-based CAD drawings. It is essentially a ZIP archive with the
> _(DWF V06.00)_ signature placed before the regular ZIP magic number. For this
> version, the match value would therefore be {{(DWF V06.00)PK}}.
> In earlier versions, the DWF data transport is not a ZIP file format, so the
> magic number is only the _(DWF V00.55)_ signature in the file header.
> To let Tika detect these older DWF files as well, I propose the more general
> match rule in the code above, which checks only for "(DWF V" at offset 0,
> "." at offset 8 and ")" at offset 11.
> Thanks,
> Luca
> \\
> P.S.: The DWF format specification is included in the DWF Toolkit, which is
> available for free at [http://www.autodesk.com/dwftoolkit]
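The nested match rule in the proposed fragment can be illustrated with a small standalone Java sketch. It is a hypothetical illustration, not Tika's own magic detector: the class `DwfMagicSketch` and its methods are invented names. It assumes that nested `<match>` elements combine as a logical AND (a child is checked only when its parent matches) and, for the 6.0 check, that the ZIP magic `PK` immediately follows the 12-byte `(DWF Vxx.xx)` signature, as the issue describes.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of the nested magic rule proposed for model/vnd.dwf.
 * It mirrors the three <match> clauses; it is not Tika's MagicDetector.
 */
public class DwfMagicSketch {

    // True if `data` contains the ASCII string `value` starting at `offset`.
    static boolean matchesAt(byte[] data, int offset, String value) {
        byte[] expected = value.getBytes(StandardCharsets.US_ASCII);
        if (offset < 0 || offset + expected.length > data.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (data[offset + i] != expected[i]) {
                return false;
            }
        }
        return true;
    }

    // Nested <match> elements read as a conjunction: all three must hold.
    static boolean looksLikeDwf(byte[] header) {
        return matchesAt(header, 0, "(DWF V")
                && matchesAt(header, 8, ".")
                && matchesAt(header, 11, ")");
    }

    // Assumption: DWF 6.0 puts the ZIP magic "PK" right after the signature.
    static boolean looksLikeZipBasedDwf(byte[] header) {
        return looksLikeDwf(header) && matchesAt(header, 12, "PK");
    }

    public static void main(String[] args) {
        byte[] v6 = "(DWF V06.00)PK\u0003\u0004".getBytes(StandardCharsets.ISO_8859_1);
        byte[] v055 = "(DWF V00.55)".getBytes(StandardCharsets.US_ASCII);
        byte[] plainZip = "PK\u0003\u0004".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(looksLikeDwf(v6) + " " + looksLikeZipBasedDwf(v6));             // true true
        System.out.println(looksLikeDwf(v055) + " " + looksLikeZipBasedDwf(v055));         // true false
        System.out.println(looksLikeDwf(plainZip) + " " + looksLikeZipBasedDwf(plainZip)); // false false
    }
}
```

The general rule accepts both the 6.0 and 00.55 headers, while a plain ZIP archive is rejected because it lacks the `(DWF V` prefix at offset 0. Checking only the separators at offsets 8 and 11, rather than the digits, is what lets a single rule cover every `Vxx.xx` version.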