Luca Moretti created TIKA-1823:
----------------------------------

             Summary: Support detecting DWF format
                 Key: TIKA-1823
                 URL: https://issues.apache.org/jira/browse/TIKA-1823
             Project: Tika
          Issue Type: Improvement
          Components: detector, mime
    Affects Versions: 1.11
            Reporter: Luca Moretti
            Priority: Minor

Tika currently detects DWF files as application/octet-stream.

To make Tika's MIME magic detector recognize DWF files correctly, the following fragment should be added to the _tika-mimetypes.xml_ registry:

{code:xml}
<mime-type type="model/vnd.dwf">
  <acronym>dwf</acronym>
  <_comment>Design Web Format</_comment>
  <magic priority="50">
    <match type="string" offset="0" value="(DWF V">
      <match type="string" offset="8" value=".">
        <match type="string" offset="11" value=")" />
      </match>
    </match>
  </magic>
  <glob pattern="*.dwf" />
</mime-type>
{code}
\\
In the current version (DWF 6.0), a DWF file is a ZIP-compressed container for vector-based CAD drawings. It is basically a ZIP archive with the _(DWF V06.00)_ signature placed before the regular ZIP magic number. For this version alone, the match value would be {{(DWF V06.00)PK}}.

In earlier versions, the DWF data is not stored as a ZIP archive, so the magic number is just the signature in the file header, for example _(DWF V00.55)_. To detect those versions as well, I propose the more general match in the fragment above, which checks only the fixed characters of the signature and leaves the version digits open.

Thanks,
Luca
\\
P.S.: The DWF format specification is included in the DWF Toolkit, which is available for free at [http://www.autodesk.com/dwftoolkit]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
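
For illustration only, the three nested match elements in the fragment above can be read as three byte checks that must all succeed. The following is a minimal Python sketch of that logic, not Tika code; the function names {{looks_like_dwf}} and {{is_zip_based_dwf}} are hypothetical and exist only for this example.

```python
def looks_like_dwf(header: bytes) -> bool:
    """Mirror the three nested <match> rules: every one must hold."""
    return (
        header[0:6] == b"(DWF V"     # offset 0: fixed signature prefix
        and header[8:9] == b"."      # offset 8: separator between version digits
        and header[11:12] == b")"    # offset 11: closing parenthesis
    )


def is_zip_based_dwf(header: bytes) -> bool:
    """Check for the ZIP magic 'PK' directly after the 12-byte signature,
    as described for DWF 6.0 in the issue text."""
    return looks_like_dwf(header) and header[12:14] == b"PK"


if __name__ == "__main__":
    for sample in (b"(DWF V06.00)PK\x03\x04", b"(DWF V00.55)", b"PK\x03\x04"):
        print(sample, looks_like_dwf(sample), is_zip_based_dwf(sample))
```

Because the version digits at offsets 6, 7, 9 and 10 are not checked, the same rule accepts both the ZIP-based 6.0 header and the older 00.55 header, while a plain ZIP archive is rejected.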