Luca Moretti created TIKA-1823:
----------------------------------

             Summary: Support detecting DWF format
                 Key: TIKA-1823
                 URL: https://issues.apache.org/jira/browse/TIKA-1823
             Project: Tika
          Issue Type: Improvement
          Components: detector, mime
    Affects Versions: 1.11
            Reporter: Luca Moretti
            Priority: Minor


Tika currently detects DWF files as application/octet-stream.
To make Tika's MIME magic detector recognize DWF files correctly, the following 
fragment should be added to the _tika-mimetypes.xml_ registry:

{code:xml}
<mime-type type="model/vnd.dwf">
        <acronym>dwf</acronym>
        <_comment>Design Web Format</_comment>
        <magic priority="50">
                <match type="string" offset="0" value="(DWF V">
                        <match type="string" offset="8" value=".">
                                <match type="string" offset="11" value=")" />
                        </match>
                </match>
        </magic>
        <glob pattern="*.dwf" />
</mime-type>
{code}
\\
In the current version (DWF 6.0), a DWF file is a ZIP-compressed container for 
vector-based CAD drawings. It is basically a ZIP archive with the _(DWF 
V06.00)_ signature added before the regular ZIP magic number. For this reason, 
the match value for such files would be {{(DWF V06.00)PK}}.
In previous versions, the DWF data transport is not a ZIP file, so 
the magic number is only the _(DWF V00.55)_ signature in the file header.
To make Tika detect DWF files of these versions as well, I propose the more 
general match in the code above, which checks only the fixed characters of the 
signature: {{(DWF V}} at offset 0, {{.}} at offset 8 and {{)}} at offset 11.
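
For illustration only, the byte offsets used by the proposed magic rule can be 
checked with a small standalone Python sketch. It is not Tika code: the 
function names {{matches_proposed_magic}} and {{is_zip_based_dwf}} are 
hypothetical, and the sample headers are built by hand from the signatures 
described above, not taken from real files.

```python
def matches_proposed_magic(header: bytes) -> bool:
    """Mirror the nested <match> elements of the proposed rule.

    Hypothetical helper for illustration; Tika evaluates the XML rule itself.
    """
    return (
        header[0:6] == b"(DWF V"      # offset 0: fixed prefix
        and header[8:9] == b"."       # offset 8: separator between major and minor version
        and header[11:12] == b")"     # offset 11: closing parenthesis
    )


def is_zip_based_dwf(header: bytes) -> bool:
    """True if the 12-byte signature is followed directly by the ZIP magic 'PK'."""
    return matches_proposed_magic(header) and header[12:14] == b"PK"


if __name__ == "__main__":
    # Hand-built sample headers (assumptions, not real files).
    zip_based = b"(DWF V06.00)PK\x03\x04"
    older = b"(DWF V00.55)"
    for sample in (zip_based, older, b"PK\x03\x04"):
        print(sample[:14], matches_proposed_magic(sample), is_zip_based_dwf(sample))
```

Both the ZIP-based header and the older header satisfy the general rule, while 
a plain ZIP header does not, which is the reason for matching on the signature 
alone instead of on {{(DWF V06.00)PK}}.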

Thanks,

Luca

\\
P.S.: The DWF format specification is included in the DWF Toolkit. The DWF 
Toolkit is available for free at [http://www.autodesk.com/dwftoolkit]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
