Cservenak, Tamas created TIKA-1241:
--------------------------------------

    Summary: Tika does not recognise empty or spanned ZIP file magic
        Key: TIKA-1241
        URL: https://issues.apache.org/jira/browse/TIKA-1241
    Project: Tika
 Issue Type: Improvement
   Reporter: Cservenak, Tamas
   Priority: Minor
As it turns out, the magic number differs for non-empty, empty, and spanned ZIP files, and Tika recognizes only non-empty ZIP files. The magic of an empty ZIP file was verified with hexdump: https://gist.github.com/cstamas/6e90ae73f83c8e4a3f42

The signatures are also listed on Wikipedia: http://en.wikipedia.org/wiki/Zip_(file_format) (see the magic number entry in the sidebar).

Proposed change: add two more match entries to the ZIP MIME type definition: https://github.com/apache/tika/pull/4
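To illustrate what the two extra match entries would check, here is a minimal standalone sketch (not Tika's detector; Tika's magic matching is driven by its MIME type definitions). It assumes the three signatures listed on Wikipedia: `PK\x03\x04` (local file header, non-empty archive), `PK\x05\x06` (end of central directory record, empty archive) and `PK\x07\x08` (spanned archive). The class `ZipMagic` and its `Kind` enum are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch: classifies data by the first four bytes of a ZIP file. */
final class ZipMagic {

    enum Kind { NON_EMPTY, EMPTY, SPANNED, NOT_ZIP }

    // Every ZIP signature starts with "PK"; the next two bytes name the first record.
    private static final byte[] LOCAL_FILE_HEADER = {'P', 'K', 0x03, 0x04};  // non-empty archive
    private static final byte[] END_OF_CENTRAL_DIR = {'P', 'K', 0x05, 0x06}; // empty archive
    private static final byte[] SPANNED_MARKER = {'P', 'K', 0x07, 0x08};     // spanned archive

    private ZipMagic() {}

    /** Returns true if data begins with the given signature. */
    private static boolean startsWith(byte[] data, byte[] signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if (data[i] != signature[i]) {
                return false;
            }
        }
        return true;
    }

    /** Classifies a byte array by its leading ZIP signature, if any. */
    static Kind detect(byte[] head) {
        if (startsWith(head, LOCAL_FILE_HEADER)) return Kind.NON_EMPTY;
        if (startsWith(head, END_OF_CENTRAL_DIR)) return Kind.EMPTY;
        if (startsWith(head, SPANNED_MARKER)) return Kind.SPANNED;
        return Kind.NOT_ZIP;
    }

    /** Reads at most four bytes from the stream (Java 11+ readNBytes) and classifies them. */
    static Kind detect(InputStream in) throws IOException {
        return detect(in.readNBytes(4));
    }
}
```

With only the `PK\x03\x04` check, an empty archive (which consists solely of an end of central directory record) falls through to `NOT_ZIP`, which is the behaviour the issue describes.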