GitHub user cstamas opened a pull request:

    https://github.com/apache/tika/pull/4

    Make Tika recognize empty and spanning ZIP files too

    As it turns out, magic differs for non-empty, empty and
    spanning ZIP files. Tika recognizes only the non-empty ZIP files.
    
    The magic for an empty ZIP file was verified with hexdump:
    https://gist.github.com/cstamas/6e90ae73f83c8e4a3f42
    
    It is also described on Wikipedia:
    http://en.wikipedia.org/wiki/Zip_(file_format)
    (see sidebar with Magic Numbers)
    
    I am completely unsure about the MIME types `application/vnd.zip.empty` and
    `application/vnd.zip.spanning`, so please correct them to appropriate values.
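
As an illustration of the three signatures the description refers to, here is a minimal Python sketch. It is not taken from the patch, and it is not how Tika implements detection. The byte values are the standard ZIP record signatures (local file header, end of central directory record, and spanned-archive marker). The labels, the `ZIP_MAGIC` table, and the helpers `classify_zip_magic` and `make_zip` are hypothetical names chosen for this example.

```python
import io
import zipfile

# Standard ZIP record signatures. The labels are illustrative only and are
# not the MIME types proposed in the pull request.
ZIP_MAGIC = {
    b"PK\x03\x04": "non-empty",  # local file header of the first entry
    b"PK\x05\x06": "empty",      # end of central directory record
    b"PK\x07\x08": "spanning",   # spanned-archive marker
}


def classify_zip_magic(data):
    """Return a label for the first four bytes, or None if they match no ZIP signature."""
    return ZIP_MAGIC.get(bytes(data[:4]))


def make_zip(entries):
    """Build a ZIP archive in memory from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name, payload in entries.items():
            archive.writestr(name, payload)
    return buf.getvalue()


if __name__ == "__main__":
    # An archive with no entries consists only of the end of central
    # directory record, so it starts with a different signature.
    print(classify_zip_magic(make_zip({})))
    print(classify_zip_magic(make_zip({"a.txt": b"hello"})))
```

A detector that checks only for `PK\x03\x04` would return nothing for the first archive above, which matches the behaviour the description reports for empty ZIP files.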

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cstamas/tika zip-magic

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tika/pull/4.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4
    
----
commit fee430c1c8dac267ed54ba4ec51777f32fac4981
Author: Tamas Cservenak <ta...@cservenak.net>
Date:   2014-02-21T14:18:10Z

    Added two more magic entries for ZIP files.
    
    As it turns out, magic differs for non-empty, empty and
    spanning ZIP files.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
