[ https://issues.apache.org/jira/browse/TIKA-533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921930#action_12921930 ]

Geoff Jarrad commented on TIKA-533:
-----------------------------------

No, using the ContainerAwareDetector doesn't seem to change the detected 
content-type. My quick code test looks like this:

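      import java.io.InputStream;
      import java.net.URL;
      import org.apache.tika.detect.ContainerAwareDetector;
      import org.apache.tika.metadata.Metadata;
      import org.apache.tika.mime.MediaType;
      import org.apache.tika.parser.AutoDetectParser;

      URL source = new URL("file:/D:/debug-experiments/debug-corpus/zip-within-zip.zip");
      Metadata metadata = new Metadata();
      InputStream stream = source.openStream();
      AutoDetectParser a = new AutoDetectParser();
      ContainerAwareDetector c = new ContainerAwareDetector(a.getDetector());
      MediaType mt = c.detect(stream, metadata);
      System.out.printf("Detected media-type=%s%n", mt);
      stream.close();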

The output for me is:

Detected media-type=application/vnd.apple.iwork

I acknowledge that I might be misusing the ContainerAwareDetector. In my usual 
code (because Tika does not detect UTF-16-encoded XML as XML), I have been 
extracting the byte header myself and calling 
AutoDetectParser.getDetector().getMimeType(header) directly. That doesn't 
currently seem to be possible with the ContainerAwareDetector, so I haven't 
been using it.
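One way to cover the UTF-16 case mentioned above is a small pre-check on the 
same header bytes before handing them to Tika. The sketch below uses only the 
JDK; Utf16XmlSniffer and looksLikeUtf16Xml are hypothetical names, not Tika 
API, and the check only recognises UTF-16 XML that begins with a byte-order 
mark followed by an <?xml declaration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16XmlSniffer {

    /**
     * Hypothetical helper (not part of Tika): returns true if the header bytes
     * start with a UTF-16 byte-order mark followed by an XML declaration.
     */
    public static boolean looksLikeUtf16Xml(byte[] header) {
        if (header.length < 4) {
            return false;
        }
        Charset cs;
        if ((header[0] & 0xFF) == 0xFE && (header[1] & 0xFF) == 0xFF) {
            cs = StandardCharsets.UTF_16BE;      // BOM FE FF: big-endian
        } else if ((header[0] & 0xFF) == 0xFF && (header[1] & 0xFF) == 0xFE) {
            cs = StandardCharsets.UTF_16LE;      // BOM FF FE: little-endian
        } else {
            return false;                        // no UTF-16 BOM: not handled here
        }
        // Skip the 2-byte BOM and decode only whole 2-byte code units
        int len = (header.length - 2) & ~1;
        String text = new String(header, 2, len, cs);
        return text.startsWith("<?xml");
    }

    public static void main(String[] args) {
        String xml = "\uFEFF<?xml version=\"1.0\"?><a/>";
        System.out.println(looksLikeUtf16Xml(xml.getBytes(StandardCharsets.UTF_16LE))); // true
        System.out.println(looksLikeUtf16Xml(xml.getBytes(StandardCharsets.UTF_16BE))); // true
        System.out.println(looksLikeUtf16Xml("<?xml version=\"1.0\"?><a/>"
                .getBytes(StandardCharsets.UTF_8)));                                      // false
    }
}
```

BOM-less UTF-16 (for example 3C 00 3F 00 for "<?" in little-endian) would need 
an additional heuristic, and any such workaround should be checked against the 
Tika version actually in use.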

> Mis-detection of zip-within-zip as application/vnd.apple.iwork, with no 
> output by CLI app
> -----------------------------------------------------------------------------------------
>
>                 Key: TIKA-533
>                 URL: https://issues.apache.org/jira/browse/TIKA-533
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.8
>         Environment: Windows 7 64-bit, latest Tika build as of 18th Oct 2010.
>            Reporter: Geoff Jarrad
>         Attachments: zip-within-zip.zip
>
>
> It appears that, at least in some circumstances, a zip file containing only 
> another zip file is being mis-detected as application/vnd.apple.iwork.
> In addition, for such files, the command-line parser does not return any 
> output at all.
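If the attachment is not at hand, a comparable test file can be built 
programmatically. This is a minimal sketch using only java.util.zip; the entry 
names inner.zip and hello.txt are hypothetical, and it is an assumption that 
such a fixture exercises the same detection path as the attached 
zip-within-zip.zip. It also shows that the outer and inner archives both begin 
with the same local-file-header signature (PK\003\004), so the leading magic 
bytes alone cannot reveal that the outer archive holds nothing but another zip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipWithinZip {

    /** Builds an in-memory zip holding a single entry with the given name and content. */
    public static byte[] zipOf(String entryName, byte[] content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content);
            zip.closeEntry();
        } catch (IOException ex) {
            // In-memory streams should not fail; rethrow unchecked to keep callers simple
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Lists the names of the top-level entries of an in-memory zip. */
    public static List<String> entryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry entry = in.getNextEntry(); entry != null; entry = in.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    /** True if the data starts with the zip local-file-header signature "PK\3\4". */
    public static boolean hasZipSignature(byte[] data) {
        return data.length >= 4 && data[0] == 'P' && data[1] == 'K'
                && data[2] == 3 && data[3] == 4;
    }

    public static void main(String[] args) {
        byte[] inner = zipOf("hello.txt", "hello".getBytes(StandardCharsets.UTF_8));
        byte[] outer = zipOf("inner.zip", inner);
        System.out.println(entryNames(outer));      // [inner.zip]
        System.out.println(hasZipSignature(outer)); // true
        System.out.println(hasZipSignature(inner)); // true
    }
}
```

Writing the outer bytes to disk (for example with java.nio.file.Files.write) 
should give a file that can be fed to the detector test shown earlier in this 
comment.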
