[
https://issues.apache.org/jira/browse/TIKA-533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921930#action_12921930
]
Geoff Jarrad commented on TIKA-533:
-----------------------------------
No, using the ContainerAwareDetector doesn't seem to change the detected
content-type. My quick code test looks like this:
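import java.io.InputStream;
import java.net.URL;
import org.apache.tika.detect.ContainerAwareDetector;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AutoDetectParser;

String url = "file:/D:/debug-experiments/debug-corpus/zip-within-zip.zip";
URL source = new URL(url);
Metadata metadata = new Metadata();
InputStream stream = source.openStream();
AutoDetectParser a = new AutoDetectParser();
// Use the parser's default detector as the fallback for the container-aware one
ContainerAwareDetector c = new ContainerAwareDetector(a.getDetector());
MediaType mt = c.detect(stream, metadata);
System.out.printf("Detected media-type=%s%n", mt);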
The output for me is:
Detected media-type=application/vnd.apple.iwork
I acknowledge that I might be misusing the ContainerAwareDetector. In my usual
code (because Tika does not detect UTF-16-encoded XML as XML), I extract the
byte header myself and call AutoDetectParser.getDetector().getMimeType(header)
directly. That doesn't currently seem to be possible with the
ContainerAwareDetector, so I haven't been using it.
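For context on the UTF-16 point above: a check that compares the header byte-for-byte against an ASCII `<?xml` prefix cannot match UTF-16 text, where every character takes two bytes. The sketch below is a hypothetical, Tika-independent helper (`XmlHeaderSniffer.looksLikeXml` is an illustrative name, not a Tika API) that decodes the header according to a UTF-8 or UTF-16 byte-order mark before looking for the XML declaration. It is an assumption that this is what trips up Tika's detection here, and BOM-less UTF-16 is deliberately not handled.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not part of Tika): checks whether a byte header
 * starts with an XML declaration, honouring a UTF-8 or UTF-16 byte-order mark.
 */
public class XmlHeaderSniffer {

    /** Returns true if the header, decoded per its BOM (default UTF-8), begins with "<?xml". */
    public static boolean looksLikeXml(byte[] header) {
        Charset charset = StandardCharsets.UTF_8;
        int offset = 0;
        if (header.length >= 3 && (header[0] & 0xFF) == 0xEF
                && (header[1] & 0xFF) == 0xBB && (header[2] & 0xFF) == 0xBF) {
            offset = 3; // UTF-8 BOM
        } else if (header.length >= 2 && (header[0] & 0xFF) == 0xFE
                && (header[1] & 0xFF) == 0xFF) {
            charset = StandardCharsets.UTF_16BE; // big-endian BOM
            offset = 2;
        } else if (header.length >= 2 && (header[0] & 0xFF) == 0xFF
                && (header[1] & 0xFF) == 0xFE) {
            charset = StandardCharsets.UTF_16LE; // little-endian BOM
            offset = 2;
        }
        String text = new String(header, offset, header.length - offset, charset);
        return text.startsWith("<?xml");
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><a/>";
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);
        // Java's "UTF-16" encoder writes a big-endian BOM (FE FF) first
        byte[] utf16 = xml.getBytes(StandardCharsets.UTF_16);
        System.out.println(looksLikeXml(utf8));  // true
        System.out.println(looksLikeXml(utf16)); // true
        // A naive single-byte prefix test fails, since UTF-16 interleaves 0x00 bytes
        System.out.println(new String(utf16, StandardCharsets.ISO_8859_1).startsWith("<?xml")); // false
    }
}
```

Decoding first and then matching text keeps the prefix check itself encoding-agnostic; a fuller sniffer would also look for the `00 3C 00 3F` / `3C 00 3F 00` patterns of BOM-less UTF-16.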
> Mis-detection of zip-within-zip as application/vnd.apple.iwork, with no
> output by CLI app
> -----------------------------------------------------------------------------------------
>
> Key: TIKA-533
> URL: https://issues.apache.org/jira/browse/TIKA-533
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.8
> Environment: Windows 7 64-bit, latest Tika build as of 18th Oct 2010.
> Reporter: Geoff Jarrad
> Attachments: zip-within-zip.zip
>
>
> It appears that, at least in some circumstances, a zip file containing only
> another zip file is being mis-detected as application/vnd.apple.iwork.
> In addition, for such files, the command-line parser does not return any
> output at all.
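A file with the same shape as the reported one can be generated without the attachment. The following is a minimal sketch using only java.util.zip: it builds a ZIP whose single entry is another ZIP, lists the outer entries, and checks the `PK\3\4` local file header signature. The class and method names (`ZipWithinZip`, `zipOf`, `entryNames`, `hasZipSignature`) and the entry names are hypothetical, and the exact contents of the attached zip-within-zip.zip are not known, so this only approximates it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical test helper: builds and inspects a ZIP whose only entry is another ZIP. */
public class ZipWithinZip {

    /** Writes a single entry with the given name and content into a new in-memory ZIP. */
    public static byte[] zipOf(String entryName, byte[] content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
        return bytes.toByteArray();
    }

    /** Lists the entry names of a ZIP without extracting anything to disk. */
    public static List<String> entryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** True if the bytes start with the ZIP local file header signature "PK\3\4". */
    public static boolean hasZipSignature(byte[] data) {
        return data.length >= 4 && data[0] == 'P' && data[1] == 'K'
                && data[2] == 3 && data[3] == 4;
    }

    public static void main(String[] args) {
        byte[] inner = zipOf("hello.txt", "hello".getBytes(StandardCharsets.US_ASCII));
        byte[] outer = zipOf("inner.zip", inner);
        System.out.println(entryNames(outer));      // [inner.zip]
        System.out.println(hasZipSignature(outer)); // true
    }
}
```

Writing `outer` to disk (for example with `java.nio.file.Files.write`) gives a file that can be passed to the detection test in the comment.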