[ 
https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894501#action_12894501
 ] 

Alex Ott commented on TIKA-447:
-------------------------------

Nick: will this allow us to implement support for self-extracting archives? 
If we implement that as a separate checker, we will need to implement 
archive extraction/detection inside that checker, which could lead to 
code duplication.

> Container aware mimetype detection
> ----------------------------------
>
>                 Key: TIKA-447
>                 URL: https://issues.apache.org/jira/browse/TIKA-447
>             Project: Tika
>          Issue Type: New Feature
>          Components: mime
>    Affects Versions: 0.7
>            Reporter: Nick Burch
>         Attachments: TikaContainerDetection.patch
>
>
> As discussed on the dev list, Tika should ideally have a configurable way to 
> process container-based formats (e.g. zip files and OLE2 files) when trying to 
> detect the correct mime type for a document.
> This needs to be configurable, because some people won't want Tika to have to 
> do all the work of parsing the whole file when they're not interested in 
> knowing exactly what's in it.
> Once we have gone to the trouble of opening and parsing the container file, 
> we should try to keep the open container around to speed up parsing of the 
> contents.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
