[ 
https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893620#action_12893620
 ] 

Chris A. Mattmann commented on TIKA-447:
----------------------------------------

Nick, awesome job! Comments below:

{quote}
I think the only bit left for now is to document it. We don't currently have a 
Detection section in the documentation. Shall I create a new one, put in the 
basics from one of the ApacheCon Tika talks, then add a section on 
container-aware detection? 
{quote}

Yep, I would do this. I would add an APT page with a section called 
"Detection" and put some useful information in it. From that APT page you 
could also link to the Wiki page where the discussion on container metadata 
took place:

http://wiki.apache.org/tika/MetadataDiscussion

Cheers,
Chris



> Container aware mimetype detection
> ----------------------------------
>
>                 Key: TIKA-447
>                 URL: https://issues.apache.org/jira/browse/TIKA-447
>             Project: Tika
>          Issue Type: New Feature
>          Components: mime
>    Affects Versions: 0.7
>            Reporter: Nick Burch
>         Attachments: TikaContainerDetection.patch
>
>
> As discussed on the dev list, Tika should ideally have a configurable way to 
> process container-based formats (e.g. zip files and OLE2 files) when trying to 
> detect the correct mime type for a document.
> This needs to be configurable, because some people won't want Tika to have to 
> do all the work of parsing the whole file when they're not interested in 
> knowing exactly what's in it.
> Once we have gone to the trouble of opening and parsing the container file, 
> we should try to keep the open container around to speed up parsing of the 
> contents.
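
To make the issue description above concrete, here is a minimal Java sketch of
what container-aware detection with an on/off switch could look like. It is
not the code in TikaContainerDetection.patch and it does not use any Tika API:
the class name ContainerSniffer, the detect method and the lookInside flag are
all hypothetical, and only java.util.zip is used. It assumes Java 9 or later,
handles only zip containers holding two OOXML types, and leaves out OLE2 and
the reuse of the opened container for later parsing.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical illustration only; not part of Tika.
public class ContainerSniffer {

    /**
     * Returns a mime type for the stream. When lookInside is false, only the
     * first four bytes are inspected; when true, zip entries are scanned too.
     */
    static String detect(InputStream raw, boolean lookInside) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);

        // Cheap step: peek at the magic bytes, then rewind.
        in.mark(4);
        byte[] magic = new byte[4];
        int n = in.readNBytes(magic, 0, 4); // Java 9+
        in.reset();

        // "PK\003\004" is the zip local file header signature.
        // (An empty zip archive starts differently and is not handled here.)
        boolean isZip = n == 4
                && magic[0] == 'P' && magic[1] == 'K'
                && magic[2] == 3 && magic[3] == 4;
        if (!isZip) {
            return "application/octet-stream";
        }
        if (!lookInside) {
            // Caller opted out of the expensive container walk.
            return "application/zip";
        }

        // Expensive step: walk the entries looking for well-known names.
        ZipInputStream zip = new ZipInputStream(in);
        for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
            switch (e.getName()) {
                case "word/document.xml":
                    return "application/vnd.openxmlformats-officedocument"
                            + ".wordprocessingml.document";
                case "xl/workbook.xml":
                    return "application/vnd.openxmlformats-officedocument"
                            + ".spreadsheetml.sheet";
                default:
                    // Not a marker entry; keep scanning.
            }
        }
        return "application/zip";
    }
}
```

Calling ContainerSniffer.detect(stream, false) stops at the generic zip answer,
while detect(stream, true) pays for the entry scan to get the more specific
type, which is the trade-off the issue asks to make configurable.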

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.