[ https://issues.apache.org/jira/browse/TIKA-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907607#action_12907607 ]

Jukka Zitting commented on TIKA-509:
------------------------------------

Yes, I think the ContainerExtractor and ContainerEmbeddedResourceHandler 
(rename to EmbeddedResourceHandler?) interfaces should remain, as they provide a 
much more convenient way to achieve this use case.

> [...] we should try to make the container related Parsers call the nested 
> parser from the ParseContext?

Yes, that's how the PackageParser was designed, and it's how I'd like to see 
other container formats handled as well.
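For illustration, a minimal sketch of that delegation pattern follows. `SimpleParser`, `SimpleParseContext` and `ZipPackageParser` are hypothetical stand-ins, not Tika classes: the context only mimics the class-keyed get/set style of Tika's `ParseContext`, and the container parser hands each ZIP entry to whatever parser the caller registered, reporting just the entry names when none is set.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical stand-in for a parser: appends extracted text to `out`.
interface SimpleParser {
    void parse(InputStream stream, StringBuilder out, SimpleParseContext context)
            throws IOException;
}

// Hypothetical stand-in for a class-keyed parse context.
class SimpleParseContext {
    private final Map<Class<?>, Object> values = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        values.put(key, value);
    }

    <T> T get(Class<T> key, T defaultValue) {
        Object value = values.get(key);
        return value == null ? defaultValue : key.cast(value);
    }
}

// Container parser that delegates each entry to the parser found in the context.
class ZipPackageParser implements SimpleParser {
    public void parse(InputStream stream, StringBuilder out, SimpleParseContext context)
            throws IOException {
        // No nested parser registered: only the entry names are reported.
        SimpleParser nested = context.get(SimpleParser.class, null);
        ZipInputStream zip = new ZipInputStream(stream);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            out.append('[').append(entry.getName()).append("]\n");
            if (nested != null && !entry.isDirectory()) {
                // The same context is passed on, so the nested parser can use it too.
                nested.parse(zip, out, context);
            }
        }
    }
}
```

The container parser never decides how an entry is parsed; the caller controls that entirely through what it places in the context.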

> "I don't want that file, don't bother doing lots of work to extract it"

We could implement that with an optional strategy object that gets passed 
through the parse context along with the component parser.
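A rough sketch of such a strategy object, assuming a hypothetical `EmbeddedSelector` interface (not an existing Tika API) registered in a hypothetical class-keyed context next to the component parser. When no selector is present, every entry is parsed, so callers that don't care are unaffected. With `ZipInputStream`, a skipped entry's bytes are still read past when advancing to the next entry, so in this sketch the saving is the parsing work rather than the decompression.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical strategy: decides from the entry name alone whether it is wanted.
interface EmbeddedSelector {
    boolean wanted(String name);
}

// Hypothetical component parser, called only for wanted entries.
interface EntryParser {
    void parse(String name, InputStream stream) throws IOException;
}

// Hypothetical class-keyed context carrying the parser and the optional selector.
final class SelectionContext {
    private final Map<Class<?>, Object> values = new HashMap<>();

    <T> SelectionContext with(Class<T> key, T value) {
        values.put(key, value);
        return this;
    }

    <T> T get(Class<T> key, T fallback) {
        Object value = values.get(key);
        return value == null ? fallback : key.cast(value);
    }
}

class SelectiveZipParser {
    private static final EmbeddedSelector EVERYTHING = name -> true;

    void parse(InputStream stream, SelectionContext context) throws IOException {
        EntryParser parser = context.get(EntryParser.class, null);
        if (parser == null) {
            return; // nothing to delegate to
        }
        // Optional strategy: when absent, every entry is wanted.
        EmbeddedSelector selector = context.get(EmbeddedSelector.class, EVERYTHING);
        ZipInputStream zip = new ZipInputStream(stream);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.isDirectory() || !selector.wanted(entry.getName())) {
                // Never handed to the parser; getNextEntry() still reads past its bytes.
                continue;
            }
            parser.parse(entry.getName(), zip);
        }
    }
}
```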

> Container contents extraction
> -----------------------------
>
>                 Key: TIKA-509
>                 URL: https://issues.apache.org/jira/browse/TIKA-509
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 0.7
>            Reporter: Nick Burch
>            Assignee: Nick Burch
>            Priority: Minor
>         Attachments: 0001-TIKA-509-Container-contents-extraction.patch
>
>
> As discussed on the mailing list:
> http://mail-archives.apache.org/mod_mbox/tika-dev/201009.mbox/%3calpine.deb.1.10.1009010000250.5...@urchin.earth.li%3e
> This service will operate in a push mode, using streaming where possible (not 
> all container formats will support that). Users can control recursion, and 
> will be given the chance to process each embedded file in turn. It's up to 
> them if they process a file or skip it.
> It will work similarly to the current Parser code, with each container having 
> its own extractor in the parsers package, and the interface defined in the 
> core package. There will be an Auto extractor in the core package, configured 
> with a list of parser extractors just like AutoDetectParser does.
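The push-mode Auto extractor described above could look roughly like the following sketch. All names are hypothetical, and the recursion control (passing an extractor to recurse with, or `null` to stop at the top level) is an assumption of this sketch rather than a settled API. Each configured extractor is asked in turn whether it supports the stream, much as AutoDetectParser picks a delegate, and non-container entries are pushed to the handler one at a time. Input streams must support mark/reset.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical push callback, invoked once per non-container entry.
interface ResourceHandler {
    void handle(String name, InputStream stream) throws IOException;
}

// Hypothetical per-format extractor. `recurse` is the extractor used for
// nested containers; null means "do not descend".
interface Extractor {
    // `in` must support mark/reset; the stream position is left unchanged.
    boolean isSupported(InputStream in) throws IOException;

    void extract(InputStream in, Extractor recurse, ResourceHandler handler) throws IOException;
}

class ZipExtractor implements Extractor {
    public boolean isSupported(InputStream in) throws IOException {
        in.mark(4);
        byte[] head = in.readNBytes(4);
        in.reset();
        // ZIP local file header signature: "PK\3\4".
        return head.length == 4 && head[0] == 'P' && head[1] == 'K'
                && head[2] == 3 && head[3] == 4;
    }

    public void extract(InputStream in, Extractor recurse, ResourceHandler handler)
            throws IOException {
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            // Streams the current entry; handlers must not close it.
            InputStream entryStream = new BufferedInputStream(zip);
            if (recurse != null && recurse.isSupported(entryStream)) {
                recurse.extract(entryStream, recurse, handler);
            } else {
                handler.handle(entry.getName(), entryStream);
            }
        }
    }
}

// Asks each configured extractor in turn and uses the first that accepts the stream.
class AutoExtractor implements Extractor {
    private final List<? extends Extractor> extractors;

    AutoExtractor(List<? extends Extractor> extractors) {
        this.extractors = extractors;
    }

    public boolean isSupported(InputStream in) throws IOException {
        return pick(in) != null;
    }

    public void extract(InputStream in, Extractor recurse, ResourceHandler handler)
            throws IOException {
        Extractor chosen = pick(in);
        if (chosen == null) {
            throw new IOException("No extractor supports this container");
        }
        chosen.extract(in, recurse, handler);
    }

    private Extractor pick(InputStream in) throws IOException {
        for (Extractor candidate : extractors) {
            if (candidate.isSupported(in)) {
                return candidate;
            }
        }
        return null;
    }
}
```

Calling `auto.extract(in, auto, handler)` descends into nested archives, while passing `null` as `recurse` delivers a nested archive to the handler as an ordinary entry, leaving it to the user to process or skip.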

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
