[ 
https://issues.apache.org/jira/browse/TIKA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342749#comment-14342749
 ] 

Nick Burch commented on TIKA-675:
---------------------------------

I think this is already handled by the RecursiveParserWrapper, via the 
EMBEDDED_RESOURCE_PATH metadata key?

> PackageExtractor should track names of recursively nested resources
> -------------------------------------------------------------------
>
>                 Key: TIKA-675
>                 URL: https://issues.apache.org/jira/browse/TIKA-675
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 0.10
>            Reporter: Andrzej Bialecki 
>
> When parsing archive formats, the hierarchy of names is not tracked; only the 
> current embedded component's name is preserved under 
> Metadata.RESOURCE_NAME_KEY. It would be nice to build pseudo-URLs for nested 
> resources, in a way similar to the VFS model. For the Tika API, which uses 
> streams, this could look like 
> {code}tar:gz:stream://example.tar.gz!/example.tar!/example.html{code} 
> Alternatively, Tika could track the parent-child relationship in some other 
> way. For example, some applications need this information to decide which 
> composite documents to delete from the index after a container archive has 
> been deleted.
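
For illustration only, here is a minimal sketch of how a pseudo-URL like the one in the description could be assembled from a chain of container names. It uses plain Java and no Tika classes; the names Layer and NestedResourcePath, the "stream://" base, and the scheme labels are assumptions made up for this example, not part of any Tika API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for one container layer: a format label and the container's name. */
final class Layer {
    final String scheme;
    final String name;

    Layer(String scheme, String name) {
        this.scheme = scheme;
        this.name = name;
    }
}

/** Hypothetical helper that builds a VFS-style pseudo-URL for a nested resource. */
final class NestedResourcePath {
    private NestedResourcePath() {}

    /**
     * @param containers container layers, outermost first
     * @param leafName   name of the embedded resource inside the innermost container
     */
    static String build(List<Layer> containers, String leafName) {
        StringBuilder url = new StringBuilder();
        // Scheme prefixes are written innermost container first, as in the
        // example from the issue description ("tar:gz:stream://...").
        for (int i = containers.size() - 1; i >= 0; i--) {
            url.append(containers.get(i).scheme).append(':');
        }
        url.append("stream://");
        // Path segments are written outermost first, each closed by "!/".
        for (Layer layer : containers) {
            url.append(layer.name).append("!/");
        }
        return url.append(leafName).toString();
    }

    public static void main(String[] args) {
        List<Layer> layers = new ArrayList<>();
        layers.add(new Layer("gz", "example.tar.gz"));
        layers.add(new Layer("tar", "example.tar"));
        System.out.println(build(layers, "example.html"));
    }
}
```

An application that indexes each embedded document under such a string could, under these assumptions, find all children of a deleted container by matching on the container's prefix.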



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
