[ https://issues.apache.org/jira/browse/TIKA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049159#comment-13049159 ]
Andrzej Bialecki commented on TIKA-675:
----------------------------------------

Good point. For example, Aperture assigns sequential IDs to resources that don't have names (e.g. parts in a MIME message).

> PackageExtractor should track names of recursively nested resources
> -------------------------------------------------------------------
>
>                 Key: TIKA-675
>                 URL: https://issues.apache.org/jira/browse/TIKA-675
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Andrzej Bialecki
>
> When parsing archive formats, the hierarchy of names is not tracked; only the
> current embedded component's name is preserved under
> Metadata.RESOURCE_NAME_KEY. In a way similar to the VFS model, it would be
> nice to build pseudo-URLs for nested resources. In the case of the Tika API,
> which uses streams, this could look like
> {code}tar:gz:stream://example.tar.gz!/example.tar!/example.html{code}
> ...or otherwise track the parent-child relationship, e.g. some applications
> need this information to decide which composite documents to delete from the
> index after a container archive has been deleted.
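
As a rough illustration of the pseudo-URL idea in the description, combined with the sequential-ID fallback mentioned in the comment, here is a minimal sketch in plain Java. Nothing in it is existing Tika API: the class NestedResourcePath, its methods, the "part-N" naming for unnamed resources, and the "mime" scheme label used in the usage note are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Tika): records the chain of containers
 * a parser has descended into and renders a pseudo-URL for a nested entry.
 */
final class NestedResourcePath {
    // Container formats and names, both stored outermost first.
    private final List<String> schemes = new ArrayList<>();
    private final List<String> names = new ArrayList<>();
    // Counter for resources without a name (assumed "part-N" convention).
    private int unnamedCounter = 0;

    /** Record that the parser has entered a container, e.g. ("gz", "example.tar.gz"). */
    NestedResourcePath enterContainer(String scheme, String name) {
        schemes.add(scheme);
        names.add(nameOrSequentialId(name));
        return this;
    }

    /** Build the pseudo-URL of an entry inside the innermost container. */
    String urlFor(String entryName) {
        StringBuilder url = new StringBuilder();
        // Schemes are written innermost first, e.g. "tar:gz:".
        for (int i = schemes.size() - 1; i >= 0; i--) {
            url.append(schemes.get(i)).append(':');
        }
        url.append("stream://");
        // Names are written outermost first, separated by "!/".
        for (String name : names) {
            url.append(name).append("!/");
        }
        return url.append(nameOrSequentialId(entryName)).toString();
    }

    /** Return the name, or the next sequential placeholder if it is missing. */
    private String nameOrSequentialId(String name) {
        if (name == null || name.isEmpty()) {
            return "part-" + (++unnamedCounter);
        }
        return name;
    }
}
```

With this sketch, entering ("gz", "example.tar.gz") and then ("tar", "example.tar") and asking for "example.html" yields the URL shown in the description. An unnamed part inside a container entered as ("mime", "message.eml") would come out as mime:stream://message.eml!/part-1. Each call with a missing name consumes a new ID, so a caller would need to ask once per resource and keep the result.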