PackageExtractor should track names of recursively nested resources
-------------------------------------------------------------------
                 Key: TIKA-675
                 URL: https://issues.apache.org/jira/browse/TIKA-675
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 1.0
            Reporter: Andrzej Bialecki

When parsing archive formats, the hierarchy of names is not tracked; only the current embedded component's name is preserved under Metadata.RESOURCE_NAME_KEY.

In a way similar to the VFS model, it would be nice to build pseudo-URLs for nested resources. In the case of the Tika API, which uses streams, this could look like

{code}tar:gz:stream://example.tar.gz!/example.tar!/example.html{code}

...or otherwise track the parent-child relationship. For example, some applications need this information to identify which composite documents to delete from the index after a container archive has been deleted.
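To make the proposal concrete, here is a minimal sketch of how such pseudo-URLs could be assembled while descending into nested archives. It is not existing Tika code: the class NestedResourceName and its methods root, child, parent, isInside and toPseudoUrl are hypothetical names chosen for illustration, and the scheme ordering (innermost container format first) is inferred from the single example above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not part of Tika): tracks the chain of nested resource names. */
final class NestedResourceName {
    private final List<String> schemes; // container formats, outermost first
    private final List<String> names;   // resource names, outermost first

    private NestedResourceName(List<String> schemes, List<String> names) {
        this.schemes = schemes;
        this.names = names;
    }

    /** Starts a chain at the top-level stream. */
    static NestedResourceName root(String name) {
        return new NestedResourceName(
                new ArrayList<String>(),
                new ArrayList<String>(Collections.singletonList(name)));
    }

    /** Descends into this resource, treated as a container of the given format. */
    NestedResourceName child(String containerScheme, String childName) {
        List<String> s = new ArrayList<String>(schemes);
        s.add(containerScheme);
        List<String> n = new ArrayList<String>(names);
        n.add(childName);
        return new NestedResourceName(s, n);
    }

    /** Returns the enclosing container, or null for the top-level stream. */
    NestedResourceName parent() {
        if (schemes.isEmpty()) {
            return null;
        }
        return new NestedResourceName(
                new ArrayList<String>(schemes.subList(0, schemes.size() - 1)),
                new ArrayList<String>(names.subList(0, names.size() - 1)));
    }

    /** True if this resource is nested, at any depth, inside the given container. */
    boolean isInside(NestedResourceName container) {
        int depth = container.names.size();
        return names.size() > depth && names.subList(0, depth).equals(container.names);
    }

    /** Renders the chain in the VFS-like form used in the issue description. */
    String toPseudoUrl() {
        StringBuilder sb = new StringBuilder();
        // Schemes are written innermost first, e.g. "tar:gz:".
        for (int i = schemes.size() - 1; i >= 0; i--) {
            sb.append(schemes.get(i)).append(':');
        }
        sb.append("stream://").append(String.join("!/", names));
        return sb.toString();
    }

    public static void main(String[] args) {
        NestedResourceName gz = root("example.tar.gz");
        NestedResourceName tar = gz.child("gz", "example.tar");
        NestedResourceName html = tar.child("tar", "example.html");
        // prints tar:gz:stream://example.tar.gz!/example.tar!/example.html
        System.out.println(html.toPseudoUrl());
    }
}
```

With a structure like this, an application that indexes embedded documents could store the pseudo-URL alongside each entry and, when a container archive is removed, delete every entry for which isInside(container) holds. Whether the chain should live in Metadata as a single string or as separate parent and child fields is left open here, as it is in the issue.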