[ https://issues.apache.org/jira/browse/TIKA-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018170#comment-13018170 ]
Nick Burch commented on TIKA-637:
---------------------------------

Doesn't org.apache.tika.extractor.ParserContainerExtractor do what you need?

> Need API to get list of embedded documents
> ------------------------------------------
>
>                 Key: TIKA-637
>                 URL: https://issues.apache.org/jira/browse/TIKA-637
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Manish
>
> Apache Tika works great for extracting the content and metadata of
> documents. It would be even better if it had an API that returned each
> embedded document's input stream along with its content and metadata.
> For example, when extracting a zip file, the output could be a list of
> <text, metadata, inputstream> for each document, or a callback could be
> invoked for each <text, metadata, inputstream>. The same API could then be
> used both for text extraction and for extracting individual documents from
> container files.
> I have already done this for zip and also PST, but a standard API would be
> great.
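
For illustration only, the callback style requested in the issue could look roughly like the sketch below. It uses only java.util.zip from the JDK, not Tika. The names EmbeddedDocs, EmbeddedDocumentCallback, and forEachEntry are hypothetical and exist only for this example, and the sketch covers just the name and input stream of each entry, not extracted text or metadata. The ParserContainerExtractor class mentioned in the comment may already provide the equivalent inside Tika.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class EmbeddedDocs {

    /** Hypothetical callback (not a Tika type): invoked once per embedded document. */
    interface EmbeddedDocumentCallback {
        void onDocument(String name, InputStream content) throws IOException;
    }

    /** Walks a zip container and hands every non-directory entry to the callback. */
    static void forEachEntry(InputStream container, EmbeddedDocumentCallback callback)
            throws IOException {
        try (ZipInputStream zip = new ZipInputStream(container)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no document content
                }
                // Buffer the entry so the callback gets its own stream and
                // cannot close or over-read the underlying container stream.
                byte[] bytes = zip.readAllBytes();
                callback.onDocument(entry.getName(), new ByteArrayInputStream(bytes));
            }
        }
    }
}
```

A caller would pass the container's input stream and a lambda, for example one that saves each entry to disk or feeds it to a parser. Buffering each entry in memory keeps the sketch simple but would not suit very large embedded files.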