Small improvements to how embedded docs are parsed in AbstractPOIFSExtractor.handleEmbeddedOfficeDoc
----------------------------------------------------------------------------------------------------

Key: TIKA-751
URL: https://issues.apache.org/jira/browse/TIKA-751
Project: Tika
Issue Type: Improvement
Components: parser
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 1.0

I noticed some minor things in this method:

  * It does too much work (writes the tmpFile out) even when the EmbeddedDocumentExtractor doesn't actually want to parse the file.

  * It writes the tmpFile even in the OLE10_NATIVE case, where it is never used (because we use a TikaInputStream from the in-RAM byte[] instead).

Also I fixed a typo in the method name (embeded -> embedded) -- is that OK? It's a protected method, and a few of the office parsers invoke it.

Finally, I cut over to TemporaryResources to track the possible tmpFile and to open the TikaInputStream against it.

Separately, it's inefficient that we must currently serialize a sub-dir (DirectoryEntry) in the NPOIFileSystem to a tmp file, only for OfficeParser to re-parse it back into an NPOIFileSystem. I'd like to look into instead (somehow) passing the NPOIFileSystem's DirectoryEntry directly to OfficeParser... but that looks like a bigger change.
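
To illustrate the two points above, here is a minimal sketch of the intended control flow in plain Java. It does not use the Tika or POI classes: EmbeddedExtractorSketch, TempFilesSketch and EmbeddedHandlerSketch are hypothetical stand-ins invented for this example, and the serialized sub-directory is modelled as a plain byte[]. The idea is simply to ask the extractor first, prefer the in-RAM bytes when they exist, and create the tmp file only as a last resort.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an embedded-document extractor (not Tika's interface). */
interface EmbeddedExtractorSketch {
    boolean shouldParse(String name);
    void parse(InputStream stream, String name) throws IOException;
}

/** Hypothetical tracker that remembers tmp files and deletes them on close. */
final class TempFilesSketch implements AutoCloseable {
    private final List<Path> files = new ArrayList<>();
    int created = 0; // number of tmp files actually created

    Path createTempFile() throws IOException {
        Path p = Files.createTempFile("embedded-", ".tmp");
        files.add(p);
        created++;
        return p;
    }

    @Override
    public void close() throws IOException {
        for (Path p : files) {
            Files.deleteIfExists(p);
        }
        files.clear();
    }
}

final class EmbeddedHandlerSketch {
    /**
     * inRam is non-null when the embedded bytes are already in memory
     * (the analogue of the OLE10_NATIVE case); otherwise serializedDir
     * stands in for the sub-directory that has to go through a tmp file.
     */
    static void handleEmbedded(String name, byte[] inRam, byte[] serializedDir,
                               EmbeddedExtractorSketch extractor,
                               TempFilesSketch tmp) throws IOException {
        // 1. Ask first: do no work at all if the extractor declines.
        if (!extractor.shouldParse(name)) {
            return;
        }
        // 2. In-RAM case: stream straight from the byte[]; no tmp file.
        if (inRam != null) {
            try (InputStream in = new ByteArrayInputStream(inRam)) {
                extractor.parse(in, name);
            }
            return;
        }
        // 3. Only now create the tmp file; the tracker deletes it on close.
        Path file = tmp.createTempFile();
        Files.write(file, serializedDir);
        try (InputStream in = Files.newInputStream(file)) {
            extractor.parse(in, name);
        }
    }
}
```

A caller would wrap the TempFilesSketch in try-with-resources so that any tmp file is cleaned up even if parsing throws, which is the role TemporaryResources plays in the change described above.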