Prevent creation of ZipInputStreamZipEntrySource when reading files from disk
-----------------------------------------------------------------------------
Key: TIKA-662
URL: https://issues.apache.org/jira/browse/TIKA-662
Project: Tika
Issue Type: Improvement
Reporter: Maxim Valyanskiy
POI provides two ways to open an OPCPackage: from an InputStream and from a File.
Creating an OPCPackage from an InputStream causes the creation of a
ZipInputStreamZipEntrySource, which buffers all uncompressed data in memory.
This takes a lot of memory and is unnecessary when we are reading files from
disk or have already copied the stream into a temporary file.
This patch removes the use of ZipInputStreamZipEntrySource in those cases.
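
To illustrate the memory difference, here is a minimal sketch that uses only
java.util.zip rather than POI itself. It is an analogy under stated
assumptions, not POI's actual implementation: the class and method names
(ZipAccessSketch, bufferAllEntries, readOneEntry) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical illustration class; not part of POI or Tika.
public class ZipAccessSketch {

    // Copies a stream fully into a byte array.
    private static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    // Stream-based access: entries can only be visited once, in order, so a
    // reader that needs them again later must hold every uncompressed entry
    // in memory at the same time.
    static Map<String, byte[]> bufferAllEntries(InputStream in) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                entries.put(entry.getName(), drain(zip));
            }
        }
        return entries;
    }

    // File-based access: the ZIP central directory allows random access, so
    // only the requested entry is decompressed. Returns null if it is absent.
    static byte[] readOneEntry(File file, String name) throws IOException {
        try (ZipFile zip = new ZipFile(file)) {
            ZipEntry entry = zip.getEntry(name);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return drain(in);
            }
        }
    }
}
```

With the file-based method, peak memory is bounded by the largest single entry
requested rather than by the sum of all entries.
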
Unfortunately, this change breaks ZIP-bomb prevention for the OOXML parser (and
other parsers that use TikaInputStream.getFile()). I think ZIP-bomb prevention
should also be implemented for those formats before this is committed to
SVN.
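
One possible shape for such a check on the file-based path is sketched below.
This is only an assumption about how it could be done, not Tika's existing
mechanism: the class ZipBombGuardSketch, the method countUncompressedBytes and
the maxBytes limit are all hypothetical. It counts the bytes actually
decompressed instead of trusting the sizes declared in the archive, since
those can be forged.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical illustration class; not part of POI or Tika.
public class ZipBombGuardSketch {

    // Decompresses every entry in a fixed-size buffer (nothing is retained)
    // and fails as soon as the running total exceeds maxBytes.
    // Returns the total number of uncompressed bytes if within the limit.
    static long countUncompressedBytes(File file, long maxBytes) throws IOException {
        long total = 0;
        byte[] chunk = new byte[8192];
        try (ZipFile zip = new ZipFile(file)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                try (InputStream in = zip.getInputStream(entry)) {
                    int n;
                    while ((n = in.read(chunk)) != -1) {
                        total += n;
                        if (total > maxBytes) {
                            throw new IOException("Suspected ZIP bomb: more than "
                                    + maxBytes + " uncompressed bytes in " + file);
                        }
                    }
                }
            }
        }
        return total;
    }
}
```

A caller could run this check on the temporary file before handing it to the
parser, at the cost of decompressing the archive one extra time.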