Tim Allison created TIKA-4514:
---------------------------------
Summary: RUnpackExtractor should use stream translator
Key: TIKA-4514
URL: https://issues.apache.org/jira/browse/TIKA-4514
Project: Tika
Issue Type: Task
Reporter: Tim Allison
When recursively extracting literal bytes from files, the RUnpackExtractor
spools the TikaInputStream to a file (via TikaInputStream#getPath) and then
processes that file.
The problem is that some file formats place an object in the TikaInputStream
rather than raw bytes, so copying the stream does not yield the embedded
file's bytes. TikaCLI already has an example of using the
DefaultEmbeddedStreamTranslator to convert an OLE object to raw bytes.
We should update the RUnpackExtractor to use the same pattern.
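
A rough sketch of the pattern, for discussion only. It does not use Tika's
classes: StreamTranslator, ObjectStream, FakeOleObject, FakeOleTranslator and
UnpackSketch are all hypothetical stand-ins, and the real translator interface
in Tika may differ in names and signatures. The idea is to ask the translator
whether the stream carries an object and, if so, take the bytes from the
translated stream instead of copying the original stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical extractor step: translate first, copy raw bytes only as a fallback. */
class UnpackSketch {
    static byte[] extractBytes(InputStream in, StreamTranslator translator) {
        try {
            if (translator.shouldTranslate(in)) {
                // The stream carries an object, so ask the translator for raw bytes.
                try (InputStream translated = translator.translate(in)) {
                    return translated.readAllBytes();
                }
            }
            // Plain byte stream: copying it is sufficient.
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StreamTranslator translator = new FakeOleTranslator();
        InputStream plain =
                new ByteArrayInputStream("plain".getBytes(StandardCharsets.UTF_8));
        InputStream ole = new ObjectStream(
                new ByteArrayInputStream(new byte[0]), new FakeOleObject("ole-payload"));
        System.out.println(
                new String(extractBytes(plain, translator), StandardCharsets.UTF_8));
        System.out.println(
                new String(extractBytes(ole, translator), StandardCharsets.UTF_8));
    }
}

/** Hypothetical translator contract (an assumption, not Tika's actual interface). */
interface StreamTranslator {
    boolean shouldTranslate(InputStream in) throws IOException;

    InputStream translate(InputStream in) throws IOException;
}

/** Hypothetical stream that may carry a parsed object alongside (or instead of) bytes. */
class ObjectStream extends InputStream {
    private final InputStream bytes;
    final Object container; // null when the stream holds only raw bytes

    ObjectStream(InputStream bytes, Object container) {
        this.bytes = bytes;
        this.container = container;
    }

    @Override
    public int read() throws IOException {
        return bytes.read();
    }
}

/** Hypothetical embedded object that knows how to serialize itself. */
class FakeOleObject {
    private final String payload;

    FakeOleObject(String payload) {
        this.payload = payload;
    }

    byte[] toBytes() {
        return payload.getBytes(StandardCharsets.UTF_8);
    }
}

/** Hypothetical translator that turns a FakeOleObject into a raw byte stream. */
class FakeOleTranslator implements StreamTranslator {
    @Override
    public boolean shouldTranslate(InputStream in) {
        return in instanceof ObjectStream
                && ((ObjectStream) in).container instanceof FakeOleObject;
    }

    @Override
    public InputStream translate(InputStream in) {
        FakeOleObject obj = (FakeOleObject) ((ObjectStream) in).container;
        return new ByteArrayInputStream(obj.toBytes());
    }
}
```

In this sketch, a stream with no attached object falls through to a plain
copy, which matches the current behavior described above. Only streams that
the translator recognizes take the translated path.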