Tim Allison created TIKA-4514:
---------------------------------

             Summary: RUnpackExtractor should use stream translator
                 Key: TIKA-4514
                 URL: https://issues.apache.org/jira/browse/TIKA-4514
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


When recursively extracting literal bytes from files, the RUnpackExtractor 
copies the TikaInputStream to a file (via TikaInputStream#getPath) and then 
processes that copy.

The problem is that some file formats place an object in the TikaInputStream 
rather than raw bytes. In TikaCLI, we have an example of using the 
DefaultEmbeddedStreamTranslator to convert an OLE object to raw bytes.

We should update the RUnpackExtractor to use the same pattern.
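
As a rough illustration of that pattern (check whether the stream needs 
translating, translate it, and only then copy bytes), here is a minimal, 
self-contained sketch. It deliberately does not use Tika's classes: 
ObjectCarryingStream, StreamTranslator, ToyTranslator and extractBytes are 
hypothetical stand-ins invented for this example, and the real translator's 
method signatures may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class TranslateBeforeUnpackSketch {

    /** Hypothetical stand-in for a stream that carries a parsed object rather than raw bytes. */
    static class ObjectCarryingStream extends ByteArrayInputStream {
        private final Object openContainer;

        ObjectCarryingStream(byte[] bytes, Object openContainer) {
            super(bytes);
            this.openContainer = openContainer;
        }

        Object getOpenContainer() {
            return openContainer;
        }
    }

    /** Hypothetical translator contract: decide whether to translate, then translate. */
    interface StreamTranslator {
        boolean shouldTranslate(InputStream in);

        InputStream translate(InputStream in) throws IOException;
    }

    /** Toy translator that turns the carried object into bytes via toString(). */
    static class ToyTranslator implements StreamTranslator {
        @Override
        public boolean shouldTranslate(InputStream in) {
            return in instanceof ObjectCarryingStream
                    && ((ObjectCarryingStream) in).getOpenContainer() != null;
        }

        @Override
        public InputStream translate(InputStream in) {
            Object carried = ((ObjectCarryingStream) in).getOpenContainer();
            return new ByteArrayInputStream(carried.toString().getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Returns the bytes to unpack: translated if the stream carries an object, raw otherwise. */
    static byte[] extractBytes(InputStream in, StreamTranslator translator) throws IOException {
        if (translator.shouldTranslate(in)) {
            try (InputStream translated = translator.translate(in)) {
                return translated.readAllBytes();
            }
        }
        // Plain streams are copied as-is, which is what a naive copy already does.
        return in.readAllBytes();
    }

    public static void main(String[] args) throws IOException {
        StreamTranslator translator = new ToyTranslator();

        InputStream plain = new ByteArrayInputStream("raw".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(extractBytes(plain, translator), StandardCharsets.UTF_8));

        // The underlying byte array is empty; the content lives in the carried object.
        InputStream withObject = new ObjectCarryingStream(new byte[0], "object-bytes");
        System.out.println(new String(extractBytes(withObject, translator), StandardCharsets.UTF_8));
    }
}
```

In this toy setup, a copy that skipped the translator would write zero bytes 
for the second stream, which mirrors the failure mode described above.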



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
