[ https://issues.apache.org/jira/browse/TIKA-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18029108#comment-18029108 ]

ASF GitHub Bot commented on TIKA-4514:
--------------------------------------

tballison commented on PR #2364:
URL: https://github.com/apache/tika/pull/2364#issuecomment-3392386369

   Needs unit tests, cleanups and some further thought.




> RUnpackExtractor should use stream translator
> ---------------------------------------------
>
>                 Key: TIKA-4514
>                 URL: https://issues.apache.org/jira/browse/TIKA-4514
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> When recursively extracting literal bytes from files, the RUnpackExtractor 
> copies the TikaInputStream to a file (via TikaInputStream#getPath) and then 
> processes that copy.
>  
> The problem is that some file formats place an object in the TikaInputStream, 
> not raw bytes. In TikaCLI, we have an example of using the 
> DefaultStreamEmbeddedStreamTranslator to convert an OLE object to raw bytes.
> We should update the RUnpackExtractor to use the same pattern.
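The pattern requested above can be sketched roughly as follows. This is a minimal, self-contained Java illustration, not Tika code: `EmbeddedStream`, `StreamTranslator`, `OleWrapper`, `OleUnwrappingTranslator` and `UnpackSketch.bytesToWrite` are hypothetical stand-ins for the TikaInputStream, the embedded stream translator, and the RUnpackExtractor's copy step. The point it shows is the ordering: consult the translator first, and copy the raw bytes only when the translator declines.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;

/** Hypothetical stand-in for a TikaInputStream: raw bytes plus an optional parsed container. */
final class EmbeddedStream {
    private final byte[] rawBytes;
    private final Object openContainer; // null when no parsed object is attached

    EmbeddedStream(byte[] rawBytes, Object openContainer) {
        this.rawBytes = rawBytes;
        this.openContainer = openContainer;
    }

    Object getOpenContainer() { return openContainer; }

    InputStream openRawBytes() { return new ByteArrayInputStream(rawBytes); }
}

/** Hypothetical translator contract: decide whether to translate, then translate. */
interface StreamTranslator {
    boolean shouldTranslate(EmbeddedStream stream, Map<String, String> metadata);

    InputStream translate(EmbeddedStream stream, Map<String, String> metadata) throws IOException;
}

/** Hypothetical container: an OLE-style wrapper around a native payload. */
final class OleWrapper {
    final byte[] payload;

    OleWrapper(byte[] payload) { this.payload = payload; }
}

/** Toy translator (hypothetical): unwraps OleWrapper containers into their payload bytes. */
final class OleUnwrappingTranslator implements StreamTranslator {
    @Override
    public boolean shouldTranslate(EmbeddedStream stream, Map<String, String> metadata) {
        return stream.getOpenContainer() instanceof OleWrapper;
    }

    @Override
    public InputStream translate(EmbeddedStream stream, Map<String, String> metadata) {
        OleWrapper ole = (OleWrapper) stream.getOpenContainer();
        return new ByteArrayInputStream(ole.payload);
    }
}

final class UnpackSketch {
    /** Returns the bytes a hypothetical unpacking extractor would write for one embedded document. */
    static byte[] bytesToWrite(EmbeddedStream stream, Map<String, String> metadata,
                               StreamTranslator translator) throws IOException {
        // Ask the translator first; fall back to the raw bytes only when it declines.
        InputStream source = translator.shouldTranslate(stream, metadata)
                ? translator.translate(stream, metadata)
                : stream.openRawBytes();
        try (InputStream in = source) {
            return in.readAllBytes(); // Java 9+
        }
    }
}
```

With this shape, an embedded document that carries no container object is copied byte for byte, while one carrying an `OleWrapper` is written as its unwrapped payload. Keeping the translator behind an interface lets the extractor stay format-agnostic, which is the same separation the issue attributes to the TikaCLI example.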



--
This message was sent by Atlassian Jira
(v8.20.10#820010)