[ 
https://issues.apache.org/jira/browse/TIKA-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768948#comment-17768948
 ] 

Alexey Pismenskiy commented on TIKA-3800:
-----------------------------------------

[~tallison] just to confirm: the "autodetect unrar" feature and switching to 
UnrarParser are not implemented, right? 

> Consider wrapping 'unrar' commandline executable as a parser to handle rar v5
> -----------------------------------------------------------------------------
>
>                 Key: TIKA-3800
>                 URL: https://issues.apache.org/jira/browse/TIKA-3800
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 2.5.0
>
>
> Junrar is great and doesn't require any external dependencies.  However, it 
> doesn't handle rar v5.  I've tried {{UNRAR 5.61 beta 1 freeware}} on some of 
> the v5 files that we have in our regression corpus, and I can confirm that 
> Tika is not able to handle them, but unrar is.
> The parser would need to create a temporary directory, copy the inputstream 
> there to a file, run unrar, process the extracted files and then clean up the 
> directory.
> We can get full path information from the {{l}} command: {{unrar l blah.rar}}
> We can tell unrar not to overwrite files with the same name: {{unrar e -or 
> bug_trackers/LIBRE_OFFICE/131138-137877/LIBRE_OFFICE-135119-0.rar}}.
> If we trust unrar to protect against path traversal (e.g. an embedded file 
> with the name "../../../something_bad.pdf"), we can use the {{x}} command.
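
The workflow described in the issue (temporary directory, copy the stream to a file, run unrar, process the extracted files, clean up) could be sketched roughly as below. This is only an illustration under stated assumptions, not the actual Tika UnrarParser: the class name UnrarSketch, its method names, the 60-second timeout and the "input.rar"/"out" names are all hypothetical, and it assumes an unrar executable on the PATH that accepts the e, x, -or and -p- options. The isInside check is an extra guard applied after extraction; it does not by itself stop unrar from writing outside the directory.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the workflow from the issue; not Tika's real parser.
final class UnrarSketch {

    // Assumed flags: "e" extracts without paths, "x" keeps full paths,
    // "-or" renames on name collisions, "-p-" disables the password prompt.
    static List<String> buildCommand(Path archive, Path outDir, boolean keepPaths) {
        List<String> cmd = new ArrayList<>();
        cmd.add("unrar");
        cmd.add(keepPaths ? "x" : "e");
        cmd.add("-or");
        cmd.add("-p-");
        cmd.add(archive.toString());
        // The trailing separator marks the argument as a destination directory.
        cmd.add(outDir.toString() + File.separator);
        return cmd;
    }

    // True only if candidate, once normalized, still lies under root.
    static boolean isInside(Path root, Path candidate) {
        Path base = root.toAbsolutePath().normalize();
        return candidate.toAbsolutePath().normalize().startsWith(base);
    }

    // Deletes children before parents by walking in reverse path order.
    static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.deleteIfExists(p);
        }
    }

    // Copies the stream to a temp file, runs unrar, hands each extracted
    // regular file to the handler, then removes the temp directory.
    static void process(InputStream in, Consumer<Path> handler)
            throws IOException, InterruptedException {
        Path tmp = Files.createTempDirectory("unrar-sketch-");
        try {
            Path archive = tmp.resolve("input.rar");
            Files.copy(in, archive, StandardCopyOption.REPLACE_EXISTING);
            Path out = Files.createDirectory(tmp.resolve("out"));

            Process proc = new ProcessBuilder(buildCommand(archive, out, false))
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // Java 9+
                    .start();
            proc.getOutputStream().close(); // no stdin, so prompts cannot block
            if (!proc.waitFor(60, TimeUnit.SECONDS)) {
                proc.destroyForcibly();
                throw new IOException("unrar timed out");
            }
            if (proc.exitValue() != 0) {
                throw new IOException("unrar exited with code " + proc.exitValue());
            }

            List<Path> files;
            try (Stream<Path> walk = Files.walk(out)) {
                files = walk
                        .filter(f -> Files.isRegularFile(f, LinkOption.NOFOLLOW_LINKS))
                        .filter(f -> isInside(out, f))
                        .collect(Collectors.toList());
            }
            files.forEach(handler);
        } finally {
            deleteRecursively(tmp);
        }
    }
}
```

A caller would pass the archive's InputStream and a handler, for example UnrarSketch.process(stream, path -> System.out.println(path)). The sketch uses the e command by default, which flattens paths and so sidesteps traversal; passing keepPaths=true switches to x and relies on unrar, plus the post-extraction check, for embedded names such as "../../../something_bad.pdf".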



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
