[ https://issues.apache.org/jira/browse/TIKA-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769279#comment-17769279 ]
Tim Allison commented on TIKA-3800:
-----------------------------------

Great! Let us know how it works for you!

> Consider wrapping 'unrar' commandline executable as a parser to handle rar v5
> -----------------------------------------------------------------------------
>
>                 Key: TIKA-3800
>                 URL: https://issues.apache.org/jira/browse/TIKA-3800
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 2.5.0
>
>
> Junrar is great and doesn't require any external dependencies. However, it
> doesn't handle rar v5. I've tried {{UNRAR 5.61 beta 1 freeware}} on some of
> the v5 files that we have in our regression corpus, and I can confirm that
> Tika is not able to handle them, but unrar is.
> The parser would need to create a temporary directory, copy the inputstream
> there to a file, run unrar, process the extracted files and then clean up the
> directory.
> We can get full path information from the {{l}} command: {{unrar l blah.rar}}
> We can tell unrar not to overwrite files with the same name: {{unrar e or
> bug_trackers/LIBRE_OFFICE/131138-137877/LIBRE_OFFICE-135119-0.rar}}.
> If we trust unrar to protect against path traversal (e.g. an embedded file
> with the name "../../../something_bad.pdf"), we can use the {{x}} command.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
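
The workflow outlined in the issue description (create a temporary directory, copy the input stream to a file, run unrar, process the extracted files, clean up) could be sketched roughly as follows. This is only an illustration under assumptions, not the parser that shipped in Tika: the class `UnrarSketch`, the `Extractor` and `FileHandler` hooks, the 60-second timeout and the `input.rar` file name are all hypothetical, and it assumes an `unrar` binary on the PATH that accepts `unrar e <archive> <dest>/`. Name collisions and path-traversal checks are deliberately left out.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class UnrarSketch {

    /** Hypothetical hook: extracts the archive into outDir. */
    interface Extractor {
        void extract(Path archive, Path outDir) throws IOException, InterruptedException;
    }

    /** Hypothetical hook: receives each extracted file for further parsing. */
    interface FileHandler {
        void handle(Path file) throws IOException;
    }

    /** Runs "unrar e <archive> <outDir>/"; assumes unrar is on the PATH. */
    static final Extractor UNRAR = (archive, outDir) -> {
        Process p = new ProcessBuilder("unrar", "e", archive.toString(),
                outDir.toString() + File.separator)
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        p.getOutputStream().close(); // close stdin so unrar sees EOF instead of waiting for input
        if (!p.waitFor(60, TimeUnit.SECONDS)) { // arbitrary timeout for the sketch
            p.destroyForcibly();
            throw new IOException("unrar timed out");
        }
        if (p.exitValue() != 0) {
            throw new IOException("unrar exited with " + p.exitValue());
        }
    };

    /** Copies the stream to a temp dir, extracts, hands each file to the handler, cleans up. */
    static List<String> process(InputStream in, Extractor extractor, FileHandler handler)
            throws IOException, InterruptedException {
        Path tmp = Files.createTempDirectory("tika-unrar-");
        List<String> names = new ArrayList<>();
        try {
            Path archive = tmp.resolve("input.rar");
            Files.copy(in, archive);
            Path outDir = Files.createDirectory(tmp.resolve("out"));
            extractor.extract(archive, outDir);
            List<Path> files;
            try (Stream<Path> walk = Files.walk(outDir)) {
                files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
            }
            for (Path f : files) {
                handler.handle(f);
                names.add(outDir.relativize(f).toString());
            }
        } finally {
            deleteRecursively(tmp); // always remove the temporary directory
        }
        return names;
    }

    /** Deletes children before parents by walking in reverse path order. */
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(root)) {
            for (Path p : walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList())) {
                Files.delete(p);
            }
        }
    }
}
```

A caller would pass `UnrarSketch.UNRAR` as the extractor in production and a stub in tests, so the temp-directory handling can be exercised without unrar installed. Keeping the extraction step behind an interface also leaves room to switch between the {{e}} and {{x}} commands later.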