[ 
https://issues.apache.org/jira/browse/TIKA-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557341#comment-17557341
 ] 

Tim Allison commented on TIKA-3798:
-----------------------------------

There's not much we can do without an example file.  We could fuzz the RAR 
files we have in our test corpus and see if we can trigger an infinite loop in 
junrar.  The issue is likely in that dependency and not fixable at the Tika 
level.

We've put together some thoughts on the robustness of Tika: 
https://cwiki.apache.org/confluence/display/TIKA/The+Robustness+of+Apache+Tika

Basically, we need to fix the parsers and the underlying dependencies when we 
can.  However, bad things happen when processing files at scale, and you need 
to isolate parsing in a separate process.  We offer several options: 
tika-server, tika-pipes, ForkParser, PipesParser and tika-batch.
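
To illustrate the general idea of isolating work in a separate process (this 
is not how tika-server, tika-pipes or ForkParser are implemented), here is a 
minimal sketch that uses only the JDK.  The class name IsolatedRun, the Result 
enum and the 60-second timeout are made up for the example, and the child 
command is whatever you choose to pass in; it assumes Java 9 or later.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a command in a child process and kills it
// if it does not finish within the given time.
final class IsolatedRun {

    enum Result { COMPLETED, FAILED, TIMED_OUT }

    static Result run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout and discard both, so the child cannot
        // block on a full pipe buffer that nobody is reading.
        builder.redirectErrorStream(true);
        builder.redirectOutput(ProcessBuilder.Redirect.DISCARD);

        Process child = builder.start();
        if (!child.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            // The child is still running (for example, stuck in an
            // infinite loop): kill it and wait for it to exit.
            child.destroyForcibly();
            child.waitFor();
            return Result.TIMED_OUT;
        }
        return child.exitValue() == 0 ? Result.COMPLETED : Result.FAILED;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: IsolatedRun <command> [args...]");
            return;
        }
        // Assumed timeout of 60 seconds; tune it for your workload.
        System.out.println(run(List.of(args), 60));
    }
}
```

In practice the child would be a JVM that does the parsing and writes its 
output somewhere the parent can read it.  The options listed above take care 
of that plumbing, so prefer them over rolling your own.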


> Tika hangs up with some RAR archives
> ------------------------------------
>
>                 Key: TIKA-3798
>                 URL: https://issues.apache.org/jira/browse/TIKA-3798
>             Project: Tika
>          Issue Type: Bug
>         Environment: Windows, Tika 2.4.0
>            Reporter: Mikhail Gushinets
>            Priority: Major
>         Attachments: MicrosoftTeams-image.png
>
>
> Passing certain RAR archives to Tika can cause it to hang.
> When I try to unrar this file manually, I get this message: "Checksum is not 
> calculated right of file as there might be a change of the metadata"
> I understand that the probable cause is some kind of file corruption, but it 
> would be nice if Tika threw an exception in such a case rather than hanging 
> forever.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
