[ 
https://issues.apache.org/jira/browse/TIKA-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557323#comment-17557323
 ] 

Mikhail Gushinets commented on TIKA-3798:
-----------------------------------------

Hi Nick! Unfortunately, I cannot provide the original file to you for 
security reasons. 

When I try to unrar this file, it shows the following error message, which 
means "Checksum is not calculated right of file as there might be a change of 
the metadata"

!MicrosoftTeams-image.png!

 

The file was probably corrupted in some way when it was opened on Linux, then 
copied to Windows and processed in Tika there.

 

Unfortunately, none of the logs you're talking about can be provided because 
this happens on a remote client machine.

The symptom is that Tika just doesn't call our callbacks, which would return 
the list of parsed files, for a very long time (~16 hours). 

Other archive files, including RAR files, work just fine or throw a 
TikaException if the file cannot be processed.
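
The behavior asked for here, an exception instead of an indefinite hang, can 
be approximated on the caller side with a timeout. The following is a minimal 
sketch only and does not use Tika's API: ParseWithTimeout, runWithTimeout, and 
the parse task are hypothetical names, and the task is a stand-in for whatever 
code invokes Tika. Cancelling the future only interrupts the worker thread, so 
a parser stuck in a loop that ignores interrupts would keep running; in that 
case, isolating the parse in a separate process is likely the more robust 
option.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ParseWithTimeout {

    // Runs the task on a worker thread and gives up after the given timeout.
    public static <T> T runWithTimeout(Callable<T> task, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: this only interrupts the worker
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the real parse call (assumption, not Tika code).
        Callable<String> parse = () -> {
            Thread.sleep(5_000); // simulates a parser that does not return in time
            return "parsed";
        };
        try {
            runWithTimeout(parse, 200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("Parsing timed out");
        }
    }
}
```

With this shape, the calling code receives a TimeoutException after the 
configured limit and can report the file as unprocessable instead of waiting 
for callbacks that never arrive.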

> Tika hangs up with some RAR archives
> ------------------------------------
>
>                 Key: TIKA-3798
>                 URL: https://issues.apache.org/jira/browse/TIKA-3798
>             Project: Tika
>          Issue Type: Bug
>         Environment: Windows, Tika 2.4.0
>            Reporter: Mikhail Gushinets
>            Priority: Major
>         Attachments: MicrosoftTeams-image.png
>
>
> Passing a RAR archive to Tika might cause it to hang.
> When trying to unrar this file manually I get this message: "Checksum is not 
> calculated right of file as there might be a change of the metadata"
> I understand that the probable reason is some kind of file corruption here, 
> but it would be nice if Tika would just throw an exception in such a case 
> rather than hanging forever.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
