[ https://issues.apache.org/jira/browse/TIKA-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593971#comment-16593971 ]

Tim Allison commented on TIKA-2714:
-----------------------------------

This looks like a problem in the underlying junrar library. You might try
asking Beothorn over at https://github.com/junrar/junrar . I regret that we
can't fix this at the Tika level.

> Tika Parse Errors for certain attachments
> -----------------------------------------
>
>                 Key: TIKA-2714
>                 URL: https://issues.apache.org/jira/browse/TIKA-2714
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.9
>            Reporter: Suman Moorthy
>            Priority: Major
>
> Tika fails to parse certain attachments that our customers send to our
> application.
> We got a sample rar file from our customer that fails parsing. It contains
> only simple pdf files, and we were able to reproduce the issue.
> However, if WE create a new rar file from the same contents (using WinRAR)
> and try to parse it, parsing succeeds.
> The rar file that our customer used is not encrypted or corrupted. We are not
> sure why their rar file fails parsing when a new rar file with the same
> contents succeeds.
> Can you please provide a solution or feedback on this problem?
>  
> Below is the exception thrown when we try to parse the rar file attachment 
> from our customer:
>  
> Aug 02, 2018 5:04:09 AM com.github.junrar.Archive setFile
> WARNING: exception in archive constructor maybe file is encrypted or currupt
> com.github.junrar.exception.RarException: badRarArchive
>      at com.github.junrar.Archive.readHeaders(Archive.java:250)
>      at com.github.junrar.Archive.setFile(Archive.java:136)
>      at com.github.junrar.Archive.setVolume(Archive.java:581)
>      at com.github.junrar.Archive.<init>(Archive.java:108)
>      at com.github.junrar.Archive.<init>(Archive.java:113)
>      at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:72)
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
>      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>      at com.actiance.platform.sfab.cis.etl.documentProcessor.internal.DocumentProcessorImpl.getExtractedContent(DocumentProcessorImpl.java:160)
>      at test.TikaParserAPIExample.main(TikaParserAPIExample.java:31)
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pkg.RarParser@1372ed45
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:283)
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
>      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>      at com.actiance.platform.sfab.cis.etl.documentProcessor.internal.DocumentProcessorImpl.getExtractedContent(DocumentProcessorImpl.java:160)
>      at test.TikaParserAPIExample.main(TikaParserAPIExample.java:31)
> Caused by: java.lang.NullPointerException: mainheader is null
>      at com.github.junrar.Archive.isEncrypted(Archive.java:206)
>      at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:74)
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
>      ... 4 more
> 05:04:09.488 [main] DEBUG com.actiance.platform.commons.spi.FileReaderUtils - Deleted Temp File - 0a44423c-6fad-47e6-943b-7b56178b0b7f.tmp
>  
>  
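One way to look for a structural difference between the customer's archive and the re-created one is to compare the marker bytes at the start of each file. The following is a minimal, stdlib-only Java diagnostic sketch. It is not part of Tika or junrar, and the class name RarSignature is hypothetical. It assumes the marker values published for the RAR format: RAR 1.5 to 4.x archives start with "Rar!" 0x1A 0x07 0x00, and RAR 5.0 archives start with "Rar!" 0x1A 0x07 0x01 0x00. If the two files report different formats, that detail would be worth including when asking on the junrar project. The thread itself does not establish the cause of the failure.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical diagnostic helper; not part of Tika or junrar.
final class RarSignature {

    /** Archive format families that can be told apart by their leading marker bytes. */
    public enum Format { RAR4, RAR5, NOT_RAR }

    // "Rar!" 0x1A 0x07 0x00: marker used by RAR 1.5 through 4.x archives.
    private static final byte[] RAR4_MARKER = {0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x00};
    // "Rar!" 0x1A 0x07 0x01 0x00: marker used by RAR 5.0 archives.
    private static final byte[] RAR5_MARKER = {0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x01, 0x00};

    /** Classifies a byte array by checking whether it starts with a known RAR marker. */
    public static Format detect(byte[] header) {
        if (startsWith(header, RAR5_MARKER)) {
            return Format.RAR5;
        }
        if (startsWith(header, RAR4_MARKER)) {
            return Format.RAR4;
        }
        return Format.NOT_RAR;
    }

    /** Reads at most the first eight bytes of a file and classifies them. */
    public static Format detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[RAR5_MARKER.length];
            int n = in.readNBytes(header, 0, header.length); // Java 9+
            return detect(Arrays.copyOf(header, n));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Prints the detected format for every file path given on the command line. */
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + ": " + detect(Paths.get(arg)));
        }
    }
}
```

To use it, compile with `javac RarSignature.java` and run `java RarSignature customer.rar recreated.rar` (the file names are placeholders). The check only reads the first eight bytes. A self-extracting archive, which begins with an executable stub, would therefore report NOT_RAR even though it contains RAR data.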



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
