[
https://issues.apache.org/jira/browse/TIKA-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593971#comment-16593971
]
Tim Allison commented on TIKA-2714:
-----------------------------------
This looks like an underlying problem with the junrar library. You might try
asking Beothorn over at https://github.com/junrar/junrar. I regret that we
can't fix this at the Tika level.
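
Until junrar handles such archives, a caller can at least tell this failure apart from other parse errors. The following is a minimal sketch using only the JDK, not part of Tika or junrar: it walks the cause chain of the exception thrown by the parse call and looks for the NullPointerException with the message "mainheader is null" that appears in the quoted trace. The names ParseFailureTriage, rootCause, and looksLikeJunrarHeaderFailure are hypothetical, and the simulated exception in main stands in for the TikaException a real caller would catch.

```java
import java.util.Objects;

// Hypothetical caller-side helper; not part of Tika or junrar.
final class ParseFailureTriage {

    // Message of the NullPointerException shown in the quoted stack trace.
    static final String HEADER_NPE_MESSAGE = "mainheader is null";

    // Follow getCause() links down to the innermost exception.
    static Throwable rootCause(Throwable t) {
        Objects.requireNonNull(t, "t");
        Throwable current = t;
        // The depth bound guards against a cyclic cause chain.
        for (int depth = 0; depth < 64 && current.getCause() != null; depth++) {
            current = current.getCause();
        }
        return current;
    }

    // True when the innermost cause matches the failure seen in the trace.
    static boolean looksLikeJunrarHeaderFailure(Throwable t) {
        Throwable root = rootCause(t);
        return root instanceof NullPointerException
                && HEADER_NPE_MESSAGE.equals(root.getMessage());
    }

    public static void main(String[] args) {
        // Stand-in for the TikaException in the trace (assumption: a real
        // caller would catch org.apache.tika.exception.TikaException
        // around its parser.parse(...) call instead).
        Exception simulated = new Exception(
                "Unexpected RuntimeException from RarParser",
                new NullPointerException(HEADER_NPE_MESSAGE));

        if (looksLikeJunrarHeaderFailure(simulated)) {
            System.out.println("RAR headers unreadable; skipping archive");
        } else {
            System.out.println("other parse failure");
        }
    }
}
```

Matching on the message text is brittle and may stop working with other junrar versions, so this is best treated as a stopgap for logging or skipping the attachment, not as a fix.
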
> Tika Parse Errors for certain attachments
> -----------------------------------------
>
> Key: TIKA-2714
> URL: https://issues.apache.org/jira/browse/TIKA-2714
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.9
> Reporter: Suman Moorthy
> Priority: Major
>
> Tika fails to parse certain attachments that our customers send to our
> application.
> We got a sample rar file from our customer that fails parsing. It contains
> only simple pdf files, and we were able to reproduce the issue.
> However, if we create a new rar file from the same contents (using WinRAR)
> and try to parse it, parsing succeeds.
> The rar file that our customer used is not encrypted or corrupted. We are not
> sure why their rar file fails to parse while a new rar file with the same
> contents succeeds.
> Can you please provide a solution or feedback to this problem?
>
> Below is the exception thrown when we try to parse the rar file attachment
> from our customer:
>
> Aug 02, 2018 5:04:09 AM com.github.junrar.Archive setFile
> WARNING: exception in archive constructor maybe file is encrypted or currupt
> com.github.junrar.exception.RarException: badRarArchive
> at com.github.junrar.Archive.readHeaders(Archive.java:250)
> at com.github.junrar.Archive.setFile(Archive.java:136)
> at com.github.junrar.Archive.setVolume(Archive.java:581)
> at com.github.junrar.Archive.<init>(Archive.java:108)
> at com.github.junrar.Archive.<init>(Archive.java:113)
> at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:72)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at com.actiance.platform.sfab.cis.etl.documentProcessor.internal.DocumentProcessorImpl.getExtractedContent(DocumentProcessorImpl.java:160)
> at test.TikaParserAPIExample.main(TikaParserAPIExample.java:31)
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pkg.RarParser@1372ed45
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:283)
> 05:04:09.488 [main] DEBUG com.actiance.platform.commons.spi.FileReaderUtils -
> Deleted Temp File - 0a44423c-6fad-47e6-943b-7b56178b0b7f.tmp
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at com.actiance.platform.sfab.cis.etl.documentProcessor.internal.DocumentProcessorImpl.getExtractedContent(DocumentProcessorImpl.java:160)
> at test.TikaParserAPIExample.main(TikaParserAPIExample.java:31)
> Caused by: java.lang.NullPointerException: mainheader is null
> at com.github.junrar.Archive.isEncrypted(Archive.java:206)
> at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:74)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
> ... 4 more
>
>