[ https://issues.apache.org/jira/browse/TIKA-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746482#comment-16746482 ]
Tim Allison commented on TIKA-2818:
-----------------------------------

Something like this from the RecursiveParserWrapper?

{noformat}
0: X-Parsed-By : org.apache.tika.parser.DefaultParser
0: X-Parsed-By : org.apache.tika.parser.pkg.RarParser
0: X-TIKA:content_handler : ToXMLContentHandler
0: X-TIKA:parse_time_millis : 195
0: X-TIKA:content : <html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser" />
<meta name="X-Parsed-By" content="org.apache.tika.parser.pkg.RarParser" />
<meta name="Content-Type" content="application/x-rar-compressed" />
<title></title>
</head>
<body><div> </div>
<div class="embedded" id="encrypted.txt" />
<div class="package-entry"><h1>encrypted.txt</h1></div></body></html>
0: Content-Type : application/x-rar-compressed
1: embeddedRelationshipId : encrypted.txt
1: X-TIKA:EXCEPTION:embedded_exception : org.apache.tika.exception.EncryptedDocumentException: Unable to process: document is encrypted
	at org.apache.tika.parser.pkg.RarParser$EncryptedDocumentExceptionInputStream.read(RarParser.java:119)
	at java.io.InputStream.read(InputStream.java:170)
	at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	at java.io.FilterInputStream.read(FilterInputStream.java:107)
	at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:78)
	at org.apache.tika.io.TikaInputStream.peek(TikaInputStream.java:572)
	at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:149)
	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
	at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:147)
	at org.apache.tika.parser.RecursiveParserWrapper$EmbeddedParserDecorator.parse(RecursiveParserWrapper.java:370)
	at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
	at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:105)
	at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:90)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
	at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:224)
	at org.apache.tika.TikaTest.getRecursiveMetadata(TikaTest.java:263)
	at org.apache.tika.TikaTest.getRecursiveMetadata(TikaTest.java:219)
	at org.apache.tika.parser.pkg.RarParserTest.testSingleEncryptedRar(RarParserTest.java:163)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
1: meta:save-date : 2019-01-18T10:17:30Z
1: X-TIKA:EXCEPTION:embedded_parser : org.apache.tika.parser.AutoDetectParser
1: X-TIKA:parse_time_millis : 5
1: resourceName : encrypted.txt
1: dcterms:modified : 2019-01-18T10:17:30Z
1: Last-Modified : 2019-01-18T10:17:30Z
1: Content-Length : 23
1: X-TIKA:embedded_resource_path : /encrypted.txt
{noformat}

> RarParser throws EncryptedDocumentException only when whole archive is
> encrypted
> --------------------------------------------------------------------------------
>
> Key: TIKA-2818
> URL: https://issues.apache.org/jira/browse/TIKA-2818
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.20
> Reporter: Pavel Arnošt
> Priority: Minor
> Attachments: rar4_encrypted_content_only.rar
>
>
> RarParser throws EncryptedDocumentException only if whole archive is
> encrypted.
> If encryption is on individual files, the parser ends with
> org.apache.tika.exception.TikaException: RarParser Exception:
> Caused by: org.apache.tika.exception.TikaException: RarParser Exception
> 	at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:99)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
> 	at ... 43 more
> Caused by: com.github.junrar.exception.RarException: ioError
> 	at com.github.junrar.Archive.getInputStream(Archive.java:525)
> 	at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:81)
> 	... 48 more
> Caused by: com.github.junrar.exception.RarException: crcError
> 	at com.github.junrar.Archive.doExtractFile(Archive.java:557)
> 	at com.github.junrar.Archive.extractFile(Archive.java:498)
> 	at com.github.junrar.Archive.getInputStream(Archive.java:523)
> 	... 49 more
>
> File encryption should be checked before trying to extract content on line 79,
> like this:
>
> FileHeader header = rar.nextFileHeader();
> if (header != null && header.isEncrypted()) {
>     throw new EncryptedDocumentException();
> }
> while (header != null && !Thread.currentThread().isInterrupted()) {
>
> Or maybe record it in the metadata under the
> TikaCoreProperties.TIKA_META_EXCEPTION_EMBEDDED_STREAM key? I don't know, but
> the current behaviour is not correct (parsing fails).
> Sample document is attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
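A minimal, self-contained sketch of the second option (flagging each encrypted entry in its own metadata instead of failing the whole parse), written so it runs without Tika or junrar on the classpath. `EncryptedEntrySketch`, `Entry` and `parse` are hypothetical stand-ins, not RarParser or junrar's `FileHeader` API; only the metadata key strings are copied from the RecursiveParserWrapper output in the comment. The encryption check sits inside the loop, once per entry, so an encrypted file is flagged while its unencrypted siblings are still extracted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT Tika's RarParser: flag encrypted archive entries
// individually instead of aborting the whole parse.
public class EncryptedEntrySketch {

    // Hypothetical stand-in for a junrar FileHeader (not the real API).
    public static final class Entry {
        final String name;
        final boolean encrypted;
        final String content;

        public Entry(String name, boolean encrypted, String content) {
            this.name = name;
            this.encrypted = encrypted;
            this.content = content;
        }
    }

    // Key strings copied from the RecursiveParserWrapper output in the comment.
    static final String EMBEDDED_EXCEPTION = "X-TIKA:EXCEPTION:embedded_exception";
    static final String RESOURCE_PATH = "X-TIKA:embedded_resource_path";
    static final String CONTENT = "X-TIKA:content";

    // Returns one metadata map per document: index 0 is the container,
    // followed by one map per entry, in archive order.
    public static List<Map<String, String>> parse(List<Entry> entries) {
        List<Map<String, String>> results = new ArrayList<>();
        Map<String, String> container = new LinkedHashMap<>();
        container.put("Content-Type", "application/x-rar-compressed");
        results.add(container);

        for (Entry entry : entries) {
            Map<String, String> metadata = new LinkedHashMap<>();
            metadata.put("resourceName", entry.name);
            metadata.put(RESOURCE_PATH, "/" + entry.name);
            // Check every entry, not only the first header.
            if (entry.encrypted) {
                // Record the failure on this entry only; siblings are still parsed.
                metadata.put(EMBEDDED_EXCEPTION,
                        "EncryptedDocumentException: Unable to process: document is encrypted");
            } else {
                metadata.put(CONTENT, entry.content);
            }
            results.add(metadata);
        }
        return results;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry("plain.txt", false, "hello"),
                new Entry("encrypted.txt", true, null));
        for (Map<String, String> metadata : parse(entries)) {
            System.out.println(metadata);
        }
    }
}
```

A single check on the first header, as in the snippet quoted above, would miss archives where only a later entry is encrypted; doing the check per entry covers both cases.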