[ 
https://issues.apache.org/jira/browse/TIKA-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15614327#comment-15614327
 ] 

Nick Burch commented on TIKA-2146:
----------------------------------

As per https://poi.apache.org/encryption.html, there's no support in Apache POI 
for reading password-protected .doc files, only .docx ones. Sadly that means, 
unless someone volunteers to add the support to POI, that having the password 
won't actually help...
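
Until such support exists, a caller can at least recognise this failure and skip the document rather than abort a whole indexing run. The following is only a rough sketch, assuming the failure keeps surfacing the way it does in the trace of the quoted report: a wrapper exception whose cause chain contains a java.lang.ArrayIndexOutOfBoundsException. The class and method names (ParseFailures, findCause, looksLikeUnreadableDoc) are hypothetical and not part of Tika or POI, and other corrupt files may well trigger the same exception, so treat a match as "unreadable", not as proof of password protection.

```java
import java.util.Optional;

/** Hypothetical helper, not part of Tika or POI. */
public final class ParseFailures {
    private ParseFailures() {}

    /**
     * Returns the first throwable in the cause chain (starting with the
     * error itself) that is an instance of the given type, if any.
     */
    public static <T extends Throwable> Optional<T> findCause(Throwable error, Class<T> type) {
        int depth = 0;
        // The depth limit guards against a cyclic cause chain.
        for (Throwable t = error; t != null && depth < 64; t = t.getCause(), depth++) {
            if (type.isInstance(t)) {
                return Optional.of(type.cast(t));
            }
        }
        return Optional.empty();
    }

    /**
     * Assumption: an ArrayIndexOutOfBoundsException somewhere in the chain
     * marks a document the parser could not read, as in the reported trace.
     */
    public static boolean looksLikeUnreadableDoc(Throwable error) {
        return findCause(error, ArrayIndexOutOfBoundsException.class).isPresent();
    }
}
```

A caller would wrap its own parse call in a try/catch, pass the caught exception to looksLikeUnreadableDoc, and log and skip the file when it returns true.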

> Unable to extract contents from protected MS 
> word-doc-java.lang.ArrayIndexOutOfBoundsException
> ----------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2146
>                 URL: https://issues.apache.org/jira/browse/TIKA-2146
>             Project: Tika
>          Issue Type: Bug
>          Components: core, parser
>    Affects Versions: 1.11
>         Environment: Windows 7
>            Reporter: Sharath Kumar
>         Attachments: Test bug.doc, This is password protected.doc
>
>
> When I try to parse a protected MS Word document, I am unable to 
> extract the content; instead, I get the exception below:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@29402a40
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>       at org.apache.tika.Tika.parseToString(Tika.java:537)
>       at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:102)
>       at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:1)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at org.elasticsearch.mapper.attachments.TikaImpl.parse(TikaImpl.java:99)
>       at org.elasticsearch.mapper.attachments.AttachmentMapper.parse(AttachmentMapper.java:482)
>       at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:309)
>       at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:436)
>       at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:262)
>       at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:122)
>       at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:309)
>       at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:529)
>       at org.elasticsearch.index.shard.IndexShard.prepareCreateOnPrimary(IndexShard.java:506)
>       at org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:215)
>       at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:224)
>       at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:326)
>       at org.elasticsearch.action.bulk.TransportShardBulkAction.shardUpdateOperation(TransportShardBulkAction.java:389)
>       at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:191)
>       at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)
>       at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:639)
>       at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>       at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
>       at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:271)
>       at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
>       at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
>       at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>       at org.apache.poi.hwpf.model.SectionTable.<init>(SectionTable.java:84)
>       at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:345)
>       at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:144)
>       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
>       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
