[jira] [Issue Comment Deleted] (TIKA-2146) Unable to extract contents from protected MS word-doc-java.lang.ArrayIndexOutOfBoundsException

2017-03-17 Thread Sharath Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharath Kumar updated TIKA-2146:

Comment: was deleted

(was: Hi Tim,

Can you please remove the document Test.doc? It seems to contain sensitive data.
Thanks)

> Unable to extract contents from protected MS 
> word-doc-java.lang.ArrayIndexOutOfBoundsException
> --
>
> Key: TIKA-2146
> URL: https://issues.apache.org/jira/browse/TIKA-2146
> Project: Tika
>  Issue Type: Bug
>  Components: core, parser
>Affects Versions: 1.11
> Environment: Windows 7
>Reporter: Sharath Kumar
> Attachments: This is password protected.doc
>
>
> When I try to parse a protected MS Word document, I am unable to extract the
> content; instead, I get the exception below:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from 
> org.apache.tika.parser.microsoft.OfficeParser@29402a40
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>   at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>   at org.apache.tika.Tika.parseToString(Tika.java:537)
>   at 
> org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:102)
>   at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:1)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at org.elasticsearch.mapper.attachments.TikaImpl.parse(TikaImpl.java:99)
>   at 
> org.elasticsearch.mapper.attachments.AttachmentMapper.parse(AttachmentMapper.java:482)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:309)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:436)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:262)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:122)
>   at 
> org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:309)
>   at 
> org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:529)
>   at 
> org.elasticsearch.index.shard.IndexShard.prepareCreateOnPrimary(IndexShard.java:506)
>   at 
> org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:215)
>   at 
> org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:224)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:326)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardUpdateOperation(TransportShardBulkAction.java:389)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:191)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:639)
>   at 
> org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:271)
>   at 
> org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
>   at 
> org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
>   at 
> org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>   at org.apache.poi.hwpf.model.SectionTable.<init>(SectionTable.java:84)
>   at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:345)
>   at 
> org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:144)
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
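
In the trace above, the TikaException only wraps the real failure; the ArrayIndexOutOfBoundsException raised inside POI sits at the bottom of the cause chain. A caller that wants to log or classify the underlying error can walk that chain with plain JDK calls. The following is a minimal sketch only: the class name `RootCause` and the stand-in exceptions in `main` are hypothetical and not part of Tika or POI.

```java
// Hypothetical helper: finds the innermost cause of a wrapped exception.
final class RootCause {
    private RootCause() {}

    /** Follows getCause() until there is no further cause. */
    static Throwable of(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for a parser exception wrapping a low-level failure.
        Exception wrapped = new RuntimeException(
                "Unexpected RuntimeException from parser",
                new ArrayIndexOutOfBoundsException());
        // prints "java.lang.ArrayIndexOutOfBoundsException"
        System.out.println(of(wrapped).getClass().getName());
    }
}
```

Classifying on the root cause rather than the wrapper makes it easier to tell a corrupt or unsupported file apart from other parse failures.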



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (TIKA-2146) Unable to extract contents from protected MS word-doc-java.lang.ArrayIndexOutOfBoundsException

2017-03-17 Thread Sharath Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharath Kumar updated TIKA-2146:

Attachment: (was: Test bug.doc)

> Unable to extract contents from protected MS 
> word-doc-java.lang.ArrayIndexOutOfBoundsException
> --
>
> Key: TIKA-2146
> URL: https://issues.apache.org/jira/browse/TIKA-2146
> Project: Tika
>  Issue Type: Bug
>  Components: core, parser
>Affects Versions: 1.11
> Environment: Windows 7
>Reporter: Sharath Kumar
> Attachments: This is password protected.doc
>
>
> When I try to parse a protected MS Word document, I am unable to extract the
> content; instead, I get the exception below:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from 
> org.apache.tika.parser.microsoft.OfficeParser@29402a40
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>   at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>   at org.apache.tika.Tika.parseToString(Tika.java:537)
>   at 
> org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:102)
>   at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:1)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at org.elasticsearch.mapper.attachments.TikaImpl.parse(TikaImpl.java:99)
>   at 
> org.elasticsearch.mapper.attachments.AttachmentMapper.parse(AttachmentMapper.java:482)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:309)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:436)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:262)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:122)
>   at 
> org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:309)
>   at 
> org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:529)
>   at 
> org.elasticsearch.index.shard.IndexShard.prepareCreateOnPrimary(IndexShard.java:506)
>   at 
> org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:215)
>   at 
> org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:224)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:326)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardUpdateOperation(TransportShardBulkAction.java:389)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:191)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:639)
>   at 
> org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:271)
>   at 
> org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
>   at 
> org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
>   at 
> org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>   at org.apache.poi.hwpf.model.SectionTable.<init>(SectionTable.java:84)
>   at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:345)
>   at 
> org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:144)
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)





[jira] [Commented] (TIKA-2146) Unable to extract contents from protected MS word-doc-java.lang.ArrayIndexOutOfBoundsException

2017-03-17 Thread Sharath Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931048#comment-15931048
 ] 

Sharath Kumar commented on TIKA-2146:
-

Hi Tim,

Can you please remove the document Test.doc? It seems to contain sensitive data.
Thanks

> Unable to extract contents from protected MS 
> word-doc-java.lang.ArrayIndexOutOfBoundsException
> --
>
> Key: TIKA-2146
> URL: https://issues.apache.org/jira/browse/TIKA-2146
> Project: Tika
>  Issue Type: Bug
>  Components: core, parser
>Affects Versions: 1.11
> Environment: Windows 7
>Reporter: Sharath Kumar
> Attachments: Test bug.doc, This is password protected.doc
>
>
> When I try to parse a protected MS Word document, I am unable to extract the
> content; instead, I get the exception below:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from 
> org.apache.tika.parser.microsoft.OfficeParser@29402a40
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>   at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>   at org.apache.tika.Tika.parseToString(Tika.java:537)
>   at 
> org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:102)
>   at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:1)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at org.elasticsearch.mapper.attachments.TikaImpl.parse(TikaImpl.java:99)
>   at 
> org.elasticsearch.mapper.attachments.AttachmentMapper.parse(AttachmentMapper.java:482)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:309)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:436)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:262)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:122)
>   at 
> org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:309)
>   at 
> org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:529)
>   at 
> org.elasticsearch.index.shard.IndexShard.prepareCreateOnPrimary(IndexShard.java:506)
>   at 
> org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:215)
>   at 
> org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:224)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:326)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardUpdateOperation(TransportShardBulkAction.java:389)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:191)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:639)
>   at 
> org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:271)
>   at 
> org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
>   at 
> org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
>   at 
> org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>   at org.apache.poi.hwpf.model.SectionTable.<init>(SectionTable.java:84)
>   at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:345)
>   at 
> org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:144)
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)




[jira] [Updated] (TIKA-2285) Caused by: java.lang.StringIndexOutOfBoundsException - org.apache.tika.parser.microsoft.WordExtractor.buildParagraphTagAndStyle

2017-03-01 Thread Sharath Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharath Kumar updated TIKA-2285:

Attachment: XAPPLICANT__2016.docx

> Caused by: java.lang.StringIndexOutOfBoundsException - 
> org.apache.tika.parser.microsoft.WordExtractor.buildParagraphTagAndStyle
> ---
>
> Key: TIKA-2285
> URL: https://issues.apache.org/jira/browse/TIKA-2285
> Project: Tika
>  Issue Type: Bug
>  Components: core, parser
>Affects Versions: 1.13
>Reporter: Sharath Kumar
> Attachments: XAPPLICANT__2016.docx
>
>
> I get the error below when parsing a Word document:
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: 1
>  at java.lang.String.substring(String.java:1963)
>  at 
> org.apache.tika.parser.microsoft.WordExtractor.buildParagraphTagAndStyle(WordExtractor.java:126)





[jira] [Created] (TIKA-2285) Caused by: java.lang.StringIndexOutOfBoundsException - org.apache.tika.parser.microsoft.WordExtractor.buildParagraphTagAndStyle

2017-03-01 Thread Sharath Kumar (JIRA)
Sharath Kumar created TIKA-2285:
---

 Summary: Caused by: java.lang.StringIndexOutOfBoundsException - 
org.apache.tika.parser.microsoft.WordExtractor.buildParagraphTagAndStyle
 Key: TIKA-2285
 URL: https://issues.apache.org/jira/browse/TIKA-2285
 Project: Tika
  Issue Type: Bug
  Components: core, parser
Affects Versions: 1.13
Reporter: Sharath Kumar


I get the error below when parsing a Word document:

Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
range: 1
 at java.lang.String.substring(String.java:1963)
 at 
org.apache.tika.parser.microsoft.WordExtractor.buildParagraphTagAndStyle(WordExtractor.java:126)
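
On the JDK 8 line numbers shown, "String index out of range: 1" is consistent with a call such as `substring(0, 1)` on an empty string, although the trace alone does not confirm which input triggered it. A defensive way to lowercase the first character of a possibly empty name could look like the sketch below. The class and method names are hypothetical and this is not Tika's code.

```java
import java.util.Locale;

// Hypothetical helper: lowercases the first character without assuming length >= 1.
final class StyleNames {
    private StyleNames() {}

    static String lowerFirst(String name) {
        if (name == null || name.isEmpty()) {
            // An unguarded name.substring(0, 1) would throw here.
            return "";
        }
        return name.substring(0, 1).toLowerCase(Locale.ROOT) + name.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(lowerFirst("Heading_1")); // prints "heading_1"
        System.out.println(lowerFirst("").isEmpty()); // prints "true"
    }
}
```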





[jira] [Commented] (TIKA-2284) Caused by: org.apache.xmlbeans.XmlException: error: The document is not a ftr@http://schemas.openxmlformats.org/wordprocessingml/2006/main: document element local name m

2017-03-01 Thread Sharath Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889879#comment-15889879
 ] 

Sharath Kumar commented on TIKA-2284:
-

I am not able to add the attachment here because, after removing the confidential
info from the doc, if I save it and try to parse it, I don't get the exception. Even if
I modify it slightly and save the file, I cannot reproduce the issue. And since the
original doc contains user details, I can't upload it here.

> Caused by: org.apache.xmlbeans.XmlException: error: The document is not a 
> ftr@http://schemas.openxmlformats.org/wordprocessingml/2006/main: document 
> element local name mismatch expected ftr got hdr
> -
>
> Key: TIKA-2284
> URL: https://issues.apache.org/jira/browse/TIKA-2284
> Project: Tika
>  Issue Type: Bug
>  Components: core, parser
>Affects Versions: 1.13
>Reporter: Sharath Kumar
>
> I get the parsing error below for the attached doc:
> Caused by: org.apache.xmlbeans.XmlException: error: The document is not a 
> ftr@http://schemas.openxmlformats.org/wordprocessingml/2006/main: document 
> element local name mismatch expected ftr got hdr
>  at org.apache.xmlbeans.impl.store.Locale.verifyDocumentType(Locale.java:459)
>  at org.apache.xmlbeans.impl.store.Locale.autoTypeDocument(Locale.java:364)
>  at 





[jira] [Created] (TIKA-2284) Caused by: org.apache.xmlbeans.XmlException: error: The document is not a ftr@http://schemas.openxmlformats.org/wordprocessingml/2006/main: document element local name mis

2017-03-01 Thread Sharath Kumar (JIRA)
Sharath Kumar created TIKA-2284:
---

 Summary: Caused by: org.apache.xmlbeans.XmlException: error: The 
document is not a 
ftr@http://schemas.openxmlformats.org/wordprocessingml/2006/main: document 
element local name mismatch expected ftr got hdr
 Key: TIKA-2284
 URL: https://issues.apache.org/jira/browse/TIKA-2284
 Project: Tika
  Issue Type: Bug
  Components: core, parser
Affects Versions: 1.13
Reporter: Sharath Kumar


I get the parsing error below for the attached doc:

Caused by: org.apache.xmlbeans.XmlException: error: The document is not a 
ftr@http://schemas.openxmlformats.org/wordprocessingml/2006/main: document 
element local name mismatch expected ftr got hdr
 at org.apache.xmlbeans.impl.store.Locale.verifyDocumentType(Locale.java:459)
 at org.apache.xmlbeans.impl.store.Locale.autoTypeDocument(Locale.java:364)
 at 
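
The message says a part that was expected to have a `ftr` root element actually has `hdr`. One way to check what a part really contains before handing it to a typed parser is to read the root element's local name with the JDK's StAX reader. This is a sketch under stated assumptions: the class name `RootElementName` is hypothetical, and the XML strings are stand-ins, not content from the reported document.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: reports the local name of an XML document's root element.
final class RootElementName {
    private RootElementName() {}

    static String of(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // DTDs are not needed for this check, so turn them off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
                return null; // no element found
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String part = "<w:hdr xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\"/>";
        System.out.println(of(part)); // prints "hdr"
    }
}
```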





[jira] [Updated] (TIKA-2283) Pap style 16 claimed to have itself as its parent, which isn't allowed

2017-03-01 Thread Sharath Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharath Kumar updated TIKA-2283:

Attachment: Test_doc.doc

> Pap style 16 claimed to have itself as its parent, which isn't allowed
> --
>
> Key: TIKA-2283
> URL: https://issues.apache.org/jira/browse/TIKA-2283
> Project: Tika
>  Issue Type: Bug
>  Components: core, parser
>Affects Versions: 1.13
>Reporter: Sharath Kumar
> Attachments: Test_doc.doc
>
>
> For the attached document, I get the error below when parsing:
> Caused by: java.lang.IllegalStateException: Pap style 16 claimed to have 
> itself as its parent, which isn't allowed
>  at org.apache.poi.hwpf.model.StyleSheet.createPap(StyleSheet.java:232)
>  at org.apache.poi.hwpf.model.StyleSheet.<init>(StyleSheet.java:120)





[jira] [Created] (TIKA-2283) Pap style 16 claimed to have itself as its parent, which isn't allowed

2017-03-01 Thread Sharath Kumar (JIRA)
Sharath Kumar created TIKA-2283:
---

 Summary: Pap style 16 claimed to have itself as its parent, which 
isn't allowed
 Key: TIKA-2283
 URL: https://issues.apache.org/jira/browse/TIKA-2283
 Project: Tika
  Issue Type: Bug
  Components: core, parser
Affects Versions: 1.13
Reporter: Sharath Kumar


For the attached document, I get the error below when parsing:

Caused by: java.lang.IllegalStateException: Pap style 16 claimed to have itself 
as its parent, which isn't allowed
 at org.apache.poi.hwpf.model.StyleSheet.createPap(StyleSheet.java:232)
 at org.apache.poi.hwpf.model.StyleSheet.<init>(StyleSheet.java:120)
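
The message indicates that style 16 names itself as its own parent, which would make style inheritance loop forever if followed blindly. The sketch below shows one way to detect such a loop (self-parent or longer cycle) in a table of parent indices before resolving inheritance. It is illustrative only: the class name, the `int[]` representation, and the `NO_PARENT` value of -1 are assumptions made for this example, not POI's data model.

```java
// Hypothetical check over a table where parentOf[i] is the parent style of style i.
final class StyleParentCheck {
    /** Assumed marker for "this style has no parent" in this sketch. */
    static final int NO_PARENT = -1;

    private StyleParentCheck() {}

    /** Returns true if following parent links from start never terminates. */
    static boolean hasCycle(int[] parentOf, int start) {
        int steps = 0;
        int current = start;
        while (current != NO_PARENT) {
            if (current < 0 || current >= parentOf.length) {
                return false; // dangling reference, but not a cycle
            }
            current = parentOf[current];
            // A cycle-free chain can take at most parentOf.length hops.
            if (++steps > parentOf.length) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] parentOf = new int[17];
        java.util.Arrays.fill(parentOf, NO_PARENT);
        parentOf[16] = 16; // style 16 claims itself as parent
        System.out.println(hasCycle(parentOf, 16)); // prints "true"
    }
}
```

A lenient reader could treat a style flagged this way as having no parent instead of aborting the whole parse; whether that is appropriate depends on the library's goals.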





[jira] [Commented] (TIKA-2258) Unable to parse .pub files -java.lang.ArrayIndexOutOfBoundsException: 88

2017-02-02 Thread Sharath Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851037#comment-15851037
 ] 

Sharath Kumar commented on TIKA-2258:
-

Thanks Tim. 

https://bz.apache.org/bugzilla/show_bug.cgi?id=60685

> Unable to parse .pub files -java.lang.ArrayIndexOutOfBoundsException: 88
> 
>
> Key: TIKA-2258
> URL: https://issues.apache.org/jira/browse/TIKA-2258
> Project: Tika
>  Issue Type: Bug
>  Components: core, parser
>Affects Versions: 1.13
> Environment: Windows 7
>Reporter: Sharath Kumar
> Attachments: Roc.pub
>
>
> When I try to parse the attached .pub file, it fails with the exception below:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 88
>   at org.apache.poi.util.LittleEndian.getUShort(LittleEndian.java:343)
>   at 
> org.apache.poi.hpbf.model.qcbits.QCPLCBit$Type12.<init>(QCPLCBit.java:215)
>   at 
> org.apache.poi.hpbf.model.qcbits.QCPLCBit$Type12.<init>(QCPLCBit.java:176)
>   at 
> org.apache.poi.hpbf.model.qcbits.QCPLCBit.createQCPLCBit(QCPLCBit.java:90)
>   at org.apache.poi.hpbf.model.QuillContents.<init>(QuillContents.java:71)
>   at org.apache.poi.hpbf.HPBFDocument.<init>(HPBFDocument.java:67)
>   at 
> org.apache.poi.hpbf.extractor.PublisherTextExtractor.<init>(PublisherTextExtractor.java:45)
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:141)
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   ... 28 more





[jira] [Updated] (TIKA-2258) Unable to parse .pub files -java.lang.ArrayIndexOutOfBoundsException: 88

2017-02-01 Thread Sharath Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharath Kumar updated TIKA-2258:

Attachment: Roc.pub

Test document which can be used to replicate the error

> Unable to parse .pub files -java.lang.ArrayIndexOutOfBoundsException: 88
> 
>
> Key: TIKA-2258
> URL: https://issues.apache.org/jira/browse/TIKA-2258
> Project: Tika
>  Issue Type: Bug
>  Components: core, parser
>Affects Versions: 1.13
> Environment: Windows 7
>Reporter: Sharath Kumar
> Attachments: Roc.pub
>
>
> When I try to parse the attached .pub file, it fails with the exception below:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 88
>   at org.apache.poi.util.LittleEndian.getUShort(LittleEndian.java:343)
>   at 
> org.apache.poi.hpbf.model.qcbits.QCPLCBit$Type12.<init>(QCPLCBit.java:215)
>   at 
> org.apache.poi.hpbf.model.qcbits.QCPLCBit$Type12.<init>(QCPLCBit.java:176)
>   at 
> org.apache.poi.hpbf.model.qcbits.QCPLCBit.createQCPLCBit(QCPLCBit.java:90)
>   at org.apache.poi.hpbf.model.QuillContents.<init>(QuillContents.java:71)
>   at org.apache.poi.hpbf.HPBFDocument.<init>(HPBFDocument.java:67)
>   at 
> org.apache.poi.hpbf.extractor.PublisherTextExtractor.<init>(PublisherTextExtractor.java:45)
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:141)
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   ... 28 more





[jira] [Created] (TIKA-2258) Unable to parse .pub files -java.lang.ArrayIndexOutOfBoundsException: 88

2017-02-01 Thread Sharath Kumar (JIRA)
Sharath Kumar created TIKA-2258:
---

 Summary: Unable to parse .pub files 
-java.lang.ArrayIndexOutOfBoundsException: 88
 Key: TIKA-2258
 URL: https://issues.apache.org/jira/browse/TIKA-2258
 Project: Tika
  Issue Type: Bug
  Components: core, parser
Affects Versions: 1.13
 Environment: Windows 7
Reporter: Sharath Kumar


When I try to parse the attached .pub file, it fails with the exception below:

Caused by: java.lang.ArrayIndexOutOfBoundsException: 88
at org.apache.poi.util.LittleEndian.getUShort(LittleEndian.java:343)
at 
org.apache.poi.hpbf.model.qcbits.QCPLCBit$Type12.<init>(QCPLCBit.java:215)
at 
org.apache.poi.hpbf.model.qcbits.QCPLCBit$Type12.<init>(QCPLCBit.java:176)
at 
org.apache.poi.hpbf.model.qcbits.QCPLCBit.createQCPLCBit(QCPLCBit.java:90)
at org.apache.poi.hpbf.model.QuillContents.<init>(QuillContents.java:71)
at org.apache.poi.hpbf.HPBFDocument.<init>(HPBFDocument.java:67)
at 
org.apache.poi.hpbf.extractor.PublisherTextExtractor.<init>(PublisherTextExtractor.java:45)
at 
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:141)
at 
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
... 28 more
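
The exception shows a two-byte little-endian read touching index 88 of a buffer that is no longer than 88 bytes. A reader that validates the offset first can report a clearer error instead of a bare index. The sketch below illustrates that idea with plain Java; the class name `SafeLittleEndian` is hypothetical and this is not POI's `LittleEndian` implementation.

```java
// Hypothetical bounds-checked reader for unsigned 16-bit little-endian values.
final class SafeLittleEndian {
    private SafeLittleEndian() {}

    static int getUShort(byte[] data, int offset) {
        // Two bytes are needed: data[offset] and data[offset + 1].
        if (offset < 0 || offset > data.length - 2) {
            throw new IllegalArgumentException(
                    "Cannot read 2 bytes at offset " + offset
                            + " from a buffer of length " + data.length);
        }
        int low = data[offset] & 0xFF;
        int high = data[offset + 1] & 0xFF;
        return (high << 8) | low;
    }

    public static void main(String[] args) {
        byte[] data = {0x34, 0x12};
        System.out.println(getUShort(data, 0)); // prints "4660" (0x1234)
    }
}
```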





[jira] [Commented] (TIKA-2146) Unable to extract contents from protected MS word-doc-java.lang.ArrayIndexOutOfBoundsException

2016-11-04 Thread Sharath Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15635502#comment-15635502
 ] 

Sharath Kumar commented on TIKA-2146:
-

What is the action plan for this? Will this be supported in Tika or not?

> Unable to extract contents from protected MS 
> word-doc-java.lang.ArrayIndexOutOfBoundsException
> --
>
> Key: TIKA-2146
> URL: https://issues.apache.org/jira/browse/TIKA-2146
> Project: Tika
>  Issue Type: Bug
>  Components: core, parser
>Affects Versions: 1.11
> Environment: Windows 7
>Reporter: Sharath Kumar
> Attachments: Test bug.doc, This is password protected.doc
>
>
> When I try to parse a protected MS Word document, I am unable to extract the
> content; instead, I get the exception below:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from 
> org.apache.tika.parser.microsoft.OfficeParser@29402a40
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>   at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>   at org.apache.tika.Tika.parseToString(Tika.java:537)
>   at 
> org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:102)
>   at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:1)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at org.elasticsearch.mapper.attachments.TikaImpl.parse(TikaImpl.java:99)
>   at 
> org.elasticsearch.mapper.attachments.AttachmentMapper.parse(AttachmentMapper.java:482)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:309)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:436)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:262)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:122)
>   at 
> org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:309)
>   at 
> org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:529)
>   at 
> org.elasticsearch.index.shard.IndexShard.prepareCreateOnPrimary(IndexShard.java:506)
>   at 
> org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:215)
>   at 
> org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:224)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:326)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardUpdateOperation(TransportShardBulkAction.java:389)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:191)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:639)
>   at 
> org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:271)
>   at 
> org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
>   at 
> org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
>   at 
> org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>   at org.apache.poi.hwpf.model.SectionTable.<init>(SectionTable.java:84)
>   at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:345)
>   at 
> org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:144)
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TIKA-2147) ClassCastException on a valid Word template

2016-11-02 Thread Sharath Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629203#comment-15629203
 ] 

Sharath Kumar commented on TIKA-2147:
-

Thanks [~talli...@mitre.org]

> ClassCastException on a valid Word template
> ---
>
> Key: TIKA-2147
> URL: https://issues.apache.org/jira/browse/TIKA-2147
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.13
> Environment: Windows 7 x64, JVM 1.8.0_101
>Reporter: Seva Alekseyev
> Attachments: Forefront Fax.dotx, basicresume.docx
>
>
> On the attached document template, which opens fine in Word, the Tika parser 
> throws the following error:
> java.lang.ClassCastException: org.apache.poi.POIXMLDocumentPart cannot be 
> cast to org.apache.poi.xwpf.usermodel.XWPFDocument
>   at 
> org.apache.poi.xwpf.usermodel.XWPFFootnotes.getXWPFDocument(XWPFFootnotes.java:162)
>   at 
> org.apache.poi.xwpf.usermodel.XWPFFootnote.<init>(XWPFFootnote.java:47)
>   at 
> org.apache.poi.xwpf.usermodel.XWPFFootnotes.onDocumentRead(XWPFFootnotes.java:95)
>   at 
> org.apache.poi.POIXMLDocumentPart._invokeOnDocumentRead(POIXMLDocumentPart.java:658)
>   at 
> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:235)
>   at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:160)
>   at 
> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:124)
>   at 
> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:58)
>   at 
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:237)
>   at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
>   at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)





[jira] [Updated] (TIKA-2147) ClassCastException on a valid Word template

2016-10-28 Thread Sharath Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharath Kumar updated TIKA-2147:

Attachment: basicresume.docx

> ClassCastException on a valid Word template
> ---
>
> Key: TIKA-2147
> URL: https://issues.apache.org/jira/browse/TIKA-2147
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.13
> Environment: Windows 7 x64, JVM 1.8.0_101
>Reporter: Seva Alekseyev
> Attachments: Forefront Fax.dotx, basicresume.docx
>
>
> On the attached document template, which opens fine in Word, the Tika parser 
> throws the following error:
> java.lang.ClassCastException: org.apache.poi.POIXMLDocumentPart cannot be 
> cast to org.apache.poi.xwpf.usermodel.XWPFDocument
>   at 
> org.apache.poi.xwpf.usermodel.XWPFFootnotes.getXWPFDocument(XWPFFootnotes.java:162)
>   at 
> org.apache.poi.xwpf.usermodel.XWPFFootnote.<init>(XWPFFootnote.java:47)
>   at 
> org.apache.poi.xwpf.usermodel.XWPFFootnotes.onDocumentRead(XWPFFootnotes.java:95)
>   at 
> org.apache.poi.POIXMLDocumentPart._invokeOnDocumentRead(POIXMLDocumentPart.java:658)
>   at 
> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:235)
>   at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:160)
>   at 
> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:124)
>   at 
> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:58)
>   at 
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:237)
>   at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
>   at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)





[jira] [Commented] (TIKA-2147) ClassCastException on a valid Word template

2016-10-28 Thread Sharath Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615588#comment-15615588
 ] 

Sharath Kumar commented on TIKA-2147:
-

I get a similar issue for .docx files too. I have attached a document that 
reproduces the issue.

> ClassCastException on a valid Word template
> ---
>
> Key: TIKA-2147
> URL: https://issues.apache.org/jira/browse/TIKA-2147
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.13
> Environment: Windows 7 x64, JVM 1.8.0_101
>Reporter: Seva Alekseyev
> Attachments: Forefront Fax.dotx
>
>
> On the attached document template, which opens fine in Word, the Tika parser 
> throws the following error:
> java.lang.ClassCastException: org.apache.poi.POIXMLDocumentPart cannot be 
> cast to org.apache.poi.xwpf.usermodel.XWPFDocument
>   at 
> org.apache.poi.xwpf.usermodel.XWPFFootnotes.getXWPFDocument(XWPFFootnotes.java:162)
>   at 
> org.apache.poi.xwpf.usermodel.XWPFFootnote.<init>(XWPFFootnote.java:47)
>   at 
> org.apache.poi.xwpf.usermodel.XWPFFootnotes.onDocumentRead(XWPFFootnotes.java:95)
>   at 
> org.apache.poi.POIXMLDocumentPart._invokeOnDocumentRead(POIXMLDocumentPart.java:658)
>   at 
> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:235)
>   at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:160)
>   at 
> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:124)
>   at 
> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:58)
>   at 
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:237)
>   at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
>   at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)





[jira] [Commented] (TIKA-2149) org.apache.poi.POIXMLDocumentPart cannot be cast to org.apache.poi.xwpf.usermodel.XWPFDocument - MS Word docx

2016-10-28 Thread Sharath Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615584#comment-15615584
 ] 

Sharath Kumar commented on TIKA-2149:
-

In TIKA-2147, the input document is a Word template. That is not the case for my document.
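
Whether a given file is packaged as a Word template or as an ordinary document can be checked without Tika or POI, because an OOXML file is a zip archive whose [Content_Types].xml part declares the content type of the main document part. The following is only a rough sketch under that assumption: the class name WordPackageKind and its kindOf method are hypothetical helpers, not part of Tika or POI, and the substring match is a shortcut rather than real XML parsing.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration; not part of Tika or POI.
final class WordPackageKind {
    // Main-part content types declared by WordprocessingML packages.
    static final String DOCUMENT_MAIN =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml";
    static final String TEMPLATE_MAIN =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml";

    private WordPackageKind() {}

    // Returns "template", "document", or "unknown" for the given OOXML stream.
    static String kindOf(InputStream in) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"[Content_Types].xml".equals(entry.getName())) {
                    continue;
                }
                // Read the whole entry; read() returns -1 at the end of the entry.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                String xml = new String(buf.toByteArray(), StandardCharsets.UTF_8);
                if (xml.contains(TEMPLATE_MAIN)) {
                    return "template";
                }
                if (xml.contains(DOCUMENT_MAIN)) {
                    return "document";
                }
                return "unknown";
            }
        }
        return "unknown";
    }
}
```

Passing a FileInputStream over the failing file to WordPackageKind.kindOf would be one way to confirm which of the two cases applies before comparing the two issues.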

>  org.apache.poi.POIXMLDocumentPart cannot be cast to 
> org.apache.poi.xwpf.usermodel.XWPFDocument - MS Word docx
> --
>
> Key: TIKA-2149
> URL: https://issues.apache.org/jira/browse/TIKA-2149
> Project: Tika
>  Issue Type: Bug
>  Components: core, parser
>Affects Versions: 1.11, 1.13
> Environment: Windows 7 . Linux RHEL 7
>Reporter: Sharath Kumar
>
> When I run the attached document (.docx) against Tika 1.11 or Tika 1.13 to 
> extract contents, it errors out with the below exception:
> Exception in thread "main" org.apache.tika.exception.TikaException: 
> Unexpected RuntimeException from 
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser@1ea9f6af
> at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
> at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:191)
> at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:480)
> at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)
> Caused by: java.lang.ClassCastException: org.apache.poi.POIXMLDocumentPart 
> cannot be cast to org.apache.poi.xwpf.usermodel.XWPFDocument
> at 
> org.apache.poi.xwpf.usermodel.XWPFFootnotes.getXWPFDocument(XWPFFootnotes.java:162)
> at 
> org.apache.poi.xwpf.usermodel.XWPFFootnote.<init>(XWPFFootnote.java:47)
> at 
> org.apache.poi.xwpf.usermodel.XWPFFootnotes.onDocumentRead(XWPFFootnotes.java:95)
> at 
> org.apache.poi.POIXMLDocumentPart._invokeOnDocumentRead(POIXMLDocumentPart.java:658)
> at 
> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:235)
> at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:160)
> at 
> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:124)
> at 
> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:58)
> at 
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:237)
> at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
> at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
> at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ... 5 more





[jira] [Comment Edited] (TIKA-2149) org.apache.poi.POIXMLDocumentPart cannot be cast to org.apache.poi.xwpf.usermodel.XWPFDocument - MS Word docx

2016-10-28 Thread Sharath Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615584#comment-15615584
 ] 

Sharath Kumar edited comment on TIKA-2149 at 10/28/16 2:37 PM:
---

In bug TIKA-2147, the input document is a Word template. That is not the case for my document.


was (Author: mnsk07):
In TIKA-2147, the input document is a Word template. That is not the case for my document.

>  org.apache.poi.POIXMLDocumentPart cannot be cast to 
> org.apache.poi.xwpf.usermodel.XWPFDocument - MS Word docx
> --
>
> Key: TIKA-2149
> URL: https://issues.apache.org/jira/browse/TIKA-2149
> Project: Tika
>  Issue Type: Bug
>  Components: core, parser
>Affects Versions: 1.11, 1.13
> Environment: Windows 7 . Linux RHEL 7
>Reporter: Sharath Kumar
>
> When I run the attached document (.docx) against Tika 1.11 or Tika 1.13 to 
> extract contents, it errors out with the below exception:
> Exception in thread "main" org.apache.tika.exception.TikaException: 
> Unexpected RuntimeException from 
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser@1ea9f6af
> at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
> at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:191)
> at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:480)
> at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)
> Caused by: java.lang.ClassCastException: org.apache.poi.POIXMLDocumentPart 
> cannot be cast to org.apache.poi.xwpf.usermodel.XWPFDocument
> at 
> org.apache.poi.xwpf.usermodel.XWPFFootnotes.getXWPFDocument(XWPFFootnotes.java:162)
> at 
> org.apache.poi.xwpf.usermodel.XWPFFootnote.<init>(XWPFFootnote.java:47)
> at 
> org.apache.poi.xwpf.usermodel.XWPFFootnotes.onDocumentRead(XWPFFootnotes.java:95)
> at 
> org.apache.poi.POIXMLDocumentPart._invokeOnDocumentRead(POIXMLDocumentPart.java:658)
> at 
> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:235)
> at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:160)
> at 
> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:124)
> at 
> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:58)
> at 
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:237)
> at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
> at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
> at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ... 5 more





[jira] [Issue Comment Deleted] (TIKA-2146) Unable to extract contents from protected MS word-doc-java.lang.ArrayIndexOutOfBoundsException

2016-10-28 Thread Sharath Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharath Kumar updated TIKA-2146:

Comment: was deleted

(was: Does Tika support extracting the contents of a protected MS Word 
document? The document is not password protected, though.)

> Unable to extract contents from protected MS 
> word-doc-java.lang.ArrayIndexOutOfBoundsException
> --
>
> Key: TIKA-2146
> URL: https://issues.apache.org/jira/browse/TIKA-2146
> Project: Tika
>  Issue Type: Bug
>  Components: core, parser
>Affects Versions: 1.11
> Environment: Windows 7
>Reporter: Sharath Kumar
> Attachments: Test bug.doc, This is password protected.doc
>
>
> When I try to parse an MS Word document which is protected, I am unable to 
> extract the content; instead, I get the below exception:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from 
> org.apache.tika.parser.microsoft.OfficeParser@29402a40
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>   at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>   at org.apache.tika.Tika.parseToString(Tika.java:537)
>   at 
> org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:102)
>   at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:1)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at org.elasticsearch.mapper.attachments.TikaImpl.parse(TikaImpl.java:99)
>   at 
> org.elasticsearch.mapper.attachments.AttachmentMapper.parse(AttachmentMapper.java:482)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:309)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:436)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:262)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:122)
>   at 
> org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:309)
>   at 
> org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:529)
>   at 
> org.elasticsearch.index.shard.IndexShard.prepareCreateOnPrimary(IndexShard.java:506)
>   at 
> org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:215)
>   at 
> org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:224)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:326)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardUpdateOperation(TransportShardBulkAction.java:389)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:191)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:639)
>   at 
> org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:271)
>   at 
> org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
>   at 
> org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
>   at 
> org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>   at org.apache.poi.hwpf.model.SectionTable.<init>(SectionTable.java:84)
>   at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:345)
>   at 
> org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:144)
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)




[jira] [Commented] (TIKA-2146) Unable to extract contents from protected MS word-doc-java.lang.ArrayIndexOutOfBoundsException

2016-10-28 Thread Sharath Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15614659#comment-15614659
 ] 

Sharath Kumar commented on TIKA-2146:
-

Does Tika support extracting the contents of a protected MS Word document? The 
document is not password protected, though.

> Unable to extract contents from protected MS 
> word-doc-java.lang.ArrayIndexOutOfBoundsException
> --
>
> Key: TIKA-2146
> URL: https://issues.apache.org/jira/browse/TIKA-2146
> Project: Tika
>  Issue Type: Bug
>  Components: core, parser
>Affects Versions: 1.11
> Environment: Windows 7
>Reporter: Sharath Kumar
> Attachments: Test bug.doc, This is password protected.doc
>
>
> When I try to parse an MS Word document which is protected, I am unable to 
> extract the content; instead, I get the below exception:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from 
> org.apache.tika.parser.microsoft.OfficeParser@29402a40
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>   at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>   at org.apache.tika.Tika.parseToString(Tika.java:537)
>   at 
> org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:102)
>   at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:1)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at org.elasticsearch.mapper.attachments.TikaImpl.parse(TikaImpl.java:99)
>   at 
> org.elasticsearch.mapper.attachments.AttachmentMapper.parse(AttachmentMapper.java:482)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:309)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:436)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:262)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:122)
>   at 
> org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:309)
>   at 
> org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:529)
>   at 
> org.elasticsearch.index.shard.IndexShard.prepareCreateOnPrimary(IndexShard.java:506)
>   at 
> org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:215)
>   at 
> org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:224)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:326)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardUpdateOperation(TransportShardBulkAction.java:389)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:191)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:639)
>   at 
> org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:271)
>   at 
> org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
>   at 
> org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
>   at 
> org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>   at org.apache.poi.hwpf.model.SectionTable.<init>(SectionTable.java:84)
>   at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:345)
>   at 
> org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:144)
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)




[jira] [Commented] (TIKA-2146) Unable to extract contents from protected MS word-doc-java.lang.ArrayIndexOutOfBoundsException

2016-10-28 Thread Sharath Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15614660#comment-15614660
 ] 

Sharath Kumar commented on TIKA-2146:
-

Does Tika support extracting the contents of a protected MS Word document? The 
document is not password protected, though.

> Unable to extract contents from protected MS 
> word-doc-java.lang.ArrayIndexOutOfBoundsException
> --
>
> Key: TIKA-2146
> URL: https://issues.apache.org/jira/browse/TIKA-2146
> Project: Tika
>  Issue Type: Bug
>  Components: core, parser
>Affects Versions: 1.11
> Environment: Windows 7
>Reporter: Sharath Kumar
> Attachments: Test bug.doc, This is password protected.doc
>
>
> When I try to parse an MS Word document which is protected, I am unable to 
> extract the content; instead, I get the below exception:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from 
> org.apache.tika.parser.microsoft.OfficeParser@29402a40
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>   at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>   at org.apache.tika.Tika.parseToString(Tika.java:537)
>   at 
> org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:102)
>   at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:1)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at org.elasticsearch.mapper.attachments.TikaImpl.parse(TikaImpl.java:99)
>   at 
> org.elasticsearch.mapper.attachments.AttachmentMapper.parse(AttachmentMapper.java:482)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:309)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:436)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:262)
>   at 
> org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:122)
>   at 
> org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:309)
>   at 
> org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:529)
>   at 
> org.elasticsearch.index.shard.IndexShard.prepareCreateOnPrimary(IndexShard.java:506)
>   at 
> org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:215)
>   at 
> org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:224)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:326)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardUpdateOperation(TransportShardBulkAction.java:389)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:191)
>   at 
> org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:639)
>   at 
> org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
>   at 
> org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:271)
>   at 
> org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
>   at 
> org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
>   at 
> org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>   at org.apache.poi.hwpf.model.SectionTable.<init>(SectionTable.java:84)
>   at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:345)
>   at 
> org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:144)
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)




[jira] [Created] (TIKA-2149) org.apache.poi.POIXMLDocumentPart cannot be cast to org.apache.poi.xwpf.usermodel.XWPFDocument - MS Word docx

2016-10-27 Thread Sharath Kumar (JIRA)
Sharath Kumar created TIKA-2149:
---

 Summary:  org.apache.poi.POIXMLDocumentPart cannot be cast to 
org.apache.poi.xwpf.usermodel.XWPFDocument - MS Word docx
 Key: TIKA-2149
 URL: https://issues.apache.org/jira/browse/TIKA-2149
 Project: Tika
  Issue Type: Bug
  Components: core, parser
Affects Versions: 1.13, 1.11
 Environment: Windows 7 . Linux RHEL 7
Reporter: Sharath Kumar


When I run the attached document (.docx) against Tika 1.11 or Tika 1.13 to 
extract contents, it errors out with the below exception:

Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected 
RuntimeException from 
org.apache.tika.parser.microsoft.ooxml.OOXMLParser@1ea9f6af
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:191)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:480)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)
Caused by: java.lang.ClassCastException: org.apache.poi.POIXMLDocumentPart 
cannot be cast to org.apache.poi.xwpf.usermodel.XWPFDocument
at 
org.apache.poi.xwpf.usermodel.XWPFFootnotes.getXWPFDocument(XWPFFootnotes.java:162)
at 
org.apache.poi.xwpf.usermodel.XWPFFootnote.<init>(XWPFFootnote.java:47)
at 
org.apache.poi.xwpf.usermodel.XWPFFootnotes.onDocumentRead(XWPFFootnotes.java:95)
at 
org.apache.poi.POIXMLDocumentPart._invokeOnDocumentRead(POIXMLDocumentPart.java:658)
at 
org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:235)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:160)
at 
org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:124)
at 
org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:58)
at 
org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:237)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
... 5 more





[jira] [Commented] (TIKA-2146) Unable to extract contents from protected MS word-doc-java.lang.ArrayIndexOutOfBoundsException

2016-10-27 Thread Sharath Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15614367#comment-15614367
 ] 

Sharath Kumar commented on TIKA-2146:
-

[~talli...@mitre.org]

I ran the same document that I attached using Tika 1.13, and I get the below 
issue even in 1.13. I have one more protected document, MS Word 97 (which I 
can't share due to the sensitive data in it), that also results in an error. Below 
are the error logs. I have a question: does Tika support extracting the contents 
of a protected MS Word document? The document in question is not password 
protected, though.

Output 1:
C:\Users\sk\Downloads>java -jar tika-app-1.13.jar Testbug.doc
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected 
RuntimeException from org.apache.tika.parser.microsoft.Offic
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:191)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:480)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)
Caused by: java.lang.IllegalStateException: Told we're for characters 8236 -> 
10293, but actually covers 2055 characters!
at org.apache.poi.hwpf.model.TextPiece.<init>(TextPiece.java:73)
at 
org.apache.poi.hwpf.model.TextPieceTable.<init>(TextPieceTable.java:112)
at 
org.apache.poi.hwpf.model.ComplexFileTable.<init>(ComplexFileTable.java:70)
at org.apache.poi.hwpf.HWPFOldDocument.<init>(HWPFOldDocument.java:72)
at 
org.apache.tika.parser.microsoft.WordExtractor.parseWord6(WordExtractor.java:602)
at 
org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:146)
at 
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
at 
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
... 5 more


Output 2:

Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected 
RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@6f27a732
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:191)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:480)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)
Caused by: java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.apache.poi.hwpf.model.SectionTable.<init>(SectionTable.java:84)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:342)
at 
org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:144)
at 
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
at 
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
... 5 more
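
In both outputs the informative exception sits under the TikaException wrapper as a "Caused by" entry. A caller that wants to log or branch on that inner exception could walk the cause chain. The following is a minimal sketch using only the JDK; the class name RootCause is a hypothetical helper and not part of Tika.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper for illustration; not part of Tika.
final class RootCause {
    private RootCause() {}

    // Follows getCause() links to the innermost throwable.
    // The identity set guards against a cause chain that loops back on itself.
    static Throwable of(Throwable t) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }
}
```

A caller might wrap the parse call in a try block, catch TikaException, and pass it to RootCause.of to reach the underlying ArrayIndexOutOfBoundsException or IllegalStateException; which exception types deserve special handling is left to the application.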



> Unable to extract contents from protected MS 
> word-doc-java.lang.ArrayIndexOutOfBoundsException
> --
>
> Key: TIKA-2146
> URL: https://issues.apache.org/jira/browse/TIKA-2146
> Project: Tika
>  Issue Type: Bug
>  Components: core, parser
>Affects Versions: 1.11
> Environment: Windows 7
>Reporter: Sharath Kumar
> Attachments: Test bug.doc, This is password protected.doc
>
>
> When I try to parse an MS Word document which is protected, I am unable to 
> extract the content; instead, I get the below exception:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from 
> org.apache.tika.parser.microsoft.OfficeParser@29402a40
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>   at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>   at org.apache.tika.Tika.parseToString(Tika.java:537)
>   at 
> org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:102)
>   at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:1)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at org.elasticsearch.mapper.attachments.TikaImpl.parse(TikaImpl.java:99)
>   at 
> org.elasticsearch.mapper.attachments.AttachmentMapper.parse(AttachmentMapper.java:482)
>   at 
> org.elasticsearch.index.mapper.Doc

[jira] [Comment Edited] (TIKA-2146) Unable to extract contents from protected MS word-doc-java.lang.ArrayIndexOutOfBoundsException

2016-10-27 Thread Sharath Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15611671#comment-15611671
 ] 

Sharath Kumar edited comment on TIKA-2146 at 10/27/16 12:36 PM:


Sure. I have uploaded the doc. The file is not password protected.
I also see errors like the one below for these types of docs (protected Word docs):

java.security.PrivilegedActionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@29402a40
    at java.security.AccessController.doPrivileged(Native Method)


was (Author: mnsk07):
Sure. I have uploaded the doc. The file is not password protected.
I also see errors like the one below for these types of docs:

java.security.PrivilegedActionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@29402a40
    at java.security.AccessController.doPrivileged(Native Method)


[jira] [Updated] (TIKA-2146) Unable to extract contents from protected MS word-doc-java.lang.ArrayIndexOutOfBoundsException

2016-10-27 Thread Sharath Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharath Kumar updated TIKA-2146:

Attachment: Test bug.doc




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TIKA-2146) Unable to extract contents from protected MS word-doc-java.lang.ArrayIndexOutOfBoundsException

2016-10-27 Thread Sharath Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15611671#comment-15611671
 ] 

Sharath Kumar commented on TIKA-2146:
-

Sure. I have uploaded the doc. The file is not password protected.
I also see errors like the one below for these types of docs:

java.security.PrivilegedActionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@29402a40
    at java.security.AccessController.doPrivileged(Native Method)


[jira] [Updated] (TIKA-2146) Unable to extract contents from protected MS word-doc-java.lang.ArrayIndexOutOfBoundsException

2016-10-27 Thread Sharath Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharath Kumar updated TIKA-2146:

Component/s: parser






[jira] [Created] (TIKA-2146) Unable to extract contents from protected MS word-doc-java.lang.ArrayIndexOutOfBoundsException

2016-10-27 Thread Sharath Kumar (JIRA)
Sharath Kumar created TIKA-2146:
---

         Summary: Unable to extract contents from protected MS word-doc-java.lang.ArrayIndexOutOfBoundsException
             Key: TIKA-2146
             URL: https://issues.apache.org/jira/browse/TIKA-2146
         Project: Tika
      Issue Type: Bug
      Components: core
Affects Versions: 1.11
     Environment: Windows 7
        Reporter: Sharath Kumar


When I try to parse a protected MS Word document, I am unable to extract the 
content. Instead, I get the exception below:

org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@29402a40
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.Tika.parseToString(Tika.java:537)
    at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:102)
    at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:1)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.elasticsearch.mapper.attachments.TikaImpl.parse(TikaImpl.java:99)
    at org.elasticsearch.mapper.attachments.AttachmentMapper.parse(AttachmentMapper.java:482)
    at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:309)
    at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:436)
    at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:262)
    at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:122)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:309)
    at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:529)
    at org.elasticsearch.index.shard.IndexShard.prepareCreateOnPrimary(IndexShard.java:506)
    at org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:215)
    at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:224)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:326)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardUpdateOperation(TransportShardBulkAction.java:389)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:191)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:639)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:271)
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
    at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException
    at org.apache.poi.hwpf.model.SectionTable.<init>(SectionTable.java:84)
    at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:345)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:144)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
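
When triaging a trace like the one above, the informative part is usually the innermost "Caused by" entry rather than the wrapping TikaException. The following is a minimal, self-contained sketch of walking an exception's cause chain down to its root. It is not code from Tika, POI, or Elasticsearch: the class name RootCauseFinder and the stand-in exceptions in main are hypothetical and used only for illustration.

```java
public class RootCauseFinder {

    /** Follows getCause() until the innermost throwable is reached. */
    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop when there is no further cause, or when a throwable
        // reports itself as its own cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for the failure in the report: a generic wrapper
        // exception whose cause is an ArrayIndexOutOfBoundsException.
        Exception wrapped = new RuntimeException(
                "Unexpected RuntimeException from parser",
                new ArrayIndexOutOfBoundsException());
        System.out.println(rootCause(wrapped).getClass().getName());
    }
}
```

In an application that calls Tika directly, the same helper could be applied to the caught exception to log the underlying failure type, assuming the caller has access to the original exception object.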


