[ https://issues.apache.org/jira/browse/TIKA-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584556#comment-17584556 ]
Tilman Hausherr commented on TIKA-3841:
---------------------------------------

This happens in POI:
{noformat}
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 351 out of bounds for length 351
	at org.apache.poi.hwpf.sprm.SprmOperation.initSize(SprmOperation.java:177)
	at org.apache.poi.hwpf.sprm.SprmOperation.<init>(SprmOperation.java:75)
	at org.apache.poi.hwpf.sprm.SprmIterator.next(SprmIterator.java:47)
	at org.apache.poi.hwpf.sprm.ParagraphSprmUncompressor.uncompressPAP(ParagraphSprmUncompressor.java:62)
	at org.apache.poi.hwpf.usermodel.Paragraph.newParagraph(Paragraph.java:111)
	at org.apache.poi.hwpf.usermodel.Range.getParagraph(Range.java:777)
	at org.apache.tika.parser.microsoft.WordExtractor.handleHeaderFooter(WordExtractor.java:253)
	at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:210)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:218)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:175)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
{noformat}

> Exception when parsing some Word documents with Tika: tika_exception
> --------------------------------------------------------------------
>
> Key: TIKA-3841
> URL: https://issues.apache.org/jira/browse/TIKA-3841
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.24, 2.4.1, 1.28.4
> Environment: h3. Java Version
> java version "1.8.0_291"
> h3. OS Version
> Linux localhost.localdomain 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> Reporter: lxz
> Priority: Blocker
> Attachments: 22030714121143428592.doc
>
>
> {
>   "error": {
>     "root_cause": [
>       { "type": "parse_exception", "reason": "Error parsing document in field [content]" }
>     ],
>     "type": "parse_exception",
>     "reason": "Error parsing document in field [content]",
>     "caused_by": {
>       "type": "tika_exception",
>       "reason": "Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@3b5e180a",
>       "caused_by": {
>         "type": "array_index_out_of_bounds_exception",
>         "reason": "351"
>       }
>     }
>   },
>   "status": 400
> }

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
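For triage: the POI trace ends in SprmOperation.initSize with an index equal to the array length (351 of 351). That is consistent with a truncated SPRM list (grpprl) in the document, where the size byte of a variable-length operand would sit just past the end of the buffer; this has not been confirmed against POI's source or the attached file. The sketch below is a hypothetical, simplified walker (the class GrpprlWalker and its method sprmOffsets are made up for illustration and are NOT POI's SprmIterator/SprmOperation). It assumes the opcode layout and operand-size codes (spra) described in Microsoft's [MS-DOC] specification, ignores the special encodings of sprmTDefTable and sprmPChgTabs, and shows where a bounds check turns corrupt input into a descriptive error instead of an ArrayIndexOutOfBoundsException.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified grpprl walker for illustration only (not Apache POI code).
 * Each SPRM is a 2-byte little-endian opcode followed by an operand whose size is
 * given by the opcode's top 3 bits (spra), assuming the [MS-DOC] encoding.
 * Special cases (sprmTDefTable, sprmPChgTabs) are deliberately ignored.
 */
final class GrpprlWalker {

    /** Operand size in bytes for spra 0..7; -1 marks a variable-length operand. */
    private static final int[] FIXED_OPERAND_SIZE = {1, 1, 2, 4, 2, 2, -1, 3};

    private GrpprlWalker() {
    }

    /** Returns the start offset of every complete SPRM, or throws on truncated input. */
    static List<Integer> sprmOffsets(byte[] grpprl) {
        List<Integer> offsets = new ArrayList<>();
        int offset = 0;
        // Stop once fewer than two bytes (a full opcode) remain.
        while (offset + 2 <= grpprl.length) {
            int opcode = (grpprl[offset] & 0xFF) | ((grpprl[offset + 1] & 0xFF) << 8);
            int spra = (opcode >>> 13) & 0x7; // top 3 bits select the operand size
            int operandStart = offset + 2;
            int operandSize = FIXED_OPERAND_SIZE[spra];
            if (operandSize < 0) {
                // Variable length: the first operand byte holds the size of the rest.
                // Reading it unchecked is one way to hit "Index N out of bounds for length N".
                if (operandStart >= grpprl.length) {
                    throw new IllegalStateException(
                            "Truncated SPRM at offset " + offset + ": size byte missing");
                }
                operandSize = 1 + (grpprl[operandStart] & 0xFF);
            }
            int next = operandStart + operandSize;
            if (next > grpprl.length) {
                throw new IllegalStateException(
                        "Truncated SPRM at offset " + offset + ": operand runs past end");
            }
            offsets.add(offset);
            offset = next;
        }
        return offsets;
    }
}
```

For example, the bytes {0x03, 0x24, 0x01, 0x3E, (byte) 0xC6} (one complete SPRM with a 1-byte operand, then a variable-length opcode with no size byte) make this sketch throw IllegalStateException instead of reading index 5 of a 5-byte array.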