[ https://issues.apache.org/jira/browse/TIKA-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119462#comment-13119462 ]
Michael McCandless commented on TIKA-733:
-----------------------------------------

Actually, I think we should just commit your patch: it's harmless for non-corrupt RTF docs, and for corrupt ones (with this particular corruption) it will make a best effort to extract what text it can. I only wanted to confirm that you were hitting this because of document corruption and not a bug in how the new RTF parser tokenizes open/close groups. Thanks!

> [PATCH] RTF TextExtractor processGroupEnd() NoSuchElementException
> ------------------------------------------------------------------
>
>                 Key: TIKA-733
>                 URL: https://issues.apache.org/jira/browse/TIKA-733
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Jeremy Anderson
>            Assignee: Michael McCandless
>              Labels: patch
>             Fix For: 1.0
>
>         Attachments: TIKA-733-rtf_TextExtractor_processGroupEnd-NoSuchElementException.patch
>
>
> Parsing some RTF documents attempt to perform a removeLast() on the
> groupStates() list when the list is empty. Added a check to not perform the
> logic when the list is empty, thus causing the restore group state to not be
> performed. Text extraction now completes without further down-stream errors.
> Unable to include sample file due to sensitive nature of file contents.
>
> StackTrace (TIKA-0.9)
> Caused by: java.util.NoSuchElementException
>         at java.util.LinkedList.remove(LinkedList.java:788)
>         at java.util.LinkedList.removeLast(LinkedList.java:144)
>         at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1010)
>         at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:352)
>         at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:53)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>         ... 45 more

--
This message is automatically generated by JIRA.
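
For readers without access to the attached patch, the guard described in the issue (skip the state restore when the group-state list is empty) could look roughly like the sketch below. This is not the actual Tika code or the committed patch: the class `GroupStateSketch`, the nested `GroupState` type, and the helper methods are hypothetical stand-ins. Only the pattern of checking for an empty list before calling `removeLast()` is taken from the description above.

```java
import java.util.LinkedList;

// Hypothetical stand-in for illustration only; not Tika's TextExtractor.
class GroupStateSketch {

    // Minimal per-group state; a real RTF parser tracks far more than this.
    static final class GroupState {
        final boolean ignore;

        GroupState(boolean ignore) {
            this.ignore = ignore;
        }
    }

    // Stack of enclosing group states, most recent last.
    private final LinkedList<GroupState> groupStates = new LinkedList<>();
    private GroupState groupState = new GroupState(false);

    // Called on '{': remember the current state and start a nested one.
    void processGroupStart(boolean ignore) {
        groupStates.add(groupState);
        groupState = new GroupState(ignore);
    }

    // Called on '}': restore the enclosing state if there is one.
    // An unbalanced '}' (as in a corrupt document) leaves the current state
    // untouched and returns false instead of throwing NoSuchElementException.
    boolean processGroupEnd() {
        if (groupStates.isEmpty()) {
            return false;
        }
        groupState = groupStates.removeLast();
        return true;
    }

    boolean isIgnoring() {
        return groupState.ignore;
    }

    int depth() {
        return groupStates.size();
    }

    public static void main(String[] args) {
        GroupStateSketch sketch = new GroupStateSketch();
        sketch.processGroupStart(true);
        System.out.println(sketch.processGroupEnd()); // balanced close
        System.out.println(sketch.processGroupEnd()); // extra close, as in a corrupt file
    }
}
```

Without the `isEmpty()` check, the second `processGroupEnd()` call would fail inside `LinkedList.removeLast()`, which matches the top frames of the stack trace quoted above.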