[ 
https://issues.apache.org/jira/browse/TIKA-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119815#comment-13119815
 ] 

Jeremy Anderson commented on TIKA-733:
--------------------------------------

Cool beans!! 

Thanks for your attention to it.  Yeah, I checked 18 of the other files that 
hit this error, and all of them have corruption similar to the first one, 
although the amount of info contained in the final block varies widely, from 
a few chars to none.  


But the patch I already submitted does appear to work: the text comes out 
for each of these corrupted documents.
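
For anyone reading along, the guard in the patch amounts to something like the 
sketch below. This is a simplified stand-in, not the actual TextExtractor code: 
GroupStateStack, GroupState, the bold flag, and the boolean return value are 
all made up for illustration. Only the idea of checking for an empty list 
before calling removeLast() comes from the patch.

```java
import java.util.LinkedList;

// Simplified stand-in for the RTF group-state handling (hypothetical class,
// not the real org.apache.tika.parser.rtf.TextExtractor).
class GroupStateStack {
    // Hypothetical minimal state: only tracks a bold flag.
    static final class GroupState {
        final boolean bold;
        GroupState(boolean bold) { this.bold = bold; }
    }

    private final LinkedList<GroupState> groupStates = new LinkedList<>();
    private GroupState current = new GroupState(false);

    // '{' in RTF: save the current state and continue with a copy of it.
    void processGroupStart() {
        groupStates.add(current);
        current = new GroupState(current.bold);
    }

    // '}' in RTF: restore the saved state, but only if there is one.
    // Returns false for an unbalanced '}' instead of letting removeLast()
    // throw NoSuchElementException on an empty list.
    boolean processGroupEnd() {
        if (groupStates.isEmpty()) {
            return false; // corrupted input: nothing to restore, keep going
        }
        current = groupStates.removeLast();
        return true;
    }

    // Number of saved states, i.e. the current nesting depth.
    int depth() {
        return groupStates.size();
    }
}
```

With this shape, a stray closing brace at the end of a corrupted file is simply 
ignored and extraction carries on with whatever state is current.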

Thanks again for adding it to the trunk.

                
> [PATCH] RTF TextExtractor processGroupEnd() NoSuchElementException
> ------------------------------------------------------------------
>
>                 Key: TIKA-733
>                 URL: https://issues.apache.org/jira/browse/TIKA-733
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Jeremy Anderson
>            Assignee: Michael McCandless
>              Labels: patch
>             Fix For: 1.0
>
>         Attachments: TIKA-733-rtf_TextExtractor_processGroupEnd-NoSuchElementException.patch
>
>
> Parsing some RTF documents attempts to perform a removeLast() on the 
> groupStates() list when the list is empty.  Added a check that skips this 
> logic when the list is empty, so the group state restore is not performed in 
> that case. Text extraction now completes without further downstream errors.
>
> Unable to include a sample file due to the sensitive nature of the file 
> contents.
>
> StackTrace (TIKA-0.9):
> Caused by: java.util.NoSuchElementException
>       at java.util.LinkedList.remove(LinkedList.java:788)
>       at java.util.LinkedList.removeLast(LinkedList.java:144)
>       at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1010)
>       at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:352)
>       at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:53)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       ... 45 more

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
