[ https://issues.apache.org/jira/browse/TIKA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036337#comment-13036337 ]

Cristian Vat commented on TIKA-642:
-----------------------------------

For the example file, it seems the only problem is an extra closing brace after
the main RTF block.

Since TIKA-422, RTF files are already read completely and filtered into a
separate file, so that step could be modified to simply drop any content after
the main block closes.
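A minimal sketch of that idea, assuming the filtered RTF is available as a decoded String. The class RtfTrailingContentTrimmer and the method trimAfterMainGroup are hypothetical names for illustration, not part of Tika's API. The sketch tracks brace depth, skips the character after every backslash so escaped braces (\{ and \}) and \\ do not count as group delimiters, and cuts the text right after the brace that closes the first top-level group. It does not handle raw \binN data, which could contain literal brace bytes, and whether discarding trailing content is correct per the RTF spec is still the open question below.

```java
public final class RtfTrailingContentTrimmer {

    private RtfTrailingContentTrimmer() {}

    /**
     * Hypothetical helper: returns rtf up to and including the brace that
     * closes the first top-level group, dropping anything after it.
     * If that group never closes, rtf is returned unchanged.
     */
    public static String trimAfterMainGroup(String rtf) {
        int depth = 0;
        boolean inMainGroup = false;
        for (int i = 0; i < rtf.length(); i++) {
            char c = rtf.charAt(i);
            if (c == '\\') {
                // Skip the next character so \{, \} and \\ never change depth.
                i++;
            } else if (c == '{') {
                depth++;
                inMainGroup = true;
            } else if (c == '}' && inMainGroup) {
                depth--;
                if (depth == 0) {
                    // Main block just closed: keep it, discard the rest.
                    return rtf.substring(0, i + 1);
                }
            }
        }
        return rtf;
    }

    public static void main(String[] args) {
        // Main block followed by one stray close-group brace.
        String input = "{\\rtf1 Hello \\{world\\}}}";
        System.out.println(trimAfterMainGroup(input)); // prints {\rtf1 Hello \{world\}}
    }
}
```

The trimmed string could then be handed to the existing parsing step, which would no longer see the extra close-group that triggers the IOException in the trace below.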

The only question is whether it should. I'll also check the RTF spec to see
whether anything is allowed after the main block.

> Few of RTF files not extracting properly
> ----------------------------------------
>
>                 Key: TIKA-642
>                 URL: https://issues.apache.org/jira/browse/TIKA-642
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9, 1.0
>         Environment: All
>            Reporter: Manish
>         Attachments: FIRM GAS GTC B RED.DOC
>
>
> A few of the RTF files don't get extracted properly.
> This is the stack trace: 
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.rtf.RTFParser@616d071a
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:203)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
> Caused by: java.io.IOException: Too many close-groups in RTF text
> at javax.swing.text.rtf.RTFParser.write(RTFParser.java:156)
> at javax.swing.text.rtf.RTFParser.writeSpecial(RTFParser.java:101)
> at javax.swing.text.rtf.AbstractFilter.write(AbstractFilter.java:158)
> at javax.swing.text.rtf.AbstractFilter.readFromStream(AbstractFilter.java:88)
> at javax.swing.text.rtf.RTFEditorKit.read(RTFEditorKit.java:65)
> at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:112)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
