[
https://issues.apache.org/jira/browse/TIKA-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850807#comment-16850807
]
Tim Allison commented on TIKA-2883:
-----------------------------------
The file never triggers the statement that ends header processing:
{noformat}
inHeader = false;
{noformat}
If we add:
{noformat}
if (...|| equals("htmlrtf"))
{noformat}
most of the text is extracted correctly. There's another open issue that I'd
like to track down, where our RTFParser doesn't realize that it is not out of
the head section of the RTF...
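
To make the effect of that one-line change concrete, here is a minimal sketch of the header-flag idea. It is a toy model and not Tika's actual RTF extraction code: the class name HeaderFlagSketch, the token format, and the use of "pard" as a pre-existing header-ending control word are all assumptions made for illustration. Only the inHeader flag and the "htmlrtf" control word come from the comment above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model: text is dropped while inHeader is true, and
// inHeader is cleared only when a configured control word is seen.
final class HeaderFlagSketch {
    private final Set<String> headerEnders;
    private final StringBuilder out = new StringBuilder();
    private boolean inHeader = true;

    HeaderFlagSketch(String... enders) {
        this.headerEnders = new HashSet<>(Arrays.asList(enders));
    }

    // Mirrors the shape of: if (... || equals("htmlrtf")) inHeader = false;
    void controlWord(String word) {
        if (headerEnders.contains(word)) {
            inHeader = false;
        }
    }

    // Text seen while still "in the header" is silently discarded.
    void text(String s) {
        if (!inHeader) {
            out.append(s);
        }
    }

    String extracted() {
        return out.toString();
    }

    // Tokens starting with a backslash are treated as control words,
    // everything else as plain text (an assumed, simplified token format).
    static String run(String[] enders, List<String> tokens) {
        HeaderFlagSketch sketch = new HeaderFlagSketch(enders);
        for (String token : tokens) {
            if (token.startsWith("\\")) {
                sketch.controlWord(token.substring(1));
            } else {
                sketch.text(token);
            }
        }
        return sketch.extracted();
    }
}
```

Feeding this sketch the tokens \rtf1, \fromhtml1, \htmlrtf, Hello with only "pard" configured yields an empty string, because nothing ever clears the flag. Adding "htmlrtf" to the configured words yields "Hello", which is the behavior change described above.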
> Text not extracted from RTF files
> ---------------------------------
>
> Key: TIKA-2883
> URL: https://issues.apache.org/jira/browse/TIKA-2883
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.20, 1.19.1, 1.21
> Reporter: Luis Filipe Nassif
> Priority: Major
> Attachments: Message (5).rtf
>
>
> I have a number of RTF files (extracted from PST email bodies) whose text is
> not extracted currently. Sample file attached. [[email protected]], do you
> have any idea what is going on?