RTF parser fails to extract the body
------------------------------------

                 Key: TIKA-748
                 URL: https://issues.apache.org/jira/browse/TIKA-748
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.10
            Reporter: Andrzej Bialecki 


Parsing the attached document with tika-app produces the following output:

{noformat}
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="subject" content="tests"/>
<meta name="Content-Length" content="2235"/>
<meta name="comment" content="StarWriter"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.rtf.RTFParser"/>
<meta name="Content-Type" content="application/rtf"/>
<meta name="resourceName" content="test.rtf"/>
<title>test rft document</title>
</head>
<body/></html>
{noformat}

The expected result would be a non-empty body containing the text "The quick 
brown fox jumps over the lazy dog".
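
To make the expectation checkable, here is a minimal sketch that tests whether 
an XHTML string such as the output above has any text in its body. It uses 
only the JDK's XML parser, not Tika itself. The class name BodyCheck and the 
method hasBodyText are hypothetical names chosen for illustration, and the 
sketch assumes the output is well-formed XML in the XHTML namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Tika: inspects XHTML output for body text.
class BodyCheck {
    private static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Returns true if the XHTML has a body element with non-whitespace text. */
    static boolean hasBodyText(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // the output declares the XHTML namespace
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        NodeList bodies = doc.getElementsByTagNameNS(XHTML_NS, "body");
        if (bodies.getLength() == 0) {
            return false; // no body element at all
        }
        // getTextContent() concatenates all descendant text nodes.
        return !bodies.item(0).getTextContent().trim().isEmpty();
    }

    public static void main(String[] args) throws Exception {
        // Shortened version of the reported output: the body is empty.
        String reported = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>test rft document</title></head>"
                + "<body/></html>";
        System.out.println("body has text: " + hasBodyText(reported));
    }
}
```

A check like this could back a regression test once the parser emits the 
body text; the reported output makes hasBodyText return false.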
