[ 
https://issues.apache.org/jira/browse/NUTCH-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124451#comment-13124451
 ] 

Andrzej Bialecki  commented on NUTCH-1154:
------------------------------------------

The case for inclusion is here: http://s.apache.org/vR :) In short, Tika 0.10 
has several important improvements over 0.9.

With the attached patch all tests pass except TestRTFParser, due to an issue 
that has just been fixed in Tika trunk. The underlying problem is that our test 
document is malformed and Tika's new RTF parser wasn't robust enough to handle 
it.

This means that for now we would have to disable this test, and re-enable it 
once we upgrade to Tika 1.0.
                
> Upgrade to Tika 0.10
> --------------------
>
>                 Key: NUTCH-1154
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1154
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.4
>            Reporter: Andrzej Bialecki 
>         Attachments: NUTCH-1154.diff
>
>
> There have been significant improvements in Tika 0.10 and it would be nice to 
> use the latest Tika in 1.4.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
