[ https://issues.apache.org/jira/browse/NUTCH-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-840:
---------------------------------------
    Attachment: NUTCH-840.patch

Hi Julien. I have absolutely no idea how or when I ended up working on this, but I think the attachment nearly addresses this issue. It is from a while back and, to be honest, I can't really remember working on it. Anyway, I think the parse-tika tests fail as it is not quite working properly yet. The patch also changes the directory structure to o.a.n.p.tika rather than the existing o.a.n.tika, which is inconsistent with the other parser plugin implementations we ship with Nutch. Sorry for hijacking this one slightly.

> Port tests from parse-html to parse-tika
> ----------------------------------------
>
>                 Key: NUTCH-840
>                 URL: https://issues.apache.org/jira/browse/NUTCH-840
>             Project: Nutch
>          Issue Type: Task
>          Components: parser
>    Affects Versions: 1.1
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>             Fix For: nutchgora
>
>         Attachments: NUTCH-840.patch, NUTCH-840.patch
>
>
> We don't have tests for HTML in parse-tika so I'll copy them from the old
> parse-html plugin