[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1233:
---------------------------------

    Fix Version/s:     (was: 1.5)
                   1.6
                   20120304-push-1.6

> Rely on Tika for outlink extraction
> -----------------------------------
>
>                 Key: NUTCH-1233
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1233
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.6
>
>         Attachments: NUTCH-1233-1.5-wip.patch
>
>
> Tika provides outlink extraction features that are not used in Nutch. To
> use them in Nutch, we need Tika to return the rel attribute value of each
> link, which it currently doesn't. There's a patch for Tika 1.1. Once that
> patch is included in Tika and we have upgraded to that new version, this
> issue can be worked on. Here's preliminary code that does both Tika and
> current outlink extraction. This also includes parts of the Boilerpipe code.
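
For illustration only, here is a minimal sketch of one reason a crawler might want the rel value of each extracted link: skipping links marked nofollow. That this is the motivation for the issue is an assumption. The attached patch is not reproduced here, and every name below (OutlinkRelSketch, Link, isNoFollow, followableOutlinks) is hypothetical, not part of Nutch or Tika.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; none of these names come from Nutch or Tika.
final class OutlinkRelSketch {

    /** Minimal stand-in for a link as an HTML parser might report it. */
    static final class Link {
        final String uri;
        final String anchor;
        final String rel; // raw rel attribute value, or null if absent

        Link(String uri, String anchor, String rel) {
            this.uri = uri;
            this.anchor = anchor;
            this.rel = rel;
        }
    }

    /** True if the space-separated rel value contains the "nofollow" token. */
    static boolean isNoFollow(Link link) {
        if (link.rel == null) {
            return false;
        }
        // rel holds case-insensitive, whitespace-separated tokens.
        for (String token : link.rel.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.equals("nofollow")) {
                return true;
            }
        }
        return false;
    }

    /** Returns the URIs of all links not marked nofollow, in input order. */
    static List<String> followableOutlinks(List<Link> links) {
        List<String> out = new ArrayList<>();
        for (Link link : links) {
            if (!isNoFollow(link)) {
                out.add(link.uri);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Link> links = Arrays.asList(
            new Link("http://example.org/a", "A", null),
            new Link("http://example.org/b", "B", "nofollow"),
            new Link("http://example.org/c", "C", "external NoFollow"),
            new Link("http://example.org/d", "D", "external"));
        // Prints [http://example.org/a, http://example.org/d]
        System.out.println(followableOutlinks(links));
    }
}
```

Without the rel value, a filter like this cannot be applied to the parser's output, which is consistent with the issue being blocked until Tika reports it.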