[ http://issues.apache.org/jira/browse/NUTCH-240?page=all ]
Andrzej Bialecki updated NUTCH-240:
------------------------------------

    Attachment: Generator.patch.txt

This patch is an intermediate step towards the simplification of the scoring API. It changes Generator to use an arbitrary FloatWritable for selecting topN records. If there are no objections, I'd like to commit this patch first, and then refactor the scoring API to use this new Generator.

> Scoring API: extension point, scoring filters and an OPIC plugin
> ----------------------------------------------------------------
>
>          Key: NUTCH-240
>          URL: http://issues.apache.org/jira/browse/NUTCH-240
>      Project: Nutch
>         Type: Improvement
>     Versions: 0.8-dev
>     Reporter: Andrzej Bialecki
>  Attachments: Generator.patch.txt, patch.txt
>
> This patch refactors all places where Nutch manipulates page scores into a
> plugin-based API. Using this API it's possible to implement different scoring
> algorithms. It is also much easier to understand how scoring works.
> Multiple scoring plugins can be run in sequence, in a manner similar to
> URLFilters.
> Also included is an OPICScoringFilter plugin, which contains the current
> implementation of the scoring algorithm. Together with the scoring API it
> provides fully backward-compatible scoring.
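
To illustrate selecting topN records by an arbitrary float sort value, here is a minimal sketch. It is not the code from Generator.patch.txt: the class TopNSketch, the Entry type, and the selectTopN method are hypothetical names, and a plain Java float stands in for the FloatWritable so that the example runs without Hadoop on the classpath. The real Generator works as a MapReduce job; this only shows the selection idea in a single process.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of topN selection by a float sort value.
// Not taken from the patch; all names here are assumptions.
final class TopNSketch {

    // Stand-in for a (sort value, URL) record. The patch describes the sort
    // value as a FloatWritable; a primitive float is used here instead.
    static final class Entry {
        final String url;
        final float sortValue;

        Entry(String url, float sortValue) {
            this.url = url;
            this.sortValue = sortValue;
        }
    }

    // Returns the URLs of the n entries with the highest sort value,
    // ordered from highest to lowest.
    static List<String> selectTopN(List<Entry> entries, int n) {
        List<String> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap on sortValue: the head is always the weakest candidate,
        // so it is the one evicted when the heap grows past n entries.
        PriorityQueue<Entry> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Entry e) -> e.sortValue));
        for (Entry e : entries) {
            heap.add(e);
            if (heap.size() > n) {
                heap.poll();
            }
        }
        // The heap drains lowest-first, so insert at the front to end up
        // with the highest sort value first.
        while (!heap.isEmpty()) {
            result.add(0, heap.poll().url);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("http://example.com/a", 0.5f));
        entries.add(new Entry("http://example.com/b", 2.0f));
        entries.add(new Entry("http://example.com/c", 1.0f));
        System.out.println(selectTopN(entries, 2));
    }
}
```

Because the selection only looks at the float, whatever computes that value (a page score or something else) can change without touching the selection logic, which is the decoupling the message describes.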