[ http://issues.apache.org/jira/browse/NUTCH-240?page=comments#action_12372379 ]
Andrzej Bialecki commented on NUTCH-240:
-----------------------------------------

Yes, one of the reasons I wanted to discuss these patches is that they uncovered some of the underlying ugliness... ;)

The reason for the generator store/restore is that scoring plugins could take into account many more variables than just the score recorded in CrawlDatum.score. They could also have different strategies for prioritizing pages to be included in topN. So, it's true this is not currently used by OPIC, but I think without it, it's not possible for plugins to affect the choice of topN.

Initially, I did as you suggest, i.e. I created a method to calculate a single float value for the purpose of selecting topN. However, I wanted to avoid changing CrawlDatum.compareTo: if we put ScoringFilters there, it would be a big performance hit. OTOH, if we overwrite the primitive float in CrawlDatum.score, it seemed to me we should store its earlier value and then possibly restore it, as the value used for selecting topN may have nothing to do with the "real" score.

passScoreBeforeParsing/passScoreAfterParsing: again, I agree it looks strange, but that's what we do at the moment; I just extracted it into an interface. I'd love to skip this altogether, if there is a way.

> Scoring API: extension point, scoring filters and an OPIC plugin
> ----------------------------------------------------------------
>
>          Key: NUTCH-240
>          URL: http://issues.apache.org/jira/browse/NUTCH-240
>      Project: Nutch
>         Type: Improvement
>     Versions: 0.8-dev
>     Reporter: Andrzej Bialecki
>  Attachments: patch.txt
>
> This patch refactors all places where Nutch manipulates page scores, into a
> plugin-based API. Using this API it's possible to implement different scoring
> algorithms. It is also much easier to understand how scoring works.
> Multiple scoring plugins can be run in sequence, in a manner similar to
> URLFilters.
>
> Included is also an OPICScoringFilter plugin, which contains the current
> implementation of the scoring algorithm. Together with the scoring API it
> provides a fully backward-compatible scoring.

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
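For illustration, the store/overwrite/restore approach described in the comment above might look roughly like the following sketch. It assumes a simplified stand-in for CrawlDatum; the names TopNSketch, Datum, ScoringFilter.generatorSortValue and the _orig_score metadata key are hypothetical and are not taken from the attached patch or from the Nutch API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopNSketch {

    // Hypothetical, simplified stand-in for CrawlDatum: a URL, a primitive
    // float score and a metadata map. Not the real Nutch class.
    static final class Datum {
        final String url;
        float score;
        final Map<String, Float> meta = new HashMap<>();

        Datum(String url, float score) {
            this.url = url;
            this.score = score;
        }
    }

    // Hypothetical plugin interface: each filter receives the sort value
    // produced by the previous filter and returns a refined one.
    interface ScoringFilter {
        float generatorSortValue(Datum datum, float initSort);
    }

    // Hypothetical metadata key used to stash the "real" score.
    static final String ORIG_SCORE_KEY = "_orig_score";

    static List<Datum> selectTopN(List<Datum> candidates, List<ScoringFilter> filters, int topN) {
        for (Datum d : candidates) {
            // Store: remember the real score before overwriting it.
            d.meta.put(ORIG_SCORE_KEY, d.score);
            // Run the filters in sequence, similar to a URLFilter chain.
            float sort = d.score;
            for (ScoringFilter f : filters) {
                sort = f.generatorSortValue(d, sort);
            }
            // Overwrite the primitive float, so a plain score comparison
            // (no filter calls inside the comparator) ranks by sort value.
            d.score = sort;
        }

        List<Datum> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Datum d) -> d.score).reversed());
        List<Datum> selected = new ArrayList<>(sorted.subList(0, Math.min(topN, sorted.size())));

        // Restore: the sort value may have nothing to do with the real score.
        for (Datum d : candidates) {
            d.score = d.meta.remove(ORIG_SCORE_KEY);
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Datum> candidates = new ArrayList<>(List.of(
                new Datum("http://example.org/", 1.0f),
                new Datum("http://example.org/doc.pdf", 0.8f),
                new Datum("http://news.example.org/", 0.6f)));
        // Two illustrative filters with their own prioritization strategy.
        List<ScoringFilter> filters = List.of(
                (d, s) -> d.url.endsWith(".pdf") ? s * 0.5f : s,            // demote PDFs
                (d, s) -> d.url.startsWith("http://news.") ? s + 0.5f : s); // boost a news host
        for (Datum d : selectTopN(candidates, filters, 2)) {
            System.out.println(d.url + " score=" + d.score);
        }
    }
}
```

Running main selects the news page and the root page (sort values of roughly 1.1 and 1.0) and prints them with their original scores, 0.6 and 1.0, restored. Because the filters are only consulted while computing the sort value, the comparison itself stays a cheap float comparison, which is the point of not putting ScoringFilters into CrawlDatum.compareTo.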
