(Please don't cross-post to multiple lists)

Emmanuel wrote:
> I've been through the code of the CrawlDbReader class. I discovered the
> method "processTopNJob", which uses the classes CrawlDbTopNMapper and
> CrawlDbTopNReducer.
> I'm wondering why we have this function. Is it an old implementation that
> was used before the Generator to get the TopN links to fetch, or is it
> something else?
> I would appreciate it if you gave me your thoughts.

It's not an old method; it's in use. See the synopsis in CrawlDbReader.main(). The purpose of this option is to dump the top-scoring URLs together with their scores. This is useful for monitoring the CrawlDb for potential scoring problems.

> I also found a class that is not used: "CrawlDbDumpReducer" is defined
> but never used or instantiated.
> Don't you think we can remove it from the source code?

Yes, we can remove this class. It's equivalent to IdentityReducer, which is used implicitly by this job. The class is a leftover from the time when it also contained some filtering code.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
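
To illustrate what "dump the top-scoring URLs together with their scores" means, here is a minimal plain-Java sketch. It is not the Nutch implementation and uses no Nutch or Hadoop classes: the class TopNSketch, the method topN, and the sample URLs and scores are all hypothetical. It simply sorts URL/score pairs by descending score and keeps the first n.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Nutch.
class TopNSketch {

    /** Returns at most n entries, ordered from highest score to lowest. */
    static List<Map.Entry<String, Float>> topN(Map<String, Float> scores, int n) {
        List<Map.Entry<String, Float>> entries = new ArrayList<>(scores.entrySet());
        // Sort by score, highest first.
        entries.sort(Map.Entry.<String, Float>comparingByValue().reversed());
        // Copy the leading n entries so the result does not depend on the sorted list.
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for CrawlDb URL scores.
        Map<String, Float> scores = new LinkedHashMap<>();
        scores.put("http://a.example/", 0.5f);
        scores.put("http://b.example/", 2.0f);
        scores.put("http://c.example/", 1.25f);

        // Print score and URL for the two highest-scoring entries.
        for (Map.Entry<String, Float> e : topN(scores, 2)) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

A listing like this makes scoring anomalies easy to spot, for example a handful of URLs whose scores are far above the rest.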
