Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "NutchScoring" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/NutchScoring?action=diff&rev1=7&rev2=8

  == Where Scoring takes place within the Nutch Crawl cycle ==
  Scoring occurs in several places in the Nutch codebase and therefore at several stages of the crawl cycle. This section describes where scoring occurs at each step and what purpose it serves.
   
-  * [[https://svn.apache.org/repos/asf/nutch/trunk/src/java/org/apache/nutch/crawl/Injector.java|./src/java/org/apache/nutch/crawl/Injector.java]] - The Injector
+  * [[https://svn.apache.org/repos/asf/nutch/trunk/src/java/org/apache/nutch/crawl/Injector.java|./src/java/org/apache/nutch/crawl/Injector.java]] - Scoring filters are specified in the configuration of each MapReduce job, so the desired filters are used at runtime when the JobClient runs the job. The Injector runs two MapReduce jobs:
+     * sortJob - sets InjectMapper as the job's MapReduce Mapper. InjectMapper uses ScoringFilters to compute an initial score for each URL: it passes the Hadoop Text key (the URL of the page) and the associated CrawlDatum value (the new datum, which the filters modify in place) to the ScoringFilters.injectedScore method. In effect, this sets the initial score of newly injected pages. Because newly injected pages may have no inlinks, filter implementations may wish to set this score to a non-zero value to give them some initial credit.
+     * mergeJob - 
   * ./src/java/org/apache/nutch/crawl/CrawlDbReducer.java
   * ./src/java/org/apache/nutch/crawl/Generator.java
   * ./src/java/org/apache/nutch/fetcher/Fetcher.java
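
The in-place, ordered filtering that the Injector's sortJob relies on can be sketched in plain Java. This is a minimal, self-contained illustration under stated assumptions, not Nutch code: `Datum`, `ScoringFilter`, `ScoringFilters`, `MinimumCreditFilter` and `PrefixBoostFilter` are hypothetical stand-ins (the real `ScoringFilters.injectedScore` receives a Hadoop `Text` URL and a `CrawlDatum`, and the real filter set comes from the job configuration). The score values, the boost factor and the URL prefix are illustrative only. The sketch shows every configured filter modifying the same datum in turn, and one filter giving a newly injected page non-zero initial credit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins for Nutch's CrawlDatum, ScoringFilter and
// ScoringFilters. They only model the ordered, in-place filter chain.
public class InjectScoringSketch {

    // Stand-in for CrawlDatum: only the score is modelled.
    static class Datum {
        private float score;

        Datum(float score) { this.score = score; }

        float getScore() { return score; }

        void setScore(float score) { this.score = score; }
    }

    // Same shape as injectedScore(url, datum): a filter modifies the datum in place.
    interface ScoringFilter {
        void injectedScore(String url, Datum datum);
    }

    // Stand-in for ScoringFilters: applies every configured filter, in order.
    static class ScoringFilters {
        private final List<ScoringFilter> filters;

        ScoringFilters(List<ScoringFilter> filters) {
            this.filters = new ArrayList<>(filters);
        }

        void injectedScore(String url, Datum datum) {
            for (ScoringFilter filter : filters) {
                filter.injectedScore(url, datum);
            }
        }
    }

    // Hypothetical filter: gives a newly injected page (which may have no
    // inlinks yet) a non-zero starting score.
    static class MinimumCreditFilter implements ScoringFilter {
        private final float minimum;

        MinimumCreditFilter(float minimum) { this.minimum = minimum; }

        @Override
        public void injectedScore(String url, Datum datum) {
            if (datum.getScore() < minimum) {
                datum.setScore(minimum);
            }
        }
    }

    // Hypothetical filter: multiplies the score of URLs under a given prefix.
    static class PrefixBoostFilter implements ScoringFilter {
        private final String prefix;
        private final float factor;

        PrefixBoostFilter(String prefix, float factor) {
            this.prefix = prefix;
            this.factor = factor;
        }

        @Override
        public void injectedScore(String url, Datum datum) {
            if (url.startsWith(prefix)) {
                datum.setScore(datum.getScore() * factor);
            }
        }
    }

    public static void main(String[] args) {
        String url = "https://nutch.apache.org/";
        ScoringFilter credit = new MinimumCreditFilter(0.1f);
        ScoringFilter boost = new PrefixBoostFilter("https://nutch.apache.org/", 2.0f);

        // New datum for an injected URL, scored by credit then boost.
        Datum datum = new Datum(0.0f);
        new ScoringFilters(Arrays.asList(credit, boost)).injectedScore(url, datum);
        System.out.println(datum.getScore()); // prints 0.2

        // Same filters in the opposite order.
        Datum reordered = new Datum(0.0f);
        new ScoringFilters(Arrays.asList(boost, credit)).injectedScore(url, reordered);
        System.out.println(reordered.getScore()); // prints 0.1
    }
}
```

Because every filter mutates the same datum, the configured order matters: in this sketch, running the boost before the minimum-credit filter multiplies a zero score and leaves the seed at 0.1 instead of 0.2.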
