[ https://issues.apache.org/jira/browse/NUTCH-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche closed NUTCH-1071.
--------------------------------

> Crawldb update to total counts per status
> -----------------------------------------
>
>                 Key: NUTCH-1071
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1071
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>            Priority: Trivial
>             Fix For: 1.4
>
>         Attachments: NUTCH-1071.patch
>
>
> The reduce phase of the crawldb update outputs all the entries that will be
> found in the updated crawldb. We can use the counters to summarise the number
> of URLs per status, which is a bit like the readdb -stats functionality
> except that it does not require an additional step.
> This is a useful way of monitoring the progress of a crawl using the Hadoop
> JobTracker UI.
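
The idea described in the issue, incrementing one counter per status as each entry is written by the reduce phase, can be sketched in plain Java. This is not the content of NUTCH-1071.patch: the class StatusCounterSketch, its emit method, and the sample URLs are hypothetical, and a TreeMap stands in for a Hadoop counter group so that the example runs without Hadoop on the classpath. In a real job, the increment would presumably go through Hadoop's counter facility (for example Reporter.incrCounter in the old mapred API), which is what makes the totals visible in the JobTracker UI.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a reducer that tallies its output entries per status. */
final class StatusCounterSketch {

    /** Stand-in for a Hadoop counter group: counter name -> running total. */
    private final Map<String, Long> counters = new TreeMap<>();

    /** Called once for every entry that ends up in the updated crawldb. */
    void emit(String url, String statusName) {
        // A real reducer would write (url, datum) to its output here.
        // The extra work is a single counter increment keyed by status name,
        // so no separate pass over the crawldb is needed to get the totals.
        counters.merge(statusName, 1L, Long::sum);
    }

    /** Totals per status, as a job UI might display them. */
    Map<String, Long> counters() {
        return counters;
    }

    public static void main(String[] args) {
        StatusCounterSketch reducer = new StatusCounterSketch();
        // Illustrative entries only; status names mimic those shown by readdb -stats.
        reducer.emit("http://a.example/", "db_fetched");
        reducer.emit("http://b.example/", "db_unfetched");
        reducer.emit("http://c.example/", "db_fetched");
        reducer.emit("http://d.example/", "db_gone");
        reducer.counters().forEach((status, n) -> System.out.println(status + "\t" + n));
    }
}
```

Because the tally is kept while the entries are being written, the per-status totals are available as soon as the update job finishes, or even while it is running, rather than after a separate statistics job.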