Crawldb update to total counts per status
-----------------------------------------

                 Key: NUTCH-1071
                 URL: https://issues.apache.org/jira/browse/NUTCH-1071
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 1.4
            Reporter: Julien Nioche
            Assignee: Julien Nioche
            Priority: Trivial
             Fix For: 1.4


The reduce phase of the crawldb update outputs every entry that will be 
found in the updated crawldb. We can use Hadoop counters there to total the 
number of URLs per status, much like the readdb -stats functionality but 
without requiring an additional step. 
This gives a useful way of monitoring the progress of a crawl from the Hadoop 
JobTracker UI.
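
As a rough illustration of the idea, the sketch below tallies one count per 
emitted entry, keyed by status name. It is not the actual patch: the Counters 
class is a hypothetical stand-in for a Hadoop counter group so that the example 
runs without Hadoop on the classpath, and the status names in main are only 
sample input. In the real reducer the increment would presumably go through the 
job's counter facility at the point where each entry is output.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StatusCounterSketch {

    /** Hypothetical stand-in for a Hadoop counter group: name -> running total. */
    static final class Counters {
        private final Map<String, Long> totals = new TreeMap<>();

        // Adds 'amount' to the named counter, creating it on first use.
        void increment(String name, long amount) {
            totals.merge(name, amount, Long::sum);
        }

        // Returns a sorted copy of the current totals.
        Map<String, Long> snapshot() {
            return new TreeMap<>(totals);
        }
    }

    /** Counts one unit per emitted entry, keyed by that entry's status name. */
    static Map<String, Long> countByStatus(List<String> emittedStatuses) {
        Counters counters = new Counters();
        for (String status : emittedStatuses) {
            // In the reduce phase this would sit next to the output call.
            counters.increment(status, 1L);
        }
        return counters.snapshot();
    }

    public static void main(String[] args) {
        // Sample statuses only; a real run would see one per crawldb entry.
        List<String> statuses =
            List.of("db_fetched", "db_unfetched", "db_fetched", "db_gone");
        countByStatus(statuses)
            .forEach((name, total) -> System.out.println(name + "\t" + total));
    }
}
```

Because the totals are accumulated while the entries are written, no second 
pass over the crawldb is needed to obtain them.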

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
