On 7/25/07, Luca Rondanini <[EMAIL PROTECTED]> wrote:
> Hi all,
> The generate step of my crawl process is taking more than 2 hours... is it
> normal?

Are you partitioning URLs by IP or by host?

>
> This is my stat report:
>
> CrawlDb statistics start: crawl/crawldb
> Statistics for CrawlDb: crawl/crawldb
> TOTAL urls: 586860
> retry 0: 578159
> retry 1: 1983
> retry 2: 2017
> retry 3: 4701
> min score: 0.0
> avg score: 0.0
> max score: 1.0
> status 1 (db_unfetched): 164849
> status 2 (db_fetched): 417306
> status 3 (db_gone): 4701
> status 5 (db_redir_perm): 4
> CrawlDb statistics: done
>
> Luca Rondanini
>

--
Doğacan Güney

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
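
To illustrate why the by-IP versus by-host question matters for generate time, here is a minimal Python sketch. It is not Nutch's Generator code; the function names and the injectable `resolve` parameter are hypothetical, chosen only for this example. The point it shows is that grouping by host name needs only string parsing, while grouping by IP needs a DNS lookup for every distinct host, which can plausibly dominate the run time on a crawl db of several hundred thousand URLs.

```python
import socket
from collections import defaultdict
from urllib.parse import urlparse


def partition_by_host(urls):
    """Group URLs by host name. No network access is needed."""
    groups = defaultdict(list)
    for url in urls:
        groups[urlparse(url).hostname].append(url)
    return dict(groups)


def partition_by_ip(urls, resolve=socket.gethostbyname):
    """Group URLs by resolved IP address.

    `resolve` is a hypothetical hook so the lookup can be swapped out;
    by default it performs a real DNS query for each distinct host.
    """
    cache = {}  # host name -> IP, so each host is resolved only once
    groups = defaultdict(list)
    for url in urls:
        host = urlparse(url).hostname
        if host not in cache:
            cache[host] = resolve(host)  # the potentially slow step
        groups[cache[host]].append(url)
    return dict(groups)
```

Even with the per-host cache, the by-IP variant still issues one lookup per distinct host, so a slow or rate-limited resolver turns directly into a slow generate step.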
