...by host. I guess you are suggesting that I change PartitionUrlByHost. In that case, could you please point out how to change it to get an "impolite" fetcher?

Luca Rondanini

Doğacan Güney wrote:
> On 7/25/07, Luca Rondanini <[EMAIL PROTECTED]> wrote:
>
>> Hi all,
>> The generate step of my crawl process is taking more than 2 hours... is it
>> normal?
>
> Are you partitioning URLs by IP or by host?
>
>> this is my stat report:
>>
>> CrawlDb statistics start: crawl/crawldb
>> Statistics for CrawlDb: crawl/crawldb
>> TOTAL urls: 586860
>> retry 0: 578159
>> retry 1: 1983
>> retry 2: 2017
>> retry 3: 4701
>> min score: 0.0
>> avg score: 0.0
>> max score: 1.0
>> status 1 (db_unfetched): 164849
>> status 2 (db_fetched): 417306
>> status 3 (db_gone): 4701
>> status 5 (db_redir_perm): 4
>> CrawlDb statistics: done
>>
>> Luca Rondanini

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
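
To make the host-versus-URL partitioning question concrete, here is a minimal Python sketch. It is not Nutch's PartitionUrlByHost code; the function names and sample URLs are hypothetical and only illustrate the idea. Hashing the host name keeps all URLs of one host in the same fetch partition, which is what allows per-host politeness. Hashing the full URL spreads a host's URLs across partitions, so nothing limits how hard a single host is hit.

```python
import zlib
from urllib.parse import urlsplit


def partition_by_host(url: str, num_partitions: int) -> int:
    """Hypothetical helper: every URL of a given host maps to the same partition."""
    host = urlsplit(url).hostname or url  # fall back to the raw string if no host
    return zlib.crc32(host.encode("utf-8")) % num_partitions


def partition_by_url(url: str, num_partitions: int) -> int:
    """Hypothetical helper: hash the whole URL, ignoring which host it belongs to."""
    return zlib.crc32(url.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    # Illustrative URLs only, not taken from the crawl discussed in the thread.
    urls = [
        "http://example.org/a.html",
        "http://example.org/b.html",
        "http://example.com/index.html",
    ]
    for u in urls:
        print(u, partition_by_host(u, 4), partition_by_url(u, 4))
```

With host-based partitioning the two example.org URLs always land in the same partition; with URL-based partitioning they may or may not, depending on the hash.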
