...by host.
I guess you are suggesting that I change PartitionUrlByHost. In that case,
could you please point out how to change it to get an "unpolite" fetcher?
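
To make sure we mean the same thing by "partitioning by host", here is a minimal sketch of the idea as I understand it: every URL from the same host gets the same partition number, so one fetch task owns that host. This is not the code of PartitionUrlByHost; the class name HostPartitionSketch and the method partitionFor are hypothetical, made up only for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative sketch only; NOT Nutch's PartitionUrlByHost.
public class HostPartitionSketch {

    // Hypothetical helper: maps a URL to a partition using only its host name.
    static int partitionFor(String url, int numPartitions) {
        String key;
        try {
            String host = new URI(url).getHost();
            // Fall back to the whole URL if no host can be extracted.
            key = (host != null) ? host.toLowerCase() : url;
        } catch (URISyntaxException e) {
            key = url;
        }
        // Mask the sign bit so the result is always in [0, numPartitions).
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int parts = 4;
        System.out.println(partitionFor("http://example.com/a.html", parts));
        System.out.println(partitionFor("http://example.com/b.html", parts));
        System.out.println(partitionFor("http://example.org/", parts));
    }
}
```

The first two lines printed are always equal because both URLs share the host example.com. An "unpolite" variant would presumably key on something other than the host (for example the full URL), so that URLs from one host spread across partitions.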



Luca Rondanini


Doğacan Güney wrote:
> On 7/25/07, Luca Rondanini <[EMAIL PROTECTED]> wrote:
> 
>> Hi all,
>> The generate step of my crawl process is taking more than 2 hours... is it
>> normal?
> 
> 
> Are you partitioning urls by ip or by host?
> 
>>
>> this is my stat report:
>>
>> CrawlDb statistics start: crawl/crawldb
>> Statistics for CrawlDb: crawl/crawldb
>> TOTAL urls:     586860
>> retry 0:        578159
>> retry 1:        1983
>> retry 2:        2017
>> retry 3:        4701
>> min score:      0.0
>> avg score:      0.0
>> max score:      1.0
>> status 1 (db_unfetched):        164849
>> status 2 (db_fetched):  417306
>> status 3 (db_gone):     4701
>> status 5 (db_redir_perm):       4
>> CrawlDb statistics: done
>>
>>
>>
>>
>>
>> Luca Rondanini
>>
>>
> 
> 

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
