On 7/25/07, Luca Rondanini <[EMAIL PROTECTED]> wrote:
> Hi all,
> The generate step of my crawl process is taking more than 2 hours... is it
> normal?

Are you partitioning URLs by IP or by host?

>
> this is my stat report:
>
> CrawlDb statistics start: crawl/crawldb
> Statistics for CrawlDb: crawl/crawldb
> TOTAL urls:     586860
> retry 0:        578159
> retry 1:        1983
> retry 2:        2017
> retry 3:        4701
> min score:      0.0
> avg score:      0.0
> max score:      1.0
> status 1 (db_unfetched):        164849
> status 2 (db_fetched):  417306
> status 3 (db_gone):     4701
> status 5 (db_redir_perm):       4
> CrawlDb statistics: done
>
>
>
>
>
> Luca Rondanini
>
>


-- 
Doğacan Güney
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
