Have you looked at the following pages?

http://wiki.apache.org/nutch/HardwareRequirements

and

http://wiki.apache.org/nutch/

There is no quick answer if you are planning to crawl millions of
pages. Read, try, then read some more.
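
That said, the machine count can at least be framed as simple arithmetic.
The sketch below is only a back-of-the-envelope estimate: the function name,
the pages-per-site figure and the per-machine fetch rate are made-up
placeholders, not Nutch measurements, so plug in numbers from your own test
crawl.

```python
import math

def machines_needed(sites, pages_per_site, pages_per_sec_per_machine, hours=24.0):
    """Rough number of fetcher machines needed to finish within the time window.

    All inputs are assumptions supplied by the caller; nothing here is
    measured from Nutch itself.
    """
    total_pages = sites * pages_per_site
    seconds_available = hours * 3600
    pages_per_machine = pages_per_sec_per_machine * seconds_available
    return math.ceil(total_pages / pages_per_machine)

# Placeholder inputs: 20,000 sites (from the question), an assumed 500 pages
# per site at depth 5, and an assumed 10 pages/second per machine.
print(machines_needed(20_000, 500, 10))  # prints 12
```

This ignores politeness delays, parsing, indexing and bandwidth limits, so
treat the result as a lower bound at best.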


On 12/28/05, Pushpesh Kr. Rajwanshi <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I want to know if anyone has successfully run a distributed crawl on
> multiple machines, crawling millions of pages, and how hard it is to
> do that. Do I just have to do some configuration and setup, or some
> implementation as well?
>
> Also, can anyone tell me whether it is possible to crawl around 20,000
> websites (say to depth 5) in a day, and if so, roughly how many machines
> I would need and what configuration I would need? I would appreciate even
> very approximate numbers, as I understand it might not be trivial to
> work out (or maybe it is :-)
>
> TIA
> Pushpesh
>
>


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
