+1

On Mon, 2006-01-02 at 13:39 -0800, Earl Cahill wrote:
> Any chance you could walk through your implementation? Like how the
> twenty boxes were assigned? Maybe upload your confs somewhere, and
> outline what commands you actually ran?
>
> Thanks,
> Earl
>
> --- Doug Cutting <[EMAIL PROTECTED]> wrote:
>
> > Pushpesh Kr. Rajwanshi wrote:
> > > I want to know if anyone has been able to successfully run a
> > > distributed crawl on multiple machines, crawling millions of
> > > pages, and how hard it is to do that. Do I just have to do some
> > > configuration and setup, or is some implementation work needed
> > > as well?
> >
> > I recently performed a four-level-deep crawl, starting from urls in
> > DMOZ and limiting each level to 16M urls. This ran on 20 machines,
> > took around 24 hours using about 100Mbit, and retrieved around 50M
> > pages. I used Nutch unmodified, specifying only a few configuration
> > options. So, yes, it is possible.
> >
> > Doug
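For reference, a whole-web crawl of the kind Doug describes is driven
by a short sequence of bin/nutch commands, repeated once per level.
Below is a minimal sketch, assuming the MapReduce-based trunk of early
2006 (what became Nutch 0.8); the crawl/crawldb, crawl/segments and
dmoz/ paths are hypothetical layout choices, and the -topN value
simply mirrors the 16M-per-level limit Doug mentions. His actual confs
and commands are not shown in this thread.

  # seed the crawl db from a directory of flat files of urls
  # (e.g. urls extracted from the DMOZ dump beforehand)
  bin/nutch inject crawl/crawldb dmoz

  # one "level" per iteration: generate a fetchlist capped at 16M
  # urls, fetch the newest segment, then fold the fetched links and
  # statuses back into the crawl db
  for level in 1 2 3 4; do
    bin/nutch generate crawl/crawldb crawl/segments -topN 16000000
    # picking the newest segment with ls assumes local-fs paths;
    # on NDFS you would list segments through the dfs shell instead
    s=`ls -d crawl/segments/2* | tail -1`
    bin/nutch fetch $s
    bin/nutch updatedb crawl/crawldb $s
  done

Spreading this over 20 boxes is then configuration rather than code:
each box runs a tasktracker and datanode pointed at a shared
jobtracker and NDFS namenode (the mapred.job.tracker and
fs.default.name properties in the conf files), plus the usual
politeness and agent settings such as http.agent.name. Again, which
options Doug actually set is not recorded here.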
