Hi, I have had no problems doing distributed crawls.
Could you please post your log files? For example, the jobtracker and tasktracker log files.

On 12/28/05, Pushpesh Kr. Rajwanshi <[EMAIL PROTECTED]> wrote:
> Hi NN,
>
> Thanks for replying. Actually, I wanted to know whether distributed crawling
> in Nutch is working, and with what success. I have succeeded in setting up a
> distributed crawl with 2 machines (1 master and 1 slave), but when I try
> with more than two machines there seems to be a problem, especially while
> injecting URLs into the crawlDB. So I was wondering whether anyone has
> successfully done a massive crawl using Nutch, involving millions of pages?
>
> My requirement is to crawl about 20,000 websites (say, to depth 5) in a day,
> and I was wondering how many machines it would take to do that.
>
> I would truly appreciate any response on this.
>
> Thanks in advance,
> Pushpesh
>
>
> On 12/28/05, Nutch Newbie <[EMAIL PROTECTED]> wrote:
> >
> > Have you tried the following:
> >
> > http://wiki.apache.org/nutch/HardwareRequirements
> >
> > and
> >
> > http://wiki.apache.org/nutch/
> >
> > There is no quick answer if one is planning to crawl millions of
> > pages. Read... try... read...
> >
> >
> > On 12/28/05, Pushpesh Kr. Rajwanshi <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > I want to know whether anyone has been able to successfully run a
> > > distributed crawl on multiple machines, involving crawling millions
> > > of pages, and how hard it is to do that. Do I just have to do some
> > > configuration and setup, or some implementation work as well?
> > >
> > > Also, can anyone tell me: if I want to crawl around 20,000 websites
> > > (say, to depth 5) in a day, is that possible, and if yes, roughly how
> > > many machines would I need, and what configuration? I would appreciate
> > > even some very approximate numbers, as I can understand it might not
> > > be trivial to find out. Or maybe it is :-)
> > >
> > > TIA
> > > Pushpesh

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
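The sizing question above can at least be bounded with a back-of-envelope estimate. The sketch below is illustrative only: the pages-per-site figure for depth 5 and the polite fetch rate per machine are assumptions, not Nutch measurements, so plug in numbers from a test crawl before trusting the result.

```python
# Rough back-of-envelope sizing for a time-boxed crawl.
# All constants are illustrative assumptions, not Nutch benchmarks.

def estimate_machines(sites, pages_per_site, fetches_per_sec_per_machine,
                      hours=24):
    """Estimate how many fetcher machines are needed to crawl
    sites * pages_per_site pages within `hours` hours."""
    total_pages = sites * pages_per_site
    pages_per_machine = fetches_per_sec_per_machine * 3600 * hours
    # Round up: a fractional machine still means one more whole machine.
    machines = -(-total_pages // pages_per_machine)
    return total_pages, machines

# Assume ~500 pages per site at depth 5 and ~20 polite fetches/sec
# per machine (hypothetical figures).
total, machines = estimate_machines(20_000, 500, 20)
print(total, machines)  # 10000000 6
```

With those assumptions, 20,000 sites come to about 10 million pages, and roughly half a dozen fetcher machines could cover them in a day; the real bottleneck is usually per-host politeness delays and bandwidth, not raw machine count.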
