Hi, I have had no problem doing a distributed crawl.
On 12/28/05, Pushpesh Kr. Rajwanshi <[EMAIL PROTECTED]> wrote:
> Hi NN,
>
> Thanks for replying to me. Actually I wanted to know whether distributed
> crawling in nutch is working fine, and with what success. I have been
> successful in setting up a distributed crawl on 2 machines (1 master and
> 1 slave), but when I try with more than two machines there seems to be a
> problem, especially while injecting urls into the crawlDB.

Could you please post your log files, for example the jobtracker and
tasktracker log files?

> So I was wondering whether anyone has succeeded in doing a massive
> crawl using nutch, involving millions of pages.
>
> My requirement is to crawl about 20,000 websites (say, to depth 5) in a
> day, and I was wondering how many machines that would require.
>
> I would truly appreciate any response on this.
>
> Thanks in advance,
> Pushpesh
>
>
> On 12/28/05, Nutch Newbie <[EMAIL PROTECTED]> wrote:
> >
> > Have you tried the following:
> >
> > http://wiki.apache.org/nutch/HardwareRequirements
> >
> > and
> >
> > http://wiki.apache.org/nutch/
> >
> > There is no quick answer if one is planning to crawl millions of
> > pages. Read.. try.. read..
> >
> >
> > On 12/28/05, Pushpesh Kr. Rajwanshi <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > I want to know whether anyone has been able to successfully run a
> > > distributed crawl on multiple machines, crawling millions of pages,
> > > and how hard it is to do that. Do I just have to do some
> > > configuration and setup, or some implementation as well?
> > >
> > > Also, can anyone tell me: if I want to crawl around 20,000 websites
> > > (say, to depth 5) in a day, is that possible, and if so, how many
> > > machines would I roughly require, and what configuration will I
> > > need? I would appreciate even very approximate numbers, as I
> > > understand it might not be trivial to find out. Or maybe it is :-)
> > >
> > > TIA
> > > Pushpesh
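For context, a sketch of what a multi-machine setup looks like on the Nutch
MapReduce branch of that era: every node needs the filesystem and jobtracker
properties pointed at the master instead of the "local" defaults, or the
inject job silently runs on one machine only. This is a minimal, illustrative
nutch-site.xml; the host name "master" and the ports are assumptions, not
values taken from this thread, so substitute your own.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- NDFS namenode address. The default value "local" disables the
       distributed filesystem, so slaves must point at the master.
       Host and port here are illustrative placeholders. -->
  <property>
    <name>fs.default.name</name>
    <value>master:9000</value>
  </property>
  <!-- MapReduce jobtracker address. Again, the default "local" runs
       everything in-process on a single machine. -->
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
```

With the slave host names listed in conf/slaves and the daemons started from
the master, the inject step is then run as a normal Nutch command (e.g.
bin/nutch inject crawldb urls), and each tasktracker writes its own log file
under its local logs/ directory, which is where the multi-machine inject
failure described above should show up.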