Hi

I have had no problem doing distributed crawls.

On 12/28/05, Pushpesh Kr. Rajwanshi <[EMAIL PROTECTED]> wrote:
> Hi NN,
>
> Thanks for replying. Actually I wanted to know whether distributed crawling in
> Nutch is working fine, and with what success. I have managed to set up a
> distributed crawl across 2 machines (1 master and 1 slave), but when I try
> with more than two machines there seems to be a problem, especially while
> injecting URLs into the crawlDB.

Could you please post your log files, for example the jobtracker and
tasktracker log files?
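If it helps, here is a minimal sketch of how you might grab the tails of those
logs before posting them. The log directory and file-name patterns in it are
assumptions; point them at wherever your install actually writes its logs:

```python
#!/usr/bin/env python
# Sketch: print the last lines of the jobtracker/tasktracker logs so they can
# be pasted into a mail. LOG_DIR and PATTERNS are assumptions -- adjust them
# to your own setup.
import glob
import os

LOG_DIR = os.path.expanduser("~/nutch/logs")     # assumed log location
PATTERNS = ["*jobtracker*.log", "*tasktracker*.log"]
TAIL_LINES = 200                                 # how much context to post

for pattern in PATTERNS:
    for path in sorted(glob.glob(os.path.join(LOG_DIR, pattern))):
        with open(path) as f:
            lines = f.readlines()
        print("===== %s (last %d lines) =====" % (path, TAIL_LINES))
        print("".join(lines[-TAIL_LINES:]))
```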

> So I was wondering if anyone has succeeded in doing a massive crawl with
> Nutch, involving millions of pages?
>
> My requirement is to crawl about 20,000 websites (say to depth 5) in a day,
> and I was wondering how many machines that would require.
>
> Would truly appreciate any response on this.
>
> Thanks In Advance
> Pushpesh
>
>
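On the 20,000-sites-in-a-day question, nobody can give you exact numbers, but
you can do a back-of-the-envelope estimate yourself. Here is a rough sketch;
every figure in it (pages per site, fetch rate per machine, crawl hours per
day) is an assumption you would have to replace with measurements from your
own test crawl:

```python
# Back-of-the-envelope sizing for a daily crawl. All inputs below are
# placeholder assumptions -- measure them on a small test crawl and plug in
# your own numbers before trusting the result.

sites = 20000                    # websites to crawl per day
pages_per_site = 500             # assumed average pages reachable at depth 5
fetch_rate_per_machine = 20.0    # assumed sustained pages/second per fetcher node
crawl_hours_per_day = 20         # assumed hours per day actually spent fetching

total_pages = sites * pages_per_site
pages_per_machine_per_day = fetch_rate_per_machine * crawl_hours_per_day * 3600
machines = total_pages / pages_per_machine_per_day

print("total pages/day:         %d" % total_pages)
print("pages/machine/day:       %d" % pages_per_machine_per_day)
print("fetcher machines needed: %.1f" % machines)
```

Keep in mind that polite per-host fetch delays often matter more than raw
machine throughput when you are crawling a fixed list of sites, which is
another reason to measure a small test crawl before buying hardware.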
> On 12/28/05, Nutch Newbie <[EMAIL PROTECTED]> wrote:
> >
> > Have you tried the following:
> >
> > http://wiki.apache.org/nutch/HardwareRequirements
> >
> > and
> >
> > http://wiki.apache.org/nutch/
> >
> > There is no quick answer if one is planning to crawl millions of
> > pages. Read... try... read...
> >
> >
> > On 12/28/05, Pushpesh Kr. Rajwanshi <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > I want to know if anyone has been able to successfully run a distributed
> > > crawl on multiple machines, crawling millions of pages, and how hard it
> > > is to do that. Do I just have to do some configuration and setup, or is
> > > some implementation work needed as well?
> > >
> > > Also, can anyone tell me: if I want to crawl around 20,000 websites (say
> > > to depth 5) in a day, is it possible, and if yes, roughly how many
> > > machines would I require? And what configuration will I need? I would
> > > appreciate even very approximate numbers, as I understand it might not
> > > be trivial to find out, or maybe it is :-)
> > >
> > > TIA
> > > Pushpesh
> > >
> > >
> >
>
>
