I would suggest checking the URL filters. If you use the crawl command, the relevant file is crawl-urlfilter.txt; otherwise it is regex-urlfilter.txt.
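To illustrate why the filter may be what keeps www.bbb.com and www.ccc.com out, here is a rough Python sketch of how these filter files are evaluated, as far as I understand it: rules are read top to bottom, the first pattern that matches decides (+ accepts, - rejects), and a URL that matches nothing is dropped. This is not Nutch code. The helper names parse_rules and accepts are made up for the illustration, and the aaa.com rule is only an assumption about what your filter file might contain.

```python
import re

# Assumed filter contents, modelled on the stock file with the
# domain placeholder replaced by aaa.com (an assumption, not your real file).
RULES_TEXT = r"""
# skip file:, ftp:, and mailto: urls
-^(file|ftp|mailto):
# accept hosts in aaa.com only
+^http://([a-z0-9]*\.)*aaa.com/
# skip everything else
-.
"""

def parse_rules(text):
    """Turn filter lines into (accept, compiled_regex) pairs, skipping comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accepts(rules, url):
    """First matching rule wins; a URL matching no rule is rejected."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False

rules = parse_rules(RULES_TEXT)
for url in ("http://www.aaa.com/xxx/xxx",
            "http://www.bbb.com/",
            "http://www.ccc.com/"):
    print(url, "accepted" if accepts(rules, url) else "rejected")
```

With a filter like this only the aaa.com URL gets through, which matches what you see in Luke. Adding + lines for the other hosts above the final "-." line (or loosening the accept pattern) would let them pass.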
Also check in your Nutch settings whether following outward (external) links is allowed; in nutch-site.xml the db.ignore.external.links property controls this.

2009/3/5 Yves Yu <[email protected]>

> yes, I'm using Luke now, and I see there is no www.bbb.com and no
> www.ccc.com in the crawling procedure. it can only crawl www.aaa.com,
> www.aaa.com\xxx\xxx, like these
> do you know what the problem is?
>
> 2009/3/4 Jasper Kamperman <[email protected]>
>
> > Oh and the documentation also specifies a depth parameter that says how
> > far afield the crawler may go. I think default is 10 but not sure.
> >
> > Sent from my iPhone
> >
> > On Mar 3, 2009, at 12:53 PM, Yves Yu <[email protected]> wrote:
> >
> >> you mean, we can do this without additional configuration? how about 10
> >> depth like this? how can I set it? thanks.
> >>
> >> 2009/3/4 Jasper Kamperman <[email protected]>
> >>
> >>> Could be a lot of reasons. I'd start by investigating the index with
> >>> Luke to see if ccc made it into the index and if I can search out the
> >>> page with the word "big". From what I find out with Luke I'd work my
> >>> way back to the root cause
> >>>
> >>> Sent from my iPhone
> >>>
> >>> On Mar 3, 2009, at 7:40 AM, Yves Yu <[email protected]> wrote:
> >>>
> >>>> Hi, all,
> >>>>
> >>>> for example,
> >>>>
> >>>> The page www.aaa.com has a link www.bbb.com
> >>>> www.bbb.com has a link www.ccc.com
> >>>> www.ccc.com has a word: big
> >>>>
> >>>> It seems I cannot find "big" in www.ccc.com, is it possible? How can
> >>>> I set the configurations?
> >>>>
> >>>> Thanks in advance!
> >>>>
> >>>> Yves

--
Best Regards
Alexander Aristov
