I would suggest checking the URL filters. If you use the crawl command, the
filter file is crawl-urlfilter.txt; otherwise it is regex-urlfilter.txt.


Also check your Nutch settings to see whether following outward (external)
links is allowed; I believe the relevant property is db.ignore.external.links.
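
To illustrate why www.bbb.com and www.ccc.com might never be fetched, here is
a rough Python sketch of how I understand the regex URL filter to behave:
rules are read top to bottom, the first pattern that matches decides
(+ accepts, - rejects), and a URL that matches nothing is dropped. This is not
Nutch code; parse_rules, accepts, and the two rule sets are made up for
illustration, and the aaa.com rule is only a guess at what a restricted filter
file could contain.

```python
import re

# Hypothetical rule set that restricts the crawl to one domain.
RESTRICTED = r"""
# skip file:, ftp:, and mailto: urls
-^(file|ftp|mailto):
# accept hosts in aaa.com only
+^http://([a-z0-9]*\.)*aaa\.com/
# skip everything else
-.
"""

# Hypothetical rule set that lets the crawler leave the seed domain.
OPEN = r"""
-^(file|ftp|mailto):
+.
"""


def parse_rules(text):
    """Turn filter-file text into a list of (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """First matching rule wins; a URL that matches no rule is rejected."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


if __name__ == "__main__":
    for name, text in (("restricted", RESTRICTED), ("open", OPEN)):
        rules = parse_rules(text)
        for url in ("http://www.aaa.com/xxx/xxx", "http://www.bbb.com/"):
            print(name, url, accepts(rules, url))
```

If your filter file looks like the restricted one, widening the + line (or
replacing it with +.) should let the crawler follow links to other hosts.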

2009/3/5 Yves Yu <[email protected]>

> Yes, I'm using Luke now, and I see there is no www.bbb.com and no
> www.ccc.com in the crawl. It can only crawl www.aaa.com and pages like
> www.aaa.com/xxx/xxx.
> Do you know what the problem is?
>
> 2009/3/4 Jasper Kamperman <[email protected]>
>
> > Oh and the documentation also specifies a depth parameter that says how far
> > afield the crawler may go. I think default is 10 but not sure.
> >
> > Sent from my iPhone
> >
> >
> > On Mar 3, 2009, at 12:53 PM, Yves Yu <[email protected]> wrote:
> >
> >> You mean we can do this without additional configuration? How about a
> >> depth of 10, like this? How can I set it? Thanks.
> >>
> >> 2009/3/4 Jasper Kamperman <[email protected]>
> >>
> >>> Could be a lot of reasons. I'd start by investigating the index with Luke
> >>> to see if ccc made it into the index and if I can search out the page with
> >>> the word "big". From what I find out with Luke I'd work my way back to the
> >>> root cause.
> >>>
> >>> Sent from my iPhone
> >>>
> >>>
> >>> On Mar 3, 2009, at 7:40 AM, Yves Yu <[email protected]> wrote:
> >>>
> >>>> Hi, all,
> >>>>
> >>>> for example,
> >>>>
> >>>> The page www.aaa.com has a link www.bbb.com
> >>>> www.bbb.com has a link www.ccc.com
> >>>> www.ccc.com has a word: big
> >>>>
> >>>> It seems I cannot find "big" in www.ccc.com, is it possible? How can I set
> >>>> the configurations?
> >>>>
> >>>> Thanks in advance!
> >>>>
> >>>> Yves
> >>>>
> >>>>
> >>>
>



-- 
Best Regards
Alexander Aristov
