Thank you.
My urls.txt contains www.aaa.com.
Must I also add www.bbb.com and www.ccc.com there?

My URL filter rule is +^http://([a-z0-9]*\.)*com/
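
A quick way to see which URLs that rule lets through is to try the same expression outside Nutch. This is only a minimal sketch: it assumes the filter applies the pattern with "find" semantics, it models the regular expression alone (not Nutch's URL normalizers or any other filter rules), and the candidate URLs are made up for illustration.

```python
import re

# The accept rule from the mail, without the leading "+" that marks it as an accept rule.
PATTERN = re.compile(r"^http://([a-z0-9]*\.)*com/")


def accepted(url: str) -> bool:
    """Return True if the pattern is found in the URL (anchored at the start by ^)."""
    return PATTERN.search(url) is not None


# Hypothetical candidates, chosen to show the scheme and trailing-slash requirements.
CANDIDATES = [
    "http://www.aaa.com/",
    "http://www.bbb.com/",
    "http://www.ccc.com/page.html",
    "www.aaa.com",           # no scheme, the form written in urls.txt
    "http://www.aaa.com",    # no trailing slash
    "http://www.example.org/",
]

if __name__ == "__main__":
    for url in CANDIDATES:
        verdict = "accept" if accepted(url) else "reject"
        print(f"{url:35} {verdict}")
```

Under those assumptions the rule itself does not single out www.aaa.com: any all-lowercase .com host given with http:// and a trailing slash matches, while the bare www.aaa.com form does not.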

By the way, how do I check in the Nutch settings whether adding outward links is allowed?
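
As far as I know, the setting in question is the db.ignore.external.links property, with the default in conf/nutch-default.xml and local overrides in conf/nutch-site.xml; please check the nutch-default.xml shipped with your version to confirm. A minimal sketch for reading such a property follows. The SAMPLE text is a made-up stand-in for a real configuration file, and read_property is a hypothetical helper, not part of Nutch.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the contents of conf/nutch-site.xml (an assumption for illustration).
SAMPLE = """<configuration>
  <property>
    <name>db.ignore.external.links</name>
    <value>false</value>
  </property>
</configuration>"""


def read_property(xml_text, name):
    """Return the value of the named property, or None if it is not present."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


if __name__ == "__main__":
    print(read_property(SAMPLE, "db.ignore.external.links"))
```

To use it on a real installation, read the file's text first, for example with open("conf/nutch-site.xml").read(), and fall back to nutch-default.xml when the function returns None.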

2009/3/5 Alexander Aristov <[email protected]>

> I would suggest checking the URL filters. If you use the crawl command, it
> is the crawl-urlfilter; otherwise it is the regex-urlfilter.
>
>
> And check the Nutch settings to see whether you allow adding outward links.
>
> 2009/3/5 Yves Yu <[email protected]>
>
> > Yes, I'm using Luke now, and I see that www.bbb.com and www.ccc.com are
> > missing from the crawl. It can only crawl www.aaa.com and pages under it,
> > such as www.aaa.com/xxx/xxx.
> > Do you know what the problem is?
> >
> > 2009/3/4 Jasper Kamperman <[email protected]>
> >
> > > Oh and the documentation also specifies a depth parameter that says how
> > > far afield the crawler may go. I think default is 10 but not sure.
> > >
> > > Sent from my iPhone
> > >
> > >
> > > On Mar 3, 2009, at 12:53 PM, Yves Yu <[email protected]> wrote:
> > >
> > > >> You mean we can do this without additional configuration? How about a
> > > >> depth of 10, like this? How can I set it? Thanks.
> > >>
> > >> 2009/3/4 Jasper Kamperman <[email protected]>
> > >>
> > > >>> Could be a lot of reasons. I'd start by investigating the index with
> > > >>> Luke to see if ccc made it into the index and if I can search out the
> > > >>> page with the word "big". From what I find out with Luke I'd work my
> > > >>> way back to the root cause.
> > >>>
> > >>> Sent from my iPhone
> > >>>
> > >>>
> > >>> On Mar 3, 2009, at 7:40 AM, Yves Yu <[email protected]> wrote:
> > >>>
> > > >>>> Hi, all,
> > > >>>>
> > > >>>> For example:
> > >>>>
> > >>>> The page www.aaa.com has a link www.bbb.com
> > >>>> www.bbb.com has a link www.ccc.com
> > >>>> www.ccc.com has a word: big
> > >>>>
> > > >>>> It seems I cannot find "big" in www.ccc.com. Is it possible? How can I
> > > >>>> set the configuration?
> > >>>>
> > >>>> Thanks in advance!
> > >>>>
> > >>>> Yves
> > >>>>
> > >>>>
> > >>>
> >
>
>
>
> --
> Best Regards
> Alexander Aristov
>
