Maybe I misunderstood you.
All I was saying is that if you're wondering why you're not getting any
search results for a word that is in a page that you (think you) have
crawled, then it may be a good idea to use Luke
(http://www.getopt.org/luke/) to look at the index and then work
backwards.
If you're just asking how you can specify which pages you want to index
and which pages you don't, please read the Nutch tutorial:
http://wiki.apache.org/nutch/NutchTutorial
and look at the section about the Crawl Command, specifically how you
edit the file conf/crawl-urlfilter.txt.
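
As a rough illustration of how such filter rules behave, here is a small Python sketch. It is not Nutch's own code. It assumes each rule is a regular expression prefixed with + (accept) or - (reject), that rules are checked top to bottom with the first match deciding, and that a URL matching no rule is dropped. The rule text and the helper names parse_rules and accepts are made up for this example.

```python
import re

# Hypothetical rules written in the style of conf/crawl-urlfilter.txt.
# Only hosts under aaa.com are accepted here.
RULES = r"""
# skip file:, ftp: and mailto: URLs
-^(file|ftp|mailto):
# accept hosts under aaa.com
+^http://([a-z0-9]*\.)*aaa\.com/
# skip everything else
-.
"""


def parse_rules(text):
    """Turn rule text into a list of (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return True if the first matching rule is an accept rule."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # assumption: a URL that matches nothing is dropped


rules = parse_rules(RULES)
for url in ("http://www.aaa.com/", "http://www.bbb.com/", "http://www.ccc.com/"):
    print(url, accepts(rules, url))
```

With rules like these only the aaa.com URL passes, so pages on the other two hosts would never be fetched or indexed, however deep the crawl goes.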
Thanks,
Jasper
On Mar 3, 2009, at 12:53 PM, Yves Yu wrote:
You mean we can do this without additional configuration? What about a
depth of 10, like this? How can I set it? Thanks.
2009/3/4 Jasper Kamperman <[email protected]>
Could be a lot of reasons. I'd start by investigating the index with
Luke to see if ccc made it into the index and if I can find the page by
searching for the word "big". From what I find out with Luke I'd work my
way back to the root cause.
Sent from my iPhone
On Mar 3, 2009, at 7:40 AM, Yves Yu <[email protected]> wrote:
Hi, all,
For example:
The page www.aaa.com has a link to www.bbb.com
www.bbb.com has a link to www.ccc.com
www.ccc.com contains the word: big
It seems I cannot find "big" in www.ccc.com. Is this possible? How can I
set the configuration?
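
One thing that can produce this symptom is the crawl depth. The Python sketch below illustrates the idea only and is not how Nutch is implemented. It assumes that each unit of depth is one fetch round and that a round only fetches URLs discovered in the previous round. The LINKS table and the fetched_after helper are hypothetical.

```python
# Link graph from the example: aaa links to bbb, bbb links to ccc.
LINKS = {
    "www.aaa.com": ["www.bbb.com"],
    "www.bbb.com": ["www.ccc.com"],
    "www.ccc.com": [],
}


def fetched_after(seeds, depth):
    """Return the set of pages fetched after `depth` rounds from `seeds`."""
    fetched = set()
    frontier = list(seeds)
    for _ in range(depth):
        discovered = []
        for url in frontier:
            if url in fetched:
                continue  # never fetch the same page twice
            fetched.add(url)
            discovered.extend(LINKS.get(url, []))
        # Only pages not fetched yet are candidates for the next round.
        frontier = [u for u in discovered if u not in fetched]
    return fetched


for depth in (1, 2, 3):
    print(depth, sorted(fetched_after(["www.aaa.com"], depth)))
```

Under that assumption www.ccc.com is reached only with a depth of 3 or more when www.aaa.com is the single seed. The Crawl Command described in the Nutch tutorial takes a -depth option for this.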
Thanks in advance!
Yves