Maybe I misunderstood you.

All I was saying is that if you're wondering why a search returns no results for a word
that appears on a page you (think you) have crawled, it may be a good idea to use Luke
(http://www.getopt.org/luke/) to inspect the index and then work backwards.

If you're just asking how to specify which pages you want to index and which you
don't, please read the Nutch tutorial:

        http://wiki.apache.org/nutch/NutchTutorial

and look at the section on the crawl command, specifically how to edit the
file conf/crawl-urlfilter.txt.
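To make the crawl-urlfilter.txt advice concrete for the aaa -> bbb -> ccc example, here is a small Java sketch. It is not Nutch code; it only mimics the file's documented behaviour: each non-comment line is "+" (accept) or "-" (reject) followed by a regular expression, the first matching line decides, and a URL that matches no line is ignored. It also treats the crawl command's -depth option as a number of fetch rounds. The class and method names (CrawlSketch, parseRules, accepts, crawl) and the toy link graph are hypothetical. What it illustrates: with the tutorial's single accept line for one domain (MY.DOMAIN.NAME, here replaced by aaa.com), links to bbb.com and ccc.com are filtered out no matter how deep you crawl, and even with all three hosts allowed, ccc.com is only fetched once the depth is at least 3.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

public class CrawlSketch {
    // One rule line in the style of crawl-urlfilter.txt: '+' accepts, '-' rejects.
    record Rule(boolean accept, Pattern pattern) {}

    static List<Rule> parseRules(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            String l = line.trim();
            if (l.isEmpty() || l.startsWith("#")) continue; // skip blanks and comments
            rules.add(new Rule(l.charAt(0) == '+', Pattern.compile(l.substring(1))));
        }
        return rules;
    }

    // The first rule whose regex is found in the URL decides; no match means reject.
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule r : rules) {
            if (r.pattern().matcher(url).find()) return r.accept();
        }
        return false;
    }

    // Simulates 'depth' rounds: each round fetches the current frontier and
    // queues the outlinks that pass the filter for the next round.
    static Set<String> crawl(Map<String, List<String>> links, String seed,
                             List<Rule> rules, int depth) {
        Set<String> fetched = new LinkedHashSet<>();
        Set<String> frontier = new LinkedHashSet<>();
        if (accepts(rules, seed)) frontier.add(seed);
        for (int round = 0; round < depth && !frontier.isEmpty(); round++) {
            Set<String> next = new LinkedHashSet<>();
            for (String url : frontier) {
                fetched.add(url);
                for (String out : links.getOrDefault(url, List.of())) {
                    if (!fetched.contains(out) && accepts(rules, out)) next.add(out);
                }
            }
            frontier = next;
        }
        return fetched;
    }

    public static void main(String[] args) {
        // Toy link graph from the question: aaa links to bbb, bbb links to ccc.
        Map<String, List<String>> links = Map.of(
            "http://www.aaa.com/", List.of("http://www.bbb.com/"),
            "http://www.bbb.com/", List.of("http://www.ccc.com/"));
        List<Rule> aaaOnly = parseRules(List.of(
            "+^http://([a-z0-9]*\\.)*aaa.com/", "-."));
        List<Rule> threeHosts = parseRules(List.of(
            "+^http://([a-z0-9]*\\.)*(aaa|bbb|ccc).com/", "-."));
        // prints [http://www.aaa.com/]
        System.out.println(crawl(links, "http://www.aaa.com/", aaaOnly, 3));
        // prints [http://www.aaa.com/, http://www.bbb.com/]
        System.out.println(crawl(links, "http://www.aaa.com/", threeHosts, 2));
        // prints [http://www.aaa.com/, http://www.bbb.com/, http://www.ccc.com/]
        System.out.println(crawl(links, "http://www.aaa.com/", threeHosts, 3));
    }
}
```

In a real Nutch setup the corresponding knobs are the "+" lines in conf/crawl-urlfilter.txt and the -depth argument of bin/nutch crawl (the tutorial uses -depth 3; a depth of 10 would be -depth 10).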

Thanks,

Jasper

On Mar 3, 2009, at 12:53 PM, Yves Yu wrote:

You mean we can do this without additional configuration? What about a depth of 10,
like this? How can I set it? Thanks.

2009/3/4 Jasper Kamperman

There could be a lot of reasons. I'd start by inspecting the index with Luke to see
whether the www.ccc.com page made it into the index and whether I can find it by
searching for the word "big". From what I learn with Luke, I'd work my way back to
the root cause.



On Mar 3, 2009, at 7:40 AM, Yves Yu wrote:

Hi all,

For example:

The page www.aaa.com has a link www.bbb.com
www.bbb.com has a link www.ccc.com
www.ccc.com has a word: big

It seems I cannot find "big" in www.ccc.com. Is this possible? How should I
set up the configuration?

Thanks in advance!

Yves


