Indexing Process

Jeff Maki Thu, 20 Sep 2007 08:41:50 -0700

Hello everyone,

I'm not going to post my config files, so as not to spam you all, but I
have a general question: I'm trying to index the pages of a website
(obviously), and I've created a special page with a link to all the
pages I want to index. I then pointed nutch to this special link page.
I set max_outlinks appropriately, and I do see all the page URLs I
expect go by in the log for the fetching stage.


When nutch gets to indexing, however, not all the documents appear in
the log--it looks as if not all of the fetched pages are being
indexed. Searching for terms I know are on the missing pages also
turns up nothing--they're not in the index!?

Can anybody tell me what factors affect the indexing stage? I want to
have nutch index *all* documents it fetches. How can I do this?

Any tips/ideas/things to configure?

Thanks in advance,

-Jeff
