Much clearer.  I think the whole web crawling section could be made
even clearer:

Bootstrapping should be one option under a section labeled something
like:

"Defining the URLs that you want to include in your fetch"

Option 1.  Bootstrap from DMOZ.

Option 2.  Make a text file with the URLs you want to crawl.
Here it should be mentioned that if you want to limit crawling, you
need to set up a filter.

Common Filter Questions:

Tips should be provided explaining that the out-of-the-box filter is
set up in a way that limits crawling to pages without query-string
parameters, and how to remove that part of the filter.  I mistakenly
put a '+' in front of that line instead of commenting it out.  A '+'
has the effect of overriding your other lines.
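
To show what I mean, here is a rough Python sketch of how I understand
the filter to behave.  It is not Nutch code.  It assumes rules are
tried top to bottom, the first pattern that matches decides ('+'
accepts, '-' rejects), and a URL that matches nothing is rejected.  The
function names and the example.com rule are made up for illustration;
the query-character line is written from memory of the shipped filter
file.

```python
import re

def parse_rules(text):
    """Parse filter lines of the form '+regex' or '-regex'; '#' starts a comment."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accepts(rules, url):
    """Assumed semantics: the first matching rule wins; no match means reject."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False

# As shipped (from memory): skip URLs that look like queries.
SHIPPED = r"""
-[?*!@=]
+^http://([a-z0-9]*\.)*example\.com/
-.
"""

# My mistake: '+' instead of '#'. Any URL with a query character is now
# accepted before the domain rule is ever consulted.
MISTAKE = SHIPPED.replace("-[?*!@=]", "+[?*!@=]")

# What I should have done: comment the line out.
COMMENTED = SHIPPED.replace("-[?*!@=]", "#-[?*!@=]")

inside = "http://www.example.com/page?id=1"
outside = "http://other.org/page?id=1"

for name, text in (("shipped", SHIPPED), ("mistake", MISTAKE), ("commented", COMMENTED)):
    rules = parse_rules(text)
    print(name, accepts(rules, inside), accepts(rules, outside))
```

With the '+' version the off-site URL gets through, which is the
"overriding" I ran into; with the line commented out, only the
example.com URL is accepted.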

Common configuration issues:

Maximum content size
Maximum retries

What else?


I will gladly post these changes to the Wiki, given some votes of
confidence and other suggestions.



-----Original Message-----
From: Vanderdray, Jacob [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 07, 2006 1:52 PM
To: nutch-user@lucene.apache.org
Subject: Tutorial on the Wiki


        I've changed the language a bit.  If you're interested, take a
look:

http://wiki.apache.org/nutch/NutchTutorial

Thanks,
Jake.
