Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by RandallLucas:
http://wiki.apache.org/nutch/GettingNutchRunningWithUbuntu

------------------------------------------------------------------------------
  Follow the nutch tutorial (http://lucene.apache.org/nutch/tutorial.html) to build an index, or for a simple index:
+ ''If you are using the latest "trunk" stuff, the url seeding has been changed from a single file to a directory. Using trunk (after 0.7.2), put the urls in a file (here, called "nutch") in a DIRECTORY called "urls":''
+ {{{
  [EMAIL PROTECTED]:~/nutch/trunk $ mkdir urls
  [EMAIL PROTECTED]:~/nutch/trunk $ echo 'http://lucene.apache.org/nutch/' > urls/nutch
+ }}}
+ 
+ ''Using 0.7.2 or before, just put urls in a FILE called "urls":''
+ 
+ {{{
+ [EMAIL PROTECTED]:~/nutch/trunk $ echo 'http://lucene.apache.org/nutch/' > urls
+ }}}
+ 
+ Then, in any case, you specify in the same fashion ("urls" below referring either to a dir or a file, depending on the version you're using):
+ 
+ {{{
  [EMAIL PROTECTED]:~/nutch/trunk $ perl -pi -e 's|MY.DOMAIN.NAME|lucene.apache.org/nutch|' \
      conf/crawl-urlfilter.txt
  [EMAIL PROTECTED]:~/nutch/trunk $ bin/nutch crawl urls -dir crawl.test -depth 3