Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by RandallLucas:
http://wiki.apache.org/nutch/GettingNutchRunningWithUbuntu

------------------------------------------------------------------------------
  
  Follow the Nutch tutorial (http://lucene.apache.org/nutch/tutorial.html) to build an index, or, for a simple index:
  
+ ''If you are using the latest trunk code, URL seeding has changed from a single file to a directory. With trunk (after 0.7.2), put the URLs in a file (here called "nutch") inside a DIRECTORY called "urls":''
+ 
  {{{
  [EMAIL PROTECTED]:~/nutch/trunk $ mkdir urls
  [EMAIL PROTECTED]:~/nutch/trunk $ echo 'http://lucene.apache.org/nutch/' > urls/nutch
+ }}}
+ 
+ ''Using 0.7.2 or before, just put urls in a FILE called "urls":''
+ 
+ {{{
+ [EMAIL PROTECTED]:~/nutch/trunk $ echo 'http://lucene.apache.org/nutch/' > urls
+ }}}
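
To make the difference between the two seed layouts concrete, here is a minimal shell sketch. It builds both layouts in a throwaway directory instead of a real Nutch checkout; the scratch directory and the subdirectory names "trunk" and "release" are assumptions made for illustration only and are not part of Nutch.

```shell
#!/bin/sh
# Sketch only: build both seed layouts side by side in a scratch directory.
set -eu

seed_url='http://lucene.apache.org/nutch/'
workdir=$(mktemp -d)   # hypothetical scratch location, not a Nutch checkout

# Trunk (after 0.7.2): "urls" is a directory that holds the seed file "nutch".
mkdir -p "$workdir/trunk/urls"
echo "$seed_url" > "$workdir/trunk/urls/nutch"

# 0.7.2 or before: "urls" is itself the seed file.
mkdir -p "$workdir/release"
echo "$seed_url" > "$workdir/release/urls"

# Report what kind of thing "urls" is in each layout.
for layout in trunk release; do
  if [ -d "$workdir/$layout/urls" ]; then
    echo "$layout: urls is a directory"
  else
    echo "$layout: urls is a file"
  fi
done
```

In both layouts the name passed to the crawl command is "urls"; only what that name points to differs.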
+ 
+ Then, in either case, run the crawl the same way ("urls" below refers to either a directory or a file, depending on the version you're using):
+ 
+ {{{
  [EMAIL PROTECTED]:~/nutch/trunk $ perl -pi -e 's|MY.DOMAIN.NAME|lucene.apache.org/nutch|' \
    conf/crawl-urlfilter.txt
  [EMAIL PROTECTED]:~/nutch/trunk $ bin/nutch crawl urls -dir crawl.test -depth 3
