[Nutch Wiki] Update of "FAQ" by ThorstenScherler

Apache Wiki Mon, 27 Nov 2006 01:11:08 -0800

The following page has been changed by ThorstenScherler:
http://wiki.apache.org/nutch/FAQ

------------------------------------------------------------------------------
  
  The crawl tool expects as its first parameter the name of the folder that 
contains the seed URLs file. For example, if your urls.txt is located in 
/nutch/seeds, the crawl command would look like: crawl seed -dir 
/user/nutchuser...
  
+ ==== Nutch crawling parent directories for file protocol -> misconfigured URLFilters ====
+ See [http://issues.apache.org/jira/browse/NUTCH-407]. For example, with 
urlfilter-regex you should put the following in regex-urlfilter.txt:
+ {{{
+ 
+ +^file:///c:/top/directory/
+ -.
+ }}}
+ 
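To see why these two rules keep the crawler inside the top directory, here is a minimal Python sketch of how such a rule list could be evaluated. It is not Nutch's code. It assumes the behaviour described in the comments of the default regex-urlfilter.txt: rules are tried in order, the first pattern found in the URL decides (+ accepts, - rejects), and a URL matching no rule is rejected. The names load_rules and accepts are hypothetical, and Python's re syntax only approximates the Java regex syntax Nutch uses.

```python
import re


def load_rules(text):
    """Parse lines of the form '+regex' or '-regex'; skip blanks and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first rule whose pattern is found in the URL."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False  # assumption: a URL matching no rule is rejected


# The two rules from the FAQ entry above.
RULES = load_rules("""
+^file:///c:/top/directory/
-.
""")

for url in (
    "file:///c:/top/directory/docs/a.txt",  # inside the allowed tree
    "file:///c:/top/",                      # parent directory
    "file:///c:/",                          # drive root
):
    print(url, accepts(RULES, url))
```

Under these assumptions only the first URL is accepted. The parent directory and the drive root fall through to the catch-all "-." rule, which is what stops the file protocol crawl from walking upwards.
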
  === Discussion ===
  
  [http://grub.org/ Grub] has some interesting ideas about building a search 
engine using distributed computing. ''And how is that relevant to nutch?''
