Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by ThorstenScherler:
http://wiki.apache.org/nutch/FAQ

------------------------------------------------------------------------------
  The crawl tool expects as its first parameter the folder name where the seeding urls file is located so for example if your urls.txt is located in /nutch/seeds the crawl command would look like: crawl seed -dir /user/nutchuser...
+ ==== Nutch crawling parent directories for file protocol -> misconfigured URLFilters ====
+ [http://issues.apache.org/jira/browse/NUTCH-407] E.g. for urlfilter-regex you should put the following in regex-urlfilter.txt :
+ {{{
+
+ +^file:///c:/top/directory/
+ -.
+ }}}
+
  === Discussion ===
  [http://grub.org/ Grub] has some interesting ideas about building a search engine using distributed computing. ''And how is that relevant to nutch?''
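To show why the two filter lines in the new FAQ entry keep the crawl inside the top directory, here is a minimal sketch of regex URL filtering. It assumes the semantics commonly used by the urlfilter-regex plugin: rules are tried top to bottom, a leading `+` accepts and `-` rejects, the first rule whose pattern is found anywhere in the URL decides, and a URL matching no rule is rejected. The class `RegexFilterSketch` and its methods are hypothetical and are not Nutch's own implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of first-match-wins regex URL filtering;
// not Nutch's RegexURLFilter class.
public class RegexFilterSketch {

    // One rule: accept == true for '+', false for '-'.
    record Rule(boolean accept, Pattern pattern) {}

    // Parse lines like "+^file:///c:/top/directory/" or "-." into rules.
    // Blank lines and '#' comments are skipped; lines not starting with
    // '+' or '-' are ignored (an assumption made for this sketch).
    static List<Rule> parse(String config) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : config.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                continue;
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    // The first rule whose pattern is found in the URL decides;
    // a URL that matches no rule is rejected.
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            // find() is unanchored, so the '^' in the '+' rule matters.
            if (rule.pattern().matcher(url).find()) {
                return rule.accept();
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse("+^file:///c:/top/directory/\n-.\n");
        String[] urls = {
            "file:///c:/top/directory/docs/a.html", // inside the top directory
            "file:///c:/top/",                      // parent directory
            "file:///c:/"                           // grandparent directory
        };
        for (String url : urls) {
            System.out.println(url + " -> " + (accepts(rules, url) ? "accept" : "reject"));
        }
    }
}
```

Under these assumptions only the first URL is accepted: the parent directories fail the `+^file:///c:/top/directory/` rule and are then caught by `-.`, which matches any URL with at least one character.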