Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by ThorstenScherler:
http://wiki.apache.org/nutch/Nutch_-_The_Java_Search_Engine

The comment on the change is:
Adding more information about working with trunk

------------------------------------------------------------------------------
  === 3.2.2 Edit the file conf/crawl-urlfilter.txt ===

+ If you are using TRUNK, there is no file called conf/crawl-urlfilter.txt, only conf/crawl-urlfilter.txt.template. Generate the file from the template:
+ {{{
+ sed 's/MY.DOMAIN.NAME/criaturitas.org/g' conf/crawl-urlfilter.txt.template > conf/crawl-urlfilter.txt
+ }}}

- and replace the existing domain name with the name of the domain you wish to crawl. For example, if you wished to limit the crawl to the virtusa.com domain, the line should read:
+ If you already have this file, replace the existing domain name with the name of the domain you wish to crawl. For example, to limit the crawl to the virtusa.com domain, the line should read:
  {{{
  +^http://([a-z0-9]*\.)*virtusa.com/
  }}}
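
The template step in the change above could be wrapped in a small shell function that refuses to overwrite an existing filter file. This is only a minimal sketch, not part of Nutch: the function name `make_urlfilter` is hypothetical, and it assumes the trunk template contains the `MY.DOMAIN.NAME` placeholder, as the sed command in the change implies.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): create conf/crawl-urlfilter.txt
# from the trunk template unless the file already exists.
# Usage: make_urlfilter CONF_DIR DOMAIN
make_urlfilter() {
    conf_dir=$1
    domain=$2
    template="$conf_dir/crawl-urlfilter.txt.template"
    target="$conf_dir/crawl-urlfilter.txt"

    # Never clobber a filter file that someone may have edited by hand.
    if [ -e "$target" ]; then
        echo "$target already exists; edit the domain in it by hand" >&2
        return 1
    fi
    if [ ! -f "$template" ]; then
        echo "$template not found" >&2
        return 1
    fi

    # Replace the placeholder with the domain to crawl. The dots are escaped
    # so that only the literal placeholder matches. Assumes DOMAIN contains
    # no "/" or "&", which holds for ordinary host names.
    sed "s/MY\.DOMAIN\.NAME/$domain/g" "$template" > "$target"
}
```

Run from the Nutch checkout, a call such as `make_urlfilter conf criaturitas.org` would then have the same effect as the one-line command in the change, while leaving an existing conf/crawl-urlfilter.txt untouched.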
