Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by tyrellperera:
http://wiki.apache.org/nutch/Nutch_-_The_Java_Search_Engine

------------------------------------------------------------------------------
  === 3.2.2 Edit the file conf/crawl-urlfilter.txt ===
- and replace the existing domain name with the name of the domain you wish to crawl. For example, if you wished to limit the crawl to the openreach.co.uk domain, the line should read:
+ and replace the existing domain name with the name of the domain you wish to crawl. For example, if you wished to limit the crawl to the virtusa.com domain, the line should read:

  {{{
  +^http://([a-z0-9]*\.)*virtusa.com/
  }}}

@@ -154, +154 @@

  == 3.3 Configuring the Nutch Web Application ==

- The search web application is already integrated and deployed along with the ORPG application. In order for the nutch search web application to function properly, it needs to know where to find the indexes. We need to map our indexes by editing the "nutch-site.xml" file.
+ The search web application is included in your downloaded Nutch archive. In order for the nutch search web application to function properly, it needs to know where to find the indexes. We need to map our indexes by editing the "nutch-site.xml" file.

  NOTE: the steps below assume that the