Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by RenaudRichardet:
http://wiki.apache.org/nutch/GettingNutchRunningWithWindows

------------------------------------------------------------------------------
- Since Nutch is written in Java, it should be possible to get Nutch working in a Windows environment, provided that the correct software is installed.
+ Since Nutch is written in Java, it is possible to get Nutch working in a Windows environment, provided that the correct software is installed.
- The following documents how I got it working on Windows XP Pro running Tomcat 5.28.
+ The following documents describe how I got it working on Windows XP Pro running Tomcat 5.28.
  == Java ==
- You will need to have Java 1.4.2 or Java 1.5 installed.
+ You will need to have Java 1.4.2 (or Java 1.5 for Nutch 0.8.x or higher) installed.
  == Cygwin ==
  You'll need cygwin to run the shell commands since there are no separate scripts for NT cmd (the NT cmd shell does not nest environments recursively). Mks ksh does not work correctly with the scripts.
+ Make sure you have installed the utility 'uname' in cygwin.
  == Tomcat ==

@@ -30, +31 @@

  == Intranet Crawling ==
- Follow the tutorial instructions to begin the crawl by entering commands in cygwin. Depending on the commands you enter Nutch should create a crawl directory and a log file.
+ Follow the tutorial instructions to begin the crawl by entering commands in cygwin. Nutch will create a crawl directory and a log file.
- For example, if you enter the following command:
+ For example, if you enter the following command from the root of your Nutch install:
  {{{
- bin/nutch crawl urls -dir crawled -depth 3 >& crawl.log
+ bin/nutch crawl urls -dir crawl -depth 3 >& crawl.log
  }}}
- then a folder called crawled is created in your nutch directory, along with the crawl.log file. Use this log file to debug any errors you might have. From my experience you'll need to delete the crawled directory before starting the crawl off again.
+ then a folder called crawl/ is created in your nutch directory, along with the crawl.log file. Use this log file to debug any errors you might have. You'll need to delete or move the crawl directory before starting the crawl off again unless you specify another path on the command above.
- == Serving ==
+ == Web Interface for Search ==
  In your Environment Variables settings, add NUTCH_JAVA_HOME and the location of your JVM (e.g. C:\j2sdk1.4.2_09) as a new Environment Variable
- Open up a web browser and navigate to the Tomcat webapps manager (e.g. http://localhost:8080/manager/html) and upload the WAR file to the context.
+ Open up a web browser and navigate to the Tomcat webapps manager (e.g. http://localhost:8080/manager/html) and upload the nutch WAR file to the context.
  If a root context already exists, undeploy it.

@@ -60, +61 @@

  <nutch-conf>
  <property>
  <name>searcher.dir</name>
- <value>your_crawled_folder_here</value>
+ <value>your_crawl_folder_here</value>
  </property>
  </nutch-conf>
  }}}
- For example, if your nutch directory resides at C:\nutch-0.7.1 and you specified crawled as the directory after the -dir command, then enter C:\nutch-0.7.1\crawled\ instead of your_crawled_folder_here.
+ For example, if your nutch directory resides at C:\nutch-0.7.1 and you specified crawled as the directory after the -dir command, then enter C:\nutch-0.7.1\crawl\ instead of your_crawl_folder_here.
  Restart Tomcat using the windows services tool, open up a browser and enter the url http://localhost:8080. The nutch search page should appear. As long as you've defined the correct location of your nutch index directory as shown above then clicking search should yield results.
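
The page says the crawl directory must be deleted or moved before the crawl is started again. A minimal sketch of the "move" option for the cygwin shell follows. The helper name backup_crawl_dir and the crawl.bak.N naming scheme are assumptions made for illustration; neither is part of Nutch.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): move an existing crawl directory
# aside so that a fresh crawl can write to the same path.
# The backup naming scheme (<dir>.bak.N) is an assumption for illustration.
backup_crawl_dir() {
    dir="$1"
    # Nothing to do if there is no previous crawl output.
    [ -d "$dir" ] || return 0
    n=1
    # Find the first backup name that is not already taken.
    while [ -e "$dir.bak.$n" ]; do
        n=$((n + 1))
    done
    mv "$dir" "$dir.bak.$n"
    # Print the name the old directory was moved to.
    echo "$dir.bak.$n"
}

# Usage, from the root of the Nutch install (crawl command as given on the page):
#   backup_crawl_dir crawl
#   bin/nutch crawl urls -dir crawl -depth 3 >& crawl.log
```

Moving rather than deleting keeps the previous index available, so searcher.dir can keep pointing at it until the new crawl has finished.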