Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by RenaudRichardet:
http://wiki.apache.org/nutch/GettingNutchRunningWithWindows

------------------------------------------------------------------------------
- Since Nutch is written in Java, it should be possible to get Nutch working in 
a Windows environment, provided that the correct software is installed.
+ Since Nutch is written in Java, it is possible to get Nutch working in a 
Windows environment, provided that the correct software is installed.
  
- The following documents how I got it working on Windows XP Pro running Tomcat 
5.28.  
+ The following describes how I got it working on Windows XP Pro 
running Tomcat 5.28.  
  
  == Java ==
  
- You will need to have Java 1.4.2 or Java 1.5 installed.
+ You will need to have Java 1.4.2 (or Java 1.5 for Nutch 0.8.x or higher) 
installed.
  
  == Cygwin ==
  
  You'll need cygwin to run the shell commands since there are no separate 
scripts for NT cmd (the NT cmd shell does not nest environments recursively).  
MKS ksh does not work correctly with the scripts.
+ Make sure you have installed the utility 'uname' in cygwin.
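The page only says that uname must be installed. A quick way to confirm that from a Cygwin shell before running bin/nutch is a small check like the one below; the function name require_tools is purely illustrative and is not part of Nutch or Cygwin.

```shell
# Illustrative helper (not part of Nutch or Cygwin): report any command
# that cannot be found on PATH. Returns non-zero if anything is missing.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Check for the utility mentioned above before running bin/nutch.
require_tools uname || echo "Install the missing tool with the Cygwin setup program." >&2
```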
  
  == Tomcat ==
  
@@ -30, +31 @@

  
  == Intranet Crawling ==
  
- Follow the tutorial instructions to begin the crawl by entering commands in 
cygwin. Depending on the commands you enter Nutch should create a crawl 
directory and a log file.
+ Follow the tutorial instructions to begin the crawl by entering commands in 
cygwin. Nutch will create a crawl directory and a log file.
  
- For example, if you enter the following command:
+ For example, if you enter the following command from the root of your Nutch 
install:
  {{{
- bin/nutch crawl urls -dir crawled -depth 3 >& crawl.log
+ bin/nutch crawl urls -dir crawl -depth 3 >& crawl.log
  }}}
- then a folder called crawled is created in your nutch directory, along with 
the crawl.log file.  Use this log file to debug any errors you might have.  
From my experience you'll need to delete the crawled directory before starting 
the crawl off again.
+ then a folder called crawl/ is created in your nutch directory, along with 
the crawl.log file.  Use this log file to debug any errors you might have. 
You'll need to delete or move the crawl directory before starting the crawl 
again, unless you pass a different directory to -dir.
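If you would rather keep old crawls than delete them, a sketch like the following moves an existing crawl directory aside with a timestamp suffix before you rerun the command. It assumes a Cygwin bash shell started from the root of your Nutch install; backup_crawl_dir and the timestamp naming are hypothetical choices, not Nutch features.

```shell
# Hypothetical helper (not a Nutch feature): rename an existing crawl
# directory so a new crawl can reuse the same -dir name. Prints the new
# name, or nothing if there was no directory to move.
# Usage (from the Nutch root):
#   backup_crawl_dir crawl && bin/nutch crawl urls -dir crawl -depth 3 >& crawl.log
backup_crawl_dir() {
  local dir="${1:-crawl}"
  local dest
  if [ -d "$dir" ]; then
    dest="$dir.$(date +%Y%m%d-%H%M%S)"
    # Refuse to overwrite a backup made within the same second.
    if [ -e "$dest" ]; then
      echo "backup target $dest already exists" >&2
      return 1
    fi
    mv "$dir" "$dest" || return 1
    echo "moved $dir to $dest"
  fi
}
```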
  
- == Serving ==
+ == Web Interface for Search ==
  
  In your Windows Environment Variables settings, add a new variable named 
NUTCH_JAVA_HOME whose value is the location of your JVM (e.g. C:\j2sdk1.4.2_09).
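Once NUTCH_JAVA_HOME is set (open a new Cygwin window so it picks up the change), you can check from Cygwin that it points at a usable JVM. This is only a sketch: check_nutch_java_home is a hypothetical name, the check assumes the JVM has a bin/java executable under that directory, and cygpath is used, when available, to turn the Windows path into a Cygwin one.

```shell
# Hypothetical helper: confirm NUTCH_JAVA_HOME is set and contains an
# executable bin/java. Prints the path it found, or an error on stderr.
check_nutch_java_home() {
  if [ -z "$NUTCH_JAVA_HOME" ]; then
    echo "NUTCH_JAVA_HOME is not set" >&2
    return 1
  fi
  local java_home="$NUTCH_JAVA_HOME"
  # Under Cygwin, convert e.g. C:\j2sdk1.4.2_09 to /cygdrive/c/j2sdk1.4.2_09.
  if command -v cygpath >/dev/null 2>&1; then
    java_home="$(cygpath -u "$java_home")"
  fi
  if [ -x "$java_home/bin/java" ]; then
    echo "ok: $java_home/bin/java"
  else
    echo "no executable bin/java under $java_home" >&2
    return 1
  fi
}
```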
  
- Open up a web browser and navigate to the Tomcat webapps manager (e.g. 
http://localhost:8080/manager/html) and upload the WAR file to the context.
+ Open up a web browser and navigate to the Tomcat webapps manager (e.g. 
http://localhost:8080/manager/html) and upload the nutch WAR file to the 
context.
  
  If a root context already exists, undeploy it.
  
@@ -60, +61 @@

  <nutch-conf>
  <property>
      <name>searcher.dir</name>
-     <value>your_crawled_folder_here</value>
+     <value>your_crawl_folder_here</value>
    </property>
  </nutch-conf>
  }}}
  
- For example, if your nutch directory resides at C:\nutch-0.7.1 and you 
specified crawled as the directory after the -dir command, then enter 
C:\nutch-0.7.1\crawled\ instead of your_crawled_folder_here.
+ For example, if your nutch directory resides at C:\nutch-0.7.1 and you 
specified crawl as the directory after the -dir option, then enter 
C:\nutch-0.7.1\crawl\ instead of your_crawl_folder_here.
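If you prefer to generate this file rather than edit it by hand, the sketch below writes a minimal configuration containing only the searcher.dir property shown above. write_searcher_conf is a hypothetical helper; it overwrites the target file (dropping any other properties in it), so back up the original first. The location of the file Tomcat reads is whatever the earlier setup step used, which is not shown here. Single-quote the Windows path so the shell keeps its backslashes.

```shell
# Hypothetical helper: write a minimal nutch-conf file that sets searcher.dir.
# Usage: write_searcher_conf 'C:\nutch-0.7.1\crawl\' path/to/nutch-site.xml
write_searcher_conf() {
  if [ "$#" -ne 2 ]; then
    echo "usage: write_searcher_conf CRAWL_DIR OUTPUT_FILE" >&2
    return 2
  fi
  local crawl_dir="$1" out="$2"
  # Unquoted EOF so $crawl_dir expands; backslashes in its value are kept.
  cat > "$out" <<EOF
<?xml version="1.0"?>
<nutch-conf>
  <property>
    <name>searcher.dir</name>
    <value>$crawl_dir</value>
  </property>
</nutch-conf>
EOF
}
```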
  
  Restart Tomcat using the Windows Services tool, open a browser and enter 
the URL http://localhost:8080.  The nutch search page should appear.  As long 
as searcher.dir points to the correct location of your nutch index directory as 
shown above, clicking Search should return results.
  
