Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by JamesVictor:
http://wiki.apache.org/nutch/GettingNutchRunningWithWindows

The comment on the change is:
added example for plugin.includes

------------------------------------------------------------------------------
  
  You'll need to delete or move the crawl directory before starting the crawl 
again, unless you specify a different path in the command above.
  
+ === Analyzing Additional Resource Types ===
+ 
+ From the ["Features"]:
+ 
+ Edit `conf/nutch-site.xml` and change the value of `plugin.includes` to 
include the plugins for the document types that you want Nutch to handle.
+ 
+ For example, to add parsing for PDF, MS Office, and OpenOffice documents, and 
to use `index-more` instead of `index-basic`, you'll have something like:
+ 
+ {{{
+ <property>
+   <name>plugin.includes</name>
+   <value>protocol-http|urlfilter-regex|parse-(text|html|js|msexcel|mspowerpoint|msword|oo|pdf|swf|zip)|
+ index-more|query-(basic|site|url)|summary-basic|scoring-opic|
+ urlnormalizer-(pass|regex|basic)</value>
+ </property>
+ }}}
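
The `plugin.includes` value is a regular expression over plugin names, so it can help to check offline which plugins a given value would enable. The following is only a minimal sketch in Python, not part of Nutch: the helper names `compile_includes` and `is_included` are hypothetical, and it assumes that a plugin is enabled only when its whole id matches the expression and that the line breaks shown in the example are for display only.

```python
import re

# Value of plugin.includes copied from the example above. The line breaks
# are assumed to be for display only, so all whitespace is removed before
# the value is compiled.
PLUGIN_INCLUDES = """
protocol-http|urlfilter-regex|parse-(text|html|js|msexcel|mspowerpoint|msword|oo|pdf|swf|zip)|
index-more|query-(basic|site|url)|summary-basic|scoring-opic|
urlnormalizer-(pass|regex|basic)
"""


def compile_includes(value):
    """Strip all whitespace and compile the value as one regular expression."""
    return re.compile("".join(value.split()))


def is_included(plugin_id, pattern):
    """Assumption: a plugin is enabled only if its whole id matches."""
    return pattern.fullmatch(plugin_id) is not None


if __name__ == "__main__":
    pattern = compile_includes(PLUGIN_INCLUDES)
    # Plugin ids to check; the last two are deliberately absent from the value.
    for plugin_id in ("parse-pdf", "parse-msword", "index-more",
                      "index-basic", "parse-rss"):
        status = "included" if is_included(plugin_id, pattern) else "excluded"
        print(plugin_id, status)
```

If a plugin you expect to be active is reported as excluded, a stray line break or space inside `<value>` is one thing worth ruling out; keeping the whole value on a single line avoids the question.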
+ 
  == Web Interface for Search ==
  
  In your Environment Variables settings, add a new variable named 
`NUTCH_JAVA_HOME` and set it to the location of your JVM (e.g. `C:\j2sdk1.4.2_09`).

_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs
