Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by JamesVictor:
http://wiki.apache.org/nutch/GettingNutchRunningWithWindows

The comment on the change is:
removed index-more from example; it threw an exception on my indexing

------------------------------------------------------------------------------
  Edit `conf/nutch-site.xml` and change the value of `plugin.includes` to include the plugins for the document types that you want Nutch to handle.

- For example, to add parsing for PDF, MS Office, and OpenOffice documents, and use the `index-more` instead of `index-basic`, you'll have something like:
+ Example: to add parsing for PDF, MS Office, and OpenOffice documents, you'll have something like:
  {{{
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(text|html|js|msexcel|mspowerpoint|msword|oo|pdf|swf|zip)|
-   index-more|query-(basic|site|url)|summary-basic|scoring-opic|
+   index-basic|query-(basic|site|url)|summary-basic|scoring-opic|
    urlnormalizer-(pass|regex|basic)</value>
  </property>
  }}}

_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs