Is cygwin your NUTCH_HOME? That sounds strange. You had better have a ".../nutch" home directory and run the scripts from there, since many defaults assume that layout; for instance, .../nutch/conf is where nutch looks for all its default settings.
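A minimal sketch of that layout check, assuming Nutch 1.0 is unpacked into a directory of its own. The path /cygdrive/c/nutch, the function name check_nutch_home and the urls directory under NUTCH_HOME are made-up examples, not something taken from your setup; the crawl arguments are the ones from your batch file.

```shell
#!/bin/bash
# Sketch only: run under Cygwin bash.
# ASSUMPTION: Nutch lives in its own directory; /cygdrive/c/nutch is a hypothetical path.
NUTCH_HOME="${NUTCH_HOME:-/cygdrive/c/nutch}"

# Return 0 if bin/nutch, conf and plugins all sit under the given directory,
# otherwise report what is missing on stderr and return 1.
check_nutch_home() {
  local home="$1" missing=0 entry
  for entry in bin/nutch conf plugins; do
    if [ ! -e "$home/$entry" ]; then
      echo "missing: $home/$entry" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_nutch_home "$NUTCH_HOME"; then
  # Run from NUTCH_HOME so that relative paths such as conf and plugins resolve.
  # ASSUMPTION: the seed list is in $NUTCH_HOME/urls.
  cd "$NUTCH_HOME" && bin/nutch crawl urls -dir crawl-test9 -depth 5
else
  echo "NUTCH_HOME=$NUTCH_HOME does not look like a Nutch directory, not starting the crawl" >&2
fi
```

From the batch file this could be started with something like C:\cygwin\bin\bash --login -c "/cygdrive/c/nutch/run-crawl.sh" (the script name is hypothetical). JAVA_HOME still has to be set, as mentioned earlier in the thread.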
-Ray-

2009/5/25 Richardt Hase <[email protected]>

> Thanks for the tip!
>
> Now I'm a bit closer to a solution :)
> The current batch file looks like this:
>
> ---
> @echo off
>
> cd C:\cygwin\
>
> set DEFAULT_NUTCH_HOME=C:\cygwin
> set NUTCH_HOME=%DEFAULT_NUTCH_HOME%
> set NUTCH_CLASSPATH=%NUTCH_HOME%;%NUTCH_HOME%\nutch-1.0.jar;%NUTCH_HOME%\conf
>
> rem bash --login -i
> C:\cygwin\bin\bash --login -i -c "cd C:/cygwin/ && ls && C:/cygwin/bin/nutch crawl C:/cygwin/urls -dir crawl-test9 -depth 5"
> cd C:\cygwin
> ---
>
> Plugins are now found and registered, but it stops at
>
> ~~
> INFO conf.Configuration: regex-urlfilter.txt not found
> FATAL api.RegexURLFilterBase: Can't find resource: regex-urlfilter.txt
> WARN regex.RegexURLNormalizer: Can't load the default config file! regex-normalize.xml
> ~~
>
> These two files are located in C:\cygwin\conf, which confuses me, because
> it finds the plugins in C:\cygwin\plugins.
>
> Any advice? :)
>
> Greetings
> Richardt
>
> -------- Original Message --------
> > Date: Mon, 18 May 2009 23:00:04 +0200
> > From: "Raymond Balmès" <[email protected]>
> > To: [email protected]
> > Subject: Re: nutch-Batch for Task Scheduler / Windows
>
> > I think you need to set JAVA_HOME; besides, you need to give paths that
> > are Unix compatible (I think).
> >
> > 2009/5/18 Richardt Hase <[email protected]>
> >
> > > Hello,
> > >
> > > I'm using nutch 1.0 on Windows Server 2003 (currently Windows XP for
> > > testing) for our intranet. I'm having a hard time creating a batch file
> > > for an automated crawl (via Task Scheduler).
> > >
> > > Nutch is located in C:\cygwin\bin and this is the content of the batch file:
> > >
> > > --
> > > @echo off
> > >
> > > C:
> > > chdir C:\cygwin\bin
> > >
> > > bash --login -i -c "C:/cygwin/bin/nutch crawl C:/cygwin/urls -dir crawl-test9 -depth 5"
> > > --
> > >
> > > The problem is the following error message I'm getting on the command
> > > line (with echo off commented out):
> > >
> > > ~~
> > > WARN plugin.PluginRepository: Plugins: directory not found: plugins
> > > [...]
> > > WARN mapred.LocalJobRunner: job_local_0001
> > > java.lang.RuntimeException: x point org.apache.nutch.net.URLNormalizer not found.
> > > [...]
> > > Exception in thread "main" java.io.IOException: Job failed!
> > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
> > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
> > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:113)
> > > ~~
> > >
> > > Am I missing some parameter for setting the path where nutch should look
> > > for /plugins? It's located at C:\cygwin\plugins.
> > >
> > > Any help is highly appreciated :)
