Thanks for the tip!
Now I'm a bit closer to a solution :)
The current batch file looks like this:
---
@echo off
cd C:\cygwin\
set DEFAULT_NUTCH_HOME=C:\cygwin
set NUTCH_HOME=%DEFAULT_NUTCH_HOME%
set NUTCH_CLASSPATH=%NUTCH_HOME%;%NUTCH_HOME%\nutch-1.0.jar;%NUTCH_HOME%\conf
rem bash --login -i
C:\cygwin\bin\bash --login -i -c "cd C:/cygwin/ && ls && C:/cygwin/bin/nutch crawl C:/cygwin/urls -dir crawl-test9 -depth 5"
cd C:\cygwin
---
The plugins are now found and registered, but the crawl stops at:
~~
INFO conf.Configuration: regex-urlfilter.txt not found
FATAL api.RegexURLFilterBase: Can't find resource: regex-urlfilter.txt
WARN regex.RegexURLNormalizer: Can't load the default config file! regex-normalize.xml
~~
Both files are located in C:\cygwin\conf, which confuses me, because Nutch
does find the plugins in C:\cygwin\plugins.
Any advice? :)
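One way to narrow this down is to check, from inside the same bash session, whether the two files are visible at the path bash sees. A minimal sketch, assuming a POSIX shell; the function name check_conf and the path in the example call are illustrative assumptions, not part of Nutch:

```shell
# Hypothetical helper (not part of Nutch): report whether the two config
# files named in the log output exist in the given conf directory.
check_conf() {
    conf_dir="$1"
    missing=0
    for f in regex-urlfilter.txt regex-normalize.xml; do
        if [ -f "$conf_dir/$f" ]; then
            echo "found:   $conf_dir/$f"
        else
            echo "missing: $conf_dir/$f"
            missing=1
        fi
    done
    # Exit status 0 if both files exist, 1 if at least one is missing.
    return "$missing"
}
```

It could be called as `check_conf /cygdrive/c/cygwin/conf` (assuming the default /cygdrive prefix). If both files show up as found, the files themselves are in place and the cause is more likely how the conf directory reaches the Java classpath.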
Greetings
Richardt
-------- Original Message --------
> Date: Mon, 18 May 2009 23:00:04 +0200
> From: "Raymond Balmès" <[email protected]>
> To: [email protected]
> Subject: Re: nutch-Batch for Task Scheduler / Windows
> I think you need to set JAVA_HOME; besides, you need to give paths that are
> Unix-compatible (I think)
>
> 2009/5/18 Richardt Hase <[email protected]>
>
> > Hello,
> >
> > I'm using Nutch 1.0 on Windows Server 2003 (currently Windows XP for
> > testing) for our intranet. I'm having a hard time creating a batch file
> > for an automated crawl (via Task Scheduler).
> >
> > Nutch is located in C:\cygwin\bin and this is the content of the batch file:
> >
> > --
> > @echo off
> >
> > C:
> > chdir C:\cygwin\bin
> >
> >
> > bash --login -i -c "C:/cygwin/bin/nutch crawl C:/cygwin/urls -dir crawl-test9 -depth 5"
> > --
> >
> > The problem is the following error message, which I get on the command
> > line (with echo off commented out):
> >
> > ~~
> > WARN plugin.PluginRepository: Plugins: directory not found: plugins
> > [...]
> > WARN mapred.LocalJobRunner: job_local_0001
> > java.lang.RuntimeException: x point org.apache.nutch.net.URLNormalizer not found.
> > [...]
> > Exception in thread "main" java.io.IOException: Job failed!
> > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
> > at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
> > at org.apache.nutch.crawl.Crawl.main(Crawl.java:113)
> > ~~
> >
> > Am I missing some parameter that sets the path where Nutch should look
> > for /plugins? It's located at C:\cygwin\plugins.
> >
> > Any help is highly appreciated :)