Is the cygwin directory your NUTCH_HOME? That sounds strange.
You had better use a dedicated ".../nutch" home directory and run the scripts
from there, since many defaults assume that layout. For instance,
.../nutch/conf is where Nutch looks for all of its default settings.
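
As a rough sketch of what I mean, a small launcher script could change into
the Nutch directory before starting the crawl and check that the two config
files from your log are really under conf/. The /cygdrive/c/nutch location
and the helper names check_conf and run_crawl are only assumptions for
illustration, so adjust them to your setup.

```shell
#!/bin/bash
# Sketch only: this install location is a hypothetical example,
# not a path taken from this thread.
NUTCH_HOME="${NUTCH_HOME:-/cygdrive/c/nutch}"

# Report any of the config files named in the error log that are
# missing under <home>/conf. Returns non-zero if one is missing.
check_conf() {
    local home="$1" missing=0 f
    for f in regex-urlfilter.txt regex-normalize.xml; do
        if [ ! -f "$home/conf/$f" ]; then
            echo "missing: $home/conf/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Change into the Nutch directory first, so the script runs from
# there, then hand the remaining arguments to "bin/nutch crawl".
run_crawl() {
    local home="$1"; shift
    cd "$home" || return 1
    check_conf "$home" || return 1
    ./bin/nutch crawl "$@"
}

# Example call, reusing the crawl arguments from the batch file:
# run_crawl "$NUTCH_HOME" urls -dir crawl-test9 -depth 5
```

From the batch file you would then start this script through bash instead of
calling bin/nutch directly.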

-Ray-

2009/5/25 Richardt Hase <[email protected]>

>
> Thanks for the tip!
>
> Now I'm a bit closer to a solution :)
> The current batch file looks like this:
>
> ---
> @echo off
>
>
> cd C:\cygwin\
>
> set DEFAULT_NUTCH_HOME=C:\cygwin
> set NUTCH_HOME=%DEFAULT_NUTCH_HOME%
> set NUTCH_CLASSPATH=%NUTCH_HOME%;%NUTCH_HOME%\nutch-1.0.jar;%NUTCH_HOME%\conf
>
> rem bash --login -i
> C:\cygwin\bin\bash --login -i -c "cd C:/cygwin/ && ls && C:/cygwin/bin/nutch crawl C:/cygwin/urls -dir crawl-test9 -depth 5"
> cd C:\cygwin
> ---
>
> Plugins are now found and getting registered, but it stops at
>
> ~~
> INFO conf.Configuration: regex-urlfilter.txt not found
> FATAL api.RegexURLFilterBase: Can't find resource: regex-urlfilter.txt
> WARN regex.RegexURLNormalizer: Can't load the default config file! regex-normalize.xml
> ~~
>
> These two files are located in C:\cygwin\conf, which confuses me, because
> it does find the plugins in C:\cygwin\plugins.
>
> Any advice? :)
>
> Greetings
> Richardt
>
>
>
>
> -------- Original Message --------
> > Date: Mon, 18 May 2009 23:00:04 +0200
> > From: "Raymond Balmès" <[email protected]>
> > To: [email protected]
> > Subject: Re: nutch-Batch for Task Scheduler / Windows
>
> > I think you need to set JAVA_HOME. Besides, you need to give paths that are
> > Unix-compatible (I think).
> >
> > 2009/5/18 Richardt Hase <[email protected]>
> >
> > > Hello,
> > >
> > > I'm using Nutch 1.0 on Windows Server 2003 (currently Windows XP for
> > > testing) for our intranet. I'm having a hard time creating a batch file
> > > for an automated crawl (via Task Scheduler).
> > >
> > > Nutch is located in C:\cygwin\bin and this is the content of the batch:
> > >
> > > --
> > > @echo off
> > >
> > > C:
> > > chdir C:\cygwin\bin
> > >
> > >
> > > bash --login -i -c "C:/cygwin/bin/nutch crawl C:/cygwin/urls -dir crawl-test9 -depth 5"
> > > --
> > >
> > > The problem is the following error message, which I get on the command
> > > line (with echo off commented out):
> > >
> > > ~~
> > > WARN plugin.PluginRepository: Plugins: directory not found: plugins
> > > [...]
> > > WARN mapred.LocalJobRunner: job_local_0001
> > > java.lang.RuntimeException: x point org.apache.nutch.net.URLNormalizer not found.
> > > [...]
> > > Exception in thread "main" java.io.IOException: Job failed!
> > >        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
> > >        at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
> > >        at org.apache.nutch.crawl.Crawl.main(Crawl.java:113)
> > > ~~
> > >
> > > Am I missing a parameter for setting the path where Nutch should look
> > > for /plugins? It's located at C:\cygwin\plugins.
> > >
> > > Any help is highly appreciated :)
