Hmm. But it seems to be nearly the same problem as mine,
and I'm running Unix.
Chris Fellows wrote:
Thanks for the response.
I did get it going by specifying the segment (i.e.
crawl/segments/20060425173804).
Per your last email, that's probably a bug, as it looks
like it is supposed to run invertlinks on all the
segments (LinkDb.java:147). I'll wait for the 0.2
release; for now this is okay for me.
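For reference, the workaround invocation, as a rough
sketch (assuming the linkdb lives at crawl/linkdb, as in
the index command below; the timestamped segment path is
just the one from my run):

bin/nutch invertlinks crawl/linkdb crawl/segments/20060425173804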
As quick feedback on the tutorial, a few short lines
on these commands might really help out. The commands
that took me a few minutes to figure out were (see the
sketch after these notes):
bin/nutch inject db urls (where db is the database
directory and urls is the directory containing the seed
URLs, not the actual url.txt file)
and the indexing line - the wiki shows:
bin/nutch index indexes crawl/linkdb crawl/segments/*
but it should be:
bin/nutch index crawl/index crawl/crawldb crawl/linkdb
crawl/segments/*
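To spell out the inject step, a rough sketch (the file
name and the URL below are just illustrative):

mkdir urls
echo 'http://example.com/' > urls/url.txt
bin/nutch inject db urls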
Again, as you said, maybe this is just the Windows
pathnames bug, in which case I'll try again on Hadoop
0.2.
Otherwise, everything else is fairly self-explanatory.
I'm definitely enjoying the product. When I tried
0.7.2, I was up and running in under an hour!
--- Doug Cutting <[EMAIL PROTECTED]> wrote:
Chris Fellows wrote:
I'm having what appears to be the same issue on 0.8
trunk. I can get through inject, generate, fetch and
updatedb, but am getting "IOException: No input
directories" on invertlinks and cannot figure out why.
I'm only using Nutch on a single local Windows
machine. Any ideas? The configuration has not changed
since checking out from svn.
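For reference, roughly the sequence I'm running (a
sketch; the paths are illustrative, following the
tutorial, and the last command is the one that fails):

bin/nutch inject crawl/crawldb urls
bin/nutch generate crawl/crawldb crawl/segments
bin/nutch fetch crawl/segments/20060425173804
bin/nutch updatedb crawl/crawldb crawl/segments/20060425173804
bin/nutch invertlinks crawl/linkdb crawl/segments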
The handling of Windows pathnames is still buggy in
Hadoop 0.1.1. You might try replacing your
lib/hadoop-0.1.1.jar file with the latest Hadoop
nightly jar, from:
http://cvs.apache.org/dist/lucene/hadoop/nightly/
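Roughly, something like this (a sketch only; the nightly
jar's exact file name depends on the build, so the name
below is a placeholder):

cd nutch/lib
mv hadoop-0.1.1.jar hadoop-0.1.1.jar.bak
cp /path/to/downloaded/hadoop-nightly.jar .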
The file name code has been extensively re-written.
The next Hadoop release (0.2), containing these fixes,
will be made in around a week.
Doug