Thanks for the response.

I did get it going by specifying the segment (i.e.
crawl/segments/20060425173804).

Per your last email, that's probably a bug, as it looks
like it is supposed to run invertlinks on all the segments
(LinkDb.java: 147). I'll wait for the 0.2 release; for
now this is okay for me.
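
In case it helps anyone else, here is a rough sketch of the workaround. It is a dry run that only prints the command. The link database path (crawl/linkdb) is an assumption borrowed from the index command further down, and the argument order (linkdb first, then the segment) is what I believe invertlinks expects rather than something I have checked against the usage message. The helper name invertlinks_cmd is made up for illustration.

```shell
# Dry-run sketch: prints the command instead of running it.
# Assumption: the link database lives at crawl/linkdb.
LINKDB=crawl/linkdb
# Segment directory named in this message.
SEGMENT=crawl/segments/20060425173804

# Hypothetical helper: builds the invertlinks command for one explicit
# segment, rather than relying on all segments being picked up.
invertlinks_cmd() {
  echo "bin/nutch invertlinks $1 $2"
}

invertlinks_cmd "$LINKDB" "$SEGMENT"
```

Dropping the echo (or piping the output to sh) would run it for real.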

As quick feedback on the tutorials, a few short lines
on these commands might really help out. The commands
that took me a few minutes to figure out were:

bin/nutch inject db urls (where db is the database
directory and urls is the url directory, not the
actual url.txt file)

and the line in indexing -

wiki shows: 

bin/nutch index indexes crawl/linkdb crawl/segments/* 

should be:

bin/nutch index crawl/index crawl/crawldb crawl/linkdb
crawl/segments/*
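
Put together, the two commands above look roughly like this. Again this is a dry-run sketch that only prints the commands. The variable names and the helper functions inject_cmd and index_cmd are made up for illustration, and the directory names are just the ones used in this message.

```shell
# Dry-run sketch: prints the commands rather than running them.
DB=db          # database directory, as named in this message
URL_DIR=urls   # directory holding the seed list, not the url.txt file itself
CRAWL=crawl

# Hypothetical helpers that only build the command strings.
inject_cmd() {
  echo "bin/nutch inject $1 $2"
}

index_cmd() {
  # Arguments: index output dir, crawldb, linkdb, segments glob.
  # The glob stays quoted here so it prints literally in the dry run.
  echo "bin/nutch index $1 $2 $3 $4"
}

inject_cmd "$DB" "$URL_DIR"
index_cmd "$CRAWL/index" "$CRAWL/crawldb" "$CRAWL/linkdb" "$CRAWL/segments/*"
```

When running for real, the segments glob should be left unquoted so the shell expands it to the individual segment directories.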

Again, as you said, maybe this is just the Windows
path names bug, in which case I'll try again on Hadoop
0.2.

Otherwise, everything else is fairly self-explanatory.
I'm definitely enjoying the product. When I tried
0.7.2, I was up and running in under an hour!

--- Doug Cutting <[EMAIL PROTECTED]> wrote:

> Chris Fellows wrote:
> > I'm having what appears to be the same issue on 0.8
> > trunk. I can get through inject, generate, fetch and
> > updatedb, but am getting the IOException: No input
> > directories on invertlinks and cannot figure out why.
> > I'm only using nutch on a single local Windows
> > machine. Any ideas? Configuration has not changed
> > since checking out from svn.
> 
> The handling of Windows pathnames is still buggy in
> Hadoop 0.1.1.  You might try replacing your
> lib/hadoop-0.1.1.jar file with the latest Hadoop
> nightly jar, from:
> 
> http://cvs.apache.org/dist/lucene/hadoop/nightly/
> 
> The file name code has been extensively re-written.
> The next Hadoop release (0.2), containing these fixes,
> will be made in around a week.
> 
> Doug
> 



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general