this may be related to Windows path names and file names.

in Windows, path names and file names can contain whitespace, but Nutch
does not handle that case correctly. at least Nutch 0.9 has this
problem. you can enable debug logging for the protocol-file plugin in
the log4j.properties file as follows to see what happens:

log4j.logger.org.apache.nutch.protocol.file=DEBUG,cmdstdout

if that's the case, here is the workaround:

in FileResponse, find these lines:

// url.toURI() is only in j2se 1.5.0
//java.io.File f = new java.io.File(url.toURI());
java.io.File f = new java.io.File(path);

change to these:

java.io.File f = new java.io.File(url.toURI());
//java.io.File f = new java.io.File(path);

then rebuild by running ant compile
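
to see why the change matters, here is a small standalone sketch (not Nutch code). it assumes the path variable in FileResponse comes from url.getPath(), which i have not verified against the 0.9 source; the class name FilePathDemo and the sample URL are made up for illustration. url.getPath() keeps percent-escapes such as %20, so the resulting File points at a name that does not exist on disk, while new File(url.toURI()) decodes them. note that url.toURI() throws URISyntaxException if the URL contains a raw, unencoded space.

```java
import java.io.File;
import java.net.URI;
import java.net.URL;

public class FilePathDemo {

    // Builds a File from the raw URL path; percent-escapes are NOT decoded.
    static String rawPath(String spec) throws Exception {
        URL url = URI.create(spec).toURL();
        return new File(url.getPath()).getPath();
    }

    // Builds a File from the URI form of the URL; percent-escapes are decoded.
    static String decodedPath(String spec) throws Exception {
        URL url = URI.create(spec).toURL();
        return new File(url.toURI()).getPath();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file URL with an encoded space in a directory name.
        String spec = "file:/tmp/my%20dir/a.txt";
        System.out.println("from getPath(): " + rawPath(spec));
        System.out.println("from toURI():   " + decodedPath(spec));
    }
}
```

the first line printed still contains %20, the second contains a real space, which is the name the file system actually knows.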

good luck



2009/4/3, Hannu Väisänen <[email protected]>:
> I am using Nutch to index my hard disk.
>
> Nutch is skipping some files. They do not show in Nutch logs (like
> fetching file:...) and it is as if Nutch do not notice that they
> exist.
>
> But when I moved one file that Nutch did not notice to a test
> directory that had only a few files and indexed only that directory,
> Nutch did index the file.
>
> Any ideas on how I can debug the problem?
>
