You can ignore mapred.input.subdir; I find it is an unneeded option.
Now that the mapred branch has been merged into trunk, the documentation needs clarifying: a change was made so that the input is specified as a directory, and all files in that directory are treated as input files (no wildcard needed). I will put that on my to-do list.

mapred.input.dir is an abstract path that refers to either the OS filesystem or NDFS, depending on which is in use. If fs.default.name is "local", the local OS filesystem is being used; otherwise fs.default.name is something like domainOfMyMasterNode:port.

To use NDFS, you need to copy your input file(s) from your local filesystem to NDFS:

  bin/nutch ndfs -put /home/peb/urls_localfs/oneFILENAME /urls

The destination path "/urls" is arbitrary and is created as a side effect of the -put. Repeat this for each file you have.

Paul

Lukas Vlcek wrote:
> java.io.IOException: No input directories specified in: NutchConf:
> nutch-default.xml , mapred-default.xml ,
> /home/lukas/nutch/mapred/local/localRunner/job_4zwds6.xml ,
> nutch-site.xml
> at org.apache.nutch.mapred.InputFormatBase.listFiles(InputFormatBase.java:85)
> at org.apache.nutch.mapred.InputFormatBase.getSplits(InputFormatBase.java:95)
> at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:63)
> 051220 204249 Running job: job_4zwds6
> Exception in thread "main" java.io.IOException: Job failed!
> at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
> at org.apache.nutch.crawl.Injector.inject(Injector.java:102)
> at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
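
P.S. If you have many input files, the "repeat the -put for each file" step could be scripted with a small shell loop. This is only a sketch: the put_all function name and the NUTCH override variable are made up for illustration, and it assumes that repeated -put calls may share the same destination path, as described in the reply.

```shell
# Hypothetical helper (not part of Nutch): run the -put command from the
# reply once for every regular file in a local directory.
# Usage: put_all LOCAL_DIR NDFS_DEST
put_all() {
  src_dir=$1
  dest=$2
  for f in "$src_dir"/*; do
    # Skip subdirectories and the unexpanded pattern of an empty directory.
    [ -f "$f" ] || continue
    # NUTCH is an assumed override so the loop can be dry-run (NUTCH=echo);
    # it defaults to the bin/nutch launcher used in the reply.
    ${NUTCH:-bin/nutch} ndfs -put "$f" "$dest" || return 1
  done
}

# Example, run from the Nutch install directory:
#   put_all /home/peb/urls_localfs /urls
```

Setting NUTCH=echo before calling put_all prints the commands instead of running them, which is a cheap way to check the file list first.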