Hi, I am trying to use nutch-0.8-dev and I have a problem running a crawl. I checked out the source from SVN and built a fresh package (ant package, which completed without errors). Then I installed Nutch on Linux and made only minor changes to nutch-site.xml (enabled some plugins and increased several constants), prepared a file with URLs and started bin/nutch crawl.
This worked for nutch-0.7x, but with nutch-0.8-dev I get the following exception in the log file:

051220 204248 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-default.xml
051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/crawl-tool.xml
051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/mapred-default.xml
051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-site.xml
051220 204249 crawl started in: ./crawl.test
051220 204249 rootUrlDir = urls
051220 204249 threads = 10
051220 204249 depth = 6
051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-default.xml
051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/crawl-tool.xml
051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-site.xml
051220 204249 Injector: starting
051220 204249 Injector: crawlDb: ./crawl.test/crawldb
051220 204249 Injector: urlDir: urls
051220 204249 Injector: Converting injected urls to crawl db entries.
051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-default.xml
051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/crawl-tool.xml
051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/mapred-default.xml
051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/mapred-default.xml
051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-site.xml
051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-default.xml
051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/mapred-default.xml
051220 204249 parsing /home/lukas/nutch/mapred/local/localRunner/job_4zwds6.xml
051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-site.xml
java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , /home/lukas/nutch/mapred/local/localRunner/job_4zwds6.xml , nutch-site.xml
        at org.apache.nutch.mapred.InputFormatBase.listFiles(InputFormatBase.java:85)
        at org.apache.nutch.mapred.InputFormatBase.getSplits(InputFormatBase.java:95)
        at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:63)
051220 204249 Running job: job_4zwds6
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:102)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)

It seems that Nutch cannot find the mapred.input.subdir setting in any of the config files. I found that the config for this particular job (job_4zwds6.xml) defines a mapred.input.dir property whose value equals the name of my urls file, but I don't understand where I should define the mapred.input.subdir property or what value to assign to it (if it needs to be defined manually at all; note that mapred.input.dir seems to be configured automatically).

Does anybody know the answer?

P.S. The line numbers shown for InputFormatBase.java in the stack trace above (85, 95) may be slightly off. While searching for the root cause I inserted some extra LOG.debug() calls there and later removed them, but I may have left a few empty lines behind.

Thanks,
Lukas
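
For anyone who wants to repeat the check of the job file described above, here is a minimal sketch of how the mapred.input.* properties in a file such as job_4zwds6.xml could be listed. It rests on assumptions: the class JobConfInspector and its method are hypothetical helpers, not part of Nutch, and the code assumes the job file uses the same property/name/value layout as nutch-default.xml. It uses only the JDK's XML parser, not Nutch's own config classes.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of Nutch): lists properties in a conf file
// whose names start with a given prefix, e.g. "mapred.input.".
class JobConfInspector {

    // Returns name -> value for every <property> element whose <name> starts with the prefix.
    // Assumes the <property><name>..</name><value>..</value></property> layout.
    static Map<String, String> propertiesWithPrefix(InputStream xml, String prefix) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse config XML", e);
        }
        NodeList props = doc.getElementsByTagName("property");
        Map<String, String> found = new TreeMap<>();
        for (int i = 0; i < props.getLength(); i++) {
            Element property = (Element) props.item(i);
            String name = childText(property, "name");
            if (name != null && name.startsWith(prefix)) {
                found.put(name, childText(property, "value"));
            }
        }
        return found;
    }

    // Text of the first descendant element with the given tag, or null if there is none.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Usage: java JobConfInspector /path/to/job-file.xml
    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            propertiesWithPrefix(in, "mapred.input.")
                .forEach((name, value) -> System.out.println(name + " = " + value));
        }
    }
}
```

Run against the job file, this would show whether mapred.input.dir is present and whether mapred.input.subdir is defined at all. It only reports what is in the file and says nothing about how Nutch interprets those values.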