Stefan,

Nutch created folders in /tmp, so I think it should be able to create files there as well. I also tried changing all /tmp* paths in the conf file to my home folder, with the same result (i.e., folders were created and several files were dumped there, but it yielded the same exception).
Are you able to run Nutch from an up-to-date trunk build? Maybe I didn't explain it clearly: I am using nutch-0.8-dev, which I got from nutch-trunk.

Regards,
Lukas

On 12/21/05, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
> Lukas,
> the input folders are normally set by the tools, so you cannot
> change that.
> However, in case you use a Unix box, check that the user that runs
> Nutch has read and write access to all the folders defined in
> nutch-site/default.xml.
> (I guess that can be the problem; Nutch uses e.g. /tmp to write
> some data.)
> If this does not solve the problem, just run the commands manually step by
> step; there is a tutorial in the wiki on how to run the mapred commands
> step by step.
>
> Stefan
>
> On 21.12.2005 at 06:56, Lukas Vlcek wrote:
>
> > Hi,
> >
> > I am trying to use nutch-0.8-dev and I have a problem with a crawl run.
> > I checked out from SVN and prepared a fresh package (ant package; all
> > went fine). Then I installed Nutch on Linux and made only minor
> > changes to the nutch-site.xml file (turned on some plugins and increased
> > several constants), prepared a file with URLs and started bin/nutch
> > crawl.
> >
> > This worked for nutch-0.7x, but for nutch-0.8-dev I am receiving the
> > following exception in the log file:
> >
> > 051220 204248 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-default.xml
> > 051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/crawl-tool.xml
> > 051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/mapred-default.xml
> > 051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-site.xml
> > 051220 204249 crawl started in: ./crawl.test
> > 051220 204249 rootUrlDir = urls
> > 051220 204249 threads = 10
> > 051220 204249 depth = 6
> > 051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-default.xml
> > 051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/crawl-tool.xml
> > 051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-site.xml
> > 051220 204249 Injector: starting
> > 051220 204249 Injector: crawlDb: ./crawl.test/crawldb
> > 051220 204249 Injector: urlDir: urls
> > 051220 204249 Injector: Converting injected urls to crawl db entries.
> > 051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-default.xml
> > 051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/crawl-tool.xml
> > 051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/mapred-default.xml
> > 051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/mapred-default.xml
> > 051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-site.xml
> > 051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-default.xml
> > 051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/mapred-default.xml
> > 051220 204249 parsing /home/lukas/nutch/mapred/local/localRunner/job_4zwds6.xml
> > 051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-site.xml
> > java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , /home/lukas/nutch/mapred/local/localRunner/job_4zwds6.xml , nutch-site.xml
> >     at org.apache.nutch.mapred.InputFormatBase.listFiles(InputFormatBase.java:85)
> >     at org.apache.nutch.mapred.InputFormatBase.getSplits(InputFormatBase.java:95)
> >     at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:63)
> > 051220 204249 Running job: job_4zwds6
> > Exception in thread "main" java.io.IOException: Job failed!
> >     at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
> >     at org.apache.nutch.crawl.Injector.inject(Injector.java:102)
> >     at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
> >
> > It seems that the problem is that Nutch is not able to find a
> > mapred.input.subdir setting in any of the config files.
> > I found that there is a mapred.input.dir property defined in the config for
> > the particular job (job_4zwds6.xml), with a value equal to the name of my
> > urls file, but I don't understand where I should define the
> > mapred.input.subdir property and what value to assign to it (if it needs to
> > be defined manually; note that mapred.input.dir seems to be configured
> > automatically).
> >
> > Does anybody know the answer?
> >
> > P.S.: Note that the line numbers in the exception trace above for
> > InputFormatBase.java (85, 95) may differ a bit, as I inserted
> > some more LOG.debug() calls there in search of the root
> > cause and then removed them again, but it is possible that I left
> > some extra empty lines there.
> >
> > Thanks,
> > Lukas
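
For reference, the permission check Stefan suggests (that the user running Nutch can read and write every configured folder) could be sketched roughly as below. This is only an illustration in plain Java, not part of Nutch: the class name DirAccessCheck and the method findInaccessible are made up for this example, and the directory list is an assumption that has to be copied by hand from your own nutch-default.xml and nutch-site.xml. It checks permissions only, so it would not by itself explain the "No input directories specified" exception when the folders are already writable.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: reports directories that the
// current user cannot both read and write.
public class DirAccessCheck {

    /** Returns one description per path that is not a readable and writable directory. */
    static List<String> findInaccessible(List<String> dirs) {
        List<String> problems = new ArrayList<>();
        for (String dir : dirs) {
            Path p = Paths.get(dir);
            if (!Files.isDirectory(p)) {
                problems.add(dir + ": missing or not a directory");
            } else if (!Files.isReadable(p) || !Files.isWritable(p)) {
                problems.add(dir + ": not readable and writable by this user");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Assumed default for illustration only; pass the directories from
        // your own configuration files as command-line arguments.
        List<String> dirs = args.length > 0 ? Arrays.asList(args) : Arrays.asList("/tmp");
        List<String> problems = findInaccessible(dirs);
        if (problems.isEmpty()) {
            System.out.println("All directories are readable and writable.");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Run it as the same user that starts bin/nutch, for example `java DirAccessCheck /tmp /home/lukas/nutch/mapred/local`, so that the result reflects that user's permissions.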
