Specify temp/working directory for crawl
----------------------------------------
Key: NUTCH-159
URL: http://issues.apache.org/jira/browse/NUTCH-159
Project: Nutch
Type: Bug
Components: fetcher, indexer
Versions: 0.8-dev
Environment: Linux/Debian
Reporter: byron miller

I ran a crawl of 100k web pages and got:

org.apache.nutch.fs.FSError: java.io.IOException: No space left on device
        at org.apache.nutch.fs.LocalFileSystem$LocalNFSFileOutputStream.write(LocalFileSystem.java:149)
        at org.apache.nutch.fs.FileUtil.copyContents(FileUtil.java:65)
        at org.apache.nutch.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:178)
        at org.apache.nutch.fs.NutchFileSystem.rename(NutchFileSystem.java:224)
        at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:80)
Caused by: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:260)
        at org.apache.nutch.fs.LocalFileSystem$LocalNFSFileOutputStream.write(LocalFileSystem.java:147)
        ... 4 more
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
        at org.apache.nutch.crawl.Fetcher.fetch(Fetcher.java:335)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:107)

/data/nutch$ df -k

It appears the crawl created a /tmp/nutch directory that filled up, even though I specified a db directory. We need either a command-line parameter or a globally configurable /tmp (work area) for the Nutch instance, so that crawls don't fail when /tmp runs out of space.
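A minimal sketch of the requested behaviour, assuming a precedence of command-line flag, then configuration property, then the JVM default temp directory. The flag `-tmpdir`, the property `crawl.tmp.dir`, and the class `WorkDirResolver` are hypothetical names for illustration only; they are not existing Nutch options. It also adds an optional pre-flight free-space check so a crawl can fail fast instead of dying mid-fetch. `File.getUsableSpace()` requires Java 6 or later.

```java
import java.io.File;
import java.util.Properties;

/**
 * Hypothetical sketch: pick the crawl's temp/work directory from
 * (1) a "-tmpdir <path>" command-line flag, (2) a "crawl.tmp.dir"
 * configuration property, (3) the JVM default java.io.tmpdir.
 * Flag and property names are illustrative, not real Nutch options.
 */
public class WorkDirResolver {

    // Hypothetical configuration key (not an actual Nutch property).
    static final String TMP_DIR_PROPERTY = "crawl.tmp.dir";

    // Hypothetical command-line flag (not an actual Nutch option).
    static final String TMP_DIR_FLAG = "-tmpdir";

    /** Returns the work directory: command line beats config, config beats the JVM default. */
    static File resolveWorkDir(String[] args, Properties conf) {
        for (int i = 0; i < args.length - 1; i++) {
            if (TMP_DIR_FLAG.equals(args[i])) {
                return new File(args[i + 1]);
            }
        }
        String configured = conf.getProperty(TMP_DIR_PROPERTY);
        if (configured != null && !configured.trim().isEmpty()) {
            return new File(configured.trim());
        }
        return new File(System.getProperty("java.io.tmpdir"));
    }

    /** True if dir (or its nearest existing ancestor) has at least minFreeBytes usable. */
    static boolean hasEnoughSpace(File dir, long minFreeBytes) {
        File probe = dir.getAbsoluteFile();
        // getUsableSpace() returns 0 for a path that does not exist yet,
        // so walk up to the closest ancestor that does exist.
        while (probe != null && !probe.exists()) {
            probe = probe.getParentFile();
        }
        return probe != null && probe.getUsableSpace() >= minFreeBytes;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(TMP_DIR_PROPERTY, "/data/nutch/tmp");
        File workDir = resolveWorkDir(args, conf);
        long minFree = 1L << 30; // 1 GiB, an arbitrary example threshold
        if (!hasEnoughSpace(workDir, minFree)) {
            System.err.println("Warning: less than 1 GiB free under " + workDir
                    + "; the crawl may fail with 'No space left on device'.");
            return;
        }
        System.out.println("Using work directory " + workDir);
    }
}
```

For example, `java WorkDirResolver -tmpdir /data/nutch/tmp` would place the work area on the same volume as the crawl db instead of /tmp. Relocating the work area is the actual fix; the space check only turns a mid-crawl failure into an early, explicit one.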