Thanks, but isn't there an option to tell Nutch where to write these files?
Jake Jacobson

http://www.linkedin.com/in/jakejacobson
http://www.facebook.com/jakecjacobson
http://twitter.com/jakejacobson

Our greatest fear should not be of failure,
but of succeeding at something that doesn't really matter.
-- ANONYMOUS

2009/7/16 Doğacan Güney <[email protected]>:
> On Wed, Jul 15, 2009 at 15:41, Jake Jacobson<[email protected]> wrote:
>> Did this with the same results.
>>
>> In my home directory I had a directory named "linkdb-1292468754"
>> created, which caused the process to run out of disk space.
>>
>
> linkdb-<number> is not a temporary linkdb. There are two jobs that run
> when you run invertlinks. The first is the inverting of the new segments
> (which creates the output dir linkdb-<number>). Then the new linkdb and
> the old one are merged.
>
> I suggest playing with the hadoop compress options. They are discussed in
> another mail on this list (chronologically just a few emails down).
>
>> In hadoop-site.xml I have this set up:
>>
>> <configuration>
>>   <property>
>>     <name>hadoop.tmp.dir</name>
>>     <value>/webroot/oscrawlers/nutch/tmp/</value>
>>     <description>A base for other temporary directories.</description>
>>   </property>
>> </configuration>
>>
>> I am using the following command line options to run Nutch 1.0:
>>
>> /webroot/oscrawlers/nutch/bin/nutch crawl /webroot/oscrawlers/nutch/urls/seed.txt -dir /webroot/oscrawlers/nutch/crawl -depth 10 >& /webroot/oscrawlers/nutch/logs/crawl_log.txt
>>
>> In my log file I see this error message:
>>
>> LinkDb: adding segment: file:/webroot/oscrawlers/nutch/crawl/segments/20090714095100
>> Exception in thread "main" java.io.IOException: Job failed!
>>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>>         at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:170)
>>         at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:147)
>>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:129)
>>
>> Jake Jacobson
>>
>> http://www.linkedin.com/in/jakejacobson
>> http://www.facebook.com/jakecjacobson
>> http://twitter.com/jakejacobson
>>
>> Our greatest fear should not be of failure,
>> but of succeeding at something that doesn't really matter.
>> -- ANONYMOUS
>>
>>
>> On Mon, Jul 13, 2009 at 9:00 AM, SunGod<[email protected]> wrote:
>>> If you use Hadoop to run Nutch, please add
>>>
>>> <property>
>>>   <name>hadoop.tmp.dir</name>
>>>   <value>/youtempfs/hadoop-${user.name}</value>
>>>   <description>A base for other temporary directories.</description>
>>> </property>
>>>
>>> to your hadoop-site.xml.
>>>
>>> 2009/7/13 Jake Jacobson <[email protected]>
>>>
>>>> Hi,
>>>>
>>>> I have tried to run Nutch 1.0 several times and it fails due to lack
>>>> of disk space. I have defined the crawl to place all files on a disk
>>>> that has plenty of space, but when it starts building the linkdb it
>>>> wants to put temp files in the home dir, which doesn't have enough
>>>> space. How can I force Nutch not to do this?
>>>>
>>>> Jake Jacobson
>>>>
>>>> http://www.linkedin.com/in/jakejacobson
>>>> http://www.facebook.com/jakecjacobson
>>>> http://twitter.com/jakejacobson
>>>>
>>>> Our greatest fear should not be of failure,
>>>> but of succeeding at something that doesn't really matter.
>>>> -- ANONYMOUS
>>>
>>
>
> --
> Doğacan Güney
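For reference, the "hadoop compress options" mentioned above are Hadoop's standard map/reduce compression properties. Below is a minimal hadoop-site.xml sketch, assuming the Hadoop 0.19.x bundled with Nutch 1.0; property names changed in later Hadoop releases, so verify them against your version:

<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
  <description>Compress intermediate map output to reduce temporary disk usage.</description>
</property>

<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  <description>Codec used for the intermediate map output.</description>
</property>

<property>
  <name>mapred.output.compress</name>
  <value>true</value>
  <description>Compress final job output (Nutch segments and linkdb are SequenceFiles).</description>
</property>

<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
  <description>SequenceFile compression type: NONE, RECORD or BLOCK.</description>
</property>

Note also that in local (non-distributed) mode, hadoop.tmp.dir is the default base for mapred.local.dir and mapred.system.dir, so pointing it at a filesystem with plenty of space moves most of a job's scratch data there. The linkdb-<number> directory, however, is a job output path rather than temp space, as explained above.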
