On Wed, Jul 15, 2009 at 15:41, Jake Jacobson<[email protected]> wrote:
> Did this with the same results.
>
> In my home directory I had a directory named "linkdb-1292468754"
> created, which caused the process to run out of disk space.
>
linkdb-<number> is not a temporary linkdb. There are two jobs that run
when you run invertlinks. The first inverts the new segments (and
creates the output dir linkdb-<number>). Then the new linkdb and the
old one are merged.

I suggest playing with the Hadoop compress options. They are discussed
in another mail on this list (chronologically just a few emails down).
A sketch of those options appears after this message.

> In the hadoop-site.xml I have this set up
>
> <configuration>
>   <property>
>     <name>hadoop.tmp.dir</name>
>     <value>/webroot/oscrawlers/nutch/tmp/</value>
>     <description>A base for other temporary directories.</description>
>   </property>
> </configuration>
>
> I am using the following command line options to run Nutch 1.0
>
> /webroot/oscrawlers/nutch/bin/nutch crawl
> /webroot/oscrawlers/nutch/urls/seed.txt -dir
> /webroot/oscrawlers/nutch/crawl -depth 10 >&
> /webroot/oscrawlers/nutch/logs/crawl_log.txt
>
> In my log file I see this error message:
>
> LinkDb: adding segment:
> file:/webroot/oscrawlers/nutch/crawl/segments/20090714095100
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>         at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:170)
>         at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:147)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:129)
>
> Jake Jacobson
>
> http://www.linkedin.com/in/jakejacobson
> http://www.facebook.com/jakecjacobson
> http://twitter.com/jakejacobson
>
> Our greatest fear should not be of failure,
> but of succeeding at something that doesn't really matter.
>   -- ANONYMOUS
>
> On Mon, Jul 13, 2009 at 9:00 AM, SunGod<[email protected]> wrote:
>> if you use hadoop to run nutch,
>>
>> please add
>>
>> <property>
>>   <name>hadoop.tmp.dir</name>
>>   <value>/youtempfs/hadoop-${user.name}</value>
>>   <description>A base for other temporary directories.</description>
>> </property>
>>
>> to your hadoop-site.xml
>>
>> 2009/7/13 Jake Jacobson <[email protected]>
>>
>>> Hi,
>>>
>>> I have tried to run nutch 1.0 several times and it fails due to lack
>>> of disk space. I have defined the crawl to place all files on a disk
>>> that has plenty of space, but when it starts building the linkdb it
>>> wants to put temp files in the home dir, which doesn't have enough
>>> space. How can I force Nutch not to do this?
>>>
>>> Jake Jacobson
>>>
>>> http://www.linkedin.com/in/jakejacobson
>>> http://www.facebook.com/jakecjacobson
>>> http://twitter.com/jakejacobson
>>>
>>> Our greatest fear should not be of failure,
>>> but of succeeding at something that doesn't really matter.
>>>   -- ANONYMOUS

--
Doğacan Güney
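[Editor's note: a minimal sketch of the compress options mentioned above, assuming the
old-style property names of the Hadoop 0.19.x line that ships with Nutch 1.0; the values
shown are illustrative defaults, not recommendations from the thread. These would go into
conf/hadoop-site.xml alongside hadoop.tmp.dir.]

  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
    <description>Compress intermediate map output to reduce temporary
    disk usage during the LinkDb jobs.</description>
  </property>

  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
    <description>Codec used for the compressed map output.</description>
  </property>

  <property>
    <name>mapred.output.compress</name>
    <value>true</value>
    <description>Compress job output (the SequenceFiles written by the
    invert and merge jobs).</description>
  </property>

  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
    <description>Compress SequenceFile output per block rather than per
    record.</description>
  </property>

[Whether this helps depends on how much of the space is consumed by intermediate map
output versus the merged linkdb itself.]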
