On Wed, Jul 15, 2009 at 15:41, Jake Jacobson<[email protected]> wrote:
> Did this with the same results.
>
> In my home directory I had a directory named "linkdb-1292468754"
> created, which caused the process to run out of disk space.
>

linkdb-<number> is not a temporary linkdb. Two jobs run when you invoke
invertlinks: the first inverts the new segments (and creates the output
dir linkdb-<number>), then the new linkdb is merged with the old one.
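
In case it helps to watch that step in isolation, it can also be run on its
own with the standalone invertlinks command (Nutch 1.0 syntax, using the
paths from your crawl command below; adjust as needed):

/webroot/oscrawlers/nutch/bin/nutch invertlinks \
    /webroot/oscrawlers/nutch/crawl/linkdb \
    -dir /webroot/oscrawlers/nutch/crawl/segments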

I suggest playing with the hadoop compression options. They are discussed in
another mail on this list (chronologically just a few emails down).
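
As a rough sketch (property names are from the Hadoop 0.19.x line that
Nutch 1.0 ships with; double-check them against your hadoop-default.xml and
that other thread), something like this in hadoop-site.xml enables
compression of intermediate map output and of SequenceFile job output, which
cuts down the temp space those jobs need:

<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
  <description>Compress intermediate map output.</description>
</property>
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
  <description>Compress job output.</description>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
  <description>Block compression for SequenceFile output.</description>
</property>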

> In the hadoop-site.xml I have this set up
>
> <configuration>
>        <property>
>                <name>hadoop.tmp.dir</name>
>                <value>/webroot/oscrawlers/nutch/tmp/</value>
>                <description>A base for other temporary
> directories.</description>
>        </property>
>
> </configuration>
>
> I am using the following command line options to run Nutch 1.0
>
> /webroot/oscrawlers/nutch/bin/nutch crawl
> /webroot/oscrawlers/nutch/urls/seed.txt -dir
> /webroot/oscrawlers/nutch/crawl -depth 10 >&
> /webroot/oscrawlers/nutch/logs/crawl_log.txt
>
> In my log file I see this error message:
>
> LinkDb: adding segment:
> file:/webroot/oscrawlers/nutch/crawl/segments/20090714095100
> Exception in thread "main" java.io.IOException: Job failed!
>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:170)
>        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:147)
>        at org.apache.nutch.crawl.Crawl.main(Crawl.java:129)
>
> Jake Jacobson
>
> http://www.linkedin.com/in/jakejacobson
> http://www.facebook.com/jakecjacobson
> http://twitter.com/jakejacobson
>
> Our greatest fear should not be of failure,
> but of succeeding at something that doesn't really matter.
>   -- ANONYMOUS
>
>
>
> On Mon, Jul 13, 2009 at 9:00 AM, SunGod<[email protected]> wrote:
>> if you use hadoop to run nutch
>>
>> please add
>>
>> <property>
>>  <name>hadoop.tmp.dir</name>
>>  <value>/youtempfs/hadoop-${user.name}</value>
>>  <description>A base for other temporary directories.</description>
>> </property>
>>
>> to your hadoop-site.xml
>>
>> 2009/7/13 Jake Jacobson <[email protected]>
>>
>>> Hi,
>>>
>>> I have tried to run nutch 1.0 several times and it fails due to lack
>>> of disk space.  I have defined the crawl to place all files on a disk
>>> that has plenty of space, but when it starts building the linkdb it
>>> wants to put temp files in the home dir, which doesn't have enough
>>> space.  How can I force Nutch not to do this?
>>>
>>> Jake Jacobson
>>>
>>> http://www.linkedin.com/in/jakejacobson
>>> http://www.facebook.com/jakecjacobson
>>> http://twitter.com/jakejacobson
>>>
>>> Our greatest fear should not be of failure,
>>> but of succeeding at something that doesn't really matter.
>>>   -- ANONYMOUS
>>>
>>
>



-- 
Doğacan Güney
