Thanks, but isn't there an option to tell Nutch where to write these files?

Jake Jacobson

http://www.linkedin.com/in/jakejacobson
http://www.facebook.com/jakecjacobson
http://twitter.com/jakejacobson

Our greatest fear should not be of failure,
but of succeeding at something that doesn't really matter.
   -- ANONYMOUS



2009/7/16 Doğacan Güney <[email protected]>:
> On Wed, Jul 15, 2009 at 15:41, Jake Jacobson<[email protected]> wrote:
>> Did this with the same results.
>>
>> In my home directory a directory named "linkdb-1292468754" was
>> created, which caused the process to run out of disk space.
>>
>
> linkdb-<number> is not a temporary linkdb. Two jobs run when you invoke
> invertlinks. The first inverts the new segments (which creates the output
> dir linkdb-<number>); then the new linkdb is merged with the old one.
>
> I suggest playing with the Hadoop compression options. They are discussed
> in another mail on this list (chronologically just a few emails down).
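(For anyone finding this in the archive: the compression knobs being referred
to are, I believe, the standard Hadoop 0.19/0.20 job properties. Something
along these lines in hadoop-site.xml should shrink both the intermediate map
output and the final SequenceFile output; the exact values are my own guess,
not something stated in the thread.)

<property>
        <name>mapred.compress.map.output</name>
        <value>true</value>
        <description>Compress intermediate map output.</description>
</property>
<property>
        <name>mapred.output.compress</name>
        <value>true</value>
        <description>Compress the final job output.</description>
</property>
<property>
        <name>mapred.output.compression.type</name>
        <value>BLOCK</value>
        <description>Use block compression for SequenceFile output such as
the linkdb.</description>
</property>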
>
>> In my hadoop-site.xml I have this set up:
>>
>> <configuration>
>>        <property>
>>                <name>hadoop.tmp.dir</name>
>>                <value>/webroot/oscrawlers/nutch/tmp/</value>
>>                <description>A base for other temporary
>> directories.</description>
>>        </property>
>>
>> </configuration>
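(Side note: as far as I understand, hadoop.tmp.dir is only the base; the
MapReduce scratch space actually comes from mapred.local.dir, which defaults
to ${hadoop.tmp.dir}/mapred/local. If local job files still land somewhere
unexpected, it may be worth setting it explicitly; the path below is just my
own example, following the layout above.)

<property>
        <name>mapred.local.dir</name>
        <value>/webroot/oscrawlers/nutch/tmp/mapred/local</value>
        <description>Local scratch space for map/reduce intermediate
files.</description>
</property>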
>>
>> I am using the following command line options to run Nutch 1.0
>>
>> /webroot/oscrawlers/nutch/bin/nutch crawl
>> /webroot/oscrawlers/nutch/urls/seed.txt -dir
>> /webroot/oscrawlers/nutch/crawl -depth 10 >&
>> /webroot/oscrawlers/nutch/logs/crawl_log.txt
>>
>> In my log file I see this error message:
>>
>> LinkDb: adding segment:
>> file:/webroot/oscrawlers/nutch/crawl/segments/20090714095100
>> Exception in thread "main" java.io.IOException: Job failed!
>>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>>        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:170)
>>        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:147)
>>        at org.apache.nutch.crawl.Crawl.main(Crawl.java:129)
>>
>> Jake Jacobson
>>
>> http://www.linkedin.com/in/jakejacobson
>> http://www.facebook.com/jakecjacobson
>> http://twitter.com/jakejacobson
>>
>> Our greatest fear should not be of failure,
>> but of succeeding at something that doesn't really matter.
>>   -- ANONYMOUS
>>
>>
>>
>> On Mon, Jul 13, 2009 at 9:00 AM, SunGod<[email protected]> wrote:
>>> If you use Hadoop to run Nutch, please add
>>>
>>> <property>
>>>  <name>hadoop.tmp.dir</name>
>>>  <value>/youtempfs/hadoop-${user.name}</value>
>>>  <description>A base for other temporary directories.</description>
>>> </property>
>>>
>>> to your hadoop-site.xml.
>>>
>>> 2009/7/13 Jake Jacobson <[email protected]>
>>>
>>>> Hi,
>>>>
>>>> I have tried to run Nutch 1.0 several times, and it fails due to lack
>>>> of disk space.  I have configured the crawl to place all files on a
>>>> disk that has plenty of space, but when it starts building the linkdb
>>>> it wants to put temp files in the home dir, which doesn't have enough
>>>> space.  How can I force Nutch not to do this?
>>>>
>>>> Jake Jacobson
>>>>
>>>> http://www.linkedin.com/in/jakejacobson
>>>> http://www.facebook.com/jakecjacobson
>>>> http://twitter.com/jakejacobson
>>>>
>>>> Our greatest fear should not be of failure,
>>>> but of succeeding at something that doesn't really matter.
>>>>   -- ANONYMOUS
>>>>
>>>
>>
>
>
>
> --
> Doğacan Güney
>
