Well, do you have enough space in the filesystem where Nutch is installed?
I noticed that Nutch creates some temp files in the Nutch default
directory; they are not all under the hadoop.tmp.dir location. I am not
sure whether that is a bug or intended behaviour.
I moved my Nutch installation to a bigger filesystem to avoid this potential
problem.

I assume you are using the local filesystem and not the Hadoop distributed
mode.
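
A quick way to verify this is to check the free space on the filesystem that holds the Nutch install (run from the install directory; `df` is standard on Unix-like systems):

```shell
# Show free space on the filesystem containing the current directory.
# cd into the Nutch install dir first, e.g. /webroot/oscrawlers/nutch.
df -h .
```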


2009/7/16 Doğacan Güney <[email protected]>

> On Thu, Jul 16, 2009 at 17:25, Jake Jacobson<[email protected]>
> wrote:
> > Thanks, but isn't there an option to tell Nutch where to write these
> > files?
> >
>
> There is an option that controls where temporary mapred files are
> written, but not (for the most part) where job output files go. However,
> you can change the Nutch code to write linkdb-<number> to another
> directory (take a look at the LinkDb#createJob method).
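
As a sketch of why that relative output path lands in the home directory, and how prefixing it with a base directory would move it: this is an illustration with java.nio, not the actual Nutch patch, and the `/bigdisk/nutch-tmp` base directory is a hypothetical example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

public class LinkDbPathDemo {
    public static void main(String[] args) {
        // Nutch's LinkDb builds a relative output path named "linkdb-<random>".
        String name = "linkdb-" + new Random().nextInt(Integer.MAX_VALUE);

        // A relative path resolves against the working directory of the
        // process -- often the launch (home) directory, hence the full disk.
        Path relative = Paths.get(name).toAbsolutePath();
        System.out.println("relative resolves under: " + relative.getParent());

        // Prefixing a base directory sends the output to the big filesystem.
        // "/bigdisk/nutch-tmp" is a hypothetical example path.
        Path redirected = Paths.get("/bigdisk/nutch-tmp").resolve(name);
        System.out.println("redirected parent: " + redirected.getParent());
    }
}
```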
>
> > Jake Jacobson
> >
> > http://www.linkedin.com/in/jakejacobson
> > http://www.facebook.com/jakecjacobson
> > http://twitter.com/jakejacobson
> >
> > Our greatest fear should not be of failure,
> > but of succeeding at something that doesn't really matter.
> >   -- ANONYMOUS
> >
> >
> >
> > 2009/7/16 Doğacan Güney <[email protected]>:
> >> On Wed, Jul 15, 2009 at 15:41, Jake Jacobson<[email protected]>
> wrote:
> >>> Did this with the same results.
> >>>
> >>> In my home directory a directory named "linkdb-1292468754" was
> >>> created, which caused the process to run out of disk space.
> >>>
> >>
> >> linkdb-<number> is not a temporary linkdb. Two jobs run when you run
> >> invertlinks. The first inverts the new segments (which creates the
> >> output dir linkdb-<number>). Then the new linkdb and the old one are
> >> merged.
> >>
> >> I suggest playing with the Hadoop compress options. They are discussed
> >> in another mail on this list (chronologically just a few emails down).
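
For reference, the compress options usually meant here are properties like the following. This hadoop-site.xml fragment is a sketch for the Hadoop version bundled with Nutch 1.0; verify the property name against your hadoop-default.xml before relying on it.

```xml
<!-- Sketch: compress intermediate map output to cut temp-file size.
     Verify this property name against your hadoop-default.xml. -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
```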
> >>
> >>> In the hadoop-site.xml I have this set up
> >>>
> >>> <configuration>
> >>>        <property>
> >>>                <name>hadoop.tmp.dir</name>
> >>>                <value>/webroot/oscrawlers/nutch/tmp/</value>
> >>>                <description>A base for other temporary
> >>> directories.</description>
> >>>        </property>
> >>>
> >>> </configuration>
> >>>
> >>> I am using the following command line options to run Nutch 1.0
> >>>
> >>> /webroot/oscrawlers/nutch/bin/nutch crawl
> >>> /webroot/oscrawlers/nutch/urls/seed.txt -dir
> >>> /webroot/oscrawlers/nutch/crawl -depth 10 >&
> >>> /webroot/oscrawlers/nutch/logs/crawl_log.txt
> >>>
> >>> In my log file I see this error message:
> >>>
> >>> LinkDb: adding segment:
> >>> file:/webroot/oscrawlers/nutch/crawl/segments/20090714095100
> >>> Exception in thread "main" java.io.IOException: Job failed!
> >>>        at
> org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
> >>>        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:170)
> >>>        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:147)
> >>>        at org.apache.nutch.crawl.Crawl.main(Crawl.java:129)
> >>>
> >>>
> >>>
> >>>
> >>> On Mon, Jul 13, 2009 at 9:00 AM, SunGod<[email protected]> wrote:
> >>>> If you run Nutch under Hadoop, please add
> >>>>
> >>>> <property>
> >>>>  <name>hadoop.tmp.dir</name>
> >>>>  <value>/youtempfs/hadoop-${user.name}</value>
> >>>>  <description>A base for other temporary directories.</description>
> >>>> </property>
> >>>>
> >>>> to your hadoop-site.xml.
> >>>>
> >>>> 2009/7/13 Jake Jacobson <[email protected]>
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I have tried to run Nutch 1.0 several times, and it fails due to lack
> >>>>> of disk space. I have configured the crawl to place all files on a
> >>>>> disk that has plenty of space, but when it starts building the linkdb
> >>>>> it puts temp files in the home dir, which doesn't have enough
> >>>>> space. How can I force Nutch not to do this?
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Doğacan Güney
> >>
> >
>
>
>
> --
> Doğacan Güney
>



-- 
-MilleBii-
