Are you on a single-node configuration? If so, I have the same problem. Some people have suggested earlier on this list using the Hadoop pseudo-distributed setup on a single server; others have suggested enabling Hadoop's compression mode. I have not been able to make either work on my PC, because I get bogged down by Windows/Hadoop compatibility issues. If you are on Linux you may have more luck. I am interested in your results, by the way, so I will know whether moving to Linux solves these problems.
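For reference, "compression mode" here means something like the following properties in conf/hadoop-site.xml. This is only a sketch I could not verify myself because of the Windows issues above; the property names are the ones I understand the Hadoop 0.19/0.20 line bundled with Nutch 1.0 to use:

    <!-- Compress intermediate map output and final job output,
         trading CPU time for disk space during the merge. -->
    <property>
      <name>mapred.compress.map.output</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.output.compress</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.output.compression.type</name>
      <value>BLOCK</value>
    </property>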
2009/7/15 Doğacan Güney <[email protected]>:
>
> On Wed, Jul 15, 2009 at 19:31, Tomislav Poljak <[email protected]> wrote:
> > Hi,
> > I'm trying to merge (using nutch-1.0 mergesegs) about 1.2MM pages,
> > contained in 10 segments, on one machine, using:
> >
> > bin/nutch mergesegs crawl/merge_seg -dir crawl/segments
> >
> > but there is not enough space on a 500G disk to complete this merge task
> > (getting java.io.IOException: No space left on device in hadoop.log).
> >
> > Shouldn't 500G be enough disk space for this merge? Is this a bug? If
> > this is not a bug, how much disk space is required for this merge?
>
> A lot :)
>
> Try deleting your hadoop temporary folders. If that doesn't help, you may
> try merging segment parts one by one. For example, move your content/
> directories out and try merging again. If successful, you can then merge
> the contents later and move the resulting content/ into your merge_seg dir.
>
> > Tomislav
>
> --
> Doğacan Güney

--
-MilleBii-
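To make Doğacan's part-by-part suggestion concrete, here is a rough, untested shell sketch. The directory names crawl/segments_content and crawl/merge_content are placeholders of mine, the paths assume the local filesystem rather than HDFS, and it assumes mergesegs will merge segments that contain only a content/ part, which is what the suggestion implies:

    # Pass 1: set each segment's content/ aside in a parallel, content-only
    # segment directory, then merge the now much smaller segments.
    mkdir -p crawl/segments_content
    for seg in crawl/segments/*; do
        name=$(basename "$seg")
        mkdir -p "crawl/segments_content/$name"
        mv "$seg/content" "crawl/segments_content/$name/content"
    done
    bin/nutch mergesegs crawl/merge_seg -dir crawl/segments

    # Pass 2: merge the content-only segments separately, then move the
    # merged content/ into the segment produced by pass 1.
    bin/nutch mergesegs crawl/merge_content -dir crawl/segments_content
    merged=$(echo crawl/merge_seg/*)   # mergesegs writes a single new segment dir
    mv crawl/merge_content/*/content "$merged/content"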
