Are you on a single-node configuration? If so, I have the same problem. Some people have suggested earlier on this list using the Hadoop pseudo-distributed setup on a single server; others have suggested enabling Hadoop's compression mode. I have not been able to make either work on my PC, because I get bogged down by Windows/Hadoop compatibility issues. If you are on Linux you may have more luck. I am interested in your results, by the way, so I will know whether moving to Linux solves these problems.
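For reference, "compression mode" here means something like the following properties in conf/hadoop-site.xml. This is only a sketch I could not verify myself because of the Windows issues above; the property names are the ones I understand the Hadoop 0.19/0.20 line bundled with Nutch 1.0 to use:

    <!-- Compress intermediate map output and final job output,
         trading CPU time for disk space during the merge. -->
    <property>
      <name>mapred.compress.map.output</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.output.compress</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.output.compression.type</name>
      <value>BLOCK</value>
    </property>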
2009/7/15 Doğacan Güney <[email protected]>:
>
> On Wed, Jul 15, 2009 at 19:31, Tomislav Poljak <[email protected]> wrote:
> > Hi,
> > I'm trying to merge (using nutch-1.0 mergesegs) about 1.2MM pages,
> > contained in 10 segments, on one machine, using:
> >
> > bin/nutch mergesegs crawl/merge_seg -dir crawl/segments
> >
> > but there is not enough space on a 500G disk to complete this merge task
> > (getting java.io.IOException: No space left on device in hadoop.log).
> >
> > Shouldn't 500G be enough disk space for this merge? Is this a bug? If
> > this is not a bug, how much disk space is required for this merge?
>
> A lot :)
>
> Try deleting your hadoop temporary folders. If that doesn't help, you may
> try merging segment parts one by one. For example, move your content/
> directories out and try merging again. If successful, you can then merge
> the contents later and move the resulting content/ into your merge_seg dir.
>
> > Tomislav
>
> --
> Doğacan Güney

--
-MilleBii-
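To make Doğacan's part-by-part suggestion concrete, here is a rough, untested shell sketch. The directory names crawl/segments_content and crawl/merge_content are placeholders of mine, the paths assume the local filesystem rather than HDFS, and it assumes mergesegs will merge segments that contain only a content/ part, which is what the suggestion implies:

    # Pass 1: set each segment's content/ aside in a parallel, content-only
    # segment directory, then merge the now much smaller segments.
    mkdir -p crawl/segments_content
    for seg in crawl/segments/*; do
        name=$(basename "$seg")
        mkdir -p "crawl/segments_content/$name"
        mv "$seg/content" "crawl/segments_content/$name/content"
    done
    bin/nutch mergesegs crawl/merge_seg -dir crawl/segments

    # Pass 2: merge the content-only segments separately, then move the
    # merged content/ into the segment produced by pass 1.
    bin/nutch mergesegs crawl/merge_content -dir crawl/segments_content
    merged=$(echo crawl/merge_seg/*)   # mergesegs writes a single new segment dir
    mv crawl/merge_content/*/content "$merged/content"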
