On Wed, Jul 15, 2009 at 20:45, MilleBii <[email protected]> wrote:
> Are you on a single node conf?
> If yes I have the same problem, and some people have suggested earlier to
> use the hadoop pseudo-distributed config on a single server.
> Others have also suggested to use compress mode of hadoop.

Yes, that's a good point. Playing around with these options may help:

  mapred.output.compress
  mapred.output.compression.type  (BLOCK may help a lot here)
  mapred.compress.map.output
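For reference, here is a minimal sketch of how those three properties could
be set inside the <configuration> element of conf/hadoop-site.xml in a
Nutch 1.0 checkout; the values are suggestions, not settings taken from this
thread:

  <!-- Compress final job output; BLOCK compresses whole SequenceFile
       blocks rather than individual records, which usually saves more. -->
  <property>
    <name>mapred.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>

  <!-- Also compress intermediate map output to shrink spill files. -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>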
> But I have not been able to make it work on my PC because I get bogged down
> by some windows/hadoop compatibility issues.
> If you are on Linux you may have better luck. I'd be interested in your
> results, by the way, so I know whether moving to Linux solves those
> problems.
>
>
> 2009/7/15 Doğacan Güney <[email protected]>
>
>> On Wed, Jul 15, 2009 at 19:31, Tomislav Poljak <[email protected]> wrote:
>> > Hi,
>> > I'm trying to merge (using nutch-1.0 mergesegs) about 1.2MM pages,
>> > contained in 10 segments, on one machine, using:
>> >
>> > bin/nutch mergesegs crawl/merge_seg -dir crawl/segments
>> >
>> > but there is not enough space on a 500G disk to complete this merge
>> > task (getting java.io.IOException: No space left on device in
>> > hadoop.log).
>> >
>> > Shouldn't 500G be enough disk space for this merge? Is this a bug? If
>> > this is not a bug, how much disk space is required for this merge?
>>
>> A lot :)
>>
>> Try deleting your hadoop temporary folders. If that doesn't help, you may
>> try merging segment parts one by one. For example, move your content/
>> directories out and try merging again. If successful, you can then merge
>> the contents later and move the resulting content/ into your merge_seg
>> dir [a sketch of this follows the thread].
>>
>> > Tomislav
>>
>> --
>> Doğacan Güney
>
>
> --
> -MilleBii-

--
Doğacan Güney
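To make the part-by-part merge suggested above concrete, here is a rough,
untested shell sketch. The directory names crawl/content_only and
crawl/merge_content are made up for the example, and it assumes SegmentMerger
accepts segments with only some parts present and that each mergesegs run
writes a single timestamped segment under its output dir:

  # 1. Set each segment's content/ part aside so the first pass skips it.
  mkdir -p crawl/content_only
  for seg in crawl/segments/*; do
    name=$(basename "$seg")
    mkdir -p "crawl/content_only/$name"
    mv "$seg/content" "crawl/content_only/$name/"
  done

  # 2. Merge the remaining parts (crawl_generate, crawl_fetch, ...).
  bin/nutch mergesegs crawl/merge_seg -dir crawl/segments

  # 3. Merge the content-only segments in a second pass.
  bin/nutch mergesegs crawl/merge_content -dir crawl/content_only

  # 4. Move the merged content/ into the merged segment from step 2
  #    (assumes exactly one segment under each output dir).
  mv crawl/merge_content/*/content crawl/merge_seg/*/

Splitting the merge this way keeps each pass's intermediate data smaller,
which is what tends to exhaust the temporary space on a single disk.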
