On Wed, Jul 15, 2009 at 20:45, MilleBii <[email protected]> wrote:
> Are you on a single node conf?
> If yes I have the same problem, and some people have suggested earlier to
> use the hadoop pseudo-distributed config on a single server.
> Others have also suggested to use compress mode of hadoop.

Yes, that's a good point. Playing around with these options may help:

  mapred.output.compress
  mapred.output.compression.type  (BLOCK may help a lot here)
  mapred.compress.map.output
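For reference, here is a minimal sketch of how those three properties could
be set inside the <configuration> element of conf/hadoop-site.xml in a
Nutch 1.0 checkout; the values are suggestions, not settings taken from this
thread:

  <!-- Compress final job output; BLOCK compresses whole SequenceFile
       blocks rather than individual records, which usually saves more. -->
  <property>
    <name>mapred.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>

  <!-- Also compress intermediate map output to shrink spill files. -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>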
> But I have not been able to make it work on my PC because I get bogged down
> by some windows/hadoop compatibility issues.
> If you are on Linux you may have better luck. I'd be interested in your
> results, by the way, so I know whether moving to Linux solves those
> problems.
>
>
> 2009/7/15 Doğacan Güney <[email protected]>
>
>> On Wed, Jul 15, 2009 at 19:31, Tomislav Poljak <[email protected]> wrote:
>> > Hi,
>> > I'm trying to merge (using nutch-1.0 mergesegs) about 1.2MM pages,
>> > contained in 10 segments, on one machine, using:
>> >
>> > bin/nutch mergesegs crawl/merge_seg -dir crawl/segments
>> >
>> > but there is not enough space on a 500G disk to complete this merge
>> > task (getting java.io.IOException: No space left on device in
>> > hadoop.log).
>> >
>> > Shouldn't 500G be enough disk space for this merge? Is this a bug? If
>> > this is not a bug, how much disk space is required for this merge?
>>
>> A lot :)
>>
>> Try deleting your hadoop temporary folders. If that doesn't help, you may
>> try merging segment parts one by one. For example, move your content/
>> directories out and try merging again. If successful, you can then merge
>> the contents later and move the resulting content/ into your merge_seg
>> dir [a sketch of this follows the thread].
>>
>> > Tomislav
>>
>> --
>> Doğacan Güney
>
>
> --
> -MilleBii-

--
Doğacan Güney
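To make the part-by-part merge suggested above concrete, here is a rough,
untested shell sketch. The directory names crawl/content_only and
crawl/merge_content are made up for the example, and it assumes SegmentMerger
accepts segments with only some parts present and that each mergesegs run
writes a single timestamped segment under its output dir:

  # 1. Set each segment's content/ part aside so the first pass skips it.
  mkdir -p crawl/content_only
  for seg in crawl/segments/*; do
    name=$(basename "$seg")
    mkdir -p "crawl/content_only/$name"
    mv "$seg/content" "crawl/content_only/$name/"
  done

  # 2. Merge the remaining parts (crawl_generate, crawl_fetch, ...).
  bin/nutch mergesegs crawl/merge_seg -dir crawl/segments

  # 3. Merge the content-only segments in a second pass.
  bin/nutch mergesegs crawl/merge_content -dir crawl/content_only

  # 4. Move the merged content/ into the merged segment from step 2
  #    (assumes exactly one segment under each output dir).
  mv crawl/merge_content/*/content crawl/merge_seg/*/

Splitting the merge this way keeps each pass's intermediate data smaller,
which is what tends to exhaust the temporary space on a single disk.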
