Not strange; look at the mailing list, there have been lots of discussions
on this issue.
You may want to use the compress option.
And/or start running Hadoop in pseudo-distributed mode, so that the reduce
starts consuming the map data as it is produced; in 'local' mode you get the
map first and the reduce after, so there can be a lot of data sitting in the
tmp directory.
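For what it's worth, in the Nutch 1.0 / Hadoop 0.19 era those settings go in conf/hadoop-site.xml; a minimal sketch (property names are from the Hadoop defaults of that era, and the tmp path is just an example to adjust for your machine):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Compress intermediate map output to cut down tmp-dir usage -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <!-- Compress final job output as well -->
  <property>
    <name>mapred.output.compress</name>
    <value>true</value>
  </property>
  <!-- Point Hadoop's scratch space at a disk with room to spare
       (example path; change it to suit your setup) -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Volumes/bigdisk/hadoop-tmp</value>
  </property>
</configuration>
```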

Segment merge uses a LOT of space, so much that I don't use it anymore. I
only merge my indexes, which are much smaller in my case.
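If you do run a segment merge, it helps to keep an eye on free space and on Hadoop's scratch directory while it runs; a rough sketch (the tmp path below is the usual default for hadoop.tmp.dir, adjust it if you have overridden it in hadoop-site.xml):

```shell
#!/bin/sh
# Report free disk space on the root volume during a merge.
df -h /
# Report how much Hadoop's scratch dir has grown; the path is the
# common default, not necessarily yours.
du -sh "/tmp/hadoop-$USER" 2>/dev/null || echo "no hadoop tmp dir yet"
```

Running it in a loop (e.g. with `watch` or a `sleep` loop) during the merge shows whether the tmp directory is what is eating the disk.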



2009/8/27 Fuad Efendi <[email protected]>

> Unfortunately, you can't manage disk space usage via configuration
> parameters... it is not easy... just try to keep your eyes on
> services/processes/RAM/swap (disk swapping happens if RAM is not enough)
> during the merge, even browse files/folders and click the 'refresh' button
> to get an idea... it is strange that 50G was not enough to merge 2G; maybe
> the problem is somewhere else (OS X specifics, for instance)... try playing
> with Nutch with smaller segment sizes and study its behaviour on your OS...
> -Fuad
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: August-26-09 6:41 PM
> To: [email protected]
> Subject: Re: content of hadoop-site.xml
>
> Thanks for the response.
>
> How can I check disk swap?
> The 50GB was before running the merge command; when it crashed, the
> available space was 1 KB. The RAM in my MacPro is 2GB. I deleted the tmp
> folders created by Hadoop during the merge, and after that OS X does not
> start. I plan to run the merge again and need to reduce the disk space
> usage of the merge. I have read on the net that to reduce space we must use
> hadoop-site.xml. But there is no hadoop-default.xml file, and the
> hadoop-site.xml file is empty.
>
>
> Thanks.
> Alex.
>
>
>
>
> -----Original Message-----
> From: Fuad Efendi <[email protected]>
> To: [email protected]
> Sent: Wed, Aug 26, 2009 3:28 pm
> Subject: RE: content of hadoop-site.xml
>
> You can override the default settings (nutch-default.xml) in
> nutch-site.xml, but it won't help with disk space; an empty file is OK.
>
> "merge" may generate temporary files, but 50GB against 2GB looks extremely
> strange; try emptying the recycle bin, for instance... check disk swap...
> the OS may report 50G available but you may still run out of space, for
> instance due to heavy disk swapping during the merge caused by low RAM...
>
>
>
> -Fuad
> http://www.linkedin.com/in/liferay
> http://www.tokenizer.org
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: August-26-09 5:33 PM
> To: [email protected]
> Subject: content of hadoop-site.xml
>
> Hello,
>
> I have run the merge script to merge two crawl dirs, one 1.6GB and another
> 120MB. But my MacPro, with 50GB of free space, did not start after the
> merge crashed with a no-space error. I have been told that OS X got
> corrupted.
> I looked inside my nutch-1.0/conf/hadoop-site.xml file and it is empty. Can
> anyone let me know what must be put inside this file so that the merge does
> not take too much space?
>
> Thanks in advance.
> Alex.
>


-- 
-MilleBii-
