Not strange — look at the mailing list; there have been lots of discussions on this issue. You may want to use the compress option, and/or start running Hadoop in pseudo-distributed mode, so that the reduce starts consuming the map data as it is produced. In 'local' mode you get the whole map phase first and the reduce after, so there can be a lot of data sitting in the tmp directory.
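As a rough sketch of the compress option mentioned above, the settings below go in conf/hadoop-site.xml; the property names are the ones used by the Hadoop line bundled with Nutch 1.0, and the tmp path is only a hypothetical example, not a recommendation:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Compress intermediate map output so the tmp directory fills more slowly -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <!-- Point Hadoop's temporary files at a partition with enough free space
       (the path below is just an illustration; adjust it for your machine) -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Volumes/BigDisk/hadoop-tmp</value>
  </property>
</configuration>
```

Anything not set here falls back to the Hadoop defaults, which is why an empty hadoop-site.xml is valid — it just overrides nothing.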
Segment merge uses a LOT of space — so much that I don't use it anymore. I only merge my indexes, which are much smaller in my case.

2009/8/27 Fuad Efendi <[email protected]>

> Unfortunately, you can't manage disk space usage via configuration
> parameters... it is not easy... just try to keep your eyes on
> services/processes/RAM/swap (disk swapping happens if RAM is not enough)
> during the merge, even browse files/folders and click the 'refresh' button
> to get an idea... it is strange that 50G was not enough to merge 2G; maybe
> the problem is somewhere else (OS X specifics, for instance)... try to
> play with Nutch with smaller segment sizes and study its behaviour on
> your OS...
> -Fuad
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: August-26-09 6:41 PM
> To: [email protected]
> Subject: Re: content of hadoop-site.xml
>
> Thanks for the response.
>
> How can I check disk swap?
> 50GB was before running the merge command. When it crashed, the available
> space was 1 KB. The RAM in my MacPro is 2GB. I deleted the tmp folders
> created by Hadoop during the merge, and after that OS X does not start.
> I plan to run merge again and need to reduce the disk space used by
> merge. I have read on the net that to reduce space we must use
> hadoop-site.xml. But there is no hadoop-default.xml file, and the
> hadoop-site.xml file is empty.
>
> Thanks.
> Alex.
>
> -----Original Message-----
> From: Fuad Efendi <[email protected]>
> To: [email protected]
> Sent: Wed, Aug 26, 2009 3:28 pm
> Subject: RE: content of hadoop-site.xml
>
> You can override default settings (nutch-default.xml) in nutch-site.xml,
> but it won't help with spacing; an empty file is OK.
>
> "merge" may generate temporary files, but 50GB against 2GB looks
> extremely strange; try emptying the recycle bin, for instance... check
> disk swap... the OS may report 50G available but you may be out of
> space — for instance, heavy disk swap during the merge due to low RAM...
>
> -Fuad
> http://www.linkedin.com/in/liferay
> http://www.tokenizer.org
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: August-26-09 5:33 PM
> To: [email protected]
> Subject: content of hadoop-site.xml
>
> Hello,
>
> I have run the merge script to merge two crawl dirs, one 1.6G and the
> other 120MB. But my MacPro with 50G of free space did not start after
> the merge crashed with a no-space error. I have been told that OS X got
> corrupted. I looked inside my nutch-1.0/conf/hadoop-site.xml file and it
> is empty. Can anyone let me know what must be put inside this file so
> that merge does not take too much space?
>
> Thanks in advance.
> Alex.

--
-MilleBii-
