I don't understand why multiple disks would be particularly beneficial for a Map/Reduce job. Wouldn't a Map/Reduce job be I/O *as well as CPU* bound? I would think that simply reading and parsing large files still requires dedicated CPU time, no?
On Sun, Apr 22, 2012 at 3:14 AM, Harsh J <ha...@cloudera.com> wrote:
> You can use mapred.local.dir for this purpose. It accepts a list of
> directories tasks may use, just like dfs.data.dir uses multiple disks
> for block writes/reads.
>
> On Sun, Apr 22, 2012 at 12:50 PM, mete <efk...@gmail.com> wrote:
> > Hello folks,
> >
> > I have a job that processes text files from HDFS on the local fs (temp
> > directory) and then copies them back to HDFS.
> > I added another drive to each server for better I/O performance, but
> > as far as I could see, hadoop.tmp.dir will not benefit from multiple
> > disks, even if I set up two different folders on different disks
> > (dfs.data.dir works fine). As a result the disk holding the temp
> > folder is highly utilized, while the other one sits mostly idle.
> > Does anyone have an idea on what to do? (I am using CDH3u3.)
> >
> > Thanks in advance
> > Mete
>
> --
> Harsh J

--
Jay Vyas
MMSB/UCHC
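[Editor's note: for reference, a minimal sketch of the setting Harsh describes, assuming a CDH3-era mapred-site.xml; the mount paths below are illustrative, not from the thread:

    <property>
      <name>mapred.local.dir</name>
      <!-- Hypothetical mounts; list one directory per physical disk,
           comma-separated, as Harsh suggests. -->
      <value>/disk1/mapred/local,/disk2/mapred/local</value>
    </property>

With multiple directories listed, the TaskTracker spreads tasks' local scratch data across them, so intermediate I/O is no longer confined to the single disk that hadoop.tmp.dir happens to point at.]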