I don't understand why multiple disks would be particularly beneficial for a Map/Reduce job. Wouldn't a Map/Reduce job be I/O *as well as CPU* bound? I would think that simply reading and parsing large files still requires dedicated CPU time, no?
On Sun, Apr 22, 2012 at 3:14 AM, Harsh J <ha...@cloudera.com> wrote:
> You can use mapred.local.dir for this purpose. It accepts a list of
> directories tasks may use, just like dfs.data.dir uses multiple disks
> for block writes/reads.
>
> On Sun, Apr 22, 2012 at 12:50 PM, mete <efk...@gmail.com> wrote:
> > Hello folks,
> >
> > I have a job that processes text files from HDFS on the local fs (temp
> > directory) and then copies them back to HDFS.
> > I added another drive to each server for better I/O performance, but
> > as far as I could see, hadoop.tmp.dir will not benefit from multiple
> > disks, even if I set up two different folders on different disks
> > (dfs.data.dir works fine). As a result the disk holding the temp
> > folder is highly utilized, while the other one sits mostly idle.
> > Does anyone have an idea on what to do? (I am using CDH3u3.)
> >
> > Thanks in advance
> > Mete
>
> --
> Harsh J

--
Jay Vyas
MMSB/UCHC
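[Editor's note: for reference, a minimal sketch of the setting Harsh describes, assuming a CDH3-era mapred-site.xml; the mount paths below are illustrative, not from the thread:

    <property>
      <name>mapred.local.dir</name>
      <!-- Hypothetical mounts; list one directory per physical disk,
           comma-separated, as Harsh suggests. -->
      <value>/disk1/mapred/local,/disk2/mapred/local</value>
    </property>

With multiple directories listed, the TaskTracker spreads tasks' local scratch data across them, so intermediate I/O is no longer confined to the single disk that hadoop.tmp.dir happens to point at.]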