Re: Mapred job parallelism

2009-01-26 Thread Aaron Kimball
Indeed, you will need to enable the Fair Scheduler or Capacity Scheduler
(both of which are in 0.19) to do this. mapred.map.tasks is more of a hint
than anything else -- if your input produces more splits to map than the
value you set, the job will run more map tasks than you configured. The
newer schedulers will ensure that each job's many map tasks use only a
portion of the available slots.
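
For reference, a minimal sketch of switching the JobTracker over to the
Fair Scheduler in mapred-site.xml (this assumes the contrib/fairscheduler
jar is on the JobTracker's classpath):

```xml
<!-- mapred-site.xml: replace the default FIFO scheduler with the
     Fair Scheduler (contrib/fairscheduler in 0.19) -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
```

With no further pool configuration, the Fair Scheduler divides slots
evenly among running jobs, which is roughly the behavior wanted here.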
- Aaron

On Mon, Jan 26, 2009 at 1:43 PM, jason hadoop wrote:

> I believe that the scheduler code in 0.19.0 has a framework for this, but I
> haven't dug into it in detail yet.
>
> http://hadoop.apache.org/core/docs/r0.19.0/capacity_scheduler.html
>
> From what I gather, you would set up 2 queues, each with guaranteed access
> to 1/2 of the cluster, then submit your jobs to alternating queues.
>
> This is not ideal, as you have to balance which queue you submit jobs to
> in order to ensure that there is some depth in each.
>
>
> On Mon, Jan 26, 2009 at 1:30 PM, Sagar Naik  wrote:
>
> > Hi Guys,
> >
> > I was trying to set up a cluster so that two jobs can run simultaneously.
> >
> > The conf :
> > number of nodes : 4(say)
> > mapred.tasktracker.map.tasks.maximum=2
> >
> >
> > and in the JobClient
> > mapred.map.tasks=4 (# of nodes)
> >
> >
> > I also have a constraint that each job should run only one map-task per
> > node.
> >
> > In short, I created 8 map slots and set the number of mappers to 4,
> > so now we have two jobs running simultaneously.
> >
> > However, I realized that if a tasktracker happens to die, I could
> > potentially end up with 2 map-tasks from one job running on a single node.
> >
> > Setting mapred.tasktracker.map.tasks.maximum=1 in the JobClient has no
> > effect. It is a tasktracker property and can't be changed per job.
> >
> > Any ideas on how to have 2 jobs running simultaneously ?
> >
> >
> > -Sagar
>


Re: Mapred job parallelism

2009-01-26 Thread jason hadoop
I believe that the scheduler code in 0.19.0 has a framework for this, but I
haven't dug into it in detail yet.

http://hadoop.apache.org/core/docs/r0.19.0/capacity_scheduler.html

From what I gather, you would set up 2 queues, each with guaranteed access to
1/2 of the cluster, then submit your jobs to alternating queues.

This is not ideal, as you have to balance which queue you submit jobs to
in order to ensure that there is some depth in each.
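
As a rough sketch of that setup, assuming the 0.19 Capacity Scheduler
property names from the linked docs (the queue names here are just
illustrative):

```xml
<!-- mapred-site.xml: declare the two queues -->
<property>
  <name>mapred.queue.names</name>
  <value>queueA,queueB</value>
</property>

<!-- capacity-scheduler.xml: guarantee each queue half of the cluster -->
<property>
  <name>mapred.capacity-scheduler.queue.queueA.guaranteed-capacity</name>
  <value>50</value>
</property>
<property>
  <name>mapred.capacity-scheduler.queue.queueB.guaranteed-capacity</name>
  <value>50</value>
</property>
```

Each job is then directed to a queue at submission time, e.g. with
-Dmapred.job.queue.name=queueA on the command line.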


On Mon, Jan 26, 2009 at 1:30 PM, Sagar Naik  wrote:

> Hi Guys,
>
> I was trying to set up a cluster so that two jobs can run simultaneously.
>
> The conf :
> number of nodes : 4(say)
> mapred.tasktracker.map.tasks.maximum=2
>
>
> and in the JobClient
> mapred.map.tasks=4 (# of nodes)
>
>
> I also have a constraint that each job should run only one map-task per
> node.
>
> In short, I created 8 map slots and set the number of mappers to 4,
> so now we have two jobs running simultaneously.
>
> However, I realized that if a tasktracker happens to die, I could
> potentially end up with 2 map-tasks from one job running on a single node.
>
> Setting mapred.tasktracker.map.tasks.maximum=1 in the JobClient has no effect.
> It is a tasktracker property and can't be changed per job.
>
> Any ideas on how to have 2 jobs running simultaneously ?
>
>
> -Sagar


Mapred job parallelism

2009-01-26 Thread Sagar Naik

Hi Guys,

I was trying to set up a cluster so that two jobs can run simultaneously.

The conf :
number of nodes : 4(say)
mapred.tasktracker.map.tasks.maximum=2


and in the JobClient
mapred.map.tasks=4 (# of nodes)


I also have a constraint that each job should run only one map-task per
node.


In short, I created 8 map slots and set the number of mappers to 4,
so now we have two jobs running simultaneously.

However, I realized that if a tasktracker happens to die, I could
potentially end up with 2 map-tasks from one job running on a single node.


Setting mapred.tasktracker.map.tasks.maximum=1 in the JobClient has no
effect. It is a tasktracker property and can't be changed per job.
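
In config terms, the setup above is roughly:

```xml
<!-- mapred-site.xml on every tasktracker: a cluster-wide setting
     that a job cannot override -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
```

with mapred.map.tasks=4 passed per job (e.g. via -D on the command line).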


Any ideas on how to have 2 jobs running simultaneously ?


-Sagar