Re: Intermediary Data on Fair Scheduler

Mithila Nagendra Thu, 13 Aug 2009 11:50:46 -0700

This helps a lot! Thank you Todd.

Best Regards
Mithila


On Thu, Aug 13, 2009 at 11:40 AM, Todd Lipcon <t...@cloudera.com> wrote:

> On Thu, Aug 13, 2009 at 11:32 AM, Mithila Nagendra <mnage...@asu.edu>
> wrote:
>
> > Hi Todd
> >
> > So when two jobs are assigned to a pool, where one job has 1 map task
> > and 1 reduce task and the other has 5 map and 5 reduce tasks, how will
> > the switch between these jobs take place?
>
>
> The switching happens at the task level - after one of the map tasks from
> the big job has finished, the small job will get its map task executed
> before the rest of the big job's map tasks.
>
>
> >
> >
> > Let's say the scheduler starts with the bigger job and runs 1 map task.
> > When it switches to the shorter job, what does it do with the
> > intermediate data? For instance, in Hadoop On Demand, if we run a search
> > query, where would the search keywords be stored? I assume that if the
> > bigger job is in the middle of a map task, the smaller job will wait for
> > that task to end before the map task for the shorter job is launched.
> >
>
> Intermediate data from the big job will be on the local disk like it always
> is - this isn't anything special about the fair scheduler. Map outputs
> remain in mapred.local.dir until the job is complete.
>
> -Todd
>
>
> On Thu, Aug 13, 2009 at 10:52 AM, Todd Lipcon <t...@cloudera.com> wrote:
>
> > Hi Mithila,
> >
> > I assume you're referring to fair scheduler preemption. In the preemption
> > scenario, tasks are completely killed, not paused. It's not like a
> > preemptive scheduler in your OS where things are "context switched". This
> > is why the preemption is not enabled by default and has tuning parameters
> > that only trigger preemption in certain situations.
> >
> > Hope that helps,
> > -Todd
> >
> > On Thu, Aug 13, 2009 at 10:44 AM, Mithila Nagendra <mnage...@asu.edu>
> > wrote:
> >
> > > Hello All
> > >
> > > When the fair scheduler switches between two jobs, what does it do with
> > > the intermediary data? Does it dump the data/job states onto the disk
> > > (DFS)? Or does it do a context switch (i.e. everything is in memory)? I
> > > was looking at the scheduler for an application I'm working on; any
> > > pointers will be appreciated!
> > >
> > > Thanks!
> > > Mithila Nagendra
> > > Arizona State University
> > >
> >
>
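
The task-level switching described in the thread can be illustrated with a
toy model. This is only a sketch, not the fair scheduler's actual algorithm:
it assumes a single map slot, no preemption, and a made-up fairness rule
(when the slot frees up, launch a task from the job that has had the fewest
tasks launched so far, with ties going to the job submitted first). The
function name schedule_single_slot and the job names are hypothetical.

```python
from collections import deque


def schedule_single_slot(jobs):
    """Toy model of task-level switching on one map slot.

    jobs: list of (job_name, num_map_tasks) in submission order.
    Returns the launch order as a list of (job_name, task_index).

    Assumption (not the real fair scheduler logic): each time the slot
    frees up, the next task comes from the job with the fewest tasks
    launched so far; ties go to the earlier-submitted job. A running
    task is never interrupted, so switching only happens between tasks.
    """
    pending = {name: deque(range(n)) for name, n in jobs}
    launched = {name: 0 for name, _ in jobs}
    order = []

    while any(pending.values()):
        # Only jobs that still have map tasks waiting can be picked.
        candidates = [name for name, _ in jobs if pending[name]]
        # min() returns the first minimum, so submission order breaks ties.
        job = min(candidates, key=lambda name: launched[name])
        order.append((job, pending[job].popleft()))
        launched[job] += 1

    return order


if __name__ == "__main__":
    for job, task in schedule_single_slot([("big", 5), ("small", 1)]):
        print(f"launch map task {task} of job {job}")
```

With a 5-map "big" job submitted ahead of a 1-map "small" job, the model
launches one map task of the big job, then the small job's map task, then
the remaining four map tasks of the big job. Nothing is paused or swapped
out in between: each task runs to completion before the next one starts.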
