Still need to figure out whether a queue can be associated with a TT, i.e. a TT ACL for a queue such that tasks submitted to that queue are only dispatched to the TaskTrackers on the queue's ACL list.
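Below is a minimal sketch of the per-TaskTracker leaf-queue layout I am experimenting with, in the hierarchical format that 0.21 reads from conf/mapred-queues.xml (element names taken from the 0.21 template as I understand it; the queue names and the 'nutch' user are made up for illustration, and queue ACLs must be enabled cluster-wide for the ACL elements to take effect). Note that the stock <acl-submit-job>/<acl-administer-jobs> elements restrict which users may touch a queue, not which TaskTrackers serve it - which is exactly the gap in the question above:

  <?xml version="1.0"?>
  <!-- conf/mapred-queues.xml - a sketch against the Hadoop 0.21
       hierarchical queue format; one leaf queue per provisioned
       TaskTracker. Queue names here are hypothetical. -->
  <queues>
    <queue>
      <name>grid</name>
      <queue>
        <name>tt-node01</name>
        <!-- 'running' while this node's PBS walltime allows it -->
        <state>running</state>
        <!-- ACLs restrict which USERS may submit/administer jobs;
             they do not bind the queue to a particular TaskTracker -->
        <acl-submit-job>nutch</acl-submit-job>
        <acl-administer-jobs>nutch</acl-administer-jobs>
      </queue>
      <queue>
        <name>tt-node02</name>
        <!-- flipped to 'stopped' shortly before this node's walltime
             expires; a refresh (e.g. 'mapred mradmin -refreshQueues'
             on 0.21-era builds - command name may vary) should pick
             the change up without a JT restart -->
        <state>stopped</state>
        <acl-submit-job>nutch</acl-submit-job>
        <acl-administer-jobs>nutch</acl-administer-jobs>
      </queue>
    </queue>
  </queues>

One thing I still need to verify: from my reading of the docs, 'stopped' prevents new job submissions to a queue, while already-accepted jobs may continue to have tasks scheduled, so whether this actually drains a TT at the task level remains to be tested.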
On Mon, Jan 31, 2011 at 10:51 PM, rishi pathak <mailmaverick...@gmail.com> wrote:

> Hi Koji,
>       Thanks for opening the feature request. Right now, for the purpose
> stated earlier, I have upgraded Hadoop to 0.21 and am trying to see
> whether creating individual leaf-level queues for every TaskTracker and
> changing their state to 'stopped' before the walltime expires will do.
> Seems like it will work for now.
>
> P.S. - What credentials are required for commenting on an issue in Jira?
>
> On Mon, Jan 31, 2011 at 10:22 PM, Koji Noguchi <knogu...@yahoo-inc.com> wrote:
>
>> Rishi,
>>
>> > Using exclude list for TT will not help as Koji has already mentioned
>> >
>> It'll help a bit, in the sense that no more tasks are assigned to that
>> TaskTracker once it is excluded.
>>
>> As for TT decommissioning and map output handling, I opened a Jira for
>> further discussion:
>> https://issues.apache.org/jira/browse/MAPREDUCE-2291
>>
>> Koji
>>
>>
>> On 1/29/11 5:37 AM, "rishi pathak" <mailmaverick...@gmail.com> wrote:
>>
>> Hi,
>>     Here is a description of what we are trying to achieve (whether it
>> is possible or not is still not clear):
>> We have large computing clusters used mostly for MPI jobs. We use
>> PBS/Torque and Maui for resource allocation and scheduling.
>> Utilization is very high at most times, except for very small resource
>> pockets of, say, 16 cores for 2-5 hrs. We are trying to establish the
>> feasibility of using these small (but fixed-size) resource pockets for
>> Nutch crawls. Our configuration is:
>>
>> # Hadoop 0.20.2 (packaged with Nutch)
>> # Lustre parallel filesystem for data storage
>> # No HDFS
>>
>> We have the JT running on one of the login nodes at all times.
>> A request for resources (nodes=16, walltime=05 hrs.) is made through
>> the batch system, and TTs are provisioned as part of the job. The
>> problem is, when a job expires, user processes are cleaned up and so
>> the TT gets killed. With that, completed and running map/reduce tasks
>> for the Nutch job are killed and rescheduled. The solutions as we see
>> them:
>>
>> 1. As the filesystem is shared (& persistent), restart tasks on another
>> TT and make the intermediate task data available, i.e. a sort of
>> checkpointing.
>> 2. TT draining - based on a speculative estimate of task completion
>> time, a TT whose walltime is nearing expiry goes into draining mode,
>> i.e. no new tasks are scheduled on that TT.
>>
>> For '1', it is very far-fetched (we are no Hadoop experts).
>> '2' seems to be the more sensible approach.
>>
>> Using an exclude list for TTs will not help, as Koji has already
>> mentioned. We looked into the capacity scheduler but didn't find any
>> pointers. Phil, what version of Hadoop has these hooks in the
>> scheduler?
>>
>> On Sat, Jan 29, 2011 at 3:34 AM, phil young <phil.wills.yo...@gmail.com>
>> wrote:
>>
>> There are some hooks available in the schedulers that could be useful
>> also. I think they were expected to be used to allow you to schedule
>> tasks based on load average on the host, but I'd expect you can
>> customize them for your purpose.
>>
>>
>> On Fri, Jan 28, 2011 at 6:46 AM, Harsh J <qwertyman...@gmail.com> wrote:
>>
>> > Moving discussion to the MapReduce-User list:
>> > mapreduce-user@hadoop.apache.org
>> >
>> > Reply inline:
>> >
>> > On Fri, Jan 28, 2011 at 2:39 PM, rishi pathak
>> > <mailmaverick...@gmail.com> wrote:
>> > > Hi,
>> > >     Is there a way to drain a tasktracker?
>> > > What we require is not to schedule any more map/reduce tasks onto
>> > > a tasktracker (mark it offline), but the tasks already running on
>> > > it should not be affected.
>> >
>> > You could simply shut the TT down. MapReduce was designed with faults
>> > in mind, and thus tasks that are running on a particular TaskTracker
>> > can be re-run elsewhere if they fail. Is this not usable in your
>> > case?
>> >
>> > --
>> > Harsh J
>> > www.harshj.com
>>
>
> --
> ---
> Rishi Pathak
> National PARAM Supercomputing Facility
> C-DAC, Pune, India


--
---
Rishi Pathak
National PARAM Supercomputing Facility
C-DAC, Pune, India