Still need to figure out whether a queue can be associated with a TT, i.e. a TT ACL for a queue such that tasks submitted to that queue are only dispatched to the TaskTrackers on the queue's ACL list.
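Below is a minimal sketch of the per-TaskTracker leaf-queue layout I am experimenting with, in the hierarchical format that 0.21 reads from conf/mapred-queues.xml (element names taken from the 0.21 template as I understand it; the queue names and the 'nutch' user are made up for illustration, and queue ACLs must be enabled cluster-wide for the ACL elements to take effect). Note that the stock <acl-submit-job>/<acl-administer-jobs> elements restrict which users may touch a queue, not which TaskTrackers serve it - which is exactly the gap in the question above:

  <?xml version="1.0"?>
  <!-- conf/mapred-queues.xml - a sketch against the Hadoop 0.21
       hierarchical queue format; one leaf queue per provisioned
       TaskTracker. Queue names here are hypothetical. -->
  <queues>
    <queue>
      <name>grid</name>
      <queue>
        <name>tt-node01</name>
        <!-- 'running' while this node's PBS walltime allows it -->
        <state>running</state>
        <!-- ACLs restrict which USERS may submit/administer jobs;
             they do not bind the queue to a particular TaskTracker -->
        <acl-submit-job>nutch</acl-submit-job>
        <acl-administer-jobs>nutch</acl-administer-jobs>
      </queue>
      <queue>
        <name>tt-node02</name>
        <!-- flipped to 'stopped' shortly before this node's walltime
             expires; a refresh (e.g. 'mapred mradmin -refreshQueues'
             on 0.21-era builds - command name may vary) should pick
             the change up without a JT restart -->
        <state>stopped</state>
        <acl-submit-job>nutch</acl-submit-job>
        <acl-administer-jobs>nutch</acl-administer-jobs>
      </queue>
    </queue>
  </queues>

One thing I still need to verify: from my reading of the docs, 'stopped' prevents new job submissions to a queue, while already-accepted jobs may continue to have tasks scheduled, so whether this actually drains a TT at the task level remains to be tested.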
On Mon, Jan 31, 2011 at 10:51 PM, rishi pathak <mailmaverick...@gmail.com> wrote:

> Hi Koji,
>       Thanks for opening the feature request. Right now, for the purpose
> stated earlier, I have upgraded Hadoop to 0.21 and am trying to see
> whether creating individual leaf-level queues for every TaskTracker and
> changing their state to 'stopped' before the walltime expires will do.
> Seems like it will work for now.
>
> P.S. - What credentials are required for commenting on an issue in Jira?
>
> On Mon, Jan 31, 2011 at 10:22 PM, Koji Noguchi <knogu...@yahoo-inc.com> wrote:
>
>> Rishi,
>>
>> > Using exclude list for TT will not help as Koji has already mentioned
>> >
>> It'll help a bit, in the sense that no more tasks are assigned to that
>> TaskTracker once it is excluded.
>>
>> As for TT decommissioning and map output handling, I opened a Jira for
>> further discussion:
>> https://issues.apache.org/jira/browse/MAPREDUCE-2291
>>
>> Koji
>>
>>
>> On 1/29/11 5:37 AM, "rishi pathak" <mailmaverick...@gmail.com> wrote:
>>
>> Hi,
>>     Here is a description of what we are trying to achieve (whether it
>> is possible or not is still not clear):
>> We have large computing clusters used mostly for MPI jobs. We use
>> PBS/Torque and Maui for resource allocation and scheduling.
>> Utilization is very high at most times, except for very small resource
>> pockets of, say, 16 cores for 2-5 hrs. We are trying to establish the
>> feasibility of using these small (but fixed-size) resource pockets for
>> Nutch crawls. Our configuration is:
>>
>> # Hadoop 0.20.2 (packaged with Nutch)
>> # Lustre parallel filesystem for data storage
>> # No HDFS
>>
>> We have the JT running on one of the login nodes at all times.
>> A request for resources (nodes=16, walltime=05 hrs.) is made through
>> the batch system, and TTs are provisioned as part of the job. The
>> problem is, when a job expires, user processes are cleaned up and so
>> the TT gets killed. With that, completed and running map/reduce tasks
>> for the Nutch job are killed and rescheduled. The solutions as we see
>> them:
>>
>> 1. As the filesystem is shared (& persistent), restart tasks on another
>> TT and make the intermediate task data available, i.e. a sort of
>> checkpointing.
>> 2. TT draining - based on a speculative estimate of task completion
>> time, a TT whose walltime is nearing expiry goes into draining mode,
>> i.e. no new tasks are scheduled on that TT.
>>
>> For '1', it is very far-fetched (we are no Hadoop experts).
>> '2' seems to be the more sensible approach.
>>
>> Using an exclude list for TTs will not help, as Koji has already
>> mentioned. We looked into the capacity scheduler but didn't find any
>> pointers. Phil, what version of Hadoop has these hooks in the
>> scheduler?
>>
>> On Sat, Jan 29, 2011 at 3:34 AM, phil young <phil.wills.yo...@gmail.com>
>> wrote:
>>
>> There are some hooks available in the schedulers that could be useful
>> also. I think they were expected to be used to allow you to schedule
>> tasks based on load average on the host, but I'd expect you can
>> customize them for your purpose.
>>
>>
>> On Fri, Jan 28, 2011 at 6:46 AM, Harsh J <qwertyman...@gmail.com> wrote:
>>
>> > Moving discussion to the MapReduce-User list:
>> > mapreduce-user@hadoop.apache.org
>> >
>> > Reply inline:
>> >
>> > On Fri, Jan 28, 2011 at 2:39 PM, rishi pathak
>> > <mailmaverick...@gmail.com> wrote:
>> > > Hi,
>> > >     Is there a way to drain a tasktracker?
>> > > What we require is not to schedule any more map/reduce tasks onto
>> > > a tasktracker (mark it offline), but the tasks already running on
>> > > it should not be affected.
>> >
>> > You could simply shut the TT down. MapReduce was designed with faults
>> > in mind, and thus tasks that are running on a particular TaskTracker
>> > can be re-run elsewhere if they fail. Is this not usable in your
>> > case?
>> >
>> > --
>> > Harsh J
>> > www.harshj.com
>>
>
> --
> ---
> Rishi Pathak
> National PARAM Supercomputing Facility
> C-DAC, Pune, India


--
---
Rishi Pathak
National PARAM Supercomputing Facility
C-DAC, Pune, India