RE: Dynamic changing of slaves
Sounds like what you're looking for is a custom scheduler, along the lines of:

    <property>
      <name>mapred.jobtracker.taskScheduler</name>
      <value>org.apache.hadoop.mapred.FairScheduler</value>
    </property>

Obviously not the FairScheduler itself, but it could give you some ideas.

-----Original Message-----
From: theta glynisdso...@email.arizona.edu
Sent: Wednesday, February 22, 2012 10:32am
To: core-u...@hadoop.apache.org
Subject: Dynamic changing of slaves
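To make the suggestion concrete: swapping in your own scheduler would mean pointing that same property at your own class in mapred-site.xml. This is only a sketch; com.example.MyRoundRobinScheduler is a placeholder name for a class you would have to write yourself (as a TaskScheduler subclass), not anything that ships with Hadoop:

```xml
<!-- mapred-site.xml: tell the JobTracker to load your scheduler
     instead of the default JobQueueTaskScheduler.
     com.example.MyRoundRobinScheduler is a hypothetical class name. -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>com.example.MyRoundRobinScheduler</value>
</property>
```

The class named here must be on the JobTracker's classpath when it starts, since the scheduler is instantiated at JT startup.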
Dynamic changing of slaves
Hi,

I am working on a project which requires a setup as follows: one master with four slaves. However, when a map-only program is run, the master dynamically selects the slave to run the map. For example, when the program is run for the first time, slave 2 is selected to run the map and reduce programs, and the output is stored on DFS. When the program is run the second time, slave 3 is selected, and so on.

I am currently using Hadoop 0.20.2 with Ubuntu 11.10. Any ideas on creating the setup described above?

Regards

--
View this message in context: http://old.nabble.com/Dynamic-changing-of-slaves-tp33365922p33365922.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: Dynamic changing of slaves
I think the job configuration alone does not allow such a setup, though maybe I missed something. I would probably tackle this problem from the scheduler source. The default one is JobQueueTaskScheduler, which maintains a FIFO-based queue. When a tasktracker (your slave) tells the jobtracker that it has some free slots, the JT, in its heartbeat method, calls the scheduler's assignTasks method, where tasks are assigned on a locality basis. In other words, the scheduler tries to find tasks whose data resides on that tasktracker; if it cannot find a local map/reduce task to run, it will try to find a non-local one. That is probably the point where you should do something with your jobs and wait for the tasktracker's heartbeat. Instead of waiting for the TT heartbeat, there may be another option to force a heartbeatResponse even though the TT has not sent a heartbeat, but I am not aware of one.

On 21 February 2012 19:27, theta glynisdso...@email.arizona.edu wrote:
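As a rough sketch of the policy being discussed, the core of a round-robin scheduler is just a cursor over the known tasktrackers that advances after each assignment. The snippet below is plain Java with made-up names (RoundRobinSelector is not a Hadoop class) so it can stand alone; a real implementation would extend org.apache.hadoop.mapred.TaskScheduler and consult this kind of state inside assignTasks to decide whether the tracker that just heartbeated should receive the map task:

```java
import java.util.List;

// Illustrates the round-robin selection policy only -- not actual
// Hadoop scheduler code. In a real TaskScheduler subclass, assignTasks
// would compare the heartbeating tracker's name against next() and
// only hand out the job's map task when they match.
public class RoundRobinSelector {
    private final List<String> trackers; // e.g. slave host names
    private int cursor = 0;              // which slave gets the next job

    public RoundRobinSelector(List<String> trackers) {
        this.trackers = trackers;
    }

    // Returns the tracker that should run the next job's map task,
    // then advances the cursor so the following job lands on the
    // next slave in the list, wrapping around at the end.
    public synchronized String next() {
        String chosen = trackers.get(cursor);
        cursor = (cursor + 1) % trackers.size();
        return chosen;
    }
}
```

One caveat with this approach: assignTasks is only invoked when a tracker heartbeats, so if the chosen slave is slow to heartbeat, the job simply waits, which matches the "wait for the tasktracker's heartbeat" point above.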
Re: Dynamic changing of slaves
Yeah, I'm not sure how you can actually do it, as I haven't done it before, but from a logical perspective you'd probably have to make a lot of configuration changes and maybe even write some complicated M/R code and coordination/rules-engine logic, and change how the heartbeat-driven scheduling operates, to do what you want. There might be an easier way; I'm not sure though.

Peter J

On 2/21/12 3:16 PM, Merto Mertek masmer...@gmail.com wrote: