RE: Dynamic changing of slaves

2012-02-22 Thread kaveh
Sounds like what you are looking for is a custom scheduler, along the lines of

<!--<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>-->

Obviously not the FairScheduler itself, but it could give you some idea.
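[Editor's note: a hedged sketch of why uncommenting that property plugs in a custom scheduler. In 0.20.x the JobTracker reads the configured class name and instantiates it by reflection; the demo below reproduces that lookup pattern in plain Java with a stand-in class (DemoScheduler is hypothetical, not a Hadoop type).]

```java
import java.util.Properties;

public class SchedulerLookup {
    // Stand-in for a pluggable scheduler; in real Hadoop this would be a
    // subclass of org.apache.hadoop.mapred.TaskScheduler.
    public static class DemoScheduler {
        @Override public String toString() { return "DemoScheduler"; }
    }

    public static void main(String[] args) throws Exception {
        // The JobTracker reads this property and instantiates the named
        // class reflectively, which is why pointing the value at your own
        // class swaps in a custom scheduler.
        Properties conf = new Properties();
        conf.setProperty("mapred.jobtracker.taskScheduler",
                         SchedulerLookup.DemoScheduler.class.getName());

        Class<?> cls = Class.forName(
                conf.getProperty("mapred.jobtracker.taskScheduler"));
        Object scheduler = cls.getDeclaredConstructor().newInstance();
        System.out.println(scheduler);
    }
}
```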

-Original Message-
From: theta glynisdso...@email.arizona.edu
Sent: Wednesday, February 22, 2012 10:32am
To: core-u...@hadoop.apache.org
Subject: Dynamic changing of slaves


Hi,

I am working on a project which requires a setup as follows:

One master with four slaves. However, when a map-only program is run, the
master dynamically selects the slave to run the map. For example, when the
program is run for the first time, slave 2 is selected to run the map and
reduce programs, and the output is stored on DFS. When the program is run
the second time, slave 3 is selected, and so on.

I am currently using Hadoop 0.20.2 with Ubuntu 11.10.

Any ideas on creating the setup as described above?

Regards

-- 
View this message in context: 
http://old.nabble.com/Dynamic-changing-of-slaves-tp33365922p33365922.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.





Dynamic changing of slaves

2012-02-21 Thread theta

Hi,

I am working on a project which requires a setup as follows:

One master with four slaves. However, when a map-only program is run, the
master dynamically selects the slave to run the map. For example, when the
program is run for the first time, slave 2 is selected to run the map and
reduce programs, and the output is stored on DFS. When the program is run
the second time, slave 3 is selected, and so on.

I am currently using Hadoop 0.20.2 with Ubuntu 11.10.

Any ideas on creating the setup as described above?

Regards

-- 
View this message in context: 
http://old.nabble.com/Dynamic-changing-of-slaves-tp33365922p33365922.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: Dynamic changing of slaves

2012-02-21 Thread Merto Mertek
I think the job configuration does not allow such a setup, but maybe I
missed something.

 Probably I would tackle this problem from the scheduler source. The
default one is JobQueueTaskScheduler, which maintains a FIFO-based queue.
When a tasktracker (your slave) tells the jobtracker that it has some free
slots, the JT, in its heartbeat method, calls the scheduler's assignTasks
method, where tasks are assigned on a locality basis. In other words, the
scheduler tries to find tasks whose input data resides on that tasktracker.
If the scheduler does not find a local map/reduce task to run, it will try
to find a non-local one. This is probably the point where you should do
something with your jobs and wait for the tasktracker's heartbeat. Instead
of waiting for the TT heartbeat, there may be another option to force a
heartbeatResponse even though the TT has not sent a heartbeat, but I am not
aware of it.
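[Editor's note: the selection policy described above, reduced to a self-contained plain-Java sketch. This is not the Hadoop API; a real implementation would subclass TaskScheduler and apply this rotation inside assignTasks() when the JobTracker processes a heartbeat. The tracker names are hypothetical.]

```java
import java.util.Arrays;
import java.util.List;

public class RoundRobinPicker {
    private final List<String> trackers;
    private int next = 0;

    public RoundRobinPicker(List<String> trackers) {
        this.trackers = trackers;
    }

    // Called once per job: ignore data locality and simply rotate
    // through the slaves, so each run lands on the next tasktracker.
    public String pickTrackerForJob() {
        String chosen = trackers.get(next);
        next = (next + 1) % trackers.size();
        return chosen;
    }

    public static void main(String[] args) {
        RoundRobinPicker p = new RoundRobinPicker(
                Arrays.asList("slave1", "slave2", "slave3", "slave4"));
        for (int i = 0; i < 5; i++) {
            System.out.println("job" + i + " -> " + p.pickTrackerForJob());
        }
        // job0 goes to slave1, job1 to slave2, ..., job4 wraps to slave1.
    }
}
```

Note that inside a real assignTasks() you would return the chosen tasktracker's tasks only when the heartbeating tracker matches the rotation, and return an empty list otherwise.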


On 21 February 2012 19:27, theta glynisdso...@email.arizona.edu wrote:


 Hi,

 I am working on a project which requires a setup as follows:

 One master with four slaves. However, when a map-only program is run, the
 master dynamically selects the slave to run the map. For example, when the
 program is run for the first time, slave 2 is selected to run the map and
 reduce programs, and the output is stored on DFS. When the program is run
 the second time, slave 3 is selected, and so on.

 I am currently using Hadoop 0.20.2 with Ubuntu 11.10.

 Any ideas on creating the setup as described above?

 Regards

 --
 View this message in context:
 http://old.nabble.com/Dynamic-changing-of-slaves-tp33365922p33365922.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.




Re: Dynamic changing of slaves

2012-02-21 Thread Jamack, Peter
Yeah, I'm not sure how you can actually do it, as I haven't done it
before, but from a logical perspective, you'd probably have to make a lot
of configuration changes, write some fairly complicated M/R code and
coordination/rules-engine logic, and maybe even change how the heartbeat
and scheduler operate to do what you want.
 There might be an easier way, but I'm not sure.

Peter J

On 2/21/12 3:16 PM, Merto Mertek masmer...@gmail.com wrote:

I think the job configuration does not allow such a setup, but maybe I
missed something.

 Probably I would tackle this problem from the scheduler source. The
default one is JobQueueTaskScheduler, which maintains a FIFO-based queue.
When a tasktracker (your slave) tells the jobtracker that it has some free
slots, the JT, in its heartbeat method, calls the scheduler's assignTasks
method, where tasks are assigned on a locality basis. In other words, the
scheduler tries to find tasks whose input data resides on that tasktracker.
If the scheduler does not find a local map/reduce task to run, it will try
to find a non-local one. This is probably the point where you should do
something with your jobs and wait for the tasktracker's heartbeat. Instead
of waiting for the TT heartbeat, there may be another option to force a
heartbeatResponse even though the TT has not sent a heartbeat, but I am not
aware of it.


On 21 February 2012 19:27, theta glynisdso...@email.arizona.edu wrote:


 Hi,

 I am working on a project which requires a setup as follows:

 One master with four slaves. However, when a map-only program is run, the
 master dynamically selects the slave to run the map. For example, when the
 program is run for the first time, slave 2 is selected to run the map and
 reduce programs, and the output is stored on DFS. When the program is run
 the second time, slave 3 is selected, and so on.

 I am currently using Hadoop 0.20.2 with Ubuntu 11.10.

 Any ideas on creating the setup as described above?

 Regards

 --
 View this message in context:
 
http://old.nabble.com/Dynamic-changing-of-slaves-tp33365922p33365922.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.