Either plugins or configuration options would be possible.  The real
question is what the use case for this is, and whether that use case is big
enough to warrant it being part of core Hadoop.  I have seen a few situations
where this perhaps makes sense, but most of those are because the resource
scheduling is currently very basic (meaning it only knows about RAM and
machine/rack locality, which is really a surrogate for requesting an HDFS
block as a resource).  For example, GPGPU computation, where sharing the GPU
between processes can be complicated, or more commonly where the network is
the bottleneck and running multiple processes on the same box is not optimal
because they will saturate the network.

In both of those cases I would prefer to see resource scheduling updated
rather than trying to work around it by having the AM throw away containers.
But that is just my preference, and it would be a major undertaking.  Throwing
away containers feels like a hack to me, but it is something that can work
right now.  It just depends on what the use case is and whether it is
compelling enough to make that hack a fully supported part of the Hadoop
map/reduce API.
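
To make the workaround concrete, here is a rough sketch of what "throwing
away containers" to enforce a per-node cap could look like.  The class, the
cap constant, and launch() below are made up for illustration only; the YARN
client calls themselves (AMRMClientAsync, Container, releaseAssignedContainer)
are the standard API.

// Sketch only: not existing MR AM code.  Tracks containers per node and
// releases any allocation beyond an assumed per-node cap.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

public class PerNodeCapAllocator {
  private static final int MAX_CONTAINERS_PER_NODE = 2; // assumed cap
  private final Map<NodeId, Integer> perNode = new HashMap<NodeId, Integer>();
  private final AMRMClientAsync<?> amRmClient;

  public PerNodeCapAllocator(AMRMClientAsync<?> amRmClient) {
    this.amRmClient = amRmClient;
  }

  // Call this from the AM's onContainersAllocated() callback.
  public synchronized void handleAllocated(List<Container> containers) {
    for (Container c : containers) {
      NodeId node = c.getNodeId();
      Integer seen = perNode.get(node);
      int current = (seen == null) ? 0 : seen.intValue();
      if (current >= MAX_CONTAINERS_PER_NODE) {
        // Over the cap on this node: hand the container back and hope the
        // scheduler gives us one somewhere else.  This is the "hack" part,
        // since the RM already did the work of allocating it.
        amRmClient.releaseAssignedContainer(c.getId());
      } else {
        perNode.put(node, current + 1);
        launch(c); // the AM's normal container-launch path (not shown)
      }
    }
  }

  private void launch(Container c) {
    // placeholder for launching the task in the container
  }
}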

--Bobby Evans


On 5/3/12 7:59 AM, "Radim Kolar" <h...@filez.com> wrote:

If a plugin system for the AM is overkill, something simpler could be done,
such as:

maximum number of mappers per node
maximum number of reducers per node

maximum percentage of non-data-local tasks
maximum percentage of rack-local tasks

and set these in job properties.
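
A sketch of what setting these in job properties might look like.  The
property names below are made up for this example and do not exist in Hadoop;
they only mirror the limits listed above:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CappedJobSetup {
  public static Job configure() throws IOException {
    Configuration conf = new Configuration();
    // hypothetical keys mirroring the proposed per-node and locality limits
    conf.setInt("mapreduce.job.example.max-maps-per-node", 4);
    conf.setInt("mapreduce.job.example.max-reduces-per-node", 2);
    conf.setFloat("mapreduce.job.example.max-non-local-task-percent", 10f);
    conf.setFloat("mapreduce.job.example.max-rack-local-task-percent", 25f);
    return Job.getInstance(conf, "per-node capped job");
  }
}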
