Either plugins or configuration options would be possible. The real question is what the use case is, and whether that use case is big enough to warrant being part of core Hadoop. I have seen a few situations where this perhaps makes sense, but most of those exist because resource scheduling is currently very basic (it only knows about RAM and machine/rack locality, which is really a surrogate for requesting an HDFS block as a resource). For example, GPGPU computation, where sharing the GPU between processes can be complicated; or, more commonly, where the network is the bottleneck and running multiple processes on the same box is not optimal because they will saturate the link.
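To show how little a container request can express, here is a minimal sketch using the current YARN AMRMClient API (which postdates parts of this thread); the memory value, hostname, and rack name are arbitrary placeholders. Essentially all you can ask for is an amount of RAM plus node/rack hints, where the node hint stands in for "the HDFS block lives here":

    // Sketch: a container request can only say "this much memory" plus
    // node/rack locality hints; there is no way to ask for a GPU, network
    // bandwidth, or any other resource dimension.
    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.util.Records;

    public class RequestSketch {
      public static ContainerRequest buildRequest() {
        Resource capability = Records.newRecord(Resource.class);
        capability.setMemory(1536);                 // RAM is the only real resource knob

        Priority priority = Records.newRecord(Priority.class);
        priority.setPriority(0);

        return new ContainerRequest(
            capability,
            new String[] { "node17.example.com" },  // node hint (surrogate for an HDFS block location)
            new String[] { "/default-rack" },       // rack hint
            priority);
      }
    }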
In both of those cases I would prefer to see resource scheduling updated rather than trying to work around it by having the AM throw away containers. But that is just my preference, and it would be a major undertaking. Throwing away containers feels like a hack to me, but it is something that can work right now. It just depends on what the use case is and whether it is compelling enough to make that hack a fully supported part of the Hadoop map/reduce API.

--Bobby Evans

On 5/3/12 7:59 AM, "Radim Kolar" <h...@filez.com> wrote:

If a plugin system for the AM is overkill, something simpler could be made, like:
  maximum number of mappers per node
  maximum number of reducers per node
  maximum percentage of non-data-local tasks
  maximum percentage of rack-local tasks
and set these in job properties.
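To make the "throw away containers" hack and a per-node limit of the kind proposed above concrete, here is a rough sketch of what an AM could do with the YARN AMRMClient. The property name job.max.containers.per.node and the per-node bookkeeping are made up purely for illustration; they are not an existing Hadoop feature, and only releaseAssignedContainer itself is a real API call:

    import java.util.List;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.client.api.AMRMClient;

    public class ContainerFilterSketch {
      // Called by the AM on each batch of newly allocated containers.
      public static void filterAllocation(Configuration conf,
          AMRMClient<AMRMClient.ContainerRequest> amRmClient,
          List<Container> newlyAllocated,
          Map<String, Integer> runningPerNode) {
        // Hypothetical job property along the lines suggested above.
        int maxPerNode = conf.getInt("job.max.containers.per.node", Integer.MAX_VALUE);

        for (Container c : newlyAllocated) {
          String host = c.getNodeId().getHost();
          int onNode = runningPerNode.getOrDefault(host, 0);
          if (onNode >= maxPerNode) {
            // The "hack": hand the container straight back to the RM
            // instead of launching anything in it.
            amRmClient.releaseAssignedContainer(c.getId());
          } else {
            runningPerNode.put(host, onNode + 1);
            // ... pass the container to the normal launch path ...
          }
        }
      }
    }

The obvious downside, as noted above, is that the scheduler keeps handing out containers the AM does not want, which wastes allocation cycles compared to teaching the scheduler about the constraint directly.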