Not to speak for Radim, but what I'm trying to achieve is performance at least 
as good as 0.20 for all cases.  That is, no regressions.  For something as 
simple as terasort, I don't think that is possible without being able to 
specify the max number of mappers/reducers per node.  As it is, I see slowdowns 
of as much as 2X.  Hopefully I'm wrong and somebody will straighten me out.  But 
if I'm not, adding such a feature won't lead to bad behavior of any kind since 
the default could be set to unlimited and thus have no effect whatsoever.

I should emphasize that I support the goal of greater automation since Hadoop 
has way too many parameters and is so hard to tune.  Just not at the expense of 
performance regressions.

Jeff


We've been against these 'features' since they lead to very bad behaviour across 
the cluster with multiple apps/users etc.

What is your use case, i.e. what are you trying to achieve with this?

thanks,
Arun

On May 3, 2012, at 5:59 AM, Radim Kolar wrote:


If a plugin system for the AM is overkill, something simpler could be done, such as:

maximum number of mappers per node
maximum number of reducers per node

maximum percentage of non-data-local tasks
maximum percentage of rack-local tasks

and set this in job properties.
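
A minimal sketch of how such per-job limits might behave, assuming the defaults are "unlimited" so that a job which sets nothing is unaffected. The property names, the UNLIMITED sentinel, and the helper functions are all hypothetical and made up for illustration; none of them are existing Hadoop settings or APIs, and this is not how the AM schedules tasks today.

```python
# Hypothetical property names; none of these exist in Hadoop.
UNLIMITED = -1

DEFAULTS = {
    "job.max.maps.per.node": UNLIMITED,
    "job.max.reduces.per.node": UNLIMITED,
    "job.max.non.data.local.percent": 100,
    "job.max.rack.local.percent": 100,
}


def job_limits(job_properties):
    """Merge user-supplied job properties over the no-op defaults."""
    unknown = set(job_properties) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown properties: {sorted(unknown)}")
    return {**DEFAULTS, **job_properties}


def can_launch(limits, task_type, running_on_node):
    """Return True if one more task of task_type ("map" or "reduce")
    may start on a node already running running_on_node such tasks."""
    if task_type == "map":
        cap = limits["job.max.maps.per.node"]
    else:
        cap = limits["job.max.reduces.per.node"]
    return cap == UNLIMITED or running_on_node < cap


def within_locality_budget(limits, kind, launched_of_kind, total_launched):
    """Return True if launching one more task of the given locality kind
    ("non-data-local" or "rack-local") keeps its share of all launched
    tasks at or below the configured percentage."""
    if kind == "non-data-local":
        percent = limits["job.max.non.data.local.percent"]
    else:
        percent = limits["job.max.rack.local.percent"]
    # Compare shares after the launch, using integers to avoid rounding.
    return (launched_of_kind + 1) * 100 <= percent * (total_launched + 1)
```

With no properties set, can_launch(job_limits({}), "map", 1000) is still True, which is the "no effect by default" case. A job that sets job.max.maps.per.node to 4 would be refused a fifth mapper on a node, while its reducers stay uncapped.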

