I'm interested in tracking metrics in Hadoop by user, such as CPU time and storage used, both per machine and across the cluster. I've looked at the Fair and Capacity Schedulers. They seem designed to ensure fair use of the cluster between users' jobs, not to report what resources each user actually consumed. Likewise, other tools such as Ganglia appear to report metrics by machine, not by job. I've also looked through the common/metrics tickets in JIRA, and there does not seem to be any open work that addresses this requirement.
Have I missed something? Has anyone been able to do this? Is there a way to capture metrics by job (which could then be correlated back to a user)? If not, is there any current or planned work in the project that addresses this requirement?

Kind regards,
Steve Watt