Hi Rick,

There is one extra piece of information that my patch takes into
account: the selected time range.  Let's say that you are looking at the
monthly data with the web frontend.  The Load Avg numbers will show you
the latest values for the 1, 5 & 15 minute loads, averaged over the
number of nodes in your grid/cluster.  When you compare these numbers to
the rrd graphs to the right of them, you might notice that your cluster
was idle for some part of the month and 100% busy at other times.

For example, let's say that for the first 3 weeks of the month your
cluster was idle and over the last week it was 100% busy.  In this
example, the numbers on the web frontend would probably say 100%, but in
actuality, your cluster utilization was only around 25%.  The patch I
made uses the data stored in the rrds to display the utilization
averaged over the selected time range, and in this case would show 25%.
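
To make the arithmetic concrete, here is a minimal Python sketch.  It is
not the patch itself (which uses rrdtool on the stored rrds); the function
names and the one-sample-per-day data are made up purely for illustration,
and it assumes the load has already been summed over all nodes.

```python
def load_percent(load_sum, cpu_total):
    """Summed load over all nodes as a percentage of the total CPU count."""
    if cpu_total <= 0:
        return None
    return 100.0 * load_sum / cpu_total


def average_utilization(samples):
    """Mean of the percentage samples, skipping missing (None) entries."""
    known = [s for s in samples if s is not None]
    if not known:
        return None
    return sum(known) / len(known)


# Hypothetical 100-CPU cluster, one sample per day over 28 days:
# idle for the first 21 days, fully loaded for the last 7.
cpu_total = 100
daily_load_sums = [0.0] * 21 + [100.0] * 7
daily_percent = [load_percent(load, cpu_total) for load in daily_load_sums]

latest = daily_percent[-1]                    # what the frontend shows today
average = average_utilization(daily_percent)  # averaged over the time range

print("latest: %.0f%%, averaged over range: %.0f%%" % (latest, average))
```

With these made-up samples the latest value is 100% while the average over
the whole range is 25%, which is the difference described above.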

Maybe we could take Ian's suggestion and display only one extra
percentage, based on load1, and call it utilization?

~Jason


On Thu, 2005-12-15 at 17:50 -0500, Rick Mohr wrote:
> If I understand the current behavior correctly, the "Load Avg" for, say, the
> 1-min load is simply the sum of the 1-min loads on all nodes divided by the
> total number of processors on all nodes.
> 
> Personally, I find this value useful because it is a relatively good
> indication of how much of our cluster is being utilized by user-submitted
> jobs.  The 1-min load is (for the most part) roughly equal to the number of
> running user jobs on the nodes.  The averages for the 5-min and 15-min loads,
> however, are not so useful.
> 
> I would vote to make it a config option, but any solution that allows me to
> keep the current behaviour would be fine.
> 
> -- Rick
> 
> --------------------------
> Rick Mohr
> Systems Developer
> Ohio Supercomputer Center
> 
> On Thu, 15 Dec 2005, Jason A. Smith wrote:
> 
> > I was thinking that if some people would prefer the current behavior, it
> > could either be made into an admin config option or somehow user
> > selectable on the web frontend.  A third option, like you said, would be
> > to display both.  Anyone have any thoughts on which is the best choice
> > and how to do it?
> >
> > ~Jason
> >
> >
> > On Wed, 2005-12-14 at 18:22 -0700, Ian Cunningham wrote:
> >> Jason,
> >>
> >> The page should explain that the load averages given are historic. I am
> >> not sure if the average of the 15 min average is meaningful for long
> >> periods of time. Also, it's possible that there should be values given for
> >> both current and historic.
> >>
> >> Thanks for your work,
> >> Ian
> >>
> >> Jason A. Smith wrote:
> >>
> >>> The Avg Load percentages on the ganglia web frontend currently show the
> >>> latest measured values for the grid/cluster.  When looking at historical
> >>> data, these numbers can be misleading when compared to the graphs right
> >>> next to them.  I created a patch which changes this behavior by using
> >>> rrdtool to calculate the average loads over the displayed time range
> >>> instead of the latest value; see attachment.  Any comments, suggestions?
> >>>
> >>> ~Jason
> >>>
> >>
> >
> 
-- 
/------------------------------------------------------------------\
|  Jason A. Smith                          Email:  [EMAIL PROTECTED] |
|  Atlas Computing Facility, Bldg. 510M    Phone:  (631)344-4226   |
|  Brookhaven National Lab, P.O. Box 5000  Fax:    (631)344-7616   |
|  Upton, NY 11973-5000                                            |
\------------------------------------------------------------------/


