Hi Florian,
The current implementation uses pool_busy_time (expressed in ns) but
experience shows this metric isn't accurate: it shows lower CPU usage
for the entire pool than the sum of the participating LPARs.
Using pool_idle_time (expressed in clock ticks) in contrast is almost
a perfect match.
Thanks for the update! :) So what you're saying is that busy + idle
may not be equal to max?
Not quite, just that the calculations that the plugin did with the
pool_busy_time parameter did not give the expected result. This may be
because the calculations are somehow wrong or because the parameter
itself accounts for something we are not aware of (maybe power saving,
as you suggested).
I suspect the calculations: pool_busy_time and pool_idle_time are
expressed in different units, yet the plugin applies the same
calculation to both...
If so, what happens to the missing CPU
cycles? Would it make sense to keep track of this separately? Something
like missing = max - (idle + busy) could be used, for example.
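To make the idea concrete, here is a minimal sketch of that bookkeeping.
The function name pool_missing_time is made up for illustration and is
not part of the plugin; it also assumes that max, idle and busy have
already been converted to the same unit, which is exactly the open
question in this thread.

```c
#include <assert.h>

/* Hypothetical helper, not taken from the plugin: returns the share of
 * the pool's capacity that is accounted for neither as idle nor as busy.
 * Assumption: all three arguments are expressed in the same unit. */
static double pool_missing_time(double max, double idle, double busy)
{
  assert(max >= 0.0 && idle >= 0.0 && busy >= 0.0);
  return max - (idle + busy);
}
```

With max = 100, idle = 60 and busy = 30 this yields 10. A result close
to zero would suggest that busy + idle does add up to max after all.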
I think I remember something about ticks varying in the time they
consume, due to power-saving facilities built into the CPUs. This would
explain why the (physical) CPU time available to the cluster is measured
in nanoseconds rather than ticks. Also, if there are more and shorter
ticks in the same wall-clock time due to power-saving measures, this
would explain the perceived lower CPU usage when converting the ns back
to ticks using a larger ns per tick constant. So maybe the missing
metric above could be named power_save. What do you think?
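The arithmetic behind that guess can be sketched as follows. The
helper names and the tick lengths used below are invented purely for
illustration; nothing here is measured on real hardware.

```c
#include <assert.h>

/* Convert a nanosecond interval to ticks for a given tick length. */
static double ns_to_ticks(double ns, double ns_per_tick)
{
  assert(ns_per_tick > 0.0);
  return ns / ns_per_tick;
}

/* Ratio between the ticks obtained with an assumed conversion constant
 * and the ticks that actually elapsed. A value below 1.0 means the
 * converted figure under-reports usage. */
static double perceived_fraction(double ns, double actual_ns_per_tick,
    double assumed_ns_per_tick)
{
  return ns_to_ticks(ns, assumed_ns_per_tick)
    / ns_to_ticks(ns, actual_ns_per_tick);
}
```

For example, if ticks really last 1 ns but the conversion assumes 2 ns
per tick, 1000 ns of busy time turn into 500 ticks instead of 1000, so
the pool would appear half as busy as it was.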
Regarding the patch, I'd like to propose one tweak:
-#define NS_TO_TICKS(ns) ((ns) / XINTFRAC)
[...]
+ pool_idle_cpus = (double) (lparstats.pool_idle_time -
lparstats_old.pool_idle_time) / XINTFRAC / (double) ticks;
I'd really like to keep this macro: diff / XINTFRAC / ticks doesn't do
a good job at describing to the reader what's going on. With the macro
this becomes NS_TO_TICKS (diff) / ticks: you can see without looking
at the macro's implementation that diff is converted from nanoseconds
to ticks and then divided by ticks, which results in a ratio.
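Roughly what I have in mind, as a standalone sketch rather than the
actual plugin code. The XINTFRAC value is a placeholder so that the
snippet compiles anywhere (in the plugin it is derived from the
system), and pool_cpus is a hypothetical function name. It also assumes
the counter really is in nanoseconds.

```c
#include <assert.h>

/* Assumption: placeholder conversion factor, only for this sketch. */
#ifndef XINTFRAC
# define XINTFRAC 2.0
#endif

/* The macro from the patch: nanoseconds to ticks. */
#define NS_TO_TICKS(ns) ((ns) / XINTFRAC)

/* Hypothetical helper: ratio of converted counter delta to elapsed
 * ticks, i.e. the number of CPUs the delta corresponds to. */
static double pool_cpus(unsigned long long cur, unsigned long long old,
    unsigned long long ticks)
{
  assert(ticks > 0 && cur >= old);
  return NS_TO_TICKS((double) (cur - old)) / (double) ticks;
}
```

With the placeholder factor, a delta of 4000 over 1000 elapsed ticks
gives 2.0. The point is only that the call site reads as "convert,
then divide" without having to look up the macro.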
I totally agree with you. I removed the macro not because I didn't like it
but because I didn't know how to properly name it: according to
libperfstat.h pool_idle_time is in 'clock ticks' (which is not the same
as physical processor ticks by a factor of XINTFRAC). I didn't want to
name the macro TICKS_TO_TICKS()...
By the way, this seems to indicate that the calculation I used to
convert ns to processor ticks was wrong, which in turn could explain why
the graphs didn't match.
Regards,
Aurélien Reynaud
___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd