Re: [collectd] [PATCH] lpar plugin: use pool_idle_time to account for cpu pool usage

2010-10-07 Thread Aurélien Reynaud

On Wednesday, 06 October 2010 at 14:37 +0200, Florian Forster wrote:
 It'd be awesome if you could test my changes and tell me whether the
 plugin now works as expected or if further changes are required. I'll
 merge the branch as soon as you give me the thumbs-up ;)
 

Hi Florian,


I'd be grateful if you could postpone the merge of this plugin for a few
days. I recently spotted something which might be worth investigating.

On IBM pSeries machines there is a feature called Capacity on Demand
(CoD): in short, IBM logically disables some of the installed physical
processors, then charges you to unlock the extra processing capacity if
you happen to need it.

Using the phys_cpus_pool number could be misleading in this case, as you
might think by looking at the graphs that you still have plenty of power
left, when in reality some of the processors you're considering are
disabled and unusable.

I am looking for a way to take this into account.

I am wondering whether pool_max_time does just this, i.e. indicates the
activated/online/available/paid-for processing capacity rather than the
physically present CPUs...

By the way this could also be the cause of the difference between the
results when calculating from pool_busy_time vs from pool_idle_time!


I will come back to you shortly, probably next week, as soon as I
complete a few tests...
 

Regards,
Aurélien Reynaud



___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


Re: [collectd] [PATCH] lpar plugin: use pool_idle_time to account for cpu pool usage

2010-10-06 Thread Florian Forster
Hi Aurélien,

I just added the changes you sent me to the LPAR plugin, i.e. pool
busy time is now calculated from pool idle time and not the other way
around.

It'd be awesome if you could test my changes and tell me whether the
plugin now works as expected or if further changes are required. I'll
merge the branch as soon as you give me the thumbs-up ;)

Regards,
—octo
-- 
Florian octo Forster
Hacker in training
GnuPG: 0x0C705A15
http://octo.it/




Re: [collectd] [PATCH] lpar plugin: use pool_idle_time to account for cpu pool usage

2010-09-26 Thread Aurélien Reynaud
Hi Florian,


  The current implementation uses pool_busy_time (expressed in ns) but
  experience shows this metric isn't accurate: It shows lower cpu usage
  for the entire pool than the sum of the participating lpars.
  Using pool_idle_time (expressed in clock ticks) in contrast is almost
  a perfect match.
 
 thanks for the update! :) So what you're saying is that busy + idle
 may not be equal to max?

Not quite, just that the calculations the plugin did with the
pool_busy_time parameter did not give the expected result. This may be
because the calculations are somehow wrong, or because the parameter
itself accounts for something we are not aware of (maybe power saving,
as you suggested).

I suspect the calculations: pool_busy_time and pool_idle_time are
expressed in different units, yet the plugin processed both the same
way...

  If so, what happens to the missing CPU
 cycles? Would it make sense to keep track of this separately? Something
 like missing = max - (idle + busy) could be used, for example.
 
 I think I remember something about ticks varying in the time they
 consume, due to power-saving facilities built into the CPUs. This would
 explain why the (physical) CPU time available to the cluster is measured
 in nanoseconds rather than ticks. Also, if there are more and shorter
 ticks in the same wallclock time due to power-saving measures, this
 would explain the perceived lower CPU usage when converting the ns back
 to ticks using a larger ns per tick constant. So maybe the missing
 metric above could be named power_save. What do you think?
 
 Regarding the patch, I'd like to propose one tweak:
 
  -#define NS_TO_TICKS(ns) ((ns) / XINTFRAC)
  [...]
  +   pool_idle_cpus = (double) (lparstats.pool_idle_time - 
  lparstats_old.pool_idle_time) / XINTFRAC / (double) ticks;
 
 I'd really like to keep this macro: diff / XINTFRAC / ticks doesn't do
 a good job at describing to the reader what's going on. With the macro
 this becomes NS_TO_TICKS (diff) / ticks: you can see without looking
 at the macro's implementation that diff is converted from nanoseconds
 to ticks and then divided by ticks, which results in a ratio.

I totally agree with you. I removed the macro not because I didn't like
it but because I didn't know how to properly name it: according to
libperfstat.h, pool_idle_time is in 'clock ticks' (which differ from
physical processor ticks by a factor of XINTFRAC). I didn't want to
name the macro TICKS_TO_TICKS()...
By the way, this seems to indicate that the calculation I used to
convert ns to processor ticks was wrong, which in turn could explain why
the graphs didn't match.


Regards,

Aurélien Reynaud


