[collectd] collectd compilation on solaris 8

2010-04-26 Thread Aurélien Reynaud
Hello,


I just joined the list as I am in the process of installing and testing
collectd on one of my clients’ machines. Some of those machines are
running Solaris 8 and I’ve had a hard time getting collectd to compile
on this old OS.

The problems I encountered:

- Solaris 8 doesn’t have setenv(), which is required for the exec
plugin. You must use putenv(), which is roughly equivalent and available
on systems where setenv() isn't yet.

- Solaris 8 has strtok_r(), but does not declare it by default. You must
at compile time define __EXTENSIONS__, or _REENTRANT or _POSIX_SOURCE >=
199506L. Running ./configure --enable-standards may solves this but
raises a ton of other problems...

- Solaris 8 doesn't have , which holds the declaration of
uint64_t-like types. Those are declared in 


Following is an email with a patch for those problems against today's
git repo.


Best regards,

Reynaud Aurélien 




___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


[collectd] [PATCH] Solaris 8 fixes

2010-04-26 Thread Aurélien Reynaud
---
 configure.in   |6 +-
 src/exec.c |   12 
 src/libcollectdclient/client.h |4 +++-
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/configure.in b/configure.in
index 3224526..35809ac 100644
--- a/configure.in
+++ b/configure.in
@@ -87,6 +87,7 @@ fi
 if test "x$ac_system" = "xSolaris"
 then
AC_DEFINE(_POSIX_PTHREAD_SEMANTICS, 1, [Define to enforce POSIX thread 
semantics under Solaris.])
+   AC_DEFINE(__EXTENSIONS__, 1, [Define to allow access to functions 
widely used but not in the standard Solaris environment.])
 fi
 
 # Where to install .pc files.
@@ -477,7 +478,7 @@ AC_HEADER_TIME
 # Checks for library functions.
 #
 AC_PROG_GCC_TRADITIONAL
-AC_CHECK_FUNCS(gettimeofday select strdup strtol getaddrinfo getnameinfo 
strchr memcpy strstr strcmp strncmp strncpy strlen strncasecmp strcasecmp 
openlog closelog sysconf if_indextoname)
+AC_CHECK_FUNCS(gettimeofday select strdup strtol getaddrinfo getnameinfo 
strchr memcpy strstr strcmp strncmp strncpy strlen strncasecmp strcasecmp 
openlog closelog sysconf if_indextoname setenv)
 
 AC_FUNC_STRERROR_R
 
@@ -715,6 +716,7 @@ if test "x$fp_layout_type" = "xunknown"; then
 #if HAVE_STDINT_H
 # include 
 #endif
+#include 
 #if HAVE_STDBOOL_H
 # include 
 #endif
@@ -759,6 +761,7 @@ if test "x$fp_layout_type" = "xunknown"; then
 #if HAVE_STDINT_H
 # include 
 #endif
+#include 
 #if HAVE_STDBOOL_H
 # include 
 #endif
@@ -811,6 +814,7 @@ if test "x$fp_layout_type" = "xunknown"; then
 #if HAVE_STDINT_H
 # include 
 #endif
+#include 
 #if HAVE_STDBOOL_H
 # include 
 #endif
diff --git a/src/exec.c b/src/exec.c
index 681b94d..c64f949 100644
--- a/src/exec.c
+++ b/src/exec.c
@@ -269,11 +269,23 @@ static void set_environment (void) /* {{{ */
 {
   char buffer[1024];
 
+#ifdef HAVE_SETENV
   ssnprintf (buffer, sizeof (buffer), "%i", interval_g);
   setenv ("COLLECTD_INTERVAL", buffer, /* overwrite = */ 1);
 
   ssnprintf (buffer, sizeof (buffer), "%s", hostname_g);
   setenv ("COLLECTD_HOSTNAME", buffer, /* overwrite = */ 1);
+#else
+  ssnprintf (buffer, sizeof (buffer), "COLLECTD_INTERVAL=%i", interval_g);
+  putenv (buffer);
+
+  ssnprintf (buffer, sizeof (buffer), "COLLECTD_HOSTNAME=%s", hostname_g);
+  putenv (buffer);
+#endif
+
+#ifdef HAVE_SETENV
+#else
+#endif
 } /* }}} void set_environment */
 
 __attribute__((noreturn))
diff --git a/src/libcollectdclient/client.h b/src/libcollectdclient/client.h
index b0d092d..11e7b13 100644
--- a/src/libcollectdclient/client.h
+++ b/src/libcollectdclient/client.h
@@ -27,7 +27,9 @@
 /*
  * Includes (for data types)
  */
-#include 
+#if HAVE_STDINT_H
+# include 
+#endif
 #include 
 #include 
 
-- 
1.7.0.4




___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


[collectd] Hi, and solaris 8 compilation problems

2010-04-28 Thread Aurélien Reynaud
Hello,


it seems my first email did not make it to the list, so I repost.

I recently discovered collectd and decided to use it to gather system
data statistics for the machines I admin as a job.

There are some old Solaris 8 boxes, in the process of being replaced by
partitioned IBM power systems running AIX 5.3 and 6.1. There are some
linux boxes too for good measure... So I need something both lighweight
(in order not to disturb production) and very portable. collectd seems
to be the right tool for the job.

Alas, I’ve had some trouble getting it to compile on good old Solaris 8.

The problems I encountered:

- Solaris 8 doesn’t have setenv(), which is required for the exec
plugin. You must use putenv(), which is roughly equivalent and available
on systems where setenv() isn't yet.

- Solaris 8 has strtok_r(), but does not declare it by default. You must
at compile time define __EXTENSIONS__, or _REENTRANT or _POSIX_SOURCE >=
199506L. Running ./configure --enable-standards may solves this but
raises a ton of other problems...

- Solaris 8 doesn't have , which holds the declaration of
uint64_t-like types. Those are declared in 

Following is an email with a patch for those problems.


Best regards,

Aurélien Reynaud


___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


Re: [collectd] Hi, and solaris 8 compilation problems

2010-04-29 Thread Aurélien Reynaud
Le jeudi 29 avril 2010 à 10:06 +0200, Florian Forster a écrit :
> > - Solaris 8 has strtok_r(), but does not declare it by default. You
> > must at compile time define __EXTENSIONS__, or _REENTRANT or
> > _POSIX_SOURCE >= 199506L. Running ./configure --enable-standards may
> > solves this but raises a ton of other problems...
> 
> Actually, there is a configure check specifically for Solaris / this
> problem. It checks whether or not "_REENTRANT" is required for
> strtok_r(3). If you look for "strtok_r" in "configure.in" you should
> easily find it.
> 
> I'd rather fix that check then setting "__EXTENSIONS__" when on Solaris.
> Could you take a look at the file "config.log" generated by the
> configure script and try to find out why that check failed?

Florian,


you are right, this check exists but doesn't work as expected, and
that's why I missed it.

The test program produces a warning about strtok_r being undeclared, but
the check succeeds nevertheless because warnings don't cause the
compilation to fail (gcc returns 0).

The problem is that -Werror is not in the CFLAGS at the time of the
test, but it is when collectd is compiled. So the check succeeds but the
actual compilation aborts.

I am working on improving the check and will provide a new patch
shortly...


Regards,

Aurélien Reynaud




___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


Re: [collectd] [PATCH] Fix strtok_r check in configure.in

2010-05-01 Thread Aurélien Reynaud
Le samedi 01 mai 2010 à 10:01 +0200, Florian Forster a écrit :
> Hi,
> 
> sorry for re-doing your work, but I felt it'd be best to mimic the
> behavior of src/Makefile.am as closely as possible. I've therefore added
> a check for GCC before the block and set "-Wall -Werror" if the compiler
> is GCC. Hope this works as intended..

Hello,

I can understand this, no problem. But did you have a look at my latest
patch, which removes the need for feedling with gcc options altogether ?

After Peter's remarks I thought a little about it, and it seems (to me
at least) to be a much saner approach...



Regards,

Aurélien Reynaud


___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


Re: [collectd] swap plugin improvements

2010-05-25 Thread Aurélien REYNAUD
Hello,

> -Message d'origine-
> 
> I would say you want to get numbers that match swap -l.  (the last two
> numbers are blocks allocated and blocks free of just swap.)
> 

To be clear on the subject, the solaris swap command can be run with one of
the two arguments -l or -s.

Run with -s it behaves like the current swap plugin implementation
(libkstat) and gives results about the solaris swapfs, which consists of the
swap space provided by the swap devices (disks) plus a variable amount of
RAM: 

r...@uv8801xr:/root> swap -s
total: 7468592k bytes allocated + 6127480k reserved = 13596072k used,
7109144k available

This can be useful for advanced sysadmins who have a good understanding of
the solaris internals, as it implies being familiar with virtual memory, or
the behavior of tmpfs (RAM/swap based filesystem). I personally find the
resulting graphs confusing, as they show the usage of something which varies
in time and is neither swap size nor any obvious combination of swap and RAM
sizes.


Running swap -l is more straightforward. It just gives a list of disk-based
swap devices and their usage, which I think is what most admins expect:

r...@uv8801xr:/root> swap -l
swapfile dev  swaplo blocks   free
/dev/vx/dsk/swapvol 242,5  16 24576704 23138752


I suppose the solaris swap utility internally uses either the libkstat or
the swapctl method according to the option given in argument.



Regards,

Aurelien Reynaud


___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


Re: [collectd] Solaris vmstat info available in any current plugins

2010-06-01 Thread Aurélien REYNAUD

Hello,


> On Thu, May 27, 2010 at 10:59:43AM -0400, Cleveland Mark-RGKW63 wrote:
> > Do any of the current plugins available on Solaris provide similar
> > information to what you can get from vmstat? In particular I am
> > interested in process information w/ respect to number that are
> > running/blocked/waiting (r b w) and paging info, page in/page out info
> > (pi po)
> 
> Global page-in/page-out counters could be added to the "vmem" plugin,
> which is Linux-only at the moment.

I have been working on adding global paging I/O stats lately. I have a patch 
ready for Solaris and AIX, but I intended to test it on my machines a little 
more before going public...

Anyway I am adding these features to the swap plugin, as there is already some 
code there for the linux case, using the swap_io type from types.db :
...
swap_submit ("in", swap_in, DS_TYPE_DERIVE);
swap_submit ("out", swap_out, DS_TYPE_DERIVE);
...
Where the value are taken from /proc/vmstat.

I don't mind moving my code to the vmem plugin, but I feel we should clearly 
define what metric goes to which plugin, as the question may be raised again 
shortly.

The heart of the problem is that on unix, swap is a subset of the virtual 
memory subsystem. For example, paging is an action decided by the vm under 
memory pressure, but it does not necessarily involve swap space: pages 
containing code are simply freed and reread from the ondisk executable if 
needed. On Solaris, under some circumstances we can even page out to memory... 
All this depends upon the vm subsystem implementation which can vastly differ 
between unices.

So I propose this:
- use the vmem plugin to monitor the specifics of each vm subsystem. We cannot 
assume here that two unices will have any metrics in common. Experts will know 
what metrics are meaningful on which OS
-  use the swap plugin to monitor swap devices: their usage and their i/o. They 
are common to all unices and any admin needs to know if they are appropriately 
sized and intensively accessed or not.

Implementation wise this means for example that on solaris, paging 
(executables, anonymous, global, etc...) will be in vmem, but a subset 
(anonymous) will be used in swap to report swap devices i/o.

Regards,
Aurelien Reynaud



___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


[collectd] vmem plugin: AIX/Solaris support

2010-06-08 Thread Aurélien Reynaud
Hello,


this patch against git branch 4.10 adds support for AIX and Solaris to
the vmem plugin.

Many metrics are collected but, I tried to follow the current linux
implementation behavior: only vm sizes, swap in/out, memory in/out and
memory faults are active by default. Everything else is reported only if
the verbose option is set in the plugin configuration.

Worth of note:
- I created a new vmpage_counter type, as I needed something like
vmpage_number, but the collected data was of type counter instead of
gauge. I am not very fluent in rrd-speak, so it may not have been
necessary...

- the Solaris code includes the metrics currently collected by the
Solaris (kstat) part of the swap plugin (only expressed in pages rather
than bytes). It should pave the way for removal of the kstat code from
the swap plugin for 5.0, the solaris swap case being taken care of by
the new swapctl code.


Regards,

Aurélien Reynaud


___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


Re: [collectd] Trouble compiling collectd 4.10 on AIX 6.1 TL4

2010-06-21 Thread Aurélien REYNAUD
Hello,


I got it to compile under AIX 6.1 TL2 without trouble using:

./configure --disable-dns LDFLAGS="-Wl,-brtl"

You will possibly need some patches that may not have been merged yet.
Search the list for:

[PATCH] configure.in: htonll check depends on linker
[PATCH 1/2] processes.c: fix uninitialized variableswarnings
[PATCH 2/2] snmp.c: compilation fixes for AIX
[PATCH] Fix errno thread-safety under AIX


Regards,
Aurelien Reynaud


> -Message d'origine-
> De : collectd-boun...@verplant.org [mailto:collectd-boun...@verplant.org]
> De la part de Héctor Rivas Gándara
> Envoyé : lundi 21 juin 2010 11:33
> À : collectd@verplant.org
> Objet : [collectd] Trouble compiling collectd 4.10 on AIX 6.1 TL4
> 
> Hello,
> 
> I am trying to compile collectd on AIX, and I am having some trouble
> with link process. Versions:
>  * collectd 4.10
>  * AIX 6100-04-01-0944
>  * gcc-4.2.0-3 (from Linux ToolBox CD)
> 
> I used this configure line:
> OBJECT_MODE=64 ./configure LDFLAGS=-Wl,-brtl CFLAGS=-maix64
> --disable-dns --disable-nagios
> 
> And make fails with an undefined symbol error (see below). Any idea or
> clue?
> 
> Making all in libltdl
> make  all-am
> Target "all-am" is up to date.
> Making all in src
> make  all-recursive
> Making all in libcollectdclient
> make  all-am
> Target "all-am" is up to date.
> Making all in liboconfig
> make  all-am
> Target "all-am" is up to date.
> /bin/sh ../libtool --tag=CC--mode=link
> /cgx1/SoftwareRepository/source/collectd-4.10.0/compile gcc -Wall
> -Werror -maix64   -Wl,-brtl -o collectd-nagios collectd-nagios.o  -lm
> libcollectdclient/libcollectdclient.la
> libtool: link: /cgx1/SoftwareRepository/source/collectd-4.10.0/compile
> gcc -Wall -Werror -maix64 -Wl,-brtl -o .libs/collectd-nagios
> collectd-nagios.o  -lm -Llibcollectdclient/.libs -lcollectdclient
> -Wl,-blibpath:/usr/local/stow/collectd-
> 4.10.0/lib:/opt/freeware/lib/gcc/powerpc-ibm-
> aix6.1.0.0/4.2.0/ppc64:/opt/freeware/lib/gcc/powerpc-ibm-
> aix6.1.0.0/4.2.0:/opt/freeware/lib/gcc/powerpc-ibm-
> aix6.1.0.0/4.2.0/../../..:/usr/lib:/lib
> ld: 0711-317 ERROR: Undefined symbol: .lcc_connect
> ld: 0711-317 ERROR: Undefined symbol: .lcc_string_to_identifier
> ld: 0711-317 ERROR: Undefined symbol: .lcc_strerror
> ld: 0711-317 ERROR: Undefined symbol: .lcc_disconnect
> ld: 0711-317 ERROR: Undefined symbol: .lcc_getval
> ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more
> information.
> collect2: ld returned 8 exit status
> make: 1254-004 The error code from the last command is 1.
> 
> 
> Stop.
> make: 1254-004 The error code from the last command is 1.
> 
> 
> Stop.
> make: 1254-004 The error code from the last command is 2.
> 
> 
> Stop.
> make: 1254-004 The error code from the last command is 1.
> 
> 
> Stop.
> 
> 
> --
> Atentamente
> Héctor Rivas
> 
> ___
> collectd mailing list
> collectd@verplant.org
> http://mailman.verplant.org/listinfo/collectd


___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


[collectd] [PATCH] New plugin - lpar

2010-08-10 Thread Aurélien Reynaud
Hello,


here is a patch against 4.10 adding a new plugin "lpar".

LPAR stands for Logical PARtitions, which is the virtualization solution
for IBM high-end power systems running AIX.

The standard cpu plugin shows cpu usage as a percentage of each cpu
available to the system, but in an LPAR the number of cpus and their
apparent power can vary according to the load of every LPAR sharing the
same hardware and to the policy set by the admin.

This new plugin allows to monitor real (physical) CPU usage of the
virtualized system, as well as some other metrics specific to IBM's
partitioning solution.


Regards,

Aurélien Reynaud

-- 
Love is like PI - natural, irrational, endless, and very important.
diff --git a/configure.in b/configure.in
index 75ac706..a928302 100644
--- a/configure.in
+++ b/configure.in
@@ -4292,6 +4292,7 @@ AC_PLUGIN([java],[$with_java], [Embed the Java Virtual Machine])
 AC_PLUGIN([libvirt], [$plugin_libvirt],[Virtual machine statistics])
 AC_PLUGIN([load],[$plugin_load],   [System load])
 AC_PLUGIN([logfile], [yes],[File logging plugin])
+AC_PLUGIN([lpar],[$with_perfstat], [AIX logical partitions statistics])
 AC_PLUGIN([madwifi], [$have_linux_wireless_h], [Madwifi wireless statistics])
 AC_PLUGIN([match_empty_counter], [yes],[The empty counter match])
 AC_PLUGIN([match_hashed], [yes],   [The hashed match])
@@ -4609,6 +4610,7 @@ Configuration:
 libvirt . . . . . . . $enable_libvirt
 load  . . . . . . . . $enable_load
 logfile . . . . . . . $enable_logfile
+lpar... . . . . . . . $enable_lpar
 madwifi . . . . . . . $enable_madwifi
 match_empty_counter . $enable_match_empty_counter
 match_hashed  . . . . $enable_match_hashed
diff --git a/src/Makefile.am b/src/Makefile.am
index c6b0538..3bf4184 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -499,6 +499,15 @@ collectd_LDADD += "-dlopen" logfile.la
 collectd_DEPENDENCIES += logfile.la
 endif
 
+if BUILD_PLUGIN_LPAR
+pkglib_LTLIBRARIES += lpar.la
+lpar_la_SOURCES = lpar.c
+lpar_la_LDFLAGS = -module -avoid-version
+collectd_LDADD += "-dlopen" lpar.la
+collectd_DEPENDENCIES += lpar.la
+lpar_la_LIBADD = -lperfstat
+endif
+
 if BUILD_PLUGIN_MADWIFI
 pkglib_LTLIBRARIES += madwifi.la
 madwifi_la_SOURCES = madwifi.c madwifi.h
diff --git a/src/lpar.c b/src/lpar.c
new file mode 100644
index 000..cf9f94b
--- /dev/null
+++ b/src/lpar.c
@@ -0,0 +1,241 @@
+/**
+ * collectd - src/lpar.c
+ * Copyright (C) 2010  Aurélien Reynaud
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; only version 2 of the License is applicable.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA  02110-1301 USA
+ *
+ * Authors:
+ *   Aurelien Reynaud 
+ **/
+
+#include "collectd.h"
+#include "common.h"
+#include "plugin.h"
+#include 
+#include 
+#include 
+#include 
+
+#ifndef XINTFRAC
+# define XINTFRAC ((double)(_system_configuration.Xint) / \
+   (double)(_system_configuration.Xfrac))
+#endif
+
+/* Max length of the type instance string */
+#define TYPE_INST_LEN (sizeof("lpar--total") + 2*sizeof(int) + 1)
+
+static const char *config_keys[] =
+{
+  "CpuPoolStats"
+};
+static int config_keys_num = STATIC_ARRAY_SIZE (config_keys);
+static int pool_stats = 0;
+
+/* As an LPAR can be moved transparently across physical systems
+ * through Live Partition Mobility (LPM), and the resources we are
+ * monitoring are tied to the underlying hardware, we need to keep
+ * track on which physical server we are currently on. This is done
+ * through the plugin instance which holds the chassis' serial.
+ */
+static char plugin_inst[SYS_NMLN];
+
+static u_longlong_t last_time_base;
+static u_longlong_t last_pcpu_user,
+last_pcpu_sys,
+last_pcpu_idle,
+last_pcpu_wait;
+static u_longlong_t last_pool_idle_time = 0;
+static u_longlong_t last_idle_donated_purr = 0,
+last_busy_donated_purr = 0,
+last_busy_stolen_purr = 0,
+last_idle_stolen_purr = 0;
+static int donate_flag = 0;
+
+
+/* Save the current values for the next iteration */
+static void save_last_values (perfstat_partition_total_t *lparstats)
+{
+	last_time_base = lparstats->timebase_last;
+
+	last_pcpu_user = lparstats->puser;
+	last_pcpu_sys  = l

Re: [collectd] [PATCH] New plugin - lpar

2010-08-19 Thread Aurélien Reynaud
plugin
> instance. If the partition is moved to another system, the physical
> ID you're using changes, and this seems to be on purpose. I'd
> however expect that you'd look for something that *doesn't* change
> to identify the partition. Something like:
>   hostname = "lpar_pool-%x", pool_id
>   plugin  = "lpar"
>   plugin_instance = "partition-%x", lpar_id / "global"
> What am I missing?

You're right, there is a secret purpose to this. ;-)
I would like to graph the total cpu usage of the chassis itself, by
adding up the individual metrics of each participating LPAR. But there
is no static list of LPARs since they can be moved across chassis'. So I
need to know which rrd's are to be considered when graphing a given
chassis.

Another possibility would be a config option like "ReportByChassis =
true" which would tell the plugin to use the chassis's serial instead of
the hostname. So the rrd layout would not be:

lpar_hostname/lpar-chassis_serial/cpu.rrd

but:

chassis_serial/lpar-lpar_hostname/cpu.rrd

> 
>   * Why not use the name included in the struct,
> (perfstat_partition_total_t *)->name?

This is the name of the LPAR as it is known by the hypervisor, ie
basically a label for a group of hardware resources. It does not
necessarily match the hostname of the AIX instance it is running,
although most sysadmins prefer when that is the case, for consistency's
sake.

 
> 
>   * What's this code doing?:
> > +   dlt_pit = lparstats.pool_idle_time - 
> > last_pool_idle_time;
> > +   total = (double)lparstats.phys_cpus_pool;
> > +   idle  = (double)dlt_pit / XINTFRAC / 
> > (double)delta_time_base;
> Why don't you use the "pool_busy_time" member?

The code sample at
http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.prftools/doc/prftools/prftools07.htm
uses lparstats.pool_idle_time. I chose to stick with it and calculate
busy as total - idle, rather than trying on my own to convert
nanoseconds to physical cpus given the (crystal clear) API definitions: 

pool_idle_time: Number of clock tics a processor in the shared pool was
idle.
pool_busy_time: Summation of busy (non-idle) time accumulated across all
partitions in the pool (nano seconds).

After all, who am I to argue with IBM coders ?



Regards,

Aurélien Reynaud



___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


Re: [collectd] [PATCH] New plugin - lpar

2010-08-19 Thread Aurélien Reynaud
n
> instance. If the partition is moved to another system, the physical
> ID you're using changes, and this seems to be on purpose. I'd
> however expect that you'd look for something that *doesn't* change
> to identify the partition. Something like:
>   hostname = "lpar_pool-%x", pool_id
>   plugin  = "lpar"
>   plugin_instance = "partition-%x", lpar_id / "global"
> What am I missing?

You're right, there is a secret purpose to this. ;-)
I would like to graph the total cpu usage of the chassis itself, by
adding up the individual metrics of each participating LPAR. But there
is no static list of LPARs since they can be moved across chassis'. So I
need to know which rrd's are to be considered when graphing a given
chassis.

Another possibility would be a config option like "ReportByChassis =
true" which would tell the plugin to use the chassis's serial instead of
the hostname. So the rrd layout would not be:

lpar_hostname/lpar-chassis_serial/cpu.rrd

but:

chassis_serial/lpar-lpar_hostname/cpu.rrd

> 
>   * Why not use the name included in the struct,
> (perfstat_partition_total_t *)->name?

This is the name of the LPAR as it is known by the hypervisor, ie
basically a label for a group of hardware resources. It does not
necessarily match the hostname of the AIX instance it is running,
although most sysadmins prefer when that is the case, for consistency's
sake.

> 
>   * What's this code doing?:
> > +   dlt_pit = lparstats.pool_idle_time - 
> > last_pool_idle_time;
> > +   total = (double)lparstats.phys_cpus_pool;
> > +   idle  = (double)dlt_pit / XINTFRAC / 
> > (double)delta_time_base;
> Why don't you use the "pool_busy_time" member?

The code sample at
http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.prftools/doc/prftools/prftools07.htm
uses lparstats.pool_idle_time. I chose to stick with it and calculate
busy as total - idle, rather than trying on my own to convert
nanoseconds to physical cpus given the (crystal clear) API definitions: 

pool_idle_time: Number of clock tics a processor in the shared pool was
idle.
pool_busy_time: Summation of busy (non-idle) time accumulated across all
partitions in the pool (nano seconds).

After all, who am I to argue with IBM coders ?



Regards,

Aurélien Reynaud




___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


[collectd] lpar plugin - sample graph

2010-08-22 Thread Aurélien Reynaud

Hello,

here is a sample graph, which may help understand what I try to achieve
with this plugin.


Regards,

Aurélien Reynaud


<>___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


[collectd] [PATCH] lpar plugin: new attempt

2010-09-01 Thread Aurélien Reynaud

Hello Florian,


here is a new version of my lpar plugin. I tried to address the
shortcomings of the previous attempt:

- Minimum and maximum proc capacity are gone, being static values
- The plugin now uses the cpu type for every value, so there is no need
anymore for the lpar_cpu type
- This also means there is no need anymore to compute rates in the
plugin, so the code is IMHO much more elegant
- There is a config option "ReportBySerial", as described in my previous
email
- We now use pool_busy_time directly instead of computing it from total
and idle

The patch is against the current 4.10 branch, rather than against
ar/lpar, because it is more of a complete rewrite than just fixes. I
could provide a patch against ar/lpar however if you prefer so.



Regards,

Aurélien Reynaud

diff --git a/configure.in b/configure.in
index 75ac706..a928302 100644
--- a/configure.in
+++ b/configure.in
@@ -4292,6 +4292,7 @@ AC_PLUGIN([java],[$with_java], [Embed the Java Virtual Machine])
 AC_PLUGIN([libvirt], [$plugin_libvirt],[Virtual machine statistics])
 AC_PLUGIN([load],[$plugin_load],   [System load])
 AC_PLUGIN([logfile], [yes],[File logging plugin])
+AC_PLUGIN([lpar],[$with_perfstat], [AIX logical partitions statistics])
 AC_PLUGIN([madwifi], [$have_linux_wireless_h], [Madwifi wireless statistics])
 AC_PLUGIN([match_empty_counter], [yes],[The empty counter match])
 AC_PLUGIN([match_hashed], [yes],   [The hashed match])
@@ -4609,6 +4610,7 @@ Configuration:
 libvirt . . . . . . . $enable_libvirt
 load  . . . . . . . . $enable_load
 logfile . . . . . . . $enable_logfile
+lpar... . . . . . . . $enable_lpar
 madwifi . . . . . . . $enable_madwifi
 match_empty_counter . $enable_match_empty_counter
 match_hashed  . . . . $enable_match_hashed
diff --git a/src/Makefile.am b/src/Makefile.am
index c6b0538..3bf4184 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -499,6 +499,15 @@ collectd_LDADD += "-dlopen" logfile.la
 collectd_DEPENDENCIES += logfile.la
 endif
 
+if BUILD_PLUGIN_LPAR
+pkglib_LTLIBRARIES += lpar.la
+lpar_la_SOURCES = lpar.c
+lpar_la_LDFLAGS = -module -avoid-version
+collectd_LDADD += "-dlopen" lpar.la
+collectd_DEPENDENCIES += lpar.la
+lpar_la_LIBADD = -lperfstat
+endif
+
 if BUILD_PLUGIN_MADWIFI
 pkglib_LTLIBRARIES += madwifi.la
 madwifi_la_SOURCES = madwifi.c madwifi.h
diff --git a/src/lpar.c b/src/lpar.c
new file mode 100755
index 000..2267e03
--- /dev/null
+++ b/src/lpar.c
@@ -0,0 +1,202 @@
+/**
+ * collectd - src/lpar.c
+ * Copyright (C) 2010  Aurélien Reynaud
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; only version 2 of the License is applicable.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA  02110-1301 USA
+ *
+ * Authors:
+ *   Aurelien Reynaud 
+ **/
+
+#include "collectd.h"
+#include "common.h"
+#include "plugin.h"
+#include 
+#include 
+#include 
+
+#ifndef XINTFRAC
+# include 
+# define XINTFRAC ((double)(_system_configuration.Xint) / \
+   (double)(_system_configuration.Xfrac))
+#endif
+#define HTIC2SEC(x)	((double)x * XINTFRAC / 10.0)
+
+/* Max length of the type instance string */
+#define TYPE_INST_LEN (sizeof("pool--total") + 2*sizeof(int) + 1)
+
+static const char *config_keys[] =
+{
+  "CpuPoolStats",
+  "ReportBySerial"
+};
+static int config_keys_num = STATIC_ARRAY_SIZE (config_keys);
+static int pool_stats = 0,
+   report_by_serial = 0;
+
+static u_longlong_t last_time_base;
+static u_longlong_t ent_counter;
+static int donate_flag = 0;
+
+
+static int lpar_config (const char *key, const char *value)
+{
+	if (strcasecmp ("CpuPoolStats", key) == 0)
+	{
+		if (IS_TRUE (value))
+			pool_stats = 1;
+	}
+	else if (strcasecmp ("ReportBySerial", key) == 0)
+	{
+		if (IS_TRUE (value))
+			report_by_serial = 1;
+	}
+	else
+	{
+		return (-1);
+	}
+
+	return (0);
+} /* int lpar_config */
+
+static int lpar_init (void)
+{
+	perfstat_partition_total_t lparstats;
+
+	/* Retrieve the initial metrics */
+	if (!perfstat_partition_total (NULL, &lparstats,
+	   sizeof (perfstat_partition_total_t), 1))
+	{
+		ERROR ("lpar plugin: perfstat_partition_total failed.");
+		return (-1);
+	}
+
+	if (!lparstats.type.b.shared_enabled && lparstats.type.b.donate_

Re: [collectd] [PATCH] New plugin - lpar

2010-09-07 Thread Aurélien Reynaud
Hi Florian,



I saw how you modified my code and I find it indeed much more elegant to
directly use raw counters as they are calculated and reported by the
system.

> Many people expect the CPU usage to be in percent. You can easily
> calculate that in the front-end as
> 
>   percent busy = 100.0 * busy / (busy + idle + )

However I beg to differ here. I think what most people want is to know
what fraction of the available processing power is being used at a given
moment. In the standard case a percentage fits perfectly, with 100%
usage meaning "the processor I am considering is fully used". It is
however implicit that we are considering the processing power of ONE
processor. This is what the standard cpu plugin already does, and this
is the whole raison d'être of the lpar plugin.

The processing power of an LPAR is not 1 nor even an integer number of
processors, so a percentage won't do: 30% usage of 2.1 CPU is not the
same as 30% usage of 0.3 CPU...

What I am saying here is that we cannot just compute a ratio with the
raw counters, we need to have the final result expressed in CPUs. And
for this we must have one metric of which we know the value both in CPUs
and in counters, so that we can compute a ratio in the frontend and
scale the graphs accordingly.

The only metric I can think of is entitled processor capacity. It can be
reported directly as a gauge, and we can compute the corresponding
processor ticks in the plugin.

> 
> It makes sense with your explanation above. I'd track it in the way
> described above, though, i.e. in terms of "processor ticks not available
> to the partition" rather than in it's absolute form. Does that make
> sense to you?

As I tried to explain above, assuming we report entitlement as a gauge,
we'll need to have it reported somehow as counters also.

Instead of reporting it directly, we could assume :

entitled = syst + user + wait + idle + unav

... but this is not always true. As "uncapped" LPARs can consume more
than they are entitled to, this sum cannot be considered a constant.

We could directly report cpu-entitled as a counter (calculated as
ent_proc_cap * time_diff in the plugin as you already did) but IMHO
there would be little value-added: its only justification would be to
help calculate a ratio together with the "entitled" gauge. Why not
report the ratio directly then?

What I propose instead is to report percent-entitled and entitled as
gauges. Then we could show in the graphs:

- entitled(CPUs)
- consumed(CPUs) = entitled(CPUs) * percent-entitled / 100

(which in itself is already very useful) and then assuming 
  consumed(ticks) = syst(ticks) + user(ticks) + wait(ticks)
+ idle(ticks)
go into further detail with

- syst(CPUs) = syst(ticks) * consumed(CPUs) / consumed(ticks)
- user(CPUs) = user(ticks) * consumed(CPUs) / consumed(ticks)
- wait(CPUs) = wait(ticks) * consumed(CPUs) / consumed(ticks)
- idle(CPUs) = idle(ticks) * consumed(CPUs) / consumed(ticks)
- unav(CPUs) = max ( 0, entitled(CPUs) - consumed(CPUs))

I will send you patches against ar/lpar implementing this shortly (I
need some time to code, compile and test on production machines).


Please tell me if this suits you or any suggestions you may have.


Regards,

Aurélien Reynaud


___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


Re: [collectd] [PATCH] New plugin - lpar

2010-09-07 Thread Aurélien Reynaud
Hello Florian,


> Yeah, since the LPAR documentation itself always talks in terms of
> "virtual CPUs" we might as well adopt that terminology in the plugin.
> 
> Currently I tend towards the solution I think you originally
> implemented: Introduce a "vcpu_gauge" (or simply "vcpu") type and track
> usage in terms of virtual CPUs. I.e. if the partition spends 123% of a
> virtual CPU executing user code, store 1.23 in the "user" value.

That's fine with me. I will try to provide a patch against ar/lpar which
reverts to the original implementation while retaining the various
changes you made wrt coding style.


Regards,

Aurélien Reynaud




___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


[collectd] [PATCH] lpar plugin update

2010-09-09 Thread Aurélien Reynaud
Hello,


here is a new patch against ar/lpar with the following features :

- get back to the original implementation with gauges only. A new type
"vcpu" is created (it was "lpar_pcpu" in the original)

- I tried to keep as much as possible of your changes, but some have
been reverted (the init function is back) because it was simpler for me
to port my previous code rather than adapt it to the current form. Feel
free to change them again

- the "consumed" metric might seem superfluous at first sight as it
could be calculated in the frontend in the general case. But I thought
it might come in handy when dealing with dedicated partitions, where
donated and stolen values are no easy concepts. Not everyone wants to
dig into the code and the APIs to find out what they mean and whether
they should be added to or substracted from other values...



As a side note, one  of the changes you introduced was better checking
of the return status from perfstat_partition_total() using errno. This
reminded me that under AIX errno is by default unsafe to use in a
multithreaded environment (which collectd is). I posted a fix ("Fix
errno thread-safety under AIX") on Sat, 19 Jun 2010, which if I am not
mistaken has not been merged yet.


Best regards,

Aurélien Reynaud
diff --git a/src/lpar.c b/src/lpar.c
index be4738b..4f7f444 100644
--- a/src/lpar.c
+++ b/src/lpar.c
@@ -16,7 +16,7 @@
  * 51 Franklin St, Fifth Floor, Boston, MA  02110-1301 USA
  *
  * Authors:
- *   Aurelien Reynaud 
+ *   Aurélien Reynaud 
  **/
 
 #include "collectd.h"
@@ -27,6 +27,13 @@
 #include 
 #include 
 
+/* XINTFRAC was defined in libperfstat.h somewhere between AIX 5.3 and 6.1 */
+#ifndef XINTFRAC
+# include 
+# define XINTFRAC ((double)(_system_configuration.Xint) / \
+   (double)(_system_configuration.Xfrac))
+#endif
+
 static const char *config_keys[] =
 {
   "CpuPoolStats",
@@ -36,6 +43,45 @@ static int config_keys_num = STATIC_ARRAY_SIZE (config_keys);
 
 static _Bool pool_stats = 0;
 static _Bool report_by_serial = 0;
+static _Bool donate_flag = 0;
+static char serial[SYS_NMLN];
+
+static u_longlong_t time_old;
+static u_longlong_t user_old,
+syst_old,
+idle_old,
+wait_old;
+static u_longlong_t pool_busy_time_old,
+pool_max_time_old;
+static u_longlong_t idle_donated_old,
+busy_donated_old,
+busy_stolen_old,
+idle_stolen_old;
+
+
+static void save_last_values (perfstat_partition_total_t *lparstats)
+{
+	time_old = lparstats->timebase_last;
+
+	user_old = lparstats->puser;
+	syst_old = lparstats->psys;
+	idle_old = lparstats->pidle;
+	wait_old = lparstats->pwait;
+
+	if (donate_flag)
+	{
+		idle_donated_old = lparstats->idle_donated_purr;
+		busy_donated_old = lparstats->busy_donated_purr;
+		busy_stolen_old  = lparstats->busy_stolen_purr;
+		idle_stolen_old  = lparstats->idle_stolen_purr;
+	}
+
+	if (pool_stats)
+	{
+		pool_busy_time_old = lparstats->pool_busy_time;
+		pool_max_time_old  = lparstats->pool_max_time;
+	}
+} /* void save_last_values */
 
 static int lpar_config (const char *key, const char *value)
 {
@@ -61,33 +107,54 @@ static int lpar_config (const char *key, const char *value)
 	return (0);
 } /* int lpar_config */
 
-static void lpar_submit (const char *type_instance, counter_t value)
+static int lpar_init (void)
+{
+	perfstat_partition_total_t lparstats;
+	int status;
+
+	/* Retrieve the initial metrics. Returns the number of structures filled. */
+	status = perfstat_partition_total (/* name = */ NULL, /* (must be NULL) */
+			&lparstats, sizeof (perfstat_partition_total_t),
+			/* number = */ 1 /* (must be 1) */);
+	if (status != 1)
+	{
+		char errbuf[1024];
+		ERROR ("lpar plugin: perfstat_partition_total failed: %s (%i)",
+sstrerror (errno, errbuf, sizeof (errbuf)),
+status);
+		return (-1);
+	}
+
+	if (!lparstats.type.b.shared_enabled && lparstats.type.b.donate_enabled)
+	{
+		donate_flag = 1;
+	}
+
+	if (pool_stats && !lparstats.type.b.pool_util_authority)
+	{
+		WARNING ("lpar plugin: This partition does not have pool authority. "
+"Disabling CPU pool statistics collection.");
+		pool_stats = 0;
+	}
+
+	/* Save the initial data */
+	save_last_values (&lparstats);
+
+	return (0);
+} /* int lpar_init */
+
+static void lpar_submit (const char *type_instance, double value)
 {
 	value_t values[1];
 	value_list_t vl = VALUE_LIST_INIT;
 
-	/* Although it appears as a double, value is really a (scaled) counter,
-	   expressed in CPU x seconds. At high collection rates (< 1 min), its
-	   integer part is very small and the resulting graphs get blocky. We regain
-	   some precision by applying a x100 factor before casting it to a counter,
-	   turning the final value into CPU units instead of CPUs. */
-	values

Re: [collectd] [PATCH] lpar plugin update

2010-09-11 Thread Aurélien Reynaud
Hi Florian,


> thanks for the update :) I've pushed the changes to the "ar/lpar"
> branch. [...] makes a lot more sense now that I understand what's going on ;)

No problem, I've been throught this myself! Your will to thoroughly
understand every part of the code is IMHO the main reason behind
collectd's excellent quality.

> > - the "consumed" metric might seem superfluous at first sight […]
> >   But I thought it might come in handy when dealing with dedicated
> >   partitions, where donated and stolen values are no easy concepts.
> 
> Would it make sense to activate this metric only if the partition is a
> dedicated partition and donations have been enabled? Likewise, would it make 
> sense to submit "entitled" capacity only if the
> partition is a shared partition?

I'm not sure. I have been thinking about this since my last post, and
here is what I finally came out with:

'consumed' is a generic concept, materializing the hardware processing
power really allocated to the partition by the hypervisor. 

Be it a shared or dedicated partition, comparing this value to the
hard/soft limits you configured (entitlement, capping, donation) is
possibly what admins care the most about. Many will be satisfied with
just having the 'entitled' and 'consumed' metrics.

When you see things this way, knowing which part of these two main
metrics are in fact spent in io, system, user, stolen from other
partitions, donated to them, or just idling could appear as mere
details, useful but not essential.

So maybe we could collect only entitled and consumed by default, and
provide some 'Details = true/false' option to enable the collection of
the different processor states.


Does it make sense to you?

> > I posted a fix ("Fix errno thread-safety under AIX") [...] I applied
> the fix to the collectd-4.9 branch and will merge it to master
> eventually.

Thanks.

> > +   ssnprintf (typinst, sizeof (typinst), "pool-%X-total", 
> > lparstats.pool_id);
> > +   lpar_submit (typinst, (double) pool_max_ns / XINTFRAC / 
> > (double) ticks);
> 
> I'd prefer to account "busy" and "used" (non-busy) rather than "busy"
> and "total". Do you see any problem with changing that?

None. I can provide a patch, unless you want to code it yourself. I just
find a little confusing to have busy/used as metrics. Maybe busy/idle or
used/unused would be more straightforward?

> 
> > +   save_last_values (&lparstats);
> 
> I think it might be easier to keep a (module) global
> "perfstat_partition_total_t" around and simply do
> 
>   memcpy (&lparstats_old, lparstats_new, sizeof (lparstats_old));
> 
> in the "save values" function. What do you think?

Fine with me. Just need to make sure no struct members are pointers to
other structs. We would have to recursively duplicate them or document
that some members of the copy are not accessible. IMHO the
implementation requiring the fewest lines of code will be the best
here...


Regards,

Aurélien Reynaud




___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


Re: [collectd] [PATCH] lpar plugin update

2010-09-11 Thread Aurélien Reynaud

Hello Florian,

> yeah, and now I'm craving for a POWER7-based system and am beginning to
> think that 10,000 EUR for a server is "cheap" and it's all your fault! ;)

Maybe you could get in touch with IBM ? They might be willing to sponsor
your work... ;-)
/sarcastic mode off

> No, thank you for figuring this out :) I didn't find anything useful on
> "_THREAD_SAFE", by the way, other than something along the lines of
> "your thread provider should set this". So I'm assuming that an equally
> well solution would be to include  before . Just a
> guess, though.

I can't say much more. As you said the documentation is scarce. I had to
dig into AIX's header files to find about it, and to be honest I don't
really want to dig further into this as my itch is now scratched.

> Oh, "used" was a typo. I meant "idle", of course. The reason for this
> (and to some degree, the "consumed" / "entitled" business above) is that
> I prefer each tick to be accounted for *once*. So the busy ticks would
> be included in both, the "busy" metric and the "total" metric. When both
> are equally available (or trivially computable) I prefer used/unused
> over used/total.

It make sense to me, can't but agree with your approach..

> I'll probably implement this tonight, unless you beat me to it. It
> should basically be one subtraction and a changed string .. ;)

> So maybe we could collect only entitled and consumed by default, and
> > provide some 'Details = true/false' option to enable the collection
of
> > the different processor states.
> 
> Would be okay for me. Power to the user ;)

I'll be out of office until sept 20 (being at IBM's, learning about AIX
clusters) so I won't have access to my test machines for a moment. If
you are willing to make the changes I will test them and provide
feedback wrt compilation etc as soon as I return. Otherwise I will code
them myself and post them after some testing.


Regards,

Aurélien Reynaud


___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


[collectd] lpar plugin: ready for inclusion ?

2010-09-23 Thread Aurélien Reynaud
Hello Florian,


with this latest patch, I think the lpar plugin is now ready for
inclusion as far as I am concerned, functionnally speaking.

It compiles and runs flawlessly on my production machines, and does
everything I intended it to.

I am of course willing to tweak it some more if you have comments wrt
coding style or configuration handling.


Regards,
Aurélien Reynaud


___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


Re: [collectd] [PATCH] lpar plugin: use pool_idle_time to account for cpu pool usage

2010-09-26 Thread Aurélien Reynaud
Hi Florian,


> > The current implementation uses pool_busy_time (expressed in ns) but
> > experience shows this metric isn't accurate: It shows lower cpu usage
> > for the entire pool than the sum of the participating lpars.
> > Using pool_idle_time (expressed in clock ticks) in contrast is almost
> > a perfect match.
> 
> thanks for the update! :) So what you're saying is that "busy + idle"
> may not be equal to "max"?

Not quite, just that the calculations that the plugin did with the
pool_busy_time parameter did not give the expected result. This may be
because the calculations are somehow wrong or because the parameter
itself accounts for something we are not aware of (maybe power saving,
as you suggested).

I suspect the calculations, as pool_busy_time and pool_idle_time are
expressed in different units, and though the calculations are the
same...

>  If so, What happens to the missing CPU
> cycles? Would it make sense to keep track of this separately? Something
> like "missing = max - (idle + busy)" could be used, for example.
> 
> I think I remember something about ticks varying in the time they
> consume, due to power-saving facilities built into the CPUs. This would
> explain why the (physical) CPU time available to the cluster is measured
> in nanoseconds rather than ticks. Also, if there are more and shorter
> ticks in the same wallclock time due to power-saving measures, this
> would explain the perceived lower CPU usage when converting the ns back
> to ticks using a larger "ns per tick" constant. So maybe the "missing"
> metric above could be named "power_save". What do you think?
> 
> Regarding the patch, I'd like to propose one tweak:
> 
> > -#define NS_TO_TICKS(ns) ((ns) / XINTFRAC)
> > [...]
> > +   pool_idle_cpus = (double) (lparstats.pool_idle_time - 
> > lparstats_old.pool_idle_time) / XINTFRAC / (double) ticks;
> 
> I'd really like to keep this macro: "diff / XINTFRAC / ticks" doesn't do
> a good job at describing to the reader what's going on. With the macro
> this becomes "NS_TO_TICKS (diff) / ticks": you can see without looking
> at the macro's implementation that "diff" is converted from nanoseconds
> to ticks and then divided by ticks, which results in a ratio.

I totally agree with you. I removed I macro not because I didn't like it
but because I didn't know how to properly name it: according to
libperfstat.h pool_idle_time is in 'clock ticks' (which is not the same
as physical processor ticks by a factor of XINTFRAC). I didn't want to
name the macro TICKS_TO_TICKS()...
By the way, this seems to indicate that the calculation I used to
convert ns to processor ticks was wrong, which in turn could explain why
the graphs didn't match.


Regards,

Aurélien Reynaud



___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


Re: [collectd] [PATCH] lpar plugin: use pool_idle_time to account for cpu pool usage

2010-10-07 Thread Aurélien Reynaud

Le mercredi 06 octobre 2010 à 14:37 +0200, Florian Forster a écrit :
> It'd be awesome if you could test my changes and tell me whether the
> plugin now works as expected or if further changes are required. I'll
> merge the branch as soon as you give me the thumbs-up ;)
> 

Hi Florian,


I'd be grateful if you could postpone the merge of this plugin for a few
days. I recently spotted something which might be worth investigating.

On IBM pSeries machines there is a "feature" called Capacity on Demand
(CoD), which in short consists in IBM logically disabling some of the
installed physical processors, and then charging you for unlocking the
extra processing capacity if you happen to need it.

Using the phys_cpus_pool number could be misleading in this case, as you
might think by looking at the graphs that you still have plenty of power
left, when in reality some of the processors you're considering are
disabled and unusable.

I am looking for a way to take this into account.

I am wondering whether pool_max_time doesn't do just this, ie indicate
the activated/online/available/paid processing power instead of the
physically present cpus...

By the way this could also be the cause of the difference between the
results when calculating from pool_busy_time vs from pool_idle_time!


I will come back to you shortly, probably next week, as soon as I
complete a few tests...
 

Regards,
Aurélien Reynaud



___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


Re: [collectd] swap plugin improvements

2010-10-07 Thread Aurélien Reynaud
Hello Florian,


Le mercredi 06 octobre 2010 à 11:43 +0200, Florian Forster a écrit :
> 
> sorry for the very delayed reply. Fortunately I've kept track of this
> patch in the version 5.0 wiki page, otherwise I would have forgotten
> about it.
> Okay, if I understand you correctly, then "-l" reports the swap usage
> from the device's perspective while "-s" uses the point of view from
> virtual memory.

Exactly.

> I think either view has its advantages, so I think the best way to go is
> to let the user chose what to see. I can't really think if a clever name
> for the config option though. What do you think about "ReportPhysical" /
> "ReportVirtual"? (Setting both to true will result in both views being
> collected.)

In my original posting, I suggested moving the "-s" (virtual memory) to
the vmem plugin which is currently linux-only. Don't you think this
would make sense?

> 
> With regard to the physical view: Would it make sense to report this on
> a per-disk (per-partition) basis? For example:
> 
> > r...@uv8801xr:/root> swap -l
> > swapfile dev  swaplo blocks   free
> > /dev/vx/dsk/swapvol 242,5  16 24576704 23138752
> 
> This could be reported as
> 
>   uv8801xr/swap-vx-dsk-swapvol/swap-used = 24576704 - 23138752
>   uv8801xr/swap-vx-dsk-swapvol/swap-free = 23138752
> 
> We could possibly configurate this behavior with something like
>   ReportPhysical no/yes/separate/combined
> where "yes" and "combined" are synonymous.

Sounds reasonable to be, as long as "combined" is the default. The total
values are what IT execs and developers are interested in. The exact
physical layout of swap is for admins fine-tuning their systems...

By the way, I also use collectd for producing monthly reports and
capacity planning (headed to decision makers rather than technicians)!


Regards,

Aurélien Reynaud


___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


[collectd] [PATCH 1/3] lpar plugin: check for donation support in libperfstat

2010-10-14 Thread Aurélien Reynaud
From: Aurelien Reynaud 


Signed-off-by: Aurelien Reynaud 
---
 configure.in |6 ++
 src/lpar.c   |6 ++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/configure.in b/configure.in
index fcbd57c..5abca73 100644
--- a/configure.in
+++ b/configure.in
@@ -1193,6 +1193,12 @@ fi
 if test "x$with_perfstat" = "xyes"
 then
 AC_DEFINE(HAVE_PERFSTAT, 1, [Define to 1 if you have the 'perfstat' 
library (-lperfstat)])
+# struct members pertaining to donation have been added to libperfstat 
somewhere between AIX5.3ML5 and AIX5.3ML9
+AC_CHECK_MEMBER([perfstat_partition_type_t.b.donate_enabled], [], [], 
[[#include  ticks given to another partition
@@ -224,6 +229,7 @@ static int lpar_read (void)
/* Donated ticks will be accounted for as stolen ticks in other 
LPARs */
consumed_ticks += idle_stolen_ticks + busy_stolen_ticks;
}
+#endif
 
lpar_submit ("consumed", (double) consumed_ticks / (double) ticks);
 
-- 
1.7.2.2


___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


[collectd] lpar plugin final patches

2010-10-14 Thread Aurélien Reynaud
Hello Florian,
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

here is a final series of patches regarding the lpar plugin.

The first one adds a check in configure.in, testing whether libperfstat
has the struct members relative to donation accounting. Libperfstat is
continuously updated to support the latest AIX features, and donation
wasn't supported on early versions of AIX 5.3. The patch also adds #if's
to lpar.c to conditionnally disable the donation-related code.

The second patch updates the comment regarding the "busy" vs "idle" question
about pool statistics. As I said in a previous mail, "busy" could probably be
used, provided we knew how to properly use it. However the problem is obsoleted
by the fact that the "busy" member was added some time after the first releases
of AIX 5.3. That's probably why IBM used "idle" in its sample code...

The third and last patch only renames the NS_TO_TICKS macro for consistency's
sake, as the only parameter using it is pool_idle_time which is expressed in
clock ticks, not in nanoseconds.


With these three patches, tested and working on production machines ranging
from AIX 5.3ML5 to AIX 6.1TL3, I am fully satisfied with the state of the
plugin. As far as I am concerned it is ready for inclusion.

Regards,
Aurélien Reynaud



___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


[collectd] [PATCH 2/3] lpar plugin: update commen t regarding poll_idle vs pool_busy

2010-10-14 Thread Aurélien Reynaud
From: Aurelien Reynaud 


Signed-off-by: Aurelien Reynaud 
---
 src/lpar.c |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/lpar.c b/src/lpar.c
index b9471fb..d25715e 100644
--- a/src/lpar.c
+++ b/src/lpar.c
@@ -241,9 +241,8 @@ static int lpar_read (void)
double pool_busy_cpus;
 
/* We're calculating "busy" from "idle" and the total number of
-* CPUs, because according to Aurélien Reynaud using the "busy"
-* member yields values that differ from the values produced by
-* the LPAR command line tools. --octo */
+* CPUs, because the "busy" member didn't exist in early 
versions
+* of libperfstat. It was added somewhere between AIX 5.3 ML5 
and ML9. */
pool_idle_ns = lparstats.pool_idle_time - 
lparstats_old.pool_idle_time;
pool_idle_cpus = NS_TO_TICKS ((double) pool_idle_ns) / (double) 
ticks;
pool_busy_cpus = ((double) lparstats.phys_cpus_pool) - 
pool_idle_cpus;
-- 
1.7.2.2


___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


[collectd] [PATCH 3/3] lpar plugin: rename NS_TO_TICKS() macro to CLOCKTICKS_TO_TICKS()

2010-10-14 Thread Aurélien Reynaud
From: Aurelien Reynaud 


Signed-off-by: Aurelien Reynaud 
---
 src/lpar.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/lpar.c b/src/lpar.c
index d25715e..4d53447 100644
--- a/src/lpar.c
+++ b/src/lpar.c
@@ -34,7 +34,7 @@
(double)(_system_configuration.Xfrac))
 #endif
 
-#define NS_TO_TICKS(ns) ((ns) / XINTFRAC)
+#define CLOCKTICKS_TO_TICKS(cticks) ((cticks) / XINTFRAC)
 
 static const char *config_keys[] =
 {
@@ -236,15 +236,15 @@ static int lpar_read (void)
if (pool_stats)
{
char typinst[DATA_MAX_NAME_LEN];
-   u_longlong_t pool_idle_ns;
+   u_longlong_t pool_idle_cticks;
double pool_idle_cpus;
double pool_busy_cpus;
 
/* We're calculating "busy" from "idle" and the total number of
 * CPUs, because the "busy" member didn't exist in early 
versions
 * of libperfstat. It was added somewhere between AIX 5.3 ML5 
and ML9. */
-   pool_idle_ns = lparstats.pool_idle_time - 
lparstats_old.pool_idle_time;
-   pool_idle_cpus = NS_TO_TICKS ((double) pool_idle_ns) / (double) 
ticks;
+   pool_idle_cticks = lparstats.pool_idle_time - 
lparstats_old.pool_idle_time;
+   pool_idle_cpus = CLOCKTICKS_TO_TICKS ((double) 
pool_idle_cticks) / (double) ticks;
pool_busy_cpus = ((double) lparstats.phys_cpus_pool) - 
pool_idle_cpus;
if (pool_busy_cpus < 0.0)
pool_busy_cpus = 0.0;
-- 
1.7.2.2


___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


Re: [collectd] lpar plugin final patches

2010-11-17 Thread Aurélien Reynaud
Hello Florian,

Le samedi 06 novembre 2010 à 10:01 +0100, Florian Forster a écrit :
> thank you very much for you continued work on the plugin and your
> patience with me ;)

No problem. collectd is an excellent piece of software, very cleanly
written. I must say I'm rather proud my code finally met these
standards... Thank you for your advices!

> I applied your patches and merged the "ar/lpar" branch to the "master"
> branch. So the plugin will definitely make it into version 5.0. :)

Great! No more syncing and patching...
Please let me know if you need some more help for the docs (sample
graphs, manpages, etc.)


Regards,

Aurélien Reynaud


___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


Re: [collectd] swap plugin improvements

2010-12-17 Thread Aurélien Reynaud
Le lundi 06 décembre 2010 à 11:27 +0100, Florian Forster a écrit :
> I've just tinkered with the Swap plugin so swapctl(2) and kstat(3KSTAT)
> can both be used on Solaris. The result is a mess of preprocessor #if
> blocks and weird defaults, but I think all in all the default behavior
> is intuitive:
> 
>   * If swapctl(2) is available, it is the default facility. You can set
> the "ReportPhysical" option to "separate" to get per-device
> statistics.
>   * If kstat(3KSTAT) is available, too, you can set "ReportVirtual" to
> "true" to enable virtual swap statistics. You can also set
> "ReportPhysical" to "false" to disable physical swap information and
> get the collectd 4 behavior.
>   * If kstat(3KSTAT) is available and swapctl(2) isn't, the plugin will
> behave like it did under collectd 4.
> 
> > Sounds reasonable to be, as long as "combined" is the default. The
> > total values are what IT execs and developers are interested in. The
> > exact physical layout of swap is for admins fine-tuning their
> > systems...
> 
> I totally agree ;)
> 
> On Thu, Oct 07, 2010 at 11:24:24PM +0200, Aurélien Reynaud wrote:
> > In my original posting, I suggested moving the "-s" (virtual memory)
> > to the vmem plugin which is currently linux-only. Don't you think this
> > would make sense?
> 
> I guess it'd make sense. I don't have the experience with Solaris to
> implement useful vmem statistics, though.
> 
> It might make sense to remove the kstat code from the swap plugin
> altogether, even if this means that the functionality won't be available
> at first. It can then be added to the vmem plugin whenever someone with
> enough knowledge of Solaris virtual memory steps up and implements it.
> What do you think?

Hello Florian,

I think you're right. The basic (and arguably most important)
functionality is provided by swapctl() which has been available for a
while and will probably remain so for the foreseeable future. Backwards
compatibility is sacred on industrial products. So I vote for removing
the kstat() code from the swap plugin altogether.

It should indeed be moved to the vmem plugin. I could have provided a
patch for this but couldn't have by any means done better than you
would. I don't have enough expertise in Solaris to make sure the metrics
would be adequate for somenone needing something useful.

As you said, we maybe we should wait for someneone competent to step up.
Or maybe we could blindly port the code just to show it can be done, and
expect someone to get so scandalized that he starts throwing stones at
us before showing us how it should have been done...

But compatibility and consistency are important for collectd too, so the
first solution is probably the best.


Regards,
Aurélien Reynaud


___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd