Hi Bernard,

I took a closer look, and although I think it could be done, the result might be very ugly for the following reasons:

- AIX is still using RPM version 3.0.5, and I am not aware of any intention to upgrade anytime soon.
- Like I said, I think the two files could be consolidated; however, that would probably require tons of "%ifarch ppc" and "%ifnarch ppc" conditionals, which would make the SPEC file rather hard to read.
- AIX RPM installs all software under the /opt/freeware directory hierarchy (to better distinguish it from the AIX base filesets), so lots of different file locations in the SPEC file would have to be "ifdef'ed" as mentioned above.
- All the Linux-specific stuff like "chkconfig" would have to be "%ifdef'ed" appropriately.

A quick solution would probably be to just rename the committed ganglia.aix.spec to something like ganglia.spec.aix so that your rpmbuild command doesn't get mixed up.

I'll give it a try and see how far I get, but the end result might be ugly :-)

Regards,
Michael

Bernard Li wrote:
Hi Michael:

Any chance you can also work on merging the ganglia.aix.spec file back
to the mainstream .spec file?  I'm about to change configure.in to
only include the specific spec file depending on the OS, but I think
the better solution is just to merge the two.  Right now I cannot just
generate the distribution tarball and run 'rpmbuild -ta' since there
are 2 spec files.

Thanks in advance,

Bernard

On 4/2/07, Michael Perzl <[EMAIL PROTECTED]> wrote:

 Hi Martin,

If possible, I would like to somehow take my version (after some reviewing) :-),
as it already contains all the new POWER5 stuff.

My understanding is - as it would require some changes to protocol.x - that
my changes won't have a chance to get into the core Ganglia source code
until version 3.1 comes along.

 This code and everything else (RPMs) can be found on my website
http://www.perzl.org/ganglia/.

This stuff is actually in use at quite a few customer sites already (it runs on
AIX 4.3.3, 5.1, 5.2 and 5.3), so I would like to keep the POWER5 stuff in if
possible. Actually, an AIX gmond implementation without the POWER5 stuff
based on my implementation could be done very easily (just by stripping off the
POWER5 add-ons).

 Regards,
 Michael

 Martin Knoblauch wrote:
 Michael, Andreas,

 any chance that you could consolidate the two versions of the AIX
metrics that seem to be around? It seems you are the ones who have worked
most recently on the AIX implementation.

Cheers
Martin

--- Michael Perzl <[EMAIL PROTECTED]> wrote:



 Andreas,

thank you for taking the blame but you are off the hook here. ;-)

If I understood David correctly, he is using my AIX Ganglia RPM packages
with POWER5 extensions. In those packages, most if not all of the code that
collects the metrics under AIX has been changed. Everything is documented
on my homepage (http://www.perzl.org/ganglia/), though.
So everything that goes wrong here is entirely my fault :-[

After some investigation and some discussions with Nigel I have come to
terms with the following facts regarding the bytes_in/bytes_out problem:
- libperfstat (the library on AIX which obtains all the system
  performance data) uses u_longlong_t data types (these are definitely
  64 bits wide).
- The AIX kernel internally, though, is probably not using 64-bit
  data types. Unsigned 32-bit is more realistic, in order not to
  break compatibility (my personal opinion).
- The consequence is that an integer overrun occurs much more easily
  with 32-bit data types than with 64-bit data types (with the latter,
  we all probably won't live long enough to see it happen).
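
To illustrate the consequence described above, here is a minimal sketch in portable C. It uses uint64_t as a stand-in for u_longlong_t, and report_32bit_counter() is a hypothetical helper that only emulates the assumed behaviour (a counter kept in 32 bits but reported in a 64-bit field); it is not part of libperfstat, and the 32-bit kernel counter is itself only a hypothesis.

```c
#include <assert.h>
#include <stdint.h>

/* Naive delta: a plain "current - previous" on 64-bit unsigned variables.
 * If current < previous the result wraps modulo 2^64 instead of going negative. */
static uint64_t naive_delta(uint64_t current, uint64_t previous)
{
    return current - previous;
}

/* Hypothetical helper (assumption): emulate a counter that is kept in
 * 32 bits internally but handed out in a 64-bit field. */
static uint64_t report_32bit_counter(uint64_t true_total)
{
    return true_total & 0xFFFFFFFFULL;
}

/* Self-check: 512 bytes received across a 32-bit wrap show up as a
 * delta of roughly 1.8e19 bytes when computed naively. */
static void check_wrap_demo(void)
{
    uint64_t before = report_32bit_counter(0xFFFFFF00ULL);   /* just below 2^32 */
    uint64_t after  = report_32bit_counter(0x100000100ULL);  /* 512 bytes later */

    assert(after < before);                                   /* counter wrapped */
    assert(naive_delta(after, before) == 0xFFFFFFFF00000200ULL);
}
```

Dividing such a delta by a sampling interval of a few seconds would give the kind of absurd transfer rate reported in this thread.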

Please take a look at my implementation of the bytes_in metric (the
bytes_out implementation is analogous):

01 g_val_t
02 bytes_in_func( void )
03 {
04    g_val_t val;
05    perfstat_netinterface_total_t n;
06    static u_longlong_t last_bytes_in = 0, bytes_in;
07    static double last_time = 0.0;
08    double now, delta_t;
09    struct timeval timeValue;
10    struct timezone timeZone;
11
12    gettimeofday( &timeValue, &timeZone );
13
14    now = (double) (timeValue.tv_sec - boottime) + (timeValue.tv_usec / 1000000.0);
15
16    if (perfstat_netinterface_total( NULL, &n, sizeof( perfstat_netinterface_total_t ), 1 ) == -1)
17       val.f = 0.0;
18    else
19    {
20       bytes_in = n.ibytes;
21
22       delta_t = now - last_time;
23
24       if ( delta_t )
25          val.f = (double) (bytes_in - last_bytes_in) / delta_t;
26       else
27          val.f = 0.0;
28
29       last_bytes_in = bytes_in;
30    }
31
32    last_time = now;
33
34    return( val );
35 }

In my opinion the overrun occurs in line #25 when "bytes_in < last_bytes_in".
In my naivety I had assumed that, as both are of type u_longlong_t, an
integer overrun would never happen.

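So to solve the overrun, a check for "bytes_in < last_bytes_in" must be
introduced, something like:

u_longlong_t d;
d = bytes_in - last_bytes_in;
/* d is unsigned, so "d < 0" can never be true; compare the operands instead. */
/* Assumes a 32-bit unsigned long, i.e. a counter that wraps at 2^32. */
if (bytes_in < last_bytes_in) d += (u_longlong_t) ULONG_MAX + 1;

and line #25 would essentially become
25 val.f = (double) d / delta_t;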
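
The check proposed above can also be sketched as a small helper in portable C. This is only a sketch under stated assumptions: uint64_t stands in for u_longlong_t, the names counter_delta() and bytes_per_second() are hypothetical, the counter width is passed in by the caller because the thread only guesses that it is 32 bits, and the counter is assumed to wrap at most once between two samples.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-aware difference of two samples of a counter that wraps at
 * 2^counter_bits (assumption: at most one wrap between the samples). */
static uint64_t counter_delta(uint64_t current, uint64_t previous,
                              unsigned counter_bits)
{
    assert(counter_bits >= 1 && counter_bits <= 64);

    /* Mask for the low counter_bits bits; 64 is special-cased because
     * shifting a 64-bit value by 64 is undefined in C. */
    uint64_t mask = (counter_bits == 64)
                        ? UINT64_MAX
                        : ((UINT64_C(1) << counter_bits) - 1);

    /* Unsigned subtraction is modulo 2^64; masking reduces the result
     * modulo 2^counter_bits, which undoes a single wrap. */
    return (current - previous) & mask;
}

/* Rate in bytes per second; returns 0.0 when no time has elapsed. */
static double bytes_per_second(uint64_t current, uint64_t previous,
                               unsigned counter_bits, double delta_t)
{
    if (delta_t <= 0.0)
        return 0.0;
    return (double) counter_delta(current, previous, counter_bits) / delta_t;
}
```

With a 32-bit counter, a sample of 0x100 following a sample of 0xFFFFFF00 yields a delta of 512 bytes rather than a value close to 2^64. Masking avoids both the explicit comparison and the off-by-one that adding ULONG_MAX would introduce.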

Comments ?

Regards,
Michael

PS: David, the reason why you don't see it happen with pkts_in and
pkts_out is probably that no overrun has occurred so far, but at some
point it will happen there too.

PPS: David, if this is a solution (I want some comments on that first,
though), then I will build new RPMs with the then hopefully
correct code.

Andreas Schoenfeld wrote:


 Hi David and Martin,

I suppose the network code is still the code I wrote, so there are two
problems I know of:
1. Yes, there is a problem with overflows.
2. The network traffic shown is the sum over all network interfaces,
including local loopback devices (lo0...).
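
The second problem could be addressed by summing per-interface counters and skipping loopback devices. The sketch below is portable C and deliberately does not use libperfstat: struct iface_sample and both function names are hypothetical, and it assumes that loopback interface names start with "lo" (as in lo0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-interface sample; NOT the libperfstat structure. */
struct iface_sample {
    const char *name;    /* interface name, e.g. "en0" or "lo0" */
    uint64_t    ibytes;  /* bytes received */
    uint64_t    obytes;  /* bytes sent */
};

/* Assumption: loopback interface names begin with "lo". */
static int is_loopback(const char *name)
{
    return strncmp(name, "lo", 2) == 0;
}

/* Sum received bytes over all interfaces except loopback devices. */
static uint64_t sum_ibytes_excluding_loopback(const struct iface_sample *samples,
                                              size_t count)
{
    assert(samples != NULL || count == 0);

    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        if (!is_loopback(samples[i].name))
            total += samples[i].ibytes;
    }
    return total;
}
```

On AIX the samples would have to come from a per-interface query rather than from the all-interfaces total used in the current code; that part is left out here.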

Both problems could lead to astonishing data transfer rates in Ganglia.

Sorry, I had promised to fix the problems, but there was too much other
work ...

Best regards
 Andreas




 Date: Thu, 29 Mar 2007 08:21:38 -0700 (PDT)
From: Martin Knoblauch <[EMAIL PROTECTED]>
Subject: Re: [Ganglia-general] Help! I have a petabyte/s network
To: David Wong <[EMAIL PROTECTED]>, [EMAIL PROTECTED], [EMAIL PROTECTED]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=iso-8859-1

David,

 good catch. I will have to look at it for a bit.

Cheers
Martin
--- David Wong <[EMAIL PROTECTED]> wrote:





 I don't write much code nowadays, so I'm going to need a lot of help
with this.

I dug through the ganglia code and I found this interesting tidbit in
libmetrics/aix/metrics.c which may be indicative of the problem.

There's an assignment from cur_ninfo.ibytes to cur_net_stat.ibytes, but
the types of the two variables are different.

net_stat::ibytes is a double:

struct net_stat{
 double ipackets;
 double opackets;
 double ibytes;
 double obytes;
} cur_net_stat;

and we have *ninfo declared here:

perfstat_netinterface_total_t ninfo[2],*last_ninfo, *cur_ninfo ;

libperfstat.h has perfstat_netinterface_total_t::ibytes as
u_longlong_t.

Does this code try to do what I think it is doing, i.e. assign an
unsigned 64-bit integer to a signed 64-bit integer?
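
One concrete effect of storing a u_longlong_t in a double, as the struct above does, is loss of precision: an IEEE 754 double has a 53-bit significand, so not every 64-bit integer above 2^53 is representable. The following is a minimal sketch in portable C, with uint64_t standing in for u_longlong_t and hypothetical function names; it assumes the platform uses IEEE 754 doubles.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if v keeps its exact value when stored in a double and read
 * back, 0 otherwise. */
static int survives_double(uint64_t v)
{
    double d = (double) v;

    /* 2^64 as a double. Converting a double >= 2^64 back to uint64_t
     * would be undefined behaviour, so treat that case as "not exact". */
    if (d >= 18446744073709551616.0)
        return 0;

    return (uint64_t) d == v;
}

/* Self-check of the 53-bit limit (assumes IEEE 754 doubles). */
static void check_precision_limit(void)
{
    assert(survives_double(UINT64_C(1) << 53));         /* 2^53 is exact      */
    assert(!survives_double((UINT64_C(1) << 53) + 1));  /* 2^53 + 1 is not    */
    assert(!survives_double(UINT64_MAX));               /* rounds up to 2^64  */
}
```

A byte counter needs to pass about 9e15 before this matters, so the precision loss alone would not explain wildly wrong rates; it is shown only to make the type mismatch concrete.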

I'm willing to test the code if someone who's more adept at coding and
building will take on the challenge.

It looks to me that the type mismatch will have to be fixed in a few
places, such as CALC_NETSTAT, and we'll have to add an unsigned long
long to g_val_t too. Those are the ones I can see so far.

David Wong
Senior Systems Engineer
Management Dynamics, Inc.
Phone: 201-804-6127
[EMAIL PROTECTED]

-----Original Message-----
From: Martin Knoblauch [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 28, 2007 12:00 PM
To: David Wong; [EMAIL PROTECTED]
Subject: Re: [Ganglia-general] Help! I have a petabyte/s network

David,

 as far as I remember, the AIX metrics code had an overflow/wrap-around
problem prior to 3.0.4. Maybe the fixes are not thorough enough.

 The packets/sec metrics are of course less affected.

Cheers
Martin





------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de



_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers



