Hi Michael:

Thanks for looking into this. Yes, I am aware the spec file may get bloated, but I think ultimately a single file will be easier for one person (or more) to manage than multiple files.
What do other devs/users think? BTW, I'm cc'ing Marcus to see if he has any specific insights on this :-)

Cheers,

Bernard

On 4/2/07, Michael Perzl <[EMAIL PROTECTED]> wrote:
Hi Bernard,

I took a closer look, and though I think it could be done, it might be very ugly for the following reasons:

- AIX is still using RPM version 3.0.5, and I am not aware of any intentions to upgrade anytime soon.
- Like I said, I think it could be consolidated; however, that would probably require tons of "%ifarch ppc" and "%ifnarch ppc" defines, which would make the SPEC file rather hard to read.
- AIX RPM installs all the software under the /opt/freeware directory hierarchy (to better distinguish it from the AIX base filesets), so lots of different file locations in the SPEC file would have to be "ifdef'ed" as mentioned above.
- All the Linux-specific stuff like "chkconfig" would have to be "%ifdef'ed" appropriately.

A quick solution would probably be to just rename the committed ganglia.aix.spec to maybe ganglia.spec.aix so your rpmbuild command doesn't get mixed up.

I'll give it a try and see how far I get, but the end result might be ugly :-)

Regards,
Michael

Bernard Li wrote:
> Hi Michael:
>
> Any chance you can also work on merging the ganglia.aix.spec file back
> to the mainstream .spec file? I'm about to change configure.in to
> only include the specific spec file depending on the OS, but I think
> the better solution is just to merge the two. Right now I cannot just
> generate the distribution tarball and run 'rpmbuild -ta' since there
> are 2 spec files.
>
> Thanks in advance,
>
> Bernard
>
> On 4/2/07, Michael Perzl <[EMAIL PROTECTED]> wrote:
>>
>> Hi Martin,
>>
>> if possible I would like to somehow take my version (after some
>> reviewing) :-) , as it contains all the new POWER5 stuff already.
>>
>> My understanding is - as it would require some changes to protocol.x -
>> that my changes won't have a chance to get into the core Ganglia source
>> code until version 3.1 comes along.
>>
>> This code and everything else (RPMs) can be found on my website
>> http://www.perzl.org/ganglia/.
>>
>> This stuff is actually in use at quite a few customer sites already
>> (runs on AIX 4.3.3, 5.1, 5.2 and 5.3), so I would like to keep that
>> POWER5 stuff in if possible. Actually, an AIX gmond implementation
>> without the POWER5 stuff, based on my implementation, could be done
>> very easily (just by stripping off the POWER5 add-ons).
>>
>> Regards,
>> Michael
>>
>> Martin Knoblauch wrote:
>>
>> Michael, Andreas,
>>
>> any chance that you could consolidate the two versions of the AIX
>> metrics that seem to be around? Seems you are the ones who have worked
>> most recently on the AIX implementation.
>>
>> Cheers
>> Martin
>>
>> --- Michael Perzl <[EMAIL PROTECTED]> wrote:
>>
>> Andreas,
>>
>> thank you for taking the blame but you are off the hook here. ;-)
>>
>> If I understood David correctly, he is using my AIX Ganglia RPM
>> packages with POWER5 extensions. Here most if not all of the
>> implementation of how the metrics are collected under AIX has been
>> changed. Everything is documented on my homepage
>> (http://www.perzl.org/ganglia/) though.
>> So everything that goes wrong here is entirely my fault :-[
>>
>> After some investigating and some discussions with Nigel I have come
>> to terms with the following facts regarding the bytes_in/bytes_out
>> problem:
>> - libperfstat (the library on AIX which obtains all the system
>>   performance data) uses u_longlong_t data types (these are definitely
>>   64 bits wide).
>> - The AIX kernel internally, though, is probably not using 64-bit
>>   data types; unsigned 32-bit is probably more realistic, in order not
>>   to break compatibility (my personal opinion).
>> - The consequence is that integer overrun occurs much more easily with
>>   32-bit data types than with 64-bit data types (we all probably don't
>>   live long enough to see the latter happen).
>>
>> Please take a look at my implementation of the bytes_in metric (the
>> bytes_out implementation is analogous):
>>
>> 01 g_val_t
>> 02 bytes_in_func( void )
>> 03 {
>> 04    g_val_t val;
>> 05    perfstat_netinterface_total_t n;
>> 06    static u_longlong_t last_bytes_in = 0, bytes_in;
>> 07    static double last_time = 0.0;
>> 08    double now, delta_t;
>> 09    struct timeval timeValue;
>> 10    struct timezone timeZone;
>> 11
>> 12    gettimeofday( &timeValue, &timeZone );
>> 13
>> 14    now = (double) (timeValue.tv_sec - boottime) + (timeValue.tv_usec / 1000000.0);
>> 15
>> 16    if (perfstat_netinterface_total( NULL, &n, sizeof( perfstat_netinterface_total_t ), 1 ) == -1)
>> 17       val.f = 0.0;
>> 18    else
>> 19    {
>> 20       bytes_in = n.ibytes;
>> 21
>> 22       delta_t = now - last_time;
>> 23
>> 24       if ( delta_t )
>> 25          val.f = (double) (bytes_in - last_bytes_in) / delta_t;
>> 26       else
>> 27          val.f = 0.0;
>> 28
>> 29       last_bytes_in = bytes_in;
>> 30    }
>> 31
>> 32    last_time = now;
>> 33
>> 34    return( val );
>> 35 }
>>
>> In my opinion the overrun occurs in line #25 when "bytes_in <
>> last_bytes_in".
>> In my naivety I had assumed, as both are of type u_longlong_t, that an
>> integer overrun would never happen.
>>
>> So to solve the overrun a check for "bytes_in < last_bytes_in" must be
>> introduced, something like:
>>
>>    u_longlong_t d;
>>    d = bytes_in - last_bytes_in;
>>    if (d < 0) d += ULONG_MAX;
>>
>> and line #25 would essentially become
>> 25          val.f = (double) d / delta_t;
>>
>> Comments ?
>>
>> Regards,
>> Michael
>>
>> PS: David, the reason why you don't see it happen with pkts_in and
>> pkts_out is probably that no overrun has occurred so far, but at some
>> point it will also happen.
>>
>> PPS: David, if this is a solution (I want some comments on that
>> first, though) then I would build new RPMs with the then hopefully
>> correct code.
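A note on the check proposed above: because d is declared u_longlong_t (unsigned), the test "d < 0" can never be true, and ULONG_MAX is one less than the 32-bit modulus while its width depends on the platform. What follows is only a minimal sketch of a wrap-aware delta, assuming (as Michael suspects, but the thread does not confirm) that the underlying kernel counter is 32 bits wide and wraps at most once between two samples. The names counter_delta, counter_rate and COUNTER_MODULUS are hypothetical, and uint64_t stands in for AIX's u_longlong_t so the fragment compiles elsewhere.

```c
#include <stdint.h>

/* ASSUMPTION: the kernel counter is 32 bits wide, so it wraps at 2^32. */
#define COUNTER_MODULUS (UINT64_C(1) << 32)

/* Bytes transferred between two samples, tolerating one 32-bit wrap. */
uint64_t counter_delta(uint64_t cur, uint64_t prev)
{
    if (cur >= prev)
        return cur - prev;                  /* normal case, no wrap */

    if (prev < COUNTER_MODULUS)
        return COUNTER_MODULUS - prev + cur; /* single 32-bit wrap */

    /* The counter went backwards but cannot be a 32-bit wrap:
     * treat it as a reset and report nothing for this interval. */
    return 0;
}

/* Rate in bytes per second; returns 0.0 for a non-positive interval. */
double counter_rate(uint64_t cur, uint64_t prev, double delta_t)
{
    if (delta_t <= 0.0)
        return 0.0;
    return (double) counter_delta(cur, prev) / delta_t;
}
```

With something like this, line #25 would compute the rate from counter_delta( bytes_in, last_bytes_in ) instead of the raw subtraction. If the counter turns out to be a true 64-bit value, the first branch is the only one ever taken and the behaviour is unchanged.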
>>
>> Andreas Schoenfeld wrote:
>>
>> Hi David and Martin,
>>
>> I suppose the network code is still the code I wrote, so there are two
>> problems I know of:
>> 1. Yes, there is a problem with overflows.
>> 2. The network traffic shown is the sum of all network interfaces,
>>    including local loopback devices (lo0...).
>>
>> Both problems could lead to astonishing data transfer rates in Ganglia.
>>
>> Sorry, I had promised to fix the problems, but there was too much other
>> work ...
>>
>> Best regards
>> Andreas
>>
>> Date: Thu, 29 Mar 2007 08:21:38 -0700 (PDT)
>> From: Martin Knoblauch <[EMAIL PROTECTED]>
>> Subject: Re: [Ganglia-general] Help! I have a petabyte/s network
>> To: David Wong <[EMAIL PROTECTED]>, [EMAIL PROTECTED], [EMAIL PROTECTED]
>> Message-ID: <[EMAIL PROTECTED]>
>> Content-Type: text/plain; charset=iso-8859-1
>>
>> David,
>>
>> good catch. I will have to look at it for a bit.
>>
>> Cheers
>> Martin
>>
>> --- David Wong <[EMAIL PROTECTED]> wrote:
>>
>> I don't write much code nowadays, so I'm going to need a lot of help
>> with this.
>>
>> I dug through the ganglia code and I found this interesting tidbit in
>> libmetrics/aix/metrics.c which may be indicative of the problem.
>>
>> There's an assignment from cur_ninfo.ibytes to cur_net_stat.ibytes,
>> but the types of the two variables are different.
>>
>> net_stat::ibytes is a double:
>>
>> struct net_stat{
>>    double ipackets;
>>    double opackets;
>>    double ibytes;
>>    double obytes;
>> } cur_net_stat;
>>
>> and we have *ninfo declared here:
>>
>> perfstat_netinterface_total_t ninfo[2],*last_ninfo, *cur_ninfo ;
>>
>> libperfstat.h has perfstat_netinterface_total_t::ibytes as
>> u_longlong_t.
>>
>> Does this code try to do what I think it is doing, i.e. assign an
>> unsigned 64 bit integer to a signed 64bit integer?
>>
>> I'm willing to test the code if someone who's more adept at coding and
>> building will take on the challenge.
>>
>> It looks to me that the type mismatch will have to be fixed in a few
>> places, such as CALC_NETSTAT, and we'll have to add an unsigned long
>> long to g_val_t too. Those are the ones I can see so far.
>>
>> David Wong
>> Senior Systems Engineer
>> Management Dynamics, Inc.
>> Phone: 201-804-6127
>> [EMAIL PROTECTED]
>>
>> -----Original Message-----
>> From: Martin Knoblauch [mailto:[EMAIL PROTECTED]
>> Sent: Wednesday, March 28, 2007 12:00 PM
>> To: David Wong; [EMAIL PROTECTED]
>> Subject: Re: [Ganglia-general] Help! I have a petabyte/s network
>>
>> David,
>>
>> as far as I remember, the AIX metrics code had an overflow/wrap-around
>> problem prior to 3.0.4. Maybe the fixes are not thorough enough.
>>
>> The packets/sec are of course less affected.
>>
>> Cheers
>> Martin
>>
>> ------------------------------------------------------
>> Martin Knoblauch
>> email: k n o b i AT knobisoft DOT de
>> www: http://www.knobisoft.de
>>
>> _______________________________________________
>> Ganglia-developers mailing list
>> Ganglia-developers@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/ganglia-developers
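On David's question about assigning a u_longlong_t counter to a double member: a double is a floating-point type, not a signed 64-bit integer, so the assignment does not flip a sign bit. What it can do is round, because an IEEE 754 double carries 53 significant bits and cannot hold every integer above 2^53. The sketch below only illustrates that effect; the helper name survives_double_round_trip is hypothetical, and uint64_t again stands in for u_longlong_t.

```c
#include <stdint.h>

/* Returns 1 if v comes back unchanged after a trip through double,
 * 0 if the conversion had to round it. Integers up to 2^53 always
 * survive; above that, only some do. */
int survives_double_round_trip(uint64_t v)
{
    double d = (double) v;

    /* Values near UINT64_MAX round up to 2^64, which no longer fits
     * in uint64_t, so reject them before casting back. */
    if (d >= 18446744073709551616.0)
        return 0;

    return (uint64_t) d == v;
}
```

A byte counter would need to pass roughly 9 * 10^15 before this rounding starts, so it seems unlikely to explain a petabyte/s reading on its own; the unsigned wrap-around in the subtraction looks like the more plausible cause.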