Hi Michael:

Filing a bug and attaching the patch would be nice. Or you could just post it here.

Thanks,

Bernard

On 4/2/07, Michael Perzl <[EMAIL PROTECTED]> wrote:
Hi Bernard,

I now have a consolidated SPEC file (I think it is ugly :-) ), so how do you want me to send it to you (I guess not by posting it to the mailing list :-) )?

Regards,
Michael

Bernard Li wrote:
> Hi Michael:
>
> Thanks for looking into this. Yes, I am aware the spec file may get
> bloated, but I think ultimately this will be better for one (or more)
> person to manage (as opposed to managing multiple files).
>
> What do other devs/users think?
>
> BTW, I'm cc'ing Marcus to see if he has any specific insights on this :-)
>
> Cheers,
>
> Bernard
>
> On 4/2/07, Michael Perzl <[EMAIL PROTECTED]> wrote:
>> Hi Bernard,
>>
>> I took a closer look, and though I think it could be done, it might be
>> very ugly for the following reasons:
>>
>> - AIX is still using RPM version 3.0.5 and I am not aware of any
>>   intentions to upgrade anytime soon.
>> - Like I said, I think it could be consolidated; however, that would
>>   probably require tons of "%ifarch ppc" and "%ifnarch ppc" defines,
>>   which would make the SPEC file rather hard to read.
>> - AIX RPM installs all the software under the /opt/freeware
>>   directory hierarchy (to better distinguish it from the AIX base
>>   filesets), therefore lots of different file locations in the SPEC
>>   file would have to be "ifdef'ed" as mentioned above.
>> - All the Linux-specific stuff like "chkconfig" would have to be
>>   "%ifdef'ed" appropriately.
>>
>> A quick solution would probably be to just rename the committed
>> ganglia.aix.spec to maybe ganglia.spec.aix so your rpmbuild command
>> doesn't get mixed up.
>>
>> I'll give it a try and see how far I get, but the end result might
>> be ugly :-)
>>
>> Regards,
>> Michael
>>
>> Bernard Li wrote:
>> > Hi Michael:
>> >
>> > Any chance you can also work on merging the ganglia.aix.spec file back
>> > into the mainstream .spec file? I'm about to change configure.in to
>> > only include the specific spec file depending on the OS, but I think
>> > the better solution is just to merge the two.
>> > Right now I cannot just
>> > generate the distribution tarball and run 'rpmbuild -ta' since there
>> > are 2 spec files.
>> >
>> > Thanks in advance,
>> >
>> > Bernard
>> >
>> > On 4/2/07, Michael Perzl <[EMAIL PROTECTED]> wrote:
>> >>
>> >> Hi Martin,
>> >>
>> >> if possible I would like to somehow take my version (after some
>> >> reviewing) :-) , as it contains all the new POWER5 stuff already.
>> >>
>> >> My understanding is that, as it would require some changes to
>> >> protocol.x, my changes won't have a chance to get into the core
>> >> Ganglia source code until version 3.1 comes along.
>> >>
>> >> This code and everything else (RPMs) can be found on my website
>> >> http://www.perzl.org/ganglia/.
>> >>
>> >> This stuff is actually in use at quite a few customer sites already
>> >> (runs on AIX 4.3.3, 5.1, 5.2 and 5.3), so I would like to keep that
>> >> POWER5 stuff in if possible. Actually, an AIX gmond implementation
>> >> without the POWER5 stuff based on my implementation could be done
>> >> very easily (just by stripping off the POWER5 add-ons).
>> >>
>> >> Regards,
>> >> Michael
>> >>
>> >> Martin Knoblauch wrote:
>> >> Michael, Andreas,
>> >>
>> >> any chance that you could consolidate the two versions of the AIX
>> >> metrics that seem to be around? It seems you are the ones who have
>> >> worked most recently on the AIX implementation.
>> >>
>> >> Cheers
>> >> Martin
>> >>
>> >> --- Michael Perzl <[EMAIL PROTECTED]> wrote:
>> >>
>> >> Andreas,
>> >>
>> >> thank you for taking the blame but you are off the hook here. ;-)
>> >>
>> >> If I understood David correctly, he is using my AIX Ganglia RPM
>> >> packages with POWER5 extensions. Here most, if not all, of the
>> >> implementation of how the metrics are collected under AIX has been
>> >> changed. Everything is documented on my homepage
>> >> (http://www.perzl.org/ganglia/) though.
>> >> So everything that goes wrong here is entirely my fault :-[
>> >>
>> >> After some investigating and some discussions with Nigel I have come
>> >> to terms with the following facts regarding the bytes_in/bytes_out
>> >> problem:
>> >> - libperfstat (the library on AIX which obtains all the system
>> >>   performance data) uses u_longlong_t data types (these are
>> >>   definitely 64 bits wide).
>> >> - The AIX kernel internally, though, is probably not using 64-bit
>> >>   data types. Unsigned 32-bit is probably more realistic, in order
>> >>   not to break compatibility (my personal opinion).
>> >> - The consequence is that an integer overrun occurs much more easily
>> >>   with 32-bit data types than with 64-bit data types (with 64 bits
>> >>   we all probably wouldn't live long enough to see it happen).
>> >>
>> >> Please take a look at my implementation of the bytes_in metric (the
>> >> bytes_out implementation is analogous):
>> >>
>> >> 01 g_val_t
>> >> 02 bytes_in_func( void )
>> >> 03 {
>> >> 04    g_val_t val;
>> >> 05    perfstat_netinterface_total_t n;
>> >> 06    static u_longlong_t last_bytes_in = 0, bytes_in;
>> >> 07    static double last_time = 0.0;
>> >> 08    double now, delta_t;
>> >> 09    struct timeval timeValue;
>> >> 10    struct timezone timeZone;
>> >> 11
>> >> 12    gettimeofday( &timeValue, &timeZone );
>> >> 13
>> >> 14    now = (double) (timeValue.tv_sec - boottime) + (timeValue.tv_usec / 1000000.0);
>> >> 15
>> >> 16    if (perfstat_netinterface_total( NULL, &n, sizeof( perfstat_netinterface_total_t ), 1 ) == -1)
>> >> 17       val.f = 0.0;
>> >> 18    else
>> >> 19    {
>> >> 20       bytes_in = n.ibytes;
>> >> 21
>> >> 22       delta_t = now - last_time;
>> >> 23
>> >> 24       if ( delta_t )
>> >> 25          val.f = (double) (bytes_in - last_bytes_in) / delta_t;
>> >> 26       else
>> >> 27          val.f = 0.0;
>> >> 28
>> >> 29       last_bytes_in = bytes_in;
>> >> 30    }
>> >> 31
>> >> 32    last_time = now;
>> >> 33
>> >> 34    return( val );
>> >> 35 }
>> >>
>> >> In my opinion the overrun
>> >> occurs in line #25 when "bytes_in < last_bytes_in".
>> >> In my naivety I had assumed that, as both are of type u_longlong_t,
>> >> an integer overrun would never happen.
>> >>
>> >> So to solve the overrun, a check for "bytes_in < last_bytes_in" must
>> >> be introduced, something like:
>> >>
>> >>    u_longlong_t d;
>> >>    d = bytes_in - last_bytes_in;
>> >>    if (d < 0) d += ULONG_MAX;
>> >>
>> >> and line #25 would essentially become
>> >> 25          val.f = (double) d / delta_t;
>> >>
>> >> Comments?
>> >>
>> >> Regards,
>> >> Michael
>> >>
>> >> PS: David, the reason why you don't see it happen with pkts_in and
>> >> pkts_out is probably that no overrun has occurred so far, but at some
>> >> point it will happen there as well.
>> >>
>> >> PPS: David, if this is a solution (I want some comments on that
>> >> first, though) then I would build new RPMs with the then hopefully
>> >> correct code.
>> >>
>> >> Andreas Schoenfeld wrote:
>> >>
>> >> Hi David and Martin,
>> >>
>> >> I suppose the network code is still the code I wrote, so there are
>> >> two problems I know of:
>> >> 1. Yes, there is a problem with overflows.
>> >> 2. The network traffic shown is the sum of all network interfaces,
>> >>    including local loopback devices (lo0...).
>> >>
>> >> Both problems could lead to astonishing data transfer rates in
>> >> Ganglia.
>> >>
>> >> Sorry, I had promised to fix the problems, but there was too much
>> >> other work ...
>> >>
>> >> Best regards
>> >> Andreas
>> >>
>> >> Date: Thu, 29 Mar 2007 08:21:38 -0700 (PDT)
>> >> From: Martin Knoblauch <[EMAIL PROTECTED]>
>> >> Subject: Re: [Ganglia-general] Help! I have a petabyte/s network
>> >> To: David Wong <[EMAIL PROTECTED]>, [EMAIL PROTECTED],
>> >>     [EMAIL PROTECTED]
>> >> Message-ID: <[EMAIL PROTECTED]>
>> >> Content-Type: text/plain; charset=iso-8859-1
>> >>
>> >> David,
>> >>
>> >> good catch.
>> >> I will have to look at it for a bit.
>> >>
>> >> Cheers
>> >> Martin
>> >>
>> >> --- David Wong <[EMAIL PROTECTED]> wrote:
>> >>
>> >> I don't write much code nowadays, so I'm going to need a lot of
>> >> help with this.
>> >>
>> >> I dug through the Ganglia code and I found this interesting tidbit
>> >> in libmetrics/aix/metrics.c which may be indicative of the problem.
>> >>
>> >> There's an assignment from cur_ninfo.ibytes to cur_net_stat.ibytes,
>> >> but the types of the two variables are different.
>> >>
>> >> net_stat::ibytes is a double:
>> >>
>> >> struct net_stat{
>> >>    double ipackets;
>> >>    double opackets;
>> >>    double ibytes;
>> >>    double obytes;
>> >> } cur_net_stat;
>> >>
>> >> and we have *ninfo declared here:
>> >>
>> >> perfstat_netinterface_total_t ninfo[2],*last_ninfo, *cur_ninfo ;
>> >>
>> >> libperfstat.h has perfstat_netinterface_total_t::ibytes as
>> >> u_longlong_t.
>> >>
>> >> Does this code try to do what I think it is doing, i.e. assign an
>> >> unsigned 64-bit integer to a signed 64-bit integer?
>> >>
>> >> I'm willing to test the code if someone who's more adept at coding
>> >> and building will take on the challenge.
>> >>
>> >> It looks to me like the type mismatch will have to be fixed in a few
>> >> places, such as CALC_NETSTAT, and we'll have to add an unsigned long
>> >> long to g_val_t too. Those are the ones I can see so far.
>> >>
>> >> David Wong
>> >> Senior Systems Engineer
>> >> Management Dynamics, Inc.
>> >> Phone: 201-804-6127
>> >> [EMAIL PROTECTED]
>> >>
>> >> -----Original Message-----
>> >> From: Martin Knoblauch [mailto:[EMAIL PROTECTED]
>> >> Sent: Wednesday, March 28, 2007 12:00 PM
>> >> To: David Wong; [EMAIL PROTECTED]
>> >> Subject: Re: [Ganglia-general] Help! I have a petabyte/s network
>> >>
>> >> David,
>> >>
>> >> as far as I remember, the AIX metrics code had an
>> >> overflow/wrap-around problem prior to 3.0.4. Maybe the fixes are not
>> >> thorough enough.
>> >>
>> >> The packets/sec are of course less affected.
>> >>
>> >> Cheers
>> >> Martin
>> >>
>> >> ------------------------------------------------------
>> >> Martin Knoblauch
>> >> email: k n o b i AT knobisoft DOT de
>> >> www: http://www.knobisoft.de
>> >>
>> >> _______________________________________________
>> >> Ganglia-developers mailing list
>> >> Ganglia-developers@lists.sourceforge.net
>> >> https://lists.sourceforge.net/lists/listinfo/ganglia-developers
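
A note on the wrap-around check proposed in the thread: since d is declared as u_longlong_t, it is unsigned, so the test "d < 0" can never be true and the correction would never be applied. The following is only a minimal sketch of one alternative, assuming (as Michael suspects, but nobody in the thread confirms) that the underlying AIX counter is 32 bits wide and is merely reported in a 64-bit field. The helper names counter_delta and bytes_per_second are hypothetical and are not part of Ganglia or libperfstat; uint64_t stands in for u_longlong_t so the example compiles outside AIX.

```c
#include <stdint.h>

/* Hypothetical helper (not part of Ganglia or libperfstat).
 * Returns how far a monotonically increasing counter advanced between
 * two samples.
 * ASSUMPTION: if the value went backwards and the previous sample fits
 * in 32 bits, the underlying counter is 32 bits wide and wrapped once. */
static uint64_t counter_delta(uint64_t cur, uint64_t last)
{
    if (cur >= last)
        return cur - last;                        /* normal case, no wrap */
    if (last <= UINT32_MAX)
        return (UINT64_C(1) << 32) - last + cur;  /* one 32-bit wrap */
    return 0;  /* went backwards beyond 32 bits: treat as a counter reset */
}

/* Hypothetical helper: convert a counter advance into a per-second rate. */
static double bytes_per_second(uint64_t cur, uint64_t last, double delta_t)
{
    if (delta_t <= 0.0)
        return 0.0;  /* no elapsed time: avoid dividing by zero */
    return (double) counter_delta(cur, last) / delta_t;
}
```

In bytes_in_func this would take the place of the subtraction on line 25, for example "val.f = bytes_per_second(bytes_in, last_bytes_in, delta_t);". If the kernel counter turns out to be a true 64-bit value, a backwards step more likely means the counter was reset, which is why the sketch reports 0 for that interval instead of a huge rate.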