Hi Bernard,
I took a closer look, and although I think it could be done, the result
might be very ugly for the following reasons:
- AIX is still using RPM version 3.0.5 and I am not aware of any
intentions to upgrade anytime soon
- As I said, I think it could be consolidated; however, that would
probably require tons of "%ifarch ppc" and "%ifnarch ppc" conditionals,
which would make the SPEC file rather hard to read
- AIX RPM installs all software under the /opt/freeware directory
hierarchy (to better distinguish it from the AIX base filesets), so
lots of file locations in the SPEC file would have to be "ifdef'ed"
as mentioned above.
- All the Linux-specific stuff like "chkconfig" would have to be
"%ifdef'ed" appropriately.
A quick solution would probably be to just rename the committed
ganglia.aix.spec to maybe ganglia.spec.aix so your rpmbuild command
doesn't get mixed up.
I'll give it a try and see how far I get, but the end result might
be ugly :-)
Regards,
Michael
Bernard Li wrote:
> Hi Michael:
>
> Any chance you can also work on merging the ganglia.aix.spec file back
> to the mainstream .spec file? I'm about to change configure.in to
> only include the specific spec file depending on the OS, but I think
> the better solution is just to merge the two. Right now I cannot just
> generate the distribution tarball and run 'rpmbuild -ta' since there
> are 2 spec files.
>
> Thanks in advance,
>
> Bernard
>
> On 4/2/07, Michael Perzl <[EMAIL PROTECTED]> wrote:
>>
>> Hi Martin,
>>
>> if possible I would like to somehow take my version (after some
>> reviewing :-) ), as it contains all the new POWER5 stuff already.
>>
>> My understanding is that, since it would require some changes to
>> protocol.x, my changes won't have a chance to get into the core
>> Ganglia source code until version 3.1 comes along.
>>
>> This code and everything else (RPMs) can be found on my website
>> http://www.perzl.org/ganglia/.
>>
>> This stuff is actually in use at quite a few customer sites already
>> (it runs on AIX 4.3.3, 5.1, 5.2 and 5.3), so I would like to keep the
>> POWER5 stuff in if possible. Actually, an AIX gmond implementation
>> without the POWER5 stuff could be derived from my implementation very
>> easily (just by stripping off the POWER5 add-ons).
>>
>> Regards,
>> Michael
>>
>> Martin Knoblauch wrote:
>> Michael, Andreas,
>>
>> any chance that you could consolidate the two versions of the AIX
>> metrics that seem to be around? It seems you are the ones who have
>> worked most recently on the AIX implementation.
>>
>> Cheers
>> Martin
>>
>> --- Michael Perzl <[EMAIL PROTECTED]> wrote:
>>
>>
>>
>> Andreas,
>>
>> thank you for taking the blame but you are off the hook here. ;-)
>>
>> If I understood David correctly, he is using my AIX Ganglia RPM
>> packages with POWER5 extensions. In these, most if not all of the
>> implementations of how the metrics are collected under AIX have been
>> changed. Everything is documented on my homepage
>> (http://www.perzl.org/ganglia/) though.
>> So everything that goes wrong here is entirely my fault :-[
>>
>> After some investigation and some discussions with Nigel I have come
>> to terms with the following facts regarding the bytes_in/bytes_out
>> problem:
>> - libperfstat (the library on AIX which obtains all the system
>> performance data) uses u_longlong_t data types (these are definitely
>> 64 bits wide).
>> - Internally, though, the AIX kernel is probably not using 64-bit
>> data types; unsigned 32-bit is more realistic, in order not to
>> break compatibility (my personal opinion).
>> - The consequence is that integer overrun occurs much more easily
>> with 32-bit data types than with 64-bit data types (with 64-bit
>> counters we would all probably not live long enough to see it happen).
>>
>> Please take a look at my implementation of the bytes_in metric (the
>> bytes_out implementation is analogous):
>>
>> 01 g_val_t
>> 02 bytes_in_func( void )
>> 03 {
>> 04    g_val_t val;
>> 05    perfstat_netinterface_total_t n;
>> 06    static u_longlong_t last_bytes_in = 0, bytes_in;
>> 07    static double last_time = 0.0;
>> 08    double now, delta_t;
>> 09    struct timeval timeValue;
>> 10    struct timezone timeZone;
>> 11
>> 12    gettimeofday( &timeValue, &timeZone );
>> 13
>> 14    now = (double) (timeValue.tv_sec - boottime) + (timeValue.tv_usec / 1000000.0);
>> 15
>> 16    if (perfstat_netinterface_total( NULL, &n, sizeof( perfstat_netinterface_total_t ), 1 ) == -1)
>> 17       val.f = 0.0;
>> 18    else
>> 19    {
>> 20       bytes_in = n.ibytes;
>> 21
>> 22       delta_t = now - last_time;
>> 23
>> 24       if ( delta_t )
>> 25          val.f = (double) (bytes_in - last_bytes_in) / delta_t;
>> 26       else
>> 27          val.f = 0.0;
>> 28
>> 29       last_bytes_in = bytes_in;
>> 30    }
>> 31
>> 32    last_time = now;
>> 33
>> 34    return( val );
>> 35 }
>>
>> In my opinion the overrun occurs in line #25 when "bytes_in <
>> last_bytes_in".
>> In my naivety I had assumed that, as both are of type u_longlong_t,
>> an integer overrun would never happen.
>>
>> So to solve the overrun a check for "bytes_in < last_bytes_in" must
>> be
>> introduced, something like:
>>
>> u_longlong_t d;
>> d = bytes_in - last_bytes_in;
>> if (d < 0) d += ULONG_MAX;
>>
>> and line #25 would essentially become
>> 25 val.f = (double) d / delta_t;
>>
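One caveat with the check proposed above: a difference stored in an unsigned type can never compare below zero, so the wrap has to be detected by comparing the two samples before subtracting. The following is only a minimal sketch under the assumption made in this thread that the underlying counter is 32 bits wide. The helper name counter_delta and the macro COUNTER32_MODULUS are hypothetical, and uint64_t stands in for AIX's u_longlong_t; none of this is taken from the Ganglia or libperfstat sources.

```c
#include <stdint.h>

/* Hypothetical helper, not part of Ganglia or libperfstat.
 * uint64_t stands in for AIX's u_longlong_t. */
#define COUNTER32_MODULUS ((uint64_t)1 << 32)

/* Returns the number of units counted between two samples. */
static uint64_t counter_delta(uint64_t current, uint64_t previous)
{
    /* Normal case: the counter only moved forward. */
    if (current >= previous)
        return current - previous;

    /* Assumption: the counter is really 32 bits wide and wrapped
     * exactly once between the two samples. */
    if (previous < COUNTER32_MODULUS)
        return current + COUNTER32_MODULUS - previous;

    /* The previous sample did not fit in 32 bits, so the 32-bit
     * assumption does not hold (or the counter was reset). Report
     * zero rather than a bogus spike. */
    return 0;
}
```

With a helper like this, line #25 could read "val.f = (double) counter_delta( bytes_in, last_bytes_in ) / delta_t;". Returning zero in the last branch is a design choice: it drops one sample instead of reporting an impossible rate.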
>> Comments ?
>>
>> Regards,
>> Michael
>>
>> PS: David, the reason why you don't see it happen with pkts_in and
>> pkts_out is probably that no overrun has occurred so far, but at some
>> point it will happen there too.
>>
>> PPS: David, if this is a solution (I want some comments on it first,
>> though), then I will build new RPMs with the then hopefully
>> correct code.
>>
>> Andreas Schoenfeld wrote:
>>
>>
>> Hi David and Martin,
>>
>> I suppose the network code is still the code I wrote, so there are two
>> problems I know of:
>> 1. Yes, there is a problem with overflows.
>> 2. The shown network traffic is the sum of all network interfaces,
>> including local loopback devices (lo0...).
>>
>> Both problems could lead to astonishing data transfer rates in
>> ganglia.
>>
>> Sorry, I had promised to fix the problems, but there was too much other
>> work ...
>>
>> Best regards
>> Andreas
>>
>>
>>
>>
>> Date: Thu, 29 Mar 2007 08:21:38 -0700 (PDT)
>> From: Martin Knoblauch <[EMAIL PROTECTED]>
>> Subject: Re: [Ganglia-general] Help! I have a petabyte/s network
>> To: David Wong <[EMAIL PROTECTED]>, [EMAIL PROTECTED],
>> [EMAIL PROTECTED]
>> Message-ID: <[EMAIL PROTECTED]>
>> Content-Type: text/plain; charset=iso-8859-1
>>
>> David,
>>
>> good catch. I will have to look at it for a bit.
>>
>> Cheers
>> Martin
>> --- David Wong <[EMAIL PROTECTED]> wrote:
>>
>> I don't write much code nowadays, so I'm going to need a lot of help
>> with this.
>>
>> I dug through the ganglia code and I found this interesting tidbit in
>> libmetrics/aix/metrics.c which may be indicative of the problem.
>>
>> There's an assignment from cur_ninfo.ibytes to cur_net_stat.ibytes,
>> but the types of the two variables are different.
>>
>> net_stat::ibytes is a double:
>>
>> struct net_stat{
>> double ipackets;
>> double opackets;
>> double ibytes;
>> double obytes;
>> } cur_net_stat;
>>
>> and we have *ninfo declared here:
>>
>> perfstat_netinterface_total_t ninfo[2],*last_ninfo, *cur_ninfo ;
>>
>> libperfstat.h has perfstat_netinterface_total_t::ibytes as
>> u_longlong_t.
>>
>> Does this code try to do what I think it is doing, i.e. assign an
>> unsigned 64 bit integer to a signed 64bit integer?
>>
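On the type question above: double is a floating-point type, not a signed 64-bit integer, so assigning a u_longlong_t to it does not overflow, but it can round. The sketch below assumes the usual IEEE 754 double with a 53-bit significand and uses uint64_t as a stand-in for u_longlong_t. The function name survives_double is hypothetical and only illustrates where exactness ends.

```c
#include <stdint.h>

/* Hypothetical check, not part of Ganglia: returns 1 if value is
 * unchanged after a round trip through double, 0 otherwise. */
static int survives_double(uint64_t value)
{
    /* volatile forces the value to be stored as a real double. */
    volatile double as_double = (double)value;

    /* 2^64 as a double. Values that round up to it cannot be
     * converted back to uint64_t. */
    if (as_double >= 18446744073709551616.0)
        return 0;

    return (uint64_t)as_double == value;
}
```

Under that assumption, byte counters up to 2^53 (about 9 petabytes) pass through a double unchanged, which suggests that rounding alone would not produce the huge rates reported in this thread.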
>> I'm willing to test the code if someone who's more adept at coding
>> and building will take on the challenge.
>>
>> It looks to me that the type mismatch will have to be fixed in a few
>> places, such as CALC_NETSTAT, and we'll have to add an unsigned long
>> long to g_val_t too. Those are the ones I can see so far.
>>
>> David Wong
>> Senior Systems Engineer
>> Management Dynamics, Inc.
>> Phone: 201-804-6127
>> [EMAIL PROTECTED]
>>
>> -----Original Message-----
>> From: Martin Knoblauch [mailto:[EMAIL PROTECTED]
>> Sent: Wednesday, March 28, 2007 12:00 PM
>> To: David Wong; [EMAIL PROTECTED]
>> Subject: Re: [Ganglia-general] Help! I have a petabyte/s network
>>
>> David,
>>
>> as far as I remember, the AIX metrics code had an
>> overflow/wrap-around
>> problem prior to 3.0.4. Maybe the fixes are not thorough enough.
>>
>> The packets/sec are of course less affected.
>>
>> Cheers
>> Martin
>>
>>
>>
>>
>>
>> ------------------------------------------------------
>> Martin Knoblauch
>> email: k n o b i AT knobisoft DOT de
>> www: http://www.knobisoft.de
>>
>>
>>
>>
>>
>> _______________________________________________
>> Ganglia-developers mailing list
>> Ganglia-developers@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/ganglia-developers
>>
>>
>