Hi Bernard,

I now have a consolidated SPEC file (I think it is ugly :-) ), so how do you want me to send it to you (I guess not posting to the mailing list :-) ) ?

Regards,
Michael

Bernard Li wrote:
Hi Michael:

Thanks for looking into this.  Yes, I am aware the spec file may get
bloated but I think ultimately this will be better for one person (or
more) to manage (as opposed to managing multiple files).

What do other devs/users think?

BTW, I'm cc: Marcus to see if he has any specific insights on this :-)

Cheers,

Bernard

On 4/2/07, Michael Perzl <[EMAIL PROTECTED]> wrote:
Hi Bernard,

I took a closer look and, though I think it could be done, it might be
very ugly for the following reasons:

- AIX is still using RPM version 3.0.5 and I am not aware of any
intentions to upgrade anytime soon
- Like I said I think it could be consolidated, however, that would
probably require tons of "%ifarch ppc" and "%ifnarch ppc" defines which
would make the SPEC file rather hard to read
- AIX RPM is installing all the software under the /opt/freeware
directory hierarchy (to better distinguish from the AIX base filesets),
therefore lots of different file locations in the SPEC file would have
to be "ifdef'ed" as mentioned above.
- All the Linux specific stuff like "chkconfig" would have to be
"%ifdef'ed" appropriately.

A quick solution would probably be to just rename the committed
ganglia.aix.spec to maybe ganglia.spec.aix so your rpmbuild command
doesn't get mixed up.

I'll give it a try and see how far I get along but the end result might
be ugly :-)

Regards,
Michael

Bernard Li wrote:
> Hi Michael:
>
> Any chance you can also work on merging the ganglia.aix.spec file back
> to the mainstream .spec file?  I'm about to change configure.in to
> only include the specific spec file depending on the OS, but I think
> the better solution is just to merge the two.  Right now I cannot just
> generate the distribution tarball and run 'rpmbuild -ta' since there
> are 2 spec files.
>
> Thanks in advance,
>
> Bernard
>
> On 4/2/07, Michael Perzl <[EMAIL PROTECTED]> wrote:
>>
>>  Hi Martin,
>>
>>  if possible I would like to somehow take my version (after some
>> reviewing :-) ), as it contains all the new POWER5 stuff already.
>>
>>  My understanding is - as it would require some changes to
>> protocol.x - that my changes won't have a chance to get into the core
>> Ganglia source code until version 3.1 comes along.
>>
>>  This code and everything else (RPMs) can be found on my website
>> http://www.perzl.org/ganglia/.
>>
>>  This stuff is actually in use at quite a few customer sites already
>> (runs on AIX 4.3.3, 5.1, 5.2 and 5.3) so I would like to keep that
>> POWER5 stuff in if possible. Actually, an AIX gmond implementation
>> without the POWER5 stuff, based on my implementation, could be done
>> very easily (just by stripping off the POWER5 add-ons).
>>
>>  Regards,
>>  Michael
>>
>>  Martin Knoblauch wrote:
>>  Michael, Andreas,
>>
>>  any chance that you could consolidate the two versions of the AIX
>> metrics that seem to be around? It seems you are the ones who have
>> worked most recently on the AIX implementation.
>>
>> Cheers
>> Martin
>>
>> --- Michael Perzl <[EMAIL PROTECTED]> wrote:
>>
>>
>>
>>  Andreas,
>>
>> thank you for taking the blame but you are off the hook here. ;-)
>>
>> If I understood David correctly, he is using my AIX Ganglia RPM
>> packages with POWER5 extensions. Here most if not all of the
>> implementations of how the metrics are collected under AIX have been
>> changed. Everything is documented on my homepage
>> (http://www.perzl.org/ganglia/) though.
>> So everything that goes wrong here is entirely my fault :-[
>>
>> After some investigating and some discussions with Nigel I have come
>> to terms with the following facts regarding the bytes_in/bytes_out
>> problem:
>> - libperfstat (the library on AIX which obtains all the system
>> performance data) uses u_longlong_t data types (these are definitely
>> 64 bits wide).
>> - The AIX kernel internally, though, is probably not using 64-bit
>> data types - more realistic is probably unsigned 32-bit - in order
>> not to break compatibility (my personal opinion).
>> - The consequence is that an integer overrun occurs much more easily
>> with 32-bit data types than with 64-bit data types (with 64-bit, we
>> all probably won't live long enough to see that happen).
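
To put the last point in numbers, the following C sketch estimates how long a byte counter of a given width takes to wrap at a sustained transfer rate. The function name counter_wrap_seconds is hypothetical, and the 125 MB/s rate mentioned afterwards (roughly a saturated gigabit link) is an illustrative assumption, not a figure from this thread.

```c
#include <assert.h>

/* Hypothetical helper: seconds until a counter that is `bits` wide
 * wraps around when it grows by `bytes_per_sec` every second. */
double counter_wrap_seconds(unsigned bits, double bytes_per_sec)
{
    double range = 1.0;
    unsigned i;

    assert(bytes_per_sec > 0.0);

    /* range = 2^bits, built by repeated doubling to avoid needing libm */
    for (i = 0; i < bits; i++)
        range *= 2.0;

    return range / bytes_per_sec;
}
```

At 125e6 bytes/s a 32-bit counter wraps after roughly 34 seconds, whereas a 64-bit counter would need several thousand years, which is why a wrap between two samples is only a practical concern for the 32-bit case.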
>>
>> Please take a look at my implementation of the bytes_in metric (the
>> bytes_out implementation is analogous):
>>
>> 01 g_val_t
>> 02 bytes_in_func( void )
>> 03 {
>> 04    g_val_t val;
>> 05    perfstat_netinterface_total_t n;
>> 06    static u_longlong_t last_bytes_in = 0, bytes_in;
>> 07    static double last_time = 0.0;
>> 08    double now, delta_t;
>> 09    struct timeval timeValue;
>> 10    struct timezone timeZone;
>> 11
>> 12    gettimeofday( &timeValue, &timeZone );
>> 13
>> 14    now = (double) (timeValue.tv_sec - boottime) + (timeValue.tv_usec / 1000000.0);
>> 15
>> 16    if (perfstat_netinterface_total( NULL, &n, sizeof( perfstat_netinterface_total_t ), 1 ) == -1)
>> 17       val.f = 0.0;
>> 18    else
>> 19    {
>> 20       bytes_in = n.ibytes;
>> 21
>> 22       delta_t = now - last_time;
>> 23
>> 24       if ( delta_t )
>> 25          val.f = (double) (bytes_in - last_bytes_in) / delta_t;
>> 26       else
>> 27          val.f = 0.0;
>> 28
>> 29       last_bytes_in = bytes_in;
>> 30    }
>> 31
>> 32    last_time = now;
>> 33
>> 34    return( val );
>> 35 }
>>
>> In my opinion the overrun occurs in line #25 when "bytes_in <
>> last_bytes_in".
>> In my naivety I had assumed that, as both are of type u_longlong_t, an
>> integer overrun would never happen.
>>
>> So to solve the overrun a check for "bytes_in < last_bytes_in" must
>> be introduced, something like:
>>
>> u_longlong_t d;
>> d = bytes_in - last_bytes_in;
>> if (d < 0) d += ULONG_MAX;
>>
>> and line #25 would essentially become
>> 25 val.f = (double) d / delta_t;
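
One caveat with the check proposed above: d is declared as u_longlong_t, which is unsigned, so the test "d < 0" can never be true, and where unsigned long is 32 bits the wrap modulus would be ULONG_MAX + 1 rather than ULONG_MAX. A minimal sketch of a wrap-aware difference follows. It assumes the underlying counter wraps at 2^bits (32 here, which is the opinion stated in this mail, not a confirmed fact about AIX) and that at most one wrap happens between two samples. counter_delta is a hypothetical helper, not Ganglia or libperfstat code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: difference between two readings of a counter
 * that is `bits` wide but reported in a 64-bit field.
 * Assumes at most one wrap occurred between `last` and `cur`. */
uint64_t counter_delta(uint64_t cur, uint64_t last, unsigned bits)
{
    uint64_t diff;

    assert(bits >= 1 && bits <= 64);

    /* Unsigned subtraction is well defined and wraps modulo 2^64. */
    diff = cur - last;

    if (bits == 64)
        return diff;

    /* Reduce modulo 2^bits so a wrapped counter yields the small,
     * true increment instead of a value close to 2^64. */
    return diff & ((UINT64_C(1) << bits) - 1);
}
```

With such a helper, line #25 could read val.f = (double) counter_delta( bytes_in, last_bytes_in, 32 ) / delta_t; if the counter were reset rather than wrapped, the result would still be wrong, so this is not a complete fix.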
>>
>> Comments ?
>>
>> Regards,
>> Michael
>>
>> PS: David, the reason why you don't see it happen with pkts_in and
>> pkts_out is that probably no overrun so far has occurred but at some
>> point it will also happen.
>>
>> PPS: David, if this is a solution (I want some comments on that
>> before, though) then I would be building new RPMs with the then
>> hopefully correct code.
>>
>> Andreas Schoenfeld wrote:
>>
>>
>>  Hi David and Martin,
>>
>> I suppose the network code is still the code I wrote, so there are
>> two problems I know of:
>> 1. yes, there is a problem with overflows
>> 2. the shown network traffic is the sum of all network interfaces
>> including local loopback devices (lo0...).
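
The second problem could be avoided by summing per-interface counters and skipping loopback devices instead of using a single total. The sketch below is only an illustration under stated assumptions: struct if_counters and sum_ibytes_excluding_loopback are hypothetical names, the array stands in for whatever per-interface readings the platform provides (on AIX presumably obtained through libperfstat), and matching on the "lo" name prefix is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-interface sample; not a libperfstat type. */
struct if_counters {
    char     name[16];
    uint64_t ibytes;
    uint64_t obytes;
};

/* Sum received bytes over all interfaces except loopback ones.
 * Assumption: loopback interface names start with "lo" (e.g. lo0). */
uint64_t sum_ibytes_excluding_loopback(const struct if_counters *ifs, size_t n)
{
    uint64_t total = 0;
    size_t i;

    assert(ifs != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (strncmp(ifs[i].name, "lo", 2) == 0)
            continue;               /* skip lo0 and friends */
        total += ifs[i].ibytes;
    }
    return total;
}
```

The same loop would apply to obytes and to the packet counters.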
>>
>> Both problems could lead to astonishing data transfer rates in
>> Ganglia.
>>
>> Sorry, I had promised to fix the problems, but there was too much
>> other work ...
>>
>> Best regards
>>  Andreas
>>
>>
>>
>>
>>  Date: Thu, 29 Mar 2007 08:21:38 -0700 (PDT)
>> From: Martin Knoblauch <[EMAIL PROTECTED]>
>> Subject: Re: [Ganglia-general] Help! I have a petabyte/s network
>> To: David Wong <[EMAIL PROTECTED]>, [EMAIL PROTECTED],
>>  [EMAIL PROTECTED]
>> Message-ID: <[EMAIL PROTECTED]>
>> Content-Type: text/plain; charset=iso-8859-1
>>
>> David,
>>
>>  good catch. I will have to look at it for a bit.
>>
>> Cheers
>> Martin
>> --- David Wong <[EMAIL PROTECTED]> wrote:
>>
>>
>>
>>
>>
>> I don't write much code nowadays, so I'm going to need a lot of help
>> with this.
>>
>> I dug through the ganglia code and I found this interesting tidbit in
>> libmetrics/aix/metrics.c which may be indicative of the problem.
>>
>> There's an assignment from cur_ninfo.ibytes to cur_net_stat.ibytes,
>> but the types of the two variables are different.
>>
>> net_stat::ibytes is a double:
>>
>> struct net_stat{
>>  double ipackets;
>>  double opackets;
>>  double ibytes;
>>  double obytes;
>> } cur_net_stat;
>>
>> and we have *ninfo declared here:
>>
>> perfstat_netinterface_total_t ninfo[2],*last_ninfo, *cur_ninfo ;
>>
>> libperfstat.h has perfstat_netinterface_total_t::ibytes as
>> u_longlong_t.
>>
>> Does this code try to do what I think it is doing, i.e. assign an
>> unsigned 64-bit integer to a signed 64-bit integer?
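
Going by the declarations quoted above, the destination field is a double rather than a signed 64-bit integer, so the concern would be precision rather than sign: an IEEE 754 double carries 53 bits of mantissa, and larger counter values lose their low-order bits. The following sketch checks whether a given value survives the conversion. The function name uint64_exact_in_double is hypothetical, and the 53-bit mantissa is an assumption that the code asserts.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if `v` converts to double and back
 * without changing its value. */
bool uint64_exact_in_double(uint64_t v)
{
    double d;

    /* Assumption: double is an IEEE 754 binary64 value. */
    assert(DBL_MANT_DIG == 53);

    d = (double)v;

    /* Values that round up to 2^64 cannot be converted back safely. */
    if (d >= 18446744073709551616.0)
        return false;

    return (uint64_t)d == v;
}
```

Every value up to 2^53 passes this check, so a genuinely 32-bit counter would be stored exactly; only a full 64-bit counter above 2^53 would start to lose precision.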
>>
>> I'm willing to test the code if someone who's more adept at coding
>> and building will take on the challenge.
>>
>> It looks to me that the type mismatch will have to be fixed in a few
>> places, such as CALC_NETSTAT, and we'll have to add an unsigned long
>> long to g_val_t too. Those are the ones I can see so far.
>>
>> David Wong
>> Senior Systems Engineer
>> Management Dynamics, Inc.
>> Phone: 201-804-6127
>> [EMAIL PROTECTED]
>>
>> -----Original Message-----
>> From: Martin Knoblauch [mailto:[EMAIL PROTECTED]
>> Sent: Wednesday, March 28, 2007 12:00 PM
>> To: David Wong; [EMAIL PROTECTED]
>> Subject: Re: [Ganglia-general] Help! I have a petabyte/s network
>>
>> David,
>>
>>  as far as I remember, the AIX metrics code had an
>> overflow/wrap-around problem prior to 3.0.4. Maybe the fixes are not
>> thorough enough.
>>
>>  The packets/sec are of course less affected.
>>
>> Cheers
>> Martin
>>
>>
>>
>>
>>
>> ------------------------------------------------------
>> Martin Knoblauch
>> email: k n o b i AT knobisoft DOT de
>> www: http://www.knobisoft.de
>>
>>
>>
>>
>> _______________________________________________
>> Ganglia-developers mailing list
>> Ganglia-developers@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/ganglia-developers
>>
>>
>


