Hi Michael:

Filing a bug and attaching the patch would be nice.  Or you could just
post it here.

Thanks,

Bernard

On 4/2/07, Michael Perzl <[EMAIL PROTECTED]> wrote:
Hi Bernard,

I now have a consolidated SPEC file (I think it is ugly :-) ), so how do
you want me to send it to you (I guess not posting to the mailing list
:-) ) ?

Regards,
Michael

Bernard Li wrote:
> Hi Michael:
>
> Thanks for looking into this.  Yes, I am aware the spec file may get
> bloated but I think ultimately this will be easier for one person (or
> more) to manage (as opposed to managing multiple files).
>
> What do other devs/users think?
>
> BTW, I'm cc'ing Marcus to see if he has any specific insights on this :-)
>
> Cheers,
>
> Bernard
>
> On 4/2/07, Michael Perzl <[EMAIL PROTECTED]> wrote:
>> Hi Bernard,
>>
>> I took a closer look and though I think it could be done it might be
>> very ugly for the following reasons:
>>
>> - AIX is still using RPM version 3.0.5 and I am not aware of any
>> intentions to upgrade anytime soon
>> - Like I said I think it could be consolidated, however, that would
>> probably require tons of "%ifarch ppc" and "%ifnarch ppc" defines which
>> would make the SPEC file rather hard to read
>> - AIX RPM installs all the software under the /opt/freeware
>> directory hierarchy (to better distinguish it from the AIX base filesets),
>> therefore lots of different file locations in the SPEC file would have
>> to be "ifdef'ed" as mentioned above.
>> - All the Linux specific stuff like "chkconfig" would have to be
>> "%ifdef'ed" appropriately.
>>
>> A quick solution would probably be to just rename the committed
>> ganglia.aix.spec to maybe ganglia.spec.aix so your rpmbuild command
>> doesn't get mixed up.
>>
>> I'll give it a try and see how far I get along but the end result might
>> be ugly :-)
>>
>> Regards,
>> Michael
>>
>> Bernard Li wrote:
>> > Hi Michael:
>> >
>> > Any chance you can also work on merging the ganglia.aix.spec file back
>> > to the mainstream .spec file?  I'm about to change configure.in to
>> > only include the specific spec file depending on the OS, but I think
>> > the better solution is just to merge the two.  Right now I cannot just
>> > generate the distribution tarball and run 'rpmbuild -ta' since there
>> > are 2 spec files.
>> >
>> > Thanks in advance,
>> >
>> > Bernard
>> >
>> > On 4/2/07, Michael Perzl <[EMAIL PROTECTED]> wrote:
>> >>
>> >>  Hi Martin,
>> >>
>> >>  if possible I would like to somehow take my version (after some
>> >>  reviewing) :-) , as it contains all the new POWER5 stuff already.
>> >>
>> >>  My understanding is - as it would require some changes to protocol.x -
>> >>  that my changes won't have a chance to get into the core Ganglia source
>> >>  code until version 3.1 comes along.
>> >>
>> >>  This code and everything else (RPMs) can be found on my website
>> >> http://www.perzl.org/ganglia/.
>> >>
>> >>  This stuff is actually in use at quite a few customer sites already
>> >>  (runs on AIX 4.3.3, 5.1, 5.2 and 5.3) so I would like to keep that
>> >>  POWER5-stuff in if possible. Actually, an AIX gmond implementation
>> >>  without the POWER5-stuff based on my implementation could be done very
>> >>  easily (just stripping off the POWER5-addons).
>> >>
>> >>  Regards,
>> >>  Michael
>> >>
>> >>  Martin Knoblauch wrote:
>> >>  Michael, Andreas,
>> >>
>> >>  any chance that you could consolidate the two versions of the AIX
>> >> metrics that seem to be around? It seems you are the ones who have worked
>> >> most recently on the AIX implementation.
>> >>
>> >> Cheers
>> >> Martin
>> >>
>> >> --- Michael Perzl <[EMAIL PROTECTED]> wrote:
>> >>
>> >>
>> >>
>> >>  Andreas,
>> >>
>> >> thank you for taking the blame but you are off the hook here. ;-)
>> >>
>> >> If I understood David correctly, he is using my AIX Ganglia RPM packages
>> >> with POWER5 extensions. Here most if not all of the implementation of
>> >> how the metrics are collected under AIX has been changed. Everything is
>> >> documented on my homepage (http://www.perzl.org/ganglia/) though.
>> >> So everything that goes wrong here is entirely my fault :-[
>> >>
>> >> After some investigating and some discussions with Nigel I have come to
>> >> terms with the following facts regarding the bytes_in/bytes_out problem:
>> >> - libperfstat (the library on AIX which obtains all the system
>> >> performance data) uses u_longlong_t data types (these are definitely
>> >> 64 bits wide).
>> >> - The AIX kernel internally, though, is probably not using 64-bit data
>> >> types - more realistic is probably unsigned 32-bit - in order not to
>> >> break compatibility (my personal opinion)
>> >> - The consequence is that integer overrun may occur much more easily
>> >> with 32-bit data types than with 64-bit data types (we all probably
>> >> won't live long enough to see the latter happen).
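To put rough numbers on the last point, here is a small sketch in C. The helper name seconds_until_wrap is hypothetical, and the steady byte rate is an illustrative assumption, not a measurement from this thread.

```c
/* Hypothetical helper, not part of Ganglia or libperfstat:
 * seconds until an unsigned counter of the given width wraps
 * when it grows at a steady byte rate. */
double seconds_until_wrap(unsigned counter_bits, double bytes_per_second)
{
    double modulus = 1.0;
    for (unsigned i = 0; i < counter_bits; i++)
        modulus *= 2.0;                 /* modulus = 2^counter_bits */
    return modulus / bytes_per_second;
}
```

Assuming a saturated 1 Gbit/s link (125,000,000 bytes/s), seconds_until_wrap(32, 125000000.0) comes out at roughly 34 seconds, while the same call with 64 bits gives several thousand years.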
>> >>
>> >> Please take a look at my implementation of the bytes_in metric (the
>> >> bytes_out implementation is analogous):
>> >>
>> >> 01 g_val_t
>> >> 02 bytes_in_func( void )
>> >> 03 {
>> >> 04    g_val_t val;
>> >> 05    perfstat_netinterface_total_t n;
>> >> 06    static u_longlong_t last_bytes_in = 0, bytes_in;
>> >> 07    static double last_time = 0.0;
>> >> 08    double now, delta_t;
>> >> 09    struct timeval timeValue;
>> >> 10    struct timezone timeZone;
>> >> 11
>> >> 12    gettimeofday( &timeValue, &timeZone );
>> >> 13
>> >> 14    now = (double) (timeValue.tv_sec - boottime) + (timeValue.tv_usec / 1000000.0);
>> >> 15
>> >> 16    if (perfstat_netinterface_total( NULL, &n, sizeof( perfstat_netinterface_total_t ), 1 ) == -1)
>> >> 17       val.f = 0.0;
>> >> 18    else
>> >> 19    {
>> >> 20       bytes_in = n.ibytes;
>> >> 21
>> >> 22       delta_t = now - last_time;
>> >> 23
>> >> 24       if ( delta_t )
>> >> 25          val.f = (double) (bytes_in - last_bytes_in) / delta_t;
>> >> 26       else
>> >> 27          val.f = 0.0;
>> >> 28
>> >> 29       last_bytes_in = bytes_in;
>> >> 30    }
>> >> 31
>> >> 32    last_time = now;
>> >> 33
>> >> 34    return( val );
>> >> 35 }
>> >>
>> >> In my opinion the overrun occurs in line #25 when
>> >> "bytes_in < last_bytes_in".
>> >> In my naivety I had assumed, as both are of type u_longlong_t, that an
>> >> integer overrun might never happen.
>> >>
>> >> So to solve the overrun a check for "bytes_in < last_bytes_in" must be
>> >> introduced, something like:
>> >>
>> >> u_longlong_t d;
>> >> d = bytes_in - last_bytes_in;
>> >> if (d < 0) d += ULONG_MAX;
>> >>
>> >> and line #25 would essentially become
>> >> 25 val.f = (double) d / delta_t;
>> >>
>> >> Comments ?
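One possible shape for that check, as a minimal sketch in C. It assumes, as the message only speculates, that the underlying counter is an unsigned 32-bit value and that it wraps at most once between two samples; the helper name counter_delta32 is hypothetical and not part of Ganglia or libperfstat. Two details differ from the snippet above: a test such as "d < 0" can never be true for an unsigned type, so the samples are compared before subtracting, and the value added back is the modulus 2^32 rather than ULONG_MAX (which is 2^32 - 1 where unsigned long is 32 bits wide).

```c
#include <stdint.h>

/* Hypothetical helper, not part of Ganglia or libperfstat.
 * Assumes the source counter is 32 bits wide and wrapped at most once
 * between the two samples. */
uint64_t counter_delta32(uint64_t current, uint64_t previous)
{
    if (current >= previous)
        return current - previous;      /* no wrap between samples */

    /* The counter wrapped past 2^32: add the modulus back in. */
    return (current + (UINT64_C(1) << 32)) - previous;
}
```

Line #25 could then divide (double) counter_delta32( bytes_in, last_bytes_in ) by delta_t. If the counter really is 64 bits wide, the plain unsigned subtraction is already correct modulo 2^64 and no adjustment is needed.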
>> >>
>> >> Regards,
>> >> Michael
>> >>
>> >> PS: David, the reason why you don't see it happen with pkts_in and
>> >> pkts_out is that probably no overrun so far has occurred but at some
>> >> point it will also happen.
>> >>
>> >> PPS: David, if this is a solution (I want some comments on that first,
>> >> though) then I would build new RPMs with the then hopefully correct
>> >> code.
>> >>
>> >> Andreas Schoenfeld wrote:
>> >>
>> >>
>> >>  Hi David and Martin,
>> >>
>> >> I suppose the network code is still the code I wrote, so there are two
>> >> problems I know of:
>> >> 1. yes, there is a problem with overflows
>> >> 2. the shown network traffic is the sum of all network interfaces
>> >> including local loopback devices (lo0...).
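The second point could be handled by summing per-interface counters and skipping loopback devices. A rough sketch in C follows; the if_sample record and the function name are hypothetical stand-ins rather than the real libperfstat structures, and treating every interface whose name starts with "lo" as loopback is an assumption based on the lo0 example above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-interface sample, for illustration only. */
struct if_sample {
    const char *name;    /* interface name, e.g. "en0" or "lo0" */
    uint64_t    ibytes;  /* bytes received on this interface */
};

/* Sum received bytes over all interfaces except loopback ones.
 * Assumption: loopback interface names start with "lo". */
uint64_t total_ibytes_no_loopback(const struct if_sample *ifs, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (strncmp(ifs[i].name, "lo", 2) == 0)
            continue;                   /* skip lo0, lo1, ... */
        total += ifs[i].ibytes;
    }
    return total;
}
```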
>> >>
>> >> Both problems could lead to astonishing data transfer rates in Ganglia.
>> >>
>> >> Sorry, I had promised to fix the problems, but there was too much other
>> >> work ...
>> >>
>> >> Best regards
>> >>  Andreas
>> >>
>> >>
>> >>
>> >>
>> >>  Date: Thu, 29 Mar 2007 08:21:38 -0700 (PDT)
>> >> From: Martin Knoblauch <[EMAIL PROTECTED]>
>> >> Subject: Re: [Ganglia-general] Help! I have a petabyte/s network
>> >> To: David Wong <[EMAIL PROTECTED]>, [EMAIL PROTECTED],
>> >>  [EMAIL PROTECTED]
>> >> Message-ID: <[EMAIL PROTECTED]>
>> >> Content-Type: text/plain; charset=iso-8859-1
>> >>
>> >> David,
>> >>
>> >>  good catch. I will have to look at it for a bit.
>> >>
>> >> Cheers
>> >> Martin
>> >> --- David Wong <[EMAIL PROTECTED]> wrote:
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>  I don't write much code nowadays, so I'm going to need a lot of help
>> >>  with this.
>> >>
>> >> I dug through the ganglia code and I found this interesting tidbit in
>> >> libmetrics/aix/metrics.c which may be indicative of the problem.
>> >>
>> >> There's an assignment from cur_ninfo.ibytes to cur_net_stat.ibytes, but
>> >> the types of the two variables are different.
>> >>
>> >> net_stat::ibytes is a double:
>> >>
>> >> struct net_stat{
>> >>  double ipackets;
>> >>  double opackets;
>> >>  double ibytes;
>> >>  double obytes;
>> >> } cur_net_stat;
>> >>
>> >> and we have *ninfo declared here:
>> >>
>> >> perfstat_netinterface_total_t ninfo[2],*last_ninfo, *cur_ninfo ;
>> >>
>> >> libperfstat.h has perfstat_netinterface_total_t::ibytes as
>> >> u_longlong_t.
>> >>
>> >> Does this code try to do what I think it is doing, i.e. assign an
>> >> unsigned 64-bit integer to a signed 64-bit integer?
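For reference, a small C sketch of what an assignment from u_longlong_t to a double does. It is a value conversion, not a reinterpretation as a signed integer: on platforms where double is IEEE 754 binary64 every value up to 2^53 converts exactly, and larger values may be rounded. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical check: returns 1 if v is unchanged after a round trip
 * through double, 0 if the conversion had to round. */
int survives_double_round_trip(uint64_t v)
{
    double d = (double)v;

    /* Values near UINT64_MAX round up to 2^64, which cannot be
     * converted back to uint64_t, so treat that as a lost value. */
    if (d >= 18446744073709551616.0)
        return 0;

    return (uint64_t)d == v;
}
```

Under the binary64 assumption, a byte counter below 2^53 (about 9 petabytes) passes through a double unchanged.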
>> >>
>> >> I'm willing to test the code if someone who's more adept at coding and
>> >> building will take on the challenge.
>> >>
>> >> It looks to me that the type mismatch will have to be fixed in a few
>> >> places, such as CALC_NETSTAT, and we'll have to add an unsigned long
>> >> long to g_val_t too. Those are the ones I can see so far.
>> >>
>> >> David Wong
>> >> Senior Systems Engineer
>> >> Management Dynamics, Inc.
>> >> Phone: 201-804-6127
>> >> [EMAIL PROTECTED]
>> >>
>> >> -----Original Message-----
>> >> From: Martin Knoblauch [mailto:[EMAIL PROTECTED]
>> >> Sent: Wednesday, March 28, 2007 12:00 PM
>> >> To: David Wong; [EMAIL PROTECTED]
>> >> Subject: Re: [Ganglia-general] Help! I have a petabyte/s network
>> >>
>> >> David,
>> >>
>> >>  as far as I remember, the AIX metrics code had an
>> >> overflow/wrap-around
>> >> problem prior to 3.0.4. Maybe the fixes are not thorough enough.
>> >>
>> >>  The packets/sec are of course less affected.
>> >>
>> >> Cheers
>> >> Martin
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> ------------------------------------------------------
>> >> Martin Knoblauch
>> >> email: k n o b i AT knobisoft DOT de
>> >> www: http://www.knobisoft.de
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> Ganglia-developers mailing list
>> >> Ganglia-developers@lists.sourceforge.net
>> >> https://lists.sourceforge.net/lists/listinfo/ganglia-developers
>> >>
>> >>
>> >
>>
>

