Re: [Ganglia-general] two identical hosts, one is having trouble with gmond

Michael Bravo Wed, 27 Apr 2011 09:21:17 -0700

More precisely, some metrics seem to be collected, and periodically
sent, such as


        metric 'disk_free' being collected now
Counting device /dev/root (6.21 %)
For all disks: 142.835 GB total, 133.963 GB free for users.
        metric 'disk_free' has value_threshold 1.000000
        metric 'part_max_used' being collected now
Counting device /dev/root (6.21 %)
For all disks: 142.835 GB total, 133.963 GB free for users.
        metric 'part_max_used' has value_threshold 1.000000


and then (I think around the time_threshold expiration):

        sent message 'disk_free' of length 52 with 0 errors
        sent message 'part_max_used' of length 52 with 0 errors
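
For reference, the timing of these sends fits how gmond's collection groups are documented to behave: a value is sent when it has changed by more than value_threshold since the last send, or when time_threshold seconds have passed, whichever happens first. Below is a minimal Python sketch of that rule. The function name, its arguments, and the 180-second time_threshold are illustrative assumptions; none of them come from gmond's source or from this host's gmond.conf.

```python
def should_send(last_sent, current, value_threshold,
                seconds_since_send, time_threshold):
    """Decide whether a freshly collected value should be sent.

    Hypothetical helper that mirrors the documented threshold rule;
    it is not gmond's actual implementation.
    """
    if last_sent is None:
        # Nothing has been sent yet, so the first value always goes out.
        return True
    if seconds_since_send >= time_threshold:
        # time_threshold expired: send even if the value did not change.
        return True
    # Otherwise send only when the change exceeds value_threshold.
    return abs(current - last_sent) > value_threshold


# disk_free is steady at 133.963 GB with value_threshold 1.0, so under this
# rule nothing is sent until the (assumed) 180 s time_threshold runs out.
print(should_send(133.963, 133.963, 1.0, 40, 180))
print(should_send(133.963, 133.963, 1.0, 180, 180))
```

On an idle disk this would explain why the "sent message" lines show up only once per time_threshold interval rather than after every collection.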

Also, on startup, all of these metrics seem to be prepared correctly:

        sending metadata for metric: disk_free
        sent message 'disk_free' of length 52 with 0 errors
        sending metadata for metric: part_max_used
        sent message 'part_max_used' of length 52 with 0 errors

and so on.

But none of these metrics appear in the node report at the web
frontend, as I listed in my original message.

Where does the "Local Disk: Unknown" part come from, then?

What is most baffling is that this problem host is completely
identical to the one next to it, which has zero problems.
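
One way to narrow this down is to look at what the collecting gmond actually holds for each host, rather than at the frontend. gmond writes its full state as XML to anyone who connects to its TCP accept channel (port 8649 unless the config says otherwise), so the metric names recorded for the two hosts can be compared directly. The Python sketch below does that. The function names are made up for illustration, and the HOST/METRIC element names and the IP and NAME attributes are assumed to match the gmond 3.1 XML output.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the XML dump gmond writes on its TCP channel (8649 assumed)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def metrics_by_host(xml_data):
    """Map each HOST element's IP attribute to its set of METRIC names."""
    root = ET.fromstring(xml_data)
    table = {}
    for host in root.iter("HOST"):
        table[host.get("IP")] = {m.get("NAME") for m in host.iter("METRIC")}
    return table


def missing_metrics(xml_data, good_ip, bad_ip):
    """Sorted metric names present for good_ip but absent for bad_ip."""
    table = metrics_by_host(xml_data)
    return sorted(table.get(good_ip, set()) - table.get(bad_ip, set()))


# Example use (host name and addresses are placeholders):
#   xml_data = fetch_gmond_xml("collector.example")
#   print(missing_metrics(xml_data, "xx.xx.xx.172", "xx.xx.xx.171"))
```

If the list comes back empty, the collector has the same metric names for both hosts and the difference is more likely in the values or further downstream (gmetad or the frontend). If it is not empty, the names it prints are the ones to look for in the debug output.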

On Wed, Apr 27, 2011 at 7:30 PM, Michael Bravo <mike.br...@gmail.com> wrote:
> I did try that, in non-daemonized mode; however, there weren't any
> evident errors popping up (and there's a lot of information coming up
> that way), so perhaps I need an idea of what to look for.
>
> On Wed, Apr 27, 2011 at 7:24 PM, Ron Cavallo <ron_cava...@s5a.com> wrote:
>> Have you tried starting up gmond on the affected server with debug set to
>> 10 in the gmond.conf? This may show some of the collection problems it's
>> having more specifically....
>>
>> -RC
>>
>>
>> Ron Cavallo
>> Sr. Director, Infrastructure
>> Saks Fifth Avenue / Saks Direct
>> 12 East 49th Street
>> New York, NY 10017
>> 212-451-3807 (O)
>> 212-940-5079 (fax)
>> 646-315-0119(C)
>> www.saks.com
>>
>>
>> -----Original Message-----
>> From: Michael Bravo [mailto:mike.br...@gmail.com]
>> Sent: Wednesday, April 27, 2011 11:14 AM
>> To: ganglia-general
>> Subject: [Ganglia-general] two identical hosts, one is having trouble
>> with gmond
>>
>> Hello,
>>
>> here is a strange occurrence. I have two (in fact, more than two, but
>> let's consider just a pair) identical servers running identical setups
>> - identical OS, identical gmond with identical config files, identical
>> disks, identical everything. However, one of those servers is perfectly
>> fine, and the other one has trouble reporting default metrics.
>>
>> Here's what the "normal" one shows in node view:
>>
>> xx.xx.xx.172
>>
>> Location: Unknown
>> Cluster local time Wed Apr 27 19:05:32 2011
>> Last heartbeat received 5 seconds ago.
>> Uptime 9 days, 9:22:38
>>
>> Load:   0.00    0.00    0.00
>>         1m      5m      15m
>>
>> CPU Utilization:        0.1     0.2     99.7
>>                         user    sys     idle
>>
>> Hardware
>> CPUs: 4 x 1.95 GHz
>> Memory (RAM): 7.80 GB
>> Local Disk: Using 16.532 of 142.835 GB
>> Most Full Disk Partition: 11.6% used.
>>
>> Software
>> OS: Linux 2.6.18-238.9.1.el5 (x86_64)
>> Booted: April 18, 2011, 9:42 am
>> Uptime: 9 days, 9:22:38
>> Swap: Using 0.0 of 12001.6 MB swap.
>>
>>
>> and here's what the "problem one" shows:
>>
>> xx.xx.xx.171
>>
>> Location: Unknown
>> Cluster local time Wed Apr 27 19:07:32 2011
>> Last heartbeat received 10 seconds ago.
>> Uptime 9 days, 9:20:01
>>
>> Load:   0.00    0.00    0.00
>>         1m      5m      15m
>>
>> CPU Utilization:        0.1     0.2     99.7
>>                         user    sys     idle
>>
>> Hardware
>> CPUs: 4 x 1.95 GHz
>> Memory (RAM): 7.80 GB
>> Local Disk: Unknown
>> Most Full Disk Partition: 6.2% used.
>>
>> Software
>> OS: Linux 2.6.18-238.9.1.el5 (x86_64)
>> Booted: April 18, 2011, 9:47 am
>> Uptime: 9 days, 9:20:01
>> Swap: Using 12001.6 of 12001.6 MB swap.
>>
>>
>>
>> both are running gmond 3.1.7 and talk to a third host which also runs
>> gmond 3.1.7 (which is getting polled by the web frontend host with
>> gmetad 3.1.7)
>>
>> at a glance, something seems to be confusing gmond on the problem
>> server, so that it mismatches disk partitions, or something similar.
>>
>> as a result, the problem node does not report all of the default
>> metrics, and those it does report are somewhat off-kilter, as you can
>> see (unknown local disk?)
>>
>> Any idea what might be going wrong and/or how to pinpoint the problem?
>>
>> --
>> Michael Bravo
>>
>> _______________________________________________
>> Ganglia-general mailing list
>> Ganglia-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/ganglia-general
>>
>
