Hello,

I apologize for bothering you all with this issue, but it seems to have
stumped the folks on the ganglia-general mailing list, so I'm hoping the
developers can help me out.

I'm using Ganglia v3.0.3 on openSUSE 10.3, which came pre-configured on a
Microway cluster.  It has been slightly modified to integrate their
"Microway Control" feature, which is basically a button on the Ganglia
homepage that leads to their TriCom/NodeWatch thermal monitoring web
page.  I don't think the issue I'm having has anything to do with this
customization, but I wanted to mention it up front in case it does.

The issue I'm facing is an incorrect boottime and uptime on my master
node and all slave nodes.  The discussion I had on ganglia-general can
be found here:
 
http://www.mail-archive.com/ganglia-gene...@lists.sourceforge.net/msg04814.html

To summarize: on the Ganglia homepage for my cluster, if I click on any
node (master or slave), the boottime is reported as:
   Wed, 31 Dec 1969 16:00:00 -0800
which is the Unix epoch (time 0) shown in Pacific time.  The uptime is
calculated from this boottime (it currently reads 14442 days, 13:50:04).
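For reference, here is a minimal Python sketch (illustrative only, not
Ganglia's frontend code) showing that both displayed values follow from
a boottime of 0.  It assumes the page renders times at UTC-8, matching
the -0800 offset above; format_boottime and format_uptime are
hypothetical helpers that only imitate the display format.

```python
from datetime import datetime, timedelta, timezone

# Assumption: times are rendered at UTC-8, matching the "-0800" offset shown.
PST = timezone(timedelta(hours=-8))

def format_boottime(boottime: int) -> str:
    """Render a Unix timestamp in the same shape as the reported boottime."""
    return datetime.fromtimestamp(boottime, tz=PST).strftime("%a, %d %b %Y %H:%M:%S %z")

def format_uptime(seconds: int) -> str:
    """Hypothetical helper: format a duration as 'N days, HH:MM:SS'."""
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days} days, {hours:02d}:{minutes:02d}:{secs:02d}"

# Boottime 0 is midnight UTC on 1 Jan 1970, i.e. 16:00 on 31 Dec 1969 at UTC-8.
print(format_boottime(0))  # Wed, 31 Dec 1969 16:00:00 -0800

# Uptime is "now minus boottime", so with boottime = 0 it is just the current
# Unix time.  The uptime from the report corresponds to this many seconds:
elapsed = 14442 * 86400 + 13 * 3600 + 50 * 60 + 4
print(format_uptime(elapsed - 0))  # 14442 days, 13:50:04
```

In other words, the frontend appears to be receiving a boottime of 0
rather than a wrong but plausible date.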

I have a total of three separate clusters running the same version of
Ganglia (we'll call them cluster1, cluster2, and cluster3, with master1,
master2, and master3 respectively).  cluster2 and cluster3 exhibit the
same issue; cluster1, however, does not.  The only software difference
between the three is that cluster1 runs SUSE 10.1 (not openSUSE).
Hardware-wise, apart from CPU/RAM/HDD and the number of slave nodes,
cluster1 uses an older type of TriCom/NodeWatch hardware, which I don't
believe would affect this.  In addition, cluster2 and cluster3 have an
InfiniBand network alongside their Ethernet network, whereas cluster1
simply has multiple Ethernet networks.  Also, on cluster3, up until
yesterday, the TriCom/NodeWatch hardware was cabled incorrectly, so its
NodeWatch web page reported no data; even so, the Ganglia homepage for
cluster3 showed proper data for each node (apart from the issue
discussed here), e.g. node load graphs and other metrics all reported
good data.

/proc/stat reports the correct btime value, and gmond runs as "nobody",
which is able to read /proc/stat.  Bernard, the gentleman who was
helping me on ganglia-general, suspects that gmond isn't getting the
btime value from /proc/stat, and I think he's on the right track, but
we're not sure why.  Any assistance in this matter is greatly
appreciated.  Thanks in advance!
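As a point of comparison, here is a minimal Python sketch of reading
btime from /proc/stat.  It is not gmond's code, and the helper names
(parse_btime, parse_btime_truncated, read_btime) and the 8192-byte limit
are hypothetical.  On Linux the btime line comes after the intr line,
which lists a counter per interrupt source and can get very long.  One
possibility worth checking (an unconfirmed assumption, not something
established in the ganglia-general thread) is whether the reader only
looks at a fixed-size prefix of the file; if so, btime would never be
seen, and a fallback of 0 would give exactly the 1969 boottime.

```python
from typing import Optional

def parse_btime(stat_text: str) -> Optional[int]:
    """Return btime (boot time, seconds since the epoch) from /proc/stat text, or None."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "btime":
            return int(fields[1])
    return None

# Hypothetical fixed buffer size, chosen only to illustrate the failure mode.
BUFSIZE = 8192

def parse_btime_truncated(stat_text: str, bufsize: int = BUFSIZE) -> Optional[int]:
    """Parse only the first `bufsize` characters, as a single fixed-size read would.

    If the cut falls inside the long "intr" line, the btime line is never seen.
    """
    return parse_btime(stat_text[:bufsize])

def read_btime(path: str = "/proc/stat") -> Optional[int]:
    """Read the whole file (no size limit) and extract btime."""
    with open(path) as f:
        return parse_btime(f.read())

if __name__ == "__main__":
    # Synthetic /proc/stat contents with an unusually long intr line.
    intr = "intr " + " ".join(["0"] * 5000)
    sample = "cpu 1 2 3 4\n" + intr + "\nctxt 5\nbtime 1234567890\nprocesses 10\n"
    print(parse_btime(sample))            # 1234567890
    print(parse_btime_truncated(sample))  # None
```

Comparing the size of /proc/stat on an affected node and on cluster1
(e.g. with `cat /proc/stat | wc -c`) would be one quick way to test this
hypothesis.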

- Ken


_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
