[Ganglia-developers] hsflowd for Windows + Ganglia webfrontend

2011-02-10 Thread Bernard Li
Hi all:

I just tested hsflowd 1.11 on Windows with the latest code in trunk
and ran into some issues with the frontend: certain graphs, such as
the load_, cpu_ and mem_ reports, are not showing up.

The cause is probably a recent change in the sFlow integration code,
after which unsupported metrics are no longer submitted.  For
instance, the load_* metrics are expensive to calculate on Windows
and are therefore not supported.

I was wondering whether it would be better to report these metrics
with a value of 0 rather than dropping them, as they are core
metrics used in the frontend reports -- thoughts?

Also, I think Nick was suggesting that perhaps we can try to use the
Processor Queue Length on Windows for the load metric.

Thanks,

Bernard

--
The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE:
Pinpoint memory and threading errors before they happen.
Find and fix more than 250 security defects in the development cycle.
Locate bottlenecks in serial and parallel code that limit performance.
http://p.sf.net/sfu/intel-dev2devfeb
___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers


Re: [Ganglia-developers] dmax for python and C gmond modules

2011-02-10 Thread Bernard Li
Hi Brad:

On Tue, Feb 1, 2011 at 4:51 PM, Brad Nicholes bnicho...@novell.com wrote:

> I would probably have to go back and figure out what I was thinking at the
> time, but I vaguely recall that dmax was hardcoded for all of the standard
> metrics in the 3.0 version of gmond.  So at the time I was probably thinking
> that exposing dmax just wasn't necessary.  That was probably a wrong
> assumption.  Thinking back now, it would have made sense for dmax to be
> hardcoded in 3.0 because there was no way to add or remove metrics in that
> version.  But in the 3.1 version, where you can do it, dmax shouldn't have
> been hardcoded and should have been exposed.
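
For readers less familiar with dmax, here is a minimal sketch of the expiry
rule as I understand it: a metric that has not been updated for more than dmax
seconds is dropped, and a dmax of 0 means it is never dropped.  The function
and parameter names are hypothetical and this is not gmond's actual code.

```python
def is_expired(last_update_s, now_s, dmax_s):
    """Return True when a metric has been silent for longer than dmax.

    Assumption: dmax_s == 0 means "never expire".
    """
    if dmax_s == 0:
        return False
    return (now_s - last_update_s) > dmax_s
```

With a hardcoded dmax, a module author has no way to pick a different value
for `dmax_s`, which is the limitation being discussed.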

With gmond 3.0, you should be able to add metrics using gmetric.

Anyway, I've just filed a feature request to better keep track of this issue:

http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=297

Thanks,

Bernard



Re: [Ganglia-developers] hsflowd for Windows + Ganglia webfrontend

2011-02-10 Thread Neil McKee
That's odd,  the CPU and MEM charts are working OK for me.  It's just the 
load-avg that is missing (which causes the host to be marked down).

For troubleshooting, if you intercept the sFlow feed with sflowtool
(http://www.inmon.com/technology/sflowTools.php) then you can see what the
numbers look like.  The graphical freeware tool sFlowTrend
(http://www.inmon.com/products/sFlowTrend.php) is an option too.

How frequently would we need to poll the Processor Queue Length to get a 
reasonable load-average estimate?  If we could get away with only doing it 
every few seconds then it might be worthwhile (perhaps a random delay would be 
appropriate) but I suspect you might have to poll much faster than that to get 
something worthwhile?
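
To make the polling question concrete, here is a minimal sketch of one way to
turn periodic queue-length samples into a load-average-like number: an
exponentially damped moving average, similar in spirit to the Unix load
average.  The class name, the 5-second interval and the 60-second window are
assumptions for illustration, and the samples here are made up rather than
read from the Windows Processor Queue Length counter.

```python
import math


class LoadEstimator:
    """Exponentially damped average of sampled queue lengths (hypothetical helper)."""

    def __init__(self, sample_interval_s, window_s=60.0):
        # Fraction of the previous estimate that survives each sample.
        self.decay = math.exp(-sample_interval_s / window_s)
        self.load = 0.0

    def update(self, queue_length):
        # Blend the new sample into the running estimate.
        self.load = self.load * self.decay + queue_length * (1.0 - self.decay)
        return self.load


est = LoadEstimator(sample_interval_s=5.0)
for _ in range(200):
    est.update(2)  # pretend two threads are always waiting
print(round(est.load, 3))
```

A longer sample interval makes each sample count for more, so bursts that fall
between samples are missed entirely; that is the trade-off behind the question
of how fast the counter needs to be polled.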

Neil






Re: [Ganglia-developers] hsflowd for Windows + Ganglia webfrontend

2011-02-10 Thread Bernard Li
Hi Neil:

On Thu, Feb 10, 2011 at 5:05 PM, Neil McKee neil.mc...@inmon.com wrote:

> That's odd,  the CPU and MEM charts are working OK for me.  It's just the
> load-avg that is missing (which causes the host to be marked down).

You'll likely need to nuke your old rrd files to see the error.

For Windows, mem_shared, mem_buffers and cpu_nice are missing, and
they are needed by the respective mem_ and cpu_ reports.  I guess we
could fix the frontend code to only include them if they exist.  Do
these metrics exist in Windows-land?
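
The "only include them if they exist" idea could look roughly like the sketch
below.  It is written in Python only to show the logic and is not the
frontend's code; the function name is hypothetical, and it assumes each metric
is stored as <metric>.rrd in a per-host directory.

```python
import os


def available_metrics(host_rrd_dir, wanted):
    """Return the metrics from `wanted` that have an RRD file in host_rrd_dir.

    Assumption: one file per metric, named "<metric>.rrd".
    """
    return [
        metric
        for metric in wanted
        if os.path.isfile(os.path.join(host_rrd_dir, metric + ".rrd"))
    ]
```

A report would then build its graph from the returned list, so a Windows host
without mem_shared or mem_buffers still gets a memory graph from the series
that are present.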

> How frequently would we need to poll the Processor Queue Length to get a
> reasonable load-average estimate?  If we could get away with only doing it
> every few seconds then it might be worthwhile (perhaps a random delay would
> be appropriate) but I suspect you might have to poll much faster than that to
> get something worthwhile?

I will defer to Nick for answering that question :-)

Cheers,

Bernard



Re: [Ganglia-developers] hsflowd for Windows + Ganglia webfrontend

2011-02-10 Thread Bernard Li
P.S. It looks like GMOND_STARTED=0 for the Windows host -- should I file a bug?

Thanks,

Bernard





Re: [Ganglia-developers] hsflowd for Windows + Ganglia webfrontend

2011-02-10 Thread Neil McKee
OK, I cleared out everything under /var/lib/ganglia/rrds/.  Now I see the
problem.  In the short term I guess gmond/sflow.c could just submit zeros for
the missing metrics instead of leaving them out altogether(?)  However, in the
long term it would seem more elegant to fix this in the place where the RRD is
constructed.  After all, for most of these metrics posting a value of 0 is
just plain wrong.  When the value is really "dunno" or "not applicable" or
"undefined" then leaving it out seems more correct.
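
As a small illustration of the difference between "0" and "unknown": rrdtool's
update syntax accepts U as an explicit unknown value, so a missing metric does
not have to be recorded as zero.  The sketch below only builds the argument
string; the function name is hypothetical and nothing here is taken from the
gmetad or frontend code.

```python
def rrd_update_arg(timestamp, values):
    """Build an 'rrdtool update' argument; None becomes the unknown marker 'U'."""
    fields = ["U" if value is None else str(value) for value in values]
    return ":".join([str(timestamp)] + fields)


# One known value and one metric the agent could not report.
print(rrd_update_arg(1297380000, [0.5, None]))
```

Graphs drawn from unknown samples show a gap rather than a flat line at zero,
which matches the point that zero is the wrong answer for most of these
metrics.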

Neil





Re: [Ganglia-developers] hsflowd for Windows + Ganglia webfrontend

2011-02-10 Thread Neil McKee
I didn't realize that gmond_started was meant to go with every
heartbeat message.  Below is a patch.

Neil

[root@ganglia gmond]# svn diff sflow.c
Index: sflow.c
===================================================================
--- sflow.c (revision 2471)
+++ sflow.c (working copy)
@@ -398,7 +398,7 @@
 #define SFLOW_OK_COUNTER64(field) (field != (uint64_t)-1)
 
   /* always send a heartbeat */
-  process_sflow_uint32(hostdata, SFLOW_M_heartbeat, 0);
+  process_sflow_uint32(hostdata, SFLOW_M_heartbeat, (apr_time_as_msec(now) - x.uptime_mS) / 1000);
   
   if(offset_HID) {
 /* sumbit the system fields that we already extracted above */
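
The arithmetic in the changed line, as I read it, derives a start time from
the current time and the reported uptime, both in milliseconds, and converts
the result to seconds.  A minimal Python sketch of the same calculation
follows; the function and parameter names are hypothetical stand-ins for
apr_time_as_msec(now) and x.uptime_mS.

```python
def started_epoch_seconds(now_ms, uptime_ms):
    """Start time in seconds since the epoch: (now - uptime), ms to s."""
    return (now_ms - uptime_ms) // 1000


# A host that has been up for one hour (3,600,000 ms).
print(started_epoch_seconds(1297387200000, 3600000))
```

Sending this value instead of a constant 0 is what would stop GMOND_STARTED
from showing up as 0 for the sFlow-reported host.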





