Re: [Ganglia-developers] hsflowd for Windows + Ganglia webfrontend

2011-02-11 Thread Bernard Li
Hi Neil:

Argh, Gmail always messes up inline patches...  anyways, I've fixed it
manually and checked it in:

https://sourceforge.net/apps/trac/ganglia/changeset/2473

Do you want commit rights to our SVN repo so that you could fix this
yourself in the future? :)

Thanks,

Bernard

On Thu, Feb 10, 2011 at 9:34 PM, Neil McKee neil.mc...@inmon.com wrote:
 I didn't realize that gmond_started was meant to go with every 
 heartbeat-message.  Below is a patch.

 Neil

 [root@ganglia gmond]# svn diff sflow.c
 Index: sflow.c
 ===================================================================
 --- sflow.c     (revision 2471)
 +++ sflow.c     (working copy)
 @@ -398,7 +398,7 @@
  #define SFLOW_OK_COUNTER64(field) (field != (uint64_t)-1)

   /* always send a heartbeat */
 -  process_sflow_uint32(hostdata, SFLOW_M_heartbeat, 0);
 +  process_sflow_uint32(hostdata, SFLOW_M_heartbeat, (apr_time_as_msec(now) - x.uptime_mS) / 1000);

   if(offset_HID) {
     /* sumbit the system fields that we already extracted above */
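
The arithmetic in the patched line can be illustrated in isolation. The following is a minimal sketch, not the gmond code: `start_time_secs` is a hypothetical helper, and the 32-bit width used for the agent's uptime field is an assumption. It subtracts the uptime in milliseconds from the current time in milliseconds and converts the result to seconds, which gives the time at which the agent started.

```c
#include <stdint.h>

/* Hypothetical helper, not part of sflow.c: returns the agent start time in
 * seconds since the epoch, given the current time and the agent's uptime,
 * both in milliseconds. Assumes uptime_msec <= now_msec. */
static uint32_t start_time_secs(int64_t now_msec, uint32_t uptime_msec)
{
    return (uint32_t)((now_msec - (int64_t)uptime_msec) / 1000);
}
```

For example, with a current time of 1297400000000 ms and an uptime of 60000 ms, the helper returns 1297399940.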




 On Feb 10, 2011, at 5:38 PM, Bernard Li wrote:

 P.S. It looks like GMOND_STARTED=0 for the Windows host -- should I file a 
 bug?

 Thanks,

 Bernard

 On Thu, Feb 10, 2011 at 5:21 PM, Bernard Li bern...@vanhpc.org wrote:
 Hi Neil:

 On Thu, Feb 10, 2011 at 5:05 PM, Neil McKee neil.mc...@inmon.com wrote:

 That's odd,  the CPU and MEM charts are working OK for me.  It's just the 
 load-avg that is missing (which causes the host to be marked down).

 You'll likely need to nuke your old rrd files to see the error.

 For Windows, mem_shared, mem_buffers and cpu_nice are missing, and
 they are needed by the respective mem_ and cpu_ reports.  I guess we
 could fix the frontend code to only include them if they exist.  Do
 these metrics exist in Windows-land?

 How frequently would we need to poll the Processor Queue Length to get a 
 reasonable load-average estimate?  If we could get away with only doing it 
 every few seconds then it might be worthwhile (perhaps a random delay 
 would be appropriate) but I suspect you might have to poll much faster 
 than that to get something worthwhile?
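
One way to reason about the polling question is to treat each Processor Queue Length sample as an input to an exponentially weighted moving average, similar in spirit to how Unix load averages are smoothed. This is a sketch under stated assumptions, not hsflowd code: `update_load_avg`, the sampling interval and the 60-second window are all hypothetical, and the linear weight `interval / window` is only a reasonable approximation when the interval is much shorter than the window.

```c
/* Hypothetical helper: fold one queue-length sample into a running average.
 * weight = interval / window approximates exponential decay when the
 * sampling interval is short compared to the averaging window. */
static double update_load_avg(double load, double queue_len,
                              double interval_secs, double window_secs)
{
    double weight = interval_secs / window_secs;
    if (weight > 1.0)
        weight = 1.0;   /* interval longer than the window: the sample replaces the average */
    return load + (queue_len - load) * weight;
}
```

With a 5-second interval and a 60-second window, each sample moves the average by about 8% of the difference between the sample and the current value. Bursts that start and finish between two samples are never seen, which is the trade-off behind the question of how fast to poll.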

 I will defer to Nick for answering that question :-)

 Cheers,

 Bernard




--
The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE:
Pinpoint memory and threading errors before they happen.
Find and fix more than 250 security defects in the development cycle.
Locate bottlenecks in serial and parallel code that limit performance.
http://p.sf.net/sfu/intel-dev2devfeb
___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers


Re: [Ganglia-developers] hsflowd for Windows + Ganglia webfrontend

2011-02-11 Thread Neil McKee
It might be helpful to have commit privileges for minor bugfixes,  yes.   
Although I wouldn't use it for bigger changes that require consensus.

I'm working on a bigger patch that will add an sflow { } configuration block so 
we can have options to control the following:

- the sflow udp port
- whether to submit 0s rather than leave undefined metrics out altogether (as a 
workaround for that issue with the CPU/mem charts for Windows)
- whether to ignore the hostname that comes with host-sflow so that gmond 
will do its own reverse lookup (to get the FQDN instead)
- whether to submit the extra metrics that host-sflow sends for physical servers
- whether to submit the extra metrics that host-sflow sends for virtual servers

Should have it ready for review later this afternoon.

Once the sFlow HTTP and memcached counter-blocks are finalized then we could 
quickly add support for those metrics too.  More input on those would be very 
welcome:
http://blog.sflow.com/search/label/HTTP
http://blog.sflow.com/search/label/Memcache

Regards,
Neil








[Ganglia-developers] Vertical label in metric.php

2011-02-11 Thread Bernard Li
Hey Jesse:

Just trying to understand this commit:

https://sourceforge.net/apps/trac/ganglia/changeset/2356/trunk/monitor-core/web/graph.d/metric.php

Why are we setting the vertical label to the metricname?

Thanks,

Bernard



Re: [Ganglia-developers] Vertical label in metric.php

2011-02-11 Thread Jesse Becker
As I recall, it forces the graphs within the images to align between
images that include labels, and those that do not.  One of Ganglia's
strengths is allowing for easy data/time correlations.  This is easy
only if the graphs actually have the same timescale (generally true),
and line up appropriately (which this patch tries to help with).

At least, that's how I remember it.



-- 
Jesse Becker
NHGRI Linux support (Digicon Contractor)



Re: [Ganglia-developers] hsflowd for Windows + Ganglia webfrontend

2011-02-11 Thread Neil McKee
Here is the patch I was referring to.  It allows you to put something like this 
in your gmond.conf:

sflow {
  null_int = 0
  null_float = 0.0
}

and then if a field like cpu_nice is missing (as in the Windows hsflowd) we'll 
submit 0.0 instead of leaving it out.   This is a work-around for the problem 
where the RRD does not even appear when cpu_nice is missing.
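
The substitution rule described here can be sketched as a small decision function. This is an illustration only, not the contents of the attached patch: `sflow_null_opts`, `have_null_float` and `resolve_float_metric` are hypothetical names, and representing a missing field with a boolean `present` flag is an assumption about how the caller detects it.

```c
#include <stdbool.h>

/* Hypothetical options, loosely modelled on the null_float setting above. */
typedef struct {
    bool  have_null_float;  /* true if null_float was set in sflow { } */
    float null_float;       /* value to submit for a missing float metric */
} sflow_null_opts;

/* Decide what to submit for one float metric.
 * Returns true and writes *out when something should be submitted;
 * returns false when the metric should be left out altogether. */
static bool resolve_float_metric(bool present, float value,
                                 const sflow_null_opts *opts, float *out)
{
    if (present) {
        *out = value;               /* the agent reported the metric */
        return true;
    }
    if (opts->have_null_float) {
        *out = opts->null_float;    /* substitute the configured null value */
        return true;
    }
    return false;                   /* no substitute configured: skip the metric */
}
```

With null_float set to 0.0, a missing cpu_nice resolves to 0.0 and is submitted; with the option unset, the function returns false and the metric is omitted as before.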

You can also add another setting accept_all_physical = yes,  like this:

sflow {
  null_int = 0
  null_float = 0.0
  accept_all_physical = yes
}

and now the extra metrics that are defined in host-sflow but not in libmetrics 
are accepted too.  These include some useful ones like the number of context 
switches,  the number of pages swapped in/out, network errors and drops, more 
info on disk reads and writes,  and so on.  The UI seems to do a good job of 
just adding these RRDs to the page (so perhaps it would even be safe to make 
yes the default here?)

I'm still skipping over the VM fields,  and don't have the option to ignore the 
sFlow hostname field yet,  but placeholder boolean options accept_all_virtual 
and accept_hostname are defined.  There is also udp_port in case you want 
to designate a non-standard port as the sFlow port (though it still has to 
appear in a udp_receive_channel section elsewhere).

I didn't edit gmond/conf.pod yet.  I figured that could happen once there is 
consensus on these options.

Thoughts?

Regards,
Neil



sflow_20110211.patch
Description: Binary data



