Re: [Ganglia-developers] hsflowd for Windows + Ganglia webfrontend
Hi Neil:

Argh, Gmail always messes up inline patches... anyways, I've fixed it manually and checked it in:

https://sourceforge.net/apps/trac/ganglia/changeset/2473

Do you want commit rights to our SVN repo so that you could fix this yourself in the future? :)

Thanks,
Bernard

On Thu, Feb 10, 2011 at 9:34 PM, Neil McKee neil.mc...@inmon.com wrote:

I didn't realize that gmond_started was meant to go with every heartbeat message. Below is a patch.

Neil

    [root@ganglia gmond]# svn diff sflow.c
    Index: sflow.c
    ===
    --- sflow.c (revision 2471)
    +++ sflow.c (working copy)
    @@ -398,7 +398,7 @@
     #define SFLOW_OK_COUNTER64(field) (field != (uint64_t)-1)
     /* always send a heartbeat */
    - process_sflow_uint32(hostdata, SFLOW_M_heartbeat, 0);
    + process_sflow_uint32(hostdata, SFLOW_M_heartbeat, (apr_time_as_msec(now) - x.uptime_mS) / 1000);
     if(offset_HID) {
     /* sumbit the system fields that we already extracted above */

On Feb 10, 2011, at 5:38 PM, Bernard Li wrote:

P.S. It looks like GMOND_STARTED=0 for the Windows host -- should I file a bug?

Thanks,
Bernard

On Thu, Feb 10, 2011 at 5:21 PM, Bernard Li bern...@vanhpc.org wrote:

Hi Neil:

On Thu, Feb 10, 2011 at 5:05 PM, Neil McKee neil.mc...@inmon.com wrote:

That's odd, the CPU and MEM charts are working OK for me. It's just the load-avg that is missing (which causes the host to be marked down).

You'll likely need to nuke your old rrd files to see the error. For Windows, mem_shared, mem_buffers and cpu_nice are missing, and they are needed by the respective mem_ and cpu_ reports. I guess we could fix the frontend code to only include them if they exist. Do these metrics exist in Windows-land?

How frequently would we need to poll the Processor Queue Length to get a reasonable load-average estimate? If we could get away with only doing it every few seconds then it might be worthwhile (perhaps a random delay would be appropriate) but I suspect you might have to poll much faster than that to get something worthwhile?
I will defer to Nick for answering that question :-) Cheers, Bernard -- The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE: Pinpoint memory and threading errors before they happen. Find and fix more than 250 security defects in the development cycle. Locate bottlenecks in serial and parallel code that limit performance. http://p.sf.net/sfu/intel-dev2devfeb ___ Ganglia-developers mailing list Ganglia-developers@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ganglia-developers
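For reference on the polling question: the classic Linux load average is an exponentially damped average of the run-queue length, sampled roughly every 5 seconds. The sketch below applies that same fixed-point update to a queue-length sample. The function name calc_load_1min is hypothetical, the constants follow the well-known Linux CALC_LOAD values, and whether 5-second samples of the Windows Processor Queue Length give a usable estimate is exactly the open question in this thread, so treat this as an illustration rather than a proposed implementation.

```c
#include <stdint.h>

#define FSHIFT   11                 /* bits of fractional precision */
#define FIXED_1  (1u << FSHIFT)     /* 1.0 in fixed point (2048) */
#define EXP_1    1884u              /* exp(-5s/60s) in fixed point, 1-minute average */

/*
 * One update step of a 1-minute load average, assuming one sample every
 * 5 seconds. 'load' is the previous average in fixed point and 'queue_len'
 * is the sampled queue length (a plain integer). Returns the new average
 * in fixed point; divide by FIXED_1 to get the usual decimal value.
 */
uint32_t calc_load_1min(uint32_t load, uint32_t queue_len)
{
    uint64_t active = (uint64_t)queue_len * FIXED_1;
    uint64_t next = (uint64_t)load * EXP_1 + active * (FIXED_1 - EXP_1);
    return (uint32_t)(next >> FSHIFT);
}
```

With a 5-second interval each sample contributes about 8% of the new value, so a short burst in the queue is smoothed out rather than missed or over-weighted; polling faster would need a different decay constant.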
Re: [Ganglia-developers] hsflowd for Windows + Ganglia webfrontend
It might be helpful to have commit privileges for minor bugfixes, yes. Although I wouldn't use it for bigger changes that require consensus.

I'm working on a bigger patch that will add an sflow { } configuration block so we can have options to control the following:

- the sflow udp port
- whether to submit 0s rather than leave undefined metrics out altogether (as a workaround for that issue with the CPU/mem charts for Windows)
- whether to ignore the hostname that comes with host-sflow so that gmond will do its own reverse lookup (to get the FQDN instead)
- whether to submit the extra metrics that host-sflow sends for physical servers
- whether to submit the extra metrics that host-sflow sends for virtual servers

Should have it ready for review later this afternoon.

Once the sFlow HTTP and memcached counter-blocks are finalized then we could quickly add support for those metrics too. More input on those would be very welcome:

http://blog.sflow.com/search/label/HTTP
http://blog.sflow.com/search/label/Memcache

Regards,
Neil

On Feb 11, 2011, at 12:56 PM, Bernard Li wrote:

Hi Neil:

Argh, Gmail always messes up inline patches... anyways, I've fixed it manually and checked it in:

https://sourceforge.net/apps/trac/ganglia/changeset/2473

Do you want commit rights to our SVN repo so that you could fix this yourself in the future? :)

Thanks,
Bernard
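The heartbeat fix checked in as changeset 2473 replaces a constant 0 with (apr_time_as_msec(now) - x.uptime_mS) / 1000, which, as I read the patch, is the wall-clock second at which the host started. A minimal standalone sketch of that arithmetic follows. The function name host_started_sec and the guard against an uptime larger than the current time are assumptions for illustration and are not part of sflow.c.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the expression in the patch:
 * (current time in ms since the epoch - host uptime in ms) / 1000.
 * Returns the start time in whole seconds since the epoch, or 0 if the
 * reported uptime is larger than the current time (assumed guard).
 */
uint32_t host_started_sec(uint64_t now_ms, uint64_t uptime_ms)
{
    if (uptime_ms > now_ms) {
        return 0;
    }
    return (uint32_t)((now_ms - uptime_ms) / 1000);
}
```

Because the result is derived from the uptime carried in the sFlow datagram, it stays roughly constant between heartbeats and only changes when the host reboots.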
[Ganglia-developers] Vertical label in metric.php
Hey Jesse:

Just trying to understand this commit:

https://sourceforge.net/apps/trac/ganglia/changeset/2356/trunk/monitor-core/web/graph.d/metric.php

Why are we setting the vertical label to the metricname?

Thanks,
Bernard
Re: [Ganglia-developers] Vertical label in metric.php
As I recall, it forces the graphs within the images to align between images that include labels, and those that do not. One of Ganglia's strengths is allowing for easy data/time correlations. This is easy only if the graphs actually have the same timescale (generally true), and line up appropriately (which this patch tries to help with).

At least, that's how I remember it.

--
Jesse Becker
NHGRI Linux support (Digicon Contractor)
Re: [Ganglia-developers] hsflowd for Windows + Ganglia webfrontend
Here is the patch I was referring to. It allows you to put something like this in your gmond.conf:

    sflow {
      null_int = 0
      null_float = 0.0
    }

and then if a field like cpu_nice is missing (as in the Windows hsflowd) we'll submit 0.0 instead of leaving it out. This is a work-around for the problem where the RRD does not even appear when cpu_nice is missing.

You can also add another setting accept_all_physical = yes, like this:

    sflow {
      null_int = 0
      null_float = 0
      accept_all_physical = yes
    }

and now the extra metrics that are defined in host-sflow but not in libmetrics are accepted too. These include some useful ones like the number of context switches, the number of pages swapped in/out, network errors and drops, more info on disk reads and writes, and so on. The UI seems to do a good job of just adding these RRDs to the page (so perhaps it would even be safe to make yes the default here?)

I'm still skipping over the VM fields, and don't have the option to ignore the sFlow hostname field yet, but placeholder boolean options accept_all_virtual and accept_hostname are defined. There is also udp_port in case you want to designate a non-standard port as the sFlow port (though it still has to appear in a udp_receive_channel section elsewhere).

I didn't edit gmond/conf.pod yet. I figured that could happen once there is consensus on these options.

Thoughts?

Regards,
Neil

sflow_20110211.patch Description: Binary data
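To make the null_int / null_float behaviour described in this thread concrete, here is a minimal sketch of the decision it implies: submit the real value when the field is present, submit the configured substitute when it is missing and a substitute was configured, and otherwise leave the metric out. The struct sflow_null_opts and the function resolve_float are hypothetical names for illustration. The actual patch (sflow_20110211.patch) is not reproduced in this thread and may be organized quite differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the null_float setting from the sflow { } block. */
typedef struct {
    bool  have_null_float;  /* true if null_float was set in gmond.conf */
    float null_float;       /* value to submit for a missing float metric */
} sflow_null_opts;

/*
 * Decide what to submit for one float metric such as cpu_nice.
 * Returns true and stores the value in *out if something should be
 * submitted, or false if the metric should be left out entirely.
 */
bool resolve_float(const sflow_null_opts *opts, bool present,
                   float value, float *out)
{
    assert(opts != NULL && out != NULL);
    if (present) {
        *out = value;             /* field arrived in the sFlow datagram */
        return true;
    }
    if (opts->have_null_float) {
        *out = opts->null_float;  /* missing: use the configured stand-in */
        return true;
    }
    return false;                 /* missing and no stand-in: skip it */
}
```

Keeping "not configured" distinct from "configured as 0" preserves the old behaviour of omitting undefined metrics unless the administrator opts in; an integer variant for null_int would follow the same pattern.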