Re: [Ganglia-general] Sflow Apache metrics

2015-04-13 Thread Neil Mckee
Sergey,

It's usually best to compile mod-sflow from source so that it matches the
particular version of Apache you are running.  Before you do that, you
have the option of editing mod_sflow.c and changing the setting of
SFWB_DEFAULT_CONFIGFILE (on line 211).

https://code.google.com/p/mod-sflow/source/browse/trunk/mod_sflow.c#211
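
For example, if it is defined as a macro there, the edit is just to point
that setting at a file you can actually write.  A hedged sketch (the path
below is only illustrative -- use whatever directory your hsflowd instance
is writing its hsflowd.auto into):

  /* mod_sflow.c, around line 211 */
  #define SFWB_DEFAULT_CONFIGFILE "/home/sergey/hsflowd/run/hsflowd.auto"

Then rebuild and install the module against your Apache (e.g. with apxs),
and mod_sflow will look for the .auto file in that location instead of /etc.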

Does that work for you?

Separate question: I'm not sure how hsflowd works if it doesn't start as
root. What OS are you on?

Neil


On Mon, Apr 13, 2015 at 5:55 PM, Sergey svin...@apple.com wrote:


 I found following error in Apache log:

 [Mon Apr 13 23:25:14 2015] [error] (2)No such file or directory:
 apr_stat(/etc/hsflowd.auto) failed

 The problem is that the hsflowd process is running in the user directory and
 keeps the hsflowd.auto file in the ./run directory.
 I can't access the /etc directory to put the file there either, because I don't
 have root access.
 Any ideas?

 Thanks!
 S.


 On Apr 13, 2015, at 9:36 AM, Sergey svin...@apple.com wrote:

 Yes, I installed sflowtool and it works!
 I get all counters except http* ones.
 That’s why I tested http://hostname/sflow page, because it uses mod_sflow
 in Apache.
 It looks like some Apache+sflow issue, but I don’t know how to
 troubleshoot it.

 Thanks
 S.

 On Apr 10, 2015, at 6:28 PM, Leslie geekg...@gmail.com wrote:

 Have you installed sflowtool and seen if the sflow counters are even
 getting sent out by the machine ?  My next step would be a tcpdump to
 make sure that the sflow counters are then getting sent to the
 collecting host.

 On Fri, Apr 10, 2015 at 4:55 PM, Sergey svin...@apple.com wrote:

 Hi All!

 I installed mod_sflow on Apache and am trying to collect HTTP metrics with gmond.
 The problem is that I don't see any HTTP metrics coming from hsflowd to
 gmond, nor any HTTP counters on the Apache http://hostname/sflow page.
 There is a list of counters, but they are all 0.
 Like this:

 counter method_option_count 0
 counter method_get_count 0
 counter method_head_count 0
 counter method_post_count 0
 counter method_put_count 0
 counter method_delete_count 0
 counter method_trace_count 0
 counter method_connect_count 0
 counter method_other_count 0
 counter status_1XX_count 0
 counter status_2XX_count 0
 counter status_3XX_count 0
 counter status_4XX_count 0
 counter status_5XX_count 0
 counter status_other_count 0
 string hostname xx
 gauge sampling_n 0

 At the same time http://hostname/server-status?auto is working properly:

 Total Accesses: 15
 Total kBytes: 5

 Uptime: 149
 ReqPerSec: .100671
 BytesPerSec: 34.3624
 BytesPerReq: 341.333
 BusyWorkers: 1
 IdleWorkers: 7
 Scoreboard:

 Is there a way to troubleshoot this? I need Sflow metrics.

 Thanks!
 S.












Re: [Ganglia-general] FW: Ganglia and sFlow

2014-12-05 Thread Neil Mckee
Simon,

I don't know if this is still an issue for you,  but my understanding is
that the cluster name comes from the gmond instance that you send the sFlow
to. So if you have 1000 hosts running hsflowd and you want to divide
them into 10 clusters then you would run 10 instances of gmond somewhere
(with each listening on different udp/tcp ports if they are all on the same
host).   Then when gmetad gets the latest stats from each one it will do
the right thing.   I hope someone else will jump in if I got this wrong.
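
For example, each of those gmond instances would carry its own cluster name
and listen on its own ports -- a minimal sketch of one instance's gmond.conf
(the name and port numbers are only illustrative):

  cluster {
    name = "cluster-01"
  }
  udp_recv_channel {
    port = 8650
  }
  tcp_accept_channel {
    port = 8650
  }
  sflow {
    udp_port = 6344
  }

with a matching data_source "cluster-01" host:8650 line in gmetad.conf pointing
at that tcp_accept_channel.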

Separately,  the hostname case-sensitivity thing is tricky.  If we ignore
the hostname that hsflowd sends and submit the stats using only the IP
address then gmond/gmetad will use a reverse-DNS lookup as the name.  That
might work for some users if their DNS server is consistent and reliable.
Alternatively,   we could automatically lowercase the hostname that we get
from hsflowd.  That might work in other places,  but it might also make
things worse because now you have an identifier that might not match either
the DNS name or the Windows case-sensitive hostname.   We could try adding
new config options for this that apply to the sFlow receiver in gmond,
but I don't want to do that if it's  just going to make things more
confusing.  What hostname treatment option do you think would work for you?

Neil



On Thu, Sep 18, 2014 at 1:10 PM, Simon Ambridge simon.ambri...@qubix.com
wrote:

  Hi



 I’ve installed Ganglia 3.6.0 gmetad and gmond on an Oracle Linux collector
 and can successfully collect metrics from Oracle Linux gmond nodes.

 I also need to collect metrics from Windows 2012 R2 hosts, so I
 installed sFlow 1.23.4-x64 – but I then found that I had blank graphs for
 the Windows node. The Windows machine has an upper-case host name, and I saw
 that the directory under /var/lib/ganglia/rrds was in lower case. I changed
 $conf['case_sensitive_hostnames'] = false; to true, and I no longer get
 blank graphs for the detailed stats for the Windows node. So far so good.



 However, I still have the following problems with blank stats on the main
 page, sFlow node cluster names and how to use conf.php:



 1.   Even though I get the detailed stats for the Windows machine,
 the big load_one stacked graph on the main page does not display any
 details for it. If I link the upper-case directory in /var/lib/ganglia/rrds
 to a lower-case name, it displays correctly. So that means the
 'case_sensitive_hostnames' directive is respected by the node stats page
 but **not** by the load_one stacked graph on the main page. The main page
 also behaves differently, because in the drop-down list of nodes the Windows
 machine is listed in upper-case, but on its detailed stats page it is
 titled in lower-case.

 2.   The Oracle Linux nodes are defined in gmond.conf as belonging to
 their named cluster. The Windows machine is automatically lumped into that
 same cluster – how do I define a cluster group for an sFlow node?

 3.   If I create a conf.php override file as recommended and put
 $conf['case_sensitive_hostnames'] = false; in there, I don’t get any graphs
 displayed at all for anything. Remove the file and the graphs come back.
 What am I doing wrong with conf.php?



 Many thanks

 Simon Ambridge






Re: [Ganglia-general] Gauges verses counters

2014-03-18 Thread neil mckee
FYI,   Ganglia already understands the output from this alternative JMX
monitoring solution:
https://code.google.com/p/jmx-sflow-agent/

I think it has similar properties to embedded-jmxtrans.  Much better to
have the JVM push the stats every 20 seconds or so than have to poll for
them remotely over an encrypted connection.  And using the java-agent hook
means you only have to change the JVM command-line.
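
For example, attaching it is typically just one extra JVM flag (the jar path
here is only illustrative -- use wherever the jmx-sflow-agent jar is installed):

  java -javaagent:/usr/share/java/sflowagent.jar -jar yourapp.jar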

Neil




On Thu, Mar 13, 2014 at 11:12 AM, Silver, Jonathan 
jonathan.sil...@unify.com wrote:

 We are planning on using jmxtrans to collect and propagate a number of
 metrics to ganglia. There is no place in jmxtrans to define the metric as a
 counter or a gauge.

 If we do NOT predefine the metric in rrds, what will happen? What will
 show on the graphs? How does ganglia know that it's a gauge and not a
 counter?

 Thanks,
 jon





Re: [Ganglia-general] NGINX / SFLOW / Ganaglia - metrics get corrupted

2014-03-07 Thread neil mckee
Mark,

It does seem like the issue is with the sFlow from nginx-sflow-module.  I
wrote that module so I can probably help:

(1) just one instance of nginx on that server,  or two?
(2) what version of nginx?
(3) single-threaded or multi-threaded nginx?
(4) running on Linux OS?
(5) please upgrade to the latest nginx-sflow-module (0.9.8),  the one you
are running (0.9.7)  has a bug that affects graceful restarts.  The fix was
a one-liner,  so it's not a big step.
(6) please capture and send a trace of the sFlow packets arriving from this
nginx source.  For example,  if the IP address is 10.1.2.3 and it's coming
in on eth0:

(as root) /usr/sbin/tcpdump -i eth0 -s 0 -w nginx_sflow.pcap udp port 6343 and ip src 10.1.2.3
(control-C after a few minutes to stop)
(as root) gzip nginx_sflow.pcap

then send nginx_sflow.pcap.gz

(7) please also send /etc/hsflowd.conf

The kind of thing it might be:
  - two nginx-sflow-modules running on the same host and not disambiguating
properly (this is supposed to happen automatically, by choosing the sFlow datasource
index as the lowest-numbered TCP port that the process is listening on)

Regards,
Neil




On Fri, Mar 7, 2014 at 3:40 PM, Bernard Li bern...@vanhpc.org wrote:

 Can you connect to the gmond port and paste the XML for the metrics in
 question?  I'd like to see how they're defined.

 Thanks,

 Bernard

 On Fri, Mar 7, 2014 at 11:08 AM, Flanagan, Mark mark.flana...@unify.com
 wrote:
  http://www.sflow.org/ appears to be the defining entity for sflow.
  http://www.sflow.org/sflow_http.txt would appear to define the http
 sflow data.
 
  It is not explicitly clear just what the counter values are supposed
 to mean. The general architecture of sFlow-like data would suggest the
 values should be a running counter (like the network interface metrics),
 which means gmond is handling the packets properly and NGINX is sending
 the wrong data.
 
  That's just my guess for now.
 
 
  -Original Message-
  From: Bernard Li [mailto:bern...@vanhpc.org]
  Sent: Friday, March 07, 2014 1:39 PM
  To: Silver, Jonathan
  Cc: ganglia-general@lists.sourceforge.net; Flanagan, Mark
  Subject: Re: [Ganglia-general] NGINX / SFLOW / Ganaglia - metrics get
 corrupted
 
  Hi Jonathan:
 
  Perhaps you can share how these metrics are defined?
 
  Cheers,
 
  Bernard
 
  On Fri, Mar 7, 2014 at 10:21 AM, Silver, Jonathan
  jonathan.sil...@unify.com wrote:
  Does the following analysis mean anything to anyone?
  It seems to me that this is a basic thing that should have been seen by
 everyone else and found during first test - unless it's some config
 parameter.
 
  Thanks
  Jon
 
  ---
 
  Well, I think I understand what is happening - but I don't even want to
 think about fixing it. I'm not sure which software is right.
 
  The sflow data coming from NGINX reports the number of various HTTP
 messages (GET, HEAD, 1XX, 2XX, etc) in the measured period.
  The period is either 10 or 20 seconds - I don't have any idea why that
 isn't consistent.
 
  When gmond receives the HTTP data in sflow format, it computes the
 difference between the most recently reported value and the one before and
 divides that by the reported interval. That is, it is expecting a running
 total and that is NOT what is received.
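  
   (Illustrative sketch of the calculation being described -- not the actual
   gmond code; counter_prev/counter_now are successive readings and interval_s
   is the reported interval in seconds:
  
     uint32_t delta = counter_now - counter_prev;         /* assumes a running total */
     double   rate  = (double)delta / (double)interval_s; /* the value that gets graphed */
  
   which only produces sensible rates if counter_now really is a running total.)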
 
  I don't know which software is right, but the NGINX reports are not
 what the gmond handler expects.
 
  All the other sflow reports appear to be correct.
 
  -- Mark
 
 
  sFlow plug-in:  I am still trying to find out; it is actually built by
  another group and I'm not sure what they pulled, but I'm pretty sure
  it's 0.9.8
 
  hsflowd version 1.23.2
 
  gmond 3.6.0
 
   -
  On Tuesday, 4 March 2014, Silver, Jonathan jonathan.sil...@unify.com
  wrote:
 
  We're using NGINX and sflow, to capture and send the metrics to
 ganglia.
  The metric values look correct when viewed using sflowtool, but gmond
  (on the same box) is reporting them with all kinds of random values.
 
  Running gmond --debug=10 I do see some various error messages in the
 log:
 
  Some of these:
  sequence number error - 10.235.240.31:443-3:443 lostSamples=37
 
  Some of these:
  ERROR: [Errno 111] Connection refused
 
  And some with the hostname NULL:  (But only one time for each metric)
  ***Allocating value packet for host--(null)-- and metric
  --http_meth_put--
  
 
 
  Has anyone heard of this issue? I've started adding debug statements
  to gmond, but before I go through all of that, if it's a known
 issue.
 
  Thanks for any info,
  jon
 
 
 

Re: [Ganglia-general] sflow - getting VirStorageLookupByPAth failed.

2013-03-12 Thread Neil Mckee
Ron,

You might try downloading the latest source code for hsflowd,  and compiling 
with LIBVIRT=yes VRTDSKPATH=yes

In other words:

svn checkout http://svn.code.sf.net/p/host-sflow/code/trunk host-sflow-code
cd host-sflow-code
make LIBVIRT=yes VRTDSKPATH=yes

This turns on a different way of accessing the storage info.  For details, see 
here:
http://sourceforge.net/p/host-sflow/code/398/tree/trunk/src/Linux/hsflowd.c

around line 634.

Please let me know if this works better.

Neil



On Mar 12, 2013, at 3:57 PM, Ron wrote:

 First attempt to configure sflow.
 
 I have numerous KVM/QEMU VMs running 
 
 My understanding was to use sflow to collect metrics for these.
 
 Configured a gmond as an sflow 'collector'. 
 
 Altered DNS to point to the collector.
 
 But now for each VM I get: something like:
 
 hsflowd: virStorageLookupByPath(/panfs/pan5/data/VMStorage/PiraatTriple.img) 
 failed
 
 
 But,  the file is there.
 
 ls -l /panfs/pan5/data/VMStorage/PiraatTriple.img
 -rw-rw 1 qemu qemu 17179869184 Mar 12 10:38 
 /panfs/pan5/data/VMStorage/PiraatTriple.img
 
 Anybody bumped into this before?
 
 TIA
 
 Ron Reeder 



Re: [Ganglia-general] calculate cpu utilization with cpu time

2012-09-27 Thread Neil Mckee
The sFlow CPU metrics are processed here:

https://github.com/ganglia/monitor-core/blob/master/gmond/sflow.c#L334

Let me know if you find a problem.
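
Broadly, the idea (a hedged sketch, not the literal code in sflow.c) is that
utilization is derived from the deltas of the cumulative CPU-time counters
between two successive polls:

  /* illustrative only: *_now / *_prev are the cumulative CPU-time
     counters (user, system, idle, ...) from two successive sFlow samples */
  uint64_t busy  = (user_now + sys_now) - (user_prev + sys_prev);
  uint64_t total = busy + (idle_now - idle_prev);
  double cpu_util_pct = total ? (100.0 * busy) / total : 0.0;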

Regards,
Neil


On Aug 10, 2012, at 2:00 AM, crayon z wrote:

 Hi, all:
 
   I use ganglia to parse metrics from Host sFlow. The CPU metrics in Host 
 sFlow are in the form of CPU time; however, I want to know how ganglia calculates 
 CPU utilization from CPU time.
 
 Best Regards
 
 -- 
 Crayon Z



Re: [Ganglia-general] Impact of gmond polling on data collection

2012-09-19 Thread Neil Mckee
In gmond.c, process_tc_accept_channel(): could those goto statements close the 
socket and return without relinquishing the mutex?

Neil

On Sep 19, 2012, at 8:45 AM, Nicholas Satterly wrote:

 Hi Peter,
 
 Thanks for the feedback.
 
 I've added a thread mutex to the hosts hash table as you suggested and will 
 send a pull request in the next day or so.
 
 Regards,
 Nick
 
 On Mon, Sep 17, 2012 at 8:25 PM, Peter Phaal peter.ph...@gmail.com wrote:
 Nicholas,
 
 It makes sense to multi-thread gmond, but looking at your patch, I
 don't see any locking associated with the hosts hashtable. Isn't there
 a possible race if new hosts/metrics are added to the hashtable by the
 UDP thread at the same time the hashtable is being walked by the TCP
 thread?
 
 Peter
 
 On Mon, Sep 17, 2012 at 6:03 AM, Nicholas Satterly nfsatte...@gmail.com 
 wrote:
  Hi Chris,
 
  I've discovered there are two contributing factors to problems like this.
 
  1. the number of metrics being sent (possibly in short bursts) can overflow
  the UDP receive buffer.
  2. the time it takes to process metrics in the UDP receive buffer causes TCP
  connections from the gmetads to time out (currently hard-coded to 10
  seconds)
 
  In your case, you are probably dropping UDP packets because gmond can't keep
  up. Gmond was enhanced to allow you to increase the UDP buffer size back in
  April. I suggest you upgrade to the latest version and set this to a sensible
  value for your environment.
 
  udp_recv_channel {
port = 1234
buffer = 1024000
  }
 
  To determine what is sensible is a bit of trial and error. Run netstat -su
  and keep increasing the value until you no longer see the number of packet
  receive errors going up.
 
  $ netstat -su
  Udp:
  7941393 packets received
  23 packets to unknown port received.
  0 packet receive errors
  10079118 packets sent
 
  The other possibility is that it takes so long for a gmetad to pull back all
  the metrics you are collecting for a cluster that you are preventing the
  gmond from processing metric data received via UDP. Again this can cause the
  UDP receive buffer to overflow.
 
  The problem we had at my work is related to all of the above but manifested
  itself in a slightly different way. We were seeing gaps in all our graphs
  because at times none of the servers in a cluster would respond to the gmetad
  poll within 10 seconds. I used to think that the gmond was completely hung,
  but realised that they would respond normally most of the time, but every
  minute or so it would take about 20-25 seconds. This happened to coincide
  with the UDP receive queue growing (Recv-Q column below), and I realised
  that it took this long for the gmond to process the metric data it had
  received via UDP from all the other servers in the cluster.
 
  $ netstat -ua
  Active Internet connections (servers and established)
  Proto Recv-Q Send-Q Local Address
  udp   1920032  0 *:8649  *:*
 
  The solution was to modify gmond and move the TCP request handler into a
  separate thread so that gmond could take as long as it needed to process
  incoming metric data (from a UDP receive buffer that is large enough not to
  overflow) without blocking on the TCP requests for the XML data.
 
  The patched gmond is running without a problem in our environment so I have
  submitted a pull request[1] for it to be included in trunk.
 
  I can't be 100% sure that this patch will fix your problem but it would be
  worth a try.
 
  Regards,
  Nick
 
  [1] https://github.com/ganglia/monitor-core/pull/50
 
 
  On Sat, Sep 15, 2012 at 12:16 AM, Chris Burroughs
  chris.burrou...@gmail.com wrote:
 
  We use ganglia to monitor > 500 hosts in multiple datacenters with about
  90k unique host:metric pairs per DC.  We use this data for all of the
  cool graphs in the web UI and for passive alerting.
 
  One of our checks is to measure TN of load_one on every box (we want to
  make sure gmond is working and correctly updating metrics otherwise we
  could be blind and not know it).  We consider it a failure if TN is >
  600.  This is an arbitrary number, but 10 minutes seemed plenty long.
 
  Unfortunately we are seeing this check fail far too often.  We set up
  two parallel gmetad instances (monitoring identical gmonds) per DC and
  have broken our problem into two classes:
   * (A) only one of the gmetad stops updating for an entire cluster, and
  must be restarted to recover.  Since the gmetad's disagree we know the
  problem is there. [1]
   * (B) Both gmetad's say an individual host has not reported (gmond
  aggregation or sending must be at fault).  This issue is usually
  transient (that is it recovers after some period of time greater than 10
  minutes).
 
  While attempting to reproduce (A) we ran several additional gmetad
  instances (again polling the same gmonds) around 2012-12-07.  Failures
  per day are below [2].  The act of testing seems to have significantly
  increased the number of 

Re: [Ganglia-general] Gmond Compilation on Cygwin

2012-07-09 Thread Neil Mckee
You could try adding --disable-sflow as another configure option.   (Or were 
you planning to use sFlow agents such as hsflowd?).

Neil


On Jul 9, 2012, at 3:50 AM, Nigel LEACH wrote:

 Ganglia 3.4.0
 Windows 2008 R2 Enterprise
 Cygwin 1.5.25
 IBM iDataPlex dx360 with Tesla M2070
 Confuse 2.7
  
 I’m trying to use the Ganglia Python modules to monitor a Windows based GPU 
 cluster, but having problems getting gmond to compile. This ‘configure’ 
 completes successfully
  
 ./configure --with-libconfuse=/usr/local --without-libpcre 
 --enable-static-build
  
 but ‘make’ fails, this is the tail of standard output
  
 mv -f .deps/g25_config.Tpo .deps/g25_config.Po
 gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -DCYGWIN -I/usr/include/apr-1
 -I/usr/include/ap
 r-1-I../lib -I../include/ -I../libmetrics -D_LARGEFILE64_SOURCE -DSFLOW 
 -g -O2 -I/usr/
 local/include -fno-strict-aliasing -Wall -MT core_metrics.o -MD -MP -MF 
 .deps/core_metrics
 .Tpo -c -o core_metrics.o core_metrics.c
 mv -f .deps/core_metrics.Tpo .deps/core_metrics.Po
 gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -DCYGWIN -I/usr/include/apr-1
 -I/usr/include/ap
 r-1-I../lib -I../include/ -I../libmetrics -D_LARGEFILE64_SOURCE -DSFLOW 
 -g -O2 -I/usr/
 local/include -fno-strict-aliasing -Wall -MT sflow.o -MD -MP -MF 
 .deps/sflow.Tpo -c -o sfl
 ow.o sflow.c
 sflow.c: In function `process_struct_JVM':
 sflow.c:1033: warning: comparison is always true due to limited range of data 
 type
 sflow.c:1034: warning: comparison is always true due to limited range of data 
 type
 sflow.c:1035: warning: comparison is always true due to limited range of data 
 type
 sflow.c:1036: warning: comparison is always true due to limited range of data 
 type
 sflow.c:1037: warning: comparison is always true due to limited range of data 
 type
 sflow.c:1038: warning: comparison is always true due to limited range of data 
 type
 sflow.c:1039: warning: comparison is always true due to limited range of data 
 type
 sflow.c: In function `processCounterSample':
 sflow.c:1169: warning: unsigned int format, uint32_t arg (arg 4)
 sflow.c:1169: warning: unsigned int format, uint32_t arg (arg 4)
 sflow.c: In function `process_sflow_datagram':
 sflow.c:1348: error: `AF_INET6' undeclared (first use in this function)
 sflow.c:1348: error: (Each undeclared identifier is reported only once
 sflow.c:1348: error: for each function it appears in.)
 make[3]: *** [sflow.o] Error 1
 make[3]: Leaving directory `/var/tmp/ganglia-3.4.0/gmond'
 make[2]: *** [all-recursive] Error 1
 make[2]: Leaving directory `/var/tmp/ganglia-3.4.0/gmond'
 make[1]: *** [all-recursive] Error 1
 make[1]: Leaving directory `/var/tmp/ganglia-3.4.0'
 make: *** [all] Error 2
  
 Has anyone come across this before ?
  
 Many Thanks
 Nigel
  
 



[Ganglia-general] hsflowd ported to Solaris

2012-04-21 Thread Neil Mckee
Hello All,

There is now a Solaris port of hsflowd:
http://host-sflow.sourceforge.net

Binary packages for sparc and x86 can be downloaded,  but sources are only in 
the trunk:
mkdir host-sflow-trunk
svn co https://host-sflow.svn.sourceforge.net/svnroot/host-sflow/trunk 
host-sflow-trunk
more host-sflow-trunk/INSTALL.SunOS

Some Ganglia+sFlow explanation here:
http://blog.sflow.com/2011/07/ganglia-32-released.html

Thanks go to Johnny Johnson for contributing the port.

If you run Solaris your feedback would be very much appreciated.

Neil


Re: [Ganglia-general] udp_recv_channel for sflow and gmetric

2012-03-28 Thread Neil McKee
I'm pretty sure this will not work.  You need separate ports.
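
For example, keeping the two receivers on separate ports would look something
like this (the second port number is purely illustrative):

udp_recv_channel {
  port = 15010
}

sflow {
  udp_port = 15011
}

and then point your sFlow agents at 15011 while gmetric keeps using 15010.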

Neil Mckee



On Mar 28, 2012, at 2:45 PM, Ozzie Sabina o...@sabina.org wrote:

 Can this be shared?  A quick googling failed me here.
 
 Can I configure a single one of these and accept messages from both gmetric 
 and sflow clients?  We use a port per-service and run multiple gmonds per 
 machine, so it's considerably simpler to only use the one port as we have the 
 infrastructure in place for that.
 
 To be explicit, if I do:
 
 globals {
   mute = no
   deaf = no
   ...
 }
 
 udp_recv_channel {
   port = 15010
 }
 
 sflow {
   udp_port = 15010
 }
 
 (a) is that sufficient alone (with gmond 3.3.1) to start collecting sFlow 
 metrics being spit at me, and 
 (b) will I be able to also send gmetric values to the same port?
 
 Oz


Re: [Ganglia-general] Free Velocity Online conference tomorrow

2011-10-25 Thread Neil Mckee
I guess the sFlow network interface counters could go in as well - but with the 
current model I think we would have to flatten the data so that every interface 
looked like a separate host in the Ganglia database.  Is that really what you 
want?  It seems like this is part of the discussion about naming, tagging and 
parent-child hierarchy that came up with the introduction of hypervisors and 
their VMs.   Any more thoughts on that?

Neil


On Oct 25, 2011, at 12:36 PM, Vladimir Vuksan wrote:

 Great. I am often asked about network devices such as switches and routers. 
 What is the roadmap on that ?
 
 Thanks,
 Vladimir
 
 On Tue, 25 Oct 2011, Neil Mckee wrote:
 
 Vladimir,
 
 Just an FYI since it seems to be relevant to your talk:
 
 I am preparing a patch for Ganglia that will add support for the sFlow-HTTP 
 feed,  as exported by mod-sflow, nginx-sflow-module, tomcat-sflow-valve and 
 node-sflow-module.  This represents an efficient way to get real-time HTTP 
 stats from a large web-farm.  The sFlow-HTTP spec should be finalized in the 
 next few weeks,  so the patch can go in soon after that.
 
 background info here:
 http://blog.sflow.com/search?q=HTTP
 
 discussion on sFlow-HTTP spec (please comment!):
 http://groups.google.com/group/sflow/browse_thread/thread/88accb2bad594d1d#
 
 source code links:
 http://host-sflow.sourceforge.net/relatedlinks.php
 
 Regards,
 Neil
 
 
 P.S.  sFlow-MEMCACHE support will probably be added to Ganglia at the same 
 time.
 
 
 On Oct 25, 2011, at 8:47 AM, Vladimir Vuksan wrote:
 
 I was gonna mention there is a free Velocity online conference/webcast. I
 will be speaking about backend monitoring and time permitting will be
 demoing some of the Ganglia Web 2.0 features.
 
 http://velocityconf.com/velocity-oct2011
 
 Vladimir
 
 
 




Re: [Ganglia-general] vlan traffic counting

2011-09-06 Thread Neil Mckee
Thanks for bringing this up.  I checked a change into the hsflowd trunk that 
looks for these interfaces and excludes them from the counting.  It uses the 
SIOCGIFVLAN ioctl call -- although it seems that your filter on the device name 
might work just fine.

http://host-sflow.svn.sourceforge.net/viewvc/host-sflow/trunk/src/Linux/readInterfaces.c?annotate=231
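
For reference, a minimal sketch of that kind of check (not the exact hsflowd
code) -- asking the kernel whether an interface is a VLAN sub-device:

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/sockios.h>   /* SIOCGIFVLAN */
  #include <linux/if_vlan.h>   /* struct vlan_ioctl_args, GET_VLAN_REALDEV_NAME_CMD */

  /* Returns 1 if devName is a VLAN sub-interface (so its counters are already
     included in the parent device), 0 otherwise.  fd is any open AF_INET
     datagram socket.  Illustrative only. */
  static int isVLAN(int fd, const char *devName) {
    struct vlan_ioctl_args vlargs;
    memset(&vlargs, 0, sizeof(vlargs));
    vlargs.cmd = GET_VLAN_REALDEV_NAME_CMD;
    strncpy(vlargs.device1, devName, sizeof(vlargs.device1) - 1);
    /* succeeds (and fills in the parent device name) only for VLAN devices */
    return ioctl(fd, SIOCGIFVLAN, &vlargs) == 0;
  }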

Neil


On Aug 29, 2011, at 6:39 AM, Robin Humble wrote:

 I've noticed on our VLAN interfaces that gmond's default network metrics
 seem to be miscalculating network traffic.
 
 anyone else seeing this?  we don't use many VLANs.
 
 it seems the Linux OS counters for the VLAN interface are also added onto
 the parent interface, and gmond reads both, so traffic reported by gmond
 is ~2x greater than it really is.
 
 eg. eth4 (no IP set) with a eth4.99 VLAN, /proc/net/dev shows
 
  Inter-|   Receive|  Transmit
   face |bytespackets errs drop fifo frame compressed multicast|bytes
 packets errs drop fifo colls carrier compressed
   ...
eth4:1453106293850 1688242724000 0  0 11874 
 3090232715347 2518006182000 0   0  0
 eth4.99:1429470895714 1688242724000 0  0 11874 
 2988353281655 912706240000 0   0  0
 
 maybe we have setup our interfaces oddly or something.
 I don't know why Tx Pkts is different between the 2 interfaces ...
 maybe an upstream MTU.
 
 aliased interfaces don't have the same problem as Linux doesn't list
 them in /proc/net/dev.
 
 our setup is ganglia 3.2.0, x86_64, centos5.6 userland, 2.6.32 vanilla
 kernels, ixgbe 10gige.
 
 the below patch fixes/hacks-around the problem by simply skipping all
 VLAN interfaces - anything with a '.' in the name.
 doesn't seem right somehow, but seems to work for me.
 
 cheers,
 robin
 --
 Dr Robin Humble, HPC Systems Analyst, NCI National Facility
 
 --- ganglia-3.2.0.orig/libmetrics/linux/metrics.c 2010-05-11 00:39:54.0 +1000
 +++ ganglia-3.2.0/libmetrics/linux/metrics.c  2011-08-29 16:19:55.0 +1000
 @@ -181,8 +181,10 @@ void update_ifdata ( char *caller )
    p = index(p, ':');
 
    /* Ignore 'lo' and 'bond*' interfaces (but sanely) */
 +  /* Ignore VLAN interfaces (eg. eth4.99) as stats are already included in parent */
    if (p && strncmp (src, "lo", 2) &&
 -      strncmp (src, "bond", 4))
 +      strncmp (src, "bond", 4) &&
 +      (index(src,'.') == NULL || index(src, '.') > p))
     {
       p++;
       /* Check for data from the last read for this */
 
 




Re: [Ganglia-general] Problem displaying Virtual Machine data with hsflowd and ganglia 3.2.0 in an Openstack Compute node.

2011-09-01 Thread Neil Mckee

On Aug 31, 2011, at 8:15 AM, Emanuele Verga wrote:

 Hi Neil,
 thanks a lot for the help!
  
 I verified libvirt version, it's 0.8.8
  
 I've downloaded and compiled hsflowd revision 227: now ganglia correctly 
 receives and processes the statistics VM Bytes Written and VM Writes.
 (http://imageshack.us/f/846/vmstats.png/)
  
 Other disk statistics for VM ( VM Bytes Read, VM Disk Errors, Free Vdisk 
 Space, VM Reads, Total Vdisk Space) are not displayed, and hsflowd -dd 
 displays errors similar to the following:
  
 libvir: QEMU error : invalid argument in invalid path vda not assigned to 
 domain
 virDomainGetBlockInfo(vda) failed

Oh, sorry.  virDomainGetBlockInfo needs the path,  not the deviceName.   I've 
made the (1-line) change and checked it in.  Please svn update and try again. 
  If this works you should start to see the capacity, allocation and available 
fields.

I'm not sure whether disk errors are going to show up or not.  It depends on 
how libvirt implements the virDomainBlockStats() call for KVM.

By the way, if you don't want it to even attempt that other call that always 
fails to find the volume,  you can add this somewhere in the src/Linux/Makefile:

CFLAGS += -DHSP_VRT_USE_DISKPATH

But I think we might change it so that the error message only appears once in 
future (don't want to fill the logs),  so that might be just as good.

Neil



 I've logged a few minutes of hsflowd activity, if it can help you you can 
 download it here:
 http://www.mediafire.com/?g4jac7dm3mmb662
  
 2011/8/29 Neil Mckee neil.mckee...@gmail.com
 Sorry,  the failure of virStorageLookupByPath() was preventing 
 virDomainBlockStats() from being attempted.
 
 I checked in a fix for this,  and also code to try the newer 
 virDomainGetBlockInfo() call as a fallback should virStorageLookupByPath() 
 fail.  This call only came in with libvirt version 0.8.1.  Are you running 
 something newer than that?  (see /usr/include/libvirt/libvirt.h)
 
 If this works,  we should make a new release of hsflowd,  so please let me 
 know how it goes.
 
 Regards,
 Neil
 
 
 
 On Aug 29, 2011, at 7:24 AM, Emanuele Verga wrote:
 
 Hi, I downloaded and installed hsflowd trunk revision 226, but using hsflowd 
 I keep seeing  virStorageLookupByPath errors, and VM disk statistics aren't 
 displayed. Do I need to tell hsflowd explicitly to use target=vda call? If 
 yes, how?
 
 Thanks in advance,
 Emanuele
 
 2011/8/25 Emanuele Verga verga.emanu...@gmail.com
 Hi Neil,
 
 Yes that's possible, the problem is Nova places each image in a separate 
 folder (/var/lib/nova/instance/INSTANCENAME/), so we would have to create a 
 new pool with the corresponding path each time a new instance is created, 
 and if we start to add more servers it quickly becomes impractical. 
 
 I've not yet been able to try the hsflowd version you suggested, I'll test 
 it tomorrow and let you know.
 
 Thanks for the help!
 Emanuele
 
 
 
 



Re: [Ganglia-general] Problem displaying Virtual Machine data with hsflowd and ganglia 3.2.0 in an Openstack Compute node.

2011-08-29 Thread Neil Mckee
Sorry,  the failure of virStorageLookupByPath() was preventing 
virDomainBlockStats() from being attempted.

I checked in a fix for this,  and also code to try the newer 
virDomainGetBlockInfo() call as a fallback should virStorageLookupByPath() 
fail.  This call only came in with libvirt version 0.8.1.  Are you running 
something newer than that?  (see /usr/include/libvirt/libvirt.h)

If this works,  we should make a new release of hsflowd,  so please let me know 
how it goes.

Regards,
Neil



On Aug 29, 2011, at 7:24 AM, Emanuele Verga wrote:

 Hi, I downloaded and installed hsflowd trunk revision 226, but using hsflowd I 
 keep seeing  virStorageLookupByPath errors, and VM disk statistics aren't 
 displayed. Do I need to tell hsflowd explicitly to use target=vda call? If 
 yes, how?
 
 Thanks in advance,
 Emanuele
 
 2011/8/25 Emanuele Verga verga.emanu...@gmail.com
 Hi Neil,
 
 Yes that's possible, the problem is Nova places each image in a separate 
 folder (/var/lib/nova/instance/INSTANCENAME/), so we would have to create a 
 new pool with the corresponding path each time a new instance is created, and 
 if we start to add more servers it quickly becomes impractical. 
 
 I've not yet been able to try the hsflowd version you suggested, I'll test it 
 tomorrow and let you know.
 
 Thanks for the help!
 Emanuele
 
 



Re: [Ganglia-general] Problem displaying Virtual Machine data with hsflowd and ganglia 3.2.0 in an Openstack Compute node.

2011-08-23 Thread Neil Mckee

On Aug 18, 2011, at 1:35 AM, Emanuele Verga wrote:

 Ok, I tried linking one of the disk files to the default storage pool folder 
 and it actually detected the linked volume in libvirt:
 
 After issuing a virsh pool-refresh default the disk was correctly detected 
 and reported as a volume by virsh vol-list default, but there is a problem:
 
 The disk is added as a volume in libvirth with the path parameter 
 corresponding to the soft link ( for example: 
 /var/lib/libvirt/images/instance.002d_disk) instead of the disk path ( 
 /var/lib/nova/instances/instance-002d/disk ), this means that disk 
 lookups performed by hsflowd still fail, because they try to retrieve volume 
 information associated to disk path. (example: 
 virStorageLookupByPath(/var/lib/nova/instances/instance-002d/disk) ).
 
 Hsflowd gets path information from the Virtual Machine XML definition (virsh 
 dumpxml instance.002d shows, among many other details, the following 
 line:   <source file='/var/lib/nova/instances/instance-002d/disk'/>), 
 that is generated and stored automatically by openstack into libvirt.xml (ex: 
 /var/lib/nova/instances/instance-002d/libvirt.xml).
 
 So, to make it work this way, we should have a way to tell Nova/Openstack  
 which path to look into to retrieve VM disks, and to create the related soft 
 links into the appropriate folder, when provisioning a new instance.
 
 
 I also tried using virt.manager to create the pool but it didn't work, the 
 pool was created but no disk was detected, i suppose because libvirt expect 
 volumes to be located right into the folder specified as pool, and doesn't 
 look in any subfolder (creating the pool manually didn't work for that 
 reason).

So,  was it not possible to specify the pool directory to be the one where the 
disk image was actually residing?  That worked for me when I tried it here.   I 
have a disk image:

/root/not_libvirt_images/test-pool2.img

and I was able to add it to a new storage pool called alternative using 
virt-manager.   So in virsh I can ask for pool-dumpxml alternative,  like 
this:

virsh # pool-dumpxml alternative
<pool type='dir'>
  <name>alternative</name>
  <uuid>51341d87-d87e-f7ce-6cc0-81ac3967c182</uuid>
  <capacity>233620566016</capacity>
  <allocation>37986648064</allocation>
  <available>195633917952</available>
  <source>
  </source>
  <target>
    <path>/root/not_libvirt_images</path>
    <permissions>
      <mode>0700</mode>
      <owner>0</owner>
      <group>0</group>
    </permissions>
  </target>
</pool>


 
 Something I noticed that may be important is that virt-manager is able to 
 display disk statistics for those VMs. I don't really know how it gets that 
 information, but I believe it accesses disks using the details contained 
 in the <disk>...</disk> tags in the VM XML definition, instead of doing 
 a volume lookup starting from the volume path, like hsflowd is trying to do.
 
 Example:
 <disk type='file' device='disk'>
   <driver name='qemu' type='qcow2'/>
   <source file='/var/lib/nova/instances/INSTANCENAME/disk'/>
   <target dev='vda' bus='virtio'/>
   <alias name='virtio-disk0'/>
   <address type='pci' domain='0x' bus='0x00' slot='0x04' function='0x0'/>
 </disk>
 
 Could this be used in some way?

There is a libvirt call that takes the target=vda device name and returns 
reads/writes counters.  This was just added to hsflowd.  However it's not 
released yet,   so you'd have to check out the trunk using subversion and 
build from that:

svn co https://host-sflow.svn.sourceforge.net/svnroot/host-sflow/trunk 
host-sflow-trunk

I'm not sure if the same handle can be used to retrieve the capacity, 
allocation and available numbers,  though.  (If there is a libvirt expert 
on the list,  please jump in and set us straight.)
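
For reference, the call in question is virDomainBlockStats(); a minimal sketch
of how it might be used (illustrative only, not the exact hsflowd code):

  #include <libvirt/libvirt.h>

  /* dom is an open virDomainPtr; "vda" is the <target dev='...'/> name
     taken from the domain XML.  Illustrative only. */
  virDomainBlockStatsStruct stats;
  if (virDomainBlockStats(dom, "vda", &stats, sizeof(stats)) == 0) {
      /* stats.rd_req, stats.rd_bytes, stats.wr_req, stats.wr_bytes, stats.errs */
  }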

Neil



Re: [Ganglia-general] Problem displaying Virtual Machine data with hsflowd and ganglia 3.2.0 in an Openstack Compute node.

2011-08-17 Thread Neil Mckee
I don't think the soft-links will work,  but try this:

1) go back to compiling hsflowd for libvirt.
2) using virt-manager or equivalent,  tell it about the storage pool of type 
filesystem directory at /var/lib/nova/instances.

See details here:
http://virt-manager.org/page/StorageManagement
(To get to this virt-manager screen you need to select EditHost Details)

Now you should see it with virsh pool-list.  If you called it nova then 
virsh pool-info nova will show something like this:

virsh # pool-info nova
Name:   nova
UUID:   51341d87-d87e-f7ce-6cc0-81ac3967c182
State:  running
Capacity:   217.58 GB
Allocation: 35.38 GB
Available:  182.20 GB

hsflowd should then be able to pull those Capacity, Allocation and Available 
values and send them to Ganglia.

Neil
 
P.S.  It looks as though hsflowd is not filling in the disk reads/writes/errors 
counters for KVM VMs yet.  If you know of an efficient way to do that,  please 
suggest it on the hsflowd mailing list:
https://lists.sourceforge.net/lists/listinfo/host-sflow-discuss


On Aug 17, 2011, at 7:16 AM, santosh gangwani wrote:

 try having soft links using ln command, may or may not work though :)
 ln -s
 
 On Wed, Aug 17, 2011 at 4:17 PM, Emanuele Verga
 verga.emanu...@gmail.com wrote:
 Hi Neil,
 thanks for the suggestion.
 I tried to do it. After recompiling hsflowd I checked the files it had open
 and it showed:
 hsflowd   30019   nobody  mem   REG  251,3   84728
   927248 /usr/lib/libxenctrl.so.3.2.0
 hsflowd   30019   nobody  mem   REG  251,3   22928
   927250 /usr/lib/libxenstore.so.3.0.0
 but this solution didn't work.
 Virtual machines stopped showing up in ganglia and hsflowd gave this error:
 ERROR Internal error: Could not obtain handle on privileged command
 interface (2 = No such file or directory)
 xc_interface_open() failed : No such file or directory
 then it continued to work normally (hsflowd log file:
 http://uploading.com/files/get/a3876c8b/ ) but it didn't report anything to
 ganglia and the VMs were shown as down; the reporting for the physical host,
 on the other hand, is working perfectly, as before.
 If it can help you, the implementation of Openstack we have uses KVM to
 virtualize the hosts. Could it be related?
 Thanks again,
 Emanuele
 2011/8/16 Neil Mckee neil.mckee...@gmail.com
 
 Hello,
 On an OpenStack node you may be able to use libxenstore instead of
 libvirt.  You'll need to recompile hsflowd to try this.  Looking at
 trunk/src/Linux/Makefile it appears to look for libvirt first,  but you can
 override that by compiling hsflowd like this:
 make clean
 make LIBVIRT=no
 The Makefile will then test for libxenstore and libxenctrl.  If it finds
 them it will compile with -DHSF_XEN (instead of -DHSF_VRT),  and you may get
 better results.  Please let me know what happens.
 Neil
 
 
 On Aug 16, 2011, at 2:14 AM, Emanuele Verga wrote:
 
 Hi,
 we have a problem with the following installation:
 we have a system that’s a compute node in an Openstack test installation.
 Now on this machine we decided to install Ganglia, to check its
 monitoring capabilities regarding virtual machines hosted on that node by
 Openstack.
 We then proceeded to add the repository for ganglia version 3.2 and
 install:
 Hsflowd 1.18
 Ganglia Monitor Daemon  3.2.0.0
 Ganglia Meta Daemon  3.2.0.0
 Ganglia Web Frontend  3.2.0.0
 Dwoo  1.1.1
 all on the same machine and to configure ports accordingly.
 All said and done, the web frontend showed the physical host and all of
 the VMs, but we were unable to:
 See the hypervisor section in the physical host statistics. It simply is
 not there.
 See the graphical preview for VM statistics
 (http://imageshack.us/photo/my-images/585/screenshotgangliacomput.png/). The
 thumbnails are missing but the links do work; clicking on one of the missing
 thumbnails, you are taken to the details page for that VM.
 See details regarding VM hard disk and I/O.
 (http://imageshack.us/photo/my-images/694/screenshotgangliainstan.png/)
 Debugging hsflowd we found errors similar to the following:
 Aug 16 06:26:57 eta hsflowd:
 virStorageLookupByPath(/var/lib/nova/instances/instance-004d/disk.local)
 failed
 Aug 16 06:26:57 eta hsflowd:
 virStorageLookupByPath(/var/lib/nova/instances/instance-004d/disk)
 failed
 The strange thing is, the path is correct.
 We checked libvirt using virsh  and found that the Storage Volumes are not
 reported by libvirt, it seems because libvirt by default searches for
 information in the path /var/lib/libvirt/images, whereas nova places them
 inside /var/lib/nova/instances/INSTANCENAME/.
 Did you have the same problems when testing for Sflow/ Openstack ? How did
 you manage to resolve it?
 Any help is appreciated.
 Thanks in advance for your support

Re: [Ganglia-general] Problem displaying Virtual Machine data with hsflowd and ganglia 3.2.0 in an Openstack Compute node.

2011-08-16 Thread Neil Mckee
Hello,

On an OpenStack node you may be able to use libxenstore instead of libvirt.  
You'll need to recompile hsflowd to try this.  Looking at 
trunk/src/Linux/Makefile it appears to look for libvirt first,  but you can 
override that by compiling hsflowd like this:

make clean
make LIBVIRT=no

The Makefile will then test for libxenstore and libxenctrl.  If it finds them 
it will compile with -DHSF_XEN (instead of -DHSF_VRT),  and you may get better 
results.  Please let me know what happens.

Neil



On Aug 16, 2011, at 2:14 AM, Emanuele Verga wrote:

 Hi,
 
 we have a problem with the following installation:
 
 we have a system that’s a compute node in an Openstack test installation.
 
 Now on this machine we decided to install Ganglia, to check its monitoring 
 capabilities regarding virtual machines hosted on that node by Openstack.
 We then proceeded to add the repository for ganglia version 3.2 and install:
 
 Hsflowd 1.18
 Ganglia Monitor Daemon  3.2.0.0
 Ganglia Meta Daemon  3.2.0.0
 Ganglia Web Frontend  3.2.0.0
 Dwoo  1.1.1
 
 all on the same machine and to configure ports accordingly.
 
 All said and done, the web frontend showed the physical host and all of the 
 VMs, but we were unable to:
 
 See the hypervisor section in the physical host statistics. It simply is not 
 there.
 See the graphical preview for VM statistics 
 (http://imageshack.us/photo/my-images/585/screenshotgangliacomput.png/). The 
 thumbnails are missing but the links do work; clicking on one of the missing 
 thumbnails, you are taken to the details page for that VM.
 
 See details regarding VM hard disk and I/O. 
 (http://imageshack.us/photo/my-images/694/screenshotgangliainstan.png/)
 
 Debugging hsflowd we found errors similar to the following:
 Aug 16 06:26:57 eta hsflowd: 
 virStorageLookupByPath(/var/lib/nova/instances/instance-004d/disk.local) 
 failed
 Aug 16 06:26:57 eta hsflowd: 
 virStorageLookupByPath(/var/lib/nova/instances/instance-004d/disk) failed
 The strange thing is, the path is correct.
 
 We checked libvirt using virsh and found that the Storage Volumes are not 
 reported by libvirt, it seems because libvirt by default searches for information 
 in the path /var/lib/libvirt/images, whereas nova places them inside 
 /var/lib/nova/instances/INSTANCENAME/.
 
 Did you have the same problems when testing for Sflow/ Openstack ? How did 
 you manage to resolve it?
 
 Any help is appreciated.
 Thanks in advance for your support!
 



Re: [Ganglia-general] missing many samples with host-sflow...

2011-07-22 Thread Neil Mckee
500 nodes sending sFlow-HOST data is probably only about 25 packets/sec,  so 
the issue here is unlikely to be a performance bottleneck in terms of CPU, 
network bandwidth,  UDP buffers etc.

Right now the most likely explanation seems to be some race-condition over how 
long before gmond considers the data to be stale.  In the function sflow.c: 
process_sflow_gmetric() we have this:

  gfull->metric.tmax = 60; /* (secs) poll if it changes faster than this */
  gfull->metric.dmax = 0; /* (secs) how long before stale? */

I was under the impression that setting dmax to 0 is supposed to mean that 
the data does not expire at all,  but maybe this assumption is wrong?

Please confirm that you are running hsflowd with a polling-interval set to 30 
seconds or less,  and please confirm that the CPU is not busy.

The other step we could take is to log the values of lostDatagrams and 
lostSamples when the debug level is set on the command line (these counters 
are maintained within sflow.c but not logged at the moment).  That would 
help to confirm or deny if there is any bottleneck in the front end.  The gmond 
process blocks while the XML data is being extracted.   So if you were 
extracting the XML data over a slow link to a slow device and it took a number 
of seconds to transfer,  then you might conceivably lose packets due to the UDP 
input buffer overflowing during that time.  If that is happening it will show 
up in the lostDatagrams counter.   The workaround might just be to ioctl() the 
input socket buffer to a bigger size.   I've seen this bumped up from about 
130K to over 2MB before,   so that would buy more time without having to do 
anything more elaborate.
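
For reference, bumping the receive buffer is usually one setsockopt() call on
the sFlow UDP socket -- a hedged sketch (the 2MB figure is just the example
mentioned above, and Linux may cap it unless net.core.rmem_max is raised):

  #include <sys/socket.h>

  int rcvbuf = 2 * 1024 * 1024;   /* e.g. 2MB instead of the ~130KB default; sock is the sFlow UDP socket */
  if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) != 0) {
      /* fall back to the default buffer size and log a warning */
  }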

Regards,
Neil


On Jul 21, 2011, at 12:32 PM, Robert Jordan wrote:

 I have a cluster with approximately 500 nodes reporting via host-sflow to a 
 single gmond.  In the past few days my graphs have started to look like 
 dotted lines and most of the time ganglia reports all of the nodes as down.  
 Has anyone seen similar issues? 




Re: [Ganglia-general] missing many samples with host-sflow...

2011-07-22 Thread Neil Mckee
Upon investigation we found that a handful of the nodes were sending with 
sFlow-agent-address == 0.0.0.0.   These nodes boot using DHCP so this may be a 
race where the hsflowd daemon starts before the IP address has been learned.   
The fix will be to make hsflowd wait until it has a current IP address before 
sending (and check for changes periodically).  And at the gmond end,  we should 
probably add a check to ignore any datagrams that have 
sFlow-agent-address==0.0.0.0.
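
A minimal sketch of what that gmond-side guard could look like (the field and 
counter names here are illustrative, not the actual sflow.c structures):

  #include <netinet/in.h>
  #include <arpa/inet.h>
  #include <stdbool.h>

  /* Drop any sFlow datagram whose agent address is 0.0.0.0, so a sender
     that has not yet learned its IP cannot alias with other hosts. */
  static bool agent_address_is_valid(const struct in_addr *agent)
  {
      return agent->s_addr != htonl(INADDR_ANY);   /* INADDR_ANY == 0.0.0.0 */
  }

  /* ...in the datagram handler (hypothetical names)...
     if (!agent_address_is_valid(&datagram->agent_addr)) {
         lostDatagrams++;
         return;    // sender has not learned its IP address yet
     }
  */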

Because multiple nodes were sending with the same agent address the effect was 
to alias their data together so that it looked like successive readings from 
the same node.  Most of the time the resulting sequence number deltas were such 
that the data was being ignored anyway,  but as clocks drift over time it's 
possible that some readings would get through and result in astronomically high 
deltas being recorded.  If that happened and these large deltas were enough 
to trip a sanity-check somewhere further on (perhaps in gmetad),  then that 
could explain how the gaps appeared in the chart for the whole cluster.

Neil





Re: [Ganglia-general] Network bytes spikes

2011-03-31 Thread Neil Mckee
I checked the sFlow feed,  and it looks like the sanity checks for 32-bit 
rollover and impossible-counter-delta are already present in the hsflowd code 
(host-sflow.sourceforge.net, src/Linux/readNioCounters.c), at least for the 
Linux and FreeBSD ports.  We should add those checks to the Windows port.  
It's always better to clean things up at the source if you can.

That makes it less urgent to add the same sanity checks at the receiver end 
(monitor-core/gmond/sflow.c).   Sanity checks in too many places could cause 
headaches down the line (e.g. when we all have 10 Tbps links).

I apologize if this is too much information about a feature that is only 
available if you compile the Ganglia trunk from sources,   but for the record:

(1). The 32-bit rollover problem is handled in hsflowd by polling faster 
internally (every 3 seconds).  This accumulates 64-bit versions of the counters 
which are then pushed out at the normal polling frequency (typically 20 
seconds).   If the code detects that the kernel counters are already 64-bit,  
then it turns off the 3-second polling.
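
The idea in (1) boils down to something like the following (a simplified 
sketch, not the actual hsflowd code): as long as the fast internal poll runs 
more often than a 32-bit counter can wrap, each rollover shows up as the new 
reading being smaller than the last one, and 2^32 can be added back in.

  #include <stdint.h>

  typedef struct {
      uint32_t last32;    /* last raw 32-bit reading from the kernel   */
      uint64_t total64;   /* accumulated 64-bit value exported later   */
  } counter64_t;

  /* Called every few seconds with the latest 32-bit kernel counter. */
  static void accumulate32(counter64_t *c, uint32_t new32)
  {
      uint64_t delta;
      if (new32 >= c->last32)
          delta = (uint64_t)(new32 - c->last32);
      else                              /* the 32-bit counter wrapped */
          delta = ((uint64_t)1 << 32) - c->last32 + new32;
      c->total64 += delta;
      c->last32 = new32;
  }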

(2). The impossible-counter-delta sanity checks in hsflowd depend on whether 
the field is 32-bit or 64-bit.   The upper limit for a 32-bit counter delta is 
0x7FFFFFFF (about 2e9) and for a 64-bit counter it is 1e13.  These checks are 
applied to the frames and bytes counters,  but if either check fails then the 
sequence number is reset for the whole counter-block -- which invalidates all 
the counter-deltas for that polling-interval.  In other words,  if the bytes_in 
counter jumps crazily then we won't believe the frames, errors or drops 
counters either.
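
An illustrative version of that per-delta check, using the limits quoted above 
(the names are hypothetical, not hsflowd's):

  #include <stdint.h>
  #include <stdbool.h>

  #define MAX_DELTA_32  0x7FFFFFFFULL        /* ~2e9 for 32-bit fields */
  #define MAX_DELTA_64  10000000000000ULL    /* 1e13 for 64-bit fields */

  static bool delta_is_sane(uint64_t delta, bool field_is_64bit)
  {
      return delta <= (field_is_64bit ? MAX_DELTA_64 : MAX_DELTA_32);
  }

  /* If any frames/bytes delta fails this test, reset the sequence number
     for the whole counter block, which discards every delta from that
     polling interval instead of trusting the remaining fields. */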

Looking at libmetrics/linux/metrics.c,  it does seem that compiling with 
-DREMOVE_BOGUS_SPIKES will do more or less the same as (2).

Neil




On Mar 30, 2011, at 5:56 PM, Bernard Li wrote:

 Hi all:
 
 On Tue, Mar 29, 2011 at 11:30 AM, Vladimir Vuksan vli...@veus.hr wrote:
 
 I see it all the time :-(. According to Bernard this is due to a problem
 with some of the Broadcom cards. Perhaps Bernard can offer more insight.
 
 Some old threads which describe the issue in more detail:
 
 http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg04463.html
 http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg04245.html
 
 I see two solutions to this problem:
 
 1) If this is indeed a driver issue, we should check to see if newer
 kernels can fix that.  Perhaps Vladimir could look into this
 
 2) It would probably be a good thing to implement a sanity check.  I
 think Neil is looking into implementing this for the sflow
 integration.  Perhaps this could be extended for gmond data as well.
 
 To help resolve this issue, I would suggest that we:
 
 1) File a bug at bugzilla.ganglia.info
 2) For all those affected, add comments to the bug providing the
 network driver model, module used, kernel version, OS version etc.
 
 Thanks!
 
 Bernard
 


[Ganglia-general] Ganglia and sFlow at Supercomputing 2010 in New Orleans

2010-10-27 Thread neil mckee
Hello all,

Exhibiting at the Supercomputing 2010 show in New Orleans?  Setting up a
demo cluster?

We are running a monitoring server in the SCinet NOC which is configured to
receive sFlow from the show network.  Selected pages will be shown on big
screens all around the show floor and linked from the conference website.
The server is running the latest gmond+Ganglia which accepts sFlow input.
That means you can install the lightweight hsflowd daemon on your servers
and have your cluster appear on the display too:

http://host-sflow.sourceforge.net

The server is already up and running:

http://inmon.sc10.org

I appreciate that getting a demo running on a trade-show booth can be tough
enough,  but I think you'll find this part is really easy to set up, so it
won't hold you back or slow you down if you decide to give it a try.  You
might be able to justify it a) because it's good publicity,  or b) because
it's just fun(!)   Either way,  please contact me at neil.mc...@inmon.com or
come and ask for me at the NOC when you get there.

Neil