[Ganglia-general] multiple gmetads polling single gmond
Hi,

I have a rather large set of machines I have ganglia watch (~6000), and am trying to build out a resilient infrastructure. I ran into an interesting problem. I am using gmond version 3.0.2.200511011714 (as reported by --version).

Basic layout: each location (~2000 machines) has a pair of hosts to which they send their metrics (unicast). There is a pair of machines that connect to gmond on each of the edge collectors and centralize the data (they connect via TCP to port 8649). We also have another pair of machines that connect to each edge gmond and grab the current XML dump for integration with Nagios (the script is called parse_ganglia, for future reference).

This worked nicely for quite a while, until one of our edge hosts got too many reportees. There was a connection timeout in parse_ganglia of 5 seconds, so that when one of the edge hosts was down it would move on to the other edge hosts quickly rather than waiting 60s for the down host. When one of the hosts got too many reportees, it started to take ~6s to transfer all the data. At this point, one or the other of the pair of hosts running parse_ganglia started failing on the edge host that had too many reportees.

Using tcpdump, I found that though gmond was accepting the connection from both of them, it would only send data to one at a time, and it would complete sending data to the first before moving on to the second. So:

* host a connects
* host a starts getting data
* host b connects (3-way handshake complete) but no data flows
* host a finishes getting data
* host b starts getting data
* host b finishes getting data

We solved the immediate problem by increasing the timeout from 5s to 15s, but I was a little surprised that gmond behaved in this seemingly single-threaded manner. While it's easy for us to adjust the timeout in our python parse_ganglia, it is not so easy to poke at gmetad, and I am worried about what will happen when we have variations in network quality, more hosts requesting metrics, etc.
Is it true that gmond is single threaded in its network operations? Or maybe just the listener? What other effects might this have? Would it make sense to change gmond so it passes off dumping the XML feed to a child thread so that multiple simultaneous connections can be handled?

Thanks for your time,
-ben
--
Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
___
Ganglia-general mailing list Ganglia-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ganglia-general
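A minimal sketch, not the actual parse_ganglia script, of how a poller could separate the two timeouts discussed in this message: a short connect timeout so a dead edge host is skipped quickly, and a longer overall deadline for the XML dump so that a collector that is merely slow (or busy serving another client first) is not treated as down. The function names and the default timeout values are assumptions made for illustration; the only detail taken from the thread is TCP port 8649. It also assumes gmond closes the connection once the dump is complete.

```python
import socket
import time


def read_xml_dump(sock, deadline_s, chunk_size=65536):
    """Read from sock until the peer closes it or deadline_s seconds pass."""
    deadline = time.monotonic() + deadline_s
    chunks = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("XML dump not finished before the deadline")
        sock.settimeout(remaining)
        try:
            data = sock.recv(chunk_size)
        except socket.timeout:
            raise TimeoutError("XML dump not finished before the deadline") from None
        if not data:  # assumption: the sender closes the socket after the dump
            return b"".join(chunks)
        chunks.append(data)


def poll_first_available(hosts, port=8649, connect_timeout_s=5.0, dump_deadline_s=30.0):
    """Try each edge collector in turn; return (host, xml_bytes) for the first success."""
    for host in hosts:
        try:
            with socket.create_connection((host, port), timeout=connect_timeout_s) as sock:
                return host, read_xml_dump(sock, dump_deadline_s)
        except OSError:
            continue  # host down or too slow: move on to the next collector
    return None, b""
```

With this split, the 5 second limit only applies to establishing the connection, so a second poller that is accepted but has to wait for the first transfer to finish is bounded by the longer dump deadline instead.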
Re: [Ganglia-general] Block device I/O bandwidth
On Mon, Mar 24, 2008 at 03:07:37PM -0700, Bernard Li wrote: Hi guys: I am curious as to what folks usually do to measure block device I/O bandwidth (MB/s) with their Ganglia installation. Talking specifically about disk I/O, do you guys usually just use the output of iostat -k or something like that?

confirming the response from many people on this list, I use iostat -x as well. While the gmetric plugin library was down (is it back up?) I created http://ben.hartshorne.net/ganglia/ which includes two crontab-ready shell scripts to grab different bits of data from iostat and stuff them into ganglia. (disk_gmetric.sh and disk_wait_gmetric.sh)

enjoy,
-ben
--
Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
[Ganglia-general] host spoofing, gmetric, and gmond
Hi All,

I'm trying to get ganglia metrics from some hosts that are hiding behind a NAT box. Obviously, since ganglia identifies the sender using reverse DNS on the sending host, this does not work. I have read about the host spoofing patch, and have three questions:

* Does it actually spoof the IP address in the IP header, or does it insert some extra information into the XML stream saying 'hey, I'm spoofing this other computer, ignore my actual IP address'? [1]
* Does the spoofing only work in gmetric, or is there a way to ask gmond to spoof addresses using the same logic?
* Is there some reason it would be a bad idea to have *every* reporting host spoof their own IP address? Is there a big performance hit or anything? Because I'd almost rather just have every host report who they are in the stream and then I don't need to worry about the network layout nearly so much.

Thanks,
-ben

[1] If the former, it will get squished by the NAT and won't work. If the latter, it will get through the NAT and all will be well. I'm guessing it's the latter, because otherwise you wouldn't be able to use TCP to send the information (since the handshake would never complete).

--
Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
Re: [Ganglia-general] host spoofing, gmetric, and gmond
On Thu, Jan 24, 2008 at 11:01:35AM -0800, Matthias Blankenhaus wrote: I'm trying to get ganglia metrics from some hosts that are hiding behind a NAT box. Obviously, since ganglia identifies the sender using reverse DNS on the sending host, this does not work. Wrt to your NAT problem: I don't really see your problem. Can't you have one gmond behind the NAT that all other machines behind that NAT point to?

Though I agree that would be the best solution, due to the network architecture I can't add any hosts to the area behind the NAT, nor can I add any load to the existing hosts. :(

-ben
--
Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
Re: [Ganglia-general] RRDs in memory
On Wed, Jul 11, 2007 at 03:34:41PM -0400, Ofer Inbar wrote: gmetad is very write-intensive, because it updates hundreds of RRD files about every minute or two. Has anyone tried running it with the rrd directory on a RAM disk (tmpfs) ?

I'll toss in my $.02 here as well, though many people have already said the same thing. I created a ramdisk when my cluster grew beyond ~50 nodes (I report a lot of extra statistics). I use an actual ramdisk instead of tmpfs (though I chose it out of ignorance when I first set it up; wikipedia[*] says that tmpfs might swap to disk whereas ramfs is just straight up in memory, nothing fancy).

Instead of reconfiguring ganglia to keep the repositories in /mnt/ram0/rrds or mounting the ramdisk in /var/lib/ganglia/rrds, I mounted the ramdisk in /mnt/ram0 and made /var/lib/ganglia/rrds a symlink to /mnt/ram0/rrds. Just my preference...

I wrote a new script to drop in /etc/init.d/ called, inventively enough, setup_gmetad_ramdisk, which starts before gmetad and stops after it. It creates the ramdisk, formats it, and copies over the backed-up rrds. When stopped, it backs up the rrds. Theoretically, this should make system bootup and shutdown work the same as though it were on disk. Unfortunately, I am missing some part of installing the stop script correctly (in the right runlevel or something) so it doesn't actually work on shutdown. :( I imagine the fix is pretty simple, but I haven't bothered yet.

I had to edit grub.conf to adjust the size of the ramdisk. By default they're 64MB, but with an argument to the kernel start line, you can set it to whatever size you need. I chose 4x the size of the current RRD directory, to accommodate new hosts and more metrics. It is unfortunate that a reboot is required to change the size of the ramdisk.
I also set up a cronjob to backup the rrds themselves every hour, but unlike the folks so far, instead of rsyncing or keeping just one copy, I keep 8 days worth of hourly snapshots, so that if something goes wrong, I can get back to a healthy snapshot. (Note - I have never actually used any snapshot further back than the most recent... ;) (Note2 - the first version of this used 'find' to get anything 8d old, and it started really tearing up the disk as the number of hosts/metrics grew. Now I use perl to create the timestamp from 8 days ago and just rm the directory. This will fail if the host is down for more than an hour, but that's OK by me.)

The backup cronjob and new ramdisk start script are all available off my website http://ben.hartshorne.net/ganglia/

-ben

[*] http://en.wikipedia.org/wiki/TMPFS

--
Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
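A rough sketch of the retention idea described above, written in Python rather than the perl and shell the message mentions, and not taken from the scripts on the author's site. Instead of scanning the backup tree with 'find', it computes the name of the one snapshot directory that has just aged out (8 days ago, to the hour) and removes only that. The directory naming scheme and the function names are assumptions for illustration.

```python
import shutil
import time
from pathlib import Path

RETENTION_HOURS = 8 * 24  # keep 8 days of hourly snapshots


def snapshot_name(epoch_s):
    """Directory name for the hourly snapshot that contains epoch_s (UTC)."""
    return time.strftime("%Y%m%d-%H", time.gmtime(epoch_s))


def expired_snapshot_name(now_s):
    """Name of the single snapshot that has just left the retention window."""
    return snapshot_name(now_s - RETENTION_HOURS * 3600)


def rotate(rrd_dir, backup_root, now_s=None):
    """Copy rrd_dir into this hour's snapshot and drop the one from 8 days ago."""
    now_s = time.time() if now_s is None else now_s
    backup_root = Path(backup_root)
    dest = backup_root / snapshot_name(now_s)
    if not dest.exists():
        shutil.copytree(rrd_dir, dest)
    old = backup_root / expired_snapshot_name(now_s)
    if old.is_dir():
        shutil.rmtree(old)  # one directory removed, no tree walk needed
    return dest
```

As the message notes, this approach only ever looks at one old directory per run, so a snapshot is left behind if a run is skipped (for example while the host is down).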
Re: [Ganglia-general] Using cluster name to differentiate clusters?
On Mon, Feb 26, 2007 at 01:06:23PM -0600, Seth Graham wrote: Ben Hartshorne wrote: It seems to me that using the name to determine cluster membership would simplify things for the people configuring ganglia. It would, but when you have 3000+ machines all chattering on the same port that's a lot of data for a machine to deal with. Not only do the aggregating machines have to hold it all in memory, but the gmetad host has to dump all that info into the rrds.

Isn't the machine going to have to handle exactly the same amount of data, regardless of whether it's on one port or two?

I would imagine that by the time your network got to 3000+ hosts, things would be segregated in their own right, independent of ganglia. Such segregation would make it easy (and more logical) to use head nodes as aggregators and then pass data up the tree to your main web interface. Multicast networks can be broken up by subnet or VLAN, and the unicast nodes can use ganglia's ability to only pass on summary info, etc.

Of course, I have not had the privilege of working with a cluster of that size. I've only got just over 100 hosts, so please forgive anything that will become obvious as soon as I actually have to deal with the problem... ;)

-ben
--
Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
Re: [Ganglia-general] Ganglia web site makeover
On Thu, Dec 28, 2006 at 02:40:52PM -0800, Peter Mui wrote: Hi All (at ganglia-general): I've been talking to Matt Massie about re-doing the Ganglia website at http://ganglia.info/ We're open to any or all ideas at this point so feel free to chime in with whatever comes to mind.

I actually like most aspects of the ganglia website. Even so, feeling the need to chime in, I give these three requests:

* Take the Main Menu to a top bar, and make it consistent across all sections of the website. My biggest website gripe is consistency - the entire site should have a consistent look-and-feel. A global menu is one step towards this goal.
* Fix the gmetric repository. There are a whole bunch of neat gmetrics floating around, and several different versions of metrics that do basically the same thing. I even went so far as to start my own gmetric page because I didn't like most of the metrics there. Modeling the gmetrics section after the Firefox extensions page would be awesome - allow ratings to float the best to the top, but also categorize so alternatives within a particular metric (say, taking detailed disk metrics) are available.
* Fix the Documentation section and add a FAQ, wiki style. There's a lot of good info out there, but it's hard to get to. A nice section of a wikiable FAQ would be complete sample configurations (actually taken from practice that *really* work, rather than ways you should be able to set it up... ;)

I realize that some of these things are rather grandiose in scope and I <hides face> haven't actually offered my help... </hides face>. What can I say. Here's hoping for the best. :)

-ben
--
Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
Re: [Ganglia-general] Q: IO metrics in Ganglia
On Sun, Nov 26, 2006 at 11:38:46AM +0200, Vitaly Karasik wrote: I noticed that there is no Ganglia disk I/O metrics for Linux and MS Windows platforms? Can you recommend me some tools/plugins for collecting IO metrics? (except of writing a custom scripts around iostat) I did write a custom script to surround iostat because I wanted some additional metrics than what is presented by default. Specifically, I was interested in the %util metric from iostat. You can find that script at http://ben.hartshorne.net/ganglia/ to use or modify to suit your environment. -ben -- Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net signature.asc Description: Digital signature
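For readers who want the general shape of such a wrapper without downloading the script, here is a minimal Python sketch. It is not the author's shell script. It picks the %util column out of `iostat -x` output by header name and builds a gmetric command line per device. The metric naming (`<device>_util`) and the function names are assumptions; the gmetric options used (--name, --value, --type, --units) are the standard ones. Column layout differs between sysstat versions, which is why the column is located by its header rather than by position.

```python
def parse_util(iostat_output):
    """Return {device: %util} from the last device table in `iostat -x` output."""
    result = {}
    columns = None
    for line in iostat_output.splitlines():
        fields = line.split()
        if not fields:
            columns = None  # a blank line ends the current table
            continue
        if fields[0].startswith("Device"):
            columns = fields
            result = {}  # keep only the most recent report
            continue
        if columns and "%util" in columns and len(fields) == len(columns):
            result[fields[0]] = float(fields[columns.index("%util")])
    return result


def gmetric_command(device, util):
    """Build (but do not run) a gmetric invocation for one device."""
    return ["gmetric", "--name", f"{device}_util", "--value", f"{util:.2f}",
            "--type", "float", "--units", "%"]
```

Only the last table is kept because, when iostat is run with an interval and a count (for example `iostat -x 60 2`), the first report covers the time since boot and the later one covers the interval. Each command list could then be handed to `subprocess.run` from a cron job.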
Re: [Ganglia-general] Q: is it possible to see a specific day for example, from last week
On Sun, Nov 12, 2006 at 03:23:34PM +0200, Vitaly Karasik wrote: Is there some Ganglia version (beta/patched) which allow me to see a graphs for specific day from a last week, for example?

Vitaly,

A while ago (May 8th, 2006), there was a thread in which a user offered patches to ganglia to provide this functionality. For help searching through the archives, the subject line was "Graph templates (custom graphs)". Those patches are available at http://wtf.ath.cx/screenshots.html. I had some trouble getting them to work (had to upgrade RRDTool), but I eventually did and have found the extra functionality useful on occasion.

Of course, you must remember that the RRD decreases resolution over time, and so depending on how far back you look, the graphs become less and less useful. But that's just something we accept when using RRD as the backend storage for Ganglia.

-ben
--
Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
Re: [Ganglia-general] problem about gmond,help me~~~
On Mon, Nov 06, 2006 at 09:50:41AM -0500, Rick Mohr wrote: but when I configured this node's udp_send_channel to itself. it works OK.But if that, I can only get one node infomation. and one issue confused me is that why the two node can only send udp_send_channel to itself? If I changed the udp_send_channel to other node,it doesn't work.why this happened?

I hope you've already checked this - do you have a firewall enabled on either host blocking the packets? In the output of '/sbin/iptables -nL', you should see something like this:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination
... excerpted
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           state NEW udp dpt:8649

If you don't have a rule accepting ganglia data (and your firewall is not otherwise completely open), the traffic will be blocked.

-ben
--
Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
Re: [Ganglia-general] about unicast configuration
On Fri, Nov 03, 2006 at 09:32:34AM -0800, khaja mohideen wrote: Hi, I have installed configured Ganglia to monitor my cluster env. My Env. consists of about 400 systems. Currently i have configured with default multicast support. I am in need of unicast support to reduce the multicast traffic also to get gmond stats from other subnets. I tried with docs Could any one help in giving a simple config example for one to one unicast configuration.

Khaja,

In my environment (unicast, multiple subnets, one aggregator host):

in /etc/gmond.conf

udp_send_channel {
  host = 10.20.30.40
  port = 8649
}
udp_recv_channel {
  port = 8649
}

(other non-unicast portions of the config file deleted)

in /etc/gmetad.conf

data_source mycluster localhost

eof

-ben
--
Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
Re: [Ganglia-general] Problem with metrics
On Wed, Sep 20, 2006 at 08:35:19PM +0300, Alex Balk wrote: Hi, As far as I recall, the web frontend's code creates a list available metrics from the first node in the list. If this node doesn't have any gmetrics, then only the builtin metrics will appear in the menu. Once you choose one of the metrics, the nodes are sorted based on it (in your case, the first node will now be the one with the highest count on bytes_out). I suspect that this node doesn't have any gmetrics. Can you confirm/dispute this? This is the default behavior. Someone posted a patch to this list a while ago to change it so that it displays all metrics. I'm continuing my trend of being really helpful by not knowing when it was or who sent it. :) But it's there! -ben -- Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net signature.asc Description: Digital signature
[Ganglia-general] windows CPU I/O wait metric
Hi, I've installed the windows version of gmond and it constantly reports about 70% I/O wait in the CPU report. The server is not actually experiencing very much disk activity. I remember a bug from several versions ago that was similar (though I don't remember the details). Is this the same thing? Any ETA on a windows build for a more recent version? Thanks, -ben -- Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net signature.asc Description: Digital signature
Re: [Ganglia-general] Obtaining Immediate Interval Data From Ganglia
On Tue, Aug 08, 2006 at 04:22:41PM +0100, Ian Wootten wrote: I am facing a problem in that I would like short-segment up to date information from ganglia in order to monitor services after invocation.

One method I have heard of that achieves something similar: write a separate module that interprets the XML feed directly. This would allow you to completely control the resolution and time frame for the data you need.

In the implementation I heard about, this module was called by Nagios and would send out an alert if it sensed a problem. I was actually quite impressed because it means that Nagios doesn't need to run 80 bajillion processes to monitor many many hosts. Instead, it just listens to the XML stream from ganglia and notices when a host drops off the map or a metric goes out of where it's supposed to be.

-ben
--
Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
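A small sketch of what such a module might look like, in Python. This is not the implementation referred to above, only an illustration of the idea: parse one XML dump from gmond and report hosts that have stopped reporting or whose metric is over a threshold. It relies on the HOST attributes NAME, TN (seconds since the host was last heard from) and TMAX, and on the METRIC attributes NAME and VAL, as they appear in gmond's XML output. The staleness rule (TN greater than 4 times TMAX) and the function name are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET


def check_hosts(xml_data, metric, max_value, stale_factor=4):
    """Return a list of (host, problem) tuples found in one gmond XML dump."""
    problems = []
    root = ET.fromstring(xml_data)
    for host in root.iter("HOST"):
        name = host.get("NAME")
        tn = int(host.get("TN", "0"))
        tmax = int(host.get("TMAX", "20"))
        if tn > stale_factor * tmax:
            # Host has not been heard from recently: treat it as down.
            problems.append((name, f"no report for {tn}s"))
            continue
        for m in host.iter("METRIC"):
            if m.get("NAME") == metric:
                value = float(m.get("VAL"))
                if value > max_value:
                    problems.append((name, f"{metric}={value}"))
    return problems
```

A Nagios check could call this once per polling cycle on the bytes read from the collector and exit with a warning or critical status when the returned list is not empty, so that a single process covers every host in the dump.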
Re: [Ganglia-general] install and configure ganglia
Toney,

Have you verified that rrdtool itself works? It may not be a problem with ganglia you're looking at but a problem with rrdtool itself. I'm not sure what might show up in the apache logs, but that may be a good place to look as well.

You should be able to find your rrd databases for ganglia on the node running gmetad (which may be all the nodes). My installation puts them in /var/lib/ganglia/rrds, but I'm not sure if that's the default location. Assuming they are there...

bash$ cd /var/lib/ganglia/rrds/__SummaryInfo__/
bash$ rrdtool graph /tmp/foo.png --end now --start end-12s \
        --width 400 DEF:myline=cpu_nice.rrd:sum:AVERAGE \
        LINE1:myline#FF:foo\n
bash$ file /tmp/foo.png
/tmp/foo.png: GIF image data, version 87a, 480 x 155
bash$ xv /tmp/foo.png  #or some other way of viewing it

If foo.png is a real graph, then you have verified that rrdtool is working correctly. If you cannot get rrdtool to create a graph for you, you should investigate why it is not working correctly before continuing to troubleshoot ganglia.

-ben

On Mon, Jul 17, 2006 at 12:13:25PM +0530, toney samuel wrote: Hi, i have specified the path as per your instruction but still i am not getting and graph in the web page.

On 7/15/06, matt massie [EMAIL PROTECTED] wrote: toney samuel wrote: Hi i have installed as per the instructions on this link http://www.ibm.com/collaboration/wiki/display/WikiPtype/ganglia I am able to get the ganglia page and also the status of my node. I am not getting the graphics ( graphs ). i have installed rrdtool in /usr/local/rrdtool-1.2.3 i have also specified the rrdtool path in /var/www/html/ganglia/conf.php

define("RRDTOOL", "/usr/local/rrdtool-1.2.3");

you are so close. RRDTOOL is not the path to the directory but rather the path to the binary.

define("RRDTOOL", "/usr/local/rrdtool-1.2.3/bin/rrdtool");

should work for you. good luck -matt

--
Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
Re: [Ganglia-general] counters for gmetric
On Thu, Jul 13, 2006 at 10:44:46AM -0700, dan c wrote: I'm trying to use gmetric to record data from several counters that do not reset after they've been read. I tried setting the slope to negative, positive and both, but all three produce identical output. Any ideas?

My solution was to make a directory /var/lib/ganglia/metrics/ and put state files there. By recording the time and the value the last time gmetric was run, you can calculate the average change per unit time without predefining the time period. I usually use 2 minutes (from cron), but of course the time period you should use is determined by your data resolution requirements, as well as how the data changes. Also, it's nice to be able to change it without touching either the state file or the script, and it deals with changes in load that might cause your script to run something other than *exactly* every 2 minutes. (Mine still crash when the load gets so high that several instantiations of the script build up without being able to run, and then run all at once; they complain if the time diff is zero.)

Look at just about any of the scripts in http://ben.hartshorne.net/ganglia/ for examples of how I have done this. Some of the metrics that I measured that don't reset include the mysql Questions count (to calculate queries per second) and the disk activity straight from /proc (for which you also need to deal with your counter looping back to zero after hitting its max bound).

-ben
--
Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
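The state-file technique described above can be sketched as follows. This is an illustrative Python version, not one of the author's scripts: the JSON state format, the function names, and the 32-bit wrap bound are all assumptions. It covers the two edge cases the message mentions, a zero time difference and a counter that loops back to zero.

```python
import json
import time


def rate_per_second(prev, value, now, max_value=2**32):
    """Average change per second since the previous sample, or None if unknown.

    prev is a (value, time) tuple from the last run, or None on the first run.
    """
    if prev is None:
        return None
    prev_value, prev_time = prev
    elapsed = now - prev_time
    if elapsed <= 0:
        return None  # two runs at the same instant: skip rather than divide by zero
    delta = value - prev_value
    if delta < 0:
        delta += max_value  # assume the counter wrapped exactly once
    return delta / elapsed


def update(state_path, value, now=None):
    """Load the previous sample, compute the rate, then store the new sample."""
    now = time.time() if now is None else now
    try:
        with open(state_path) as f:
            saved = json.load(f)
        prev = (saved["value"], saved["time"])
    except (OSError, ValueError, KeyError):
        prev = None  # missing or unreadable state file: start over
    rate = rate_per_second(prev, value, now)
    with open(state_path, "w") as f:
        json.dump({"value": value, "time": now}, f)
    return rate
```

The returned rate is what would be passed to gmetric. Note that a counter that was reset (for example after a service restart) also produces a negative delta and would be misread as a wrap here; a real script may want to discard that sample instead.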
Re: [Ganglia-general] counters for gmetric
On Fri, Jul 14, 2006 at 10:12:32AM -0700, dan c wrote: Ben, thanks for your reply. I've considered keeping state files and calculating the differences, but this doesn't scale very well for a large number of hosts. RRDTool

What doesn't scale? The calculation is done on the host sending the statistic, not the host receiving the statistic, so each host does its own little thing and the collectors get the same data they otherwise would. The load is distributed, so the addition of another host doesn't increase the load on any collector (other than the impact it will obviously have, being another host). I currently use this technique for 80 hosts * 5 stats * 2 minutes.

OTOH, I agree that it'd be great if you didn't have to worry about it and rrdtool just did the Right Thing(TM).

-ben
--
Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
Re: [Ganglia-general] install and configure ganglia
On Tue, Jul 11, 2006 at 03:52:03PM +0530, toney samuel wrote: i have downloaded ganglia-3.0.3.tar.gz but i am not clear how to get it working, do i need to install and apache. pls tell me if i want to download any other packages. Hi Toney, I would recommend starting here: http://ganglia.info/docs/ganglia.html#installation You will need to have apache installed on your head node, where you want to view the stats through the web-based interface. You also need to have some other packages installed, such as rrdtool. I believe the installation process will stall at certain parts if the required packages are not there. It will prompt you to install what you need. Good luck, and please mail back if you have more specific questions, -ben -- Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net signature.asc Description: Digital signature
Re: [Ganglia-general] Graph templates (custom graphs)
On Sun, May 07, 2006 at 11:28:46PM +0300, Alex Balk wrote: Hi all, I've done some work extending my patch for custom graphs and it now includes the following features: ...

Alex, this looks awesome. I can't tell you the number of times I wanted something like this. Thank you!

Now for the rest... it doesn't work for me. ;) The patch applied just fine, but I get php errors in my error_log and the image that appears is broken (i.e. broken image symbol from the browser, not an image that is somehow not correct). The errors are as follows:

when i click on 'custom graph'...

[client 192.168.25.9] PHP Notice: Use of undefined constant referer_url - assumed 'referer_url' in /var/www/html/ganglia/custom_graph_interface.php on line 109, referer: http://localhost:8080/ganglia/?
[client 192.168.25.9] PHP Notice: Use of undefined constant sec - assumed 'sec' in /var/www/html/ganglia/custom_graph_interface.php on line 476, referer: http://localhost:8080/ganglia/?
[client 192.168.25.9] PHP Notice: Undefined variable: time_range in /var/www/html/ganglia/custom_graph_interface.php on line 477, referer: http://localhost:8080/ganglia/?
[client 192.168.25.9] PHP Notice: Undefined index: in /var/www/html/ganglia/custom_graph_interface.php on line 477, referer: http://localhost:8080/ganglia/?
[client 192.168.25.9] PHP Notice: Undefined index: interface_mode in /var/www/html/ganglia/custom_graph_interface.php on line 508, referer: http://localhost:8080/ganglia/?
[client 192.168.25.9] PHP Notice: Undefined index: interface_mode in /var/www/html/ganglia/custom_graph_interface.php on line 541, referer: http://localhost:8080/ganglia/?
[client 192.168.25.9] PHP Notice: Undefined index: interface_mode in /var/www/html/ganglia/custom_graph_interface.php on line 581, referer: http://localhost:8080/ganglia/?
[client 192.168.25.9] PHP Notice: Undefined index: interface_mode in /var/www/html/ganglia/custom_graph_interface.php on line 611, referer: http://localhost:8080/ganglia/?
[client 192.168.25.9] PHP Notice: Undefined index: interface_mode in /var/www/html/ganglia/custom_graph_interface.php on line 631, referer: http://localhost:8080/ganglia/?
[client 192.168.25.9] PHP Notice: Undefined index: interface_mode in /var/www/html/ganglia/custom_graph_interface.php on line 651, referer: http://localhost:8080/ganglia/?
[client 192.168.25.9] PHP Notice: Undefined index: interface_mode in /var/www/html/ganglia/custom_graph_interface.php on line 675, referer: http://localhost:8080/ganglia/?
[client 192.168.25.9] PHP Notice: Undefined variable: option_1_selected in /var/www/html/ganglia/custom_graph_interface.php on line 682, referer: http://localhost:8080/ganglia/?
[client 192.168.25.9] PHP Notice: Undefined index: interface_mode in /var/www/html/ganglia/custom_graph_interface.php on line 712, referer: http://localhost:8080/ganglia/?
[client 192.168.25.9] PHP Notice: Undefined index: interface_mode in /var/www/html/ganglia/custom_graph_interface.php on line 755, referer: http://localhost:8080/ganglia/?

when i have filled out the values and want to create the graph (in basic mode):

[client 192.168.25.9] PHP Notice: Undefined variable: opt_cmdline in /var/www/html/ganglia/custom_graph_rendering.php on line 138, referer: http://localhost:8080/ganglia/custom_graph_processing.php
[client 192.168.25.9] PHP Notice: Undefined variable: metrics_cmdline in /var/www/html/ganglia/custom_graph_rendering.php on line 182, referer: http://localhost:8080/ganglia/custom_graph_processing.php
[client 192.168.25.9] PHP Notice: Undefined variable: legend_header in /var/www/html/ganglia/custom_graph_rendering.php on line 261, referer: http://localhost:8080/ganglia/custom_graph_processing.php
ERROR: unknown function 'VDEF'

There are also lots of errors as I'm filling out the form, but i figure those might be less important. Am I missing a required dependency? ganglia works fine in this installation.
Thanks, -ben -- Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net signature.asc Description: Digital signature
Re: [Ganglia-general] Graph templates (custom graphs)
On Tue, May 09, 2006 at 09:51:52PM -0700, Ben Hartshorne wrote: On Sun, May 07, 2006 at 11:28:46PM +0300, Alex Balk wrote: Hi all, I've done some work extending my patch for custom graphs and it now includes the following features: ... Alex, this looks awesome. I can't tell you the number of times I wanted something like this. Thank you! Now for the rest... it doesn't work for me. ;) The patch applied just fine, but I get php errors in my error_log and the image that appears is broken (i.e. broken image symbol from the browser, not an image that is somehow not correct). The errors are as follows: another note - in my installation, I have three data sources for my gmetad. The grid has three clusters. When I enter the custom graph pages, it says 'custom graph for: unspecified'. If I remove two of the three sources, it says 'custom graph for: ksjc' (the remaining cluster). does this not work with more than one cluster? -ben -- Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net signature.asc Description: Digital signature
[Ganglia-general] Metric pull-down menu not showing all metrics
All, I am curious how the Metric menu in the cluster view gets populated. I have a number of my hosts reporting metrics that the others don't. For example, the hosts that are running mysql and replicating from a different database report how many seconds they are behind their master, but only 10 out of my 30 hosts run mysql. The mysql_slave ganglia metric does not usually show in the Metric pull-down menu. Previously, I had only one cluster, so clicking on 'Grid' just went straight to the cluster. For some reason, after clicking on 'Grid', I could see all the metrics that are reported. As soon as I chose a metric, only some of the metrics were present in the Metric pull-down menu. I think only the metrics present on the first host in the cluster list are present in the pull-down menu. Now I have more than one cluster in my grid, so clicking on Grid no longer gives me all the metrics in the Metric menu. I am now unable to see my mysql_slave metric without manually typing it into the URL string. Suggestions? -ben -- Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net signature.asc Description: Digital signature
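To make the suspected behavior concrete, here is a short Python sketch. It is not the web frontend's PHP code, and the function names are made up for illustration. It contrasts a menu built from the first host only with one built from the union of every host's metrics, using the HOST and METRIC elements of the gmond XML dump.

```python
import xml.etree.ElementTree as ET


def metrics_by_host(xml_data):
    """Map each host name to the set of metric names it reports."""
    root = ET.fromstring(xml_data)
    return {h.get("NAME"): {m.get("NAME") for m in h.iter("METRIC")}
            for h in root.iter("HOST")}


def menu_from_first_host(per_host):
    """What the message suspects happens: only the first host's metrics."""
    first = next(iter(per_host), None)
    return sorted(per_host[first]) if first is not None else []


def menu_from_all_hosts(per_host):
    """The alternative: every metric reported by any host."""
    return sorted(set().union(*per_host.values()))
```

If the first host in the list does not run mysql, the first function never offers mysql_slave, while the second one does.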
Re: [Ganglia-general] Altering metrics parameters
On Thu, May 04, 2006 at 01:00:33PM +0800, [EMAIL PROTECTED] wrote:

G'day all Newbie type question but I can't seem to find a readily available answer. I'd like byte counts in and out from my nodes... but on other interfaces (eg. eth2 is used for gmond but I'm interested in eth0 and eth1). gmond.conf doesn't seem to offer such an option. Do I need to modify the source or do I run gmeter with an ifconfig+grep special?

I wrote a script to report individual interface values, and then edited the ganglia PHP to display a network report (along with load, CPU, memory, etc.) that showed each interface in a different color. The script is at http://cryptio.net/~ben/ganglia/network_gmetric.sh

I think I only modified conf.php and graph.php. I added the following lines to conf.php

-=-=-=-=-=-=-=-=-=8=-=--=-=-=-=-=-=8-=-=-=-=-=--
#
# Colors for the split network report graph
#
$total_rx_color = FF;
$total_tx_color = FF;
$eth0_rx_color = 33;
$eth0_tx_color = 00FF00;
$eth1_rx_color = FF00FF;
$eth1_tx_color = 00;
-=-=-=-=-=-=-=-=-=8=-=--=-=-=-=-=-=8-=-=-=-=-=--

I made the following changes to graph.php

-=-=-=-=-=-=-=-=-=8=-=--=-=-=-=-=-=8-=-=-=-=-=--
[EMAIL PROTECTED]:/var/www/html/ganglia$ diff -c graph.php.orig graph.php
*** graph.php.orig 2005-05-09 11:27:45.0 -0700
--- graph.php 2005-09-27 16:20:26.0 -0700
***
*** 18,25
  # Assumes we have a $start variable (set in get_context.php).
  if ($size == small)
  {
! $height = 40;
! $width = 130;
  }
  else if ($size == medium)
  {
--- 18,25
  # Assumes we have a $start variable (set in get_context.php).
  if ($size == small)
  {
! $height = 60;
! $width = 200;
  }
  else if ($size == medium)
  {
***
*** 176,181
--- 176,215
  .LINE2:'bytes_in'#$mem_cached_color:'In'
  .LINE2:'bytes_out'#$mem_used_color:'Out' ;
  }
+ else if ($graph == split_network_report)
+ {
+ $style = Split Network;
+
+ $lower_limit = --lower-limit 0 --rigid;
+ $extras = --base 1024;
+ $vertical_label = --vertical-label 'Bytes/sec';
+
+ $series = DEF:'total_tx'='${rrd_dir}/network_tx.rrd':'sum':AVERAGE
+.DEF:'total_rx'='${rrd_dir}/network_rx.rrd':'sum':AVERAGE
+.DEF:'eth0_rx'='${rrd_dir}/eth0_rx.rrd':'sum':AVERAGE
+.DEF:'eth0_tx'='${rrd_dir}/eth0_tx.rrd':'sum':AVERAGE
+.DEF:'eth1_rx'='${rrd_dir}/eth1_rx.rrd':'sum':AVERAGE
+.DEF:'eth1_tx'='${rrd_dir}/eth1_tx.rrd':'sum':AVERAGE
+.LINE3:'total_tx'#$total_tx_color:'Total TX'
+.LINE3:'total_rx'#$total_rx_color:'Total RX'
+.LINE2:'eth0_tx'#$eth0_tx_color:'Eth0 TX'
+.LINE2:'eth0_rx'#$eth0_rx_color:'Eth0 RX'
+.LINE2:'eth1_tx'#$eth1_tx_color:'Eth1 TX'
+.LINE2:'eth1_rx'#$eth1_rx_color:'Eth1 RX' ;
+ }
+ else if ($graph == disk_report)
+ {
+ $style = Disk;
+
+ $lower_limit = --lower-limit 0 --rigid;
+ $extras = --base 1024;
+ $vertical_label = --vertical-label 'Blocks/sec';
+
+ $series = DEF:'disk_writes'='${rrd_dir}/disk_writes.rrd':'sum':AVERAGE
+.DEF:'disk_reads'='${rrd_dir}/disk_reads.rrd':'sum':AVERAGE
+.LINE2:'disk_writes'#$mem_cached_color:'Write'
+.LINE2:'disk_reads'#$mem_used_color:'Read' ;
+ }
  else if ($graph == packet_report)
  {
  $style = Packets;
-=-=-=-=-=-=-=-=-=8=-=--=-=-=-=-=-=8-=-=-=-=-=--

-ben
-- Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
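The per-interface reporting described above can be sketched roughly as follows. This is not the linked network_gmetric.sh (which is not reproduced here) but a hedged Python illustration: it assumes a Linux /proc/net/dev layout, takes two snapshots of the counters, and builds gmetric commands using the eth0_rx / eth0_tx style names that the graph.php changes read from. The function names, the float type, and the bytes/sec units are assumptions for illustration; counter wrap-around is not handled.

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates(previous, current, seconds):
    """Return {'eth0_rx': bytes/sec, 'eth0_tx': bytes/sec, ...} between two snapshots."""
    result = {}
    for iface, (rx, tx) in current.items():
        if iface not in previous:
            continue  # interface appeared between snapshots; no rate yet
        prev_rx, prev_tx = previous[iface]
        result[iface + "_rx"] = (rx - prev_rx) / float(seconds)
        result[iface + "_tx"] = (tx - prev_tx) / float(seconds)
    return result


def gmetric_command(name, value):
    """Build the argument list for one gmetric call (not executed here)."""
    return [
        "gmetric",
        "--name=%s" % name,
        "--value=%.1f" % value,
        "--type=float",
        "--units=bytes/sec",
    ]
```

A cron job could read /proc/net/dev, keep the previous snapshot in a small state file, and pass each gmetric_command(...) result to subprocess.call.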
Re: [Ganglia-general] Metric pull-down menu not showing all metrics
Richard, Rick, Thank you both for your replies. Rick, either your mailer or mine decided to wrap your patch at 72 characters wide, so rather than try to use `patch` to apply it, I just applied it by hand. I'm smarter than patch, anyways. ;) Some of the line numbers were just a touch off, but close enough that I think the patch would apply cleanly to the 3.0.3 code, if one were to choose to do so. It works like a charm! Thanks, -ben -- Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
Re: [Ganglia-general] A Java Virtual Machine probe
Miguel, I look forward to trying this out. You say 'Installation requires ganglia to be installed using the source code.' - I am curious what files you require. I installed ganglia using an RPM, and would like to bring in only the files I need. Is this a bad idea? What do you recommend? (And why would a compiled C program require sources to run with other compiled programs?)

Thanks, -ben

On Tue, Mar 28, 2006 at 05:17:29PM +0100, José Miguel Pereira Tavares wrote:

Hi all! Some time ago I enquired about the existence of a probe that would give information on a running Java Virtual Machine (JVM). Unfortunately it's a Linux only probe (for now at least). Now I am happy to present to the community a probe developed in C that can monitor a JVM and report to a gmond. Its publishing of metrics is similar to that of gmetric. It works by parsing the info at /proc/pid/status of the Linux tasks relevant to the JVM (depending on the kernel the status file reports just one process with threads or a set of tasks that represent process/threads). Although this probe is now used for monitoring Java services it can be used to monitor any other kind of process that has a long time span. It's a kind of process oriented metrics. It was done in C for the usual question on intrusiveness. Memory footprint is around 1.5 Mb Vss and it uses 0.1% of the CPU with 10 seconds of interval between samples. This probe can be downloaded at: http://student.dei.uc.pt/~mtavares/index.php?content=software Installation requires ganglia to be installed using the source code. I would be glad to hear comments, bug reports and, even better, to receive patches. Miguel Tavares -- Until they become conscious they will never rebel, and until after they have rebelled they cannot become conscious. - George Orwell's 1984 -

-- Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
Re: [Ganglia-general] A Java Virtual Machine probe
On Tue, Mar 28, 2006 at 08:02:32PM +0100, José Miguel Pereira Tavares wrote:

Hi Ben! On Tuesday, 28 March 2006 19:30, Ben Hartshorne wrote: Miguel, I look forward to trying this out. You say 'Installation requires ganglia to be installed using the source code.' - I am curious what files you require. I installed ganglia using an RPM, and would like to bring in only the files I need. Is this a bad idea? What do you recommend? (and why would a compiled C program require sources to run with other compiled programs?)

It's needed mostly for linking with several libraries (libganglia, libconfuse, libapr-0, libmetrics, libgetopthelper) that come with ganglia and are not required to be installed on the host. At least as far as I could tell.

Does it statically or dynamically link against those libraries? In other words, do I only need the ganglia src on the machine on which I compile JVMProbe, or will I need it to run? I ask because I would like to deploy to ~30 hosts (to see how the stats run on my cluster), and if I can avoid deploying the sources to all those hosts, I would like it better. :)

One suggestion I have so far - you mention that though the tool is called JVMProbe, it can be used on other applications. However, the stats that it sends to ganglia are all named JVM_foo, which means that it cannot be used to monitor two different applications on the same host. Perhaps you could include an option to include the name of the process it's watching as part of the metric name? For example, when watching java, use JVM_java_foo as the metric name. If you then also wanted to watch abcApp on the same host, it would report those metrics as JVM_abcApp_foo and the names would not collide.

One bug report - After compilation, I ran 'sudo ./JVMProbe -d' to see if it would actually work. I got the following output:

- [0] -
- [1] -
- [2] -
- [3] -

and then I cancelled the process. I was confused - was this correct output? Despite the fact that the --help option said that the default name of the process to watch was 'java', I added '-n java' and then got:

- [0] -
JVM_taks = 39 tasks
JVM_avgUse = 98 %
JVM_highUse = 12 %
JVM_vmPeak = 0 kB
JVM_vmSize = 1274096 kB
JVM_vmRSS = 765428 kB
JVM_vmData = 1201280 kB
JVM_vmStk = 2036 kB
JVM_vmExe = 56 kB
JVM_vmLib = 70228 kB
- [1] -
JVM_taks = 39 tasks
JVM_avgUse = 98 %
JVM_highUse = 12 %
JVM_vmPeak = 0 kB
JVM_vmSize = 1274096 kB
JVM_vmRSS = 765428 kB
JVM_vmData = 1201280 kB
JVM_vmStk = 2036 kB
JVM_vmExe = 56 kB
JVM_vmLib = 70228 kB

ahhh. Much better. I don't know why it didn't work without the '-n' flag.

Many, many thanks for writing this module! Though I give you errors only, the fact is that it works for me, nearly out of the box, and is successfully reporting JVM stats within my framework! :)

-ben
-- Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
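The naming suggestion above (JVM_java_foo versus JVM_abcApp_foo) can be sketched as follows. This is a hedged Python illustration, not JVMProbe's actual C code: it assumes the Vm* lines of a Linux /proc/<pid>/status file are the source of the vmSize, vmRSS, and similar values shown in the output, and both function names are hypothetical.

```python
def parse_vm_fields(status_text):
    """Return {'VmSize': 1274096, ...} (values in kB) from /proc/<pid>/status content."""
    values = {}
    for line in status_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or not key.startswith("Vm"):
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key] = int(parts[0])
    return values


def namespaced_metrics(process_name, values, prefix="JVM"):
    """Embed the watched process name in each metric name to avoid collisions."""
    metrics = {}
    for key, value in values.items():
        short = key[0].lower() + key[1:]  # VmSize -> vmSize, as in the probe output
        metrics["%s_%s_%s" % (prefix, process_name, short)] = value
    return metrics
```

Watching java and abcApp on the same host would then yield JVM_java_vmSize and JVM_abcApp_vmSize as separate metrics.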
Re: [Ganglia-general] A Java Virtual Machine probe
Miguel, can this probe get things like stats on the garbage collector (avg time spent in GC, etc.)?

Thanks, -ben

On Tue, Mar 28, 2006 at 05:17:29PM +0100, José Miguel Pereira Tavares wrote:

Hi all! Some time ago I enquired about the existence of a probe that would give information on a running Java Virtual Machine (JVM). Unfortunately it's a Linux only probe (for now at least). Now I am happy to present to the community a probe developed in C that can monitor a JVM and report to a gmond. Its publishing of metrics is similar to that of gmetric. It works by parsing the info at /proc/pid/status of the Linux tasks relevant to the JVM (depending on the kernel the status file reports just one process with threads or a set of tasks that represent process/threads). Although this probe is now used for monitoring Java services it can be used to monitor any other kind of process that has a long time span. It's a kind of process oriented metrics. It was done in C for the usual question on intrusiveness. Memory footprint is around 1.5 Mb Vss and it uses 0.1% of the CPU with 10 seconds of interval between samples. This probe can be downloaded at: http://student.dei.uc.pt/~mtavares/index.php?content=software Installation requires ganglia to be installed using the source code. I would be glad to hear comments, bug reports and, even better, to receive patches. Miguel Tavares -- Until they become conscious they will never rebel, and until after they have rebelled they cannot become conscious. - George Orwell's 1984 -

-- Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
Re: [Ganglia-general] uptime metric/graph
On Wed, Mar 15, 2006 at 11:17:07AM -0500, Richard Lefebvre wrote:

Has anyone created an uptime metric/graph? It would be a great stat to collect to see how often a system is rebooted.

Richard, the Time and String metrics already report uptime (and boot time). Were you imagining something other than that?

-ben
-- Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
Re: [Ganglia-general] Documentation
On Thu, Feb 16, 2006 at 07:08:55PM -0800, Bernard Li wrote:

Well somebody needs to update that doc though - it seems pretty outdated.

hmm... I did just volunteer myself, didn't I? I didn't realize it was out of date. I really wish I could update it, but I have neither the time nor the understanding of how things work - I was learning from it! ;) ::sigh:: ok, well...

-- Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
[Ganglia-general] Documentation
In a recent thread (Ganglia 3.0.2 runtime error on AIX 5.1), Raymond Pete pointed the thread to the Ganglia documentation on the ucsf server: http://www.msg.ucsf.edu/local/ganglia/ganglia_docs/

A part of this document says that 'the latest version of this document can be found on the ganglia documentation page' and links to http://ganglia.sourceforge.net/docs/

I cannot find the document referenced on the UCSF page on sourceforge, which is a real travesty because the UCSF document is 10 times better than what's on the sourceforge docs page. Could the webmaster of the sourceforge project include a copy of the UCSF docs in the documentation section of the web page?

-ben
-- Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
Re: [Ganglia-general] config file confusion
On Thu, Feb 09, 2006 at 11:13:15AM -0700, Ian Cunningham wrote:

To expand on what Jason wrote, the name doesn't actually decide which cluster the node's data gets included in. If you are using multicast as

so what is the name used for? If you're defining the name/port combination on the server running gmetad, would it have any effect to have a different name on each host? Let's say you have three hosts on each of three different multicast ports, and they're named a1, a2, a3, b1, b2, b3, c1, c2, c3. I'm not really clear on how the grid/cluster naming thing would happen. The UI shows grid/cluster/host; under what conditions would you see each of the above names?

-ben
-- Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
Re: [Ganglia-general] librrd.so.0
On Wed, Feb 08, 2006 at 11:36:08AM -0900, [EMAIL PROTECTED] wrote:

List, This is driving me crazy. We have just spun up a small cluster running Oscar 3.1. We decided to run ganglia to get a feel for load balance, system and network behavior, and possible bottlenecks. We are running Redhat Fedora Core 3.

I too run FC3. My copy of librrd.so.0 comes from /usr/lib/librrd.so.0 and is in the package rrdtool-1.0.49-3 (which came as an RPM) from ftp://rpmfind.net/linux/fedora/extras/3/i386/rrdtool-1.0.49-4.fc3.i386.rpm

HTH, -ben
-- Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
Re: [Ganglia-general] Pointers on architecting a largescale ganglia setup??
On Tue, Jan 31, 2006 at 12:15:19AM -0800, Martin Knoblauch wrote:

just in case you did not know: http://ganglia.sourceforge.net/gmetric/ Everyone is invited to contribute to the repository.

Martin, I believe someone else has pointed out that submissions have been closed (and apparently for a very long time...). I found most of the scripts there not quite right for what I wanted to do, so I wrote my own. I have put them up at http://cryptio.net/~ben/ganglia/ for your consumption. They include

* disk - measures disk IO (per disk as well as cumulative)
* network - reports per-interface stats (which I combined in a ganglia report to show all on one graph - fantastic for frontend/backend stuff)
* mysql - reports queries per second as well as broken slow queries
* sensors - CPU temp. et al for Tyan motherboards (may work for others)

There is also a crontab file there for /etc/cron.d/ that calls them every two minutes and includes the (with this list's help) fixed num-users metric:

*/2 * * * * root /usr/bin/gmetric --name=users --value=`who | wc -l` --type=int16

One thing I like about these scripts is that they do a fair bit of error checking, so if something happens that might cause them to fail every two minutes, you don't get 100 messages in your inbox the next morning. For example, if mysqld dies on an unimportant box, you don't want to be inundated with messages.

HTH, -ben

p.s. these scripts have been written for a redhat-based linux installation (Fedora, CentOS, etc.). I don't know how portable they are. I expect not very much. :)

-- Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net
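The "fail quietly under cron" idea above can be sketched like this. It is a hedged Python illustration rather than the scripts at the URL (which are shell and are not reproduced here): the gmetric path and the --name/--value/--type flags come from the crontab line in the message, while the helper names and the decision to discard all gmetric output are assumptions.

```python
import os
import subprocess


def gmetric_args(name, value, metric_type, gmetric="/usr/bin/gmetric"):
    """Build the gmetric argument list used in the crontab line above."""
    return [gmetric, "--name=%s" % name, "--value=%s" % value, "--type=%s" % metric_type]


def count_users(who_output):
    """Count logged-in users from `who` output (equivalent to `who | wc -l`)."""
    return len([line for line in who_output.splitlines() if line.strip()])


def quiet_call(args):
    """Run a command with its output discarded so cron has nothing to mail."""
    with open(os.devnull, "w") as devnull:
        return subprocess.call(args, stdout=devnull, stderr=devnull)


def report(name, value, metric_type, run=quiet_call):
    """Send one metric; return True on success and never raise."""
    try:
        return run(gmetric_args(name, value, metric_type)) == 0
    except OSError:
        # gmetric missing or not executable: stay silent instead of spamming cron mail.
        return False
```

A cron entry could call a small script that captures the output of `who`, passes it to count_users, and then calls report("users", count, "int16"). A failed run then produces no output, so a dead daemon on an unimportant box does not fill the inbox.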
[Ganglia-general] intermittent blanks in graphs
Hi, I have been running ganglia for most of the last year, quite happily. My hosts are configured to send unicast data to a single gmetad server. Recently, large portions of the cluster's graphs have been empty. A sample is shown at http://cryptio.net/~ben/ganglia/blank_graphs.png Notice that not all hosts are missing data (Burgertime, for example, has all the data there).

I thought it was due to high load, because I first noticed it when the gmetad server was being hammered by a separate process. But the server has long since recovered, and the graphs have not; in fact, they have gotten worse. I was running 3.0.1, and tried upgrading to 3.0.2 on the off chance it would fix something, but it did not. I have since downgraded the webui because I have made some changes[*] and I don't want to spend the time to migrate them just now. :)

When I go into the page for a single host and click on the 'gmetrics' link, I find that all of my metrics have a record of being received within the last two minutes (my time period). And yet, their graphs show up empty. Any thoughts? What logs should I be looking at?

I am running on a Fedora Core 3 system, with version 3.0.1 (now 3.0.2). I don't think I've made any gross changes to the environment within the last week, which is the time period in which all this annoyance has started. The only thing I can say is that the beginning of this strangeness coincides with a brief (12-hr) period of intense load on the gmetad server.

Thanks, -ben

[*] for those interested - I added an 8-hour and 3-day view; I find the 8-hour view the most useful by far. I also changed the size of the graphs to fit my 20" screen. Finally, I added a Disk summary graph, in addition to the Load, CPU, Memory, and Network. Is there any interest in patching these into the source?

-- Ben Hartshorne email: [EMAIL PROTECTED] http://ben.hartshorne.net