Re: [Ganglia-general] Adding hosts to cluster in gmetad
On Dec 16, 2011, at 10:28 AM, Maciek Lasyk wrote:

> I've been trying to make a basic ganglia configuration: one gmetad getting data from 2 clusters (11 sources and 1 source) via unicast. Unfortunately with attached configuration I see only the first host from data_source

It appears you're using the same port for both data_source lines, which is why you're having issues. Ganglia uses the port number to differentiate between clusters.

> host1: gmetad.conf
> ==
> data_source SR1 192.168.0.23:8649 192.168.0.26:8649, 192.168.0.17:8649, 192.168.0.44:8649, 192.168.0.6:8649, 192.168.0.7:8649, 192.168.0.10:8649, 192.168.0.9:8649, 192.168.0.8:8649, 192.168.0.3:8649 192.23.1.22:8649
> data_source SR2 129.253.128.112:8649
> gridname GD
> case_sensitive_hostnames 1
> ==

___
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
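A quick way to spot the situation described in the reply is to scan gmetad.conf for data_source lines that share a port. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the usual `data_source NAME [interval] host[:port] ...` layout with 8649 as the default port.

```python
import shlex
from collections import defaultdict

DEFAULT_PORT = 8649  # assumed default when a host entry carries no port


def parse_data_sources(conf_text):
    """Return {source_name: set_of_ports} for every data_source line."""
    sources = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line.startswith("data_source"):
            continue
        tokens = shlex.split(line)[1:]  # shlex keeps quoted names together
        if not tokens:
            continue
        name, rest = tokens[0], tokens[1:]
        if rest and rest[0].isdigit():
            rest = rest[1:]  # optional polling interval
        ports = set()
        for entry in rest:
            _host, _, port = entry.rstrip(",").partition(":")
            ports.add(int(port) if port else DEFAULT_PORT)
        sources[name] = ports
    return sources


def shared_ports(sources):
    """Return {port: [source names]} for ports used by more than one source."""
    by_port = defaultdict(list)
    for name, ports in sources.items():
        for port in ports:
            by_port[port].append(name)
    return {p: sorted(n) for p, n in by_port.items() if len(n) > 1}
```

Running `shared_ports(parse_data_sources(open("gmetad.conf").read()))` on the configuration quoted above would report port 8649 as shared by SR1 and SR2.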
Re: [Ganglia-general] O'Reilly eBook on Ganglia
I'd be glad to have a reference for all the settings and variables that are being used in the various .json files. As far as I can tell, the only documentation is the php code itself.

Some more verbosity about making custom reports (the stuff that lives in graph.d) would be nice too. Reports at least have a wiki page and the php scripts are well commented, but the json files are devoid of info.

On Dec 9, 2011, at 6:51 PM, Matt Massie wrote:

> We're in the process of pulling together a team to write an O'Reilly eBook on Ganglia. Here's a rough idea of some of the topics we could cover
>
> • Ganglia's components and overall architecture
> • Typical deployment configurations including simple steps for verifying an installation (e.g. unicast/multicast, single cluster/multiple distributed clusters/datacenter)
> • Navigating and using the new web interface
> • Tips for extending ganglia's functionality (e.g. gmetric, modules)
> • Common integration points (e.g. Hadoop metrics, Nagios)
> • A simple step-by-step checklist for debugging common ganglia issues with pointers to our web site, mailing lists, irc channel, etc.
> • Supported platforms and core metrics (e.g. Ganglia on AIX, Linux Power systems)
> • Scaling to clusters > 1000 nodes
> • Using Ganglia in mixed environments
> • Ganglia in the enterprise
> • Development of custom modules
>
> What are the things you would be most interested in? Are there other topics you'd like to see covered?
>
> -Matt
Re: [Ganglia-general] Scaling Ganglia
On Nov 3, 2011, at 10:49 PM, Eytan Daniyalzade wrote:

> I am running a cluster with around 80 nodes, and ganglia-server is running on EC2 with 8G. Loading the main page or a host view on ganglia takes fairly long, ~20sec. It looks like this is taking as long as the view is making sequential loads of all the graphs (images), and the server takes longer than I would expect to respond to them. Could you advise any tuning to speed it up or possibly dive into what might be slowing things down? I am running Ganglia 2.1.8 web ui, and serve all files from web root. I am not really familiar with tuning apache/php for better performance.

The typical bottleneck is the rrd files. Usually this doesn't present a problem with viewing the web page, but rather with updating the files, but it might still be worth looking into.

The easy test is to move the rrds to a tmpfs and check for improvements. rrdcached is another common choice.

My installation uses tmpfs, and is able to load a page of 500 hosts in about 17 seconds. This is without any performance-related tweaking to apache or php.
Re: [Ganglia-general] Ganglia Cluster aggregated graphs
On Oct 13, 2011, at 11:52 AM, Aidan Wong wrote:

> This is my first post on this Ganglia list =). I'm using the new Ganglia web 2.1.8. Has anyone been able to create a graph that aggregates one common metric for several hosts?

Try looking at the aggregate graphs tab on the web interface. It lets you use regular expressions to set up a graph showing many hosts at once. These graphs can also be added to views.
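To preview which hosts a regular expression would pull into an aggregate graph, something like the sketch below may help. It is an illustration only: the helper name and hostnames are invented, and it assumes an unanchored "search" style match, which may differ from what the web interface actually does.

```python
import re


def hosts_matching(pattern, hostnames):
    """Return the hostnames that the regular expression matches anywhere."""
    regex = re.compile(pattern)
    return [h for h in hostnames if regex.search(h)]


# Hypothetical host list for illustration.
hosts = ["web01.example.com", "db01.example.com", "web02.example.com"]
selected = hosts_matching(r"^web\d+", hosts)
```

Anchoring the pattern with `^` keeps it from matching unrelated hosts that merely contain the same substring.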
[Ganglia-general] Making y axis consistent?
Has anyone come up with a clever hack for getting rrd graphs produced by ganglia to use the same axis clamp?

At this point, my main issue is with the new Views page, where I've configured a view showing the 15 minute load average for 6 hosts. Every single graph has a different Y axis scale, making it hard to quickly identify which nodes are most busy.

There are upper-limit and lower-limit values available for custom graphs; would it be reasonable to put preferences into the view json file? Or maybe global Y axis ranges in conf.php on a per-metric basis?

I can't see a reasonable way to probe data ranges automatically.. this would put an unfair burden on the web frontend, I think.
Re: [Ganglia-general] Making y axis consistent?
To answer my own question, yes this is possible.

Shortly after sending this email I discovered the graph.d folder and figured out how to use the report (otherwise known as custom graphs) feature, allowing me to make a custom graph with a fixed Y axis range.

Then I dug through the source for a bit to figure out how to get the views to use reports, which turns out to be fairly easy. An entry like this in the view .json file:

  {"hostname":"novagpvm01.fnal.gov","metric":"load_fifteen"},

Will become:

  {"hostname":"novagpvm01.fnal.gov","graph":"my_report"},

Sorry for the mail list noise.

On Aug 12, 2011, at 2:46 PM, Seth Graham wrote:

> Has anyone come up with a clever hack for getting rrd graphs produced by ganglia to use the same axis clamp?
>
> At this point, my main issue is with the new Views page, where I've configured a view showing the 15 minute load average for 6 hosts. Every single graph has a different Y axis scale, making it hard to quickly identify which nodes are most busy.
>
> There are upper-limit and lower-limit values available for custom graphs; would it be reasonable to put preferences into the view json file? Or maybe global Y axis ranges in conf.php on a per-metric basis?
>
> I can't see a reasonable way to probe data ranges automatically.. this would put an unfair burden on the web frontend, I think.
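For a view with many hosts, the metric-to-report swap described above can be scripted rather than edited by hand. This is a rough sketch: the function name is invented, and it assumes the view file is a JSON object whose entries sit in a list under an "items" key, which should be checked against an actual view file before use.

```python
def use_report(view, metric, report):
    """Return a copy of the view whose entries for `metric` point at `report`."""
    items = []
    for item in view.get("items", []):
        if item.get("metric") == metric:
            # Drop the "metric" key and reference the custom report instead.
            item = {k: v for k, v in item.items() if k != "metric"}
            item["graph"] = report
        items.append(item)
    return {**view, "items": items}


# Example view; "view_name" and "items" are assumed key names.
example_view = {
    "view_name": "load",
    "items": [{"hostname": "novagpvm01.fnal.gov", "metric": "load_fifteen"}],
}
updated = use_report(example_view, "load_fifteen", "my_report")
```

The original dictionary is left untouched, so the result can be compared with the input before anything is written back with `json.dump`.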
Re: [Ganglia-general] [Ganglia-developers] Announcing Ganglia Web 2.0RC1
On Jun 22, 2011, at 4:15 PM, Alex Dean wrote:

> That requires that the view name match the cluster name, right?

Yes, it requires the view and the cluster to match.

> Could you post your changes somewhere so we could see what you did?

I attached a gzipped diff of the gweb-2.0 release against the tree I was hacking on.

view_permissions.patch.gz
Description: GNU Zip compressed data

Creating views doesn't work properly; adding to views does (sort of.. it might be smarter if the dropdown box for adding to a view only listed valid options for the user). I still want to take a stab at this, I just haven't had the time.

> Help me understand your use case better. You want to allow some non-admin users to edit a single view, right? Is there a case for limiting the visibility of a view, or are we only concerned with who can change a view?
>
> More generally, what permissions do we need?
> - view a view
> - create a view
> - edit a view
> - delete a view

I personally only care about who can edit views.. anyone who can get to our web server is free to look at everything on it. But I suppose if effort is going to be put into editing views, might as well control who can view them too.

> I'd say sensible defaults are that admins can do all of these things for all views, and anonymous users can view all views which haven't been specifically hidden.

Same here.
Re: [Ganglia-general] [Ganglia-developers] Announcing Ganglia Web 2.0RC1
On Jun 9, 2011, at 12:10 PM, Alex Dean wrote:

> I started off intending to allow per-view edit access, just like we allow per-cluster edit access for optional graphs. The complication is that each resource (a view or a cluster) in the ACL is only identified by a simple string. Thus you can't have a cluster and a view which share the same name - or, if you did you'd probably unwittingly be granting permissions you didn't mean to. I thought about introducing some kind of namespacing, and then just decided to punt until it was actually needed. So... maybe that time is now? :)
>
> Something like this wouldn't be too hard to implement:
>
>   $acl->allowView( 'username', 'view-name', GangliaAcl::EDIT );
>   $acl->allowCluster( 'username', 'cluster-name', GangliaAcl::EDIT );
>
> Please suggest alternate APIs here. That's just my initial brainstorm.

I finally got a chance to sit down and poke at this.

The good news is it's easy to implement a permissions system for adding graphs to an existing view. My method was to edit GangliaAcl.php to add an 'EDIT_VIEW' resource, and use the add() function along with a clustername to give a user view editing privileges. After updating the checkAccess() calls where appropriate in host_view.php and views.php, a user can add graphs to their view.

More complicated is the creation of the views themselves. Because views can have names without any relation to ganglia clusters, the ACL system won't work. I guess one could put in a restriction that a user can only create views with the same name as clusters they have edit permissions for, but that would limit them to owning a single view per cluster.

(As an aside, is it intended that once a view is created, it cannot be removed via the web interface?)

The more I look at it, the more inclined I am to leave the configuration as it is. Every idea I come up with limits the flexibility of the Views or requires more acl maintenance in conf.php.
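The namespacing idea from the quoted message can be illustrated with a small sketch in which every resource is keyed by its kind as well as its name, so a view and a cluster sharing a name never collide. This is not gweb's GangliaAcl; the class, method names, and the "view"/"cluster" kinds are all hypothetical.

```python
from collections import defaultdict

EDIT = "edit"  # hypothetical privilege constant


class NamespacedAcl:
    """Toy ACL keyed by (kind, name) instead of a bare resource string."""

    def __init__(self):
        # (kind, name) -> set of (user, privilege) grants
        self._grants = defaultdict(set)

    def allow(self, user, kind, name, privilege):
        self._grants[(kind, name)].add((user, privilege))

    def is_allowed(self, user, kind, name, privilege):
        return (user, privilege) in self._grants.get((kind, name), set())


acl = NamespacedAcl()
acl.allow("alice", "view", "nova", EDIT)
```

With this layout, granting edit rights on the view "nova" says nothing about a cluster that happens to be called "nova", which is the accidental grant the quoted message worries about.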
Re: [Ganglia-general] [Ganglia-developers] Announcing Ganglia Web 2.0RC1
On Jun 8, 2011, at 8:25 PM, Alex Dean wrote:

> Hi Seth. I'm just back from a week off the grid, and trying to get caught up on a mountain of electronic stuff. Here's my quick response. Please let me know if more explanation is required.

Nope, the explanation makes sense. The only thing I was missing was detail about the philosophy behind the privileges system.

> Editing views is not per-cluster permission because views can contain graphs from many clusters. Currently, we only support a single 'edit' permission for all views. (A user can either edit all views, or can edit none.) You can't selectively grant edit permission on a single view. That restriction could possibly be lifted in the future if there is demand for it.

It's my primary motivation for updating to the new interface, actually.

I don't know how typical my environment is, but I'm taking care of machines belonging to many different experiments. Users like to have their resources on their own web page, and not see nodes they don't care about. Traditionally I've dealt with this in gmetad.conf, moving machines between clusters or making new clusters based on the whims of scientists. It works, but is kind of a pain.

Being able to set up admin accounts and let the users arrange things to taste via a web page would make everyone happy.. I don't have to babysit ganglia, and they don't have to wait for me to update ganglia.

Fortunately, it's pretty easy to modify the access checks to allow this behavior, so if I'm a minority case, I can patch where needed. I just wasn't sure if I was using the ACL system properly.

thanks
Re: [Ganglia-general] Announcing Ganglia Web 2.0RC1
I'm having some issues getting the user roles working as expected. The wiki instructs something like:

  $acl->addRole( $username, GangliaAcl::GUEST );
  $acl->allow( $username, $cluster, GangliaAcl::EDIT );

This does not result in the little blue + sign being drawn next to graphs. At line 71 in host_view.php, there is this line:

  if(checkAccess(GangliaAcl::ALL_VIEWS, GangliaAcl::EDIT, $conf)) {

Changing it to:

  if(checkAccess($clustername, GangliaAcl::EDIT, $conf)) {

allows the check to succeed, but I run into the same problem in views.php.

What does the 'EDIT' role actually allow a user to edit, if not views? And is it possible to configure the interface to allow a user to only edit specific views? As configured now, it appears view editing is all or nothing.

thanks,

On Jun 1, 2011, at 10:08 AM, Vladimir Vuksan wrote:

> Announcing Ganglia Web 2.0 Release Candidate 1.
>
> http://ganglia.info/?p=373
>
> Vladimir
Re: [Ganglia-general] default auth settings
On Apr 22, 2011, at 9:43 AM, Alex Dean wrote:

> I'd like to get some feedback on how we should configure gweb's default access permissions.
>
> #1. $conf['auth_system']=false; will disable authorization, so no logins are required and the system behaves like the current ganglia web frontend. In this case, should editing of views be allowed or denied? Do we want disabling auth to mean 'read-only access' or 'anything goes'?

I think it should be read only at that point. I envision groups of users with competing ideas of what machines are important putting the web interface through a tug of war. If administrators want to risk that, it should be something they have to consciously enable.

> #3. Should the default be to ship with authorization enabled or disabled? My preference is that 'read-only no authorization required' is the default configuration.

That's my preference too.
Re: [Ganglia-general] Need help configuring clusters to use separate multicast IP
That might work, but I don't think anyone sets up their ganglia so that a single gmond is trying to aggregate all clusters. That's what the gmetad daemon is for.

Also note that even though you have a separate multicast address for each cluster, the port still has to be unique. The port is what gmetad and the web frontend use to distinguish between clusters. You get really weird results if multiple data_source lines use the same port.

An ideal configuration might be:

Each of the 5 clusters has a unique gmond.conf, with its own multicast address and port number.

The gmetad host has 5 data_source lines to query one host from each of the 5 clusters.

On Mar 23, 2011, at 9:52 AM, Ron Cavallo wrote:

> I need some help. I am trying to configure my gmetad to collect from different clusters on different IP's. I have 5 clusters. This is my gmetad collections server's local gmond.conf configuration:
>
> /* Feel free to specify as many udp_send_channels as you like. Gmond
>    used to only support having a single channel */
> udp_send_channel {
>   mcast_join = 239.2.11.72
>   port = 8649
>   ttl = 1
> }
>
> /* You can specify as many udp_recv_channels as you like as well. */
> udp_recv_channel {
>   mcast_join = 239.2.11.71
>   port = 8649
>   bind = 239.2.11.71
> }
> udp_recv_channel {
>   mcast_join = 239.2.11.72
>   port = 8649
>   bind = 239.2.11.72
> }
> udp_recv_channel {
>   mcast_join = 239.2.11.73
>   port = 8649
>   bind = 239.2.11.73
> }
> udp_recv_channel {
>   mcast_join = 239.2.11.74
>   port = 8649
>   bind = 239.2.11.74
> }
> udp_recv_channel {
>   mcast_join = 239.2.11.75
>   port = 8649
>   bind = 239.2.11.75
> }
> udp_recv_channel {
>   port = 8649
> }
>
> This is an excerpt from ONE OF THE CLUSTERS ABOVE (the .74 cluster)
>
> /* Feel free to specify as many udp_send_channels as you like. Gmond
>    used to only support having a single channel */
> udp_send_channel {
>   mcast_join = 239.2.11.74
>   port = 8649
>   ttl = 1
> }
>
> /* You can specify as many udp_recv_channels as you like as well. */
> udp_recv_channel {
>   mcast_join = 239.2.11.74
>   port = 8649
>   bind = 239.2.11.74
> }
>
> I configure only one server in a cluster to be polled from the gmetad since that server has all of the cluster members information in it anyway. Here is how I have it configured to talk to the one gmond shown directly above:
>
> data_source SaksGoldApps 45 sd1mzp01lx.saksdirect.com:8649
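The "ideal configuration" in the reply (one multicast address and one port per cluster, plus one data_source line per cluster on the gmetad host) can be laid out mechanically. The sketch below is an illustration only: the function names, the starting address and port, the 45 second interval, and the second cluster and host are assumptions chosen to mirror the quoted configuration.

```python
from ipaddress import IPv4Address


def plan_clusters(collectors, first_group="239.2.11.71", first_port=8649):
    """Assign each cluster its own multicast group and port.

    collectors maps cluster name -> the one host gmetad should poll.
    """
    group = IPv4Address(first_group)
    plans = []
    for offset, (name, host) in enumerate(collectors.items()):
        plans.append({
            "cluster": name,
            "mcast_join": str(group + offset),
            "port": first_port + offset,
            "collector": host,
        })
    return plans


def data_source_lines(plans, interval=45):
    """Render one gmetad.conf data_source line per planned cluster."""
    return [
        'data_source "{cluster}" {interval} {collector}:{port}'.format(
            interval=interval, **plan)
        for plan in plans
    ]


# "OtherCluster" and its host are placeholders for illustration.
plans = plan_clusters({
    "SaksGoldApps": "sd1mzp01lx.saksdirect.com",
    "OtherCluster": "collector2.example.com",
})
lines = data_source_lines(plans)
```

Each plan entry gives the `mcast_join` and `port` values that would go into that cluster's own gmond.conf channels, so no two clusters end up sharing a port.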
Re: [Ganglia-general] Need help configuring clusters to use separate multicast IP
On Mar 23, 2011, at 10:12 AM, Ron Cavallo wrote:

> I see. So I need a separate IP AND A SEPARATE PORT. Got it. Also, I use a single gmond in each cluster to aggregate the single cluster. I configure the gmetad to talk to only one gmond from each cluster. Is that wrong?

No, your configuration is correct if the above is how you've set it up. I interpreted your previous message as saying you had a gmond process with udp_recv_channels for every cluster.
Re: [Ganglia-general] Need help configuring clusters to use separate multicast IP
On Mar 23, 2011, at 10:34 AM, Ron Cavallo wrote:

> Ahhh wait! I do!! On the AGGREGATION Server, I have both a gmetad.conf and a gmond.conf (I also monitor the server itself). I configured RECEIVE channels in the gmond.conf on the aggregation server for every cluster, specifying the IP that the clusters will be sending on. Is that wrong?

It probably won't produce the desired results. So in that sense, yes it's wrong. But gmond will certainly let you do it, I'm just not sure what the resulting data will look like. Best case it would merge all clusters into a single cluster. Worst case, machines disappear and reappear randomly.
Re: [Ganglia-general] Ganglia: Nodes showing up in wrong clusters in web frontend
On Mar 22, 2011, at 10:53 AM, Ron Cavallo wrote:

> I see other examples where I have to go hunting around for cluster members that aren't reporting into the proper cluster. Any ideas?

Double check the ports in use in the gmond.conf on the machines that are misbehaving.

Also note that machines tend to linger in an old cluster they were reporting to, even if their config file says otherwise. If you look at the XML dump from the gmetad, you may find that a given machine appears twice. The web frontend gives fairly random results when this happens.

These stale entries do eventually expire (default is 30 days I believe), but a restart of all gmond processes and gmetad will clean it up instantly.
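Checking the gmetad XML dump for machines that appear twice can be automated. The sketch below assumes the dump nests HOST elements (with a NAME attribute) inside CLUSTER elements (also with a NAME attribute); the function name is invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def hosts_in_multiple_clusters(xml_text):
    """Return {host: [cluster names]} for hosts listed under several clusters."""
    root = ET.fromstring(xml_text)
    seen = defaultdict(set)
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            seen[host.get("NAME")].add(cluster.get("NAME"))
    return {h: sorted(c) for h, c in seen.items() if len(c) > 1}
```

Feeding it the XML read from the gmetad port gives a short list of hosts to chase down before restarting the daemons.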
Re: [Ganglia-general] Ganglia -- modify the source code
On Mar 11, 2011, at 1:51 PM, Afef MDHAFFAR wrote:

> Hi all,
>
> I am trying to modify the source code of Ganglia in order to make ganglia able to send monitored data via network connection to another component. I noticed that it sums the metric values of all nodes composing the cluster (eg. it calculates the load of the cluster). Would you please help me to eliminate this aggregation and send values for each node (for example: [Node1, Load, the value of the load for only this node]).

You shouldn't need to modify the ganglia source to do this. If you want the per-host value, parse the XML coming from gmond. Every host has an entry in this XML tree, and the values are not aggregated.
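As a rough illustration of parsing the gmond XML for per-host values, the sketch below reads the dump from a gmond TCP port and picks one metric out of every HOST entry. The function names are invented, and it assumes gmond listens on localhost:8649 and emits HOST elements containing METRIC elements with NAME and VAL attributes.

```python
import socket
import xml.etree.ElementTree as ET


def read_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the full XML dump that gmond writes when a client connects."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def per_host_metric(xml_bytes, metric):
    """Return {hostname: value string} for one metric, without aggregation."""
    root = ET.fromstring(xml_bytes)
    values = {}
    for host in root.iter("HOST"):
        for entry in host.iter("METRIC"):
            if entry.get("NAME") == metric:
                values[host.get("NAME")] = entry.get("VAL")
    return values
```

A call such as `per_host_metric(read_gmond_xml(), "load_one")` would then yield one value per node, which can be forwarded to another component in whatever form it expects.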
Re: [Ganglia-general] Ganglia -- modify the source code
On Mar 11, 2011, at 2:30 PM, Bernard Li wrote:

> Hi Seth:
>
> On Fri, Mar 11, 2011 at 12:26 PM, Seth Graham <set...@fnal.gov> wrote:
>
>> You shouldn't need to modify the ganglia source to do this. If you want the per-host value, parse the XML coming from gmond. Every host has an entry in this XML tree, and the values are not aggregated.
>
> I was kind of surprised that no such library exists to do this -- are you aware of anything in the wild?

No.. but I've never had a need to look for one. I use php's xml parser to pull the bits of data I need, which was more or less a cut and paste from the ganglia.php that ships with the web frontend. :)

It's good code, it might be useful to turn it into a standalone library at some point.
Re: [Ganglia-general] Noobie questions re: Ganglia
On Mar 1, 2011, at 11:26 AM, William Saxton wrote:

> Hi all (potential) new ganglia user here, with a couple quick questions that I couldn't find the answers to via google.
>
> 1) Where can I find how ganglia gathers information from a system?

Well, it's an open source project, so you can find it by cracking open the source files. The stuff you're interested in is in the libmetrics directory.

> 2) Does anyone have any experience with using ganglia, just as a backend for storage of RRD data, but then using their own custom front-end?

Ganglia is structured well enough that you can easily remove any piece you don't want. The web interface is completely optional.. if you can parse xml and run rrdtool, making your own frontend is trivial.

Likewise, if all you need is the xml, you can eliminate the gmetad portion and query gmond directly.

Last, gmond uses a module system for collecting system metrics, allowing you to strip out anything you don't want and build your data collection up from scratch (it is a little restrictive on payload size but other than that, the sky's the limit).
Re: [Ganglia-general] Multicast/Unicast Poll
On Jan 12, 2011, at 4:22 PM, Bernard Li wrote:

> Hi Seth:
>
> On Wed, Jan 12, 2011 at 1:31 PM, Seth Graham <set...@fnal.gov> wrote:
>
>> Migrating to unicast eliminated the firewall issues, means only a select few machines have to keep metrics in memory, and no more cross talk with other groups. I never saw any solid evidence that ganglia was putting an unfair load on systems, but it was easier to reconfigure than fight it.
>
> Since you guys are in HPC and are using unicast -- what send_metadata_interval do you use?

It's currently set to 15 seconds. However, only a third of our machines are migrated to 3.1.7.. everything else is still on 2.5.

I chose 15 seconds because that was the number that popped up when I was searching for information on send_metadata_interval, and I haven't touched it since. MRTG data for my collector nodes doesn't show anything to be alarmed about; whatever bandwidth ganglia is using, it's getting buried by user consumption. I think it will stay this way, as 1000 machines is our largest cluster and that's how many are currently using 3.1.7.

I do plan on sending all 3000 of our machines to a single gmetad, so it'll be interesting to see how that holds up (but my understanding is that send_metadata_interval has no effect on gmetad).

> Would appreciate your input on the following thread over at ganglia-developers:
>
> http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg05725.html

I don't have much in the way of comments, because I haven't had any problems.

I do like the idea to set the default to something other than zero if unicast is enabled. A warning on startup could be useful too.. I know I glazed over the send_metadata_interval in the man page several times until a google search pointed it out to me. Printing the message only when -d is specified might be good enough.. -d 1 is usually the first thing I try when things aren't doing what I want.
Re: [Ganglia-general] Issue with gmetad
On Jan 12, 2011, at 9:39 AM, John Williams wrote:

> I have also taken this one step further by installing our server on a brand new Dell R710 with 6x240GB SSD (RAID5). Ganglia is the only thing running on the server. I received the same errors after just a few minutes of running. I have also run xmllint against the output from port 8651 and it reports no errors. Any help is appreciated.

Are all your data_source groups on separate subnets? Are you using multicast?

Your gmetad.conf has you giving no ports, which means everyone is using 8649, and if the various machines are on the same wire, their data is going to get piled together. I don't know if gmetad 3.0 or newer is better about this (because I always define ports these days), but in the 2.5 era it would get hopelessly confused if multiple data_sources were using the same port.

It may be worth giving each data_source its own port and seeing if things improve.

The port is what ganglia uses to differentiate clusters. Cluster name, data_source, ip address.. doesn't matter. If two machines are using the same port to relay metrics, gmetad is going to think they're in the same cluster.
Re: [Ganglia-general] Multicast/Unicast Poll
On Jan 12, 2011, at 3:12 PM, Jesse Becker wrote: In light of the recent discussions over metadata and unicast vs. multicast, we (meaning Bernard) have created a poll on http://ganglia.info/ to try and gauge the use of each. Please let us know if you use multicast, unicast, or both in your environments. If you have any comments about using one or the other,
We used multicast for a long time because it's certainly easy, and ganglia is something multicast is well suited for. But as the years rolled on, firewalls got involved, people became concerned about memory and network usage, and subnet privacy eroded. We started getting other departments' machines mixed in with our machines, and this caused all kinds of confusion on both sides. Migrating to unicast eliminated the firewall issues, means only a select few machines have to keep metrics in memory, and ended the cross talk with other groups. I never saw any solid evidence that ganglia was putting an unfair load on systems, but it was easier to reconfigure than to fight it. So the reasons to switch were mostly political.
Re: [Ganglia-general] tcp/ip instead of multi-cast ???
On Jan 10, 2011, at 2:11 PM, Sayler, Steven (Contractor) wrote: Because of our network, multicast protocol will be a major problem. Is there a way to run ganglia gmond/gmetad via tcp/ip?
Yes, look into the udp_send_channel option for gmond.conf. If you specify a host and port, gmond will switch to unicast mode. The metrics still travel over UDP, just addressed to a specific host instead of a multicast group.
Re: [Ganglia-general] Ganglia for data collection, not storage/graphing?
Yes, because all of the ganglia data is available in an XML format. You telnet to a gmetad or gmond process and get a dump of everything that daemon knows about. That makes it easy to write additional tools, because XML parsers are a dime a dozen. It's fast enough to be used in a web page.. I use it for everything from monitoring kernel versions to system uptime for a tactical overview page that helps monitor around 3500 machines.
On Dec 9, 2010, at 11:57 AM, O G wrote: Hello, Is Ganglia written in a way that lets one use its data gathering capabilities, but not data storage and graphing? For example, can one write some sort of Ganglia component or plugin that takes data collected from any of its built-in or other components and sends it somewhere else (a different file, a different database, out over the network, etc.) instead of having the data written by Ganglia to RRD files and later graphed from there? Thanks, Otis
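As a rough illustration of the "telnet and parse" approach described above, here is a minimal Python sketch. It assumes the daemon writes its whole XML dump and then closes the connection, and that the dump uses HOST and METRIC elements with NAME and VAL attributes. The sample document, host names and values are invented for the example, and fetch_xml is shown but not exercised here.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host, port, timeout=10.0):
    """Read the XML dump a gmond or gmetad sends when you connect to it."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(65536)
            if not data:              # the daemon closes the socket when done
                break
            chunks.append(data)
    return b"".join(chunks)


def metric_by_host(xml_bytes, metric_name):
    """Return {hostname: value} for one metric across every HOST element."""
    values = {}
    for host in ET.fromstring(xml_bytes).iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == metric_name:
                values[host.get("NAME")] = metric.get("VAL")
    return values


# Invented sample standing in for a real dump.
SAMPLE = b"""<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
<CLUSTER NAME="demo" LOCALTIME="1292000000">
<HOST NAME="node1" IP="10.0.0.1" REPORTED="1292000000" TN="5" TMAX="20" DMAX="0">
<METRIC NAME="os_release" VAL="2.6.18-194.el5" TYPE="string" UNITS=""/>
</HOST>
<HOST NAME="node2" IP="10.0.0.2" REPORTED="1292000000" TN="7" TMAX="20" DMAX="0">
<METRIC NAME="os_release" VAL="2.6.32-71.el6" TYPE="string" UNITS=""/>
</HOST>
</CLUSTER>
</GANGLIA_XML>"""

print(metric_by_host(SAMPLE, "os_release"))
```

Against a live daemon the call would look like metric_by_host(fetch_xml("collector.example.org", 8649), "os_release"), where the host name is a placeholder and the port is whatever your gmond or gmetad serves XML on.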
Re: [Ganglia-general] Archive Ganglia
On Oct 29, 2010, at 5:24 AM, nigel.le...@uk.bnpparibas.com wrote: For various convoluted reasons, I would like to copy my rrd files to another server, and view them as a point in time archive. In effect just have the webfrontend running, and no gmetad or gmond processes. Any ideas? It seems that simply copying the /var/www/html/ganglia and /data/rrds directory does not work, as the webfrontend requires new data as input.
Simply copying the files won't work because of the way rrd averages out data and expects new input. The correct way to keep old data is to create the RRDs with an RRA that holds data for as long as you want it. An alternative (painful) method is to dump the rrds and use the xml output as your archive. Another option to look into is the way rrdtool allows you to specify a time period to create a graph for. The ganglia frontend defaults to now for the end time. So you could copy your rrds somewhere, and tweak the php scripts to change now to the last time the rrds were updated. This might not scale too well though, depending on how many point in time archives you need to keep available.
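The last idea above (graph up to the last update instead of up to now) can be sketched outside the php frontend. This Python sketch only builds the rrdtool command line. It assumes the rrdtool binary is on the PATH, that `rrdtool last` prints the timestamp of the newest update, and that the data source inside the file is named "sum", which is an assumption about Ganglia-written rrds you should verify with `rrdtool info`. File names are placeholders.

```python
import subprocess


def last_update(rrd_path):
    """Ask `rrdtool last` for the timestamp of the newest update in the file."""
    out = subprocess.run(["rrdtool", "last", rrd_path],
                         check=True, capture_output=True, text=True)
    return int(out.stdout.strip())


def graph_command(rrd_path, png_path, end, span_seconds=86400, ds_name="sum"):
    """Build an rrdtool graph argv covering the span_seconds before `end`."""
    return [
        "rrdtool", "graph", png_path,
        "--start", str(end - span_seconds),
        "--end", str(end),
        f"DEF:v={rrd_path}:{ds_name}:AVERAGE",   # ds_name is an assumption
        "LINE1:v#0000FF:value",
    ]


# Fixed timestamp so the example runs without an rrd file present.
cmd = graph_command("load_one.rrd", "load_one.png", end=1288340000)
print(" ".join(cmd))
```

With a real archive you would pass end=last_update(path) and hand the list to subprocess.run, so the graph window ends where the copied data ends.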
Re: [Ganglia-general] Does Ganglia measure itself?
On Sep 21, 2010, at 1:52 PM, Jesse Becker wrote: You can avoid this by using unicast to specifically designated collector gmonds (then having gmetad poll those for overall status). Or by enabling 'deaf' on machines that you don't want collecting data and are stuck on multicast for whatever reason. That said, gmond is probably the least burdensome metrics collection I've ever used.. an idle machine running gmond will still list a load average of zero. The hald that modern linux distros love to run consumes more processor.
Re: [Ganglia-general] Ganglia Web Forum?
On Aug 10, 2010, at 2:15 PM, Bernard Li wrote: But I just want to clarify that we are *not* abandoning the mailing-list and IRC. Web forums aren't really my cup of tea either but I just wanted to make sure that forum users have a place to get their questions answered about Ganglia if they prefer that over traditional mailing-lists.
This is the one big advantage of a forum.. once the google bot gets into it, finding answers to questions is ridiculously easy. The sourceforge mailing list search is passable, but the interface is crap and it's a huge hassle getting what you want.
Bernard Li wrote: It would be great if forum/mailing-list can just be converged into one thing, perhaps we can look into using Nabble? http://www.nabble.com/
I've never used nabble, but I have used email-to-forum gateways before, and it always seems to turn into a mess.. sigs get into forum posts, quoting is all kinds of mess, fun stuff like that. Maybe nabble is better. I wouldn't know.. but it's the sort of thing to keep an eye out for.
Re: [Ganglia-general] python module strings versus gmetric strings
On 5/7/10 9:37 AM, Brad Nicholes wrote: This is the process which packages a metric into a very small packet which can be passed between systems safely.
Apologies for barging into this discussion, but I've been working on getting used to the module features of ganglia this week and this caught my eye. In the past I've used gmetric to get some auditing information from our systems. For example, space and inode usage on specific user-owned filesystems. My script collects this list of mounts, delimits it, then uses gmetric to dump it into the xml tree for collection at a central location. I'm aware ganglia is intended to be performance monitoring software, but it's always been so good at shipping data around it's hard to resist using it for more than just metrics. At any rate, as we prepare for our upgrade to ganglia 3, I discovered that submitting strings via the python module interface is limited to 32 characters.. anything longer produces some odd behavior in the xml tree. I was able to increase this maximum by adjusting MAX_G_STRING_SIZE in gm_value.h, but your comment about small packets and safety makes me question whether this is wise. Are there risks to this I am unaware of? If I have to pass arbitrary non-graphable data from my machines to a central host, should I continue to use gmetric? If so, what does gmetric do differently that allows longer strings than gmond does?
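To make the question above concrete, here is a hedged sketch of a string metric written in the shape gmond's Python modules use (metric_init returning descriptor dicts, plus metric_cleanup). The mount list is placeholder data, the metric name and group are invented, and the 31-character cap is only an assumption drawn from the 32-character limit reported in this message, not a documented constant. Clipping is a workaround for illustration, not an answer to whether raising MAX_G_STRING_SIZE is safe.

```python
MAX_STRING = 31  # assumed safe length, one under the 32-character limit seen above


def clip(text, limit=MAX_STRING):
    """Shorten text to at most `limit` characters, marking the cut with '~'."""
    if len(text) <= limit:
        return text
    return text[:limit - 1] + "~"


def mounts_handler(name):
    """Callback gmond would invoke; returns a delimited list of mount points."""
    mounts = ["/home/alice", "/home/bob", "/scratch/projects"]  # placeholder data
    return clip(",".join(mounts))


def metric_init(params):
    """Describe the metric using the descriptor keys gmond's Python modules expect."""
    return [{
        "name": "user_mounts",            # hypothetical metric name
        "call_back": mounts_handler,
        "time_max": 300,
        "value_type": "string",
        "units": "",
        "slope": "zero",
        "format": "%s",
        "description": "Delimited list of user-owned mounts",
        "groups": "audit",
    }]


def metric_cleanup():
    """Nothing to release in this sketch."""


if __name__ == "__main__":
    for descriptor in metric_init({}):
        print(descriptor["name"], "=", descriptor["call_back"](descriptor["name"]))
```

Run standalone, this prints the clipped value, which makes it easy to see what would actually reach the xml tree before loading the module into gmond.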
Re: [Ganglia-general] Extending the format of gmetad.conf
Daniel Pocock wrote: - is it important for users to maintain the files manually, or will the focus shift to tools, web interface or config files generated from some other enterprise data source?
I've been content with the existing file format for the 7 or so years I've been running ganglia. At this point if changes were going to be made, I think making it consistent with gmond's configuration format would be a noble effort. I'm not a big fan of any kind of GUI interface to maintain a text file; many other software packages have made this mistake.. the problems it creates are a config file that reads like line noise, or hidden options in the config that never get GUI elements to control them.
Re: [Ganglia-general] Fw: No graph display on ganglia web page
I would assume errors with using the rrd tool to make the graphs. Some kind of path issue perhaps? Try checking your error logs and see if anything comes up. Or right click where the graph should be, copy the image location, and load it in a new tab. Sometimes you'll get some helpful text from this.
Thanachote Pothanant wrote: Hi Seth, Thank you for your reply, and sorry if this is a duplicate. I just tried with only 1 node (client, server and web front end are all on the same node). As for 'memory_limit' in php.ini, it is already set to 128M. Please follow these links for my browser screenshots. http://img90.imageshack.us/img90/1203/ganglia1.jpg http://img194.imageshack.us/img194/4039/ganglia2.jpg Thank you very much for your help. Thanach
On 08/14/2009 09:21 PM, Seth Graham set...@fnal.gov wrote: How many machines are you monitoring? Are you getting the page headers at all? php defaults for memory allowance are pretty small. I forget what the default is, but whenever a php script exceeds this limit the script will exit. In the case of ganglia, this usually means either no graphs, or only a few graphs, are shown. Try doubling the 'memory_limit' in php.ini. I currently have mine set to 128M, which is more than enough room for pages up to 1000 nodes.
Re: [Ganglia-general] Fw: No graph display on ganglia web page
How many machines are you monitoring? Are you getting the page headers at all? php defaults for memory allowance are pretty small. I forget what the default is, but whenever a php script exceeds this limit the script will exit. In the case of ganglia, this usually means either no graphs, or only a few graphs, are shown. Try doubling the 'memory_limit' in php.ini. I currently have mine set to 128M, which is more than enough room for pages up to 1000 nodes.
Thanachote Pothanant wrote: Hi all, I'm pretty new to ganglia, so please help me with this problem. In the ganglia web page, none of the graphs are displayed. I set $debug to 1 in graph.php and got the rrdtool command. When I tried executing the command and redirecting the output to a file, the graph was there. But in the web page there is nothing. My environment details are as follows. OS: AIX 6100-03-01-0921 Processor type: PowerPC POWER4 processor Apache version: Apache/2.2.11 (Unix) PHP/5.2.9 with Suhosin-Patch mod_ssl/2.2.11 OpenSSL/0.9.8j DAV/2 mod_chroot/0.5 Below is the list of rpm packages: apr-1.3.3-2 libconfuse-2.6-1 expat-2.0.1-2 ganglia-lib-3.1.2-1 ganglia-gmond-3.1.2-1 zlib-1.2.3-5 freetype2-2.3.9-1 libpng-1.2.38-1 libart_lgpl-2.3.20-1 rrdtool-1.2.30-2 ganglia-gmetad-3.1.2-1 ganglia-web-3.0.3-1 fontconfig-2.7.0-1 I got the rpm packages, except ganglia-web-3.0.3-1, from http://www.perzl.org/ganglia/ I'm using ganglia-web-3.0.3-1 because I got this error when I tried to install ganglia-web-3.1.1-1: lparaix61:root rpm -Uvh --ignoreos ganglia-web-3.1.1-1.noarch.rpm error: failed dependencies: php-gd is needed by ganglia-web-3.1.1-1 Please help me out with this problem. Thank you very much. Thanach
Re: [Ganglia-general] how to preserve rrd data (as long as I want)
jiangyouu wrote: Hi! I want to preserve the rrd database as long as possible, and review the data for a specific day (or hour, or even minute), like CPU utilization rate and network utilization rate, but the default retention of dear Mr Ganglia is one year.
The resolution of stored data is configured when the rrd is generated, which is done by gmetad. If you need higher resolution you'd have to hack up the gmetad source to generate the RRAs you desire. The wishlist for gmetad mentions allowing custom RRAs, but it's not in yet. The penalty is that the rrds use more disk space. I have no experience trying to store a year (or more) of per-minute data, but I would imagine it carries a performance penalty as well. The other option is to do an rrd dump of the data, and archive that.
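To put a number on the disk space penalty mentioned above, here is a back-of-the-envelope Python sketch. It assumes each stored value costs 8 bytes (rrdtool keeps values as doubles) and ignores the file header, so treat the result as a lower bound. The figure of 30 metrics per host is an assumption for illustration only.

```python
BYTES_PER_VALUE = 8  # rrdtool stores each consolidated value as a double


def rra_rows(retention_seconds, step_seconds, steps_per_row=1):
    """Number of rows an RRA needs to cover the retention period."""
    return retention_seconds // (step_seconds * steps_per_row)


def rra_bytes(rows, data_sources=1):
    """Approximate data size of one RRA, ignoring header overhead."""
    return rows * data_sources * BYTES_PER_VALUE


YEAR = 365 * 24 * 3600

rows = rra_rows(YEAR, step_seconds=60)       # one row per minute for a year
per_metric = rra_bytes(rows)
per_host = per_metric * 30                   # assuming about 30 metrics per host

print(rows, per_metric, per_host)
```

That works out to 525600 rows and roughly 4.2 MB per metric, or on the order of 126 MB per host under the 30-metric assumption, before counting any other RRAs in the same file.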
Re: [Ganglia-general] gmond stops logging data
timl wrote: I'm running ganglia version 3.1.0 and periodically gmond seems to stop collecting data. I can see the incoming traffic to the server and the web pages show the hosts as being up, but no data is logged. Originally I started seeing this on 3.0.5 so I upgraded.. but the latest seems to have made the problem worse. Sometimes the graphs start showing data on their own, but usually I have to restart gmond on the clients.
Where does the fresh data stop showing up? That is, if you dump out the XML, does the REPORTED value stop updating? I've seen holes appear in rrd files when the gmetad server is having trouble keeping up with the I/O, and restarting gmond on clients wouldn't repair that. But that's where it seems like the problem is, especially if the web page never shows the machines being down. Try running gmetad in debug mode and see if it complains about writing to the rrds. btw, surprise meeting you here. ;)
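One way to run the check suggested above is to take two XML dumps a minute or so apart and compare the REPORTED attribute on each HOST. The Python sketch below does that comparison, assuming REPORTED holds a Unix timestamp. The two sample dumps and host names are invented.

```python
import xml.etree.ElementTree as ET


def reported_times(xml_text):
    """Return {hostname: REPORTED timestamp} for every HOST in a dump."""
    return {host.get("NAME"): int(host.get("REPORTED"))
            for host in ET.fromstring(xml_text).iter("HOST")}


def frozen_hosts(older_xml, newer_xml):
    """List hosts whose REPORTED value did not advance between two dumps."""
    before = reported_times(older_xml)
    after = reported_times(newer_xml)
    return sorted(name for name, stamp in after.items()
                  if before.get(name) == stamp)


# Invented dumps taken 60 seconds apart.
OLDER = """<GANGLIA_XML VERSION="3.1.0" SOURCE="gmond">
<CLUSTER NAME="demo" LOCALTIME="1220000060">
<HOST NAME="node1" IP="10.0.0.1" REPORTED="1220000050" TN="10" TMAX="20" DMAX="0"/>
<HOST NAME="node2" IP="10.0.0.2" REPORTED="1220000000" TN="60" TMAX="20" DMAX="0"/>
</CLUSTER>
</GANGLIA_XML>"""

NEWER = """<GANGLIA_XML VERSION="3.1.0" SOURCE="gmond">
<CLUSTER NAME="demo" LOCALTIME="1220000120">
<HOST NAME="node1" IP="10.0.0.1" REPORTED="1220000110" TN="10" TMAX="20" DMAX="0"/>
<HOST NAME="node2" IP="10.0.0.2" REPORTED="1220000000" TN="120" TMAX="20" DMAX="0"/>
</CLUSTER>
</GANGLIA_XML>"""

print(frozen_hosts(OLDER, NEWER))
```

If the list is empty while the graphs still have holes, the data is arriving and the trouble is more likely on the gmetad and rrd side, as the reply suspects.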
Re: [Ganglia-general] ganglia and job monarch
Daniel Bourque wrote: Currently, I have this as the recv_channel on the gmond accepting info from all the worker nodes: udp_recv_channel { port = 8666 family = inet4 } I don't see how this channel is associated with a particular cluster. If I add another udp_recv_channel, and tell job monarch to use that channel, how will ganglia be able to separate that input from the other worker nodes?
Ganglia groups machines based on which port number they communicate on. Hostname, ip address or text labels in gmetad.conf are irrelevant. It takes everyone a while to wrap their head around this, but it works well once you get used to it. Every port you have a gmond chattering on will have a completely unique XML tree. So you could put your worker nodes on port 8666, put the batch server on 8665, and enter two data_source lines in your gmetad.conf to collect from those ports. When you bring up the ganglia web page, you'll notice that the view has changed a little bit and you'll see two clusters instead of the single one you did originally. Finally, you point your jobmonarch tools at port 8665 so it can get the data it needs. You may be able to skip putting the data_source line in gmetad.conf for the batch server. I don't have an opportunity right now to test all the possibilities so you're on your own there.
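The two-port layout described above comes down to two data_source lines. As a small sketch, this Python helper prints them. The cluster names "workers" and "batch" and the collector host "head1" are made up, and the ports are the 8666 and 8665 from the reply.

```python
def data_source_line(cluster, collectors, port, interval=None):
    """Format one gmetad.conf data_source entry for a cluster's collectors."""
    parts = ["data_source", f'"{cluster}"']
    if interval is not None:
        parts.append(str(interval))          # optional polling interval
    parts.extend(f"{host}:{port}" for host in collectors)
    return " ".join(parts)


# Hypothetical layout: one port per group of machines.
layout = {
    "workers": (8666, ["head1"]),
    "batch": (8665, ["head1"]),
}

lines = [data_source_line(cluster, hosts, port)
         for cluster, (port, hosts) in layout.items()]
print("\n".join(lines))
```

Each line polls the same collector host on a different port, which is what makes the web page show two clusters instead of one.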
Re: [Ganglia-general] ganglia and job monarch
Daniel Bourque wrote: Hi, my setup is as follows: 2 PBS head nodes running torque, moab and ganglia, and a group of compute nodes running pbs_mom. Ganglia's gmond is running on the headnodes in mute mode. I'm trying to get rid of the localhost.localdomain node that now shows up in ganglia because job monarch reports as localhost.localdomain.
I'm not sure what this means, because jobmonarch doesn't report as anything; instead it adds metrics to an existing host's entry in the xml tree (specifically, your pbs server). If you telnet to your xml_port on the gmetad server, dump the output to a file, and search for 'MONARCH', you'll see that everything is included inside a pair of HOST tags. The only place the hostname is set is within that HOST tag. It seems to me your pbs server is confused about its own hostname. Either there is a bad entry in /etc/hosts or, assuming a redhat system, something in /etc/sysconfig is setting the machine's name to localhost.localdomain (which is a default).
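The "dump and search for MONARCH" step above can also be done with a few lines of Python, which has the advantage of reporting which HOST element the metrics sit under. This is a sketch that assumes the dump has already been saved as text. The sample XML and the metric names in it are illustrative stand-ins, not real jobmonarch output.

```python
import xml.etree.ElementTree as ET


def hosts_with_metric(xml_text, needle="MONARCH"):
    """Map each HOST name to the names of its metrics that contain `needle`."""
    found = {}
    for host in ET.fromstring(xml_text).iter("HOST"):
        names = [metric.get("NAME") for metric in host.iter("METRIC")
                 if needle in metric.get("NAME", "")]
        if names:
            found[host.get("NAME")] = names
    return found


# Invented dump; metric names are placeholders for whatever jobmonarch submits.
SAMPLE_DUMP = """<GANGLIA_XML VERSION="3.0.7" SOURCE="gmetad">
<GRID NAME="demo" AUTHORITY="" LOCALTIME="1215000000">
<CLUSTER NAME="compute" LOCALTIME="1215000000">
<HOST NAME="localhost.localdomain" IP="127.0.0.1" REPORTED="1215000000" TN="3" TMAX="20" DMAX="0">
<METRIC NAME="MONARCH-HEARTBEAT" VAL="1215000000" TYPE="string" UNITS=""/>
<METRIC NAME="MONARCH-JOB-1234-0" VAL="status=R" TYPE="string" UNITS=""/>
</HOST>
<HOST NAME="node1" IP="10.0.0.1" REPORTED="1215000000" TN="5" TMAX="20" DMAX="0">
<METRIC NAME="load_one" VAL="0.10" TYPE="float" UNITS=""/>
</HOST>
</CLUSTER>
</GRID>
</GANGLIA_XML>"""

print(hosts_with_metric(SAMPLE_DUMP))
```

Whatever host name shows up as the key is the name the submitting machine believes it has, which is the thing to fix.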
Re: [Ganglia-general] ganglia and job monarch
Daniel Bourque wrote: I don't want ganglia to report on the nodes running pbs_server. I only care about the compute nodes. Having non compute nodes in ganglia messes up the usage statistics.
The proper way to fix this is to have your pbs server submit ganglia information on a different port than the worker nodes.
Daniel Bourque wrote: Since Job Monarch must piggy back off an existing node, I must use BATCH_HOST_TRANSLATE to map localhost.localdomain to one of my compute nodes. Correct?
I can't comment, because I've never used that feature of job monarch. From what I can tell in the jobmond.conf file, that's not the intended purpose. So if it does work, great. If it doesn't, I'm not surprised. ;)
Re: [Ganglia-general] gmetad giving high TN values
Bernard Li wrote: Hi Kirk: On Wed, Jun 25, 2008 at 1:53 PM, Kirk McDonald [EMAIL PROTECTED] wrote: gmetad runs on a certain host. Also on that host are a number of gmond instances, which are the gmond instances polled by gmetad. Each of these instances is reported to by a separate cluster, and they are each a separate data source for gmetad. All of the XML polling happens over localhost.
Bernard Li wrote: I am curious why you are running multiple instances of gmond on the gmetad host. Wouldn't it suffice to simply have gmetad poll gmonds running on your cluster directly?
I can't speak for Kirk, but in one instance I do this to get around firewall restrictions. I can send packets out, but not in. So I had to jerry-rig some way to forward data to my gmetad. ;) I wouldn't do it this way again, but I haven't gotten around to un-doing the decision.
Re: [Ganglia-general] Leveraging Ganglia XML output for more then monitoring -- the Thebes Consortium Project
Jesse Becker wrote: Bernard Li wrote: While not exactly what you have in mind, have you taken a look at the JobMonarch project? https://subtrac.sara.nl/oss/jobmonarch/ AFAIK it does also work with SGE.
Jesse Becker wrote: Meh...not really. It's under development, and doesn't work so well with the 6.x versions. I think it works with the old 5.3 series though.
Job monarch reveals a flaw with PBS (what we use; I imagine this isn't a unique trait) in the sense that the worker nodes do not have the capacity to report job information. Job monarch can only run on the central server.. which makes ganglia, due to its distributed nature, a poor partner. If I'm understanding the goal of the Thebes project, they would try to get the authors of batch system software to adopt a more ganglia-like approach to reporting statistics.. which I'd be happy to see. The concern I have is overloading the metrics held in memory by a gmond process to the point it starts consuming noticeable amounts of resources. Ganglia's xml output is by far my favorite feature of the program; the xml is easy to parse and use for homegrown monitoring tools. I worry that if ganglia became a default dumping ground for service information the xml would become inconvenient to work with. The other downside I see is the nature of the data itself. Job information from a batch system is not something you can stuff into an RRD.. you'd have to develop some other way to store job history information, a task that would put a greater load on gmetad and introduce additional scalability concerns. My second favorite feature of ganglia is how simple it is, and I don't know if I'd appreciate it the same way if an installation had a dependency tree as long as my arm.
Re: [Ganglia-general] Largest Ganglia installation?
Bernard Li wrote: Dear Ganglia community: Was browsing our SourceForge website and found this description of the project: "Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. Supports clusters up to 2000 nodes in size." The part I want to focus on is "Supports clusters up to 2000 nodes in size." I suppose this was probably written a while back and I would like to update it -- so if you have an installation monitoring more than 2000 hosts, do let us know and we'll update the description with the largest installation from the community :)
We have ~3000 machines being monitored with ganglia, but the load is split between two separate machines. Probably not the statistic you were after. :) The best machine I had available when setting up carried 4GB of memory, so I could have put everything on one machine but it wouldn't have left much room for growth. More memory than 4GB is pretty common now.. next time I upgrade our gmetad hosts I'll probably try to put all 3000 on one node, but for now it's two machines.
Bernard Li wrote: So far I have yet to hear of anybody reaching a ceiling in terms of scaling Ganglia installations once the usual steps of putting rrds in ramdisk/tmpfs have been taken -- do let us know if you learn otherwise! Thanks, Bernard
Re: [Ganglia-general] how to setup ganglia to run in Unicast mode
Sai p Seshasayee wrote: Hi Team, I am a new user to ganglia. I have been trying to set up ganglia to run in unicast mode. Please get back to me regarding the same.
Configuration for gmetad is identical for both unicast and multicast. The only difference is the gmond.conf on your machines. Instead of mcast_join, you use something like:
    udp_send_channel {
      host = 127.0.0.2
      port = 8000
    }
When you specify a host option, gmond will use unicast. On the machine that you specify as the destination for unicast packets, you'll need something like:
    udp_recv_channel {
      bind = 127.0.0.2
      port = 8000
    }
Thanks and Regards Sai Prakash Poughkeepsie Unix Development Lab IBM Systems and Technology Group External: 845-435-4720 email: [EMAIL PROTECTED] Notes: Sai p Seshasayee/Poughkeepsie/IBM
Re: [Ganglia-general] Largest Ganglia installation?
Bernard Li wrote: Hi Seth: On Fri, Jun 6, 2008 at 7:45 AM, Seth Graham [EMAIL PROTECTED] wrote: We have ~3000 machines being monitored with ganglia, but the load is split between two separate machines. Probably not the statistic you were after. :) Best machine I had available when setting up carried 4GB of memory, so I could have put everything on one machine but it wouldn't have left much room for growth. More memory than 4GB is pretty common now.. next time I upgrade our gmetad hosts I'll probably try to put all 3000 on one node, but for now it's two machines.
Bernard Li wrote: Thanks for the stats -- it's always good to know what the community is up to. How large is your rrd currently?
~550MB on one server, ~650MB on the other, using the default metrics setup. The value fluctuates due to machines getting moved between clusters, and old rrds not being deleted. It has approached 1GB in those situations, usually due to configuration errors. ;)
Bernard Li wrote: Since you have 2 gmetad servers, do you have an additional gmetad server to aggregate the data from the federated servers?
Not currently, but I've been considering a setup like that for including some new machines we're getting (to deal with a firewall). Due to the way the departments here are set up, the clusters divide pretty cleanly and there was no need to aggregate the information. It's probably good that it is so.. a number of our users would start complaining if someone else's stats started appearing on their web page.
Bernard Li wrote: It looks like rrdtool 1.3 is getting closer to being released.
I am looking forward to it too. Running out of the ram disk has never sat well with me; I dislike the threat of losing any amount of data if there's a crash.
Re: [Ganglia-general] Ganglia 3.0.7-1
Owens, David L wrote: In the gmetad.conf file I have a data_source called Non_Prod with four servers, i.e. hostname:8649. These four servers are on the same subnet. I want to add two machines that are on a different subnet. I have tried different ports but they will not display under Non_Prod. Any suggestions? Ganglia uses the port to divide groups of machines, and provides no way of merging them. You have to find a way to get all the machines you want grouped together to chat on the same port. Switching to unicast packets is the easiest way to do this, but you'll lose the redundancy that multicast provides. I'm pretty sure you can mix multicast and unicast in a single gmond.conf but I've never actually done it so I could be talking gibberish. David
Re: [Ganglia-general] Setup large clusters
Martin Hicks wrote: The configuration of gmetad has been modified to store the rrds in /dev/shm, but this directory gets very large so I'd like to move away from that. Using tmpfs is pretty much your only option. As you discovered, the disk I/O will bring most machines to their knees. Is there a way that I should be architecting the configuration files to make ganglia scale to work on this cluster? I think I want to run gmetad on each head node, and to use that RRD data without regenerating it on the admin node. Is that possible? This is definitely possible, though I don't think it's necessary. I have machines handling 1500 reporting nodes without problems, writing the rrds to a tmpfs. The downside of setting up ganglia with head nodes is that you have to set up some way to make the rrds available to a central web server. Several ways to do that too, but they introduce their own headaches.
Re: [Ganglia-general] XML Parser for Gmetad
Buccaneer for Hire. wrote: Hey All, Anyone have an XML parser or pointer for more information for gmetad? I am trying to get a notifier together. THX There's one in ganglia.php in the web frontend. I chopped it up for use in some of my personal scripts and it works well.. assuming you want to use PHP.
Re: [Ganglia-general] RRDs in memory
Ben Hartshorne wrote: I created a ramdisk when my cluster grew beyond ~50 nodes (I report a lot of extra statistics). I use an actual ramdisk instead of tmpfs (though I chose it out of ignorance when I first set it up, wikipedia[*] says that tmpfs might swap to disk whereas ramfs is just straight up in memory, nothing fancy). I initially used ramdisk as well, also out of ignorance. Ran into stability problems with it.. once I tried allocating more than 1GB to the disk I started getting system crashes and out of memory errors (system had 4GB physical memory). Once I switched to tmpfs it became rock solid. tmpfs has the added advantage of being easier to configure.. no editing kernel boot arguments, just pass mount the options you want and it does it all for you. Ramdisk is probably better on a busy system where you don't want to risk a bunch of swapping, but on a dedicated gmetad host I recommend tmpfs.
Re: [Ganglia-general] RRDs in memory
Ofer Inbar wrote: gmetad is very write-intensive, because it updates hundreds of RRD files about every minute or two. Has anyone tried running it with the rrd directory on a RAM disk (tmpfs)? You'd need something to periodically copy the RRDs to a real disk, but that could happen much less frequently (maybe every 20 minutes). You'd also need a more complicated boot time startup procedure to set up the repository on RAM disk before starting gmetad. Have any of you tried anything like this? What'd you do? How'd it go? We were forced to move our rrds to tmpfs a couple of years ago once our number of monitored machines grew into the thousands. It works as well as you would want it to, other than the risk of losing everything if the machine goes down. A cron job was put into place to tar the rrds to physical disk once an hour, and an edit to rc.local untars it back into the tmpfs on boot. Still risk losing data, but ganglia will average out the gap after a while. Load on the machine drops nearly to zero once you move to tmpfs.
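The hourly tar-to-disk and restore-on-boot routine described above could be sketched roughly as follows. This is a minimal Python illustration, not the cron job or rc.local edit from the message; the function names and whatever paths are passed in are hypothetical.

```python
import os
import tarfile


def archive_rrds(rrd_dir, archive_path):
    """Pack everything under rrd_dir into a gzip-compressed tarball on real disk."""
    tmp_path = archive_path + ".tmp"
    with tarfile.open(tmp_path, "w:gz") as tar:
        # arcname="." stores paths relative to rrd_dir, so the tarball
        # can be unpacked into any directory later.
        tar.add(rrd_dir, arcname=".")
    # Rename only once the tarball is complete, so a crash mid-write
    # never clobbers the previous good backup.
    os.replace(tmp_path, archive_path)


def restore_rrds(archive_path, rrd_dir):
    """Unpack a tarball made by archive_rrds into a freshly mounted rrd_dir."""
    os.makedirs(rrd_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(rrd_dir)
```

A cron entry would call archive_rrds once an hour, and the boot script would call restore_rrds after mounting the tmpfs and before starting gmetad. Anything written since the last archive is still lost on a crash.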
Re: [Ganglia-general] A survey of Ganglia users and usage.
Buccaneer for Hire. wrote: The simplicity is a major plus as well as the integration w/ Globus. With a little thinking you can extend the reporting easily. The only thing I wish I had was notification. I have a large cluster and a number of smaller (128 nodes) and it would make it easier for us to be proactive. So I am writing something that will parse the xml and notify. This is something I've done, of a form. It was a fun little project, mostly because of the way ganglia makes everything so easy to parse. Eventually the email got unpopular, so the project ended up being a web page that tied in with a hardware and ticket database that the admin could view and get a quick idea of what's down and what's being worked on.
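A notifier along these lines mostly comes down to walking the XML and picking out hosts that have gone quiet. The Python sketch below is an illustration only, not the script described above. It assumes each HOST element carries NAME, TN (seconds since the host was last heard from) and TMAX attributes, and the "TN greater than four times TMAX" cutoff is an assumed rule of thumb that can be changed through the factor argument.

```python
import xml.etree.ElementTree as ET


def stale_hosts(xml_data, factor=4):
    """Return (cluster, host) pairs whose last report is older than factor * TMAX."""
    root = ET.fromstring(xml_data)
    stale = []
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            tn = int(host.get("TN", 0))      # seconds since last report
            tmax = int(host.get("TMAX", 0))  # expected maximum gap between reports
            if tmax and tn > factor * tmax:
                stale.append((cluster.get("NAME"), host.get("NAME")))
    return stale
```

The returned list could then be mailed out, or, as in the message above, fed into a status page instead of an inbox.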
Re: [Ganglia-general] A survey of Ganglia users and usage.
[EMAIL PROTECTED] wrote: Perhaps we could create a simple anonymous survey for Ganglia users? Code authors could then be guided quantitatively by what the community is really doing - what kind of hosts they monitor - what they use in Ganglia, and what they may need. What do you (all) think? I've been under the impression for a while that ganglia wasn't getting a whole lot of development and was mostly in maintenance mode. It hasn't changed a whole lot in the few years I've been using it (except perhaps the config file format, a change that was much appreciated). The software is already excellent, and most of the changes I could suggest would be philosophical "my way is better than your way" type things.
Re: [Ganglia-general] Gmetad and web frontend on different machines.
Martin Knoblauch wrote: Richard, depending on the cluster size, writing the RRDs via NFS might turn out to be a huge bottleneck. Writing them to local disk is sometimes bad enough. Reading them over NFS may be okay though; it depends how often users are hitting reload. Cheers Martin --- [EMAIL PROTECTED] wrote: Saundry, It sort of looks like you can, but actually you can't. gmetad writes to rrd databases as local files, and the web frontend and php read rrd databases as local files (actually it invokes rrdtool itself). I imagine you could separate the two using NFS filesystems, but I have not tried this. kind regards, Richard Grevis Production Architecture Barclays Capital, Canary Wharf, London, E14 4BB -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of saundrya mishra Sent: 29 March 2007 14:30 To: ganglia-general@lists.sourceforge.net Subject: [Ganglia-general] Gmetad and web frontend on different machines. Hi There, I am new to Ganglia. Can we have gmetad and web frontend for a cluster running on two different machines? If yes, then how is it possible, since I read in the configuration file of the web frontend that the RRDTool databases need to be local to be read? Greetings, Saundrya.
Re: [Ganglia-general] ganglia between two networks
Jeremy Hansen wrote: I've set up ganglia in the past and typically it's pretty straightforward. Now I have to deal with nodes being in two completely separate networks where it seems UDP broadcasts are most likely filtered. Is there just a simple config option to have nodes contact a host directly? I was playing around with udp_send_channel, udp_recv_channel and tcp_accept_channel but I'm coming up short. Perhaps something with my gmetad.conf? Does gmetad accept the incoming connections or is this handled by another gmond running on the reporting host? udp_send_channel is the proper configuration option to send directly to a host. You do need a gmond on the other end with udp_recv_channel AND tcp_accept_channel set. Then have the data_source in gmetad.conf connect to whatever you set in tcp_accept_channel. I have this set up for a particular group of machines under my care, which have to punch through a firewall that the external gmetad cannot reach. I have a special gmond running on the gmetad host with the unique configuration options, which gmetad interfaces with. Thanks for any pointers. -jeremy
Re: [Ganglia-general] Using cluster name to differentiate clusters?
Ben Hartshorne wrote: On Mon, Feb 26, 2007 at 01:06:23PM -0600, Seth Graham wrote: Ben Hartshorne wrote: It seems to me that using the name to determine cluster membership would simplify things for the people configuring ganglia. It would, but when you have 3000+ machines all chattering on the same port that's a lot of data for a machine to deal with. Not only do the aggregating machines have to hold it all in memory, but the gmetad host has to dump all that info into the rrds. Isn't the machine going to have to handle exactly the same amount of data, regardless of whether it's on one port or two? Not necessarily, because you can instruct a machine to not poll a port at all. This is an easy way to exploit the features of networking to limit the traffic a host has to parse. If you break up clusters by the name, the machine will have to read in the data for everything that exists on a subnet and filter based on the data it captures. I would imagine that by the time your network got to 3000+ hosts, things would be segregated in their own right, independent of ganglia. Such segregation would make it easy (and more logical) to use head nodes as aggregators and then pass data up the tree to your main web interface. Multicast networks can be broken up by subnet or VLAN, and the unicast nodes can use ganglia's ability to only pass on summary info, etc. This is true. Assuming one keeps them reasonable, the volume of data would not become a problem. But IP blocks always seem to be getting bigger and bigger when new ones are assigned, and people are always cramming more and more machines onto them. We haven't crossed the line of 1000 machines on a single vlan yet, but the IP space is there and I worry what happens then. The main web interface is my concern. Gmetad sucks up memory like it's free, and the disk I/O created when rrds are updated quickly gets out of hand. Because of this we had to move the rrds to a ramdisk, which eats up even more memory.
Of course, I have not had the privilege of working with a cluster of that size. I've only got just over 100 hosts, so please forgive anything that will become obvious as soon as I actually have to deal with the problem... ;) I think your idea could work, it just seems (to me) to rely on a lot more components being configured in an ideal way. In my experience, I never get that. ;)
Re: [Ganglia-general] What are the rrdtool creation parameters for Ganglia Databases?
The rrd creation values can be found in gmetad/rrd_helpers.c and gmetad/conf.c Ian Wootten wrote: Hi all, I want to replicate ganglia's storage in Java, using a multicast listener, storing and manipulating using rrd4j. Firstly has anyone done anything similar? I'm struggling knowing what parameters to set for the database and getting an adequate resolution of the metrics captured from multicast (10-30s for the application I desire). Does anyone know what the datasource and archive creation commands would be/how many there are? Secondly, and I think this is the main thing, the capture of information seems to take ages to be received in this way. I'm aware of the MonALISA project and their java interfaces into ganglia, but a similar implementation by myself seems extremely slow. Currently packets seem to be retrieved at a rate of 1 a second, with each packet containing a single metric value - I'd like to have a complete set after 10 or so seconds. Would I be better off sticking to my current method of interfacing with ganglia's rrd databases directly and extracting content via the fetch command? Thanks, Ian
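When choosing data source and archive parameters for a target resolution, it helps to work out what each archive definition actually retains. The Python sketch below does that arithmetic for rrdtool's `RRA:CF:xff:steps:rows` form. The function name is hypothetical, and the step and RRA strings in the usage note are made-up illustrations, not ganglia's defaults (those are in the source files named above).

```python
def rra_coverage(rra, step_seconds):
    """Return (resolution_seconds, retention_seconds) for one RRA definition."""
    kind, cf, xff, steps, rows = rra.split(":")
    if kind != "RRA":
        raise ValueError("not an RRA definition: %r" % rra)
    resolution = int(steps) * step_seconds  # seconds covered by one stored row
    retention = resolution * int(rows)      # total time span the archive holds
    return resolution, retention
```

With a 15-second step, `rra_coverage("RRA:AVERAGE:0.5:1:240", 15)` gives `(15, 3600)`, i.e. one hour of data at 15-second resolution, while `rra_coverage("RRA:AVERAGE:0.5:4:1440", 15)` gives `(60, 86400)`, one day at one-minute resolution. For a 10-30s target, the base step and the first archive's steps value are the two numbers that matter.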
Re: [Ganglia-general] Obtaining Immediate Interval Data From Ganglia
Ian Wootten wrote: Hmm, Apologies for the empty reply. Thanks for those suggestions... I'm assuming we're talking kernel modules here, No, we're not. The term 'module' is probably being misapplied here; the stuff being discussed is a module in the sense that it extends basic ganglia functionality, but it's not going to be something that is loaded as part of ganglia. Anything you can write that can telnet to a port can fetch the ganglia xml data, enabling you to store it however you want.
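As a rough idea of how little is involved in "telnetting to the port", here is a minimal Python sketch. It assumes the daemon dumps its whole XML document and then closes the connection, that 8649 is the port in use (the one shown elsewhere in this thread), and that hosts and metrics appear as HOST and METRIC elements with NAME and VAL attributes. The function names are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host, port=8649, timeout=10.0):
    """Connect, read until the daemon closes the connection, return raw bytes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # empty read means the remote side closed the socket
                break
            chunks.append(data)
    return b"".join(chunks)


def metrics_by_host(xml_bytes):
    """Map each host name to a {metric name: value string} dict."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")
        }
    return result
```

Something like `metrics_by_host(fetch_xml("some-gmond-host"))` then gives a plain dictionary that can be stored however you like. Values are left as strings because the metrics are not all numeric.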
Re: [Ganglia-general] Obtaining Immediate Interval Data From Ganglia
Ben Hartshorne wrote: On Tue, Aug 08, 2006 at 04:22:41PM +0100, Ian Wootten wrote: I am facing a problem in that I would like short-segment up to date information from ganglia in order to monitor services after invocation. One method I have heard of that achieves something similar: write a separate module that interprets the XML feed directly. This works well, and I've done it in a single page of code so it's simple stuff. It becomes more of a chore when you want to actually store that data. For short periods it's fine, but if you've got a lot of machines and you don't cycle out data the database will get really big really fast. The other thing to consider is how often the xml updates. Clients running gmond only report at set intervals, so if you're trying to get information once a second you'll be unhappy with the results. This, in combination with gmetad's write intervals, is why the rrds have the 'NaN' holes in them. Writing your own module may improve getting real actual numbers, but it won't improve the quality of the data. Newer versions of ganglia allow you to customize update intervals, but on a big network setting the intervals too short will generate a lot of traffic, and probably slow the gmetad server to a crawl with all the rrd updates.
Re: [Ganglia-general] number of source problem in gmetad
[EMAIL PROTECTED] init.d]# telnet strauss01 8649 Trying 192.168.1.110... Connected to strauss01. Escape character is '^]'. Connection closed by foreign host. Has anyone got any ideas? When I started getting this, I had to add the server I was connecting from as one of the trusted_hosts on all the machines that were reporting data. I made this change in /etc/gmond.conf. I don't know why this started happening; after a reboot of the cluster that was involved, all the machines started refusing connections from the machine that was collecting the data. The above problem looks similar to the one I had.