Re: [Ganglia-developers] Question about Ganglia
Well, from this project's standpoint I wish I still had access to a big installation to test on, but I don't (and I'm happy to be doing what I'm doing).

The scalability issue I've seen happened in a grid with a few dozen clusters; each cluster had a few dozen nodes in it, and we monitored a couple hundred metrics per node (yes, I'm being deliberately vague). In that case, the typical failure was that the grid summary metrics would "gap", even though the cluster summary metrics and the underlying host metrics were updated. We were running on a plenty-big ramdisk (8GB), and the actual CPU utilization was under 50%, IIRC.

We traced the problem to lock contention in gmetad. Gmetad needs to take a lock per cluster as it collects data for the grid summary metrics. Otherwise, each cluster just runs in its own thread, with its own sampling against the gmonds. By reducing the number of clusters per grid (and moving to a grid-of-grids deployment), we were able to handle the same number of machines & metrics with much less probability of gapping.

My suspicion is that a naive attempt to do the kind of pivoting you're talking about in this email will require holding those cluster-query locks longer. My advice would be to treat each cluster as if it were its own gmond when doing that. I.e., the aggregating thread should simply read the data into its own workspace as quickly as possible without doing any arithmetic (in fact, a copy-on-write clone would be great) so as to avoid holding those cluster locks too long. Then the arithmetic/aggregation thread can run independently. If it can't do arithmetic fast enough, well, change the RRAs to reflect how fast the aggregation thread can run.

At all costs, avoid getting in the way of recording the host & cluster metrics; they're the raw material. Folks can always derive their aggregates from that data later or out-of-band if necessary.
-- ReC

___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
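For what it's worth, the advice in the message above (copy each cluster's data out under its lock, then do the arithmetic somewhere else) might look roughly like the following. This is only a minimal Python sketch and not gmetad code: `ClusterState`, `snapshot`, `aggregate`, and `key_fn` are made-up names for illustration, and a plain deep copy stands in for the copy-on-write clone mentioned in the message.

```python
import copy
import threading


class ClusterState:
    """Hypothetical stand-in for one cluster's data inside a gmetad-like daemon."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.hosts = {}  # host name -> {metric name -> numeric value}

    def update(self, host, metrics):
        # Called by the cluster's own polling thread.
        with self.lock:
            self.hosts[host] = dict(metrics)

    def snapshot(self):
        # Hold the lock only long enough to copy; no arithmetic in here.
        with self.lock:
            return copy.deepcopy(self.hosts)


def aggregate(clusters, key_fn):
    """Sum metrics over all clusters, grouped by key_fn(host, metrics)."""
    # Phase 1: one short lock hold per cluster to grab a private copy.
    snapshots = [cluster.snapshot() for cluster in clusters]
    # Phase 2: arithmetic on the private copies, with no locks held.
    totals = {}
    for hosts in snapshots:
        for host, metrics in hosts.items():
            group = totals.setdefault(key_fn(host, metrics), {})
            for name, value in metrics.items():
                group[name] = group.get(name, 0) + value
    return totals


# Usage: pivot two logical clusters by an (assumed) hardware prefix in the host name.
sap = ClusterState("sap")
tsm = ClusterState("tsm")
sap.update("frameA-lpar1", {"cpu_num": 4})
sap.update("frameB-lpar2", {"cpu_num": 2})
tsm.update("frameA-lpar3", {"cpu_num": 8})
by_frame = aggregate([sap, tsm], lambda host, metrics: host.split("-")[0])
```

The point of the split is that a slow `key_fn` or a very large grid only delays the aggregate; it should not delay the per-cluster threads calling `update`, which is the part that records the raw material.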
Re: [Ganglia-developers] Question about Ganglia
Hi Rick:

On Tue, Nov 30, 2010 at 10:38 AM, Rick Cobb wrote:

> On your idea, Bernard -- I don't think it would necessarily require gmond

You're right, I meant to say gmetad... it was late last night :)

> changes. OTOH, I think it would require very interesting gmetad changes to
> do a good job. In particular, the ability to summarize by different
> aggregations seems like the scalability wall; it's already hard to get grid
> summaries updated (the only inter-thread update in the current code).

Can you elaborate on the last point? Depending on what you mean exactly, I might have a solution for that...

I think going forward the gmetad-python code could be easier to maintain and extend with new features, not to mention that it supports the plug-in interface. However, it hasn't really been tested in a large production environment for scalability and stability, so that's why I really want to get it released in some form or manner ASAP.

This is a call for help -- if you have a large installation and would like to help test some cutting edge code, please take gmetad-python out for a spin and report back any issues you may have. Hopefully we'll have it packaged up and released as 3.2.0 by early 2011 :-)

Thanks all!

Bernard
Re: [Ganglia-developers] Question about Ganglia
Another wacky way to "solve" this is to layer Ganglias. I.e., write an independent script which polls the gmetad for its XML, projects the data as if, e.g., every LPAR is a host, and posts that to an independent gmond using gmetric or its modern equivalents. Then point a new gmetad instance at that gmond (or set thereof).

On your idea, Bernard -- I don't think it would necessarily require gmond changes. OTOH, I think it would require very interesting gmetad changes to do a good job. In particular, the ability to summarize by different aggregations seems like the scalability wall; it's already hard to get grid summaries updated (the only inter-thread update in the current code).

-- ReC
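The layering idea in the message above could be prototyped along these lines. Treat it as a rough Python sketch under several assumptions: the XML is a trimmed-down imitation of a gmetad feed, `serial_num` is the hardware-identifier metric from Michael's snippet, `plan_reposts` and the per-identifier `conf_dir` layout (one gmond config per physical system) are invented for illustration, and the gmetric options used (`--conf`, `--spoof`, `--name`, `--value`, `--type`) should be checked against the gmetric version actually installed. The sketch only builds the command lines; it does not run them.

```python
import xml.etree.ElementTree as ET

# Trimmed-down, made-up sample of what a gmetad XML dump looks like.
SAMPLE = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmetad">
<GRID NAME="demo">
<CLUSTER NAME="SAP">
<HOST NAME="lpar1" IP="10.0.0.1">
<METRIC NAME="serial_num" VAL="65A1B2C" TYPE="string"/>
<METRIC NAME="load_one" VAL="0.42" TYPE="float"/>
</HOST>
</CLUSTER>
<CLUSTER NAME="TSM">
<HOST NAME="lpar2" IP="10.0.0.2">
<METRIC NAME="serial_num" VAL="65A1B2C" TYPE="string"/>
<METRIC NAME="load_one" VAL="1.50" TYPE="float"/>
</HOST>
</CLUSTER>
</GRID>
</GANGLIA_XML>"""


def plan_reposts(gmetad_xml, id_metric="serial_num",
                 conf_dir="/etc/ganglia/physical"):
    """Return {hardware id: [gmetric argv, ...]} for re-posting host metrics."""
    root = ET.fromstring(gmetad_xml)
    plan = {}
    for host in root.iter("HOST"):
        metrics = list(host.iter("METRIC"))
        hw_id = next((m.get("VAL") for m in metrics
                      if m.get("NAME") == id_metric), None)
        if hw_id is None:
            continue  # host does not report the identifier; leave it out
        # Hypothetical layout: one gmond (and config file) per hardware id.
        conf = "%s/%s.conf" % (conf_dir, hw_id)
        spoof = "%s:%s" % (host.get("IP"), host.get("NAME"))
        for m in metrics:
            if m.get("NAME") == id_metric:
                continue
            plan.setdefault(hw_id, []).append([
                "gmetric", "--conf=" + conf, "--spoof=" + spoof,
                "--name=" + m.get("NAME"), "--value=" + m.get("VAL"),
                "--type=" + m.get("TYPE"),
            ])
    return plan


plan = plan_reposts(SAMPLE)
```

Each argument list in `plan` could then be handed to something like `subprocess.run`, and a second gmetad pointed at the per-system gmonds would see one "cluster" per physical machine.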
Re: [Ganglia-developers] Question about Ganglia
Hi Michael:

I don't think the current frontend code supports what you want without some major hacking. The frontend expects the user to view the "grid" as multiple clusters and the "cluster" as multiple hosts.

A common feature request is to have hosts arbitrarily clustered/grouped based on different criteria (e.g., function, geographical location, etc.).

Perhaps Vladimir can incorporate some of these ideas in his frontend re-write. But some fundamental changes may need to be made at the gmond level...

Hope this helps.

Cheers,

Bernard

On Tue, Nov 16, 2010 at 11:21 AM, Michael Perzl wrote:
> I have a question regarding the PHP web code of Ganglia:
>
> My setup looks like this:
>
> I have one Grid and several Clusters.
> Each Cluster has between 10-250 nodes, all running on AIX LPARs (but the
> question is independent of the OS).
> I have clustered on a logical level, i.e., not according to the different
> hardware systems but for instance, all SAP systems, all TSM systems etc.
> I have lots of additional metrics, provided via a C-DSO.
> One of the additional metrics is a system-identifier, i.e., a hardware
> identifier that I want to use to identify the hardware that each LPAR is
> running on.
>
> Now in addition to the logical view I want to introduce a "physical view",
> i.e., a view of all LPARs running on a specific system with a certain
> hardware identifier. This is where my illiteracy regarding PHP comes into
> play.
>
> Basically, I want to have a list of ALL Ganglia nodes so that I can loop
> over and group them according to their hardware identifier.
>
> I was able to accomplish something in cluster_view.php with the following
> code snippet but have failed miserably so far in meta_view.php.
>
> $sysids = array();
> $sysid_count = 0;
> foreach ($metrics as $host => $val)
> {
>     if (isset( $val["serial_num"]['VAL'] ))
>     {
>         $id = $val["serial_num"]['VAL'];
>         if (! in_array( $id, $sysids ))
>         {
>             $sysids[$sysid_count] = $id;
>             $sysid_count++;
>         }
>     }
> }
> sort( $sysids );
>
> $cv = @fopen("/tmp/cv.txt","w");
> foreach ($hosts_up as $host => $val)
> {
>     fputs($cv,"$host\n");
> }
>
> fputs($cv,"=\n");
> fputs($cv,"sysid_count = $sysid_count\n");
> fputs($cv,"=\n");
>
> foreach ($sysids as $id)
> {
>     fputs($cv,"$id\n");
> }
> fputs($cv,"=\n");
>
> foreach ($sysids as $id)
> {
>     fputs($cv,"---> $id\n");
>     foreach ($hosts_up as $host => $h)
>     {
>         if ( $metrics[$host]["serial_num"]['VAL'] == $id )
>         {
>             fputs($cv,"$host\n");
>         }
>     }
> }
>
> fclose($cv);
>
> Any help/hints regarding how a loop over all Ganglia nodes can be
> accomplished in meta_view.php would be highly welcome. Thanks.
>
> Regards,
> Michael
Re: [Ganglia-developers] Question about Ganglia web front end
Federico Sacerdoti wrote:

> Since Matt is going to be indisposed for a while due to his new baby, I
> will take this one.

:O!!! Dang, more people I need to send shirts to this month. :P

> We are definitely planning to implement this idea, and I'm glad you see
> the need for it. Matt's idea, which I think is excellent, is to use XPath
> to allow subset queries of the XML tree. Each gmetad would understand
> XPath, and only return the portion of the tree that is pertinent. Yes,
> this does make things more complicated, we need an interactive protocol
> to the gmetad, but hey, we're designers.

I think a modification of HTTP oughta work, especially considering the more hierarchical nature of Ganglia 3's monitoring core. Heck, we'll probably have to rewrite the fricken' metadaemon anyway.

I haven't been able to devote many cycles to Ganglia lately, as other projects have taken over. I will continue to happily armchair-quarterback on these lists, at the very least, and develop in what spare time I have (which means a Darwin port, probably, so I can run it on my TiBook :P ).
Re: [Ganglia-developers] Question about Ganglia web front end
> On a (very) slightly less pie-in-the-sky note...
>
> Has anyone considered the utility of being able to select a subset of
> cluster/host/metric data from the metadaemon? In other words, you send a
> command that limits display to values that have been updated in the last
> 60 seconds, or a particular cluster or host name. Any speed hit from
> building the new response would probably be made up for in the front-end.
> If you're viewing host details for one host out of a 900-host combined
> metadaemon XML feed, and the front end only parses 30 metrics instead of
> 2700... well...
>
> It would mean rewriting the listen thread of gmetad (and, simultaneously,
> the front-end) to have a full-duplex conversation and there's some
> possible gotchas there... but what the heck, this is the developer's list.
>
> Anyway.
>
> Another great idea from the people who brought you beer milkshakes...

Since Matt is going to be indisposed for a while due to his new baby, I will take this one.

We are definitely planning to implement this idea, and I'm glad you see the need for it. Matt's idea, which I think is excellent, is to use XPath to allow subset queries of the XML tree. Each gmetad would understand XPath, and only return the portion of the tree that is pertinent. Yes, this does make things more complicated, we need an interactive protocol to the gmetad, but hey, we're designers.

Federico
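To make the subset-query idea above a bit more concrete, here is a small Python sketch of what such a query could return. It is not the planned gmetad protocol, just an illustration: it uses the limited XPath support in Python's `xml.etree.ElementTree`, the feed is a made-up miniature of the metadaemon XML, `subset` is a hypothetical helper, and the `TN` attribute is assumed to hold the number of seconds since the metric was last updated.

```python
import xml.etree.ElementTree as ET

# Made-up miniature of a combined metadaemon XML feed.
FEED = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmetad">
<GRID NAME="demo">
<CLUSTER NAME="SAP">
<HOST NAME="lpar1" IP="10.0.0.1">
<METRIC NAME="load_one" VAL="0.42" TYPE="float" TN="12"/>
<METRIC NAME="disk_free" VAL="17.5" TYPE="double" TN="300"/>
</HOST>
<HOST NAME="lpar2" IP="10.0.0.2">
<METRIC NAME="load_one" VAL="1.50" TYPE="float" TN="5"/>
</HOST>
</CLUSTER>
</GRID>
</GANGLIA_XML>"""


def subset(xml_text, cluster, host, max_age=None):
    """Return {metric name: value} for one host, optionally only fresh metrics."""
    root = ET.fromstring(xml_text)
    # Names containing a single quote would need escaping; ignored in this sketch.
    path = ".//CLUSTER[@NAME='%s']/HOST[@NAME='%s']/METRIC" % (cluster, host)
    result = {}
    for metric in root.findall(path):
        # Assumption: TN is the age of the value in seconds.
        if max_age is not None and int(metric.get("TN", "0")) > max_age:
            continue
        result[metric.get("NAME")] = metric.get("VAL")
    return result


all_metrics = subset(FEED, "SAP", "lpar1")
fresh_metrics = subset(FEED, "SAP", "lpar1", max_age=60)
```

In this sketch the filtering still happens after the whole feed has been parsed; the saving described in the thread would only appear once the metadaemon itself applies the query and sends back the smaller tree.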
Re: [Ganglia-developers] Question about Ganglia web front end
It's also worth noting that there's no particular reason to avoid developing other front-ends to Ganglia. If you'd rather build one your way using a particular technology or library that better suits your needs, the existing architecture makes it pretty easy for you to do so. Almost as if it was designed that way. ;)

I am not exactly a charter member of the PHP fanclub myself. We use Python a lot here internally, but I don't know how I'd feel about using mod_* for something that could be processing a very large number of nodes.

But if we're going to talk about what language/platform combo would be best for a Ganglia front-end app, in purely abstract terms (i.e. someone else writes it :P ), I think I'd want to see the whole thing implemented as a Java servlet. There'd be all kinds of speed benefits just from the decreased overhead for each request, not to mention decoupling the XML/RRD parsing and page rendering. In fact, it might even make sense to write the metadaemon as child threads of the front-end (parsing the XML into an internal data structure, and updating the RRD files every time the values change - this way you're not parsing all the XML every time you hit Reload...).

On a (very) slightly less pie-in-the-sky note...

Has anyone considered the utility of being able to select a subset of cluster/host/metric data from the metadaemon? In other words, you send a command that limits display to values that have been updated in the last 60 seconds, or a particular cluster or host name. Any speed hit from building the new response would probably be made up for in the front-end. If you're viewing host details for one host out of a 900-host combined metadaemon XML feed, and the front end only parses 30 metrics instead of 2700... well...

It would mean rewriting the listen thread of gmetad (and, simultaneously, the front-end) to have a full-duplex conversation and there's some possible gotchas there... but what the heck, this is the developer's list.

Anyway.

Another great idea from the people who brought you beer milkshakes...
Re: [Ganglia-developers] Question about Ganglia web front end
Well there are a few reasons. I know only a cursory bit about xslt, however, so let me know if I'm off base on any of these.

We chose PHP over XSLT because:

- PHP is faster, and more mature.
- Can handle CGI variables which keep state between different HTML views.
- Can read form data given by user.
- Can read in user-defined local configuration files from disk (private_clusters, etc).
- Can call functions using local shell (for rrdtool graph, for example).

On the other hand, I think PHP is a cumbersome language. I have talked about using mod_python, but the fact is, I would have to see a really good reason for doing so, as it would take a lot of work, and perhaps end up being slower.

Hope this helps answer your question.

Federico

On Thursday, December 5, 2002, at 11:53 AM, [EMAIL PROTECTED] wrote:

> The Ganglia web frontend uses PHP to transform xml to html (I think).
> Why was that method chosen instead of using PHP to make calls to xslt
> scripts to do the transformation? Is there a belief that PHP is better
> than xsl for coding xml to html transformations? Does the Ganglia web
> front end include transformations that aren't easily expressed in xsl?
>
> Jonathan

Federico
Rocks Cluster Group, Camp X-Ray, SDSC, San Diego
GPG Fingerprint: 3C5E 47E7 BDF8 C14E ED92 92BB BA86 B2E6 0390 8845