Ole,

Thanks for this level of detail. I share your concerns about loading scripts
on the hosts. My structure is slightly more complex than yours but close
enough, so it is good to hear that other people are having success with this
type of deployment.

If you are ever allowed, I would welcome a look at your db+web structure.
This is the part I have the most concerns about, i.e. performance, amount of
data, etc.

Again, thanks for the info, and I hope to share my deployment scheme in the
future (once I get there!).

Chris

On Fri, 2006-02-24 at 11:41, Ole Turvoll wrote:
> Chris, all,
>
> I'm in the same legal position as Alex (in addition, I'm not allowed to use
> my work email address and rely on my ISP email service, which is only up
> intermittently - but webmail is blocked while I'm at work).
>
> However I'd like to share my experience.
>
> Our hierarchy (by geography) is as follows:
>
>     Global gmetad
>           |
>           |
>     Regional gmetad (collecting every 60 seconds)
>           |
>           |
>     send_receive gmonds (from 10 ~ 400 nodes)
>           |
>           |
>     nodes (~10k) sending udp unicast
>
>
> I agree with Alex's notions though we do not use gmetrics functionality, for
> various reasons, mainly around the impact of loading scripts and the
> manageability around using this solution.
>
> What we've implemented is a global gmond web server configuration engine.
> The architecture of this is an Oracle database with a web front end which
> controls our ganglia architecture.
>
> A walk through of functionality:
>
> On the nodes (gmond package)
> With each gmond package I include a perl script which sends variables taken
> from the local host (fqdn, interface) via HTTP to the web server.
>
> On the server (gmetad package)
> The PHP CGI script sitting on the web server will return to the node a
> gmond.conf that specifies which gmonds the node will report to.
>
> Depending on the fqdn the PHP CGI script gets, it will either (1) enter the
> host into a default gmond (updating the database) or (2) send its
> configuration file back with the predesigned gmonds (taken from the database)
> it will report to.
>
> Finally on the node end it starts the gmond (last phase of a package install)
> with the newly acquired gmond.conf.
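
To make the walk-through above concrete, here is a minimal sketch of the
server-side decision only. It is not Ole's PHP script, which has not been
shared. A plain dict stands in for the Oracle database, the host names and
the default aggregator are hypothetical, and the output is limited to
udp_send_channel blocks in the gmond 3.x configuration style.

```python
from typing import Dict, List

# Hypothetical default aggregator for hosts the database has never seen.
DEFAULT_AGGREGATORS = ["gmond-default.example.com"]


def lookup_aggregators(db: Dict[str, List[str]], fqdn: str) -> List[str]:
    """Return the gmonds a node reports to, registering unknown nodes."""
    if fqdn not in db:
        # Case (1): unknown host, enter it under the default gmond.
        db[fqdn] = list(DEFAULT_AGGREGATORS)
    # Case (2): known host, use the predesigned gmonds from the database.
    return db[fqdn]


def render_gmond_conf(aggregators: List[str], port: int = 8649) -> str:
    """Render one unicast udp_send_channel block per aggregator."""
    blocks = []
    for host in aggregators:
        blocks.append(
            "udp_send_channel {\n"
            f"  host = {host}\n"
            f"  port = {port}\n"
            "}\n"
        )
    return "\n".join(blocks)


def handle_request(db: Dict[str, List[str]], fqdn: str) -> str:
    """What the web server would send back for a node's fqdn."""
    return render_gmond_conf(lookup_aggregators(db, fqdn))
```

A node-side script would fetch this text over HTTP, write it out as
gmond.conf, and then start gmond. A real deployment would also need the
cluster and receive-channel sections, which this sketch leaves out.
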
>
> Some other points about the architecture.
> - It uses the TemplatePower engine.
> - A cronjob checks that the gmond.conf is up to date every day at 12 pm local
>   time (there is no DoS since we've included a timeout on the node side)
> - The database tables are very simple.
> - Anyone can bulk update the database through a simple DBD perl script
> - A web front end for the database table allows us to easily view the
>   send_receive gmonds and the send gmonds, which enables us to understand and
>   manage our environment with very low overhead.
>
> That's all I can think of for now.... Any questions/queries are welcome.
>
> Unfortunately I'm in the same position as Alex, would like to share this but
> am not aware if I can at this time.
>
> Thanks,
>
> Ole
>
>
> Chris Croswhite wrote:
>
> >Alex,
> >
> >Thanks for the great information. I'll check out the Jan email thread
> >and then follow up with more questions.
> >
> >BTW, the script statement did come across rather badly, sorry about that
> >(and after all that PC training I was required to take!)
> >
> >Thanks
> >Chris
> >
> >On Thu, 2006-02-23 at 10:36, Alex Balk wrote:
> >
> >>Chris Croswhite wrote:
> >>
> >>>Alex,
> >>>
> >>>Yeah, I already have a ton of questions and need some pointers in large
> >>>scale deploys (best practices, do's, don'ts, etc.).
> >>>
> >>Till I get the legal issues out of the way, I can't share the scripts...
> >>What I can do, however, is share the ideas I've implemented, as those
> >>were developed outside the customer environment and were just spin-offs
> >>of common concepts like orchestration, federation, etc.
> >>Here's a few things:
> >>
> >>    * When unicasting, a tree hierarchy of nodes could provide useful
> >>      drill-down capabilities.
> >>    * Most organizations already have some form of logical grouping for
> >>      cluster nodes. For example: faculty, course, devel-group, etc.
> >>      Within those groups one might find additional logical
> >>      partitioning. For example: platform, project, developer, etc.
> >>      Using these as the basis for constructing your logical hierarchy
> >>      provides a real-world basis for information analysis, saves you the
> >>      trouble of deciding how to construct your grid tree and prevents
> >>      gmond aggregators from handling too many nodes (though I've found
> >>      that a single gmond can store information for 1k nodes without
> >>      noticeable impact on performance).
> >>    * Nodes will sometimes move between logical clusters. Hence,
> >>      whatever mechanism you have in place has to detect this and
> >>      regenerate its gmond.conf.
> >>    * Using a central map which stores "cluster_name gmond_aggregator
> >>      gmetad_aggregator" will save you the headache of figuring out who
> >>      reports where, who pulls info from where, etc. If you take this
> >>      approach be sure to cache this file locally or put it on your
> >>      parallel FS (if you use one). You wouldn't want 10k hosts trying
> >>      to retrieve it from a single filer.
> >>    * The same map file approach can be used for gmetrics. This allows
> >>      anyone in your IT group to add custom metrics without having to be
> >>      familiar with gmetric and without having to handle crontabs. A
> >>      mechanism which reads (the cached version of) this file could
> >>      handle inserting/removing crontabs as needed.
> >>
> >>Also, check out the ganglia-general thread from January 2006 called
> >>"Pointers on architecting a large scale ganglia setup".
> >>
> >>>I would love to get my hands on your shell scripts to figure out what
> >>>you are doing (the unicast idea is pretty good).
> >>>
> >>Okay, that sounds almost obscene ;-)
> >>
> >>
> >>Cheers,
> >>Alex
> >>
> >>>Chris
> >>>
> >>>
> >>>On Thu, 2006-02-23 at 09:35, Alex Balk wrote:
> >>>
> >>>>Chris,
> >>>>
> >>>>
> >>>>Cool! Thanks!
> >>>>
> >>>>If you need any pointers on large-scale deployments, beyond the
> >>>>excellent thread that was discussed here last month, drop us a line. I'm
> >>>>managing Ganglia on a cluster of about the same size as yours, spanning
> >>>>multiple sites.
> >>>>
> >>>>
> >>>>I've developed a framework for automating the deployment of Ganglia in a
> >>>>federated mode (we use unicast). I'm currently negotiating the
> >>>>possibility of releasing this framework to the Ganglia community. It's
> >>>>not the prettiest piece of code, as it's written in bash and spans a few
> >>>>thousand lines of code (I didn't expect it to grow into something like
> >>>>that), but it provides some nice functionality like map-based logical
> >>>>clusters, automatic node migration between clusters, map-based gmetrics,
> >>>>and some other candies.
> >>>>
> >>>>If negotiations fail I'll consider rewriting it from scratch in perl in
> >>>>my own free time.
> >>>>
> >>>>
> >>>>btw, I think Martin was looking for a build on HP-UX 11...
> >>>>
> >>>>
> >>>>Cheers,
> >>>>
> >>>>Alex
> >>>>
> >>>>
> >>>>Chris Croswhite wrote:
> >>>>
> >>>>>>This raises another issue, which I believe is significant to the
> >>>>>>development process of Ganglia. At the moment we don't seem to have
> >>>>>>(correct me if I'm wrong) official testers for various platforms.
> >>>>>>Maybe we could have some people volunteer to be official beta testers?
> >>>>>>We wouldn't have to have a release out the door without properly
> >>>>>>testing it under most OS/archs.
> >>>>>>
> >>>>>The company I work for is looking to deploy ganglia across all compute
> >>>>>farms, some ~10k systems. I could help with beta testing on these
> >>>>>platforms:
> >>>>>HP-UX 11+11i
> >>>>>AIX51+53
> >>>>>slowlaris7-10
> >>>>>solaris10 x64
> >>>>>linux32/64 (SuSE and RH)
> >>>>>
> >>>>>Just let me know when you have a new candidate and I can push the client
> >>>>>onto some test systems.
> >>>>>
> >>>>>Chris
> >
> >_______________________________________________
> >Ganglia-developers mailing list
> >Ganglia-developers@lists.sourceforge.net
> >https://lists.sourceforge.net/lists/listinfo/ganglia-developers
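
For readers following Alex's central-map idea quoted above, here is a minimal
sketch of how such a file might be read and used to detect that a node has
moved between clusters. It is not Alex's bash framework, which has not been
released. The one-entry-per-line layout follows the "cluster_name
gmond_aggregator gmetad_aggregator" description; the `#` comment syntax, the
function names, and the host names in the usage lines are assumptions made
for illustration.

```python
from typing import Dict, NamedTuple, Optional


class MapEntry(NamedTuple):
    gmond_aggregator: str
    gmetad_aggregator: str


def parse_map(text: str) -> Dict[str, MapEntry]:
    """Parse lines of 'cluster_name gmond_aggregator gmetad_aggregator'."""
    entries: Dict[str, MapEntry] = {}
    for raw in text.splitlines():
        # Assumption: '#' starts a comment; blank lines are ignored.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got: {raw!r}")
        cluster, gmond, gmetad = fields
        entries[cluster] = MapEntry(gmond, gmetad)
    return entries


def needs_regeneration(
    current_target: Optional[str],
    cluster: str,
    cluster_map: Dict[str, MapEntry],
) -> bool:
    """True when the node's gmond.conf points at the wrong aggregator."""
    return cluster_map[cluster].gmond_aggregator != current_target


# Usage with a locally cached copy of the map (contents are made up).
cached = """
# cluster  gmond_aggregator   gmetad_aggregator
devel      agg1.example.com   meta1.example.com
qa         agg2.example.com   meta1.example.com
"""
cluster_map = parse_map(cached)
print(needs_regeneration("agg1.example.com", "qa", cluster_map))
```

Reading from a cached copy, as Alex recommends, keeps 10k hosts from hitting
a single filer; how the cache is refreshed is left open here.
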