Ole,

Thanks for this level of detail.  I think I share your concerns about
loading scripts on the hosts.  My structure is slightly more complex, but
close enough that it is good to hear other people are having success with
this type of deployment.

If you are ever allowed, I would welcome a look at your db+web structure.
That is the part I have the most concerns about, i.e. performance, amount
of data, etc.

Thanks again for the info, and I hope to share my own deployment scheme in
the future (once I get there!).

Chris

On Fri, 2006-02-24 at 11:41, Ole Turvoll wrote:
> Chris, all,
> 
> I'm in the same legal position as Alex (in addition, I'm not allowed to use 
> my work email address and have to rely on my ISP's email service, which is 
> only up intermittently - and webmail is blocked while I'm at work).
> 
> However I'd like to share my experience.
> 
> Our hierarchy (by geography) is as follows:
> 
> Global gmetad
>       |
>       |
> Regional gmetad (collecting every 60 seconds)
>       |
>       |
> send_receive gmonds (from 10 to ~400 nodes)
>       |
>       |
> nodes (~10k) sending udp unicast
> 
> 
> I agree with Alex's notions, though we do not use the gmetric functionality, 
> for various reasons - mainly the impact of loading scripts and the 
> manageability of that kind of solution.
> 
> What we've implemented is a global, web-based gmond configuration engine: 
> an Oracle database with a web front end which controls our Ganglia 
> deployment.
> 
> A walk-through of the functionality:
> 
> On the nodes (gmond package)
> With each gmond package I include a perl script which sends variables taken 
> from the local host (fqdn, interface) via HTTP to the web server.  
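> 
> A minimal sketch of what such a node-side script might look like (the URL, 
> parameter names and interface handling here are illustrative assumptions, 
> not our actual implementation):
> 
>   #!/usr/bin/perl
>   # Hypothetical node-side registration script: report fqdn and interface
>   # to the config web server and save the gmond.conf it hands back.
>   use strict;
>   use warnings;
>   use Sys::Hostname;
>   use LWP::UserAgent;
> 
>   my $fqdn      = hostname();   # assuming hostname() yields the fqdn here
>   my $interface = 'eth0';       # placeholder; the real script detects this
>   my $cfg_url   = 'http://ganglia-config.example.com/gmond_conf.php';
> 
>   my $ua  = LWP::UserAgent->new(timeout => 30);  # timeout so a node never hangs
>   my $res = $ua->post($cfg_url, { fqdn => $fqdn, interface => $interface });
>   die 'config fetch failed: ' . $res->status_line . "\n" unless $res->is_success;
> 
>   open my $fh, '>', '/etc/gmond.conf' or die "cannot write gmond.conf: $!\n";
>   print {$fh} $res->content;
>   close $fh;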
> 
> On the server (gmetad package)
> The PHP CGI script sitting on the web server returns to the node a 
> gmond.conf which specifies which gmonds it will report to.
> 
> Depending on the fqdn it receives, the PHP CGI script will either (1) enter 
> the host into a default gmond (updating the database) or (2) send back a 
> configuration file listing the predesigned gmonds (taken from the database) 
> that the host will report to.
> 
> Finally, on the node end, gmond is started (as the last phase of the package 
> install) with the newly acquired gmond.conf.
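> 
> For illustration, the relevant fragment of the configuration handed back in 
> case (2) might look roughly like this (assuming a Ganglia 3.x style 
> gmond.conf and udp unicast; the names and port are placeholders):
> 
>   cluster {
>     name = "regional-cluster-a"        /* cluster this host belongs to */
>   }
>   udp_send_channel {                   /* first predesigned send_receive gmond */
>     host = "gmond-agg1.example.com"
>     port = 8649
>   }
>   udp_send_channel {                   /* second aggregator, for redundancy */
>     host = "gmond-agg2.example.com"
>     port = 8649
>   }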
> 
> Some other points about the architecture:
> - It uses the TemplatePower engine.
> - A cron job checks that the gmond.conf is up to date every day at 12 pm 
> local time (there is no DoS risk since we've included a timeout on the node 
> side).
> - The database tables are very simple.
> - Anyone can bulk update the database through a simple DBD perl script (see 
> the sketch after this list).
> - A web front end for the database tables allows us to easily view the 
> send_receive gmonds and the send gmonds, which enables us to understand and 
> manage our environment with very low overhead.
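> 
> As an example, the kind of DBD perl bulk update script mentioned above might 
> look something like this (the table and column names are invented for 
> illustration; the real schema is different):
> 
>   #!/usr/bin/perl
>   # Hypothetical bulk update: read "fqdn aggregator" pairs from stdin and
>   # apply them to the config database in one transaction.
>   use strict;
>   use warnings;
>   use DBI;
> 
>   my $dbh = DBI->connect('dbi:Oracle:GANGLIA', 'ganglia', 'secret',
>                          { AutoCommit => 0, RaiseError => 1 });
>   my $sth = $dbh->prepare(
>       'UPDATE node_config SET send_gmond = ? WHERE fqdn = ?');
> 
>   while (<STDIN>) {
>       chomp;
>       next if /^\s*(#|$)/;              # skip comments and blank lines
>       my ($fqdn, $aggregator) = split;
>       $sth->execute($aggregator, $fqdn);
>   }
> 
>   $dbh->commit;
>   $dbh->disconnect;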
>  
> That's all I can think of for now...  Any questions/queries are welcome.
> 
> Unfortunately I'm in the same position as Alex: I would like to share this, 
> but I'm not sure whether I can at this time.
> 
> Thanks,
> 
> Ole
> 
> 
> Chris Croswhite wrote:
> 
> >Alex,
> >
> >Thanks for the great information.  I'll check out the Jan email thread
> >and then follow up with more questions.
> >
> >BTW, the script statement did come across rather badly, sorry about that
> >(and after all that PC training I was required to take!)
> >
> >Thanks
> >Chris
> >
> >On Thu, 2006-02-23 at 10:36, Alex Balk wrote:
> >  
> >
> >>Chris Croswhite wrote:
> >>
> >>>Alex,
> >>>
> >>>Yeah, I already have a ton of questions and need some pointers on
> >>>large-scale deployments (best practices, do's, don'ts, etc.).
> >>>
> >>Until I get the legal issues out of the way, I can't share the scripts...
> >>What I can do, however, is share the ideas I've implemented, as those
> >>were developed outside the customer environment and were just spin-offs
> >>of common concepts like orchestration, federation, etc.
> >>Here are a few things:
> >>
> >>    * When unicasting, a tree hierarchy of nodes can provide useful
> >>      drill-down capabilities.
> >>    * Most organizations already have some form of logical grouping for
> >>      cluster nodes. For example: faculty, course, devel-group, etc.
> >>      Within those groups one might find additional logical
> >>      partitioning. For example: platform, project, developer, etc.
> >>      Using these as the basis for constructing your logical hierarchy
> >>      provides a real-world basis for information analysis, saves you the
> >>      trouble of deciding how to construct your grid tree, and prevents
> >>      gmond aggregators from handling too many nodes (though I've found
> >>      that a single gmond can store information for 1k nodes without
> >>      noticeable impact on performance).
> >>    * Nodes will sometimes move between logical clusters. Hence,
> >>      whatever mechanism you have in place has to detect this and
> >>      regenerate its gmond.conf.
> >>    * Using a central map which stores "cluster_name gmond_aggregator
> >>      gmetad_aggregator" will save you the headache of figuring out who
> >>      reports where, who pulls info from where, etc. If you take this
> >>      approach be sure to cache this file locally or put it on your
> >>      parallel FS (if you use one). You wouldn't want 10k hosts trying
> >>      to retrieve it from a single filer (see the sketch after this list).
> >>    * The same map file approach can be used for gmetrics. This allows
> >>      anyone in your IT group to add custom metrics without having to be
> >>      familiar with gmetric and without having to handle crontabs. A
> >>      mechanism which reads (the cached version of) this file could
> >>      handle inserting/removing crontabs as needed.
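> >>
> >>To make the map idea a bit more concrete, here's a minimal sketch of a
> >>node looking itself up in a locally cached copy of such a map (the cache
> >>path and the lookup-by-cluster-name are assumptions for illustration, not
> >>how my framework actually does it):
> >>
> >>    #!/usr/bin/perl
> >>    # Look up a cluster in a cached map whose lines have the form:
> >>    #   cluster_name gmond_aggregator gmetad_aggregator
> >>    use strict;
> >>    use warnings;
> >>
> >>    my $map_file = '/var/cache/ganglia/cluster.map';  # local cache, not the filer
> >>    my $cluster  = shift @ARGV or die "usage: $0 cluster_name\n";
> >>
> >>    open my $fh, '<', $map_file or die "cannot read $map_file: $!\n";
> >>    while (<$fh>) {
> >>        next if /^\s*(#|$)/;                # skip comments and blank lines
> >>        my ($name, $gmond_agg, $gmetad_agg) = split;
> >>        next unless $name eq $cluster;
> >>        print "report to gmond $gmond_agg; polled by gmetad $gmetad_agg\n";
> >>        last;
> >>    }
> >>    close $fh;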
> >>
> >>Also, check out the ganglia-general thread from January 2006 called
> >>"Pointers on architecting a large scale ganglia setup".
> >>
> >>>I would love to get my hands on your shell scripts to figure out what
> >>>you are doing (the unicast idea is pretty good).
> >>
> >>Okay, that sounds almost obscene ;-)
> >>
> >>
> >>Cheers,
> >>Alex
> >>
> >>>Chris
> >>>
> >>>
> >>>On Thu, 2006-02-23 at 09:35, Alex Balk wrote:
> >>>
> >>>>Chris,
> >>>>
> >>>>
> >>>>Cool! Thanks!
> >>>>
> >>>>If you need any pointers on large-scale deployments, beyond the
> >>>>excellent thread that was discussed here last month, drop us a line. I'm
> >>>>managing Ganglia on a cluster of about the same size as yours, spanning
> >>>>multiple sites.
> >>>>
> >>>>
> >>>>I've developed a framework for automating the deployment of Ganglia in a
> >>>>federated mode (we use unicast). I'm currently negotiating the
> >>>>possibility of releasing this framework to the Ganglia community. It's
> >>>>not the prettiest piece of code, as it's written in bash and spans a few
> >>>>thousand lines of code (I didn't expect it to grow into something like
> >>>>that), but it provides some nice functionality like map-based logical
> >>>>clusters, automatic node migration between clusters, map-based gmetrics,
> >>>>and some other goodies.
> >>>>
> >>>>If negotiations fail, I'll consider rewriting it from scratch in perl in
> >>>>my own free time.
> >>>>
> >>>>
> >>>>btw, I think Martin was looking for a build on HP-UX 11...
> >>>>
> >>>>
> >>>>Cheers,
> >>>>
> >>>>Alex
> >>>>
> >>>>
> >>>>Chris Croswhite wrote:
> >>>>
> >>>>>>This raises another issue, which I believe is significant to the
> >>>>>>development process of Ganglia. At the moment we don't seem to have
> >>>>>>(correct me if I'm wrong) official testers for various platforms.
> >>>>>>Maybe we could have some people volunteer to be official beta testers?
> >>>>>>That way we wouldn't have to push a release out the door without
> >>>>>>properly testing it under most OS/archs.
> >>>>>
> >>>>>The company I work for is looking to deploy ganglia across all compute
> >>>>>farms, some ~10k systems.  I could help with beta testing on these
> >>>>>platforms:
> >>>>>HP-UX 11+11i
> >>>>>AIX51+53
> >>>>>slowlaris7-10
> >>>>>solaris10 x64
> >>>>>linux32/64 (SuSE and RH)
> >>>>>
> >>>>>Just let me know when you have a new candidate and I can push the client
> >>>>>onto some test systems.
> >>>>>
> >>>>>Chris