Chris Croswhite wrote:

> Alex,
>
> Yeah, I already have a ton of questions and need some pointers on
> large-scale deploys (best practices, do's, don'ts, etc.).
>

Till I get the legal issues out of the way, I can't share the scripts...
What I can do, however, is share the ideas I've implemented, as those
were developed outside the customer environment and were just spin-offs
of common concepts like orchestration, federation, etc.
Here are a few things:

    * When unicasting, a tree hierarchy of nodes can provide useful
      drill-down capabilities (see the gmond.conf sketch after this
      list).
    * Most organizations already have some form of logical grouping for
      cluster nodes. For example: faculty, course, devel-group, etc.
      Within those groups one might find additional logical
      partitioning. For example: platform, project, developer, etc.
      Using these as the basis for constructing your logical hierarchy
      provides a real-world basis for information analysis, saves you
      the trouble of deciding how to construct your grid tree, and
      prevents gmond aggregators from handling too many nodes (though
      I've found that a single gmond can store information for 1k nodes
      without noticeable impact on performance).
    * Nodes will sometimes move between logical clusters. Hence,
      whatever mechanism you have in place has to detect this and
      regenerate its gmond.conf.
    * Using a central map which stores "cluster_name gmond_aggregator
      gmetad_aggregator" will save you the headache of figuring out who
      reports where, who pulls info from where, etc. If you take this
      approach, be sure to cache this file locally or put it on your
      parallel FS (if you use one). You wouldn't want 10k hosts trying
      to retrieve it from a single filer. A parsing sketch follows this
      list.
    * The same map file approach can be used for gmetrics. This allows
      anyone in your IT group to add custom metrics without having to be
      familiar with gmetric and without having to handle crontabs. A
      mechanism which reads (the cached version of) this file could
      handle inserting/removing crontabs as needed; see the crontab
      sketch after this list.
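
To make the unicast tree concrete, here's a minimal gmond.conf sketch
in Ganglia 3.x syntax. The cluster name, host, and port below are
placeholders I made up for illustration:

    /* Leaf node: send metrics upstream to the cluster's aggregator. */
    cluster {
      name = "devel-group"
    }
    udp_send_channel {
      host = agg01.example.com
      port = 8649
    }

    /* Aggregator node: additionally listen for the leaves' UDP traffic
       and answer TCP polls from gmetad. */
    udp_recv_channel {
      port = 8649
    }
    tcp_accept_channel {
      port = 8649
    }

gmetad then polls each aggregator, and a gmetad can itself be polled by
a higher-level gmetad, which is what gives you the drill-down levels in
the frontend.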
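Since I can't post the real scripts yet, here's a rough bash sketch of
the central map idea. The map path, the per-node cluster tag, and the
template placeholders are all invented for illustration:

    #!/bin/bash
    # Resolve this node's aggregators from the locally cached map.
    # Map format, one line per cluster:
    #   cluster_name  gmond_aggregator  gmetad_aggregator
    MAP=/var/cache/ganglia/cluster.map

    # Assume each node carries a tag naming its logical cluster.
    CLUSTER=$(cat /etc/ganglia/cluster.tag)

    line=$(awk -v c="$CLUSTER" '$1 == c { print; exit }' "$MAP")
    [ -n "$line" ] || { echo "no map entry for $CLUSTER" >&2; exit 1; }
    set -- $line
    GMOND_AGG=$2

    # Regenerate gmond.conf only when the aggregator changed; this is
    # also what re-points a node that migrated to another cluster.
    if ! grep -q "host = $GMOND_AGG" /etc/gmond.conf 2>/dev/null; then
        sed -e "s/@CLUSTER@/$CLUSTER/" -e "s/@AGGREGATOR@/$GMOND_AGG/" \
            /etc/gmond.conf.template > /etc/gmond.conf
        /etc/init.d/gmond restart
    fi

On the gmetad side, the same map can generate the data_source lines:

    awk '{ printf "data_source \"%s\" %s\n", $1, $2 }' \
        /var/cache/ganglia/cluster.map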
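And a sketch of the gmetric map. The tab-separated layout here
(schedule, metric name, type, command) is just one way to do it, not
the format I actually use:

    #!/bin/bash
    # Rebuild the gmetric section of the crontab from the cached map.
    # Lines we own are tagged so the next run can replace them.
    GMETRIC_MAP=/var/cache/ganglia/gmetric.map
    TAG='# ganglia-gmetric'

    {
        # Keep every existing crontab line that isn't ours.
        crontab -l 2>/dev/null | grep -v "$TAG\$"

        # Emit one gmetric invocation per map entry, e.g. the map line
        #   */5 * * * *<TAB>scratch_free<TAB>float<TAB>df -P /scratch | awk 'NR==2 {print $4}'
        # becomes a cron job publishing a "scratch_free" metric.
        # (Avoid % in commands; it's special in crontab lines.)
        while IFS=$'\t' read -r sched name type cmd; do
            case $sched in ''|\#*) continue ;; esac
            echo "$sched gmetric --name=$name --type=$type --value=\"\$($cmd)\" $TAG"
        done < "$GMETRIC_MAP"
    } | crontab -

Anyone in your IT group who can edit the map gets a custom metric;
deleting the line drops the cron job on the next run.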

Also, check out the ganglia-general thread from January 2006 called
"Pointers on architecting a large scale ganglia setup".


> I would love to get my hands on your shell scripts to figure out what
> you are doing (the unicast idea is pretty good).
>

Okay, that sounds almost obscene ;-)


Cheers,
Alex

> Chris
>
>
> On Thu, 2006-02-23 at 09:35, Alex Balk wrote:
>> Chris,
>>
>>
>> Cool! Thanks!
>>
>> If you need any pointers on large-scale deployments, beyond the
>> excellent thread that was discussed here last month, drop us a line. I'm
>> managing Ganglia on a cluster of about the same size as yours, spanning
>> multiple sites.
>>
>>
>> I've developed a framework for automating the deployment of Ganglia in a
>> federated mode (we use unicast). I'm currently negotiating the
>> possibility of releasing this framework to the Ganglia community. It's
>> not the prettiest piece of code, as it's written in bash and spans a few
>> thousand lines of code (I didn't expect it to grow into something like
>> that), but it provides some nice functionality like map-based logical
>> clusters, automatic node migration between clusters, map-based gmetrics,
>> and some other goodies.
>>
>> If negotiations fail I'll consider rewriting it from scratch in Perl in
>> my own free time.
>>
>>
>> btw, I think Martin was looking for a build on HP-UX 11...
>>
>>
>> Cheers,
>>
>> Alex
>>
>>
>> Chris Croswhite wrote:
>>
>>>> This raises another issue, which I believe is significant to the
>>>> development process of Ganglia. At the moment we don't seem to have
>>>> (correct me if I'm wrong) official testers for various platforms.
>>>> Maybe we could have some people volunteer to be official beta testers?
>>>> That way we wouldn't push a release out the door without properly
>>>> testing it under most OS/archs.
>>> The company I work for is looking to deploy ganglia across all compute
>>> farms, some ~10k systems.  I could help with beta testing on these
>>> platforms:
>>> HP-UX 11+11i
>>> AIX51+53
>>> slowlaris7-10
>>> solaris10 x64
>>> linux32/64 (SuSE and RH)
>>>
>>> Just let me know when you have a new candidate and I can push the client
>>> onto some test systems.
>>>
>>> Chris
>>>
