On 9/12/14 8:25 AM, daniel.j.marr...@us.hsbc.com wrote:
Greetings all and apologies if I have directed this to the incorrect list but I thought it might be best to begin with those closest to the source, so to speak.

I work for a fairly large international bank and we are currently evaluating options for collecting and visualizing performance-related statistics for the entirety of our UNIX/Linux estate (with the possibility of including Windows at some point). Naturally I took to the internet and came across Ganglia as one of the (widely used) possible options. I then spent some time looking through reports of issues, etc. and have some questions/concerns regarding how best to organize my infrastructure should I decide to recommend Ganglia as the solution.

In preparation I thought it best to do some discovery around the size of our estate and any details our end users (system administrators, performance engineers, etc.) would say needed to be included in the metric set. To that end I would say that we have approximately 26K servers today and, given rough extrapolation, could easily wind up in the neighborhood of 4.5M total metrics within the total system. Our expectation is to extend the base set of metrics to include any number of middleware-related measurements, which is the primary reason for the significant number of metrics. We will also be using unicast, unless of course a compelling enough case can be made for the alternative.
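As a rough back-of-envelope sketch, the figures above can be turned into a per-host metric count and an aggregate update rate. The function names are hypothetical, and the calculation assumes every metric is collected at one uniform interval, which will not hold in a real deployment; the 15-second and 60-second intervals are illustrative inputs only.

```python
# Back-of-envelope sizing from the figures quoted above.
# Assumption: all metrics share one collection interval (not true in practice).

SERVERS = 26_000           # approximate server count from the post
TOTAL_METRICS = 4_500_000  # extrapolated total metric count from the post


def metrics_per_host(total_metrics: int, servers: int) -> float:
    """Average number of metrics each host would report."""
    return total_metrics / servers


def updates_per_second(total_metrics: int, interval_s: int) -> float:
    """Aggregate metric updates per second at a uniform interval."""
    return total_metrics / interval_s


if __name__ == "__main__":
    print(f"metrics per host: {metrics_per_host(TOTAL_METRICS, SERVERS):.0f}")
    for interval in (15, 60):  # hypothetical intervals, in seconds
        rate = updates_per_second(TOTAL_METRICS, interval)
        print(f"{interval}s interval: {rate:,.0f} updates/s across the estate")
```

That works out to roughly 173 metrics per host on average, and the aggregate update rate is the number that any subdivision into grids and clusters has to spread out.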

My initial instinct is to subdivide the Ganglia infrastructure by major data center, with each one represented by a single grid. I imagine I would need 6-12 clusters (possibly more) per grid and will definitely be looking to use rrdcached. I do not know if that will be enough segregation to allow gmetad to perform as required. Several of my larger (more influential) end users have indicated a need for some fairly tight resolutions (15s for 4hrs for a number of high value metrics).
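To make the "15s for 4hrs" requirement concrete, here is a minimal sketch of how it maps onto a round-robin archive definition in RRDtool's RRA:CF:xff:steps:rows format. The helper names are hypothetical, and it assumes a 15-second base step, the AVERAGE consolidation function and an xff of 0.5; none of those are stated above.

```python
# Sketch: translate "15s resolution kept for 4 hours" into an RRDtool RRA spec.
# Assumptions: 15-second base step, AVERAGE consolidation, xff of 0.5.


def rra_rows(resolution_s: int, retention_s: int) -> int:
    """Number of rows needed to keep `retention_s` of data at `resolution_s`."""
    if retention_s % resolution_s:
        raise ValueError("retention must be a multiple of the resolution")
    return retention_s // resolution_s


def rra_spec(step_s: int, resolution_s: int, retention_s: int,
             cf: str = "AVERAGE", xff: float = 0.5) -> str:
    """Build an RRA definition string of the form RRA:CF:xff:steps:rows."""
    if resolution_s % step_s:
        raise ValueError("resolution must be a multiple of the base step")
    steps = resolution_s // step_s  # base steps consolidated into one row
    rows = rra_rows(resolution_s, retention_s)
    return f"RRA:{cf}:{xff}:{steps}:{rows}"


if __name__ == "__main__":
    four_hours = 4 * 3600
    print(rra_rows(15, four_hours))      # rows needed for 15s over 4 hours
    print(rra_spec(15, 15, four_hours))  # the corresponding RRA definition
```

Under those assumptions the requirement comes to 960 rows per metric for the high-resolution archive, so it is probably worth applying it only to the high value metrics rather than to the whole set.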

I guess my initial question is this ... has anyone done anything like this, at this scale, with any success and - if so - would it be possible to get some additional information (scrubbed diagram, etc) regarding how it is best done? I've been searching the net and keep coming back to a single image showing a hierarchy of gmetad and some fairly interesting descriptions of other implementations but nothing that actually makes it clear to me.

Hi Daniel,

If you haven't already seen it, I would recommend checking out the Ganglia O'Reilly book (http://shop.oreilly.com/product/0636920025573.do). It has several case studies from large and complex organizations, complete with scale information and some benchmarking, that might help with your planning.

- Adam Compton

_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
