Re: [Ganglia-developers] Does Ganglia work well for a large-scale cluster

Anders Björklund Thu, 30 Mar 2017 14:38:09 -0700

Also, if the size of the XML payload is the biggest concern (rather than the 
sheer amount of XDR traffic) then gzip compression would be a good idea:


gzip_output = yes

See https://www.quantcast.com/blog/quantcast-open-source-diaries-ganglia-gzip/ 
for some background. You might also want to look into using rrdcached:

https://github.com/ganglia/monitor-core/wiki/Integrating-Ganglia-with-rrdcached
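To get a rough feel for how much gzip can shrink a metric-heavy XML payload, here is a minimal Python sketch. It does not talk to gmond or gmetad; it builds a synthetic document whose element names are loosely modeled on Ganglia's XML output, and the host names, attribute values, and the helper build_sample_xml are all made up for illustration. Real payloads will compress differently.

```python
import gzip
import xml.etree.ElementTree as ET


def build_sample_xml(hosts=100, metrics_per_host=30):
    """Build a synthetic, Ganglia-like XML document (illustrative only)."""
    root = ET.Element("GANGLIA_XML", SOURCE="gmond")
    cluster = ET.SubElement(root, "CLUSTER", NAME="test")
    for h in range(hosts):
        # Hypothetical host names; every host carries the same metric set.
        host = ET.SubElement(cluster, "HOST", NAME=f"node{h:04d}")
        for m in range(metrics_per_host):
            ET.SubElement(
                host,
                "METRIC",
                NAME=f"custom_metric_{m}",
                VAL=str(m),
                TYPE="uint32",
                UNITS="count",
            )
    return ET.tostring(root)  # bytes


raw = build_sample_xml()
compressed = gzip.compress(raw)

print(f"raw bytes:        {len(raw)}")
print(f"compressed bytes: {len(compressed)}")
print(f"ratio:            {len(compressed) / len(raw):.3f}")
```

Because the same metric names and attributes repeat for every host, this kind of payload is highly redundant, which is why compression tends to pay off here.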

/Anders


On 2017-03-30 at 15:22, Vladimir Vuksan wrote:
Clusters are logical groupings of like hosts. This can be e.g. per location 
(same data center), per app or per function (DB, web, etc.). It really depends 
on how you are viewing your environment. There is no right or wrong way to 
group it.

Vladimir

On 03/30/2017 at 04:30 AM, Guo, Jason wrote:
Thanks Vladimir

As you mentioned, FB had clusters with tens of thousands of nodes each.

How did they orchestrate these nodes? Here are some options I have in mind:

1.       All the nodes share a few centralized gmonds, and all of them belong 
to a single cluster (the cluster concept in Ganglia).

2.       All the nodes share a few centralized gmonds, each centralized gmond 
belongs to a different cluster, and a single gmetad polls data from these 
centralized gmonds.

3.       There are multiple gmetads/grids, and these grids are in turn 
aggregated by a centralized gmetad/grid.
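Option 2 can be sketched in terms of gmetad.conf, where each cluster is one data_source line listing the gmonds that gmetad may poll for it. The Python below only generates such lines; the cluster names, host names, and polling interval are hypothetical, and 8649 is assumed as the gmond port.

```python
def data_source_line(cluster, collectors, port=8649, interval=15):
    """Format one gmetad.conf data_source entry for a cluster.

    `collectors` are the centralized gmonds for that cluster; listing
    more than one gives gmetad an alternative host to poll.
    """
    hosts = " ".join(f"{host}:{port}" for host in collectors)
    return f'data_source "{cluster}" {interval} {hosts}'


def gmetad_data_sources(layout):
    """Return data_source lines for a {cluster: [collector, ...]} mapping."""
    return [data_source_line(name, hosts) for name, hosts in sorted(layout.items())]


# Hypothetical layout: one cluster per function, two collectors each.
layout = {
    "web": ["web-collector1.example.com", "web-collector2.example.com"],
    "db": ["db-collector1.example.com", "db-collector2.example.com"],
}

lines = gmetad_data_sources(layout)
for line in lines:
    print(line)
```

Option 3 would use the same kind of entry on the top-level gmetad, except that each data_source would point at a lower-level gmetad instead of a gmond.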

Thanks & Best Regards,
Jason Guo

From: Vladimir Vuksan <vli...@veus.hr>
Date: Wednesday, March 29, 2017 at 20:09
To: "Guo, Jason" <ju...@ebay.com>, 
"ganglia-developers@lists.sourceforge.net" <ganglia-developers@lists.sourceforge.net>
Subject: Re: [Ganglia-developers] Does Ganglia work well for a large-scale 
cluster

Hi Jason,

It depends on the number of metrics and associated metadata in the cluster and 
how busy gmetad is overall. It also depends on your hardware. At one point FB 
had clusters with tens of thousands of nodes each.

Try to keep your metrics lean, i.e. don't add any metric descriptions if you 
don't have to, so as to keep the XML payload small, and it should be fine.

Vladimir

On 3/28/2017 at 10:19 PM, Guo, Jason wrote:
Hi,


I’m writing this mail to discuss whether Ganglia works well for a large-scale 
cluster (more than 4000 nodes).


According to the Ganglia documentation, Ganglia can scale to handle clusters 
with 2000 nodes, so many people are concerned about using it for a 4000-node 
production cluster. The documentation states:
"It has been used to link clusters across university campuses and around the 
world and can scale to handle clusters with 2000 nodes."

If the cluster is larger than 2000 nodes, say 4000 nodes, can Ganglia handle 
it properly?


To verify this, I created a 5000-node Ganglia deployment on top of a Docker 
cluster (10 machines).
I put 500 nodes in each cluster, so there are 10 clusters, and these 10 
clusters are in the same grid.
For each gmond, I use a script to generate 30 custom metrics (with 
gmetric).
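A minimal sketch of what such a per-node script could look like, together with the total metric count implied by the setup above. The metric names, value range, type, and units are assumptions for illustration, not the actual test script; the commands are only printed unless dry_run is turned off and gmetric is on the PATH.

```python
import random
import shutil
import subprocess

CLUSTERS = 10
NODES_PER_CLUSTER = 500
METRICS_PER_NODE = 30


def gmetric_commands(count=METRICS_PER_NODE, prefix="custom_metric"):
    """Build one gmetric command line per synthetic metric."""
    commands = []
    for i in range(count):
        value = random.randint(0, 100)  # made-up metric value
        commands.append([
            "gmetric",
            "--name", f"{prefix}_{i}",
            "--value", str(value),
            "--type", "uint32",
            "--units", "count",
        ])
    return commands


def publish(commands, dry_run=True):
    """Print the commands, or run them if gmetric is installed."""
    for cmd in commands:
        if dry_run or shutil.which("gmetric") is None:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Custom metrics the grid carries in total for this test setup.
total_nodes = CLUSTERS * NODES_PER_CLUSTER
total_metrics = total_nodes * METRICS_PER_NODE

if __name__ == "__main__":
    publish(gmetric_commands())
    print(f"{total_nodes} nodes, {total_metrics} custom metrics in the grid")
```

With 10 clusters of 500 nodes and 30 metrics per node, the grid carries 150,000 custom metrics on top of gmond's default ones.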

Currently it works fine in the Docker-based test environment.

So, my question is whether Ganglia is suitable for a 4000-node cluster.

_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
