Hi Steve,

Most likely, your problems are caused by disk I/O activity, because
gmetad is trying to update tens of thousands of RRD files every 15
seconds.  I have switched to using tmpfs and have no problems monitoring
a little over 1,000 nodes with a single gmetad collector node.  The
computer running gmetad is a dual 1GHz PIII with 1Gig of RAM, and
typically has a load under 1.0.  I am using about 350Megs of RAM to
monitor the thousand nodes, so you will probably have to allocate a
pretty big chunk of memory for your three thousand nodes.  Just put an
entry similar to this into /etc/fstab and mount it:

none  /var/lib/ganglia/rrds  tmpfs  \
  size=500M,mode=755,uid=nobody,gid=nobody 0 0
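If you want a rough starting point for the size= value, you can scale
from my numbers (about 350 MB for 1,000 nodes) -- this is just a
back-of-envelope estimate, not a guarantee, since RRD sizes depend on
your metric count and RRA configuration:

```shell
#!/bin/sh
# Rough tmpfs sizing: ~350 MB per 1,000 monitored nodes (observed on my
# setup); scale linearly and round up for headroom.
NODES=3000
MB_PER_1000=350
SIZE_MB=$(( NODES * MB_PER_1000 / 1000 ))
echo "suggested tmpfs size: ${SIZE_MB}M"
```

For 3,000 nodes that works out to roughly 1 GB, so you would want a box
with comfortably more physical RAM than that, since tmpfs pages compete
with everything else.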


~Jason


On Wed, 2003-09-17 at 19:45, Steve Gilbert wrote:
> Hi folks,
> 
> I don't know if I'm just trying to push Ganglia to more than it can handle
> or if I'm doing something wrong, but no matter how I design my Ganglia
> structure, gmetad seems to always crush the machine where it runs.  Here's
> an overview of my environment:
> 
> Ganglia 2.5.4
> All hosts involved are running RedHat 7.2
> RRDtool version 1.0.45
> 
> I have 16 subnets, each with 200 machines give or take a few.  I estimate
> around 3000 nodes total.  Some of these are dual P3, some are single P4, and
> a few random Xeon and Itanium nodes.  Every node is running gmond, and
> that's running fine.
> 
> Each subnet has a "master" node that is a dual P3 1.3GHz.  This box provides
> DNS, NIS, and static DHCP for the subnet.  Normal load on these machines is
> very, very minimal.
> 
> My first attempt was to set up a single dedicated Ganglia machine running
> gmetad, Apache, and the web frontend.  In this machine's gmetad.conf file, I
> listed each of the "master" nodes in the subnets as data sources.  I thought
> having one box collect all the data and store the RRD files would be great.
> Well, this was a bad idea...the box (a P4 with 2GB RAM) was absolutely
> crushed...load shot up to 8.5, and all the graphs continually had gaps in
> them.
> 
> So my next attempt was to install gmetad on each of the "master" nodes.
> I would have this gmetad collect data for the subnet, and then run another
> gmetad on my Ganglia web machine to just talk to these 16 other gmetads.  I
> don't really like having to now backup 16 machines, but I've had problems
> before with trying to store RRD files on an NFS mount, so I decided not to
> try that.  This isn't working all that great, either...the gmetad on these
> "master" nodes (collecting data from ~200 hosts each) is also causing a
> pretty high load...the boxes now stay around 2-3 load points all the time
> and sometimes slow down other operations on the box.
> 
> Am I doing something wrong, or is gmetad really this much of a resource hog?
> Anyone else trying to use Ganglia to monitor 3000 machines?  Am I asking too
> much?  Thanks for any insight.
> 
> Steve Gilbert
> Unix Systems Administrator
> [EMAIL PROTECTED]
> 
> 
> _______________________________________________
> Ganglia-general mailing list
> Ganglia-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
> 
