[Ganglia-general] rrd files world writable
Hi, any idea why the rrd files are created with world-writable permissions? For example:

-rw-rw-rw- 1 nobody root 11948 Sep 17 09:18 /var/lib/ganglia/rrds/unspecified/avicenna/cpu_nice.rrd

Is that a Ganglia issue, or does it depend more on rrdtool? Anas
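The thread does not record an answer, but one likely mechanism: rrd files are typically created with mode 0666 minus the creating process's umask, so a gmond or gmetad daemon started with umask 000 would leave them world-writable. A hedged cleanup sketch for existing files, assuming the default Ganglia rrd path shown above:

```shell
# Sketch only: tighten existing rrd files to 0644. The path is the
# default Ganglia rrd directory from the example above; adjust to
# taste. The guard makes this a no-op if the directory is absent.
RRD_DIR=/var/lib/ganglia/rrds
[ -d "$RRD_DIR" ] && find "$RRD_DIR" -type f -name '*.rrd' -exec chmod 644 {} +
```

This fixes files already on disk; to keep new ones from coming out 0666, the daemon would need to be started with a saner umask (e.g. 022).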
[Ganglia-general] Re: web frontend PHP problem
Forgot to write back to the list about this...turns out that I just needed to upgrade my version of PHP. You need version 4.1.

Steve Gilbert
Unix Systems Administrator
[EMAIL PROTECTED]
[Ganglia-general] Ganglia architecture and gmond load
Hi folks, I don't know if I'm just trying to push Ganglia to more than it can handle or if I'm doing something wrong, but no matter how I design my Ganglia structure, gmetad always seems to crush the machine where it runs. Here's an overview of my environment:

Ganglia 2.5.4
All hosts involved are running RedHat 7.2
RRDtool version 1.0.45

I have 16 subnets, each with 200 machines give or take a few; I estimate around 3000 nodes total. Some of these are dual P3, some are single P4, and a few are random Xeon and Itanium nodes. Every node is running gmond, and that's running fine. Each subnet has a master node that is a dual P3 1.3GHz. This box provides DNS, NIS, and static DHCP for the subnet. Normal load on these machines is very, very minimal.

My first attempt was to set up a single dedicated Ganglia machine running gmetad, Apache, and the web frontend. In this machine's gmetad.conf file, I listed each of the master nodes in the subnets as data sources. I thought having one box collect all the data and store the RRD files would be great. Well, this was a bad idea...the box (a P4 with 2GB RAM) was absolutely crushed: load shot up to 8.5, and all the graphs continually had gaps in them.

So my next attempt was to install gmetad on each of the master nodes. Each of these gmetads would collect data for its subnet, and another gmetad on my Ganglia web machine would just talk to these 16 other gmetads. I don't really like having to back up 16 machines now, but I've had problems before trying to store RRD files on an NFS mount, so I decided not to try that. This isn't working all that great, either...the gmetad on these master nodes (each collecting data from ~200 hosts) is also causing a pretty high load: the boxes now stay around 2-3 load points all the time and sometimes slow down other operations on the box.

Am I doing something wrong, or is gmetad really this much of a resource hog? Anyone else trying to use Ganglia to monitor 3000 machines? Am I asking too much? Thanks for any insight.

Steve Gilbert
Unix Systems Administrator
[EMAIL PROTECTED]
Re: [Ganglia-general] Ganglia architecture and gmond load
steve-

the single biggest problem with scaling gmetad is disk i/o. what type of filesystem are you writing the gmetad RRDs to? most people have had very good luck using a RAM-based filesystem and then periodically syncing the data to disk. for example, in linux:

% mount -t tmpfs tmpfs /mnt

now the /mnt directory is a RAM-backed filesystem. if the machine is rebooted, however, all the data is lost, so you will need to write the contents of that filesystem to disk every now and then.

-matt
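Matt's tmpfs-plus-periodic-sync suggestion can be sketched as a small flush script. The mount point, disk path, and cron interval below are illustrative assumptions, not details from the list:

```shell
#!/bin/sh
# Sketch of the tmpfs approach: the rrds live on a RAM-backed mount
# (created once at boot, e.g. with: mount -t tmpfs tmpfs /mnt/rrds)
# and this script, run periodically from cron, flushes them back to
# disk so a reboot does not lose everything. Both paths are
# illustrative and overridable via the environment.
RAM_DIR=${RAM_DIR:-/mnt/rrds}
DISK_DIR=${DISK_DIR:-/var/lib/ganglia/rrds.disk}
[ -d "$RAM_DIR" ] || exit 0       # tmpfs not mounted yet; nothing to flush
mkdir -p "$DISK_DIR"
cp -a "$RAM_DIR/." "$DISK_DIR/"   # -a preserves modes, ownership, and times
```

A crontab line such as `*/5 * * * * root /usr/local/sbin/flush-rrds.sh` (hypothetical path) would run it every five minutes; the interval is bounded by how much history you can afford to lose on a crash.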
RE: [Ganglia-general] Ganglia architecture and gmond load
I'm writing the RRDs to a local SCSI drive with an ext3 filesystem. I'll investigate the RAM disk option. Thanks!

Steve Gilbert
Unix Systems Administrator
[EMAIL PROTECTED]
Re: [Ganglia-general] Ganglia architecture and gmond load
Hi Steve,

Most likely, your problems are caused by disk I/O activity, because gmetad is trying to update tens of thousands of rrd files every 15 seconds. I have switched to using tmpfs and have no problems monitoring a little over 1,000 nodes with a single gmetad collector node. The computer running gmetad is a dual 1GHz PIII with 1GB of RAM, and typically has a load under 1.0. I am using about 350MB of RAM to monitor the thousand nodes, so you will probably have to allocate a pretty big chunk of memory for your three thousand nodes. Just put an entry similar to this into /etc/fstab and mount it:

none /var/lib/ganglia/rrds tmpfs size=500M,mode=755,uid=nobody,gid=nobody 0 0

~Jason
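Jason's numbers give a quick back-of-the-envelope for sizing that tmpfs: ~350 MB for ~1,000 nodes scales linearly to roughly 1,050 MB for a 3,000-node cluster, before headroom. As shell arithmetic:

```shell
# Linear scaling of Jason's measured footprint (~350 MB per ~1,000
# nodes) to Steve's 3,000-node cluster. Real usage depends on the
# number of metrics per host, so treat the result as a floor and
# add headroom when picking the size= mount option.
NODES=3000
MB_PER_1000_NODES=350
EST_MB=$((NODES * MB_PER_1000_NODES / 1000))
echo "estimated rrd footprint: ${EST_MB} MB"   # prints: estimated rrd footprint: 1050 MB
```

By that estimate the 500M in Jason's sample fstab line would be too small for 3,000 nodes; something like size=1500M would leave room to grow.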
RE: [Ganglia-general] Ganglia architecture and gmond load
Thanks Jason...I think I'm going to go back to my single gmetad design and try that again with tmpfs. I can throw as much memory as I want on that box. I'll report back tomorrow on how well it goes with 3000 nodes :-)

Steve Gilbert
Unix Systems Administrator
[EMAIL PROTECTED]