[Ganglia-general] rrd files world writable

2003-09-17 Thread Anas Nashif

Hi,

Any idea why the rrd files are created with world-writable permissions?

For example:
-rw-rw-rw-  1 nobody  root  11948 Sep 17 09:18 /var/lib/ganglia/rrds/unspecified/avicenna/cpu_nice.rrd


Is that a ganglia issue, or does it depend more on rrdtool?
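A likely cause: rrdtool creates its files with mode 0666 and relies on the creating process's umask to mask bits off, so a daemon started with umask 0 will leave the files world writable. The effect can be sketched against a hypothetical directory:

```shell
# Files are created with mode 0666 & ~umask; under umask 0 nothing
# is masked off, so the file stays world writable.
umask_demo() {
    dir=$1
    ( umask 0022; : > "$dir/masked.rrd" )  # mode 644: rw-r--r--
    ( umask 0000; : > "$dir/open.rrd" )    # mode 666: rw-rw-rw-
}
```

If gmond/gmetad is launched from an init script that sets umask 0, that alone would explain the listing above.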


Anas




[Ganglia-general] Re: web frontend PHP problem

2003-09-17 Thread Steve Gilbert
Forgot to write back to the list about this...turns out that I just needed
to upgrade my version of PHP.  You need version 4.1.

Steve Gilbert
Unix Systems Administrator
[EMAIL PROTECTED]



[Ganglia-general] Ganglia architecture and gmond load

2003-09-17 Thread Steve Gilbert
Hi folks,

I don't know if I'm just trying to push Ganglia to do more than it can handle
or if I'm doing something wrong, but no matter how I design my Ganglia
structure, gmetad seems to always crush the machine where it runs.  Here's
an overview of my environment:

Ganglia 2.5.4
All hosts involved are running RedHat 7.2
RRDtool version 1.0.45

I have 16 subnets, each with 200 machines give or take a few.  I estimate
around 3000 nodes total.  Some of these are dual P3, some are single P4, and
a few random Xeon and Itanium nodes.  Every node is running gmond, and
that's running fine.

Each subnet has a master node that is a dual P3 1.3GHz.  This box provides
DNS, NIS, and static DHCP for the subnet.  Normal load on these machines is
very, very minimal.

My first attempt was to set up a single dedicated Ganglia machine running
gmetad, Apache, and the web frontend.  In this machine's gmetad.conf file, I
listed each of the master nodes in the subnets as data sources.  I thought
having one box collect all the data and store the RRD files would be great.
Well, this was a bad idea...the box (a P4 with 2GB RAM) was absolutely
crushed...load shot up to 8.5, and all the graphs continually had gaps in
them.
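That first gmetad.conf would have looked roughly like this (hostnames are hypothetical; one data_source line per subnet master node):

```
# gmetad.conf on the single dedicated collector -- sketch only
data_source "subnet01" master01.example.com
data_source "subnet02" master02.example.com
# ... 16 entries in all, one per subnet master
```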

So my next attempt was to install gmetad on each of the master nodes.
I would have this gmetad collect data for the subnet, and then run another
gmetad on my Ganglia web machine to just talk to these 16 other gmetads.  I
don't really like having to back up 16 machines now, but I've had problems
before with trying to store RRD files on an NFS mount, so I decided not to
try that.  This isn't working all that well, either...the gmetad on these
master nodes (collecting data from ~200 hosts each) is also causing a
pretty high load...the boxes now stay around 2-3 load points all the time
and sometimes slow down other operations on the box.

Am I doing something wrong, or is gmetad really this much of a resource hog?
Anyone else trying to use Ganglia to monitor 3000 machines?  Am I asking too
much?  Thanks for any insight.

Steve Gilbert
Unix Systems Administrator
[EMAIL PROTECTED]



Re: [Ganglia-general] Ganglia architecture and gmond load

2003-09-17 Thread matt massie
steve-

the single biggest problem with scaling gmetad is disk i/o.  what
type of filesystem are you writing the gmetad RRDs to?  most people have
had very good luck using a RAM-based filesystem and then periodically
syncing the data to disk.

for example in linux,

% mount -t tmpfs tmpfs /mnt

now the /mnt directory is a ram-backed filesystem.  if the machine is
rebooted, however, all the data is lost, so you will need to write the
contents of that filesystem to disk every now and then.
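That "every now and then" step can be sketched as a small cron-driven script; the paths and schedule here are hypothetical:

```shell
#!/bin/sh
# Copy the RAM-backed RRD tree to a persistent location so a reboot
# doesn't wipe the history.  Arguments: source (tmpfs) and destination.
sync_rrds() {
    src=$1
    dst=$2
    mkdir -p "$dst"
    # cp -a preserves ownership and timestamps; rsync -a would also work
    cp -a "$src/." "$dst/"
}

# e.g. from root's crontab, every 30 minutes:
#   */30 * * * * /usr/local/sbin/sync_rrds.sh /mnt /var/lib/ganglia/rrds.disk
```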

-matt




RE: [Ganglia-general] Ganglia architecture and gmond load

2003-09-17 Thread Steve Gilbert
I'm writing the RRDs to a local SCSI drive with an ext3 filesystem.  I'll
investigate the RAM disk option.  Thanks!

Steve Gilbert
Unix Systems Administrator
[EMAIL PROTECTED]




Re: [Ganglia-general] Ganglia architecture and gmond load

2003-09-17 Thread Jason A. Smith
Hi Steve,

Most likely, your problems are caused by disk I/O activity, because
gmetad is trying to update tens of thousands of rrd files every 15
seconds.  I have switched to using tmpfs and have no problems monitoring
a little over 1,000 nodes with a single gmetad collector node.  The
computer running gmetad is a dual 1GHz PIII with 1Gig of RAM, and
typically has a load under 1.0.  I am using about 350Megs of RAM to
monitor the thousand nodes, so you will probably have to allocate a
pretty big chunk of memory for your three thousand nodes.  Just put an
entry similar to this into /etc/fstab and mount it:

none  /var/lib/ganglia/rrds  tmpfs  size=500M,mode=755,uid=nobody,gid=nobody  0 0
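The flip side of the tmpfs approach is boot time: the mount starts out empty, so the saved RRDs have to be copied back in before gmetad starts. A minimal sketch, with hypothetical paths:

```shell
#!/bin/sh
# Repopulate the freshly mounted (empty) tmpfs from the on-disk copy.
restore_rrds() {
    src=$1   # persistent copy, e.g. /var/lib/ganglia/rrds.disk
    dst=$2   # tmpfs mount, e.g. /var/lib/ganglia/rrds
    [ -d "$src" ] && cp -a "$src/." "$dst/"
    # gmetad runs as nobody, so it must own the restored files
    chown -R nobody:nobody "$dst" 2>/dev/null || true
}
```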


~Jason





RE: [Ganglia-general] Ganglia architecture and gmond load

2003-09-17 Thread Steve Gilbert
Thanks Jason...I think I'm going to go back to my single gmetad design and
try that again with the tmpfs.  I can throw as much memory as I want on that
box.  I'll report back tomorrow on how well it goes with 3000 nodes :-)

Steve Gilbert
Unix Systems Administrator
[EMAIL PROTECTED]

