Re: [Ganglia-developers] 3.0.4 and srclib

Richard.Grevis Mon, 04 Sep 2006 07:15:38 -0700

Matt,

some comments below.


> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of matt massie
> Sent: 03 September 2006 08:36
> To: ganglia-developers@lists.sourceforge.net
> Subject: [Ganglia-developers] 3.0.4 and srclib
> 
> 
> hey guys-
> 
> i wanted to comment on the whole "srclib" library business.  
> there are two separate issues that need to be addressed.
> 
> first.
> 
> i feel it is critically important that gmond be statically 
> linked against all support libraries.  there are sysadmins 
> out there managing 1000s of ganglia machines and having a 
> single statically linked binary makes it easy for them.  
> imagine needing to do a yum update on thousands of hosts in 
> order to make sure all rpm dependencies are met on upgrade.
>   ouch.  imagine using an os without a package manager and 
> managing library dependencies.  ooouch.
> 
> second.
> 
> i don't really care one way or another how the source for the 
> support libraries gets on the _compile_ host since we will 
> statically link against those libraries anyway.  having them 
> in the "srclib" directory gives us more control and makes it 
> easier for people to download and compile.  although it also 
> makes packaging and maintenance for us a lot harder.


I completely agree and I also statically link. But what I also
do is compile all the support libraries and RRDTool to have
an install path that's the same as ganglia's (even though most
may not be used given the static linking). I wanted to be sure
that ganglia compiled correctly and properly regardless of the
install state of these libraries on the compile host.

> 
> i think it's best we talk as a group about what to do with 
> "srclib" for a while since it's a complex issue but it 
> shouldn't keep us from testing and releasing 3.0.4 sometime 
> this month.  it's very likely "srclib" will be addressed in an 
> upcoming release.
> 
> my talk at linuxworld went well.  i got to spend a few days 
> hanging out with tobi oetiker (RRDTool), ian berry (cacti), 
> remo rickli (nedi), kees cook (sendpage).  tobi and i have 
> been talking about ways to help with performance of RRDTool 
> and reduce disk io.


I have been pondering this too. I now monitor 6,000 mostly
Windows hosts on a couple of ganglia servers at 10-second intervals.
I have sustained I/O rates to SAN of 21 megabytes per second.
As you know, RRDTool's I/O behaviour is very simple (open, read,
write, close for every metric/file). But at least that is safe
and relatively stateless. If you don't mind losing a few data points
if RRDTool/gmetad suddenly fails, then one could buffer some number of
data points for every host/metric and flush as required.

As RRDTool allows multiple data points per call, the same buffer/flush
behaviour could be coded in Ganglia too.
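
A minimal sketch of the buffer-and-flush idea, in Python for brevity
rather than the C that gmetad is written in. The class name
UpdateBuffer, the max_points threshold and the file path are all
hypothetical. The only RRDTool behaviour it relies on is that
"rrdtool update" accepts several timestamp:value arguments in one
call. It only builds the argument list and does not run rrdtool.

```python
from collections import defaultdict


class UpdateBuffer:
    """Collect samples per RRD file and flush them in one update call."""

    def __init__(self, max_points=6):
        self.max_points = max_points          # hypothetical flush threshold
        self.pending = defaultdict(list)      # rrd path -> [(timestamp, value)]

    def add(self, rrd_path, timestamp, value):
        """Buffer one sample; return an rrdtool argv when a flush is due."""
        self.pending[rrd_path].append((timestamp, value))
        if len(self.pending[rrd_path]) >= self.max_points:
            return self.flush(rrd_path)
        return None

    def flush(self, rrd_path):
        """Build one 'rrdtool update' command carrying all buffered samples."""
        points = self.pending.pop(rrd_path, [])
        if not points:
            return None
        args = ["rrdtool", "update", rrd_path]
        # RRDTool wants timestamps in increasing order within one call.
        args += [f"{ts}:{val}" for ts, val in sorted(points)]
        return args


buf = UpdateBuffer(max_points=3)
buf.add("host1/cpu_user.rrd", 1157379330, 12.5)
buf.add("host1/cpu_user.rrd", 1157379340, 13.0)
cmd = buf.add("host1/cpu_user.rrd", 1157379350, 11.0)
print(cmd)
```

With max_points=3 the third sample triggers the flush, so three polls
cost one open/read/write/close cycle instead of three. Anything still
in the buffer is lost if the process dies, which is the trade-off
mentioned above.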

The next idea I have been thinking about is the step size parameter for
the RRDs.
As you know, every metric that is not a string will have an RRD update
at the poll rate of the cluster. So one gets gazillions of updates of
stuff like processor clock speed, total memory etc, and not everything
needs polling at the same rate as (say) cpu anyway.

It would be better I think to set the step size to be the sample rate
of the metric on the monitored host. There may be a few problems:

1) The metric sample rate is never transmitted to the headnode or
   ganglia server.
2) If different RRDs have different step sizes, then the method of
   defining the RRDs' RRAs needs changing.
3) What step size do you give to the cluster or grid RRDs if the hosts
   don't all sample at the same rate?
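
To illustrate problem 2, here is a rough sketch of deriving the RRAs
from the step size so that every RRD keeps roughly the same history
regardless of its step. The retention targets in ARCHIVES are made-up
numbers, not Ganglia's defaults, and rra_defs is a hypothetical
helper. Only the RRA:AVERAGE:xff:steps:rows syntax is taken from
"rrdtool create".

```python
import math

# (resolution in seconds, time span in seconds): illustrative targets only
ARCHIVES = [(60, 3600), (600, 86400), (3600, 7 * 86400)]


def rra_defs(step, archives=ARCHIVES, xff=0.5):
    """Return RRA definitions giving roughly the same retention for any step."""
    defs = []
    for resolution, span in archives:
        # An RRA cannot be finer than the RRD step itself.
        steps = max(1, round(resolution / step))
        # Enough rows to cover the requested time span at this resolution.
        rows = math.ceil(span / (steps * step))
        defs.append(f"RRA:AVERAGE:{xff}:{steps}:{rows}")
    return defs


print(rra_defs(10))    # e.g. a cpu metric sampled every 10 seconds
print(rra_defs(1200))  # e.g. a slow metric sampled every 20 minutes
```

A slow metric ends up with far fewer rows in its finest archives,
which is where the saving in file size and update traffic would come
from.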

My next idea is a bit speculative (read: I am not sure), but for a metric,
if the TN goes up and the VAL does not change, don't call RRDupdate.
Actually, you don't need the time, but the code would need to be mindful
of writing a value before RRD decides the metric has gone undefined. If a
changed value comes along after a period of time with no updating, then
2 values need to be written: the last old value at the current time minus
one, and the new value at the current time.
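
A sketch of how that might look for a single metric, under stated
assumptions: ChangeFilter is a hypothetical name, "heartbeat" stands
for the RRD heartbeat after which data goes unknown, and writing a
keep-alive at half the heartbeat is my own arbitrary safety margin.
It returns the updates to write instead of calling RRDTool.

```python
class ChangeFilter:
    """Decide which samples of one metric actually need an RRD update."""

    def __init__(self, heartbeat):
        self.heartbeat = heartbeat   # seconds before RRD marks data unknown
        self.last_value = None
        self.last_written = None     # timestamp of the last emitted update
        self.skipped = False         # True if samples were dropped since then

    def feed(self, now, value):
        """Return the (timestamp, value) updates to write, possibly none."""
        if self.last_written is None:
            out = [(now, value)]                 # first sample: always write
        elif value == self.last_value:
            # Unchanged: write only as a keep-alive, well inside the heartbeat.
            if now - self.last_written < self.heartbeat / 2:
                self.skipped = True
                return []
            out = [(now, value)]
        else:
            out = []
            # Changed after a quiet period: restate the old value just before
            # now so the flat stretch is recorded, then write the new value.
            if self.skipped and now - 1 > self.last_written:
                out.append((now - 1, self.last_value))
            out.append((now, value))
        self.last_value = value
        self.last_written = now
        self.skipped = False
        return out


f = ChangeFilter(heartbeat=80)
samples = [(0, 5), (10, 5), (20, 5), (30, 7), (40, 7), (80, 7)]
writes = [w for t, v in samples for w in f.feed(t, v)]
print(writes)
```

Six samples turn into four writes here. The price is that the
previous value of every metric on every host has to be kept in
memory.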

My last disturbing idea is to only update a metric when its TN is less
than some value, say 2 x poll rate, or when a timeout is reached. Doing
it that way obviates the need to store the previous values of all
metrics on all hosts.
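
In code the test is tiny. This sketch assumes TN is the number of
seconds since the host last sent the metric, and that the server keeps
only the time of its last write per metric. The function name, the
parameter names and the timeout are hypothetical.

```python
def should_update(tn, poll_interval, last_write, now, timeout):
    """Return True if this metric should be written to its RRD now."""
    # Fresh: the host sent the metric within the last two polls.
    fresh = tn < 2 * poll_interval
    # Overdue: nothing written for too long, so write to keep the RRD defined.
    overdue = last_write is None or now - last_write >= timeout
    return fresh or overdue


print(should_update(tn=5, poll_interval=10, last_write=100, now=110, timeout=60))
print(should_update(tn=300, poll_interval=10, last_write=100, now=110, timeout=60))
print(should_update(tn=300, poll_interval=10, last_write=100, now=160, timeout=60))
```

The timeout would need to stay below the RRD heartbeat, otherwise
slow-changing metrics would still go undefined between writes.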

phew.....
regards,
richard

> 
> i've also been meeting with groundwork open source and they 
> are planning to build a test cluster composed of a number 
> of different hardware/software platforms.
> 
> hope you guys are having a good weekend.
> -matt
