Re: [Ganglia-developers] Question about Ganglia

2010-11-30 Thread Rick Cobb
Well, from this project's standpoint I wish I still had access to a big
installation to test on, but I don't (and I'm happy to be doing what
I'm doing).

The scalability issue I've seen happened in a grid with a few dozen
clusters; each cluster had a few dozen nodes in it, and we monitored a
couple hundred metrics per node (yes, I'm being deliberately vague). In that
case, the typical failure was that the grid summary metrics would "gap",
even though the cluster summary metrics and the underlying host metrics were
updated.  We were running on a plenty-big ramdisk (8GB), and the actual CPU
utilization was under 50%, IIRC.

We traced the problem to lock contention in gmetad. Gmetad needs to take a
lock per cluster as it collects data for the grid summary metrics.
Otherwise, each cluster just runs in its own thread, with its own sampling
against the gmonds.

By reducing the number of clusters per grid (and moving to a grid-of-grids
deployment), we were able to handle the same number of machines & metrics
with a much lower probability of gapping.

My suspicion is that a naive attempt to do the kind of pivoting you're
talking about in this email will require holding those cluster-query locks
longer.

My advice would be to treat each cluster as if it was its own gmond when
doing that.  I.e., the aggregating thread should simply read the data into
its own workspace as quickly as possible without doing any arithmetic (in
fact, a copy-on-write clone would be great) so as to avoid holding those
cluster locks too long. Then the arithmetic/aggregation thread can run
independently.  If it can't do arithmetic fast enough, well, change the RRAs
to reflect how fast the aggregation thread can run.  At all costs, avoid
getting in the way of recording the host & cluster metrics; they're the raw
material. Folks can always derive their aggregates from that data later or
out-of-band if necessary.
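
To make that concrete, here is a minimal Python sketch of the
copy-then-aggregate idea.  It is not gmetad code (gmetad is C); Cluster,
snapshot and summarize are made-up names for illustration, and a plain deep
copy stands in for the copy-on-write clone.  The only point is that the
cluster lock is held for the copy and nothing else.

```python
import copy
import threading


class Cluster:
    """Hypothetical stand-in for one cluster's in-memory metric table."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.hosts = {}  # host name -> {metric name -> numeric value}

    def update(self, host, metrics):
        # Called by the per-cluster polling thread.
        with self.lock:
            self.hosts[host] = dict(metrics)

    def snapshot(self):
        # Hold the lock only long enough to copy; no arithmetic in here.
        with self.lock:
            return copy.deepcopy(self.hosts)


def summarize(clusters):
    """Sum every metric across all clusters, working only on private copies."""
    snapshots = [c.snapshot() for c in clusters]  # each lock is released after its copy
    totals = {}
    for hosts in snapshots:
        for metrics in hosts.values():
            for name, value in metrics.items():
                totals[name] = totals.get(name, 0) + value
    return totals
```

If summarize() turns out to be slow, only the grid summary falls behind; the
polling threads calling update() wait for a copy at most.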

-- ReC

On Tue, Nov 30, 2010 at 10:54 AM, Bernard Li wrote:

> Hi Rick:
>
> On Tue, Nov 30, 2010 at 10:38 AM, Rick Cobb wrote:
>
> > On your idea, Bernard -- I don't think it would necessarily require gmond
>
> You're right, I meant to say gmetad...  it was late last night :)
>
> > changes.  OTOH, I think it would require very interesting gmetad changes to
> > do a good job.  In particular, the ability to summarize by different
> > aggregations seems like the scalability wall; it's already hard to get grid
> > summaries updated (the only inter-thread update in the current code).
>
> Can you elaborate on the last point?   Depending on what you mean
> exactly, I might have a solution for that...
>
> I think going forward the gmetad-python code could be easier to
> maintain and extend with new features, not to mention that it supports the
> plug-in interface.  However, it hasn't really been tested in a large
> production environment for scalability and stability, so that's why I
> really want to get it released in some form or manner ASAP.
>
> This is a call for help -- if you have a large installation and would
> like to help test some cutting edge code, please take gmetad-python
> out for a spin and report back any issues you may have.  Hopefully
> we'll have it packaged up and released as 3.2.0 by early 2011 :-)
>
> Thanks all!
>
> Bernard
>
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers


Re: [Ganglia-developers] Question about Ganglia

2010-11-30 Thread Bernard Li
Hi Rick:

On Tue, Nov 30, 2010 at 10:38 AM, Rick Cobb wrote:

> On your idea, Bernard -- I don't think it would necessarily require gmond

You're right, I meant to say gmetad...  it was late last night :)

> changes.  OTOH, I think it would require very interesting gmetad changes to
> do a good job.  In particular, the ability to summarize by different
> aggregations seems like the scalability wall; it's already hard to get grid
> summaries updated (the only inter-thread update in the current code).

Can you elaborate on the last point?   Depending on what you mean
exactly, I might have a solution for that...

I think going forward the gmetad-python code could be easier to
maintain and extend with new features, not to mention that it supports the
plug-in interface.  However, it hasn't really been tested in a large
production environment for scalability and stability, so that's why I
really want to get it released in some form or manner ASAP.

This is a call for help -- if you have a large installation and would
like to help test some cutting edge code, please take gmetad-python
out for a spin and report back any issues you may have.  Hopefully
we'll have it packaged up and released as 3.2.0 by early 2011 :-)

Thanks all!

Bernard



Re: [Ganglia-developers] Question about Ganglia

2010-11-30 Thread Rick Cobb
Another wacky way to "solve" this is to layer Ganglias.  I.e., write an
independent script which polls the gmetad for its XML, projects the data as
if, e.g., every LPAR is a host, and posts that to an independent gmond using
gmetric or its modern equivalents.  Then point a new gmetad instance at that
gmond (or set thereof).
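
A rough Python sketch of the "projection" half of such a script, for what
it's worth.  It assumes the usual Ganglia XML layout (GRID > CLUSTER > HOST >
METRIC, with NAME and VAL attributes) and takes the XML as a string; fetching
it from gmetad and re-posting the result via gmetric are left out.  The
function name and the sample data are invented for illustration; serial_num
is the metric from Michael's snippet.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_hosts_by_metric(xml_text, metric_name="serial_num"):
    """Return {metric value -> sorted host names} across every cluster."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    # iter() walks the whole tree, so hosts under any GRID/CLUSTER are found.
    for host in root.iter("HOST"):
        for metric in host.findall("METRIC"):
            if metric.get("NAME") == metric_name:
                groups[metric.get("VAL")].append(host.get("NAME"))
                break
    return {value: sorted(names) for value, names in groups.items()}


# Invented sample feed: three LPARs in two logical clusters, two machines.
SAMPLE = """
<GANGLIA_XML VERSION="3.1.7" SOURCE="gmetad">
 <GRID NAME="example">
  <CLUSTER NAME="SAP">
   <HOST NAME="lpar1"><METRIC NAME="serial_num" VAL="A100"/></HOST>
   <HOST NAME="lpar2"><METRIC NAME="serial_num" VAL="B200"/></HOST>
  </CLUSTER>
  <CLUSTER NAME="TSM">
   <HOST NAME="lpar3"><METRIC NAME="serial_num" VAL="A100"/></HOST>
  </CLUSTER>
 </GRID>
</GANGLIA_XML>
"""

if __name__ == "__main__":
    for serial, hosts in sorted(group_hosts_by_metric(SAMPLE).items()):
        print(serial, hosts)
```

Hosts that don't report the metric at all are simply left out of the result.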

On your idea, Bernard -- I don't think it would necessarily require gmond
changes.  OTOH, I think it would require very interesting gmetad changes to
do a good job.  In particular, the ability to summarize by different
aggregations seems like the scalability wall; it's already hard to get grid
summaries updated (the only inter-thread update in the current code).

-- ReC

On Mon, Nov 29, 2010 at 11:21 PM, Bernard Li wrote:

> Hi Michael:
>
> I don't think the current frontend code supports what you want without
> some major hacking.  The frontend expects the user to view the "grid"
> as multiple clusters and the "cluster" as multiple hosts.
>
> A common feature request is to have hosts arbitrarily cluster/group
> based on different criteria (eg. function, geographical location,
> etc.)
>
> Perhaps Vladimir can incorporate some of these ideas in his frontend
> re-write.  But some fundamental changes may need to be made at the
> gmond level...
>
> Hope this helps.
>
> Cheers,
>
> Bernard
>
> On Tue, Nov 16, 2010 at 11:21 AM, Michael Perzl wrote:
> > I have a question regarding the PHP web code of Ganglia:
> >
> > My setup looks like this:
> >
> > I have one Grid and several Clusters.
> > Each Cluster has between 10 and 250 nodes, all running on AIX LPARs (but
> > the question is independent of the OS).
> > I have clustered on a logical level, i.e., not according to the different
> > hardware systems but for instance, all SAP systems, all TSM systems etc.
> > I have lots of additional metrics, provided via a C-DSO.
> > One of the additional metrics is a system-identifier, i.e., a hardware
> > identifier that I want to use to identify the hardware that each LPAR is
> > running on.
> >
> > Now in addition to the logical view I want to introduce a "physical view",
> > i.e., a view of all LPARs running on a specific system with a certain
> > hardware identifier. This is where my illiteracy regarding PHP comes into
> > play.
> >
> > Basically, I want to have a list of ALL Ganglia nodes so that I can loop
> > over and group them according to their hardware identifier.
> >
> > I was able to accomplish something in cluster_view.php with the following
> > code snippet but have failed miserably so far in meta_view.php.
> >
> > 
> >
> > // Collect the distinct hardware identifiers (serial_num) across all hosts.
> > $sysids = array();
> > $sysid_count = 0;
> > foreach ($metrics as $host => $val)
> > {
> >    if (isset( $val["serial_num"]['VAL'] ))
> >    {
> >       $id = $val["serial_num"]['VAL'];
> >       if (! in_array( $id, $sysids ))
> >       {
> >          $sysids[$sysid_count] = $id;
> >          $sysid_count++;
> >       }
> >    }
> > }
> > sort( $sysids );
> >
> > // Debug dump: list the hosts that are up, then the identifiers found.
> > $cv = @fopen("/tmp/cv.txt","w");
> > foreach ($hosts_up as $host => $val)
> > {
> >    fputs($cv,"$host\n");
> > }
> >
> > fputs($cv,"=\n");
> > fputs($cv,"sysid_count = $sysid_count\n");
> > fputs($cv,"=\n");
> >
> > foreach ($sysids as $id)
> > {
> >    fputs($cv,"$id\n");
> > }
> > fputs($cv,"=\n");
> >
> > // Group the hosts under the hardware identifier each one reports.
> > foreach ($sysids as $id)
> > {
> >    fputs($cv,"---> $id\n");
> >    foreach ($hosts_up as $host => $h)
> >    {
> >       // Skip hosts that do not report serial_num at all.
> >       if ( isset( $metrics[$host]["serial_num"]['VAL'] )
> >            && $metrics[$host]["serial_num"]['VAL'] == $id )
> >       {
> >          fputs($cv,"$host\n");
> >       }
> >    }
> > }
> >
> > fclose($cv);
> >
> > 
> >
> > Any help/hints regarding how a loop over all Ganglia nodes can be
> > accomplished in meta_view.php would be highly welcome. Thanks.
> >
> > Regards,
> > Michael
> >
> >

Re: [Ganglia-developers] Question about Ganglia

2010-11-29 Thread Bernard Li
Hi Michael:

I don't think the current frontend code supports what you want without
some major hacking.  The frontend expects the user to view the "grid"
as multiple clusters and the "cluster" as multiple hosts.

A common feature request is to have hosts arbitrarily cluster/group
based on different criteria (eg. function, geographical location,
etc.)

Perhaps Vladimir can incorporate some of these ideas in his frontend
re-write.  But some fundamental changes may need to be made at the
gmond level...

Hope this helps.

Cheers,

Bernard

On Tue, Nov 16, 2010 at 11:21 AM, Michael Perzl wrote:
> I have a question regarding the PHP web code of Ganglia:
>
> My setup looks like this:
>
> I have one Grid and several Clusters.
> Each Cluster has between 10 and 250 nodes, all running on AIX LPARs (but
> the question is independent of the OS).
> I have clustered on a logical level, i.e., not according to the different
> hardware systems but for instance, all SAP systems, all TSM systems etc.
> I have lots of additional metrics, provided via a C-DSO.
> One of the additional metrics is a system-identifier, i.e., a hardware
> identifier that I want to use to identify the hardware that each LPAR is
> running on.
>
> Now in addition to the logical view I want to introduce a "physical view",
> i.e., a view of all LPARs running on a specific system with a certain
> hardware identifier. This is where my illiteracy regarding PHP comes into
> play.
>
> Basically, I want to have a list of ALL Ganglia nodes so that I can loop
> over and group them according to their hardware identifier.
>
> I was able to accomplish something in cluster_view.php with the following
> code snippet but have failed miserably so far in meta_view.php.
>
> 
>
> // Collect the distinct hardware identifiers (serial_num) across all hosts.
> $sysids = array();
> $sysid_count = 0;
> foreach ($metrics as $host => $val)
> {
>    if (isset( $val["serial_num"]['VAL'] ))
>    {
>       $id = $val["serial_num"]['VAL'];
>       if (! in_array( $id, $sysids ))
>       {
>          $sysids[$sysid_count] = $id;
>          $sysid_count++;
>       }
>    }
> }
> sort( $sysids );
>
> // Debug dump: list the hosts that are up, then the identifiers found.
> $cv = @fopen("/tmp/cv.txt","w");
> foreach ($hosts_up as $host => $val)
> {
>    fputs($cv,"$host\n");
> }
>
> fputs($cv,"=\n");
> fputs($cv,"sysid_count = $sysid_count\n");
> fputs($cv,"=\n");
>
> foreach ($sysids as $id)
> {
>    fputs($cv,"$id\n");
> }
> fputs($cv,"=\n");
>
> // Group the hosts under the hardware identifier each one reports.
> foreach ($sysids as $id)
> {
>    fputs($cv,"---> $id\n");
>    foreach ($hosts_up as $host => $h)
>    {
>       // Skip hosts that do not report serial_num at all.
>       if ( isset( $metrics[$host]["serial_num"]['VAL'] )
>            && $metrics[$host]["serial_num"]['VAL'] == $id )
>       {
>          fputs($cv,"$host\n");
>       }
>    }
> }
>
> fclose($cv);
>
> 
>
> Any help/hints regarding how a loop over all Ganglia nodes can be
> accomplished in meta_view.php would be highly welcome. Thanks.
>
> Regards,
> Michael
>


Re: [Ganglia-developers] Question about Ganglia web front end

2002-12-06 Thread Steven Wagner

Federico Sacerdoti wrote:

> Since Matt is going to be indisposed for a while due to his new baby, I
> will take this one.


:O!!!

Dang, more people I need to send shirts to this month.  :P

> We are definitely planning to implement this idea, and I'm glad you see
> the need for it. Matt's idea, which I think is excellent, is to use
> XPath to allow subset queries of the XML tree. Each gmetad would
> understand XPath, and only return the portion of the tree that is
> pertinent. Yes, this does make things more complicated: we need an
> interactive protocol to the gmetad. But hey, we're designers.


I think a modification of HTTP oughta work, especially considering the more 
hierarchical nature of Ganglia 3's monitoring core.  Heck, we'll probably 
have to rewrite the fricken' metadaemon anyway.


I haven't been able to devote many cycles to Ganglia lately, as other 
projects have taken over.  I will continue to happily armchair-quarterback 
on these lists, at the very least, and develop in what spare time I have 
(which means a Darwin port, probably, so I can run it on my TiBook :P ).





Re: [Ganglia-developers] Question about Ganglia web front end

2002-12-06 Thread Federico Sacerdoti

> On a (very) slightly less pie-in-the-sky note...
>
> Has anyone considered the utility of being able to select a subset of
> cluster/host/metric data from the metadaemon?  In other words, you
> send a command that limits display to values that have been updated in
> the last 60 seconds, or a particular cluster or host name.  Any speed
> hit from building the new response would probably be made up for in
> the front-end.  If you're viewing host details for one host out of a
> 900-host combined metadaemon XML feed, and the front end only parses
> 30 metrics instead of 2700... well...
>
> It would mean rewriting the listen thread of gmetad (and,
> simultaneously, the front-end) to have a full-duplex conversation and
> there's some possible gotchas there... but what the heck, this is the
> developer's list.
>
> Anyway.  Another great idea from the people who brought you beer
> milkshakes...


Since Matt is going to be indisposed for a while due to his new baby, I 
will take this one.
We are definitely planning to implement this idea, and I'm glad you see
the need for it. Matt's idea, which I think is excellent, is to use
XPath to allow subset queries of the XML tree. Each gmetad would
understand XPath, and only return the portion of the tree that is
pertinent. Yes, this does make things more complicated: we need an
interactive protocol to the gmetad. But hey, we're designers.
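
To give a feel for what such subset queries could look like, here is a small
Python sketch.  It says nothing about how gmetad would implement them; it
uses the limited XPath subset that Python's xml.etree.ElementTree supports,
against an invented sample feed, and the helper names subset and
metric_values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample feed with two clusters.
SAMPLE = """
<GANGLIA_XML>
 <CLUSTER NAME="web">
  <HOST NAME="n1">
   <METRIC NAME="load_one" VAL="0.3"/>
   <METRIC NAME="cpu_num" VAL="4"/>
  </HOST>
  <HOST NAME="n2">
   <METRIC NAME="load_one" VAL="1.7"/>
  </HOST>
 </CLUSTER>
 <CLUSTER NAME="db">
  <HOST NAME="n3">
   <METRIC NAME="load_one" VAL="2.1"/>
  </HOST>
 </CLUSTER>
</GANGLIA_XML>
"""


def subset(xml_text, path):
    """Return the elements of the feed selected by an XPath-style path."""
    return ET.fromstring(xml_text).findall(path)


def metric_values(xml_text, path):
    """Map metric NAME -> VAL for a path that selects METRIC elements."""
    return {m.get("NAME"): m.get("VAL") for m in subset(xml_text, path)}


if __name__ == "__main__":
    # Only the metrics of one host in one cluster:
    print(metric_values(SAMPLE, "./CLUSTER[@NAME='web']/HOST[@NAME='n1']/METRIC"))
    # Every host in the feed, whatever cluster it belongs to:
    print([h.get("NAME") for h in subset(SAMPLE, ".//HOST")])
```

A real gmetad would want to answer the query without building the full
document first; this only shows the shape of the queries.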


Federico




Re: [Ganglia-developers] Question about Ganglia web front end

2002-12-05 Thread Steven Wagner
It's also worth noting that there's no particular reason to avoid 
developing other front-ends to Ganglia.  If you'd rather build one your way 
using a particular technology or library that better suits your needs, the 
existing architecture makes it pretty easy for you to do so.  Almost as if 
it was designed that way. ;)


I am not exactly a charter member of the PHP fanclub myself.  We use Python 
a lot here internally, but I don't know how I'd feel about using mod_* for 
something that could be processing a very large number of nodes.


But if we're going to talk about what language/platform combo would be best 
for a Ganglia front-end app, in purely abstract terms (i.e. someone else 
writes it :P ), I think I'd want to see the whole thing implemented as a 
Java servlet.  There'd be all kinds of speed benefits just from the 
decreased overhead for each request, not to mention decoupling the XML/RRD 
parsing and page rendering.  In fact, it might even make sense to write the 
metadaemon as child threads of the front-end (parsing the XML into an 
internal data structure, and updating the RRD files every time the values 
change - this way you're not parsing all the XML every time you hit Reload...).
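
The "don't re-parse on every Reload" part can be sketched independently of
the servlet idea.  Below is a minimal Python version (not Java, and not
anything in the current code): a cache that keeps one parsed tree in memory
and refreshes it at most every ttl seconds.  ParsedFeedCache, fetch and ttl
are invented names; fetch is any callable that returns the XML feed as a
string.  It is not thread-safe as written.

```python
import time
import xml.etree.ElementTree as ET


class ParsedFeedCache:
    """Keep one parsed tree in memory and refresh it at most every `ttl` seconds."""

    def __init__(self, fetch, ttl=15.0, clock=time.monotonic):
        self._fetch = fetch    # callable returning the XML feed as a string
        self._ttl = ttl        # minimum number of seconds between re-parses
        self._clock = clock    # injectable so the cache can be tested without sleeping
        self._tree = None
        self._stamp = None

    def tree(self):
        now = self._clock()
        # Parse on first use, or when the cached tree is older than ttl.
        if self._tree is None or now - self._stamp >= self._ttl:
            self._tree = ET.fromstring(self._fetch())
            self._stamp = now
        return self._tree
```

Every page view then calls cache.tree() and works on the in-memory structure,
so the cost of parsing is paid once per refresh interval rather than once per
request.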


On a (very) slightly less pie-in-the-sky note...

Has anyone considered the utility of being able to select a subset of 
cluster/host/metric data from the metadaemon?  In other words, you send a 
command that limits display to values that have been updated in the last 60 
seconds, or a particular cluster or host name.  Any speed hit from building 
the new response would probably be made up for in the front-end.  If you're 
viewing host details for one host out of a 900-host combined metadaemon XML 
feed, and the front end only parses 30 metrics instead of 2700... well...
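
Roughly what I have in mind, as a Python sketch of the selection logic only.
It filters an already-fetched feed on the client side, which is exactly what
the real thing would avoid, so treat it as a description of the semantics and
not of the protocol.  It assumes each METRIC carries a TN attribute holding
the seconds since its last update; select_metrics and the sample feed are
invented for illustration.

```python
import xml.etree.ElementTree as ET


def select_metrics(xml_text, host=None, max_age=None):
    """Return [(host, metric, value)] limited by host name and/or metric age.

    max_age is compared against each METRIC's TN attribute, assumed here to
    be the number of seconds since that metric was last updated.
    """
    rows = []
    for h in ET.fromstring(xml_text).iter("HOST"):
        if host is not None and h.get("NAME") != host:
            continue
        for m in h.findall("METRIC"):
            if max_age is not None and int(m.get("TN", "0")) > max_age:
                continue
            rows.append((h.get("NAME"), m.get("NAME"), m.get("VAL")))
    return rows


# Invented sample feed: one stale metric (TN=900) among fresh ones.
SAMPLE = """
<GANGLIA_XML>
 <CLUSTER NAME="c1">
  <HOST NAME="n1">
   <METRIC NAME="load_one" VAL="0.3" TN="12"/>
   <METRIC NAME="boottime" VAL="1038000000" TN="900"/>
  </HOST>
  <HOST NAME="n2">
   <METRIC NAME="load_one" VAL="1.7" TN="45"/>
  </HOST>
 </CLUSTER>
</GANGLIA_XML>
"""

if __name__ == "__main__":
    print(select_metrics(SAMPLE, host="n1"))    # everything for one host
    print(select_metrics(SAMPLE, max_age=60))   # only recently updated values
```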


It would mean rewriting the listen thread of gmetad (and, simultaneously, 
the front-end) to have a full-duplex conversation and there's some possible 
gotchas there... but what the heck, this is the developer's list.


Anyway.  Another great idea from the people who brought you beer milkshakes...

Federico Sacerdoti wrote:
Well there are a few reasons. I know only a cursory bit about xslt, 
however, so let me know if I'm off base on any of these.


We chose PHP over XSLT because:
- PHP is faster, and more mature.
- Can handle CGI variables which keep state between different HTML views.
- Can read form data given by user.
- Can read in user-defined local configuration files from disk
  (private_clusters, etc).
- Can call functions using local shell (for rrdtool graph, for example).

On the other hand, I think PHP is a cumbersome language. I have talked
about using mod_python, but the fact is, I would have to see a really
good reason for doing so, as it would take a lot of work, and perhaps end
up being slower.


Hope this helps answer your question.
Federico



On Thursday, December 5, 2002, at 11:53 AM, [EMAIL PROTECTED] wrote:

The Ganglia web frontend uses PHP to transform xml to html (I think). 
Why was that method chosen instead of using PHP to make calls to xslt 
scripts to do the transformation? Is there a belief that PHP is better 
than xsl for coding xml to html transformations? Does the Ganglia web 
front end include transformations that aren't easily expressed in xsl?


Jonathan



Federico

Rocks Cluster Group, Camp X-Ray, SDSC, San Diego
GPG Fingerprint: 3C5E 47E7 BDF8 C14E ED92  92BB BA86 B2E6 0390 8845









Re: [Ganglia-developers] Question about Ganglia web front end

2002-12-05 Thread Federico Sacerdoti
Well there are a few reasons. I know only a cursory bit about xslt, 
however, so let me know if I'm off base on any of these.


We chose PHP over XSLT because:
- PHP is faster, and more mature.
- Can handle CGI variables which keep state between different HTML views.
- Can read form data given by user.
- Can read in user-defined local configuration files from disk
  (private_clusters, etc).
- Can call functions using local shell (for rrdtool graph, for example).

On the other hand, I think PHP is a cumbersome language. I have talked
about using mod_python, but the fact is, I would have to see a really
good reason for doing so, as it would take a lot of work, and perhaps
end up being slower.


Hope this helps answer your question.
Federico



On Thursday, December 5, 2002, at 11:53 AM, [EMAIL PROTECTED] 
wrote:


The Ganglia web frontend uses PHP to transform xml to html (I think). 
Why was that method chosen instead of using PHP to make calls to xslt 
scripts to do the transformation? Is there a belief that PHP is better 
than xsl for coding xml to html transformations? Does the Ganglia web 
front end include transformations that aren't easily expressed in xsl?


Jonathan



Federico

Rocks Cluster Group, Camp X-Ray, SDSC, San Diego
GPG Fingerprint: 3C5E 47E7 BDF8 C14E ED92  92BB BA86 B2E6 0390 8845