from:"Brad Nicholes"

Re: [Ganglia-developers] Bugfix not summited because bugzilla not accessable from my area

2013-09-26 Thread Brad Nicholes

There are links to bugzilla on the ganglia.info web site.  For example, the 
last sentence on the ganglia.info support tab has a link. 
(http://ganglia.info/?page_id=68)

Brad

>>> Bernard Li  9/25/2013 12:55 PM >>>
Hello:

We are no longer using Bugzilla; please submit bugs to GitHub Issues:

https://github.com/ganglia/ganglia-web/issues 

Just curious though, where did you see the link to Bugzilla?

Thanks!

Bernard


On Wed, Sep 25, 2013 at 10:07 AM, 田甲  wrote:

> Will someone help me submit this bugfix? Thanks.
>
> Module: ganglia-web
> File: stacked.php
> Description: This bug happens when showing a stacked graph of custom metrics
> on a few selected hosts. In this case, count($hosts) is treated as the count
> of valid hosts, which results in an excessively large value of $cx, leading to
> the blank color 0xFF.
> Diff:
>
> $ diff stacked.php stacked.php2
> 98d97
> < $i = 0;
> 100,101c99
> < $cx = $i/(1+count($hosts));
> < $i++;
> ---
> > $cx = $index/(1+count($hosts));
>
> --
> Regards,
> Hydrogenesis
> oxygenera...@gmail.com 
>
>
> --
> October Webinars: Code for Performance
> Free Intel webinars can help you accelerate application performance.
> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
> from
> the latest Intel processors and coprocessors. See abstracts and register >
> http://pubads.g.doubleclick.net/gampad/clk?id=60133471&iu=/4140/ostg.clktrk 
> ___
> Ganglia-developers mailing list
> Ganglia-developers@lists.sourceforge.net 
> https://lists.sourceforge.net/lists/listinfo/ganglia-developers 
>
>
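
The effect described in the bug report can be reproduced outside PHP. The sketch below is a Python restatement, not the actual stacked.php code. It assumes the color fraction is computed as position / (1 + number of selected hosts), and that one variant takes the position from the host's index in a larger host list ($index in the diff) while the other uses a counter that only advances for selected hosts ($i in the diff). The function names and host names are hypothetical.

```python
def color_fractions_by_full_index(all_hosts, selected):
    """Numerator is the host's position in the full list (assumed $index variant)."""
    n = len(selected)
    return [idx / (1 + n) for idx, host in enumerate(all_hosts) if host in selected]


def color_fractions_by_running_counter(all_hosts, selected):
    """Numerator is a counter that only advances for selected hosts (assumed $i variant)."""
    n = len(selected)
    fractions = []
    i = 0
    for host in all_hosts:
        if host in selected:
            fractions.append(i / (1 + n))
            i += 1
    return fractions


# Hypothetical data: ten hosts, two of them selected for the stacked graph.
all_hosts = ["node%02d" % k for k in range(10)]
selected = {"node07", "node09"}

print(color_fractions_by_full_index(all_hosts, selected))
print(color_fractions_by_running_counter(all_hosts, selected))
```

With the full-list index the fractions exceed 1 as soon as the selected hosts sit late in the list, which matches the "overwhelming value of $cx" in the report. With the running counter they stay below 1.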


Re: [Ganglia-developers] Trac Wiki, Bugzilla and GitHub

2012-05-15 Thread Brad Nicholes

I'm not sure that just abandoning the issues in Bugzilla is a good idea without 
at least trying to follow up with the person who submitted the issue.  Some of 
the issues may still be valid and we certainly don't want to abandon those.  
How much effort would it be to try to either validate or follow up on the open 
issues?  

OTOH, I guess leaving bugzilla as read-only means that the issues will still be 
there and not necessarily be lost.  We could close out each ticket in bugzilla 
with a note to the submitter that says that if the bug is still an issue then 
resubmit the bug to GitHub Issues.  I guess what I am saying is that I don't 
have a strong opinion either way.

Brad

>>> On 5/14/2012 at 11:37 AM, Bernard Li wrote:
> I spoke with Vladimir briefly on IRC and he recommends that we just
> move to GitHub Issues, reason being it works better with the GitHub
> workflow (as Alex Dean also mentioned in his email).
> 
> I am okay with this, as long as we take the effort to go through
> bugzilla.ganglia.info and close out obsolete tickets and move all the
> relevant open ones to GitHub Issues.  We can leave the old bugs in
> Bugzilla for archival purposes and in read-only mode.
> 
> Another option, which Vladimir suggested, is to just forget about the old
> tickets in Bugzilla and start fresh in GitHub Issues.
> 
> I am leaning towards option 1 -- what do you guys think?
> 
> Thanks,
> 
> Bernard
> 
> On Sat, May 12, 2012 at 2:12 AM, Daniel Pocock  wrote:
>>
>>
>> On 12/05/12 00:44, Bernard Li wrote:
>>> Hi Daniel:
>>>
>>> On Fri, May 11, 2012 at 3:08 AM, Daniel Pocock  wrote:
>>>
>>>> If I host it, it would purely be on a voluntary basis, so I would be
>>>> hoping for upstream and/or Debian to be providing convenient packages
>>>> and security updates.  Although I am quite capable of installing it
>>>> manually, time spent maintaining such an install of bugzilla would cut
>>>> into time spent maintaining any other open source packages I contribute to
>>>
>>> Thanks to Ben Hartshorne, I was able to find this:
>>>
>>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=638705 
>>>
>>> So yeah, bugzilla is temporarily removed from Debian.  However, it's
>>
>> Yes, that was the same link I posted - it doesn't say temporary or
>> permanent, it just says they need at least 2 people willing to support
>> the package in some sense.  It also suggests that the way upstream
>> distributes the tarball makes it necessary to do a lot of patching, that
>> deters people from maintaining a package.
>>
>>> still available in EPEL:
>>>
>>> http://dl.fedoraproject.org/pub/epel/6/x86_64/ 
>>>
>>> Is this really an issue?
>>
>> Yes, definitely, because if something like that is publicly accessible,
>> it needs security updates.  Debian and RHEL often put out security
>> updates for supported packages within a matter of hours (much faster
>> than the non-Linux platform vendor)
>>
>> The reason for using Debian is that I already have a VM running for
>> reSIProcate, it could be shared for the Ganglia project, used to
>> bootstrap releases, etc.  The physical server is under a commercial
>> hosting contract in Telehouse, one of London's most well connected data
>> centres:
>> http://en.wikipedia.org/wiki/Telehouse_Europe#London 
> 





Re: [Ganglia-developers] Trac Wiki, Bugzilla and GitHub

2012-05-10 Thread Brad Nicholes

+1 for sticking with bugzilla.  If we can move it to somewhere that is more 
maintainable, that would be better.  But I would hate to just abandon 
everything there.

Brad


>>> On 5/10/2012 at 10:01 AM, Bernard Li wrote:
> Hi Daniel:
> 
> Just for the record, I actually like Bugzilla and would like to keep using
> it.  However, because we do not have direct ownership of the server (it is
> being hosted at UC Berkeley) it makes it hard to maintain.  For instance it
> has currently been down for at least two days and so far I have not been
> able to get ahold of the admins who could tell us what's going on.  This is
> not the first time it has happened.
> 
> So either we move the Bugzilla instance to somewhere we have more control
> or we move them to GitHub Issues, it just can't stay where it is.
> 
> I agree however that there are probably more bugs in Bugzilla than GitHub
> Issues so perhaps moving from GitHub Issues -> Bugzilla and disabling
> GitHub Issues is the way to go.  But I am also under the impression some
> folks like GitHub Issues better.
> 
> Anybody else have any comments?
> 
> Thanks!
> 
> Bernard
> 
> On Thursday, May 10, 2012, Daniel Pocock wrote:
> 
>>
>> > This is our request for help.  We need someone to take charge of
>> > managing our documentation making sure they are up to date and in one
>> > canonical location.  We'll also need someone to help with importing the
>> > bugs in Bugzilla to GitHub Issues.
>>
>>
>> We definitely have to abandon bugzilla?
>>
>> Can we just turn off the issue tracker in github to avoid people opening
>> issues in the wrong place?
>>
>>
>>





Re: [Ganglia-developers] 3.3.3 tagged

2012-03-21 Thread Brad Nicholes

>>> On 3/21/2012 at 12:48 PM, Vladimir Vuksan wrote:
> I agree with Alex. We are churning through too many versions. I would 
> personally be OK with overriding the existing 3.3.2 tag and going with 
> 3.3.2 instead of 3.3.4.
> 

The problem with reusing a version number is that you end up with different 
snapshots of the code under the same version number in the wild.  Whether or 
not a specific snapshot of the code has actually been designated as a release 
doesn't matter; the fact is that the version-stamped snapshot has been posted 
and made available to the public.  All it takes is for one user or distributor to 
get a hold of the wrong snapshot and the Ganglia project ends up wasting time 
answering and debunking bugs that don't actually exist in the actual release.  
It is much easier to explain to a user that a specific version number was never 
released rather than tell them that the 3.3.2 version of their code is the 
wrong one.  Skipping version numbers is actually a common practice.  The Apache 
HTTPD project is actually doing the same thing as we speak due to a bad release 
candidate.

Brad


[Ganglia-developers] Other project related stuff (was:Re: releasing 3.3.2 today?)

2012-03-21 Thread Brad Nicholes

All,
My comments on this subject probably have more to do with the overall way 
that the Ganglia project works rather than just versioning.  Right now we have 
a wiki page that is hosted at SourceForge 
(http://sourceforge.net/apps/trac/ganglia) which describes how the Ganglia 
project used to work when the repository was SVN.  Now that things have moved 
to Github and some of the people who were running the project at the time are 
not as involved anymore (namely me :), it seems as though things are getting a 
little confusing.  For example, the versioning rules for Ganglia releases are 
also described on the sourceforge wiki 
(http://sourceforge.net/apps/trac/ganglia/wiki/how_project_works).  Although in 
some ways it might be similar to what has been discussed recently, it is 
different.  People who are trying to figure out how the Ganglia project works 
will probably run across the older wiki page first (since it is still linked to 
Ganglia.info) and then be confused by how versioning is actually handled now.  
Also, since the procedures and policies on the older wiki page were modeled 
around SVN, the Git way of doing things obviously makes a lot of the older wiki 
information obsolete.

   I think it is important that the Ganglia project gits on the same (wiki) 
page with regards to how the project works and what information the project 
wants to provide to new users and developers (all puns intended ;-) .  
Especially in light of the fact that many of us are working on the Ganglia 
Monitoring book.  Hopefully once the book is released, it will generate more 
interest in the Ganglia project and it would probably look bad if we had two 
different wiki pages providing conflicting information.  Since I haven't been 
as involved with the project especially since the source code was moved to Git, 
is there someone who could review the SourceForge wiki page and determine what 
information is still valid and which isn't?  At that point we could decide 
whether to just update the SourceForge wiki or move it all to Github wiki.

Comments?

Brad 



Re: [Ganglia-developers] "binchols" in GitHub

2012-03-15 Thread Brad Nicholes

>>> On 3/13/2012 at 11:42 PM, Bernard Li wrote:
> Not sure who added "bnichols" to GitHub, but he's not Brad Nicholes:
> 
> https://github.com/bnichols 
> 
> I've revoked his membership.
> 
> Brad, could you please confirm your id on GitHub?  It's "bnicholes", right?
> 
> https://github.com/bnicholes 
> 


Yes, this one is mine.  Thanks for catching that.

Brad




Re: [Ganglia-developers] Ganglia REST interface (was:Re: Gauging interest in writing a Ganglia eBook)

2011-12-05 Thread Brad Nicholes

>>> On 12/5/2011 at 12:09 PM, in message <20111205190901.go17...@mail.nih.gov>,
Jesse Becker  wrote:
> On Mon, Dec 05, 2011 at 01:17:19PM -0500, Brad Nicholes wrote:
>>All,
>> I just wanted to get some feedback on how much interest there would be
>> for a REST interface for Ganglia.  I spent a few days putting together a POC
>> of a REST interface and was able to get something that implements the
>> following REST URLs:
>>
>>/clusters
>>/clusters/{cid}
>>/clusters/{cid}/hosts
>>/hosts
>>/hosts/{hid}
>>/hosts/{hid}/metrics
>>/hosts/{hid}/metrics/{mid}
>>/hosts/{hid}/metrics/{mid}/data
>>/hosts/{hid}/metrics/{mid}/graph
>>/hosts/{hid}/metrics/{mid}/info
> 
> For [chm]id, what is considered valid?  Are the "common" names of these
> valid, or do we have to use some sort of unique ID instead?
> 

As it stands right now, the identifier is just the common name found in the 
XML.  We would need to do something else in gmond like we have talked about 
before (something like a GUID) in order to change this.  But since the URL 
itself qualifies the resource uniquely, common names shouldn't be a problem.  

> It's certainly a good idea I think, and could be used to simplify a lot
> of the frontend UI code.
> 
> For things like '/hosts/{hid}/metrics/{mid}/graph', how would various
> graphing options be passed?

I have taken a very simplistic approach for now.  A graphing URL would look 
something like

/hosts/foo.org/metrics/cpu_user/graph?from=now-2day&to=now-1day&title="cpu 
user"&vlabel="percentage"&arealabel="cpu"

Obviously there are many more graphing options than these that are defined in 
rrdtool.  We would just need to figure out the best way to expose them.  
Options like CDEFs and VDEFs might be interesting.  Then there are the stacked 
graphs.  I'm not sure how to handle that.  But if it can be expressed on a 
command line, we should be able to figure out how to express it in REST.

> 
> For /hosts/{hid}/metrics/{mid}/data, could that be expanded to return
> LAST, MIN,MAX, AVERAGE (etc) values as well?
> 
> 

Absolutely, it would just be a matter of defining more query parameters.

Brad
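
The example graph URL in the reply above contains spaces and quotes, which would need percent-encoding on the wire. A minimal sketch of assembling such a URL follows, using Python's standard urllib.parse. The path layout and the parameter names (from, to, title, vlabel, arealabel) come from the message; the helper name graph_url is hypothetical and is not part of the POC.

```python
from urllib.parse import quote, urlencode


def graph_url(host, metric, options):
    """Build a graph URL for one metric; `options` maps query names to values."""
    path = "/hosts/{}/metrics/{}/graph".format(
        quote(host, safe=""), quote(metric, safe="")
    )
    query = urlencode(options)  # percent-encodes values; spaces become '+'
    return path + "?" + query if query else path


# "from" is a Python keyword, so the options are passed as a dict.
url = graph_url(
    "foo.org",
    "cpu_user",
    {
        "from": "now-2day",
        "to": "now-1day",
        "title": "cpu user",
        "vlabel": "percentage",
        "arealabel": "cpu",
    },
)
print(url)
```

Passing the options as a plain mapping keeps the door open for further rrdtool options without changing the helper.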


[Ganglia-developers] Ganglia REST interface (was:Re: Gauging interest in writing a Ganglia eBook)

2011-12-05 Thread Brad Nicholes

All,
 I just wanted to get some feedback on how much interest there would be for 
a REST interface for Ganglia.  I spent a few days putting together a POC of a 
REST interface and was able to get something that implements the following REST 
URLs:

/clusters
/clusters/{cid}
/clusters/{cid}/hosts
/hosts
/hosts/{hid}
/hosts/{hid}/metrics
/hosts/{hid}/metrics/{mid}
/hosts/{hid}/metrics/{mid}/data
/hosts/{hid}/metrics/{mid}/graph
/hosts/{hid}/metrics/{mid}/info 

/clusters/... and /hosts/... pull data directly from the XML produced by 
gmetad.  .../data, .../graph, .../info pull data or graphs directly from the 
rrd's through rrdtool.  .../graph actually produces an rrdtool graph for the 
specified metric given some query params that affect the attributes of the 
graph.  I would show you all a live demo but I don't have access to a web 
server outside our firewall.  Before I can contribute the code, I need to get 
permission from my employer first so before going to that trouble, I just 
wanted to see if I was headed in the right direction.  

Comments? (On at least the little bit of information I gave you :)

Brad
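
The URL list and the split between gmetad XML and rrdtool described above can be pictured as a small routing table. This is only a sketch under assumptions: the POC code was not posted, so the regular expressions, the backend labels "gmetad-xml" and "rrdtool", and the function name resolve are all hypothetical.

```python
import re

SEG = r"[^/]+"  # one path segment, such as a cluster, host or metric name

# (pattern, backend) pairs; the backend labels are illustrative only.
ROUTES = [
    (r"/clusters", "gmetad-xml"),
    (rf"/clusters/(?P<cid>{SEG})", "gmetad-xml"),
    (rf"/clusters/(?P<cid>{SEG})/hosts", "gmetad-xml"),
    (r"/hosts", "gmetad-xml"),
    (rf"/hosts/(?P<hid>{SEG})", "gmetad-xml"),
    (rf"/hosts/(?P<hid>{SEG})/metrics", "gmetad-xml"),
    (rf"/hosts/(?P<hid>{SEG})/metrics/(?P<mid>{SEG})", "gmetad-xml"),
    (rf"/hosts/(?P<hid>{SEG})/metrics/(?P<mid>{SEG})/(?P<view>data|graph|info)", "rrdtool"),
]


def resolve(path):
    """Return (backend, path parameters) for a request path, or None if unknown."""
    for pattern, backend in ROUTES:
        match = re.fullmatch(pattern, path)
        if match:
            return backend, match.groupdict()
    return None


print(resolve("/clusters/web/hosts"))
print(resolve("/hosts/foo.org/metrics/cpu_user/graph"))
```

Because the identifiers are the common names from the XML, a host or metric name containing a slash would not match here; that is a limitation of this sketch.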



>>> On 12/2/2011 at 5:31 PM, Matt Massie wrote:
> Brad-
> 
> Can you open a new thread on the developer's list?  I think there's going
> to be quite a bit of interest in a REST interface to Ganglia.  It would be
> really useful to have.  I know I've been tempted to write one myself.
> 
> -Matt
> 
> 
> On Fri, Dec 2, 2011 at 4:21 PM, Vladimir Vuksan  wrote:
> 
>> I am sure lots of people would appreciate REST interface to Ganglia.
>> Myself and Jeff Buchbinder have been talking on how we could implement
>> it but if you already have it completed that would be an awesome
>> addition ;-).
>>
>> Vladimir
>>
>> On 02.12.2011 10:45, Brad Nicholes wrote:
>> > Hey Matt,
>> > How are you?  It's been a while.  I know I haven't been the biggest
>> > contributor to the Ganglia project lately but I still monitor the
>> > mailing lists and this book sounds like a great idea.  Count me in
>> > anywhere I can help.
>> >
>> > On a slightly different note:
>> >
>> > I have managed to carve out a little time over the past few weeks to
>> > get back into a little Ganglia development.  Since we are gauging
>> > interest, would anybody be interested in a REST interface for
>> > Ganglia?
>> > I have worked up a POC that allows a user to query metrics from
>> > gmetad through REST as well as pull data and graphs directly from the
>> > RRD files.  I still have to get permission from my employer before I
>> > can contribute the REST code to the Ganglia project, but before I go
>> > to that effort I just wanted to see if this is something that the
>> > Ganglia community would be interested in.
>> >
>> > Brad
Re: [Ganglia-developers] Gauging interest in writing a Ganglia eBook

2011-12-02 Thread Brad Nicholes

Hey Matt,
How are you?  It's been a while.  I know I haven't been the biggest contributor 
to the Ganglia project lately but I still monitor the mailing lists and this 
book sounds like a great idea.  Count me in anywhere I can help.  

On a slightly different note:

I have managed to carve out a little time over the past few weeks to get back 
into a little Ganglia development.  Since we are gauging interest, would 
anybody be interested in a REST interface for Ganglia?  I have worked up a POC 
that allows a user to query metrics from gmetad through REST as well as pull 
data and graphs directly from the RRD files.  I still have to get permission 
from my employer before I can contribute the REST code to the Ganglia project, 
but before I go to that effort I just wanted to see if this is something that 
the Ganglia community would be interested in.

Brad



>>> On 12/1/2011 at 12:31 PM, Matt Massie wrote:
> There's an O'Reilly editor who's interested in publishing a ~50-page eBook
> on ganglia.
> 
> I have no doubt the ganglia community would benefit from a book covering
> topics like:
> 
>    - Ganglia's components and overall architecture
>    - Typical deployment configurations including simple steps for verifying
>      an installation (e.g. unicast/multicast, single cluster/multiple
>      distributed clusters/datacenter)
>    - Navigating and using the new web interface
>    - Tips for extending ganglia's functionality (e.g. gmetric, modules)
>    - Common integration points (e.g. Hadoop metrics, Nagios)
>    - A simple step-by-step checklist for debugging common ganglia issues
>      with pointers to our web site, mailing lists, irc channel, etc.
>    - Supported platforms and core metrics
>    - Scaling to clusters > 1000 nodes
> 
> These are just ideas off the top of my head and not meant to be final or
> comprehensive but meant to provide a list for discussion.  Of course, let
> me know if there are topics the community would like to know more (or less)
> about.  The purpose of the book is to serve as a first-read book for people
> new to ganglia.  Keep in mind, for much of the book, we won't be starting
> from scratch.  We already have a good amount of documentation that just
> needs to be organized and edited.
> 
> I'll be happy to contribute time to make this eBook a reality; however, I
> want the book authors to be the leaders and experts in the ganglia
> community.  I think it best we divide and conquer and write the book as a
> team.  Who is interested in helping write the book?
> 
> -Matt





Re: [Ganglia-developers] AUTHORS file

2011-02-23 Thread Brad Nicholes

>>> On 2/22/2011 at 5:50 PM, Bernard Li wrote:
> Hi all:
> 
> I'd like to propose that we replace AUTHORS file with the following 
> contents:
> 
> --- cut ---
> Ganglia Development Team 
> 
> For a full list of current/past developers and contributors, please
> see: http://ganglia.info/?page_id=325 
> --- cut ---
> 
> Thoughts?
> 

+1



Re: [Ganglia-developers] dmax for python and C gmond modules

2011-02-01 Thread Brad Nicholes

>>> On 2/1/2011 at 12:02 PM, Bernard Li wrote:
> Hi all:
> 
> When I was writing a Python module for monitoring GPU metrics, I
> noticed that the interface does not provide a way to set dmax.
> According to the gmetric man page, dmax is:
> 
>        -d, --dmax=INT
>               The lifetime in seconds of this metric  (default=*0*)
> 
> So I was wondering why this isn't available for the Python or C module
> interfaces.
> 
> Right now, if I had added a new metric via the interfaces and no
> longer want it, I would have to:
> 
> 1) Comment it out in the conf
> 2) Restart the gmond where the module originates from
> 3) Restart the collector gmond
> 4) Restart gmetad
> 

I would probably have to go back and figure out what I was thinking at the 
time, but I vaguely recall that dmax was hardcoded for all of the standard 
metrics in the 3.0 version of gmond.  So at the time I was probably thinking 
that exposing dmax just wasn't necessary.  That was probably a wrong 
assumption.  Thinking back now, it would have made sense for dmax to be 
hardcoded in 3.0 because there was no way to add or remove metrics in that 
version.  But in the 3.1 version where you can do it, dmax shouldn't have been 
hardcoded and should have been exposed.  

Brad
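
To make the role of dmax in this exchange concrete, here is a minimal sketch of lifetime-based expiry. It is not gmond code. It assumes that a metric is dropped once no update has arrived for more than dmax seconds and that dmax = 0 means the metric is never dropped, which is how the gmetric man page text quoted above is usually read. The function names and the sample metric names are hypothetical.

```python
def is_expired(last_update, now, dmax):
    """True if the metric has outlived its dmax; dmax == 0 means 'never expire'."""
    if dmax == 0:
        return False
    return (now - last_update) > dmax


def prune(metrics, now):
    """Keep only metrics that are still alive.

    `metrics` maps a metric name to a (last_update, dmax) tuple, both in seconds.
    """
    return {
        name: (last_update, dmax)
        for name, (last_update, dmax) in metrics.items()
        if not is_expired(last_update, now, dmax)
    }


metrics = {
    "gpu_temp": (100, 60),   # module metric with a 60 second lifetime
    "cpu_user": (100, 0),    # dmax of 0: stays until the daemons restart
}
print(prune(metrics, now=130))
print(prune(metrics, now=500))
```

Under this reading, a module metric with a non-zero dmax would age out on its own once the module stops reporting it, instead of requiring the restart sequence listed in the question.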


Re: [Ganglia-developers] send_metadata_interval

2011-01-11 Thread Brad Nicholes

>>> On 1/10/2011 at 4:52 PM, Bernard Li wrote:
> Hi Brad:
> 
> Thanks for your reply.
> 
> On Mon, Jan 10, 2011 at 8:06 AM, Brad Nicholes  wrote:
> 
>> The purpose of setting the send_metadata_interval to 0 by default was to
>> avoid unnecessary traffic for our default configuration of multicast.
>> Setting the directive to anything other than 0 will cause each gmond to start
>> sending all of its metric metadata on that interval.  If you are going to set
>> it by default, IMO 30 seconds is too low.  The problem is that people only
>> notice this in the first few minutes after restarting a gmond.  They expect
>> metrics to start showing up immediately.  After the gmond node finally does
>> send its metadata, rebroadcasting the metadata at any interval is just
>> consuming unnecessary bandwidth on the network, especially in a multicast
>> environment where it isn't needed at all.  Also consider that the more gmond
>> nodes you have, the more traffic you are going to put on the network, where 99%
>> of the time the extra traffic is totally unnecessary.
> 
> I have a perhaps naive question.  It sounds like
> send_metadata_interval is only relevant to unicast configuration, so
> why is multicast affected as well?  How difficult of a code change
> would it be if we make the send_metadata_interval directive to only
> affect unicast?
> 

We could add code to gmond to always disable resending metadata based on an 
interval.  But then that is what the default value of 0 was doing.  


> Also multicast is the default configuration due to historic reasons
> but not because it is more common.  It is however easier to set up if
> your environment supports it.  Is it time for us to evaluate whether
> we should switch to unicast as the default?  And if so how?  What is
> the actual spread between unicast and multicast users?  If it turns
> out that the majority of our (new) users are using unicast, should we
> spend more time/effort making it easier for them to use Ganglia?
> 

Actually I think this is a good idea.  In my experience, unicast seems to be 
more the norm rather than the exception now.  If we were to make unicast the 
default, then that would make the suggestion above more relevant.  We would 
probably want to put something in the code to automatically disable the send 
metadata for multicast.


>> 300 or 600 seconds is probably good enough for a default.  But no matter
>> what the default is, users still have to understand what that directive is
>> for and how to optimize it.  The value of send_metadata_interval will
>> probably be different for every installation when you take into consideration
>> the number of nodes, the number of metrics and any other network related
>> variables.
> 
> A couple more ideas came out of a brief brainstorming session on IRC
> between Vladimir, Jesse and myself:
> 
> 1) Collector gmond should request metadata from all gmonds when it has
> been freshly (re)started

This already happens in multicast mode.  Whenever a gmond node receives a 
metric packet for which it has no metadata, it automatically sends out a 
request on the channel for metadata.  The end result is that all gmond nodes 
are constantly resyncing themselves until all nodes in a cluster have a 
complete metadata picture.  However, the same cannot be done for unicast 
because, by definition, there is no two-way communication.  In order to make 
the same functionality work for unicast, we would have to introduce a new 
listen port on every gmond that would accept commands and respond to whatever 
they are.  Doing that opens up a security risk that would have to be dealt with 
correctly. 
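
The multicast resync behaviour described in this reply can be modelled in a few lines. The sketch below is a toy simulation, not gmond's wire protocol: the class name Gmond, its methods and the sample metric are hypothetical, and the shared Python list stands in for the multicast channel that every node both sends to and listens on.

```python
class Gmond:
    """Toy gmond node on a shared, multicast-like channel."""

    def __init__(self, name, channel):
        self.name = name
        self.channel = channel
        self.own = {}            # metric -> metadata for metrics this node emits
        self.known = {}          # (host, metric) -> metadata learned from peers
        self.requests_sent = 0
        channel.append(self)

    def _peers(self):
        return [node for node in self.channel if node is not self]

    def send_metric(self, metric, value):
        for peer in self._peers():
            peer.receive_metric(self, metric, value)

    def receive_metric(self, sender, metric, value):
        # A value for which no metadata is known triggers a metadata request.
        if (sender.name, metric) not in self.known:
            self.requests_sent += 1
            sender.receive_metadata_request(metric)

    def receive_metadata_request(self, metric):
        # The owner answers on the channel, so every listener gets resynced.
        for peer in self._peers():
            peer.known[(self.name, metric)] = self.own[metric]


channel = []
web = Gmond("web01", channel)
web.own["cpu_user"] = {"units": "%", "type": "float"}
collector = Gmond("collector", channel)  # freshly restarted: knows nothing yet

web.send_metric("cpu_user", 12.5)
web.send_metric("cpu_user", 13.0)
print(collector.requests_sent, collector.known)
```

The model only works because the collector can talk back on the same channel. In a unicast setup the receive_metadata_request path has no equivalent, which is the gap the reply points out.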

> 2) Add a configuration check for gmond so upon starting, if
> configuration is unicast-based, and send_metadata_interval is 0, warn
> the user to set it to a sane number

This would be a good idea no matter what else we do.  
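
Idea 2) amounts to a small startup check. A rough sketch follows, assuming the configuration has already been parsed into plain Python values. The dictionary layout and the function name are hypothetical; the keys host and mcast_join mirror the udp_send_channel attributes in gmond.conf, where host marks a unicast target and mcast_join a multicast group.

```python
def metadata_interval_warnings(send_channels, send_metadata_interval):
    """Return warning strings for a unicast setup that never resends metadata."""
    warnings = []
    unicast = [ch for ch in send_channels if "host" in ch and "mcast_join" not in ch]
    if unicast and send_metadata_interval == 0:
        targets = ", ".join(ch["host"] for ch in unicast)
        warnings.append(
            "unicast send channel(s) to %s but send_metadata_interval is 0; "
            "hosts may appear offline after a collector restart" % targets
        )
    return warnings


# Hypothetical parsed configurations.
unicast_conf = [{"host": "collector.example.org", "port": 8649}]
multicast_conf = [{"mcast_join": "239.2.11.71", "port": 8649}]

print(metadata_interval_warnings(unicast_conf, 0))
print(metadata_interval_warnings(unicast_conf, 300))
print(metadata_interval_warnings(multicast_conf, 0))
```

Only the first call produces a warning; the multicast case stays silent because resending metadata on an interval is not needed there.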

> 3) Find a middle ground of default send_metadata_interval which does
> not hurt new users in HPC space wanting to use unicast
> 
> 2) and 3) are workarounds which could be implemented relatively
> quickly, 1) maybe not so much.

agreed

Brad


Re: [Ganglia-developers] send_metadata_interval

2011-01-10 Thread Brad Nicholes

>>> On 1/7/2011 at 9:10 PM, Jesse Becker wrote:
> On Fri, Jan 7, 2011 at 15:25, Bernard Li  wrote:
>> Hi all:
>>
>> Since the release of Ganglia 3.1, we have introduced the new
>> configuration option send_metadata_interval in gmond.conf.  This is
>> set to 0 by default and the user must set this to a sane number if
>> using unicast otherwise if gmonds are restarted, hosts may appear to
>> be offline (this is documented in the release notes).  A bug has
>> already been filed:
>>
>> http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=242 
>>
>> We have recently had a lot of users hitting this issue, and Vladimir
>> recommends that we just set a sane number as the default and be done
>> with it, since we end up spending a lot of time on IRC/mailing-list to
>> solve the same problem over and over again.
>>
>> Since there have been some commits to the 3.1 branch since tagging
>> 3.1.7, I propose we just copy the 3.1.7 tag, update
>> send_metadata_interval in the configuration file and release that as 3.1.8.
>>
>> This is not the normal procedure for making a release, so I'd like to
>> get some feedback from other developers.
>>
>> BTW I am thinking of setting send_metadata_interval to 30 seconds.
>> Also, does anybody know if this setting affects multicast setups in
>> any way?
> 
> I think that it's fine to set this to a non-zero value, but I wonder
> if 30 seconds is too high.  I did a quick set of checking on the
> actual packets that are sent--and specifically the metadata packets.
> I haven't been able to really delve into the code to figure exactly
> what's going on (this part of the code isn't terribly transparent to
> me), but I *think* that they are really large--on the order of several
> KB when fully assembled, as compared to less than 100-120 bytes for a
> typical metric packet.  I think that size will increase with the
> number of metrics stored, since each one must be described in full XML
> each time.
> 
> The reason for the large size is that an entire XML description of the
> metrics appears to be sent each time.  Metadata packets also appear to
> go over TCP, not UDP.
> 
> My testing was pretty simple:
> 1) setup a gmond (from SVN, well after 3.1 came out) in unicast mode.
> 2) set 'send_metadata_interval' to 1
> 3) disable all modules, except for 'mod_core'
> 4) remove all collection groups.
> 5) start gmond, and run tcpdump.
> 
> On a large cluster, with lots of metrics per host, I can see problems
> if the metadata packets are sent too frequently.  I have hosts that
> send well over 300 metrics (lots of CPU cores makes for lots of
> metrics...).  Each of these need to be described in the metadata
> packets.
> 
> So I think that setting a non-zero default is fine.  But I think that
> something like 300 or 600 seconds would be preferable.
> 

The purpose of setting the send_metadata_interval to 0 by default was to avoid 
unnecessary traffic for our default configuration of multicast.  Setting the 
directive to anything other than 0 will cause each gmond to start sending all 
of its metric metadata on that interval.  If you are going to set it by 
default, IMO 30 seconds is too low.  The problem is that people only notice 
this in the first few minutes after restarting a gmond.  They expect metrics to 
start showing up immediately.  After the gmond node finally does send its 
metadata, rebroadcasting the metadata at any interval is just consuming 
unnecessary bandwidth on the network.  Especially in a multicast environment 
where it isn't needed at all.  Also consider that the more gmond nodes you have, 
the more traffic you are going to put on the network, and 99% of the time that 
extra traffic is totally unnecessary.

300 or 600 seconds is probably good enough for a default.  But no matter what 
the default is, users still have to understand what that directive is for and 
how to optimize it.  The value of send_metadata_interval will probably be 
different for every installation when you take into consideration the number of 
nodes, the number of metrics and any other network related variables.
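
A back-of-the-envelope sketch of how those variables interact.  The per-metric metadata size is a made-up placeholder rather than a measured value, and the function name is hypothetical; the point is only that the aggregate traffic scales with nodes times metrics and inversely with the interval.

```python
def metadata_bytes_per_second(nodes, metrics_per_node, bytes_per_metric, interval_s):
    """Rough aggregate metadata traffic if every node resends on each interval."""
    if interval_s <= 0:
        return 0.0  # an interval of 0 means no periodic resend at all
    return nodes * metrics_per_node * bytes_per_metric / interval_s


# Hypothetical cluster: 1000 nodes, 300 metrics each, 200 bytes of
# metadata per metric (an assumed figure, not taken from a packet capture).
for interval in (30, 300, 600):
    rate = metadata_bytes_per_second(1000, 300, 200, interval)
    print(interval, "s ->", rate, "bytes/s")
```

With these placeholder numbers, going from 30 to 300 seconds cuts the aggregate rate by a factor of ten, which is the trade-off each site has to weigh against how long a restarted gmond may stay without metadata.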

Brad



Re: [Ganglia-developers] Ganglia Web top-level project + versioning

2010-11-05 Thread Brad Nicholes

>>> On 11/4/2010 at 6:21 PM, Bernard Li wrote:
> Hi Brad:
> 
> [I've changed the subject line to be more reflective of the current 
> discussions]
> 
> On Thu, Nov 4, 2010 at 8:50 AM, Brad Nicholes  wrote:
> 
>> I'm not sure that we need to physically split the web frontend from the 
> backend as far as the Ganglia project goes.  IMO, why not just follow the 
> pattern that we already have in SVN under trunk.  Right now we have 
> trunk/monitor-core which includes everything.  Could we just create a new 
> directory under trunk called web-frontend and move everything that has to do 
> with the web frontend out of monitor-core and into web-frontend.  From that 
> point on, they could both be treated as separate projects with their own 
> release cycles without physically splitting the code into different 
> repositories.  Tagging and branches would also work the same way.
> 
> That's fine.
> 
> How about versioning?  Or am I thinking too much?  One potential issue
> is that ganglia-core would be at 4.0 and ganglia-web will be at 3.5 --
> this might cause confusion as to what combination is supported, or
> vice versa.
> 

As far as versioning goes, I think that ganglia-web would just follow its own 
version scheme.  The frontend might have to include some kind of check on the 
version of the backend to make sure that it is compatible.  I'm not sure how 
flexible the frontend could be, but since all it is doing is consuming XML, I 
am guessing that it could be fairly flexible when it comes to backward 
compatibility.  I am guessing that the most likely scenario is that a user 
would upgrade the frontend a lot more frequently than the backend.  So there 
probably wouldn't have to be much need to worry about an older frontend having 
to support a newer backend.  I think it would be a natural thing for a Ganglia 
user to automatically upgrade the frontend whenever the backend is upgraded.  
But they would probably upgrade the frontend routinely without a backend upgrade.

Anyway, yes I think you are thinking too much :-)  Documenting compatibility 
would probably be sufficient.  Of course we as the Ganglia developers, wouldn't 
be able to test every new release of the frontend with every previous release 
of the backend.  But like I said, since the frontend is just consuming XML, it 
should be flexible enough to handle backwards compatibility.  Also, given 
that the XML schema isn't expected to change, at least not drastically, within a 
major version of the backend, backward compatibility should be simple.
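
A minimal sketch of the kind of backend version check mentioned above.  It assumes the backend XML carries its version in a VERSION attribute on a GANGLIA_XML root element; the function names and the set of supported major versions are hypothetical.  The real frontend is PHP, so this Python only illustrates the logic.

```python
import re
import xml.etree.ElementTree as ET


def backend_version(xml_text):
    """Extract the numeric parts of the root element's VERSION attribute."""
    root = ET.fromstring(xml_text)
    digits = re.findall(r"\d+", root.get("VERSION", ""))
    return tuple(int(d) for d in digits)


def is_compatible(xml_text, supported_majors=(3,)):
    """True if the backend's major version is one the frontend knows about."""
    version = backend_version(xml_text)
    return bool(version) and version[0] in supported_majors


sample = '<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond"></GANGLIA_XML>'
print(backend_version(sample), is_compatible(sample))
```

Checking only the major version matches the expectation that the schema stays stable within a major release; a stricter frontend could compare the full tuple instead.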

Brad 


Re: [Ganglia-developers] [Ganglia-general] IRC chat on Ganglia Web Frontend re-write 10/13/2010 (Wed) 9-10am PDT

2010-11-04 Thread Brad Nicholes

I'm not sure that we need to physically split the web frontend from the backend 
as far as the Ganglia project goes.  IMO, why not just follow the pattern that 
we already have in SVN under trunk.  Right now we have trunk/monitor-core which 
includes everything.  Could we just create a new directory under trunk called 
web-frontend and move everything that has to do with the web frontend out of 
monitor-core and into web-frontend.  From that point on, they could both be 
treated as separate projects with their own release cycles without physically 
splitting the code into different repositories.  Tagging and branches would 
also work the same way.

The only purpose I see for splitting them into two different projects is to try 
to grow two different communities (ie. developers with rights to the web 
project who don't necessarily have rights to the monitor-core project and 
vice-versa).  Given the fact that we don't really have a large developer 
community, I'm not sure that it would be a good idea to split the community 
that we have.

Brad



>>> On 11/4/2010 at 1:15 AM, Bernard Li wrote:
> Hi all:
> 
> The other day we were talking on IRC regarding how to proceed with
> this "re-write" effort for the frontend.  In the beginning, I was
> gung-ho on this re-write from scratch, however, recently Vladimir has
> been hacking away adding new features to the existing code in trunk.
> You can get a taste of it here:
> 
> http://ec2-184-72-167-114.compute-1.amazonaws.com/ganglia-new/ 
> 
> Which got me to thinking...  is a re-write from scratch the best
> approach, or should we just try to keep extending what we have?
> 
> Another administrative issue that cropped up, is whether to split out
> Ganglia-Web as a separate project such that it doesn't need to follow
> the main Ganglia release cycle (since the frontend code is usually
> backward/forward compatible with Ganglia releases anyway).
> 
> My idea is to create a new project for the frontend, give it a new
> name and start with a new version.  With that, we can tell users that
> after Ganglia X, we will no longer be shipping the web component, use
> Y for that.
> 
> Another approach is to retain the Ganglia name, but say that after
> Ganglia 3.2, there are 2 separate projects, ganglia and ganglia-web,
> in which case ganglia-web will be on a different release cycle than
> ganglia.
> 
> Sounds confusing?  Yes it is! :)
> 
> I don't really care either way, as long as it causes the least
> confusion to the users -- feel free to offer Plan C.
> 
> Another plan I have in mind is after we create branch-3.2 from trunk,
> we remove the web component from the code base, in which case all
> future bug fixes to ganglia-web goes into that branch only, and we
> will move development to GitHub (just for the frontend).
> 
> Thanks!
> 
> Bernard
> 
> On Thu, Oct 21, 2010 at 11:29 PM, Bernard Li  wrote:
>> Hi all:
>>
>> Sorry for the delay in posting the log, but I have finally uploaded
>> it.  Thanks Jesse for logging:
>>
>> http://therealms.org/oss/ganglia/ganglia_frontend_rewrite_irc_101310.txt 
>>
>> I have left the log as is, I just filtered out people's hostnames and
>> stuff.  I chopped off at the end when we started discussing outside
>> the scope of the frontend re-write.
>>
>> I will try to summarize the log in the next few days, but if anybody
>> else who was there would like to take a stab at it, please feel free.
>>
>> I think Erik and Vladimir have been hard at work hacking at a Ganglia
>> installation on an AWS instance.  We will try to schedule another time
>> to sync up and discuss further (would a phone teleconference be better
>> this time, or should we stick with IRC)?
>>
>> It doesn't look like the hackathon would happen next month.  It might
>> become a virtual hackathon but I would really like to put all the
>> developers in a room, but anyway, we'll see.
>>
>> Thanks again for all who showed up, and for all the great discussions.
>>
>> Cheers,
>>
>> Bernard
>>
>> On Wed, Oct 13, 2010 at 11:52 AM, Jesse Becker  wrote:
>>> I have a log that I will try to clean up and post later today.
>>>
>>> On Wed, Oct 13, 2010 at 14:46, Dave Josephsen  wrote:
>>>> Hey all,
>>>>
>>>> Did anyone take minutes?  I wasn't able to attend but am interested in
>>>> hearing about the chat.
>>>>
>>>> Thanks
>>>>
>>>> -dave
>>>>
>>>> - Original Message -
>>>> From: "Bernard Li"
>>>> To: ganglia-developers@lists.sourceforge.net, "Ganglia"
>>>> Sent: Thursday, October 7, 2010 1:55:26 PM GMT -06:00 US/Canada Central
>>>> Subject: [Ganglia-general] IRC chat on Ganglia Web Frontend re-write
>>>> 10/13/2010 (Wed) 9-10am PDT
>>>>
>>>> Dear all:
>>>>
>>>> I've been talking to people on and off about doing a web frontend
>>>> re-write, in fact I have been thinking about it since almost three
>>>> years ago when I started the "wishlist" thread:
>>>>
>>>> http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg03070.html
>>>>
>>>> I've mana

Re: [Ganglia-developers] sFlow counters in Ganglia

2010-10-18 Thread Brad Nicholes

Apparently DNSSD is not working for me.  I don't have access to the DNS server 
so my guess is that #4 below is not set up properly.  I turned off DNSSD and 
instead manually added a collector which pointed to my Ganglia server running 
the enabled gmond.  After restarting hsflowd, I started to see packets showing 
up on the gmond server.  After fixing another gmetad configuration problem, my 
box running sflow showed up.  Very cool.
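
As a cross-check on step (8) in the checklist below, a small UDP listener can confirm that datagrams actually reach a bound socket on the gmond host, whereas tcpdump sees them before the firewall.  This is a sketch under assumptions: the helper names are hypothetical, and gmond would normally have to be stopped first because it already holds the sFlow port (6343).

```python
import socket


def open_listener(port, host="0.0.0.0"):
    """Bind a UDP socket on the given port (port 0 picks a free one)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def collect(sock, max_packets=3, timeout_s=60.0):
    """Return (sender_ip, size) for up to max_packets datagrams.

    timeout_s applies to each receive, not to the whole call.
    """
    seen = []
    sock.settimeout(timeout_s)
    try:
        while len(seen) < max_packets:
            data, (addr, _port) = sock.recvfrom(65535)
            seen.append((addr, len(data)))
    except socket.timeout:
        pass  # nothing (more) arrived within the timeout
    return seen


if __name__ == "__main__":
    listener = open_listener(6343)
    try:
        for sender, size in collect(listener):
            print("datagram from", sender, "size", size)
    finally:
        listener.close()
```

An empty result while tcpdump shows traffic on the same port would point at the firewall rather than at hsflowd.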

Brad



>>> On 10/18/2010 at 12:50 PM, in message
<4d8a91b2-20d9-45a2-aea8-1de79d656...@inmon.com>, Neil McKee
 wrote:
> It sounds like you are on the right track,  but here is a little hsflowd 
> troubleshooting checklist...
> 
> On the source (box running hsflowd):
> 
> (1).  Any error messages in /var/log/messages?
> (2).  Check /etc/hsflowd.conf,  is the collector set,  or is DNSSD=on?
> (3).  If using DNSSD, is the "search" setting correct in /etc/resolv.conf?
> (4).  If using DNSSD,  are the SRV and TXT records in the zone file on the 
> DNS server?
> (5).  You can run hsflowd at the shell prompt with debug logging:  "root> 
> hsflowd -dd"
> 
> On the destination (box running gmond):
> 
> (6).  sFlow enabled in gmond.conf?
> (7).  check firewall settings (e.g. "root> iptables --list")
> (8).  Watch for an sFlow packet arriving every 20 seconds or so:  "root> 
> /usr/sbin/tcpdump -n udp port 6343"
> (Just remember that tcpdump sees packets before they hit the firewall,  so 
> check (7) may still apply)
> 
> On the UI: 
> 
> Should look exactly the same as is if gmond were running on the source too.
> 
> Regards,
> Neil
> 
> 
> On Oct 18, 2010, at 10:46 AM, Bernard Li wrote:
> 
>> Hi Brad:
>> 
>> On Mon, Oct 18, 2010 at 9:55 AM, Brad Nicholes  wrote:
>> 
>>> I built gmond with the sflow patch and got it up and running.  Then I 
> downloaded hsflowd from the sourceforge project as described in the gmond 
> doc.  hsflowd built and installed as expected and everything seemed to be up 
> and running.  I also added the extra udp_recv_channel block to the gmond.conf 
> file.  But now after everything is up and running, I don't see anything 
> different in Ganglia.  The web front end is just showing the same monitored 
> computers as it did before with the same metrics. If I query gmond through 
> telnet, I am not seeing any new metrics or spoofed nodes.  What am I missing?
>>> 
>>> I also tried to trace the network traffic on one of the machines that is 
> running hsflowd for anything from port 6343.  I'm not seeing anything there 
> either even though the box says that hsflowd is running.  I currently have 
> hsflowd running on two different boxes.  One is a SLED 10 box and the other 
> is a SLES 10 box.
>> 
>> You will need 2 hosts to test this, one running gmond and the other
>> running hsflowd (don't run gmond on this host).
>> 
>> On gmond.conf, add the extra udp_recv_channel as you had done and on
>> the hsflowd.conf, follow these instructions:
>> 
>> http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=276#c5 
>> 
>> Then the hsflowd host should show up on the gmond XML stream as if
>> it's running gmond.
>> 
>> Cheers,
>> 
>> Bernard





Re: [Ganglia-developers] sFlow counters in Ganglia

2010-10-18 Thread Brad Nicholes

I built gmond with the sflow patch and got it up and running.  Then I 
downloaded hsflowd from the sourceforge project as described in the gmond doc.  
hsflowd built and installed as expected and everything seemed to be up and 
running.  I also added the extra udp_recv_channel block to the gmond.conf file. 
 But now after everything is up and running, I don't see anything different in 
Ganglia.  The web front end is just showing the same monitored computers as it 
did before with the same metrics. If I query gmond through telnet, I am not 
seeing any new metrics or spoofed nodes.  What am I missing?

I also tried to trace the network traffic on one of the machines that is 
running hsflowd for anything from port 6343.  I'm not seeing anything there 
either even though the box says that hsflowd is running.  I currently have 
hsflowd running on two different boxes.  One is a SLED 10 box and the other is 
a SLES 10 box.

Brad



>>> On 10/11/2010 at 4:33 PM, in message
<9dd4668f-8a92-484e-9aac-8bece1c2a...@inmon.com>, Neil McKee
 wrote:
> As suggested,  I moved the sFlow receiver into a new file "sflow.c" and 
> eliminated any C99 assumptions.   This time there is a "--disable-sflow" 
> configure option too:
> 
> http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=276 
> 
> In order for sflow.c to feed data directly into the repository,  I had to 
> expose 1 structure definition and 5 functions that were previously private to 
> gmond.c.  Hence the new .h file, "gmond_internal.h".
> 
> Neil
> 
> 
> On Oct 7, 2010, at 2:56 PM, Peter Phaal wrote:
> 
>> Brad,
>> 
>> Thanks for the feedback. My comments are in-line.
>> 
>> On Oct 7, 2010, at 12:27 PM, Brad Nicholes wrote:
>> 
>>> Sorry to jump into this thread so late but I thought that I would throw my 
>>> 2 
> cents in.  
>>> 
>>> I finally got a chance to take a look at the code.  I was able to compile 
>>> it 
> but ran into some C99 issues with variable declarations.  Once I got the code 
> to compile, I was able to take a closer look at what it was doing.  From what 
> I could tell, it looks like the sflow integration is based around reading XDR 
> packets from an sflow agent and turning them into gmond spoofing metrics.  My 
> first question after seeing this is why does this code have to be built into 
> gmond.c?  Why can't it just do the same thing in a module that would be 
> plugged into gmond?  
>>> 
>>> The reason why I ask this is because we went to a lot of work to pull all 
>>> of 
> the metric gathering out of gmond and into modules (including all of the 
> standard metrics).  Some of the main reasons for this is so that metric 
> gathering could be pluggable without having to affect the gmond code itself.  
> That way if a bug was ever found and fixed for a specific metrics, we 
> wouldn't have to re-release all of Ganglia just for one metric fix.  Also, 
> modules give the user the ability to customize each gmond agent to conform to 
> the specific needs of the node where gmond is running.  Regarding sflow, it 
> seems that in order to integrate the sflow metrics into the Ganglia 
> monitoring system, only a single gmond node needs to be configured to gather 
> the sflow metrics.  All of the other gmond agents can continue to be 
> configured and run as they were.  Given that, it would make more sense to 
> integrate sflow as a module that could be loaded under a single gmond agent 
> rather than replacing all the gmond agents or even upgrading just a single 
> agent.  It would also seem to follow the way that other metric modules and 
> spoofing modules have been implemented as well.
>> 
>> 
>> I am not very familiar with gmond modules, but it looks like they are 
> designed around a polling model and used to retrieve metrics from the server 
> that the particular instance of gmond is running on:
>> 1. a module is loaded in the modules section of the gmond.conf file and 
> registers a set of metrics it can provide
>> 2. metrics are then included in collection_group sections and polled at the 
> specified intervals
>> 
>> With sFlow, the counters are being pushed by remote servers. There may be 
> hundreds of sFlow agents sending XDR packets to the single gmond instance. 
> Our code acts as a gateway, translating the metrics from the remote hosts and 
> presenting them as if they had arrived in the form of Ganglia XDR datagrams 
> from remote gmond instances. This function needs to be part of the main 
> datagram processing loop. I don't see a way for a module to inject code into 
> the packet processing loop(?)
>> 
>> We do of course plan to limit the changes t

Re: [Ganglia-developers] 3.0.x release, make 3.1 maintenance branch, create 3.2 release branch

2010-10-08 Thread Brad Nicholes

>>> On 10/8/2010 at 1:31 AM, in message <20101008073146.gc8...@sajinet.com.pe>,
Carlo Marcelo Arenas Belon  wrote:
> On Wed, Oct 06, 2010 at 03:23:40PM -0700, Bernard Li wrote:
>> Hi all:
>> 
>> I'd like to request that we make one last 3.0.x bug-fix release and
>> EOL that branch.
> 
> almost everyone using EPEL packages in CentOS/RHEL is using 3.0, so
> EOLing that branch (as was done for 2.5, which until recently was
> all that Debian provided and is still what Novell distributes)
> would be IMHO a bad idea.
> 
>> This will make way for shifting 3.1 to maintenance
>> branch and creating a 3.2. release branch from trunk.
> 
> if all you want is to have a 3.2 release branch (what would be the
> main feature in it, though?) then just do so; why affect 3.0 or 3.1
> by that decision?
> 

I think that Bernard is basically stating the obvious and would just like to 
make the obvious official.  There haven't been any significant changes to the 
3.0.x branch in 2 years, nor has there been a release in that time.  The last 
release from the 3.0.x branch was done by Bernard, and I have a feeling that he 
isn't really interested in doing any more.  Since there really isn't anybody 
willing to maintain the 3.0.x branch and there are no release-worthy bug fixes 
or enhancements to it, the branch is effectively EOL anyway.

As far as what would be new in 3.2, the sflow stuff is looking really cool.  I 
would rather see it implemented as a module, but if it isn't and ends up in 
gmond itself, that may be reason enough to call it 3.2.

Brad


Re: [Ganglia-developers] sFlow counters in Ganglia

2010-10-08 Thread Brad Nicholes

>>> On 10/7/2010 at 3:56 PM, in message
<6b7a9ac5-465d-4311-ac85-fd42c219d...@inmon.com>, Peter Phaal
 wrote:
> Brad,
> 
> Thanks for the feedback. My comments are in-line.
> 
> On Oct 7, 2010, at 12:27 PM, Brad Nicholes wrote:
>  
>> Sorry to jump into this thread so late but I thought that I would throw my 2 
> cents in.  
>> 
>> I finally got a chance to take a look at the code.  I was able to compile it 
> but ran into some C99 issues with variable declarations.  Once I got the code 
> to compile, I was able to take a closer look at what it was doing.  From what 
> I could tell, it looks like the sflow integration is based around reading XDR 
> packets from an sflow agent and turning them into gmond spoofing metrics.  My 
> first question after seeing this is why does this code have to be built into 
> gmond.c?  Why can't it just do the same thing in a module that would be 
> plugged into gmond?  
>> 
>> The reason why I ask this is because we went to a lot of work to pull all of 
> the metric gathering out of gmond and into modules (including all of the 
> standard metrics).  Some of the main reasons for this is so that metric 
> gathering could be pluggable without having to affect the gmond code itself.  
> That way if a bug was ever found and fixed for a specific metrics, we 
> wouldn't have to re-release all of Ganglia just for one metric fix.  Also, 
> modules give the user the ability to customize each gmond agent to conform to 
> the specific needs of the node where gmond is running.  Regarding sflow, it 
> seems that in order to integrate the sflow metrics into the Ganglia 
> monitoring system, only a single gmond node needs to be configured to gather 
> the sflow metrics.  All of the other gmond agents can continue to be 
> configured and run as they were.  Given that, it would make more sense to 
> integrate sflow as a module that could be loaded under a single gmond agent 
> rather than replacing all the gmond agents or even upgrading just a single 
> agent.  It would also seem to follow the way that other metric modules and 
> spoofing modules have been implemented as well.
> 
> 
> I am not very familiar with gmond modules, but it looks like they are 
> designed around a polling model and used to retrieve metrics from the server 
> that the particular instance of gmond is running on:
> 1. a module is loaded in the modules section of the gmond.conf file and 
> registers a set of metrics it can provide
> 2. metrics are then included in collection_group sections and polled at the 
> specified intervals
> 
> With sFlow, the counters are being pushed by remote servers. There may be 
> hundreds of sFlow agents sending XDR packets to the single gmond instance. 
> Our code acts as a gateway, translating the metrics from the remote hosts and 
> presenting them as if they had arrived in the form of Ganglia XDR datagrams 
> from remote gmond instances. This function needs to be part of the main 
> datagram processing loop. I don't see a way for a module to inject code into 
> the packet processing loop(?)
> 

The way that I would envision an sflow module working would be similar to the 
spoofing example module that is currently checked into the Ganglia SVN 
repository.  The spoofing module can be found at 
http://ganglia.svn.sourceforge.net/viewvc/ganglia/trunk/monitor-core/gmond/python_modules/example/spfexample.py?revision=1895&view=markup
  .  Unfortunately it is a python module example rather than a C module but 
hopefully you can get the idea of what I am talking about from the code.  One 
application of this kind of spoofing module would be to load it under a gmond 
instance running on a VM host.  It would then query each VM running on the box 
and register a set of spoofed metrics for each VM.  From that point on, the 
module just reports the metrics for each spoofed VM and returns them as if 
gmond were running on each of the VMs.  I actually have another python module 
that does exactly that, but I haven't been able to release the source code for 
it yet.  You can also look at the modpython.c module to get an idea of how to 
do the spoofing in C code.  But then you guys have already worked with the 
spoofing code as part of the patch that you already did so you probably already 
know how that works.

Basically an sflow module would be loaded like any other module and a 
collection interval would be set in the configuration file.  In the sflow 
module itself, register a spoofed metric for each managed sflow monitored node. 
 How you get the list of nodes to register is up to you.  It could be from the 
gmond.conf file, some other configuration file or by listening to the sflow 
data packets themselves.  
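
A minimal sketch of that registration step as a Python gmond module.  The descriptor keys, including spoof_host and spoof_name, are assumptions modeled on the pattern in spfexample.py and should be checked against that file; the "hosts" parameter, the metric name and record_sample() are hypothetical.  record_sample() stands in for whatever receiver would decode the pushed sFlow packets, which is not shown here.

```python
import threading

_lock = threading.Lock()
_latest = {}  # (spoofed host, metric name) -> most recent pushed value


def record_sample(host, metric, value):
    """Hypothetical hook: a receiver would call this for each pushed counter."""
    with _lock:
        _latest[(host, metric)] = value


def _make_callback(host, metric):
    def callback(name):
        # gmond polls this on the collection interval; report the cached value.
        with _lock:
            return _latest.get((host, metric), 0)
    return callback


def metric_init(params):
    """Build one spoofed descriptor per host listed in params['hosts']."""
    descriptors = []
    # Assumed parameter format: "ip:hostname,ip:hostname"
    for entry in params.get("hosts", "").split(","):
        entry = entry.strip()
        if not entry:
            continue
        descriptors.append({
            "name": "sflow_load_one_" + entry.split(":")[-1],
            "call_back": _make_callback(entry, "load_one"),
            "time_max": 90,
            "value_type": "float",
            "units": "",
            "slope": "both",
            "format": "%f",
            "description": "One minute load (from sFlow)",
            "spoof_host": entry,        # assumed key, as in spfexample.py
            "spoof_name": "load_one",   # assumed key, as in spfexample.py
        })
    return descriptors


def metric_cleanup():
    pass
```

The cache is what bridges the two models: agents push at their own pace, and gmond polls the callbacks on its collection interval.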

The module would then start a thread that would read the XDR packets in exactly 
t

Re: [Ganglia-developers] sFlow counters in Ganglia

2010-10-07 Thread Brad Nicholes

>>> On 9/13/2010 at 3:07 PM, "Peter Phaal" wrote:
> We have started this project and the pieces seem to be falling into place
> nicely. We already have the first metrics showing up in the web interface.
> 
> The changes needed to implement sFlow support are contained entirely within
> the gmond.c file and are limited to the process_udp_recv_channel method.
> 
> Adding the following lines to the gmond.conf file enables sFlow support:
> /* sFlow channel */
> /* Note: 6343 is the IANA registered port for the sFlow protocol */
> udp_recv_channel {
>   port = 6343
> }
> 
> Our initial goal is to populate all the standard metrics from libmetrics.
> Once we have that working, we will send a patch containing the changes to
> gmond.c.
> 

Sorry to jump into this thread so late but I thought that I would throw my 2 
cents in.  

I finally got a chance to take a look at the code.  I was able to compile it 
but ran into some C99 issues with variable declarations.  Once I got the code 
to compile, I was able to take a closer look at what it was doing.  From what I 
could tell, it looks like the sflow integration is based around reading XDR 
packets from an sflow agent and turning them into gmond spoofing metrics.  My 
first question after seeing this is why does this code have to be built into 
gmond.c?  Why can't it just do the same thing in a module that would be plugged 
into gmond?  

The reason why I ask this is because we went to a lot of work to pull all of 
the metric gathering out of gmond and into modules (including all of the 
standard metrics).  One of the main reasons for this was so that metric 
gathering could be pluggable without having to affect the gmond code itself.  
That way, if a bug was ever found and fixed for a specific metric, we wouldn't 
have to re-release all of Ganglia just for one metric fix.  Also, modules give 
the user the ability to customize each gmond agent to conform to the specific 
needs of the node where gmond is running.  Regarding sflow, it seems that in 
order to integrate the sflow metrics into the Ganglia monitoring system, only a 
single gmond node needs to be configured to gather the sflow metrics.  All of 
the other gmond agents can continue to be configured and run as they were.  
Given that, it would make more sense to integrate sflow as a module that could 
be loaded under a single gmond agent rather than replacing all the gmond agents 
or even upgrading just a single agent.  It would also seem to follow the way 
that other metric modules and spoofing modules have been implemented as well.

Implementing the sflow integration as a module would also allow it to change 
whenever a newer version of sflow is released or whenever the sflow spec or 
transport changes.  A user could simply upgrade his ganglia sflow module and be 
up to date with the latest spec without having to wait for the Ganglia project 
to re-release ganglia.

Anyway, the more that I learn about sflow and what it does, especially in 
relation to Ganglia, the more this all seems like a really cool idea.  I am 
looking forward to seeing this integration done, especially if it is through a 
pluggable module.

Brad


Re: [Ganglia-developers] 3.0.x release, make 3.1 maintenance branch, create 3.2 release branch

2010-10-07 Thread Brad Nicholes

>>> On 10/6/2010 at 4:23 PM, Bernard Li wrote:
> Hi all:
> 
> I'd like to request that we make one last 3.0.x bug-fix release and
> EOL that branch.  This will make way for shifting 3.1 to maintenance
> branch and creating a 3.2. release branch from trunk.
> 
> Thoughts?
> 

+1.  I don't know how many people are still on 3.0.x, but most of the activity, 
at least on the mailing lists, seems to be on 3.1.

Brad



Re: [Ganglia-developers] Replaced TemplatePower with Dwoo in trunk web code

2010-09-09 Thread Brad Nicholes

>>> On 9/9/2010 at 09:19 AM, in message
<716533.84440...@web113303.mail.gq1.yahoo.com>, Martin Knoblauch wrote:
> Hi,
> 
>  my € 0.02 are:
> 
> +1 for Trunk
> -1 for 3.1.x and 3.0.x.
> 
>  why am I opposed to the backports? Dwoo introduces considerable new 
> infrastructure that *I* view not suitable for the "stable" and "legacy" 
> trees. 
> Both of them are bug-fix only in my opinion. It is fine for trunk, no 
> question.
> 
>  Does the GPL licensing cause any real issues to end users? Just curious. 
> The 
> situation is pretty old by now anyway.
> 

The GPL could cause problems for end users if they decide to modify or 
customize the frontend and then redistribute it.  In this case they would be 
required to release the source code.  That may not be an issue since all of the 
code is PHP script code anyway.  But the bigger issue is that there is a 
license conflict, which can bring heartache to some customers.  The fact that we 
have an alternative in trunk, whether or not we backport it to the 3.1 or 3.0 
branches, gives end users an option if they feel nervous about TemplatePower.  
At the very least, we should probably backport the Dwoo code to 3.1 and 3.0 and 
make it available even if we don't put the backport in the main repository 
branch.

Brad


Re: [Ganglia-developers] Replaced TemplatePower with Dwoo in trunk web code

2010-09-09 Thread Brad Nicholes

>>> On 9/9/2010 at 7:30 AM, in message <20100909133028.gb31...@sajinet.com.pe>,
Carlo Marcelo Arenas Belon wrote:
> On Wed, Sep 08, 2010 at 05:00:28PM -0700, Bernard Li wrote:
>> 
>> Just a quick note saying that I have replaced TemplatePower with Dwoo
>> as our PHP templating engine in the trunk web code:
> 
> why would we want to do that and throw away useful and time
> tested code? why would we do this in trunk destabilizing the
> development branch instead of in an independent branch which
> could be tested and validated before it gets merged into trunk
> if proven to at least be as useful as the old code? what is
> the scope of the work that is required on the templates and
> the rest of the PHP code to make this transition?
> 
>> Please test and report any issues (especially security related).  I'd
>> like to get this backported to 3.1.x and 3.0.x branches soon.
> 
> -1 in both accounts, they are both maintenance branches and shouldn't
> have any major rearchitecture done in them.
> 
>> Dwoo is modified/new BSD-licensed, which is the same as Ganglia.
> 
> and this doesn't make a difference at all AFAIK with the fact that
> templatepower is GPL and some of the PHP code LGPL (for a discussion
> on that read old threads on this issue) especially considering that
> the frontend code is not "linked" with anything else.
> 

Bernard is talking about the Ganglia license as a whole, which includes the 
front end code.  The fact that TemplatePower is GPL has a direct effect on the 
front end code and therefore affects Ganglia as a project.  It is true that the 
front end code does not link with any of the backend (i.e., gmond, gmetad), but 
it does link with all of the PHP code.  Therefore, removing TemplatePower not 
only from trunk but from the 3.1 and 3.0 branches as well relieves our end 
users from having to worry about licensing and any modifications that they make 
to their customized PHP code.  Basically, this move just makes Ganglia's 
licensing more consistent.  I don't see any harm in replacing 
TemplatePower in trunk and then, after a sufficient amount of testing, 
backporting this change to the 3.1 and 3.0 branches.  That is exactly the 
purpose of trunk and complies with the guidelines that we have established on 
the wiki.

BTW, the 3.1.x branch is not a maintenance branch.  It is currently our release 
branch.  Any new releases of Ganglia will be produced from the 3.1.x branch.  
In addition, trunk is our development branch which should allow for new 
contributions at any time.  Therefore being able to move forward with a change 
like incorporating Dwoo rather than TemplatePower in trunk, for whatever 
reason, is the appropriate thing to do.

Brad 


Re: [Ganglia-developers] [Ganglia-general] Spoofing functionality in 3.1.x branch...

2010-07-02 Thread Brad Nicholes

I don't think that I intended for the spoofing example module to be included in 
the distribution tarball.  If I remember right, I checked it in just so that 
the knowledge of how to do it didn't get lost.  If you want to include it, 
you can.  But I don't think that it is really necessary, and it certainly 
shouldn't be enabled, since it really does nothing at all.

Brad

>>> On 6/24/2010 at 4:37 PM, Bernard Li wrote:
> Hi Brad:
> 
> I am doing some cleanup in the trunk repo and found that the
> spfexample (python module) was not included in the distribution
> tarball and/or backported to the 3.1.x branch.  Should it be?  I'm not
> saying we should activate it by default, but should just include it
> much like the other example python module.
> 
> Thanks,
> 
> Bernard
> 
> On Thu, Dec 4, 2008 at 9:14 AM, Brad Nicholes  wrote:
>>
>> For those that are interested in the module based spoofing feature, all of 
> the functionality should be complete and has been backported to the 3.1.x 
> branch.  I have also added some spoofing module examples to trunk that can be 
> downloaded from monitor-core/gmond/python_modules/example/spfexample.py in 
> the 
> trunk repository.  There  is also a small .pyconf file in 
> monitor-core/gmond/python_modules/conf.d/spfexample.pyconf.  This example 
> module should give you enough guidance so that you can build your own 
> spoofing module.  Please let me know if anything is missing.
>>
>> Brad
>>
>>





Re: [Ganglia-developers] RFE suggestions for Ganglia 3.1.7

2010-06-01 Thread Brad Nicholes

>>> On 6/1/2010 at 10:56 AM, Jesse Becker wrote:
> On Tue, Jun 1, 2010 at 11:52, Art Peck  wrote:
>>
>> I am very impressed with Ganglia 3.1.7. I would like to create an addon 
> package to facilitate monitoring of the Oracle Sun Ray Server Software and 
> associated desktop devices.
>>
>> I've already created one Python module and integrated it into gmond and 
> gmetad. For the most part, it is working as I wanted. However, I would really 
> like to be able to manipulate the formatting of the resulting graph(s). For 
> example, I would greatly prefer a line graph to an area and I really need to 
> STACK several metrics on a graph. So I have the follow RFE's:
> 
> Writing custom graphing modules has been more easily supported for
> several releases now.  If you have need to create customized charts
> (like many of us do), there is a well documented framework for doing
> so.  Take a look at the various *_report.php files in the web/graph.d
> directory of your ganglia installation.  I've specifically written a
> storage report script that uses stacked graphs.
> 
>> (1) Extend the descriptor dictionary to include key=value pairs that get 
> passed to gmetad and the web frontend allowing for specification of more of 
> the rrdgraph formatting options. Maybe something like 'graph_type' = 'Line', 
> 'line_color' = 'Red', 'background_color' = 'Lt Blue', etc
> 
> I believe that you could "fake" this already using string metrics.
> 

This is true.  You can actually add anything that you want to the metric 
definition in your module.  If gmond does not recognize the extra elements in 
the definition, it will simply add each key/value pair as extra data in the 
XML that is produced by gmond.  Both gmond and gmetad will ignore the extra data 
and just continue to pass it through in the XML.  At that point you could write 
a specialized graph module in the web front end that does understand the extra 
data tags in the XML and reacts to them appropriately.
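
As a rough illustration of the idea above, here is a minimal sketch of a gmond 
Python metric module whose descriptor carries two extra keys.  The metric name, 
the callback, and the 'graph_type' and 'line_color' keys are made up for this 
example (the key names are borrowed from the RFE), and the STANDARD_KEYS set 
and extra_elements() helper exist only to show which entries would count as 
extra; they are not part of gmond.

```python
import random

# Descriptor fields that the gmond Python module interface normally expects.
# This set is only used by the illustration helper below.
STANDARD_KEYS = {
    'name', 'call_back', 'time_max', 'value_type', 'units',
    'slope', 'format', 'description', 'groups',
}


def read_sessions(name):
    """Placeholder callback; a real module would query the monitored service."""
    return random.randint(0, 50)


def metric_init(params):
    """Return the list of metric descriptors for this module."""
    descriptor = {
        'name': 'sunray_sessions',        # hypothetical metric name
        'call_back': read_sessions,
        'time_max': 90,
        'value_type': 'uint',
        'units': 'sessions',
        'slope': 'both',
        'format': '%u',
        'description': 'Active sessions (illustrative only)',
        'groups': 'sunray',
        # Extra, non-standard keys: hypothetical hints for a custom graph report.
        'graph_type': 'line',
        'line_color': 'red',
    }
    return [descriptor]


def metric_cleanup():
    """Nothing to release in this sketch."""
    pass


def extra_elements(descriptor):
    """Illustration helper: the key/value pairs that are not standard fields."""
    return {k: v for k, v in descriptor.items() if k not in STANDARD_KEYS}


if __name__ == '__main__':
    for d in metric_init({}):
        print(d['name'], d['call_back'](d['name']), extra_elements(d))
```

A custom report script in the web front end would then have to look for these 
two names in the XML itself; nothing in stock Ganglia interprets them.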

Brad 



Re: [Ganglia-developers] [Ganglia-general] ganglia using ports < 1024?

2010-03-25 Thread Brad Nicholes

>>> On 3/24/2010 at 07:22 PM, in message
<57153a00-be68-4cee-b867-4cea86925...@crackpot.org>, Alex Dean wrote:

> On Mar 24, 2010, at 3:40 PM, Brad Nicholes wrote:
> 
>> [Moved to the dev list]
>>
>>>>> On 3/22/2010 at 7:49 AM, Winnie Lacesso wrote:
>>
>>> Dear All,
>>>
>>> I'm new to ganglia, but it looks wonderful.
>>> Platform is Scientific Linux 4 & 5.
>>> Due to no firewall > 1024 it might be more prudent for ganglia to use
>>> ports < 1024 - I'm nervous to have anything php-related listening on
>>> unfirewalled network. Am trying to configure ganglia using ports 849,
>>> 851 for xml & 852 for interactive.
>>> Since gmetad & gmond run as user ganglia, I think this is the  
>>> problem - it
>>> can't create ports < 1024?
>>> Errors are
>>> tcp_listen() on xml_port failed: Permission denied
>>>
>>> I can't seem to find any example where ganglia is used with ports <  
>>> 1024.
>>> Impossible?
>>> Pointers gratefully deferenced
>>>
>>
>> The problem here is that the socket creation is happening after  
>> gmond has dropped root.  Therefore unless the runas user has  
>> sufficient rights to create a port <1024 you will run into problems  
>> trying to use ports <1024.
>>
>> I know there was a lot of discussion several months ago about when  
>> to daemonize and when to setuid.  Can somebody who is more familiar  
>> with the discussion respond with what the outcome was?  Carlo,  
>> Daniel  maybe??
>>
>> Can we just move the calls to setup_listen_channels_pollset and  
>> Ganglia_udp_send_channels_create before the call to  
>> setid_if_necessary?  It would still be happening after the call to  
>> daemonize_if_necessary.
> 
> There's nothing written in PHP listening anywhere is there?  No daemon  
> processes written in PHP.  PHP is run as an Apache module, so it will  
> be on whatever port Apache is listening on and (presumably) will be  
> running as an unprivileged user.

No, this is an issue with gmond only.

 Brad



Re: [Ganglia-developers] [Ganglia-general] ganglia using ports < 1024?

2010-03-24 Thread Brad Nicholes

[Moved to the dev list]

>>> On 3/22/2010 at 7:49 AM, Winnie Lacesso wrote:

> Dear All,
> 
> I'm new to ganglia, but it looks wonderful.
> Platform is Scientific Linux 4 & 5.
> Due to no firewall > 1024 it might be more prudent for ganglia to use
> ports < 1024 - I'm nervous to have anything php-related listening on
> unfirewalled network. Am trying to configure ganglia using ports 849, 
> 851 for xml & 852 for interactive.
> Since gmetad & gmond run as user ganglia, I think this is the problem - it
> can't create ports < 1024?
> Errors are
> tcp_listen() on xml_port failed: Permission denied
> 
> I can't seem to find any example where ganglia is used with ports < 1024.
> Impossible?
> Pointers gratefully deferenced
> 

The problem here is that the socket creation is happening after gmond has 
dropped root privileges.  Therefore, unless the run-as user has sufficient 
rights to bind to a port below 1024, you will run into problems trying to use 
such ports.

I know there was a lot of discussion several months ago about when to daemonize 
and when to setuid.  Can somebody who is more familiar with the discussion 
respond with what the outcome was?  Carlo, Daniel, maybe?

Can we just move the calls to setup_listen_channels_pollset and 
Ganglia_udp_send_channels_create before the call to setid_if_necessary?  It 
would still be happening after the call to daemonize_if_necessary.
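
To make the proposed ordering concrete, here is a small sketch in Python 
(gmond itself is C, so this is only an analogy, not the actual code).  The 
names open_listener, drop_privileges and start_daemon are hypothetical.  The 
point is just that the listening socket is bound while the process may still 
be root, and the switch to the run-as user happens afterwards.

```python
import os
import socket


def open_listener(port, host="127.0.0.1"):
    """Bind and listen; for ports below 1024 this normally needs root."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(5)
    return sock


def drop_privileges(uid, gid):
    """Switch to the run-as user; does nothing when not running as root."""
    if os.geteuid() != 0:
        return False
    os.setgroups([])   # clear supplementary groups first
    os.setgid(gid)     # the group must be changed before the user
    os.setuid(uid)
    return True


def start_daemon(port, uid, gid):
    """Create the listening socket first, then give up root."""
    sock = open_listener(port)   # still privileged here (if started as root)
    drop_privileges(uid, gid)    # the socket stays open after the switch
    return sock


if __name__ == "__main__":
    # Port 0 lets the kernel pick a free port, so the sketch runs unprivileged.
    listener = start_daemon(0, os.getuid(), os.getgid())
    print("listening on port", listener.getsockname()[1])
    listener.close()
```

An already-bound socket remains usable after the uid change, which is why 
doing the bind before setid_if_necessary would be enough for ports below 1024.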

Brad


Re: [Ganglia-developers] gmond fails with "apr_pollset_create failed: Invalid argument" when no udp_recv_channels or tcp_accept_channels are defined

2010-03-24 Thread Brad Nicholes

>>> On 3/19/2010 at 4:03 PM, Bernard Li wrote:
> Dear all:
> 
> Looks like we have a bug in setup_listen_channels_pollset() in gmond.c.
> 
> If your gmond.conf has no udp_recv_channel or tcp_accept_channel
> defined, gmond will fail to run with the error message:
> 
> apr_pollset_create failed: Invalid argument
> 
> The error checking for apr_pollset_create() was recently implemented
> since r2041.
> 
> The issue seems to be that on certain platform, apr_pollset_create()
> will fail if "total_listen_channels" = 0 (this is the "size" argument
> according to the apr_pollset_create definition).
> 
> Previously, since there was no error checking, the code would continue
> merrily without erroring out.  listen_channels will still be NULL and
> thus would set deaf = 1 in main().  Now since we have error checking,
> it actually bombs out.
> 
> One fix is basically to check whether total_listen_channels is 0 prior
> to the apr_pollset_create() call and if so just return.  This should
> give the same behaviour as before.
> 
> So far I have been able to reproduce it on CentOS 5 x86_64.  However,
> there has been conflicting reports regarding whether this fails on
> Ubuntu 9.04 or not.  So if you guys could test this out and report
> back what platforms you encounter this bug, that would be great.
> 
> To reproduce the bug, simply comment out the udp_recv_channel and
> tcp_accept_channel clauses and run gmond.  It should fail with the
> error message mentioned.
> 

Fails on SLED-10  [glibc-2.4]
Works on OpenSuse 11.2   [glibc-2.10]

+1 on the fix that you suggested.
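
For anyone following along, the shape of the suggested fix is roughly the 
following.  This is a Python analogy using the standard selectors module, not 
the gmond C code, and setup_listen_channels and is_deaf here are hypothetical 
stand-ins: when there are no listen channels, the poller is never created and 
the caller simply treats the daemon as deaf.

```python
import selectors
import socket


def setup_listen_channels(sockets):
    """Register listening sockets with a poller; return None if there are none."""
    if not sockets:
        # Nothing to listen on: skip creating the poller entirely so that a
        # zero-size request is never made.
        return None
    poller = selectors.DefaultSelector()
    for sock in sockets:
        poller.register(sock, selectors.EVENT_READ)
    return poller


def is_deaf(sockets):
    """Mirror the caller's check: no poller means the daemon is deaf."""
    return setup_listen_channels(sockets) is None


if __name__ == "__main__":
    print("no channels  -> deaf =", is_deaf([]))
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp.bind(("127.0.0.1", 0))
    print("one channel  -> deaf =", is_deaf([udp]))
    udp.close()
```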

Brad



Re: [Ganglia-developers] some changes

2010-03-08 Thread Brad Nicholes

That's not an uncommon situation to be in.  I haven't been able to contribute 
as much to the project as I did in the past for many of the same reasons.  Just 
don't wander off too far.  It's always nice to have a good developer around.

Brad

>>> On 3/8/2010 at 2:10 PM, in message <4b9567bb.1030...@pocock.com.au>, Daniel
Pocock wrote:

> 
> Hi everyone,
> 
> As some of you are aware, I have been employed to work full time on a
> project involving the customization, deployment and reporting from
> Ganglia in a large enterprise.  This has allowed me to dedicate some of
> my time to collaborating with the open source community to (hopefully)
> improve what was already quite a neat product before I was first
> introduced to it.
> 
> In the near future I am making a transition back to the world of IT
> consulting and contracting.  Although I do see myself using and
> contributing to Ganglia in the future, it may not be on the same scale
> as what I have done over the last couple of years.
> 
> I think it's quite important that I make people aware of this because of
> my role as the release manager for the most current release, 3.1.7.  I
> am very aware that some of the changes I have introduced are a little
> controversial and some people may have preferred alternative solutions
> (or maintained the status quo).  I am also aware that some of these
> changes will lead to some additional requests for clarification and
> support on the user's email list.
> 
> One final comment: I just want to thank all those who have contributed
> to a project that is being used very successfully on an enormously large
> scale.  When participating in this project through the mailing list and
> IRC, I can't help noticing that there is an observable difference
> between the quality of the support and interaction here as compared to
> support from some large, well paid commercial vendors that I have dealt
> with over the years.
> 
> Regards,
> 
> Daniel
> 
> 

Re: [Ganglia-developers] Problems building trunk r2290

2010-03-03 Thread Brad Nicholes

Makes sense to me

Brad

>>> On 3/3/2010 at 1:54 PM, Bernard Li wrote:
> Hi Brad:
> 
> Thanks for the confirmation.
> 
> However, I have another issue related to the first problem.  Basically
> my x86_64 CentOS is detected as "x86_64-unknown-linux-gnu" and thus it
> is setting LIB_SUFFIX to "lib" instead of "lib64".
> 
> Do RHEL hosts really identify themselves as x86_64-redhat-linux*?  How
> about Fedora?  I do confirm that this works as expected on an openSUSE
> box.
> 
> Perhaps we should reverse the logic and make a special case for Debian 
> instead?
> 
> Cheers,
> 
> Bernard
> 
> On Wed, Mar 3, 2010 at 11:28 AM, Brad Nicholes  wrote:
>> I am seeing the same thing.  To get past it I just hardcoded the path to the 
> sed utility.  I'm guessing that either some platforms or some version of 
> libtool isn't setting the SED environment variable but the configure script 
> assumes that it is.
>>
>> Brad
>>
>>>>> On 3/2/2010 at 6:21 PM, Bernard Li wrote:
>>> Hi all:
>>>
>>> I am having problems building trunk r2290.
>>>
>>> Specifically I have 2 issues:
>>>
>>> 1) During ./configure
>>>
>>> ./configure: line 20056: syntax error near unexpected token `)'
>>> ./configure: line 20056: `x86_64-suse-linux*)'
>>>
>>> 2) During make -C web conf.php
>>>
>>> make: Entering directory `/root/code/ganglia.trunk/web'
>>> ../scripts/fixconfig conf.php.in
>>> ../scripts/fixconfig: line 60: @SED@: command not found
>>> ../scripts/fixconfig: line 67: : No such file or directory
>>> make: *** [conf.php] Error 1
>>> make: Leaving directory `/root/code/ganglia.trunk/web'
>>>
>>> The first issue could be fixed by the following patch:
>>>
>>> Index: configure.in
>>> ===
>>> --- configure.in  (revision 2290)
>>> +++ configure.in  (working copy)
>>> @@ -341,8 +341,7 @@
>>>  # (insert others here)
>>>  LIB_SUFFIX=lib
>>>  case $host in
>>> -x86_64-redhat-linux*)
>>> -x86_64-suse-linux*)
>>> +x86_64-redhat-linux* | x86_64-suse-linux*)
>>>LIB_SUFFIX=lib64
>>>;;
>>>  esac
>>>
>>> The second issue...  does it have something to do with the old
>>> autotools that I'm using?
>>>
>>> Thanks,
>>>
>>> Bernard
>>>

Re: [Ganglia-developers] Problems building trunk r2290

2010-03-03 Thread Brad Nicholes

I am seeing the same thing.  To get past it, I just hardcoded the path to the 
sed utility.  I'm guessing that either some platforms or some versions of 
libtool aren't setting the SED environment variable, but the configure script 
assumes that it is set.

Brad

>>> On 3/2/2010 at 6:21 PM, Bernard Li wrote:
> Hi all:
> 
> I am having problems building trunk r2290.
> 
> Specifically I have 2 issues:
> 
> 1) During ./configure
> 
> ./configure: line 20056: syntax error near unexpected token `)'
> ./configure: line 20056: `x86_64-suse-linux*)'
> 
> 2) During make -C web conf.php
> 
> make: Entering directory `/root/code/ganglia.trunk/web'
> ../scripts/fixconfig conf.php.in
> ../scripts/fixconfig: line 60: @SED@: command not found
> ../scripts/fixconfig: line 67: : No such file or directory
> make: *** [conf.php] Error 1
> make: Leaving directory `/root/code/ganglia.trunk/web'
> 
> The first issue could be fixed by the following patch:
> 
> Index: configure.in
> ===
> --- configure.in  (revision 2290)
> +++ configure.in  (working copy)
> @@ -341,8 +341,7 @@
>  # (insert others here)
>  LIB_SUFFIX=lib
>  case $host in
> -x86_64-redhat-linux*)
> -x86_64-suse-linux*)
> +x86_64-redhat-linux* | x86_64-suse-linux*)
>LIB_SUFFIX=lib64
>;;
>  esac
> 
> The second issue...  does it have something to do with the old
> autotools that I'm using?
> 
> Thanks,
> 
> Bernard
> 

Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.7 ready for testing

2010-03-02 Thread Brad Nicholes

>>> On 3/2/2010 at 4:23 AM, in message <4b8cf534.7090...@pocock.com.au>, Daniel
Pocock wrote:

> Thanks to those who provided feedback - any objections to making 3.1.7
> generally available?  I would like to make it GA within the next 1-2
> days now.
> 
> 

+1


> Michael Perzl wrote:
>> I have successfully compiled and tested 3.1.7 on
>> - AIX 5.1 ML04
>> - AIX 5.3 ML00
>> - AIX 5.3 TL07
>> - AIX 6.1 TL03
>>
>> Regards,
>> Michael
>>
>> On 02/22/2010 12:15 PM, Daniel Pocock wrote:
>>   
>>> Just a reminder - any feedback is welcome, or feel free to discuss 3.1.7
>>> on IRC
>>>
>>> It would be good to have positive confirmation of which platforms this
>>> has been tested on, so far, I have tested
>>> - Debian lenny,
>>> - RHEL3/4/5,
>>> - CentOS 5,
>>>   - Solaris 8 and
>>> - Cygwin.
>>>
>>> and Brad has done some testing on SLES10
>>>
>>> Regards,
>>>
>>> Daniel
>>>
>>> Daniel Pocock wrote:
>>>
>>> 
 I've tagged 3.1.7 and built a tarball:

  http://ganglia.info/testing/ganglia-3.1.7.tar.gz 

 The md5sum for 3.1.7 is: 6aa5e2109c2cc8007a6def0799cf1b4c

 Since 3.1.6, only two things have changed and may need to be tested
 again by those who tested 3.1.6:
   - the build system (support for commas in CFLAGS)
   - the multicpu module - percentages reported differently

 This is not confirmation that the release is in GA status - a further
 notification will be sent when the testing period has elapsed without
 any serious defect.  Users are invited to test the tarball and submit
 feedback.

 Please do not commit on branches/monitor-core-3.1 until after 3.1.7
 goes GA, in case further tweaks are needed to facilitate a successful
 release.

 Below are the release notes from the STATUS file.  Other documentation
 has also changed since 3.1.2 and should be reviewed:

 GANGLIA 3.1 STATUS:   -*-text-*-
 Last modified at [$Date: 2010-02-17 11:01:08 + (Wed, 17 Feb 2010) $]

 The current version of this file can be found at:

* http://ganglia.svn.sourceforge.net/svnroot/ganglia/branches/monitor-core-3.1/STATUS

 Release history:

  3.1.7 : Tagged: Feb 17, 2010
  3.1.6 : Tagged: Feb  4, 2010 (not released for GA)
  3.1.5(hargrave)   : Tagged: Nov 24, 2009 (not released for GA)
  3.1.4(hargrave)   : Tagged: Oct 26, 2009 (not released for GA)
  3.1.3(avenger): Tagged: Sep 19, 2009 (not released for GA)
  3.1.2(langley): Released: Feb 17, 2009
  3.1.1(wien)   : Released: Sep 10, 2008
  3.1.0(amelia) : Released: Jul 30, 2008

 Contributors looking for a mission:

* Just do an egrep on "TODO", "XXX" or "FIXME" in the source.
* Review the bug database at: http://bugzilla.ganglia.info/ 
* Open bugs in the bug database.
* Implement a feature from the wishlist at:
 http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_wish-list 

 CURRENT RELEASE NOTES:
(Please update this area with a brief description of bug fixes and
 enhancements that have been backported for the current release)

Note: 3.1.3, 3.1.4, 3.1.5 and 3.1.6 never became GA, therefore,
the release notes for all of them are combined below.

3.1.7:

* Fix build support for RHEL5/issue with commas in CFLAGS
* multicpu module: show CPU utilization as a value between 0-100% for
  each core

3.1.6:

* Merge commit 1966 from trunk to fix "contrib/removespikes.pl"
* Bootstrapping with Debian 5.0 (lenny) versions of autotools for
  this and future releases.

 
  http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg05352.html
  http://www.mail-archive.com/ganglia-gene...@lists.sourceforge.net/msg04688.html
* Require user to explicitly specify sysconfdir when building from source,
  due to the fact that the old behavior was not consistent with the
  documented behavior.
* Configuration files and scripts are now created during the install phase
  rather than during configure.  This allows values such as @sysconfdir@
  to be used in the template configuration files.
* Abolish the use of release names - only release numbers will be used
  to distinguish versions in future
* libmetrics: workaround system header conflict in DFBSD >= 2.4 (BUG245)
* Use PCRE regex matching to configure metrics using the name_match directive
* rrdcached support
* gmetad now uses apr and the sleep intervals between polls are randomized
  in a way that supports shorter polling intervals
* FreeBSD support: fixes for crashes and disk statistics (BUG153)

Re: [Ganglia-developers] Ganglia 3.1.7 ready for testing

2010-02-17 Thread Brad Nicholes

Gmond: so far so good on SUSE Linux 10.  It built, installed, and is gathering 
metrics without a problem :)  I still need to install gmetad on my server.

Brad

>>> On 2/17/2010 at 4:31 AM, in message <4b7bd38c.3010...@pocock.com.au>, Daniel
Pocock wrote:
> 
> 
> I've tagged 3.1.7 and built a tarball:
> 
> http://ganglia.info/testing/ganglia-3.1.7.tar.gz 
> 
> The md5sum for 3.1.7 is: 6aa5e2109c2cc8007a6def0799cf1b4c
> 
> Since 3.1.6, only two things have changed and may need to be tested
> again by those who tested 3.1.6:
>  - the build system (support for commas in CFLAGS)
>  - the multicpu module - percentages reported differently
> 
> This is not confirmation that the release is in GA status - a further
> notification will be sent when the testing period has elapsed without
> any serious defect.  Users are invited to test the tarball and submit
> feedback.
> 
> Please do not commit on branches/monitor-core-3.1 until after 3.1.7
> goes GA, in case further tweaks are needed to facilitate a successful
> release.
> 
> Below are the release notes from the STATUS file.  Other documentation
> has also changed since 3.1.2 and should be reviewed:
> 
> GANGLIA 3.1 STATUS:   -*-text-*-
> Last modified at [$Date: 2010-02-17 11:01:08 + (Wed, 17 Feb 2010) $]
> 
> The current version of this file can be found at:
> 
>   * http://ganglia.svn.sourceforge.net/svnroot/ganglia/branches/monitor-core-3.1/STATUS
> 
> Release history:
> 
> 3.1.7 : Tagged: Feb 17, 2010
> 3.1.6 : Tagged: Feb  4, 2010 (not released for GA)
> 3.1.5(hargrave)   : Tagged: Nov 24, 2009 (not released for GA)
> 3.1.4(hargrave)   : Tagged: Oct 26, 2009 (not released for GA)
> 3.1.3(avenger): Tagged: Sep 19, 2009 (not released for GA)
> 3.1.2(langley): Released: Feb 17, 2009
> 3.1.1(wien)   : Released: Sep 10, 2008
> 3.1.0(amelia) : Released: Jul 30, 2008
> 
> Contributors looking for a mission:
> 
>   * Just do an egrep on "TODO", "XXX" or "FIXME" in the source.
>   * Review the bug database at: http://bugzilla.ganglia.info/ 
>   * Open bugs in the bug database.
>   * Implement a feature from the wishlist at:
> http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_wish-list 
> 
> CURRENT RELEASE NOTES:
>   (Please update this area with a brief description of bug fixes and
>enhancements that have been backported for the current release)
> 
>   Note: 3.1.3, 3.1.4, 3.1.5 and 3.1.6 never became GA, therefore,
>   the release notes for all of them are combined below.
> 
>   3.1.7:
> 
>   * Fix build support for RHEL5/issue with commas in CFLAGS
>   * multicpu module: show CPU utilization as a value between 0-100% for
>     each core
> 
>   3.1.6:
> 
>   * Merge commit 1966 from trunk to fix "contrib/removespikes.pl"
>   * Bootstrapping with Debian 5.0 (lenny) versions of autotools for
>     this and future releases.
>     http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg05352.html
>     http://www.mail-archive.com/ganglia-gene...@lists.sourceforge.net/msg04688.html
>   * Require user to explicitly specify sysconfdir when building from
>     source, due to the fact that the old behavior was not consistent with
>     the documented behavior.
>   * Configuration files and scripts are now created during the install
>     phase rather than during configure.  This allows values such as
>     @sysconfdir@ to be used in the template configuration files.
>   * Abolish the use of release names - only release numbers will be used
>     to distinguish versions in future
>   * libmetrics: workaround system header conflict in DFBSD >= 2.4 (BUG245)
>   * Use PCRE regex matching to configure metrics using the name_match
>     directive
>   * rrdcached support
>   * gmetad now uses apr and the sleep intervals between polls are
>     randomized in a way that supports shorter polling intervals
>   * FreeBSD support: fixes for crashes and disk statistics (BUG153)
>   * Further tweaks to Solaris build support (remove C99 hack)
>   * Eliminate conflict with ncpus symbol name on older Solaris
>   * AIX support: determine if the host is a virtual server (BUG226)
>   * AIX support: setting linker flags (BUG227), add -lm
>   * AIX support: tweaks for AIX >= v6.1
>   * AIX support: revised init scripts for gmond and gmetad
>   * Check for Python.h explicitly
>   * Include the necessary Python files in the distribution tarball,
>     regardless of how BUILD_PYTHON is set (r2215).
>   * Remove references to GNU toolchain in documentation
>   * Fortify write_data_to_rrd against overflows
>   * Web interface: minor formatting changes
>   * mcast_if implementation tweaked so that the send channel will be bound
>     to the IP of the outgoing interface
>   * Documentation updates relating to the options for multihomed hosts,
>     particularly bind, bind_hostname and mcast_if

Re: [Ganglia-developers] multicpu module: r2116 and other issues

2010-02-16 Thread Brad Nicholes

I'm not sure that this is the kind of response that you are looking for on this 
issue, but I would tend to agree with your alternative options below.  I don't 
think that it is mandatory for every module to work on every platform.  One of 
the main purposes for modules is to make it easier for metrics to be added or 
removed from the overall gathering system.  With Ganglia 3.0.x and below, this 
was not an option so every built-in metric was required to work on every 
supported platform.  

With the modular concept, I don't believe that requiring every module to also 
work on every platform is necessary any more.  What it does mean is that every 
module will work on every platform that has a developer interested in porting 
the module to a new platform.  If there is nobody willing to step up to the 
plate or no real need for a particular metric on a certain platform, then why 
require the effort out of the few developers that we have?  Where we know that 
some modules don't work on some platforms, it should be a simple matter of 
documenting that and also noting that we could use the help to extend the 
module.

Brad

>>> On 2/16/2010 at 8:03 AM, in message <4b7ab3ca.4090...@pocock.com.au>, Daniel Pocock wrote:

> Some further multicpu comments, I've been looking at this discussion
> about `Irix' mode:
> 
> http://www.mail-archive.com/ganglia-gene...@lists.sourceforge.net/msg04567.html
> 
> and I feel that there may be some confusion about Irix mode and Solaris
> mode in top.
> 
> The top man page says that Irix mode and Solaris mode should only impact
> the display of per-task CPU stats.  It doesn't appear to say that these
> modes should impact the per-core stats.
> 
> I also notice Carlo's patch on trunk (r2116) appears to be an attempt to
> address this issue, although Carlo has mentioned more work is required.
> 
> Can anyone else make any comment on this specific issue, what else they
> expect from multicpu, or what flaws are outstanding?
> 
> 
> Daniel Pocock wrote:
>> I've been contemplating the multicpu module, which currently only works 
>> on Linux and Cygwin.
>>
>> Carlo has indicated that promoting its use (as a consequence of the 
>> PCRE patch) may not be ideal for two reasons:
>>
>> a) bugs on the supported platforms (Linux and Cygwin)
>>
>> b) not functional on other platforms (e.g. Solaris) where it gives no 
>> meaningful error if a user tries to load it
>>
>> For the Solaris platform, I was considering the idea of a generic kstat 
>> module.  It would generate thousands of metric names (gmond -m output), 
>> but CPU metrics could then be selectively enabled with the PCRE 
>> support.  So a dedicated multicpu module for Solaris may not be needed.
>>
>> I don't think it is necessary for every module to run on every platform 
>> - maybe this one just shouldn't be compiled at all except on Linux and 
>> Cygwin.
>>
>> Maybe it is also possible to consider some other options:
>>
>> a) mark some modules as experimental/beta, and have a single configure 
>> option for enabling all experimental modules, a separate package for 
>> them, etc
>>
>> b) split the development of some modules from the monitor-core-3.1 
>> branch so that they don't hold back releases
>>
>>
>>
> 
> 



--
SOLARIS 10 is the OS for Data Centers - provides features such as DTrace,
Predictive Self Healing and Award Winning ZFS. Get Solaris 10 NOW
http://p.sf.net/sfu/solaris-dev2dev
___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers

Re: [Ganglia-developers] versioning confusion

2010-02-08 Thread Brad Nicholes

>>> On 2/8/2010 at 1:39 PM, in message <20100208203937.ga22...@gentoo.org>, Justin Bronder wrote:
> On 08/02/10 20:04 +, Daniel Pocock wrote:
>> 
>> >>   So, why not put the "rc" or "pre" tag into a GANGLIA_EXTRA_VERSION and
>> >> embed that into the code. That way there would be no confusion about what
>> >> is in the tarball. Then we could have as many testing releases before the
>> >> final one. SVN tags are cheap. What am I missing? I mean, now we are
>> >> confusing people with skipped "releases".
>> >>
>> >
>> > Basically for the reasons that I mentioned above.  Agreed that SVN tags
>> > are cheap but the major reasons are to reduce the number of publicly
>> > available tarballs and to make sure that the release process itself does
>> > not allow for problems to creep into the code.  By releasing exactly what
>> > we are testing, it reduces the number of steps in the testing and release
>> > process and at the same time ensures that an officially released tarball
>> > is exactly the same tarball that was tested and approved by the community
>> > during the testing period.  Also remember that we haven't ever skipped a
>> > "release".  We have only skipped revision numbers.  The Ganglia web site
>> > and the sourceforge project site are still the definitive authority on
>> > what our current release is.  By simply checking those sites, there
>> > should be no question or confusion on what our current release is.  It
>> > would be a big mistake for someone to pull a tarball from the testing
>> > download area and deploy that into their production environment.  Like
>> > every other project, the only official download area, as far as the
>> > Ganglia project is concerned, is the sourceforge download web page and
>> > currently the latest release available on that site is 3.1.2.  If,
>> > hopefully in a few weeks, we release 3.1.6 or whatever the final revision
>> > number is, that will become the official Ganglia release and it really
>> > doesn't matter what happened to any of the previous revisions.
>> >
>> 
>> 
>> I have tested on several platforms, and for 3.1.6, I provided snapshots
>> every few days for other people to do testing, but one issue slipped
>> through the cracks, so 3.1.7 will be released imminently to fix that. 
>> Maybe there needs to be a sign-off process, e.g. a RHEL user, a Solaris
>> user, etc who must test the final snapshot before a tag is done, and
>> maybe we should do that before 3.1.7 is tagged.
>> 
>> I agree with Brad's point about releasing the tarball that has actually
>> been tested.  If we went through the process of signing-off the
>> snapshot, then the process would need to be repeated for the tag too.
>> 
>> There is another factor as well: I have been quite aggressive about
>> fixing bugs and backporting minor functionality improvements.  This was
>> done between 3.1.2 and 3.1.3.  There was then another whole bunch of
>> stuff done between 3.1.5 and 3.1.6.  At this stage, the intention is to
>> make the minimum possible changes to provide a usable release (hopefully
>> 3.1.7), and then some more pro-active bug fixing will resume again.
> 
> 
> I think Martin's point is being missed here.  Speaking as just a distro
> maintainer, the use of rc and pre tags do provide some significant benefits.
> Consider the following simplified workflow:
> 
> - Development happens in branches/3.1.1.
> - When a release is being considered, the version is updated to report
>   3.1.1_rc1 and a copy of the branch is created in tags/3.1.1_rc1.
> - Should bugs be found in rc1, development is again done in branches/3.1.1
>   and when complete, tags/3.1.1_rc2 is created.
> - And so on, until an rc is declared to be stable.  Then only the version is
>   updated and tags/3.1.1 is created.
> 
> If any changes that need to be backported are found in the above process,
> they are committed to branches/3.1.0, and the same tagging rc process is
> used.
> 
> While a number of extra svn 'branches' are created during this process, you
> can now create the multiple tarballs with no confusion as to which svn
> revision they originated from.  Backporting from branch to branch should also
> be fairly simple.
> 
> This process has one major advantage.  Each rc tarball can be packaged and
> released by the distros for wider testing.  As it stands now, that cannot be
> done.  For instance in Gentoo, I'd have no problem pushing rc's to overlays
> that are marked for testing newer releases, but this requires the tarball
> and version to correctly change if what is packaged changes.
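
The rc tagging workflow described above can be sketched in Python. This is only an illustration of the naming scheme (tags/3.1.1_rc1, tags/3.1.1_rc2, then tags/3.1.1); the function name next_rc_tag and the idea of passing in a list of existing tag names are assumptions made for the example, not part of any Ganglia tooling.

```python
import re


def next_rc_tag(version, existing_tags):
    """Return the next release-candidate tag name for `version`.

    `existing_tags` is a plain list of tag names such as "3.1.1_rc1".
    Tags that belong to other versions are ignored.
    """
    pattern = re.compile(r"^" + re.escape(version) + r"_rc(\d+)$")
    numbers = []
    for tag in existing_tags:
        match = pattern.match(tag)
        if match:
            numbers.append(int(match.group(1)))
    # Start at rc1 when no candidate exists yet, otherwise count upwards.
    return "%s_rc%d" % (version, max(numbers, default=0) + 1)
```

Once a candidate is declared stable, the final tag would simply be the bare version string, so every published tarball carries a distinct version.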
> 
> 
> If the above came off as preachy, I apologize.  Ganglia is a great product
> and this is just my suggestion.

Thanks Justin,  really the one thing that you suggested and that we aren't 
doing is actually creating the tag in SVN.  What we have been doing is whenever 
we create snapshots we add a fourth revision number which happens to match the 
SVN revision that the tarball was created from.  T

Re: [Ganglia-developers] versioning confusion

2010-02-05 Thread Brad Nicholes

>>> On 2/5/2010 at 3:58 AM, in message
<815639.84803...@web113311.mail.gq1.yahoo.com>, Martin Knoblauch wrote:
> - Original Message ----
> 
>> From: Brad Nicholes 
>> To: Martin Knoblauch ; Ramon Bastiaans 
> 
>> Cc: "ganglia-developers@lists.sourceforge.net" 
> 
>> Sent: Thu, February 4, 2010 4:33:31 PM
>> Subject: Re: [Ganglia-developers] versioning confusion
>> 
>> >>> On 2/4/2010 at 6:50 AM, in message <4b6ad096.8030...@sara.nl>, Ramon 
>> Bastiaans
>> wrote:
>> > Ahh, I see.
>> > 
>> > On 02/04/2010 12:11 PM, Martin Knoblauch wrote:
>> >>
>> 
>> If we were to make release candidates publicly available with a release
>> number other than major.minor.revision (for example 3.1.3rc1), we would also
>> be required to put this same release number in the source code itself to
>> ensure that there is a differentiation between a release candidate and the
>> official release since both would be made public (one during the testing
>> period and the other being an official release).  In order to transition the
>> release candidate, in this case to an official release, we would be required
>> to explode the tarball, change the version number, retag SVN with the changed
>> file and revision number, re-bootstrap the source code, recreate the tarball
>> and then finally make the new tarball publicly available under the final
>> release number.  All of this leaves the final tarball open to potential
>> problems.  It just makes more sense from a testing and release perspective
>> to release the tarball in the exact condition as it was tested.  This leaves
>> no possibility for errors or problems creeping into the final released
>> tarball.
> 
>   So, why not put the "rc" or "pre" tag into a GANGLIA_EXTRA_VERSION and
> embed that into the code. That way there would be no confusion about what is
> in the tarball. Then we could have as many testing releases before the final
> one. SVN tags are cheap. What am I missing? I mean, now we are confusing
> people with skipped "releases".
> 

Basically for the reasons that I mentioned above.  Agreed that SVN tags are 
cheap but the major reasons are to reduce the number of publicly available 
tarballs and to make sure that the release process itself does not allow for 
problems to creep into the code.  By releasing exactly what we are testing, it 
reduces the number of steps in the testing and release process and at the same 
time ensures that an officially released tarball is exactly the same tarball 
that was tested and approved by the community during the testing period.  Also 
remember that we haven't ever skipped a "release".  We have only skipped 
revision numbers.  The Ganglia web site and the sourceforge project site are 
still the definitive authority on what our current release is.  By simply 
checking those sites, there should be no question or confusion on what our 
current release is.  It would be a big mistake for someone to pull a tarball 
from the testing download area and deploy that into their production 
environment.  Like every other project, the only official download area, as far 
as the Ganglia project is concerned, is the sourceforge download web page and 
currently the latest release available on that site is 3.1.2.  If, hopefully in 
a few weeks, we release 3.1.6 or whatever the final revision number is, that 
will become the official Ganglia release and it really doesn't matter what 
happened to any of the previous revisions.

>>
>> Another option would be to tag and tar the source code under the final
>> release version number and make it available for testing.  Then if bugs are
>> found during testing, fix the bugs, retag and retar under the same version
>> number.  The problem with this is that we could end up with multiple
>> different tarballs all with the same version number publicly available.
>> The only way to tell which one was the real release would be by the date on
>> the tarball rather than version number.
>>
>
>  much too convoluted and confusing. Agreed.
>
>> Anyway, you can read more about this process on the Ganglia wiki page at
>> http://sourceforge.net/apps/trac/ganglia/wiki/how_project_works   This
>> release process was basically patterned after the way that the Apache httpd
>> project produces testing and official tarballs.
>>
>
>  As I said in the past, that process may work for Apache. I do not see many
> skipped releases there. Maybe they have a more strict project management.
> 

It is

Re: [Ganglia-developers] versioning confusion

2010-02-04 Thread Brad Nicholes

>>> On 2/4/2010 at 8:42 AM, in message <4b6aeb00.1070...@pocock.com.au>, Daniel Pocock wrote:

>> available.  The only way to tell which one was the real release would be by 
> the date on the tarball rather than version number.
>>   
> Not quite - we could digitally sign the release tarball at the point
> where it is confirmed to be stable.  People could make sure they had a
> stable release by comparing the md5sum, something that they should do
> anyway.
> 

True, but I think that the version number is still the primary differentiator. 
Once the tarball has been exploded and deployed, we are back to file dates 
again.

Brad
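
The checksum comparison mentioned in the quoted text can be sketched as follows. This is a minimal illustration, not part of the Ganglia release tooling: the function names file_digest and is_same_tarball are made up for the example, and md5 is used as the default only because md5sum is what the thread mentions.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_same_tarball(tested_path, released_path, algorithm="md5"):
    """True when both files have the same digest under the chosen algorithm."""
    return file_digest(tested_path, algorithm) == file_digest(released_path, algorithm)
```

A check like this only identifies the tarball as a file.  As noted above, once the tarball has been exploded and deployed, the embedded version number is what remains to tell installations apart.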


--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers

Re: [Ganglia-developers] Ganglia SLED and SLES testing...

2010-02-04 Thread Brad Nicholes

>>> On 2/4/2010 at 12:49 AM, in message <4b6a7c0c.8070...@pocock.com.au>, Daniel Pocock wrote:

>>>> One other problem.  After doing a configure and make I tried to do a make
>>>> distdir just to get the built files in the web directory.  In the resulting
>>>> dist directory, nothing was built in the web directory.  conf.php.in was
>>>> there but no conf.php etc.   Is there something different I need to do now
>>>> to get all of the .in files resolved in the web directory?
>>>>
>>> Try
>>>
>>> make -C web conf.php version.php
>>>
>>> You'll notice that I've put something like that in the spec file
>>>
>>
>> OK, that worked but it begs the question, shouldn't this just happen on a
>> "make", "make install" or "make dist*" rather than it being a separate
>> command?  I'm sure there must be a good reason for it, just curious.  Is
>> this documented somewhere other than the .spec file?  If I wanted to just
>> pull the tarball and build it myself, I shouldn't have to be guessing at the
>> build steps other than configure/make/make install.
>>
> `make install' has never actually installed the web files anywhere
> 
> What has changed though is that conf.php and version.php are only
> generated at the last moment rather than at the configure stage.
> 
> However, I can probably tweak this a little more to ensure they are
> generated for you, but you still have to copy them to where you want them.
> 
> 

It would be nice to end up with a deployable web directory after a make dist or 
make distdir (or any of the other make dist* targets).  Also one other issue 
that I ran into which I think is probably not a regression but it is annoying.  
If you explode the tarball then configure/make/make dist*, the make dist* will 
fail looking for ../scripts/svn2cl.sh.  I think that this script is stripped 
out when the final tarball is created.  Even if it were there, since the 
exploded tarball is not under source control, the script would fail anyway.

Sorry for the late testing reports,
Brad



Re: [Ganglia-developers] versioning confusion

2010-02-04 Thread Brad Nicholes

>>> On 2/4/2010 at 6:50 AM, in message <4b6ad096.8030...@sara.nl>, Ramon Bastiaans wrote:
> Ahh, I see.
> 
> On 02/04/2010 12:11 PM, Martin Knoblauch wrote:
>>
>>   3.1.3 .. 3.1.5 were canned during testing. Apparently our process does not
>> allow for fixing bugs/regressions between tagging and final release, so it
>> was decided to never publish the intermediates.
>>
> Perhaps instead of tagging the "public beta" releases as a final
> version, they could be tagged as "release candidate". I.e. call it
> 3.1.6rc1 or 3.1.6pre1 or something similar.
>
>>   One of the reasons might be lack of good beta testing (which I am guilty
>> of myself :-(, but I do not really understand why we couldn't just keep
>> 3.1.3 as the name of the release.
>>
> The public betas are a good way to counter that, but it seems a bit
> silly to me to skip entire version levels just because of release
> procedures.
> 

The reason for skipping revision numbers is to make sure that we don't end up 
with confusion about a version in relation to what has already been released (I 
know, that statement in itself seems confusing but let me explain :).  Each 
testing candidate is tagged with a release number and the tarball is built as 
if it were an official release.  The tarball is then made available on the 
Ganglia site for testing.  If the testing proves that the release candidate is 
valid, then the tarball is simply copied to the official release download site 
and becomes the official release.  The status file and web page are also 
updated to reflect the release and corresponding release number.  Under this 
process nothing had to be done to the actual physical tarball in order to 
transition it from a release candidate to an official release.  BTW, the 
ganglia web site is correct, the current release is 3.1.2 and we are preparing 
3.1.6 to go to testing.

If we were to make release candidates publicly available with a release 
number other than major.minor.revision (for example 3.1.3rc1), we would also 
be required to put this same release number in the source code itself to ensure 
that there is a differentiation between a release candidate and the official 
release since both would be made public (one during the testing period and the 
other being an official release).  In order to transition the release 
candidate, in this case to an official release, we would be required to explode 
the tarball, change the version number, retag SVN with the changed file and 
revision number, re-bootstrap the source code, recreate the tarball and then 
finally make the new tarball publicly available under the final release 
number.  All of this leaves the final tarball open to potential problems.  It 
just makes more sense from a testing and release perspective to release the 
tarball in the exact condition as it was tested.  This leaves no possibility 
for errors or problems creeping into the final released tarball.

Another option would be to tag and tar the source code under the final release 
version number and make it available for testing.  Then if bugs are found 
during testing, fix the bugs, retag and retar under the same version number.  
The problem with this is that we could end up with multiple different tarballs 
all with the same version number publicly available.  The only way to tell 
which one was the real release would be by the date on the tarball rather than 
version number.

Anyway, you can read more about this process on the Ganglia wiki page at 
http://sourceforge.net/apps/trac/ganglia/wiki/how_project_works   This release 
process was basically patterned after the way that the Apache httpd project 
produces testing and official tarballs.

Brad


Re: [Ganglia-developers] Ganglia SLED and SLES testing...

2010-02-03 Thread Brad Nicholes

>>> On 2/3/2010 at 3:41 PM, in message <4b69fbb1.6090...@pocock.com.au>, Daniel Pocock wrote:
> Brad Nicholes wrote:
>> >>> On 2/3/2010 at 03:06 PM, in message <4b69f36a.40...@pocock.com.au>, Daniel Pocock wrote:
>>
>>>> I have tried a quick test of your latest snapshot and so far the only
>>>> thing that I am seeing is that the include path to the conf.d directory in
>>>> the gmond.conf file is not getting set correctly.  It is still pointing to
>>>> ./conf.d/*.conf rather than the value that was passed in with --sysconfdir.
>>>> I'm not sure if this is a regression or not, but it is a problem.  That is
>>>> as far as I have tested so far.
>>>>
>>> Some things to check:
>>>
>>> In your source tree, what is in  lib/default_conf.h ?
>>>
>>> 
>>
>> include ('" SYSCONFDIR "/conf.d/*.conf')\n
>>
>>   
>>> Does gmond/gmond.conf exist in the source tree?  If so, it is used
>>> instead of auto-generating
>>>
>>> 
>>
>> No, after doing a make install, gmond.conf did not exist anywhere.  I
>> created it using the ./gmond -t > gmond.conf command
>>
> 
>>> What is the output of gmond -t?
>>>
>>> 
>>
>> The result I got with the invalid gmond.conf file was with the above
>> command.  But I just did it again and came up with a correct gmond.conf.  So
>> I am thinking user error right now.
>>
> Ok, please let me know if you can reproduce the problem
>>> Remember, the spec file also tries to copy /etc/gmond.conf to
>>> /etc/ganglia - you didn't have an old /etc/gmond.conf on the box?
>>> 
>>
>> I was actually trying to install into a test area rather than /etc/.  That
>> might be why I didn't see a gmond.conf after the build if /etc/gmond.conf is
>> essentially hard coded.
>>
> It is only hard coded in the spec file, not in any binary
> 
> The spec file looks for /etc/gmond.conf and moves it to
> /etc/ganglia/gmond.conf
> 
> rpm would also leave an existing gmond.conf intact
> 
> You only get the fresh gmond.conf if none of (/etc/gmond.conf,
> /etc/ganglia/gmond.conf) exists
> 
>> One other problem.  After doing a configure and make I tried to do a make
>> distdir just to get the built files in the web directory.  In the resulting
>> dist directory, nothing was built in the web directory.  conf.php.in was
>> there but no conf.php etc.   Is there something different I need to do now
>> to get all of the .in files resolved in the web directory?
>>
> Try
> 
> make -C web conf.php version.php
> 
> You'll notice that I've put something like that in the spec file

OK, that worked but it begs the question, shouldn't this just happen on a 
"make", "make install" or "make dist*" rather than it being a separate command? 
 I'm sure there must be a good reason for it, just curious.  Is this documented 
somewhere other than .spec file?  If I wanted to just pull the tarball and 
build it myself, I shouldn't have to be guessing at the build steps other than 
configure/make/make install.  

Brad 
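
For readers unfamiliar with how .in templates such as conf.php.in get resolved, here is a rough Python sketch of the idea: @name@ placeholders are replaced with configured values.  This is not the Ganglia build logic, which is driven by configure and make; render_template and PLACEHOLDER are hypothetical names used only for illustration.

```python
import re

# Matches placeholders of the form @name@, as used in autoconf-style templates.
PLACEHOLDER = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)@")


def render_template(text, values):
    """Replace each @name@ in `text` with values[name].

    Placeholders without a matching key are left untouched so that a
    missing value is easy to spot in the generated file.
    """
    def substitute(match):
        name = match.group(1)
        return values.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, text)
```

Doing the substitution late (at make or install time rather than at configure time) is what lets a value such as sysconfdir appear in the generated files, at the cost of the generated files not existing until that step has run.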



Re: [Ganglia-developers] Ganglia SLED and SLES testing...

2010-02-03 Thread Brad Nicholes

>>> On 2/3/2010 at 03:06 PM, in message <4b69f36a.40...@pocock.com.au>, Daniel Pocock wrote:

>> I have tried a quick test of your latest snapshot and so far the only thing
>> that I am seeing is that the include path to the conf.d directory in the
>> gmond.conf file is not getting set correctly.  It is still pointing to
>> ./conf.d/*.conf rather than the value that was passed in with --sysconfdir.
>> I'm not sure if this is a regression or not, but it is a problem.  That is
>> as far as I have tested so far.
>>
>>   
> 
> 
> Some things to check:
> 
> In your source tree, what is in  lib/default_conf.h ?
> 

include ('" SYSCONFDIR "/conf.d/*.conf')\n

> Does gmond/gmond.conf exist in the source tree?  If so, it is used
> instead of auto-generating
> 

No, after doing a make install, gmond.conf did not exist anywhere.  I created 
it using ./gmond -t > gmond.conf  command

> What is the output of gmond -t?
> 

The result I got with the invalid gmond.conf file was with the above command.  
But I just did it again and came up with a correct gmond.conf.  So I am thinking 
user error right now.

> Remember, the spec file also tries to copy /etc/gmond.conf to
> /etc/ganglia - you didn't have an old /etc/gmond.conf on the box?

I was actually trying to install into a test area rather than /etc/.  That 
might be why I didn't see a gmond.conf after the build if /etc/gmond.conf is 
essentially hard coded.

One other problem.  After doing a configure and make I tried to do a make 
distdir just to get the built files in the web directory.  In the resulting 
dist directory, nothing was built in the web directory.  conf.php.in was there 
but no conf.php etc.   Is there something different I need to do now to get all 
of the .in files resolved in the web directory?

Brad 


Re: [Ganglia-developers] Policy on updating files in 3.1.x/contrib

2010-02-03 Thread Brad Nicholes

>>> On 2/3/2010 at 5:04 AM, in message <4b696663.6010...@pocock.com.au>, Daniel Pocock wrote:

>>  what is the policy for updating files in the "contrib" directory of 3.0.x
>> and 3.1.x? Do I need to do the backport approval dance (*)? Or can I just go
>> ahead. The "removespikes.pl" file needs an update in the 3.1.x branch.
>>
>>   
> Any updates to 3.1 require co-ordination from the release manager 
> (myself) when a release is imminent (as it is now).  Generally, let me 
> know the commit number(s) on trunk and then I will let you know if you 
> can backport it on 3.1.6 or wait for 3.1.7.  According to the policies, 
> the release manager has the final say, but I am open to consider anyone 
> who has an opinion for/against a particular patch.
> 

Do we include the contrib directory with the release?  I didn't think we did, 
but even if we do, the contrib directory is not under the same rules as the 
standard release.  AFAIUI, the contrib directory is basically a "use at your 
own risk" kind of thing.  They are user contributions that the Ganglia project 
does not maintain or support.  We documented that in the README.contrib file 
located in the directory.  You should still commit any changes into trunk first 
and then backport to the 3.1.x branch just to make sure that the two areas are 
in sync.  But other than that, there are no other guidelines.  

Just as a side note, now that Groundwork Opensource is maintaining their 
monitorforge site, we have been encouraging people to post their contributions 
there.  One main reason for that is so that the contributor themselves can have 
complete control over the contribution and any updates rather than having to 
rely on a Ganglia developer to commit changes on the contributor's behalf.  

Brad


Re: [Ganglia-developers] tcpconn.py issues

2010-02-02 Thread Brad Nicholes

>>> On 2/2/2010 at 10:28 AM, in message <4b6860c0.9070...@pocock.com.au>, Daniel
Pocock  wrote:

>> If tcpconn is functioning normally after the initial startup, then that
>> basically answers the questions.  It appears that at least on CentOS/RHEL5
>> python is not yielding after calling start() and therefore not allowing the
>> threading module to call the thread's run() method.  The result is that by
>> not yielding, it is opening the window wider and allowing multiple calls to
>> the start() function.  Adding a try...except block around the start() call
>> and ignoring the exceptions will probably fix the problem.  I would not call
>> this a showstopper or a regression, for the same reason that you stated.  It
>> is more of a cosmetic annoyance which can be fixed.  Nothing about this
>> issue actually prevents tcpconn from functioning normally.  I can add the
>> try...except block and check that code in or you can just tag and we fix it
>> next time.
>>
>>   
> 
> Can you please try this on trunk, and we will aim to deliver the fix 
> with 3.1.7.  To get something tagged before Friday, I would prefer not 
> to include any last minute changes unless we find an issue that is a 
> serious regression or showstopper.

Checked into trunk r2264



Re: [Ganglia-developers] tcpconn.py issues

2010-02-02 Thread Brad Nicholes

>>> On 2/2/2010 at 9:57 AM, in message <4b68598e.6050...@pocock.com.au>, Daniel
Pocock  wrote:
> Brad Nicholes wrote:
>>>>> On 2/2/2010 at 6:23 AM, in message <4b682769.6000...@pocock.com.au>, 
>>>>> Daniel
>>>>> 
>> Pocock  wrote:
>>
>>   
>>> I've just been testing r2258 on CentOS 5.  rpmbuild runs successfully 
>>> and the packages install and run.
>>>
>>> However, I notice that some of the tcpconn metrics are failing.  
>>> tcpconn.py doesn't appear to have changed since r1658 (August 2008).  It 
>>> is the only python module that is loaded by default.
>>>
>>> The commit mentions moving the netstat thread start - are you able to 
>>> have a look at this Brad?
>>>
>>> You can get my tarball from http://www.pocock.com.au/ganglia/test if you 
>>> need to.  It is bootstrapped on Debian 5.
>>>
>>>
>>> metric 'tcp_established' being collected now
>>> metric 'tcp_established' has value_threshold 1.00
>>> metric 'tcp_listen' being collected now
>>> [PYTHON] Can't call the metric handler function for [tcp_listen] in the 
>>> python module [tcpconn].
>>>
>>> Traceback (most recent call last):
>>>   File "/usr/lib/ganglia/python_modules/tcpconn.py", line 67, in 
>>> TCP_Connections
>>> _WorkerThread.start()
>>>   File "/usr/lib/python2.4/threading.py", line 410, in start
>>> assert not self.__started, "thread already started"
>>> AssertionError: thread already started
>>> metric 'tcp_listen' has value_threshold 1.00
>>> metric 'tcp_timewait' being collected now
>>> [PYTHON] Can't call the metric handler function for [tcp_timewait] in 
>>> the python module [tcpconn].
>>>
>>> 
>>
>> I can't reproduce the problem so all I can do is take a guess at what might
>> be happening and leave it to somebody who is seeing the issue to verify what
>> is happening.  The exception that you are seeing is a result of a thread
>> being started multiple times.  There is an if statement in
>> TCP_Connections() that is supposed to prevent this from happening.  This if
>> statement checks two thread variables that should indicate what state the
>> thread is in.  The running thread variable is set to false during thread
>> initialization and is set to true as soon as the thread's run method is
>> called.  The run method of the thread is called as a result of calling the
>> start() method on the thread object.  Each time that one of the tcpconn
>> metrics is gathered, the metric callback hits the thread start if statement.
>> If the running thread variable is set to true, then no other metric
>> invocation should be allowed to start the thread again.
>>
>>   
> When you say you can't reproduce the problem, are you trying on a 
> CentOS5/RHEL5 box, or something different?
> 

Something different.  All I have available is SLES and SLED boxes.  I don't 
have access to CentOS or RHEL5.


>> There is a very small window where, on initial startup, two metric callbacks
>> could get past the if statement in TCP_Connections() and try to start the
>> thread a second time.  The window would be caused by a delay between the
>> time that the start() method is called and when the threading module finally
>> calls the thread's run() method.  We could add a try...except block around
>> the start() call to catch and ignore the exception if the thread is started
>> a second time.  But the part that bothers me is that in the list of
>> exceptions, the thread start was obviously attempted more than just a second
>> time.
>>
>> So my questions are, is the thread really running when the second or more
>> attempts are made?  Is the thread bailing out somewhere before the "running"
>> thread variable is set?  If we added the try...except block and ignored the
>> exception, does this leave the thread running and in a functional state?
>> Without being able to reproduce the problem, I can't really answer these
>> questions.
>>
>>   

> I don't know exactly how to check those things
> 
> What I can see is that the errors only appear when the daemon starts 
> (maybe the first time it collects each metric).  After that, the values 
> are transmitted.  Can you give any examples of how to debug this for 
> someone who is not a Python expert?
> 
> Do you think this is a showstopper for 3.1.6?  I don't believe it can be 
> a regression on this release o

Re: [Ganglia-developers] tcpconn.py issues

2010-02-02 Thread Brad Nicholes

>>> On 2/2/2010 at 6:23 AM, in message <4b682769.6000...@pocock.com.au>, Daniel
Pocock  wrote:

> 
> I've just been testing r2258 on CentOS 5.  rpmbuild runs successfully 
> and the packages install and run.
> 
> However, I notice that some of the tcpconn metrics are failing.  
> tcpconn.py doesn't appear to have changed since r1658 (August 2008).  It 
> is the only python module that is loaded by default.
> 
> The commit mentions moving the netstat thread start - are you able to 
> have a look at this Brad?
> 
> You can get my tarball from http://www.pocock.com.au/ganglia/test if you 
> need to.  It is bootstrapped on Debian 5.
> 
> 
> metric 'tcp_established' being collected now
> metric 'tcp_established' has value_threshold 1.00
> metric 'tcp_listen' being collected now
> [PYTHON] Can't call the metric handler function for [tcp_listen] in the 
> python module [tcpconn].
> 
> Traceback (most recent call last):
>   File "/usr/lib/ganglia/python_modules/tcpconn.py", line 67, in 
> TCP_Connections
> _WorkerThread.start()
>   File "/usr/lib/python2.4/threading.py", line 410, in start
> assert not self.__started, "thread already started"
> AssertionError: thread already started
> metric 'tcp_listen' has value_threshold 1.00
> metric 'tcp_timewait' being collected now
> [PYTHON] Can't call the metric handler function for [tcp_timewait] in 
> the python module [tcpconn].
> 

I can't reproduce the problem so all I can do is take a guess at what might be 
happening and leave it to somebody who is seeing the issue to verify what is 
happening.  The exception that you are seeing is a result of a thread being 
started multiple times.  There is an if statement in TCP_Connections() that is 
supposed to prevent this from happening.  This if statement checks two thread 
variables that should indicate what state the thread is in.  The running thread 
variable is set to false during thread initialization and is set to true as 
soon as the thread's run method is called.  The run method of the thread is 
called as a result of calling the start() method on the thread object.  Each 
time that one of the tcpconn metrics is gathered, the metric callback hits the 
thread start if statement.  If the running thread variable is set to true, then 
no other metric invocation should be allowed to start the thread again.  

There is a very small window where, on initial startup, two metric callbacks 
could get past the if statement in TCP_Connections() and try to start the 
thread a second time.  The window would be caused by a delay between the time 
that the start() method is called and when the threading module finally calls 
the thread's run() method.  We could add a try...except block around the 
start() call to catch and ignore the exception if the thread is started a 
second time.  But the part that bothers me is that in the list of exceptions, 
the thread start was obviously attempted more than just a second time.  

So my questions are, is the thread really running when the second or more 
attempts are made?  Is the thread bailing out somewhere before the "running" 
thread variable is set?  If we added the try...except block and ignored the 
exception, does this leave the thread running and in a functional state?  
Without being able to reproduce the problem, I can't really answer these 
questions.
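
The race and the proposed workaround can be sketched roughly as follows.  This 
is only an illustration, not the actual tcpconn.py code: NetstatWorker and 
ensure_started are hypothetical names.  Python 2.4 reports a duplicate start() 
as an AssertionError (as in the traceback above), while current Python versions 
raise RuntimeError, so the sketch tolerates both.

```python
import threading


class NetstatWorker(threading.Thread):
    """Hypothetical stand-in for the tcpconn worker thread."""

    def __init__(self):
        threading.Thread.__init__(self)
        self.daemon = True
        self.running = False            # becomes True only once run() begins
        self.shutdown = threading.Event()

    def run(self):
        self.running = True
        # Real code would poll netstat here; the sketch just waits.
        self.shutdown.wait()
        self.running = False


def ensure_started(worker):
    """Start the worker if needed and tolerate a duplicate start().

    Returns True if this call started the thread, False otherwise.
    """
    if worker.running:
        return False
    try:
        worker.start()
        return True
    except (RuntimeError, AssertionError):
        # Another metric callback got in between start() and run().
        return False


worker = NetstatWorker()
first = ensure_started(worker)
second = ensure_started(worker)   # may land in the window; must not raise
```

Whether the second call is stopped by the "running" check or by the ignored 
exception depends on timing, which is exactly the window described above.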

Brad


Re: [Ganglia-developers] [RFC] two step gmond initialization

2009-12-11 Thread Brad Nicholes

>>> On 12/11/2009 at 6:21 AM, in message <4b224750.2090...@pocock.com.au>, 
>>> Daniel
Pocock  wrote:

>> it replaces apr_proc_detach with an inline implementation of it on plain
>> POSIX and that should be most likely as portable (at least for the platforms
>> we care of) and doesn't intentionally include any error checking to make it
>>   
> How about Cygwin and mingw?  I'm not sure if the use of pipe(), fork(), 
> etc is possible there
> 
> I think we need to take a broader decision about the way we support the 
> Windows platform anyway, we may not need to support detach on that 
> platform.  With Cygwin or with mingw, we should be able to include 
> native code for running as a service.
> 
> So my proposal would be that we extend Carlo's concept so that there are 
> two variations of it, using #ifdef :
> 
> - a UNIX variation of gmond that has detach functionality implemented 
> with fork, pipe, etc
> 
> - a Windows variation of gmond that has built in support for running as 
> a service
> 
> The cygrunsrv source code here provides us with an example of how to go 
> about it:
> 
> http://sourceware.org/cgi-bin/cvsweb.cgi/cygrunsrv/?cvsroot=cygwin-apps#dirlist
> 
> What do people think about having this type of native code in gmond 
> rather than just using apr?  Or should we try to patch apr to provide 
> the functionality?
> 

I have to admit that I haven't dug into this issue to understand exactly why we 
are having problems with APR.  APR is designed to solve these problems in a 
cross-platform way, and we are proposing that we abandon the cross-platform 
solution in favor of a platform-specific solution.  I know that httpd doesn't 
have these issues; it detaches and runs just fine across a wide variety of 
platforms including Windows, BSD, Solaris, etc.  Why are we having these 
problems when httpd doesn't?  Is the real solution as simple as going to the 
APR mailing list and asking why this issue exists in APR and if there is a 
workaround?  I haven't really seen this issue show up on the APR mailing list 
so far, or did I miss it?  

One of the problems that we already have with gmond is that there is too much 
platform-specific code in it, which is why we have to rely on Cygwin in order 
to run on Windows.  It is also the reason why gmetad doesn't really run on 
Windows: it wasn't built on top of a cross-platform solution.  My gut feel is 
that we should be moving ganglia more towards APR rather than away from it.

My 2 cents

Brad


Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.5 beta ready for final testing

2009-12-02 Thread Brad Nicholes

>>> On 12/2/2009 at 7:21 AM, in message <4b1677e4.8000...@pocock.com.au>, Daniel
Pocock  wrote:
> I would like gmond to return a non-zero return code if it fails to 
> initialise, e.g. if it is unable to bind or if it is unable to resolve a 
> hostname mentioned in gmond.conf
> 
> Otherwise, the init-script always says that it started '[OK]' even if 
> the daemon process has died on startup.
> 
> That is why this change was made.  However, I see a few solutions going 
> forward:
> 
> - we can discard the patch completely
> 
> - we can discard the patch, and I could write another patch that does 
> some tests (e.g. resolving host names) before daemonizing
> 
> - we can #ifdef the patch so that on BSD systems, it daemonizes earlier, 
> and on other systems it does so later
> 
> - we can modify the init script to sleep and then call `ps -C gmond' and 
> determine if it kept running
> 
> - post the problem on the apr dev list and discuss it there before 
> making any decision
> 
> 

I'm not sure that I have anything to add as far as the discussion of this issue 
goes, but I have commit rights on the APR project.  If you go with the last 
option and take this discussion to the APR-dev list, I can certainly get 
whatever patch is agreed upon committed and backported in APR.  The downside to 
that option is that we would have to bundle the latest APR RPMs or tarball with 
Ganglia rather than using the distro version.  So even if we do find a solution 
in APR, we will probably still have to build in a workaround in gmond.
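
As an illustration of the "do some tests before daemonizing" option quoted 
above, here is a minimal sketch in Python.  It is not gmond code: preflight and 
main are hypothetical names, and the resolver is injectable only so the sketch 
can be exercised without a network.  The idea is simply that failures found 
before detaching can still be turned into a non-zero exit code that the init 
script sees.

```python
import socket
import sys


def preflight(hostnames, resolve=socket.getaddrinfo):
    """Return the list of host names that fail to resolve."""
    failed = []
    for name in hostnames:
        try:
            resolve(name, None)
        except (socket.gaierror, OSError):
            failed.append(name)
    return failed


def main(hostnames, resolve=socket.getaddrinfo):
    """Run the checks and return a process exit code (0 means OK)."""
    failed = preflight(hostnames, resolve)
    for name in failed:
        sys.stderr.write("cannot resolve host: %s\n" % name)
    if failed:
        return 1
    # A real daemon would detach (fork) only after this point.
    return 0
```

A caller would end with something like sys.exit(main(hosts_from_config)), where 
hosts_from_config is whatever list the configuration parser produced.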

Brad



Re: [Ganglia-developers] Fwd: Getting started with developing a C++ DSO module

2009-11-30 Thread Brad Nicholes

>>> On 11/25/2009 at 11:12 AM, in message
<75fb37ae0911251012i328f8f00u5586dab199c97...@mail.gmail.com>, Sylvester Steele
 wrote:
>> I don't know why you would be getting a segfault on this line.  Gmond
>> expects the array to be NULL terminated so all you are doing is adding one
>> extra entry and filling it with NULLs.  With the array being NULL
>> terminated, gmond doesn't have to keep track of the metric count; it only
>> has to look for a NULL entry.
>>
> 
> 
> More modifications- and I can't figure out where the problem is.
> 
> I tried
> 
> gmi->name=NULL
> 
> that didn't work either
> 
> Then I thought I should remove the null metric and have only one metric 
> like:
> 
> 
> 
> gmi = (Ganglia_25metric*)apr_array_push(metric_info);
> 
> gmi->name= apr_pstrdup (pool,"Random_Numbers_2");
> gmi->tmax=90;
> gmi->type=GANGLIA_VALUE_UNSIGNED_INT;
> gmi->msg_size= UDP_HEADER_SIZE+8;
> gmi->units= apr_pstrdup (pool,"Num");
> gmi->slope=apr_pstrdup (pool,"both");
> gmi->fmt=apr_pstrdup (pool,"%u");
> gmi->desc= apr_pstrdup (pool,"Example module metric (random numbers) 2");
> MMETRIC_INIT_METADATA(gmi, pool);
> MMETRIC_ADD_METADATA(gmi,MGROUP,"example_2");
> 
> printf ("\n First metric done");
> 
> /*
> gmi = (Ganglia_25metric*)apr_array_push(metric_info);
> printf ("\nStarted second metric..");
> 
> gmi->name= apr_pstrdup (pool,"Constant_Number_2");
> gmi->tmax=90;
> gmi->type=GANGLIA_VALUE_UNSIGNED_INT;
> gmi->msg_size= UDP_HEADER_SIZE+8;
> gmi->units= apr_pstrdup (pool,"Num");
> gmi->slope=apr_pstrdup (pool,"zero");
> gmi->fmt=apr_pstrdup (pool,"%u");
> gmi->desc= apr_pstrdup (pool,"Example module metric (constant number) 2");
> MMETRIC_INIT_METADATA(gmi, pool);
> MMETRIC_ADD_METADATA(gmi,MGROUP,"example_2");
> 
> printf ("\nSecond metric done");
> */
> //gmi = (Ganglia_25metric*)apr_array_push(metric_info);
> //printf ("\nStarted null metric");
> 
> //gmi->name= apr_pstrdup (pool,NULL);
> 
> //memset (gmi, 0, sizeof(*gmi));
> printf ("\nMetric initing done");
> return 0;
> 
> And this gives a segfault too! And here is the output:
> 
> In the ex_metric_init function
> Got first GMI
>  First metric done
> Segmentation fault
> 
> 
> i.e. there is a segfault between the second-to-last and last printf
> statements! Any clues?
> 

Sorry, my only suggestion would be to run it in the debugger to get a better 
idea of what is happening.

Brad



Re: [Ganglia-developers] Fwd: Getting started with developing a C++ DSO module

2009-11-25 Thread Brad Nicholes

>>> On 11/25/2009 at 10:19 AM, in message <008b01ca6df3$823a2690$86ae73...@com>,
"Sylvester Steele"  wrote:
>> My guess is because you have static string pointers being passed from a
>> DSO module to gmond.  I would suggest using apr_pstrdup(p, <string>) to
>> allocate the memory from an APR memory pool before handing the pointers
>> back to gmond.
> 
> Thanks Brad- that helped, but I am still getting a seg fault from the second
> line:
> 
> gmi = (Ganglia_25metric*)apr_array_push(metric_info);
> memset (gmi, 0, sizeof(*gmi));
> 
> I am doing this to set the last metric to null. Why should this be
> happening? BTW- my metric_info has size=10 and I am putting in only two
> metrics before this (The null metric is the third)
> 
> 

I don't know why you would be getting a segfault on this line.  Gmond expects 
the array to be NULL terminated so all you are doing is adding one extra entry 
and filling it with NULLs.  With the array being NULL terminated, gmond doesn't 
have to keep track of the metric count; it only has to look for a NULL entry.
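
The sentinel idea can be shown with a small Python analogue.  This is not gmond 
code, and it assumes the terminator is recognized by its empty name field; 
count_metrics is a hypothetical helper.

```python
def count_metrics(metric_info):
    """Count entries up to (not including) the first terminator entry."""
    count = 0
    for entry in metric_info:
        if entry["name"] is None:   # plays the role of the zero-filled entry
            break
        count += 1
    return count


# Two real metrics followed by the terminator, as in the module above.
metric_info = [
    {"name": "Random_Numbers_2"},
    {"name": "Constant_Number_2"},
    {"name": None},
]
n = count_metrics(metric_info)
```

The consumer never needs to be told the length; it stops at the first entry 
whose name is empty.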

Brad



Re: [Ganglia-developers] Fwd: Getting started with developing a C++ DSO module

2009-11-25 Thread Brad Nicholes

>>> On 11/25/2009 at 9:16 AM, in message
<75fb37ae0911250816k5e7e0373x25ad2ee613930...@mail.gmail.com>, Sylvester Steele
 wrote:
> Ok, so I tried to make a dynamically initializing module. I am
> basically trying to convert the example module to a dynamically
> initializing one..
> 
> My metric_init function looks like this:
> 
> static int ex_metric_init ( apr_pool_t *p )
> {
> 
> 
> Ganglia_25metric* gmi;
>apr_pool_create(&pool, p);
> 
> metric_info = apr_array_make(pool, 10, sizeof(Ganglia_25metric));
>// metric_mapping_info = apr_array_make(pool, 10, sizeof(mapped_info_t));
> 
> gmi = (Ganglia_25metric*)apr_array_push(metric_info);
> 
> gmi->name= "Random_Numbers_2";
> gmi->tmax=90;
> gmi->type=GANGLIA_VALUE_UNSIGNED_INT;
> gmi->msg_size= UDP_HEADER_SIZE+8;
> gmi->units= "Num";
> gmi->slope="both";
> gmi->fmt="%u";
> gmi->desc= "Example module metric (random numbers) 2";
> MMETRIC_INIT_METADATA(gmi, pool);
> MMETRIC_ADD_METADATA(gmi,MGROUP,"example_2");
> 
> gmi = (Ganglia_25metric*)apr_array_push(metric_info);
> 
> gmi->name= "Constant_Number_2";
> gmi->tmax=90;
> gmi->type=GANGLIA_VALUE_UNSIGNED_INT;
> gmi->msg_size= UDP_HEADER_SIZE+8;
> gmi->units= "Num";
> gmi->slope="zero";
> gmi->fmt="%u";
> gmi->desc= "Example module metric (constant number) 2";
> MMETRIC_INIT_METADATA(gmi, pool);
> MMETRIC_ADD_METADATA(gmi,MGROUP,"example_2");
> 
> 
> gmi = (Ganglia_25metric*)apr_array_push(metric_info);
> 
> memset (gmi, 0, sizeof(*gmi));
> 
> 
> return 0;
> }
> 
> 
> Q1. For some reason this gives me a segmentation fault. Any ideas why?
> 

My guess is because you have static string pointers being passed from a DSO 
module to gmond.  I would suggest using apr_pstrdup(p, <string>) to allocate 
the memory from an APR memory pool before handing the pointers back to gmond.

> Q2. How do I put printf / cout statements so I can see them when gmond
> runs- which will be very helpful for debugging.
> 

You should be able to just use printf statements and then run gmond in debug 
mode. With gmond running in debug mode, all printf statements should print out 
on the console.

Brad





Re: [Ganglia-developers] [Ganglia-general] 3.1.4 to go GA?

2009-11-20 Thread Brad Nicholes

>>> On 11/20/2009 at 8:46 AM, in message <4b06b9d6.5080...@pocock.com.au>, 
>>> Daniel
Pocock  wrote:
> Brad Nicholes wrote:
>>>>> On 11/20/2009 at 8:07 AM, in message <4b06b0af.1050...@pocock.com.au>, 
>>>>> Daniel
>>>>> 
>> Pocock  wrote:
>>   
>>> Brad Nicholes wrote:
>>> 
>>>> I've been running it on a very small set of machines.  It all looks good 
>>>> to 
>>>>   
>>> me.
>>> 
>>>>   
>>>>   
>>> No complaints from anyone... is that sufficient to go live?  I'm not 
>>> sure if I have the access level to put the release on the SF site though.
>>> 
>>
>> You are the release manager.  The decision to go live is your call.  :)
>>   
> Ok, 3.1.4 is now GA and ready for distribution.
> 
> As usual, any feedback is still welcome and will be used to shape 3.1.5
> 

BTW, besides posting the tarball on SF for distribution, you will also need to 
fix up the Current Release Notes page on the Wiki and fix the configuration and 
installation pages with the latest documentation.

Brad



Re: [Ganglia-developers] 3.1.4 to go GA?

2009-11-20 Thread Brad Nicholes

>>> On 11/20/2009 at 8:07 AM, in message <4b06b0af.1050...@pocock.com.au>, 
>>> Daniel
Pocock  wrote:
> Brad Nicholes wrote:
>> I've been running it on a very small set of machines.  It all looks good to 
> me.
>>   
> 
> No complaints from anyone... is that sufficient to go live?  I'm not 
> sure if I have the access level to put the release on the SF site though.

You are the release manager.  The decision to go live is your call.  :)

Brad



Re: [Ganglia-developers] gmetad using apr as of r2106, shorter polling intervals

2009-11-20 Thread Brad Nicholes

Up until now, gmetad hasn't really used APR for any of its base functionality.  
If we are going to start putting gmetad on top of APR, there are a number of 
places where gmetad could really be improved.  One of the most glaring areas is 
the metric_hash code.  This code is currently generated by the gperf tool, 
which produces C code that is very specific to a certain set of tags and how 
they should be hashed.  Furthermore, if there are changes to any of the metric 
tags, this hashing code has to be regenerated manually before any of the 
autotools can be run.  It would really be nice to remove gmetad's dependence on 
the gperf tool and instead put all of the hashing functionality on top of the 
apr_hash_* table functions.  These functions are much more flexible and would 
remove a significant amount of complex code.  

In addition to that, there are many other areas such as threading and memory 
management which could really benefit from APR.  Not to mention portability.

Just a thought in case anybody is looking for someplace where they could really 
contribute to Ganglia.

Brad

>>> On 11/20/2009 at 8:05 AM, in message <4b06b01d.3050...@pocock.com.au>, 
>>> Daniel
Pocock  wrote:

> 
> As discussed previously on the list, I've adapted gmetad to use apr's 
> sleep functionality.  For anyone using trunk, please run autoreconf && 
> ./configure to get the newest gmetad/Makefile
> 
> Changing the sleep code to randomize intervals using a percentage rather 
> than absolute value should be helpful for shorter polling intervals - I 
> would be interested in any feedback from people using Ganglia for 
> polling intervals smaller than 15 seconds.
> 
> The change is in trunk but may be backported for 3.1.5
> 
> 
> r2106 | d_pocock | 2009-11-20 14:58:09 + (Fri, 20 Nov 2009) | 1 line
> 
> Rewrite gmetad sleep code in various places to use apr, remove magic 
> numbers, sleep as a percentage of the step rather than an absolute 
> random adjustment
> 
> 
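
The idea of sleeping for the step plus or minus a percentage, rather than plus 
or minus a fixed number of seconds, can be sketched like this.  It is not the 
code from r2106; jittered_sleep and the 10 percent figure are hypothetical.

```python
import random


def jittered_sleep(step, percent, rng=random):
    """Return a sleep interval within +/- percent of step (in seconds)."""
    if step <= 0:
        raise ValueError("step must be positive")
    spread = step * percent / 100.0
    return step + rng.uniform(-spread, spread)


# With a 10 second step and 10 percent jitter, intervals stay near 10 s.
interval = jittered_sleep(10, 10)
```

With a percentage, the jitter shrinks along with the step, so a short polling 
interval is not swamped by a random adjustment sized for a long one.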


Re: [Ganglia-developers] Fwd: Getting started with developing a C++ DSO module

2009-11-19 Thread Brad Nicholes

>>> On 11/19/2009 at 9:13 AM, in message
<75fb37ae0911190813q66cf1f96w1afe84f8bdbe1...@mail.gmail.com>, Sylvester Steele
 wrote:
>> I'm not sure what you are looking for.  The purpose of the code that I
>> referred to was to show how a module would generate the metric definitions
>> during the initialization phase of gmond.  Basically what happens is that
>> when gmond is started it loads each module and calls the metric_init
>> function for each module.  At that point each module has the opportunity to
>> tell gmond what metrics it supports by passing back an array of metric
>> definitions.  That is how gmond determines which module supports which
>> metrics.  The contents of the metric definition array is completely up to
>> the module itself.  However once the module returns the list of metric
>> definitions, that list can not be changed until the next time that gmond
>> stops and restarts.  There is no way to alter the list of metrics that gmond
>> is monitoring on the fly during normal gmond operation.  If the latter is
>> what you are looking for, gmond does not support on-the-fly functionality
>> yet.
>>
>>
>>
> 
> 
> I don't want to change anything after a module starts. Changes to
> which metrics are being collected can wait until a restart. But- every
> time gmond restarts- this module may be collecting a different number
> of standard metrics. So I don't need to change the metadata of the
> metrics themselves- I just need to say at start: "OK we are collecting
> these X metrics"-where X is variable and will change only at a
> restart- but varies between restarts. No changes need be made after
> initialization.  I hope this clarifies things a bit.
> 

So the mod_python code that I referred to is doing that.  By creating a 
metric_info array in your metric_init function using the apr array calls, you 
can create a dynamic array rather than using a hard-coded static array in your 
code.  However, you still have another problem.  Your configuration file still 
needs to match up with the metrics that are being gathered.  In other words, 
you still need to have a corresponding metric block within a collection_group 
in your gmond.conf configuration file whose metric name matches a metric 
definition that is being returned by one of the loaded modules.  Right now 
there isn't a way to dynamically generate the gmond configuration for a metric 
even though the metric module has the ability to collect data for the given 
metric.  Basically what this means is that if you expect that on a given 
restart of gmond X new metrics are going to be collected by your metric module, 
you have to manually enter their corresponding configuration into gmond.conf.  

Adding functionality to have the metric configuration be completely driven by a 
metric module still needs to be done.  Basically, it is a cool feature that is 
looking for somebody to implement it.  
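
For comparison, a gmond Python metric module can build its list of definitions 
at startup along the following lines.  This is a sketch rather than code from 
the Ganglia tree: the descriptor keys follow the Python module interface as I 
understand it, and the "metric_names" parameter and read_value callback are 
hypothetical.

```python
import random


def read_value(name):
    """Metric callback: receives the metric name and returns a value."""
    return random.randint(0, 100)


def metric_init(params):
    """Build the descriptor list at startup from module parameters."""
    # "metric_names" is a hypothetical parameter name used by this sketch.
    raw = params.get("metric_names", "")
    names = [n.strip() for n in raw.split(",") if n.strip()]
    descriptors = []
    for name in names:
        descriptors.append({
            "name": name,
            "call_back": read_value,
            "time_max": 90,
            "value_type": "uint",
            "units": "Num",
            "slope": "both",
            "format": "%u",
            "description": "Example module metric (%s)" % name,
            "groups": "example_2",
        })
    return descriptors


def metric_cleanup():
    """Called on shutdown; nothing to release in this sketch."""
    pass
```

The number of definitions can differ on every restart, but as described above, 
each returned name still needs a matching metric block in a collection_group in 
gmond.conf before gmond will collect it.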

Brad


Re: [Ganglia-developers] gmetad no summary for spoof'd data patch

2009-11-19 Thread Brad Nicholes

>>> On 11/18/2009 at 8:19 AM, in message
<20091118151950.ga13...@porcupine.cita.utoronto.ca>, Robin Humble
 wrote:
> Hi Brad,
> 
> I appreciate you taking the time to look at the patch.
> 
> On Tue, Nov 17, 2009 at 09:54:11AM -0700, Brad Nicholes wrote:
>> On 11/7/2009 at 12:06 AM, in message 
> <20091107070643.ga20...@porcupine.cita.utoronto.ca>, Robin Humble 
>  wrote:
>>> turns out that there's a SPOOF_HOST EXTRA_ELEMENT attached to each
>>> spoof'd metric, and when 100's of hosts (>40 or so should trigger it)
>>> have spoof'd entries, then those add up and then corrupt the summary
>>> Metric structure enough to destroy the .type and stop the rrd being
>>> generated.
>>> I'm guessing it's the same as the MAX_EXTRA_ELEMENTS problem, except
>>> for the summary table instead of the host table.
>>I took a look at this patch and since I am not able to reproduce the
>>problem, it makes it a little unclear as to what is happening.  I can't
>>really figure out how this patch fixes a problem with the hash table. 
>>According to the source code, whenever an extra element is parsed, the
>>code inserts the extra element into a list of extra data on a per
>>metric basis.  This means that only one extra element for a spoof host
>>is ever stored for a metric.
> 
> yes, it's the summary table that's the problem, not the host table.
> 
>> Then when the code moves into the summary
>>data portion, it specifically checks to make sure that it is not
>>duplicating an extra element value before it inserts it into the
>>summary node (check the for loop at around line #827 in the 3.1.2
>>version of the source code).  If it detects a duplicate value, then it
>>skips the insert and just updates the rest of the summary node in the
>>hash table. 
> 
> in this loop ->
> 
>   for (i = 0; i < sum_metric.ednameslen; i++) {
>       char *chk_name = getfield(sum_metric.strings, sum_metric.ednames[i]);
>       char *chk_value = getfield(sum_metric.strings, sum_metric.edvalues[i]);
>
>       if (!strcasecmp(chk_name, new_name) && !strcasecmp(chk_value, new_value)) {
>           found = TRUE;
>           break;
>       }
>   }
> 
> here's an example of what happens for a spoof'd metric ->
> 
>   (chk_name == new_name) 1 && (chk_value == new_value) 0 ==> 0 - chk_name 
> SPOOF_HOST new_name SPOOF_HOST chk_value 10.1.1.30:v30 new_value 
> 10.1.1.37:v37
>   (chk_name == new_name) 1 && (chk_value == new_value) 0 ==> 0 - chk_name 
> SPOOF_HOST new_name SPOOF_HOST chk_value 10.1.1.31:v31 new_value 
> 10.1.1.37:v37
>   (chk_name == new_name) 1 && (chk_value == new_value) 0 ==> 0 - chk_name 
> SPOOF_HOST new_name SPOOF_HOST chk_value 10.1.1.32:v32 new_value 
> 10.1.1.37:v37
>   (chk_name == new_name) 1 && (chk_value == new_value) 0 ==> 0 - chk_name 
> SPOOF_HOST new_name SPOOF_HOST chk_value 10.1.1.33:v33 new_value 
> 10.1.1.37:v37
>   (chk_name == new_name) 1 && (chk_value == new_value) 0 ==> 0 - chk_name 
> SPOOF_HOST new_name SPOOF_HOST chk_value 10.1.1.34:v34 new_value 
> 10.1.1.37:v37
>   (chk_name == new_name) 1 && (chk_value == new_value) 0 ==> 0 - chk_name 
> SPOOF_HOST new_name SPOOF_HOST chk_value 10.1.1.35:v35 new_value 
> 10.1.1.37:v37
>   (chk_name == new_name) 1 && (chk_value == new_value) 0 ==> 0 - chk_name 
> SPOOF_HOST new_name SPOOF_HOST chk_value 10.1.2.80:v176 new_value 
> 10.1.1.37:v37
>   (chk_name == new_name) 1 && (chk_value == new_value) 0 ==> 0 - chk_name 
> SPOOF_HOST new_name SPOOF_HOST chk_value 10.1.1.36:v36 new_value 
> 10.1.1.37:v37
>   (chk_name == new_name) 1 && (chk_value == new_value) 0 ==> 0 - chk_name 
> SPOOF_HOST new_name SPOOF_HOST chk_value 10.1.2.81:v177 new_value 
> 10.1.1.37:v37
>   ...
> 
> you can see that every EXTRA_ELEMENT "name" field matches, but as
> each spoof'd entry comes from a different host, then every "value" is
> different, so 'found' is always FALSE.
> 
> so a new EXTRA_ELEMENT is always inserted for every spoof'd host.
> ie. for one spoof'd metric on N hosts then there would be N
> EXTRA_ELEMENT's stored next to it in the summary table.
> 
> when the number of spoofed hosts is > few * MAX_EXTRA_ELEMENTS, then
> corruption occurs in the summary hash. the upshot of which is that the
> summary table gets corrupted and the checks in gmetad.c mean that
> (unless you get very lucky) the __SummaryInfo__/* rrd file for the
> spoof'd metric is never written.
> 

Now I get it.  I'll take a look at it from that angle.
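
The behaviour described in the quoted message can be modelled with a small
stand-alone sketch.  It is not the gmetad source and it is not the attached
patch: extra_list, extra_insert and DEMO_MAX_EXTRA are hypothetical names, and
the capacity check is only an assumption about one way such a list could be
kept from overflowing.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

#define DEMO_MAX_EXTRA 8   /* hypothetical capacity, stands in for MAX_EXTRA_ELEMENTS */

/* Hypothetical stand-in for the extra-element slots of a summary metric. */
typedef struct {
    const char *names[DEMO_MAX_EXTRA];
    const char *values[DEMO_MAX_EXTRA];
    size_t len;
} extra_list;

/*
 * Insert (name, value) unless the same pair is already stored.
 * Returns 1 if inserted, 0 if it was a duplicate, -1 if the list is full.
 */
int extra_insert(extra_list *list, const char *name, const char *value)
{
    for (size_t i = 0; i < list->len; i++) {
        if (!strcasecmp(list->names[i], name) &&
            !strcasecmp(list->values[i], value))
            return 0;               /* same name AND same value: skip */
    }
    if (list->len >= DEMO_MAX_EXTRA)
        return -1;                  /* refuse to write past the end */
    list->names[list->len] = name;
    list->values[list->len] = value;
    list->len++;
    return 1;
}
```

Because every spoofed host carries a different SPOOF_HOST value, the duplicate
test never matches and each host adds one more entry.  Without a capacity
check like the one above, the list grows with the number of spoofed hosts.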

Brad


Re: [Ganglia-developers] Fwd: Getting started with developing a C++ DSO module

2009-11-19 Thread Brad Nicholes

>>> On 11/19/2009 at 6:32 AM, in message
<75fb37ae0911190532t17685eb0uc1db8390546b4...@mail.gmail.com>, Sylvester Steele
 wrote:
>> Take a look at the pyth_metric_init() function in the mod_python.c module.
>> At the end of the function, mod_python takes all of the metric definitions
>> and pushes them into an APR array.  Then it sets the metric_info field of
>> the module structure with the metric_info->elts value.
>>
>> python_module.metrics_info = (Ganglia_25metric *)metric_info->elts;
>>
>> Basically it is just a matter of calling the APR function
>> apr_array_push(metric_info); for each metric definition and then filling in
>> the structure that is returned.
>>
>> Brad
>>
> 
> Thanks Brad,
> 
> I went through that function you mentioned- if I understood it right-
> that function adds different metadata for different metrics. So for
> example:
> 
> {0, "cpu_num",1200, GANGLIA_VALUE_UNSIGNED_SHORT, "CPUs", "zero",
> "%hu",  UDP_HEADER_SIZE+8, "Total number of CPUs"},
>{0, "cpu_speed",  1200, GANGLIA_VALUE_UNSIGNED_INT,   "MHz",  "zero",
> "%u",  UDP_HEADER_SIZE+8, "CPU Speed in terms of MHz"},
> 
> I guess this adds more metadata to say the cpu_speed metric. I don't
> want to do that, All my metrics will have the same meta-data- its just
> that I don't know how many metrics I will have at compile time. The
> number of metrics will be determined at runtime by reading a file.
> 
> I was wondering if I could just run a loop and put in the metadata this way:
> 
> static Ganglia_25metric *mem_metric_info;
> 
> mem_metric_info= new Ganglia_25metric[num_of_metrics];
> 
> for (i=0 to  mem_metric_info[i]=  {0, some name,1200,
> GANGLIA_VALUE_UNSIGNED_SHORT, "CPUs", "zero", "%hu", DP_HEADER_SIZE+8,
> "Total number of CPUs"},
> }
> 

I'm not sure what you are looking for.  The purpose of the code that I referred 
to was to show how a module would generate the metric definitions during the 
initialization phase of gmond.  Basically, when gmond is started it loads each 
module and calls the metric_init function for each module.  At that point each 
module has the opportunity to tell gmond what metrics it supports by passing 
back an array of metric definitions.  That is how gmond determines which module 
supports which metrics.  The contents of the metric definition array are 
completely up to the module itself.  However, once the module returns the list 
of metric definitions, that list cannot be changed until the next time that 
gmond stops and restarts.  There is no way to alter the list of metrics that 
gmond is monitoring on the fly during normal gmond operation.  If that is what 
you are looking for, gmond does not support it yet.

Brad


Re: [Ganglia-developers] 3.1.4 to go GA?

2009-11-18 Thread Brad Nicholes

I've been running it on a very small set of machines.  It all looks good to me.

Brad

>>> On 11/18/2009 at  9:42 AM, in message
, Bernard Li
 wrote: 
> I haven't had a chance to test it out yet -- has anybody else been
> able to give it a spin?
> 
> Cheers,
> 
> Bernard
> 
> On Wed, Nov 18, 2009 at 7:22 AM, Daniel Pocock  wrote:
>>
>>
>> How do people feel about making 3.1.4 GA?
>>
>>
>>
> 





Re: [Ganglia-developers] gmetad no summary for spoof'd data patch

2009-11-17 Thread Brad Nicholes

>>> On 11/7/2009 at 12:06 AM, in message
<20091107070643.ga20...@porcupine.cita.utoronto.ca>, Robin Humble
 wrote:
> Hi,
> 
> I spoof a bunch of temperature and power metrics via ILOM for a few
> hundred nodes and I noticed that gmetad wasn't making a summary table
> (.../__SummaryInfo__/*) for most of the spoof'd values.
> 
> turns out that there's a SPOOF_HOST EXTRA_ELEMENT attached to each
> spoof'd metric, and when 100's of hosts (>40 or so should trigger it)
> have spoof'd entries, then those add up and then corrupt the summary
> Metric structure enough to destroy the .type and stop the rrd being
> generated.
> I'm guessing it's the same as the MAX_EXTRA_ELEMENTS problem, except
> for the summary table instead of the host table.
> 
> attached is a simplistic patch that fixes the problem.
> it could probably be done better, but works for me. it's against 3.1.2,
> but should apply to 3.1.4 as well.
> 
> apologies if I have some of the ganglia/gmetad terminology wrong - I've
> been using it for years, but this my first dive into the code.
> 

I took a look at this patch, but since I am not able to reproduce the problem, 
it is a little unclear to me what is happening.  I can't really figure out how 
this patch fixes a problem with the hash table.  According to the source code, 
whenever an extra element is parsed, the code inserts the extra element into a 
list of extra data on a per-metric basis.  This means that only one extra 
element for a spoof host is ever stored for a metric.  Then when the code moves 
into the summary data portion, it specifically checks to make sure that it is 
not duplicating an extra element value before it inserts it into the summary 
node (check the for loop at around line #827 in the 3.1.2 version of the source 
code).  If it detects a duplicate value, then it skips the insert and just 
updates the rest of the summary node in the hash table.  Since I am not able to 
duplicate the problem, could you step further through the original source code 
to make sure that the check for a duplicate value is actually happening and 
that the code is not taking some other path that could be causing the problem?

You might also want to check in the source code at the point where the summary 
table is actually written to see if there is some clue there why your summary 
rrd files are not being created or updated.

Brad


Re: [Ganglia-developers] Fwd: Getting started with developing a C++ DSO module

2009-11-16 Thread Brad Nicholes

>>> On 11/16/2009 at 3:04 PM, in message <002c01ca6708$d7770020$866500...@com>,
"Sylvester Steele"  wrote:
> Kim,
> 
> I got the tarball to which you'd put up the link earlier on in the mailing
> list. I got your module to work no problem there!
> 
> But, I have a question: 
> 
> All the Ganglia modules have a metric array. The mod_cpu has this:
> 
> static Ganglia_25metric cpu_metric_info[] = 
> {
> {0, "cpu_num",1200, GANGLIA_VALUE_UNSIGNED_SHORT, "CPUs", "zero",
> "%hu",  UDP_HEADER_SIZE+8, "Total number of CPUs"},
> {0, "cpu_speed",  1200, GANGLIA_VALUE_UNSIGNED_INT,   "MHz",  "zero",
> "%u",  UDP_HEADER_SIZE+8, "CPU Speed in terms of MHz"},
> {0, "cpu_user", 90, GANGLIA_VALUE_FLOAT,  "%","both",
> "%.1f", UDP_HEADER_SIZE+8, "Percentage of CPU utilization that occurred
> while executing at the user level"},
> {0, "cpu_nice", 90, GANGLIA_VALUE_FLOAT,  "%","both",
> "%.1f", UDP_HEADER_SIZE+8, "Percentage of CPU utilization that occurred
> while executing at the user level with nice priority"},
> {0, "cpu_system",   90, GANGLIA_VALUE_FLOAT,  "%","both",
> "%.1f", UDP_HEADER_SIZE+8, "Percentage of CPU utilization that occurred
> while executing at the system level"},
> {0, "cpu_idle", 90, GANGLIA_VALUE_FLOAT,  "%","both",
> "%.1f", UDP_HEADER_SIZE+8, "Percentage of time that the CPU or CPUs were
> idle and the system did not have an outstanding disk I/O request"},
> {0, "cpu_aidle",  3800, GANGLIA_VALUE_FLOAT,  "%","both",
> "%.1f", UDP_HEADER_SIZE+8, "Percent of time since boot idle CPU"},
> {0, "cpu_wio",  90, GANGLIA_VALUE_FLOAT,  "%","both",
> "%.1f", UDP_HEADER_SIZE+8, "Percentage of time that the CPU or CPUs were
> idle during which the system had an outstanding disk I/O request"},
> {0, "cpu_intr", 90, GANGLIA_VALUE_FLOAT,  "%","both",
> "%.1f", UDP_HEADER_SIZE+8, "cpu_intr"},
> {0, "cpu_sintr",90, GANGLIA_VALUE_FLOAT,  "%","both",
> "%.1f", UDP_HEADER_SIZE+8, "cpu_sintr"},
> {0, NULL}
>   
> };
> 
> 
> Now, what if the number of metrics that my module monitors changes? Ie If I
> want to monitor 5 metrics today but 10 tomorrow- Is it possible to
> dynamically initialize the metrics somehow?
> 
> Will the following work:
> 
> 
> For (how many ever metrics)
> {
> cpu_metric_info[i]= appropriate string
> }
> 
> And put this for loop in the metric_init function..
> 
> Will that do the trick?
> 


Take a look at the pyth_metric_init() function in the mod_python.c module.  At 
the end of the function, mod_python takes all of the metric definitions and 
pushes them into an APR array.  Then it sets the metric_info field of the 
module structure with the metric_info->elts value.

python_module.metrics_info = (Ganglia_25metric *)metric_info->elts;

Basically it is just a matter of calling the APR function 
apr_array_push(metric_info); for each metric definition and then filling in the 
structure that is returned.
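
The push-and-fill pattern can be sketched without the Ganglia or APR headers.
Everything below is a stand-in made up for illustration: metric_def is a
cut-down substitute for Ganglia_25metric, metric_array_push plays the role that
apr_array_push() plays in mod_python.c, and build_metric_defs and the
"my_metric_N" names are hypothetical.  In a real module the array would be
allocated from an APR pool and its elts pointer assigned to the module
structure as shown above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for Ganglia_25metric. */
typedef struct {
    int key;
    char *name;      /* NULL marks the end of the array */
    int tmax;
    const char *units;
} metric_def;

/* Minimal growable array; plays the role of the APR array in this sketch. */
typedef struct {
    metric_def *elts;
    size_t nelts;
    size_t nalloc;
} metric_array;

/* Append one zeroed slot and return it, or NULL on allocation failure. */
metric_def *metric_array_push(metric_array *arr)
{
    if (arr->nelts == arr->nalloc) {
        size_t nalloc = arr->nalloc ? arr->nalloc * 2 : 4;
        metric_def *grown = realloc(arr->elts, nalloc * sizeof(*grown));
        if (grown == NULL)
            return NULL;
        arr->elts = grown;
        arr->nalloc = nalloc;
    }
    memset(&arr->elts[arr->nelts], 0, sizeof(arr->elts[0]));
    return &arr->elts[arr->nelts++];
}

/* Build `count` definitions at run time, followed by a NULL-name terminator. */
metric_def *build_metric_defs(size_t count)
{
    metric_array arr = { NULL, 0, 0 };
    char name[32];

    for (size_t i = 0; i <= count; i++) {
        metric_def *m = metric_array_push(&arr);
        if (m == NULL)
            goto fail;
        if (i == count)
            break;                  /* last slot stays zeroed: the terminator */
        snprintf(name, sizeof(name), "my_metric_%zu", i);
        m->name = malloc(strlen(name) + 1);
        if (m->name == NULL)
            goto fail;
        strcpy(m->name, name);
        m->tmax = 1200;
        m->units = "count";
    }
    return arr.elts;

fail:
    for (size_t i = 0; i < arr.nelts; i++)
        free(arr.elts[i].name);
    free(arr.elts);
    return NULL;
}
```

The count can come from anything read at start-up, such as a file.  The final
zeroed slot mirrors the {0, NULL} entry that closes the static mod_cpu array
quoted above.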

Brad





[Ganglia-developers] Contributing the source code (was:Getting started with developing a C++ DSO module)

2009-11-12 Thread Brad Nicholes

>>> On 11/11/2009 at 4:13 PM, in message
, JB Kim 
wrote:
> I've written the iostat standalone DSO module a while back in C. I do  
> have the whole build process documented (to some degree) and provided  
> template for creating standalone DSO.
> 
> I think you can search for "iostat" in the archives. If you can't find  
> it, I'll dig it up and reply again.
> 

The Groundwork Open Source people started a monitoring project repository for 
monitoring related projects and components.  There are already a couple of 
small Ganglia related components in the repository.  Would you be interested in 
posting your iostat module as a Ganglia component in Monitoring Forge?  
http://monitoringforge.org/


Brad



Re: [Ganglia-developers] Fwd: Getting started with developing a C++ DSO module

2009-11-11 Thread Brad Nicholes

>>> On 11/10/2009 at 8:30 PM, in message
<75fb37ae0911101930g55978c94u1a16c48fd5cc2...@mail.gmail.com>, Sylvester Steele
 wrote:
> -- Forwarded message --
> From: Sylvester Steele 
> Date: Tue, Nov 10, 2009 at 10:19 PM
> Subject: Re: [Ganglia-developers] Getting started with developing a
> C++ DSO module
> To: Brad Nicholes 
> 
> 
>>> 1. Compile to .so
>>> 2. Place compiled .so in the /usr/lib/ganglia folder
>>> 3. Make appropriate changes to the gmond.conf file so it can pick up
>>> the new .so and the module in it
>>> 4. Restart gmond
>>>
> 
>> It sounds like you are trying to build your module inside of the Ganglia
>> build environment.  Since your module isn't part of the standard Ganglia
>> modules, you should be creating your own make file and building your module
>> outside of the Ganglia environment as a stand-alone build.  Unfortunately I
>> don't have a good example of a stand-alone module build to point you to but
>> there may be others that have done this in the past for a C module.  I did
>> this at one point a couple of years ago just to make sure that we had
>> everything in place to build a module outside of Ganglia, but I can't seem
>> to find the makefiles that I used.  You might want to search back through
>> the develop mailing list to see if there were discussions about this.  I
>> know that there was a module contribution about 6 months ago where they
>> were building their own C module.  The code never made it into the
>> repository but it should still be on the mailing list somewhere.
>>
> 
> Oh ok. I'll see if I can find it in the mailing list archives. Will
> the 4 steps I've listed above be enough to make Ganglia pickup my
> module and start sending the metrics?
> 

Sure.  As long as gmond can find the module and load it with all of its 
dependencies, things should be good.

Brad



Re: [Ganglia-developers] Storing Ganglia data in MySql.

2009-11-10 Thread Brad Nicholes

>>> On 11/10/2009 at 9:30 AM, in message <4af9952a.9070...@pocock.com.au>,
Daniel Pocock wrote:
> Brad Nicholes wrote:
>>>>> On 11/10/2009 at 4:11 AM, in message
>>>>> 
>> , Himanshu Sharma
>>  wrote:
>>   
>>> Hello all,
>>>
>>> We were looking to store Ganglia data in MySql rather than just an
>>> RRD. There was a discussion earlier on the same issue -
>>> http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg02100.html.
>>> It would be great if there was some reusable code available or if
>>> there was any outcome out of it as to what could be the best possible
>>> approach.
>>>
>>> 
>>
>> One solution to this was the rewrite of gmetad in python.  We did this a
>> couple of years ago and added it to the SVN repository.  One of the new
>> features of the python rewrite was the introduction of gmetad plugins.  The
>> plugin interface allows you to plug in a python module where you can do
>> anything you want with the data that is being gathered from the gmond
>> agents.  There are examples of plugins that store data in an RRD database
>> as well as one that generates email alerts.  You should be able to use the
>> RRD database plugin as an example to easily create a plugin that stores the
>> data in a MySQL database instead of or in addition to RRD.
>>
>>   
> Which gmetad is intended to be on the future roadmap?
> 
> For a large site, do you believe it is fair to say that the C 
> implementation is best for performance?
> 
> I was thinking of patching gmetad so that it can get the metrics from a 
> local gmond instance using shared memory rather than XML, and some 
> various other optimizations too
> 

This is yet to be determined.  Basically the Ganglia community needs to decide 
which gmetad it will be using going forward.  The python rewrite adds some new 
functionality which is not available in the C version.  However, if the C 
version continues to be good enough, then the python rewrite may never really 
see the light of day.  On the other hand, if there are more and more requests 
for metric data to be stored or used in different ways outside of just 
trending, then the python rewrite of gmetad will probably be the way to go 
rather than trying to enhance the C version with the same features.  The 
python rewrite of gmetad has obviously not had the same level of testing and 
stabilization as the C version, which puts the python version at a 
disadvantage.  However, with the plugin interface, gmetad and Ganglia could 
really grow up to include not just trending but alerts, health and complex 
data analysis too (or anything else you can dream up for a plugin).  So the 
bottom line is that the future roadmap of gmetad is up to the community and 
whoever wants to step up to make things happen.

Brad


Re: [Ganglia-developers] Storing Ganglia data in MySql.

2009-11-10 Thread Brad Nicholes

>>> On 11/10/2009 at 4:11 AM, in message
, Himanshu Sharma
 wrote:
> Hello all,
> 
> We were looking to store Ganglia data in MySql rather than just an
> RRD. There was a discussion earlier on the same issue -
> http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg02100.html.
> It would be great if there was some reusable code available or if
> there was any outcome out of it as to what could be the best possible
> approach.
> 

One solution to this was the rewrite of gmetad in python.  We did this a couple 
of years ago and added it to the SVN repository.  One of the new features of 
the python rewrite was the introduction of gmetad plugins.  The plugin 
interface allows you to plug in a python module where you can do anything you 
want with the data that is being gathered from the gmond agents.  There are 
examples of plugins that store data in an RRD database as well as one that 
generates email alerts.  You should be able to use the RRD database plugin as 
an example to easily create a plugin that stores the data in a MySQL database 
instead of or in addition to RRD.  

Brad


Re: [Ganglia-developers] Getting started with developing a C++ DSO module

2009-11-03 Thread Brad Nicholes

>>> On 11/2/2009 at 9:00 PM, in message <007e01ca5c3a$2ded5eb0$89c81c...@com>,
"Sylvester Steele"  wrote:
> Hi Folks,
> 
> I want to develop a C++ DSO for ganglia. While I did see a bit of
> documentation for a python based thing, I haven't seen much for a C++ DSO.
> So where should I begin if I want to develop a C++ DSO?
> 
> Also, How do I search the mailing list archive?
> 
> Thanks,
> Sylvester

Since gmond is built on top of the APR (Apache Portable Runtime) and the 
ganglia modules were patterned after Apache modules, building a ganglia module 
in C++ should be very similar to building an Apache module in C++.  You could 
probably start by looking at 
http://marc.info/?l=apache-modules&m=115406581404410&w=2 and there should be a 
lot more information about building Apache modules in C++.  The bottom line is 
that you would need to put extern "C" {} around the module structure.  Keep in 
mind that as you find information about how to build an Apache module in C++, 
Ganglia modules are much simpler than Apache modules, so you wouldn't have to 
worry about the hooks or anything else like that in a ganglia module.
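
A minimal sketch of the extern "C" idea follows.  It uses the usual
__cplusplus guard so that the same file builds as C or as C++.  The
demo_module type and the demo_* names are hypothetical stand-ins, not the real
Ganglia module structure; the real structure and its fields come from the
Ganglia headers.

```c
/* Hypothetical stand-in for the module descriptor that the loader looks up. */
typedef struct {
    const char *module_name;
    int (*init)(void);
    void (*cleanup)(void);
} demo_module;

#ifdef __cplusplus
extern "C" {            /* C linkage: no C++ name mangling on these symbols */
#endif

int demo_init(void);
void demo_cleanup(void);
extern demo_module demo_cpp_module;

#ifdef __cplusplus
}
#endif

int demo_init(void) { return 0; }
void demo_cleanup(void) { }

/* The exported symbol keeps its plain name, so a C loader can find it. */
demo_module demo_cpp_module = { "demo_cpp_module", demo_init, demo_cleanup };
```

The bodies of the callbacks are free to use C++ internally when the file is
compiled as C++.  Only the symbols that the loader resolves by name need C
linkage.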

Brad

--
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers

Re: [Ganglia-developers] release names?

2009-10-27 Thread Brad Nicholes

>>> On 10/27/2009 at 4:23 AM, in message <4ae6ca27.9080...@pocock.com.au>,
Daniel Pocock wrote:

> 
> In the wiki, it says `version numbers are cheap'
> http://sourceforge.net/apps/trac/ganglia/wiki/how_project_works 
> 
> However, the convention of naming the releases puts a little bit more 
> emphasis on the significance of each tag and release.  Skipping a 
> release (e.g. 3.1.3) certainly doesn't give due credit to some of those 
> people the releases are named after.
> 
> To encourage more frequent releases (maybe even every 4-6 weeks?) maybe 
> release names should be dropped, and only the version number used?
> 

+1 release names are nice, but I haven't really found any use for them.  The 
version number is the most important thing.

Brad



Re: [Ganglia-developers] Feeble attempt at gmond aliasing

2009-10-22 Thread Brad Nicholes

Unless I am misunderstanding the issue, a missing configuration option 
shouldn't be a problem for libconfuse.  Follow the 'Title' configuration 
directive on a metric.  Every metric can optionally have a title that is 
ultimately passed up through the XML.  The code in gmond.c asks libconfuse for 
the title when the metric definition is read.  If no title has been given in 
the configuration file, then libconfuse returns NULL when asked for the title.
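
On the calling side, a NULL return can simply be treated as "not configured".
The sketch below shows that pattern without using libconfuse itself:
value_or_default and choose_hostname are hypothetical helpers, and the
`configured` argument stands for whatever the configuration lookup returned.

```c
#include <stdio.h>

/*
 * Hypothetical helper: pick the configured value when the configuration
 * lookup returned one, otherwise fall back to a default. `configured` is
 * whatever the lookup returned, NULL meaning "option not present".
 */
const char *value_or_default(const char *configured, const char *fallback)
{
    return (configured != NULL && configured[0] != '\0') ? configured : fallback;
}

/* Copy the chosen host name into `out`, always NUL-terminating it. */
void choose_hostname(char *out, size_t outlen,
                     const char *alias, const char *system_name)
{
    snprintf(out, outlen, "%s", value_or_default(alias, system_name));
}
```

With this shape an optional "alias" setting would not have to appear in every
gmond.conf: when the lookup yields NULL, the system host name is used instead.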

Brad

>>> On 10/21/2009 at 8:58 AM, in message
, Jesse Becker
 wrote:
> Minor update on this:
> 
> It appears that libconfuse is completely unable to handle
> missing/default values for configuration options[1].  So adding an
> 'alias' option to gmond will mean that every gmond.conf file has to be
> updated to include an "alias=" line.
> 
> The libconfuse documentation is...limited.  Could someone more
> familiar with it than I am offer suggestions as to how to set a
> default value and handle the case where the "alias=" line is not
> present?
> 
> [1] This is a really stupid design decision, IMO. :-(
> 
> On Thu, Oct 1, 2009 at 22:08, Jesse Becker  wrote:
>> Here's my poor attempt at a patch to add aliasing to gmond, in an
>> effort to stimulate some discussion on the topic.  The patch is
>> against trunk.  I've done some basic testing (e.g. no immediate core
>> dumps), but that's it for the moment.
>>
>> Comments?  Improvements?
>>
>> Index: lib/libgmond.c
>> ===
>> --- lib/libgmond.c  (revision 2093)
>> +++ lib/libgmond.c  (working copy)
>> @@ -66,6 +66,7 @@
>>   CFG_BOOL("gexec", 0, CFGF_NONE),
>>   CFG_INT("send_metadata_interval", 0, CFGF_NONE),
>>   CFG_STR("module_dir", NULL, CFGF_NONE),
>> +  CFG_STR("alias",NULL,CFGF_NONE),
>>   CFG_END()
>>  };
>>
>> Index: gmond/gmond.c
>> ===
>> --- gmond/gmond.c   (revision 2093)
>> +++ gmond/gmond.c   (working copy)
>> @@ -301,6 +301,18 @@
>>  }
>>
>>  static void
>> +handle_alias( void ) {
>> +   cfg_t *tmp = cfg_getsec( config_file, "globals");
>> +   char *tmp_myname;
>> +   /* Allow for hostname aliases */
>> +   tmp_myname = cfg_getstr(tmp, "alias");
>> +   if (tmp_myname) {
>> +   strncpy(myname, tmp_myname, APRMAXHOSTLEN);
>> +   debug_msg("Aliasing hostname to [%s]", myname);
>> +   }
>> +}
>> +
>> +static void
>>  daemonize_if_necessary( char *argv[] )
>>  {
>>   int should_daemonize;
>> @@ -2630,6 +2642,8 @@
>>
>>   gmond_argv = argv;
>>
>> +  myname[0] = '\0';
>> +
>>   if (cmdline_parser (argc, argv, &args_info) != 0)
>>   exit(1) ;
>>
>> @@ -2658,6 +2672,7 @@
>> }
>>
>>   process_configuration_file();
>> +  handle_alias();
>>
>>   if(args_info.metrics_flag)
>> {
>> @@ -2686,7 +2701,8 @@
>>   load_metric_modules();
>>
>>   /* Collect my hostname */
>> -  apr_gethostname( myname, APRMAXHOSTLEN+1, global_context);
>> +  if (!*myname)
>> +apr_gethostname( myname, APRMAXHOSTLEN+1, global_context);
>>
>>   apr_signal( SIGPIPE, SIG_IGN );
>>   apr_signal( SIGINT, sig_handler );
>>
>>
>> --
>> Jesse Becker
>>
> 
> 






Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.3 beta ready for testing

2009-10-12 Thread Brad Nicholes

>>> On 10/11/2009 at 10:36 PM, in message <4ad2b254.9090...@pocock.com.au>,
Daniel Pocock wrote:
> Bernard Li wrote:
>> Hi Brad:
>>
>> On Thu, Oct 1, 2009 at 3:57 PM, Brad Nicholes  wrote:
>>
>>   
>>> If this is just a simple fix, then I would vote for scrapping 3.1.3,
>>> rolling 3.1.4 with the fix and resetting the test period.  The other
>>> option, since this isn't a regression, would be to release 3.1.3 as is
>>> with the defect noted in the release notes.  Then release 3.1.4 next month
>>> with the fixes.  I would vote for the first option, but I'm OK with the
>>> second if that is the way everybody else wants to go.
>>> 
>>
>> Since Daniel is the Release Manager on 3.1.3, I'd rather defer this
>> decision to him.  However he's on vacation for another week so perhaps
>> we can hold off on the release until then.
>>   
> 
> Another issue I found: the gmond binary built on RHEL3 can't run 
> properly because APR_POLLSET_THREADSAFE is not supported on that 
> platform.  The fix for this is relatively trivial, we will only use that 
> option for kernel >= 2.6.
> 
> I think it is best to pass over 3.1.3 - although people should still 
> test it and report their results - and aim to release 3.1.4 to beta by 
> the end of this week.  I'm happy to volunteer as release manager for 
> 3.1.4 as well, given that it follows on from the 3.1.3 evaluation process.

Sounds good - your call

Brad



Re: [Ganglia-developers] Feeble attempt at gmond aliasing

2009-10-02 Thread Brad Nicholes

>>> On 10/2/2009 at 6:34 AM, Jesse Becker wrote:
> On Fri, Oct 2, 2009 at 01:43, Rick Cobb  wrote:
>> Well, as far as generating discussion goes, I think we're better off
>> only aliasing/spoofing IP addresses @ the gmond level, and resolving
>> all names with gmetad.   That removes all issues of, e.g., whether the
>> host thinks it should send a FQDN or just a basename, or how well
>> dns / resolv.conf is set up on every machine in every cluster, etc.
>> Only the gmetad servers need to have well-configured resolvers, and
>> there are orders of magnitude fewer of those in many networks.
>> Besides: fewer system calls on the boxes that are doing the real work
>> our clusters are built to do.
> 
> All good points.  Sending only the IP address also potentially could
> make the packets just slightly smaller, as an IPv4 address will fit
> into 32 bits total, instead of one byte per character.  (Of course,
> this nicely avoids the whole IPv6 and wide-char hostname issue.)
> 
> In my (again feeble) defense, there's also nothing stopping anyone
> from setting IP addresses in the "alias=" field.
> 
> There are, it seems, two issues related to this.  The first is many
> people have requested aliasing abilities for gmond for various
> reasons.  The other is a broader shift in what gmond actually reports
> (i.e. sending FQDN or just IP).  Fixing the first issue doesn't
> prevent fixing the 2nd issue; do it in stages.
> 
>> Never did get that patch finished, though, so I probably should stay
>> out of the discussion :-)
> 
> Incorrect!  :-)  Finish your patch, and let's see it.  I'm not deeply
> attached to what I posted.

How well does this fit into the previous discussions of using a GUID to 
identify a box rather than an IP or FQDN?  Are aliasing and GUID identifiers 
related or are they two separate issues?

Brad




Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.3 beta ready for testing

2009-10-01 Thread Brad Nicholes

>>> On 10/1/2009 at 4:33 PM, Bernard Li wrote:
> So has anybody else given 3.1.3 a test run?
> 
> I have found some minor issues.
> 
> It looks like there are new configure options added in regards to
> setuid and setgid:
> 
>   --enable-debug  turn on debugging output and compile options
>   --enable-gexec  turn on gexec support (platform-specific)
>   "--enable-setuid=USER  turn on setuid support (default setuid=nobody)"
>   "--enable-setgid=GROUP  turn on setgid support (default setgid=daemon)"
> 
> There are 2 issues:
> 
> - extra quotation marks in the text
> - --enable-setuid is OFF by default.  This is the opposite behaviour
> from previous released versions
> 
> On top of that, our spec file has not been updated with this new
> configure option and therefore the RPMs I posted do *not* setuid.
> 
> I'm not sure if we should consider this as show stopper, but a simple
> fix would simply be to change the default configure option so that it
> reflects the previous behaviour.
> 
> Please let me know what you guys think.
> 

If this is just a simple fix, then I would vote for scrapping 3.1.3, rolling 
3.1.4 with the fix and resetting the test period.  The other option, since this 
isn't a regression, would be to release 3.1.3 as is with the defect noted in 
the release notes.  Then release 3.1.4 next month with the fixes.  I would vote 
for the first option, but I'm OK with the second if that is the way everybody 
else wants to go.

Brad



Re: [Ganglia-developers] Ganglia 3.1.3 beta ready for testing

2009-09-21 Thread Brad Nicholes

Up and running on my SLES-10.2 test machine.  Everything is looking good so far.

thanks,

Brad

>>> On 9/18/2009 at 11:09 PM, Bernard Li wrote:
> Dear all:
> 
> The Ganglia 3.1.3 beta is now ready for testing at:
> 
> http://ganglia.info/testing 
> 
> Changelog for this release:
> 
> * gmond: Fix the allow_extra_data configuration directive (BUG199)
> * gmond: Ensure that a complete XML dump is delivered before closing
> the send socket. Submitted by: Jerry 
> * gmond: add bind and bind_hostname parameters for udp_send_channel()
> * gmetad: BUG232: eliminate case-sensitive hostname bug, user can
> choose to maintain legacy behavior though
> * gmond: BUG237: revise fix for segfault on Solaris where first CPU
> not in slot 0
> * gmond: support for HUP signal on platforms with execve
> * gmond: delay daemonization until after other initialization steps are done
> * gmond: status module: return gmond version info as string metrics
> * gmond: Check return status of apr_pollset_create.  Use
> APR_POLLSET_THREADSAFE on Linux.
> * build: various configure options: Solaris 8 with Sun Studio 11
> support, extra modules for static linking, default setuid, release
> number, build multicpu and status during static builds, support for
> SYSCONFDIR (BUG16)
> * RPM: include status module, allow packager to supply own gmond.conf
> * build: Look in lib64 rather than lib for apr, confuse and expat on
> x86_64 Linux builds
> * Bug fixes and Enhancements
> 
> Special thanks to Daniel Pocock for volunteering to be the release
> manager and other developers for their hard work in putting this
> release together.
> 
> The window for testing will be two weeks and if no major bugs are
> found this will be released as the official 3.1.3 release.
> 
> Thanks for your continual support, and please let us know if you run
> into any issues!
> 
> Cheers,
> 
> Bernard
> 
> (on behalf of the Ganglia team)
> 





Re: [Ganglia-developers] Preparing for 3.1.3

2009-09-16 Thread Brad Nicholes

>>> On 9/16/2009 at 8:26 AM, in message <4ab0f594.4050...@pocock.com.au>, Daniel
Pocock  wrote:

> 
> As discussed a few weeks back, I'm volunteering to manage the 3.1.3 release.
> 
> Most of the changes were made a few weeks ago now, and I've been running 
> some of these patches on several platforms for some time, so I don't 
> think we need to allow a lot of time for additional testing.  What I'd 
> propose is that 3.1.3 is tagged Friday as a beta, and if it is good, 
> then we make it GA two weeks later.
> 
> If problems are found, then 3.1.3 will not be made GA, and we will aim 
> to gather feedback and release 3.1.4 in 2-3 weeks.  I'm also happy to be 
> the release manager for that follow-up release if it becomes necessary.
> 
> I note that part of the release process involves setting the release 
> name for the next release - is this up to the release manager's 
> initiative, or are there some rules for this?
> 
> 

This sounds like a good plan.  I am backporting one patch from the status file
that fixes the allow_extra_data configuration directive.  I should have it
committed in the next few minutes, and in any case well before the Friday
tag. ;)  As far as the release name goes, I think it is up to the Release
Manager.  In the past the release names have followed an aviation theme, but
I'm not sure if that is a real rule or not.  Bernard is probably the best
person to answer this question.

Brad



Re: [Ganglia-developers] gmetad compiled with -O0

2009-09-16 Thread Brad Nicholes

>>> On 9/16/2009 at 8:03 AM, in message <4ab0f04c.9060...@pocock.com.au>, Daniel
Pocock  wrote:

> 
> I notice in gmetad/Makefile.am that AM_CFLAGS includes -O0
> 
> This is the case for both 3.1 and trunk.
> 
> Is this intended for some reason?  I've looked through the SVN history, 
> I can see that it has always been this way since AM_CFLAGS was added to 
> Makefile.am in r384
> 
> It refuses to build on Solaris 8/Sun Studio 11 with this setting, so 
> I'll be removing it for 3.1.3
> 

If it's not there already, the -O0 should probably be moved to the 
--enable-debug option.

Brad



Re: [Ganglia-developers] bugzilla/roadmap/3.1.3?

2009-08-19 Thread Brad Nicholes

>>> On 8/19/2009 at 8:42 AM, in message <4a8c0f3a.5080...@pocock.com.au>, Daniel
Pocock  wrote:
> Bernard Li wrote:
>> On Tue, Aug 18, 2009 at 8:22 AM, Brad Nicholes wrote:
>>
>>   
>>> I'm not sure that there has been any definitive issue tracker for
>>> releases, at least not for the 3.1.x releases.  The road map going
>>> forward has been left up to the community.  For the 3.1.0, 3.1.1 and
>>> 3.1.2 releases, I volunteered as the release manager with a lot of help
>>> from Bernard.  For me, it was just a matter of recognizing that there
>>> was enough new functionality or bug fixes to warrant a new release.  At
>>> the time it was being driven by the modular metric functionality.  The
>>> 3.1.2 release of Ganglia finished off all of the functionality that I
>>> had in mind, but I'm sure that there is more that could be done in that
>>> area.
>>>
>>> At one point we created a wish list, which was published on the wiki at
>>> http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_wish-list.  I
>>> don't think that these items were ever entered into Bugzilla as
>>> enhancements, and I'm also not sure how accurate the list is anymore.
>>> It should probably be updated.
>>>
>>> With all of the work that you and others have done recently, it might be
>>> a good time to produce a 3.1.3 release.  You might want to take a look
>>> at the Ganglia wiki page
>>> http://sourceforge.net/apps/trac/ganglia/wiki/how_project_works under
>>> the section "Release Manager and Additional Release Information" for an
>>> idea of how it was done in the past.  Anyway, if you think it is time to
>>> release 3.1.3, I'll support that.
>>> 
>>
>> I could help with the release when 3.1.3 is ready.
>>
>>   
> I'll obviously be willing to support those features that I've added, 
> particularly when any release candidate is issued and if any problems arise.
> 
> Don't release just yet though, there's probably one other change I'd 
> like to include for 3.1.3 - eliminating calls to sleep() in the gmetad 
> threads, and making the randomization more reliable for lower intervals 
> (making it a percentage of the interval rather than an absolute value).  
> Any preference for nanosleep or apr_sleep?  I notice that apr doesn't 
> appear to be used from within gmetad/*.c, and I wasn't sure why.
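
A rough sketch of the percentage-based randomization mentioned above, written in Python for brevity rather than the C used in gmetad. The helper name `jittered_sleep_seconds` and the 10 percent default are illustrative assumptions, not taken from the gmetad source:

```python
import random


def jittered_sleep_seconds(interval, jitter_pct=10.0, rng=random):
    """Return a sleep time within +/- jitter_pct percent of interval.

    Hypothetical helper: because the jitter scales with the interval,
    a 2-second poll moves by fractions of a second, while a 60-second
    poll can move by several seconds.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    spread = interval * jitter_pct / 100.0
    return interval + rng.uniform(-spread, spread)
```

With an absolute jitter of, say, a few seconds, a short interval could be swamped by the random component; scaling by the interval avoids that.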

It would be nice to put gmond and gmetad fully on top of APR.  That would make
cross-platform compatibility much easier and probably eliminate the need for
Cygwin on Windows.

BTW, whenever you think that 3.1.3 is ready, just volunteer to be the Release 
Manager and Bernard and I can help you through producing the release 
candidate(s) and final release.  It is actually very simple.

Brad



Re: [Ganglia-developers] bugzilla/roadmap/3.1.3?

2009-08-18 Thread Brad Nicholes

>>> On 8/18/2009 at 5:09 AM, in message <4a8a8bd8.4080...@pocock.com.au>, Daniel
Pocock  wrote:

> 
> Hi,
> 
> I've just been looking at Bugzilla to try and establish what is pending 
> for 3.1.3
> 
> I did a search for items that are blocking, critical or major, 12 items 
> found
> 
> Most of them appear to have been there for a while, despite several 
> other releases.  Some of them are no longer bugs.
> 
> Is some other mechanism being used to track the issues that must be 
> satisfied for the next release, e.g. 3.1.3?
> 

I'm not sure that there has been any definitive issue tracker for releases, at
least not for the 3.1.x releases.  The road map going forward has been left up
to the community.  For the 3.1.0, 3.1.1 and 3.1.2 releases, I volunteered as
the release manager with a lot of help from Bernard.  For me, it was just a
matter of recognizing that there was enough new functionality or bug fixes to
warrant a new release.  At the time it was being driven by the modular metric
functionality.  The 3.1.2 release of Ganglia finished off all of the
functionality that I had in mind, but I'm sure that there is more that could
be done in that area.

At one point we created a wish list, which was published on the wiki at
http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_wish-list.  I don't think
that these items were ever entered into Bugzilla as enhancements, and I'm also
not sure how accurate the list is anymore.  It should probably be updated.

With all of the work that you and others have done recently, it might be a good
time to produce a 3.1.3 release.  You might want to take a look at the Ganglia
wiki page http://sourceforge.net/apps/trac/ganglia/wiki/how_project_works under
the section "Release Manager and Additional Release Information" for an idea of
how it was done in the past.  Anyway, if you think it is time to release 3.1.3,
I'll support that.

Brad 


Re: [Ganglia-developers] Ganglia Gmond Memory

2009-07-30 Thread Brad Nicholes

>>> On 7/30/2009 at  2:30 PM, in message
<669f1ab30907301330s2944e0cxa31c21fea1a5...@mail.gmail.com>, Mahendra Kutare
 wrote: 
> On Thu, Jul 30, 2009 at 12:25 PM, Brad Nicholes wrote:
> 
>> >>> On 7/30/2009 at 9:08 AM, in message
>> <669f1ab30907300808y67c403eev9a1653240c27c...@mail.gmail.com>, Mahendra
>> Kutare wrote:
>> > On Thu, Jul 30, 2009 at 10:31 AM, Brad Nicholes wrote:
>> >
>> >> >>> On 7/29/2009 at 11:23 PM, in message
>> >> <669f1ab30907292223t2734f551lc8d9b98201d7f...@mail.gmail.com>, Mahendra
>> >> Kutare wrote:
>> >> > Hi All,
>> >> >
>> >> > If I have configured gmond.conf with a udp_recv_channel with just a
>> >> > port number will that configure ganglia gmond to listen on that
>> >> > particular port any incoming data and thus making it essentially
>> >> > unicast communication channel ?
>> >> >
>> >>
>> >> Yes, specifying just a port will configure gmond's recv channel in
>> >> unicast mode.
>> >>
>> >> > What happens if the sending side sends data every 1 sec will that be
>> >> > transferred immediately to gmond or it waits to collects some packets
>> >> > of data and then delivers to gmond listening side ?
>> >> >
>> >> > I started sending some data from outside of gmond interface to gmond
>> >> > which is configured as mentioned above to a udp_recv_channel on port
>> >> > 8108.
>> >> >
>> >> > Now even though the sending side is pushing data in every 1sec. I do
>> >> > not see gmond showing in debug mode on the console that its
>> >> > processing Ganglia message from sender side every 1 sec.
>> >> >
>> >> > Is it just the display part of the problem or ganglia does some
>> >> > sophisticated processing of incoming data i.e waiting for a message
>> >> > size before delivering it ?
>> >> >
>> >>
>> >> How did you configure gmond to send data every 1 sec.?  Gmond sends its
>> >> data in collection groups and each collection group is configured with
>> >> a send time threshold.  At worst, the collection group will send all of
>> >> the metric values within that group once the group's send time
>> >> threshold has been exceeded.  In addition, each metric is assigned a
>> >> value threshold which is a percent of change differential.  If the
>> >> differential change of any metric within the collection group exceeds
>> >> its value threshold, the entire group of metrics is sent immediately.
>> >> So even though a collection group is set to collect every 1 second,
>> >> that doesn't mean that the metrics are sent every 1 second.  Also, by
>> >> default the RRD files are configured by gmetad to store metrics at an
>> >> interval of every 15 seconds.  So even if the metrics were sent every
>> >> 1 second, you will still only be seeing 15-second averages in the
>> >> front end.
>> >>
>> >
>> > Thanks Brad.  I am trying to do it to understand the ganglia protocol and
>> > this helps.
>> > Right now its fine with me even if Gmetad sees only 15 seconds average in
>> > frontend as you described.
>> >
>> > So as I see there are other configuration in collection groups such as -
>> >
>> > 1. collect_once and collect_every
>> >
>> > I understand that collect_once with make some collection to be collected
>> > only once and just send it other gmond every time_threshold.
>> > Also, If I am not wrong If  I configured collect_every = 20 and
>> > time_threshold=90, gmond will collect every 20 sec and send every 90 sec
>> > to other gmond.
>> >
>>
>> Under normal circumstances it will send every 90 seconds but if one of the
>> metric value_thresholds has been exceeded, the entire collection group will
>> be sent immediately.  The purpose for this is to make sure that
>> abnormalities or spikes are caught and reported.
>>
>> > Now the part I am not clear is if I am collecting more frequently than I
>> > am sending does that mean we are keeping more in memory ? I mean say
>> > after first
Re: [Ganglia-developers] Ganglia Gmond

2009-07-30 Thread Brad Nicholes

>>> On 7/30/2009 at 9:08 AM, in message
<669f1ab30907300808y67c403eev9a1653240c27c...@mail.gmail.com>, Mahendra Kutare
 wrote:
> On Thu, Jul 30, 2009 at 10:31 AM, Brad Nicholes wrote:
> 
>> >>> On 7/29/2009 at 11:23 PM, in message
>> <669f1ab30907292223t2734f551lc8d9b98201d7f...@mail.gmail.com>, Mahendra
>> Kutare wrote:
>> > Hi All,
>> >
>> > If I have configured gmond.conf with a udp_recv_channel with just a port
>> > number will that configure ganglia gmond to listen on that particular
>> > port any incoming data and thus making it essentially unicast
>> > communication channel ?
>> >
>>
>> Yes, specifying just a port will configure gmond's recv channel in unicast
>> mode.
>>
>> > What happens if the sending side sends data every 1 sec will that be
>> > transferred immediately to gmond or it waits to collects some packets of
>> > data and then delivers to gmond listening side ?
>> >
>> > I started sending some data from outside of gmond interface to gmond
>> > which is configured as mentioned above to a udp_recv_channel on port
>> > 8108.
>> >
>> > Now even though the sending side is pushing data in every 1sec. I do not
>> > see gmond showing in debug mode on the console that its processing
>> > Ganglia message from sender side every 1 sec.
>> >
>> > Is it just the display part of the problem or ganglia does some
>> > sophisticated processing of incoming data i.e waiting for a message size
>> > before delivering it ?
>> >
>>
>> How did you configure gmond to send data every 1 sec.?  Gmond sends its
>> data in collection groups and each collection group is configured with a
>> send time threshold.  At worst, the collection group will send all of the
>> metric values within that group once the group's send time threshold has
>> been exceeded.  In addition, each metric is assigned a value threshold
>> which is a percent of change differential.  If the differential change of
>> any metric within the collection group exceeds its value threshold, the
>> entire group of metrics is sent immediately.  So even though a collection
>> group is set to collect every 1 second, that doesn't mean that the metrics
>> are sent every 1 second.  Also, by default the RRD files are configured by
>> gmetad to store metrics at an interval of every 15 seconds.  So even if the
>> metrics were sent every 1 second, you will still only be seeing 15-second
>> averages in the front end.
>>
> 
> Thanks Brad.  I am trying to do it to understand the ganglia protocol and
> this helps.
> Right now its fine with me even if Gmetad sees only 15 seconds average in
> frontend as you described.
> 
> So as I see there are other configuration in collection groups such as -
> 
> 1. collect_once and collect_every
> 
> I understand that collect_once with make some collection to be collected
> only once and just send it other gmond every time_threshold.
> Also, If I am not wrong If  I configured collect_every = 20 and
> time_threshold=90, gmond will collect every 20 sec and send every 90 sec to
> other gmond.
> 

Under normal circumstances it will send every 90 seconds but if one of the 
metric value_thresholds has been exceeded, the entire collection group will be 
sent immediately.  The purpose for this is to make sure that abnormalities or 
spikes are caught and reported.

> Now the part I am not clear is if I am collecting more frequently than I am
> sending does that mean we are keeping more in memory ? I mean say after
> first occurrence of collect in 20 sec if I am not sending it across to gmonds
> am I just keeping it in memory hash ? If not, what's the behaviour ?
> 

No, if you are collecting every 20 seconds but the collection group is only
sending every 90 seconds, the only value that is sent or reported is the last
one collected within the 90-second interval.  This is the purpose of the
metric value_threshold.  If, for example, you collected a metric 4 times within
a 90-second period and the collected values only varied by 5 percent, storing
and reporting each of them would just end up being noise on the wire because
the percent of change between the values is insignificant.  So just sending
the last value collected in this case is good enough.  However, if the metric
saw a spike within the 90-second period but then immediately dropped back to
normal, you want to make sure that the spike is sent and recorded, so gmond
sends it immediately.
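
The behaviour described above can be sketched as a small simulation. This is Python, not gmond's actual C code; the function name `simulate_sends` is hypothetical, and it assumes (following the description in this thread) that value_threshold is a percent change measured against the last value sent and that every send resets the time_threshold timer:

```python
def simulate_sends(samples, time_threshold, value_threshold_pct):
    """Return the (time, value) pairs that would be sent.

    samples: list of (time_in_seconds, value) in collection order.
    A sample is sent when time_threshold seconds have passed since the
    last send, or when it differs from the last sent value by more than
    value_threshold_pct percent. Unsent samples are simply superseded.
    """
    sent = []
    last_time = None
    last_value = None
    for now, value in samples:
        if last_time is None:
            due = True  # nothing sent yet, so report the first sample
        else:
            timed_out = now - last_time >= time_threshold
            base = abs(last_value) if last_value != 0 else 1.0
            change_pct = abs(value - last_value) / base * 100.0
            due = timed_out or change_pct > value_threshold_pct
        if due:
            sent.append((now, value))
            last_time, last_value = now, value
    return sent


if __name__ == "__main__":
    # collect_every = 20, time_threshold = 90, 10 percent value threshold
    steady = [(0, 100), (20, 101), (40, 102), (60, 101), (80, 103), (100, 102)]
    print(simulate_sends(steady, 90, 10.0))  # [(0, 100), (100, 102)]
    spike = [(0, 100), (20, 101), (40, 180), (60, 100), (80, 101)]
    print(simulate_sends(spike, 90, 10.0))  # [(0, 100), (40, 180), (60, 100)]
```

In the steady case the intermediate values are never sent and nothing is queued up; in the spike case both the jump and the return to normal are reported right away.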

> 2. What does this configuration  *cleanup_threshold* = 300 /*secs *

Re: [Ganglia-developers] Ganglia Gmond

2009-07-30 Thread Brad Nicholes

>>> On 7/29/2009 at 11:23 PM, in message
<669f1ab30907292223t2734f551lc8d9b98201d7f...@mail.gmail.com>, Mahendra Kutare
 wrote:
> Hi All,
> 
> If I have configured gmond.conf with a udp_recv_channel with just a port
> number will that configure ganglia gmond to listen on that particular port
> any incoming data and thus making it essentially unicast communication
> channel ?
> 

Yes, specifying just a port will configure gmond's recv channel in unicast mode.

> What happens if the sending side sends data every 1 sec will that be
> transferred immediately to gmond or it waits to collects some packets of
> data and then delivers to gmond listening side ?
> 
> I started sending some data from outside of gmond interface to gmond which
> is configured as mentioned above to a udp_recv_channel on port 8108.
> 
> Now even though the sending side is pushing data in every 1sec. I do not see
> gmond showing in debug mode on the console that its processing Ganglia
> message from sender side every 1 sec.
> 
> Is it just the display part of the problem or ganglia does some
> sophisticated processing of incoming data i.e waiting for a message size
> before delivering it ?
> 

How did you configure gmond to send data every 1 sec.?  Gmond sends its data in
collection groups and each collection group is configured with a send time
threshold.  At worst, the collection group will send all of the metric values
within that group once the group's send time threshold has been exceeded.  In
addition, each metric is assigned a value threshold which is a percent of
change differential.  If the differential change of any metric within the
collection group exceeds its value threshold, the entire group of metrics is
sent immediately.  So even though a collection group is set to collect every
1 second, that doesn't mean that the metrics are sent every 1 second.  Also, by
default the RRD files are configured by gmetad to store metrics at an interval
of every 15 seconds.  So even if the metrics were sent every 1 second, you
will still only be seeing 15-second averages in the front end.
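
To illustrate the last point, here is a simplified Python sketch of how 1-second samples collapse into 15-second averages. The helper `bucket_averages` is hypothetical and only approximates what an RRD does (real RRD consolidation also deals with heartbeats and unknown values):

```python
def bucket_averages(samples, step=15):
    """Average (time, value) samples into fixed step-second buckets.

    Returns a list of (bucket_start, mean) sorted by bucket_start.
    """
    buckets = {}
    for t, value in samples:
        start = (int(t) // step) * step
        buckets.setdefault(start, []).append(value)
    return [(start, sum(vals) / len(vals))
            for start, vals in sorted(buckets.items())]


if __name__ == "__main__":
    # 30 one-second samples at 10.0, with a single spike to 160.0 at t=7
    samples = [(t, 10.0) for t in range(30)]
    samples[7] = (7, 160.0)
    print(bucket_averages(samples))  # [(0, 20.0), (15, 10.0)]
```

The one-second spike to 160.0 shows up only as a 15-second average of 20.0, which is why sending more often than the storage step does not add detail in the front end.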

Brad


Re: [Ganglia-developers] Code pointing to metrics collection->call backhandler->pushing to channel

2009-07-29 Thread Brad Nicholes

>>> On 7/29/2009 at 2:17 PM, in message
<669f1ab30907291317h4162d8c5m3dfd008d4187b...@mail.gmail.com>, Mahendra Kutare
 wrote:
> Hi,
> 
> Can someone point me to the ganglia code files where the core metrics
> collection happens , which then initiates call to call back handler, stores
> the data in memory hash or sends it to the wire to say gmetad ?
> 
> Thanks
> Mahendra

All of that is done in the main gmond.c file.

Brad



Re: [Ganglia-developers] mod_gstatus disabled?

2009-07-28 Thread Brad Nicholes

>>> On 7/28/2009 at 9:32 AM, in message <4a6f1a0a.7070...@pocock.com.au>, Daniel
Pocock  wrote:
> Brad Nicholes wrote:
>>>>> On 7/28/2009 at 7:27 AM, in message <4a6efcac.4060...@pocock.com.au>,
>> Daniel Pocock wrote:
>>
>>   
>>> I noticed a few things about mod_gstatus:
>>>
>>> - the spec file doesn't include it at all, and deliberately removes the 
>>> config file for it
>>>
>>> - gmond/modules/Makefile.am excludes it from static builds
>>>
>>> Given that Ganglia is modular, is there a good reason for not having 
>>> this module in the RPM along with all the other modules?
>>>
>>> I successfully compiled it on Cygwin (static build), so is there also a 
>>> reason for not having it on static builds, or in other words, does 
>>> anyone object if I tweak Makefile.am so it will be in the static build 
>>> from now on?
>>>
>>> Also, I'm adding some extra metrics to mod_gstatus - for instance, a 
>>> string metric with the Ganglia version - does this seem like the best 
>>> place to add this?
>>>
>>>
>>> 
>>
>> The only reason for removing it from the RPM and static builds is that it
>> is probably of limited use to the general user.  When I wrote mod_gstatus
>> it was mainly for debugging purposes.  I needed something that would
>> monitor the XDR packets that were being sent between the gmond nodes, and
>> using ganglia to monitor itself seemed like the most obvious idea.  If the
>> community thinks that mod_gstatus would be generally useful, I don't have
>> a problem with including it as a standard module.
>>   
> My only concern about enabling it by default is the config file - all 
> the metrics should probably be commented out, and people can uncomment 
> them if needed.
> 
> It is probably quite useful for a UDP collector that is under heavy 
> load, and I also think it is a good place to put things like a string 
> metric reporting the package version.  That's probably something that 
> could be enabled by default on any node.
> 
> 

I'm OK with that as long as the community thinks it is useful.  I just didn't 
want to load up extra modules and use up more memory if the metrics don't mean 
anything to the end user.



Re: [Ganglia-developers] mod_gstatus disabled?

2009-07-28 Thread Brad Nicholes

>>> On 7/28/2009 at 7:27 AM, in message <4a6efcac.4060...@pocock.com.au>, Daniel
Pocock  wrote:

> 
> 
> I noticed a few things about mod_gstatus:
> 
> - the spec file doesn't include it at all, and deliberately removes the 
> config file for it
> 
> - gmond/modules/Makefile.am excludes it from static builds
> 
> Given that Ganglia is modular, is there a good reason for not having 
> this module in the RPM along with all the other modules?
> 
> I successfully compiled it on Cygwin (static build), so is there also a 
> reason for not having it on static builds, or in other words, does 
> anyone object if I tweak Makefile.am so it will be in the static build 
> from now on?
> 
> Also, I'm adding some extra metrics to mod_gstatus - for instance, a 
> string metric with the Ganglia version - does this seem like the best 
> place to add this?
> 
> 

The only reason for removing it from the RPM and static builds is basically due
to its doubtful usefulness to the general user.  When I wrote mod_gstatus it was
mainly for debugging purposes.  I needed something that would monitor the XDR
packets that were being sent between the gmond nodes, and using ganglia to
monitor itself seemed like the most obvious idea.  If the community thinks that
mod_gstatus would be generally useful, I don't have a problem with including it
as a standard module.

Brad



Re: [Ganglia-developers] backports

2009-07-28 Thread Brad Nicholes

>>> On 7/28/2009 at 7:03 AM, in message <4a6ef728.2080...@pocock.com.au>, Daniel
Pocock  wrote:

> 
> 
> Is it preferred to raise backport proposals for 3.1 all in a single 
> email, or start a separate thread for each?
> 
> I've just fixed bug 237, this is an essential backport I believe, as it 
> fixes a seg fault/coding error. (trunk r2006)
> 
> My fix for bug 232 is also backwards compatible and safe to backport 
> now, thanks to a configuration option that allows people to decide when 
> they want to adopt the new behavior.  I believe that backporting this to 
> 3.1 provides people with the opportunity to migrate to lowercase 
> hostname directories independently of when they migrate to 3.2 or 
> trunk.  (trunk r2004 and r2005 contain this fix)
> 

IMO, starting separate threads would be preferable.  That way if we have to go 
back into the email archives to review any discussion, it will be easier to 
find.  Also, it is a good idea to put the backport into the STATUS file for the 
3.1 branch so that we can track what has been backported.  For an explanation 
of how to use the STATUS file, please see 
http://ganglia.wiki.sourceforge.net/how_project_works .  Describing and noting 
the backport in the STATUS file just makes it easier for us to compile a list 
of what changed when we do the next release.  Make sure that you follow the 
backporting guidelines that are posted on the wiki with the most important 
guideline being "don't break backward compatibility" :)  

Brad


Re: [Ganglia-developers] Replacing core metrics with Python metricmodules

2009-07-22 Thread Brad Nicholes

>>> On 7/22/2009 at 3:02 PM, in message
<20090722210201.gm14...@alcatraz.americas.sgi.com>, Martin Hicks 
wrote:

> I have a situation where there is already a mechanism that is collecting
> metrics on a compute host in a cluster (Performance Co-Pilot) and
> pushing them up to the head node.
> 
> I was wondering if is possible to write a Python metric module that
> could replace the core set of metrics that gmond usually collects on the
> compute node, and instead grab the data from PCP that is running on the
> head node.
> 
> Are there any real differences between the metrics that are normally
> collected by gmond, and those user-defined metrics collected by a Python
> module?
> 
> The goal is to not have to double collect these metrics on each compute
> host.
> 

There isn't any difference other than the fact that the core metrics are
implemented in C-based modules rather than Python.  If you wanted to replace
some of the core metrics, you could do it by simply not loading the metric
module that implements the set of core metrics that you want to replace.  Then
reimplement the core metrics with the same metric names and definitions but as
Python modules.  You can tell which metric module loads which metrics by
starting gmond with the -m parameter.  With the -m parameter, gmond will list
all of the metrics that it will collect along with the module that implements
the collection.
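
As a rough illustration of the replacement described above, here is a minimal
sketch of a Python metric module that publishes a metric under a core metric
name.  The descriptor keys follow the gmond Python module interface as I
understand it; read_from_pcp() and its sample data are hypothetical stand-ins
for however the value would really be fetched, and the time_max, units and
format values are assumptions that should be copied from the real core metric
definition.

```python
# Sketch of a gmond Python metric module that re-publishes a core metric name.
# read_from_pcp() is a hypothetical stand-in, not a real PCP API call.


def read_from_pcp(name):
    """Hypothetical helper: return the latest value of `name` from the external collector."""
    sample = {"load_one": 0.42}  # placeholder data, not a real query
    return sample.get(name, 0.0)


def metric_handler(name):
    """Callback that gmond invokes with the metric name on each collection."""
    return float(read_from_pcp(name))


def metric_init(params):
    """Called once when gmond loads the module; returns the metric descriptors."""
    return [{
        "name": "load_one",            # same name the C module would have used
        "call_back": metric_handler,
        "time_max": 70,                # assumed value
        "value_type": "float",
        "units": " ",                  # assumed value
        "slope": "both",
        "format": "%.2f",              # assumed value
        "description": "One minute load average",
        "groups": "load",
    }]


def metric_cleanup():
    """Called when gmond shuts down; nothing to release in this sketch."""
    pass


if __name__ == "__main__":
    # Stand-alone check outside gmond: print each metric once.
    for d in metric_init({}):
        print(d["name"], d["call_back"](d["name"]))
```

The C module that normally provides the metric would have to be left out of
the configuration, otherwise two modules would define the same metric name.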

Brad


Re: [Ganglia-developers] fix for bug 232

2009-07-16 Thread Brad Nicholes

>>> On 7/16/2009 at 9:30 AM, in message <4a5f47a1.3020...@pocock.com.au>, Daniel
Pocock  wrote:
> Brad Nicholes wrote:
>>>>> On 7/16/2009 at 9:10 AM, in message <4a5f42d8.9060...@pocock.com.au>,
>>>>> Daniel Pocock wrote:
>>> Brad Nicholes wrote:
>>>>>>> On 7/16/2009 at 8:07 AM, in message <4a5f3430.20...@pocock.com.au>,
>>>>>>> Daniel Pocock wrote:
>>>>>
>>>>> I tried to attach this solution to the bug report, but I get this error:
>>>>>
>>>>> You did not enter a valid attachment number.
>>>>>
>>>>>
>>>>> Anyhow, this is a solution for bug 232:
>>>>>
>>>>> http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=232 
>>>>>
>>>>> As a consequence of applying this patch:
>>>>> - whenever an RRD is created/updated, the hostname directory name will 
>>>>> be converted to lowercase
>>>>> - any capitalization can be used with the `h' parameter to the web 
>>>>> interface
>>>>> - whenever gmetad receives a hostname in the XML, it will use a 
>>>>> non-case-sensitive comparison to decide if it already has data for that 
>>>>> host
>>>>> - the XML emitted by gmetad will show the capitalization that was 
>>>>> received in the XML, not the lowercase version
>>>>>
>>>>> Anyone applying this patch needs to rename all their hostname 
>>>>> directories to lowercase.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Daniel
>>>>> 
>>>>> 
>>>> This patch seems reasonable to me.  The only part that bothers me is the
>>>> fact that an upgrade from a previous version might break existing installs
>>>> unless they rename all of their rrd directories.  That could be a problem
>>>> for some users that have a large number of monitored boxes.
>>>>
>>> Maybe we just make it a part of trunk and the 3.2 release?  People 
>>> (should) look more closely at the readme file when going from 3.1 to 3.2.
>>>
>>> 
>>
>> I haven't actually tested the patch yet, but I'm OK with just putting it in
>> trunk and not backporting it to 3.1.  There was a big change between 3.0 and
>> 3.1.  I would expect that there would be some incompatible changes between
>> 3.1 and 3.2 as well.  I also think that when 3.2 is released, we should also
>> have a helper script like hawson suggested in our /contrib repository.  We
>> just need to make sure that this is doc'ed somewhere so that when 3.2 is
>> released, we can include the doc in the upgrade notes.
>>
> One way to make it non-disruptive for 3.1 would be making this new 
> behavior configurable (as I suggested in the bug) - is it worth the 
> extra effort of adding a config option for this, or is 3.2 intended to 
> be released in the near future?
> 

There isn't a planned date for a 3.2 release so far.  I'm not sure we have any 
new functionality that is significant enough to call for a 3.2 release yet.


> Here's something that can be used as the basis for the helper script 
> and/or the %post section of the spec file:
> 
> killall gmetad
> cd "$RRDROOT" || exit 1
> # Host directories sit two levels down: ./<cluster>/<host>
> find . -mindepth 2 -maxdepth 2 -type d -name '*[A-Z]*' \
>     ! -name __SummaryInfo__ | while read -r REPLY
> do
>   OLD_NAME=`echo "$REPLY" | cut -f3 -d/`
>   NEW_NAME=`echo "$OLD_NAME" | tr 'A-Z' 'a-z'`
>   CLUSTER_NAME=`echo "$REPLY" | cut -f2 -d/`
>   # Dry run: print the rename; uncomment the mv below to apply it
>   echo mv "$REPLY" "${CLUSTER_NAME}/${NEW_NAME}"
>   #mv "$REPLY" "${CLUSTER_NAME}/${NEW_NAME}"
> done

Sounds good, add it to the patch.  :)
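
For anyone who would rather not depend on find/cut/tr, the same dry run can be
sketched in Python.  This is only an illustration under assumptions, not part
of the patch: the function names are made up here, and it assumes the layout
the shell version relies on, with host directories two levels below the RRD
root and __SummaryInfo__ left alone.

```python
import os


def plan_renames(rrd_root):
    """Return (old_path, new_path) pairs for host dirs whose names contain uppercase."""
    plan = []
    for cluster in sorted(os.listdir(rrd_root)):
        cluster_dir = os.path.join(rrd_root, cluster)
        if not os.path.isdir(cluster_dir):
            continue
        for host in sorted(os.listdir(cluster_dir)):
            old_path = os.path.join(cluster_dir, host)
            # Skip summary data and plain files; only host directories matter.
            if host == "__SummaryInfo__" or not os.path.isdir(old_path):
                continue
            if host != host.lower():
                plan.append((old_path, os.path.join(cluster_dir, host.lower())))
    return plan


def apply_renames(plan, dry_run=True):
    """Print each rename; only touch the filesystem when dry_run is False."""
    for old_path, new_path in plan:
        print("mv %s %s" % (old_path, new_path))
        if dry_run:
            continue
        if os.path.exists(new_path):
            print("skipping, target exists: %s" % new_path)
            continue
        os.rename(old_path, new_path)


if __name__ == "__main__":
    # RRDROOT mirrors the variable used by the shell version; default is the cwd.
    apply_renames(plan_renames(os.environ.get("RRDROOT", ".")), dry_run=True)
```

As with the shell version, gmetad should be stopped before anything is
actually renamed, and the default here only prints what it would do.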

Brad



Re: [Ganglia-developers] fix for bug 232

2009-07-16 Thread Brad Nicholes

>>> On 7/16/2009 at 9:10 AM, in message <4a5f42d8.9060...@pocock.com.au>, Daniel
Pocock  wrote:
> Brad Nicholes wrote:
>>>>> On 7/16/2009 at 8:07 AM, in message <4a5f3430.20...@pocock.com.au>,
>>>>> Daniel Pocock wrote:
>>>
>>> I tried to attach this solution to the bug report, but I get this error:
>>>
>>> You did not enter a valid attachment number.
>>>
>>>
>>> Anyhow, this is a solution for bug 232:
>>>
>>> http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=232 
>>>
>>> As a consequence of applying this patch:
>>> - whenever an RRD is created/updated, the hostname directory name will 
>>> be converted to lowercase
>>> - any capitalization can be used with the `h' parameter to the web interface
>>> - whenever gmetad receives a hostname in the XML, it will use a 
>>> non-case-sensitive comparison to decide if it already has data for that host
>>> - the XML emitted by gmetad will show the capitalization that was 
>>> received in the XML, not the lowercase version
>>>
>>> Anyone applying this patch needs to rename all their hostname 
>>> directories to lowercase.
>>>
>>> Regards,
>>>
>>> Daniel
>>> 
>>
>> This patch seems reasonable to me.  The only part that bothers me is the
>> fact that an upgrade from a previous version might break existing installs
>> unless they rename all of their rrd directories.  That could be a problem for
>> some users that have a large number of monitored boxes.
>>
> Maybe we just make it a part of trunk and the 3.2 release?  People 
> (should) look more closely at the readme file when going from 3.1 to 3.2.
> 

I haven't actually tested the patch yet, but I'm OK with just putting it in 
trunk and not backporting it to 3.1.  There was a big change between 3.0 and 
3.1.  I would expect that there would be some incompatible changes between 3.1 
and 3.2 as well.  I also think that when 3.2 is released, we should also have a 
helper script like hawson suggested in our /contrib repository.  We just need 
to make sure that this is doc'ed somewhere so that when 3.2 is released, we can 
include the doc in the upgrade notes.


> Or maybe someone can suggest a clever way to handle existing installs?  
> However it is done, it would involve scanning all the cluster 
> directories, I'm not sure I would want gmetad to do that every time it 
> starts.
> 
>> BTW, I attached the patch file to the bug
>>
>>   
> I tried to do it from iceweasel (Firefox on Debian lenny) - any idea why 
> it worked for you and not me?
> 
> Incidentally, I often find that when I'm in the bug tracking system, I 
> have to re-enter my password each time I submit a bug.

I'm not sure why you would have had a problem attaching the patch file.  Bernard 
knows a lot more about our Bugzilla system than I do.  Maybe he has some idea.

Brad



Re: [Ganglia-developers] fix for bug 232

2009-07-16 Thread Brad Nicholes

>>> On 7/16/2009 at 8:07 AM, in message <4a5f3430.20...@pocock.com.au>, Daniel
Pocock  wrote:

> 
> I tried to attach this solution to the bug report, but I get this error:
> 
> You did not enter a valid attachment number.
> 
> 
> Anyhow, this is a solution for bug 232:
> 
> http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=232 
> 
> As a consequence of applying this patch:
> - whenever an RRD is created/updated, the hostname directory name will 
> be converted to lowercase
> - any capitalization can be used with the `h' parameter to the web interface
> - whenever gmetad receives a hostname in the XML, it will use a 
> non-case-sensitive comparison to decide if it already has data for that host
> - the XML emitted by gmetad will show the capitalization that was 
> received in the XML, not the lowercase version
> 
> Anyone applying this patch needs to rename all their hostname 
> directories to lowercase.
> 
> Regards,
> 
> Daniel

This patch seems reasonable to me.  The only part that bothers me is the fact 
that an upgrade from a previous version might break existing installs unless 
they rename all of their rrd directories.  That could be a problem for some 
users that have a large number of monitored boxes.

BTW, I attached the patch file to the bug

Brad



Re: [Ganglia-developers] [Ganglia-svn] SF.net SVN: ganglia:[2001]tags/monitor-core-3.1.2/STATUS

2009-07-14 Thread Brad Nicholes

>>> On 7/9/2009 at 9:21 AM, in message
, 
wrote:
> Revision: 2001
>   http://ganglia.svn.sourceforge.net/ganglia/?rev=2001&view=rev 
> Author:   hawson
> Date: 2009-07-09 15:21:20 +0000 (Thu, 09 Jul 2009)
> 
> Log Message:
> ---
> Add backport proposal for r2000
> 
> Modified Paths:
> --
> tags/monitor-core-3.1.2/STATUS
> 
Hawson,
This shouldn't be committed to the STATUS file in the tag; it should be 
committed to the STATUS file in the 3.1 branch.  The tag should reflect the 
source code of the 3.1.2 release (in this case).  

BTW, I am +1 for this backport as well.

Brad

> Modified: tags/monitor-core-3.1.2/STATUS
> ===
> --- tags/monitor-core-3.1.2/STATUS	2009-07-09 15:12:35 UTC (rev 2000)
> +++ tags/monitor-core-3.1.2/STATUS	2009-07-09 15:21:20 UTC (rev 2001)
> @@ -162,6 +162,12 @@
>  +1: bernardli, carenas
>  carenas: patched generated files; cleanup to follow
>  
> +
> +  * Force null termination in string metrics for python modules.
> +http://ganglia.svn.sourceforge.net/viewvc/ganglia?view=rev&revision=2000 
> ++1: hawson
> +
> +
>  BACKPORT PROPOSALS FUTURE VERSION:
>These proposals are too unstabilizing for the current version, incomplete 
> or
>haven't been tested enough to be considered for a stable release.
> 
> 

Re: [Ganglia-developers] Patch for multithread gmond

2009-07-13 Thread Brad Nicholes

>>> On 7/13/2009 at 8:17 AM, in message
, utopia zh
 wrote:
> Hi,
> 
> While trying to use gmond to monitor our applications, we found some issues:
> - Metric collection may take a long time to finish; examples
> include collecting master/slave status from LDAP and parsing web pages to
> get statistics.
> - Receiving metric updates from other nodes may fail because metric
> collection blocks.
> 
> To avoid this blocking behavior, we tried to make gmond multithreaded:
> - Main thread to collect metrics; we also changed the metric collection
> python script to be non-blocking
> - Dedicated thread to receive updates from other nodes.
> - Dedicated thread to serve clients (gmetad/telnet, etc).
> 
> Could you help review the patch? We'd like to contribute the patch to
> the community if multithreading is a generally required feature.
> 
> Any comments will be appreciated. Thanks.
> 
> p.s. Just out of curiosity, why was ganglia changed from multithreaded
> to single-threaded between 2.5.7 and 3.0?
> 
> 
> Cheers,
> Hang

I haven't reviewed the patch yet, but in most cases this issue can be solved 
within the python module itself.  The tcpconn.py module is an example of this.  
Basically what it does is create its own thread that simply gathers the metrics 
on some module-defined interval.  This thread updates a metrics cache within 
the module itself.  Then whenever gmond queries the module for its metrics, the 
metric values are simply read from the module's metric cache rather than 
actually performing the gathering process on the main gmond thread.  This 
method ensures that nothing blocks the main gmond thread when querying data 
from any module that might take a long time to acquire the metric value from 
the metric source.

I haven't looked that closely at the 2.5.x code, so I have no idea what level 
of threading it may have supported or why it might have been changed in 3.0.  
If multithreading gmond 3.1.x would increase the performance of gmond, then I 
am sure this patch is something we should add to the code base.

thanks,
Brad


Re: [Ganglia-developers] Disk IO as gmond core metric

2009-07-10 Thread Brad Nicholes

>>> On 7/10/2009 at 2:41 PM, in message 
>>> <20090710204117.gl10...@pi941c2n1.ms.com>,
JB Kim  wrote:
> On 07/10/09 14:07:14, Brad Nicholes wrote:
>> >>> On 7/9/2009 at  5:43 PM, in message
>> <8121824c0907091643od6832c5y3c4ffa37696e4...@mail.gmail.com>, JB Kim
>>  wrote: 
>> > Ok I've isolated iostat code into its own module and managed to get
>> > the whole autoconf/automake work.
>> > 
>> > http://www.remnantone.com/pkgs/ganglia/modiostat.tar.gz 
>> > 
>> > Provided that ganglia 3.1.x is already installed, it should just be a
>> > matter of running ./configure & make.
>> > 
>> > Also, here's my attempt at making the independent DSO build/deployment 
>> > template.
>> > Within autoconf file, I've retained much of the standard checks that
>> > ganglia does. In addition, it will
>> > check for existing ganglia installation (libganglia.so and headers)
>> > along with other necessary libs such as apr,confuse,expat.
>> > I've also provided setup.sh script in the template to make it simpler
>> > to deploy. Hopefully it will be useful for folks
>> > trying to write their own DSO modules.
>> > 
>> > http://www.remnantone.com/pkgs/ganglia/dso_template.tar.gz 
>> > 
>> > Here's the readme doc on how to use this build template:
>> > 
>> > http://www.remnantone.com/pkgs/ganglia/README_DSO 
>> > 
>> > Let me know how it works out..
>> > 
>> 
>> I downloaded the tarball and gave it a try.  Everything built and loaded as
>> expected but I am not seeing any metrics.  I'm not exactly sure why.  I
>> tested this on a SLED 10.2 box so I don't know if that has anything to do
>> with it.  Others may want to download the tarball and give it a try.
> 
> Hurm... I tested this on ubuntu 8.10 (or something).
> I have #ifdef LINUX on that module, so it would compile on platforms that 
> don't have that config macro defined, but report just zeros...
> 

That is what I am seeing, just zeros.  Like I say, I haven't spent the time to 
try to debug it.  I was more concerned with making sure that it built and 
installed.

Brad



Re: [Ganglia-developers] Disk IO as gmond core metric

2009-07-10 Thread Brad Nicholes

>>> On 7/10/2009 at 2:23 PM, in message
, Jesse Becker
 wrote:
> On Fri, Jul 10, 2009 at 14:07, Brad Nicholes wrote:
> 
>> Anyway everything else worked just fine.  I have a couple of suggestions.
>> You should probably include a sample .conf file so that the user doesn't
>> have to go figure everything out like all of the module information.  I have
>> attached the one that I used.  This way they can just drop the .conf file in
>> the conf.d directory and everything just works.  You also might want to
>> update the COPYING and INSTALL files to reflect the current state of things.
>> The INSTALL file should contain information about building and installing
>> the module.
> 
> Would it be possible, in the future--I know you can't do this now--to
> allow for module configuration directly in gmond.conf?
> 

You can already do module configuration directly in gmond.conf.  That is where 
the base metric modules are still being configured.  All you do is put the 
configuration in gmond.conf rather than in a separate .conf file.  Using a 
separate .conf file just means that you can copy the .conf file to a 
conf.d directory and then restart gmond without having to edit anything.  Then 
removing the module is simply a matter of deleting or renaming the 
corresponding .conf file and then restarting gmond.

Brad




Re: [Ganglia-developers] Disk IO as gmond core metric

2009-07-10 Thread Brad Nicholes

>>> On 7/9/2009 at  5:43 PM, in message
<8121824c0907091643od6832c5y3c4ffa37696e4...@mail.gmail.com>, JB Kim
 wrote: 
> Ok I've isolated iostat code into its own module and managed to get
> the whole autoconf/automake work.
> 
> http://www.remnantone.com/pkgs/ganglia/modiostat.tar.gz
> 
> Provided that ganglia 3.1.x is already installed, it should just be a
> matter of running ./configure & make.
> 
> Also, here's my attempt at making the independent DSO build/deployment 
> template.
> Within autoconf file, I've retained much of the standard checks that
> ganglia does. In addition, it will
> check for existing ganglia installation (libganglia.so and headers)
> along with other necessary libs such as apr,confuse,expat.
> I've also provided setup.sh script in the template to make it simpler
> to deploy. Hopefully it will be useful for folks
> trying to write their own DSO modules.
> 
> http://www.remnantone.com/pkgs/ganglia/dso_template.tar.gz
> 
> Here's the readme doc on how to use this build template:
> 
> http://www.remnantone.com/pkgs/ganglia/README_DSO
> 
> Let me know how it works out..
> 

I downloaded the tarball and gave it a try.  Everything built and loaded as 
expected but I am not seeing any metrics.  I'm not exactly sure why.  I tested 
this on a SLED 10.2 box so I don't know if that has anything to do with it.  
Others may want to download the tarball and give it a try.

Anyway everything else worked just fine.  I have a couple of suggestions.  You 
should probably include a sample .conf file so that the user doesn't have to go 
figure everything out like all of the module information.  I have attached the 
one that I used.  This way they can just drop the .conf file in the conf.d 
directory and everything just works.  You also might want to update the COPYING 
and INSTALL files to reflect the current state of things.  The INSTALL file 
should contain information about building and installing the module.

I haven't looked at the dso_template.tar.gz file yet but I am thinking that we 
should add this to the wiki and use the text from your README_DSO file to 
explain how to use it.

Bernard, Jesse, what do you think?

Brad

modiostat.conf
Description: Binary data

[Ganglia-developers] Metric modules for Perl and Ruby (was:Re: Disk IO as gmond core metric)

2009-07-10 Thread Brad Nicholes

>>> On 7/9/2009 at 5:43 PM, in message
<8121824c0907091643od6832c5y3c4ffa37696e4...@mail.gmail.com>, JB Kim
 wrote:
> 
> Lastly, is anyone already working on a perl equivalent module of mod_python?
> With the 3.1.x gmond framework, it would be definitely possible to
> further extend DSO functionality
> by running embedded interpreters like perl and R.
> 
> 

Not that I know of but this is something that we have been talking about since 
the introduction of mod_python.  In fact one of the reasons why the python 
interface was embedded this way was to allow for other interpreters to do the 
same.  The intention was to someday write a mod_perl, mod_ruby, mod_ 
in order to support other languages.

Brad



Re: [Ganglia-developers] PATCH: ensure string metrics are null terminated for python-based user metrics

2009-07-09 Thread Brad Nicholes

>>> On 7/9/2009 at 12:16 PM, in message
, Greg Bruno
 wrote:
> On Thu, Jul 9, 2009 at 10:52 AM, Brad Nicholes wrote:
>>
>> One of the new features in Ganglia 3.1 is the ability to add extra data to
>> the metric definition that is passed on with the metric metadata.  Would
>> anything like that help you?
> 
> 
> yes, that may help.
> 
> do you have an example of how to add extra data from a user-defined python 
> metric?

Yes, the disk python module under gmond/python_modules/disk adds a 'mount' 
extra data element to the definition.  Basically in a python module, any extra 
properties that you add to the metric definition dictionary will show up as 
extra data in the metadata for the metric.  Of course it is up to you to modify 
the web front end to take advantage of the extra metadata.  You should be able 
to see in the front end PHP code where extra data like 'Group' and 'Title' are 
being used.
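
A minimal sketch of what that looks like in a module follows.  It is not the
code from the disk module: the metric name, the callback and the value of
'mount' are made up, and the STANDARD_KEYS set is my assumption about which
descriptor keys gmond treats as built in.

```python
# Descriptor keys assumed to be the built-in ones; anything else is extra data.
STANDARD_KEYS = set([
    "name", "call_back", "time_max", "value_type", "units",
    "slope", "format", "description", "groups",
])


def disk_free_handler(name):
    """Placeholder callback; a real module would measure the filesystem here."""
    return 0.0


def metric_init(params):
    """Return one descriptor carrying an extra 'mount' key next to the usual ones."""
    descriptor = {
        "name": "example_disk_free",
        "call_back": disk_free_handler,
        "time_max": 60,
        "value_type": "float",
        "units": "GB",
        "slope": "both",
        "format": "%f",
        "description": "Free space on one mount point",
        "groups": "disk",
        # Extra property: passed along with the metric metadata as extra data.
        "mount": "/",
    }
    return [descriptor]


def metric_cleanup():
    """Nothing to release in this sketch."""
    pass


def extra_data(descriptor):
    """Illustrative helper: the keys that would travel as extra data."""
    return dict((k, v) for k, v in descriptor.items() if k not in STANDARD_KEYS)


if __name__ == "__main__":
    print(extra_data(metric_init({})[0]))
```

The extra_data() helper exists only to show which keys are the extras; gmond
does not need it, and the front end still has to be changed to use the value.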

Brad


Re: [Ganglia-developers] PATCH: ensure string metrics are null terminated for python-based user metrics

2009-07-09 Thread Brad Nicholes

>>> On 7/9/2009 at 9:27 AM, in message <4a560c5a.2090...@mail.nih.gov>, Jesse
Becker  wrote:
> Greg Bruno wrote:
>> also, regarding MAX_G_STRING_SIZE, would it be possible to increase it
>> in future releases? i've currently set it to 128 in
>> include/gm_value.h.
> 
> I don't object, although I'm not the most familiar with this part of the 
> code.  I'm curious:  what metrics are you trying to use that are that long?
> 

I would have to verify this, but I think making MAX_G_STRING_SIZE larger would 
affect the XDR packets.  Any change to the XDR packets would cause gmond to be 
incompatible with previous versions.  It would probably have to wait for the 
next major Ganglia release.

Brad



Re: [Ganglia-developers] PATCH: ensure string metrics are null terminated for python-based user metrics

2009-07-09 Thread Brad Nicholes

>>> On 7/9/2009 at 10:03 AM, in message
, Greg Bruno
 wrote:
> On Thu, Jul 9, 2009 at 8:27 AM, Jesse Becker wrote:
>>
>>> also, regarding MAX_G_STRING_SIZE, would it be possible to increase it
>>> in future releases? i've currently set it to 128 in
>>> include/gm_value.h.
>>
>> I don't object, although I'm not the most familiar with this part of the
>> code.  I'm curious:  what metrics are you trying to use that are that long?
> 
> we're sending info about the top CPU intensive processes so we can
> roll them up for a rudimentary 'cluster top' web page. an example
> metric value would be:
> 
> pid=11790, cmd=dd, user=root, %cpu=0.66, %mem=0.00, size=40,
> data=220, shared=508, vm=63340
> 
> we are also sending info about the queue management system (in our
> case, SGE) and those values are also longer than 32 bytes.
> 

One of the new features in Ganglia 3.1 is the ability to add extra data to the 
metric definition that is passed on with the metric metadata.  Would anything 
like that help you?

Brad



Re: [Ganglia-developers] Disk IO as gmond core metric

2009-07-02 Thread Brad Nicholes

Creating a standard template for building a module independent of the core is 
something that we really haven't gotten around to doing.  So I think in this 
instance, you would be the template creator :).  You should be able to start 
with the Ganglia autoconf stuff and derive something from there.  Sorry, I wish 
we had a standard template because I know that would make life easier.

Brad


>>> On 7/1/2009 at 7:46 PM, in message
<8121824c0907011846q7934d103t980e76728842f...@mail.gmail.com>, JB Kim
 wrote:
> Thanks for the feedback. Creating these metrics into independent
> module makes sense. I'll refactor it to isolate the code within
> mod_iostat.c and see how it works out.
> Before I do so, is there a "standard" template for configure/make
> files to build these DSO modules independently?
> Having such deployment template (perhaps also .spec) structure would
> encourage other developers to contribute code more easily, I think.
> 
> On Wed, Jul 1, 2009 at 11:15 AM, Brad Nicholes wrote:
>>   Thanks for the new module code.  I haven't had a chance to actually look
>> at the code yet but considering that this is a new metric module, it might
>> be better to decouple it from the rest of the Ganglia code as an
>> independent module rather than having the code integrated into metrics.c
>> and the core build system.  If the module is independently buildable then
>> it really doesn't matter if it is linux only or cross platform.  Even if
>> the module remained linux only, only those who are interested would
>> download, build and use it which is fine.  It would also make it much
>> easier for us to simply drop the tarball into the /contrib directory of SVN
>> in order to make it available immediately rather than having to integrate
>> it into the core build and re-release the whole project.  The ideal
>> situation would be to also include a .spec file that would allow the module
>> to be built and packaged into its own RPM.  But just a buildable source
>> tarball would be great.
>>
>>   It is good to see people contributing new modules to the project even if
>> they are only for a single platform.  The more modules we have to offer,
>> the better the whole project is for everybody.
>>
>> Brad
>>
>>>>> On 6/30/2009 at 9:00 PM, in message
>> <8121824c0906302000y181b23adr87ebd98124450...@mail.gmail.com>, JB Kim
>>  wrote:
>>> Hi folks
>>>
>>> Wow, it didn't occur to me this thread was more than a year ago. Time
>>> surely flies when you have a newborn at home. :-)
>>> In any case, I've made necessary modifications to 3.1.2 release to
>>> allow iostat-related metrics for linux.
>>>
>>> Here is the tarball that compiles on linux and reports 7 extra metrics
>>> from a new DSO module called iostat.
>>>
>>> http://www.remnantone.com/pkgs/ganglia/ganglia-3.1.2_io.tar.gz 
>>>
>>> The mod_iostat contains the following metrics:
>>>
>>>   - io_readtot
>>>   - io_readkbtot
>>>   - io_writetot
>>>   - io_writekbtot
>>>   - io_svctmax
>>>   - io_queuemax
>>>   - io_busymax
>>>
>>> The code changes have been made to:
>>>
>>>   - libmetrics/libmetrics.h
>>>   - libmetrics/linux/metrics.c
>>>   - gmond/modules/mod_iostat.c
>>>   - Makefile.am changes to include a new module build
>>>
>>> There are a couple of points:
>>>
>>> * The new set of metrics is only for linux at this point (supports
>>>   2.4 and 2.6 kernels). As you can see, all of the metric functions
>>>   are implemented within libmetrics/linux/metrics.c.
>>> * These metric functions report aggregated values. 4 of them are sums
>>>   across disks, 3 of them are max across the disks. These metrics
>>>   would be ideal for cluster computing nodes, which often have 1 or 2
>>>   disks, not for large servers with a multitude of disks.
>>> * I had thought about making things isolated to
>>>   modules/iostat/mod_iostat.c. However, since the current
>>>   implementation only works on linux, I decided it was best to place
>>>   it in libmetrics/linux, which is already os-dependent code, rather
>>>   than trying to support a multi-os build with a bunch of
>>>   #ifdef/#endif inside of modules/iostat/mod_iostat.c.
>>> * Future improvements should consider reporting independent io metrics
>>>   for a user-supplied list of disks instead of aggregating the whole.
>>> * Lastly, apologies for ugly code...
>>>
>>> If there's sufficient interest and you would like these me

Re: [Ganglia-developers] Disk IO as gmond core metric

2009-07-01 Thread Brad Nicholes

   Thanks for the new module code.  I haven't had a chance to actually look at 
the code yet but considering that this is a new metric module, it might be 
better to decouple it from the rest of the Ganglia code as an independent 
module rather than having the code integrated into metrics.c and the core build 
system.  If the module is independently buildable then it really doesn't matter 
if it is linux only or cross platform.  Even if the module remained linux only, 
only those who are interested would download, build and use it which is fine.  
It would also make it much easier for us to simply drop the tarball into the 
/contrib directory of SVN in order to make it available immediately rather than 
having to integrate it into the core build and re-release the whole project.  
The ideal situation would be to also include a .spec file that would allow the 
module to be built and packaged into its own RPM.  But just a buildable source 
tarball would be great.

   It is good to see people contributing new modules to the project even if 
they are only for a single platform.  The more modules we have to offer, the 
better the whole project is for everybody.  

Brad  

>>> On 6/30/2009 at 9:00 PM, in message
<8121824c0906302000y181b23adr87ebd98124450...@mail.gmail.com>, JB Kim
 wrote:
> Hi folks
> 
> Wow, it didn't occur to me this thread was more than a year ago. Time
> surely flies when you have a newborn at home. :-)
> In any case, I've made necessary modifications to 3.1.2 release to
> allow iostat-related metrics for linux.
> 
> Here is the tarball that compiles on linux and reports 7 extra metrics
> from a new DSO module called iostat.
> 
> http://www.remnantone.com/pkgs/ganglia/ganglia-3.1.2_io.tar.gz 
> 
> The mod_iostat contains the following metrics:
> 
>   - io_readtot
>   - io_readkbtot
>   - io_writetot
>   - io_writekbtot
>   - io_svctmax
>   - io_queuemax
>   - io_busymax
> 
> The code changes have been made to:
> 
>   - libmetrics/libmetrics.h
>   - libmetrics/linux/metrics.c
>   - gmond/modules/mod_iostat.c
>   - Makefile.am changes to include a new module build
> 
> There are a couple of points:
> 
> * The new set of metrics is only for linux at this point (supports
>   2.4 and 2.6 kernels). As you can see, all of the metric functions
>   are implemented within libmetrics/linux/metrics.c.
> * These metric functions report aggregated values. 4 of them are sums
>   across disks, 3 of them are max across the disks. These metrics
>   would be ideal for cluster computing nodes, which often have 1 or 2
>   disks, not for large servers with a multitude of disks.
> * I had thought about making things isolated to
>   modules/iostat/mod_iostat.c. However, since the current
>   implementation only works on linux, I decided it was best to place
>   it in libmetrics/linux, which is already os-dependent code, rather
>   than trying to support a multi-os build with a bunch of
>   #ifdef/#endif inside of modules/iostat/mod_iostat.c.
> * Future improvements should consider reporting independent io metrics
>   for a user-supplied list of disks instead of aggregating the whole.
> * Lastly, apologies for ugly code...
> 
> If there's sufficient interest and you would like these metrics to be
> included in the subsequent release, I'll enhance/modify the code as
> necessary.
> 
> Thanks!
> 
> On Tue, Apr 29, 2008 at 8:58 PM, JB Kim wrote:
>> Sure, sounds like a plan. I'll take a crack at it and let you know.
>>
>> On Tue, Apr 29, 2008 at 12:18 PM, Brad Nicholes  wrote:
>>> >>> On 4/28/2008 at 8:26 PM, in message
>>> <8121824c0804281926wb285fe4u5f269cfbf58a0...@mail.gmail.com>, "JB Kim"
>>>
>>>  wrote:
>>> > Folks,
>>> >
>>> > I've made some changes to ganglia 3.0.7 gmond code to provide aggregated
>>> > disk IO statistics for linux hosts. Since a given host can have one or
>>> > more disks, the values from each individual disk are aggregated to a sum
>>> > or to a max.
>>> >
>>> > It seems like a lot of folks are using a wrapper for iostat command to
>>> > send data via gmetric. While this is also a useful approach, I thought it
>>> > would be nice and convenient to have this reported from gmond, although
>>> > the data would be summarized for an entire host. The code simply reads
>>> > from either /proc/partitions or /proc/diskstats, and maintains the old
>>> > and the new values for each disk to calculate the diff.
>>> >
>>> > These are the new metrics that were

Re: [Ganglia-developers] 3.1.2 not mentioned on http://www.ganglia.info/

2009-04-01 Thread Brad Nicholes

>>> On 4/1/2009 at 7:49 AM, in message <49d370cc.1010...@sara.nl>, Ramon
Bastiaans wrote:
> FYI, I noticed;
> 
> At a first glance, version 3.1.1 is the latest version mentioned on 
> http://www.ganglia.info/ 
> 
> 3.1.2 is only found in the Sourceforge downloads, but mentioned no place 
> else.


Good point.  I will see about getting something posted.

Brad



Re: [Ganglia-developers] Building a ganglia interface into collectl

2009-03-31 Thread Brad Nicholes

The only thing that I would suggest is that you use the APIs to create and send 
the packets.  The problem with hand crafting the packets is that if the gmond 
XDR packet definition ever changes, your interface will be broken.  I don't 
foresee the XDR packet definition changing very frequently, and hopefully not in 
the near future since it has been crafted to be expandable, but it could cause you 
problems if it does.  Basically all you would need to do is either dynamically 
link the ganglia library or use dlopen() to look for it.  If it succeeds then 
use dlsym() to import the functions and start calling the APIs.   The downside 
is that you would probably have to deploy the ganglia library yourself since 
you are trying to use collectl rather than gmond to gather metrics.
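
To make the dlopen()/dlsym() idea a bit more concrete, here is a minimal 
sketch in Python using ctypes (collectl itself is not Python, so treat this 
purely as an illustration of the pattern).  The library name "ganglia" and the 
list of entry points are assumptions based on what the 3.1.x gmetric utility 
appears to call; check them against the headers you actually have installed.  
The helper name load_ganglia is made up for this example.

```python
import ctypes
import ctypes.util

# Entry points that gmetric appears to use in the 3.1.x library.  This list
# is an assumption; verify it against include/ganglia.h on your system.
WANTED = (
    "Ganglia_pool_create",
    "Ganglia_gmond_config_create",
    "Ganglia_udp_send_channels_create",
    "Ganglia_metric_create",
    "Ganglia_metric_set",
    "Ganglia_metric_send",
)


def load_ganglia(lib_name="ganglia", wanted=WANTED):
    """Return {symbol name: function pointer}, or None if the library or
    any wanted symbol is missing (hypothetical helper for illustration)."""
    path = ctypes.util.find_library(lib_name)
    if path is None:
        return None  # library not installed; caller should fall back
    try:
        lib = ctypes.CDLL(path)  # the dlopen() step
    except OSError:
        return None
    symbols = {}
    for name in wanted:
        try:
            symbols[name] = getattr(lib, name)  # the dlsym() step
        except AttributeError:
            return None  # library present, but the API is not what we expect
    return symbols


if __name__ == "__main__":
    api = load_ganglia()
    if api is None:
        print("ganglia library not usable; ganglia output disabled")
    else:
        print("ganglia library loaded with", len(api), "entry points")
```

The point of resolving every symbol up front is that a library whose API has 
changed is treated the same as a missing library, so the caller can disable 
the Ganglia output cleanly instead of failing later in the middle of a send.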

Brad

>>> On 3/31/2009 at 11:32 AM, in message
<9fce3a46fe7c8045a6207ae4b42e9f9a4794075...@gvw1119exc.americas.hpqcorp.net>,
"Seger, Mark"  wrote:
> This uses 3.1.  Here's an example of the output I generate when debugging is 
> turned on:
> 
> I have a routine I pass 3 parameters to:  name of the variable, its units 
> and the value.  I generate 2 packets, the first with a header and the second 
> with the data, which looks like this:
> 
> 13:41:45.014 Name: ctxint.ctx   Units: switches/sec  Val:   562 TTL: 60  sent
> 00 00 00 80 00 00 00 0c 63 61 67 2d 64 6c 35 38 35 2d 30 32 00 00 00 0a 63 
> 74 78 69 6e 74 2e 69 6e 74 00 00 00 00 00 00 00 00 00 06 64 6f 75 62 6c 65 00 
> 00 00 00 00 0a 63 74 78 69 6e 74 2e 69 6e 74 00 00 00 00 00 0b 69 6e 74 72 70 
> 74 73 2f 73 65 63 00 00 00 00 03 00 00 00 3c 00 00 00 00 00 00 00 00
> 00 00 00 85 00 00 00 0c 63 61 67 2d 64 6c 35 38 35 2d 30 32 00 00 00 0a 63 
> 74 78 69 6e 74 2e 69 6e 74 00 00 00 00 00 00 00 00 00 02 25 73 00 00 00 00 00 
> 04 31 30 37 32
> 
> Not sure if seeing the binary is of much value without the mappings, but as 
> I said a V3.1 gmond is very happy with what I'm sending.  We actually did 
> think of xdr, but that would put additional requirements on collectl and it 
> feels like doing things in binary will help minimize overhead.
> 
> -mark
> 
> |-Original Message-
> |From: Brad Nicholes [mailto:bnicho...@novell.com] 
> |Sent: Tuesday, March 31, 2009 1:13 PM
> |To: Seger, Mark; ganglia-developers@lists.sourceforge.net 
> |Cc: Evan J Felix
> |Subject: Re: [Ganglia-developers] Building a ganglia interface into
> |collectl
> |
> |>>> On 3/31/2009 at 9:56 AM, in message <49d23d46.2090...@hp.com>, Mark
> |Seger
> | wrote:
> |>
> |> This then leads to my question, which is what is the best way to send
> |> data to ganglia.  I want to keep my messages very dense and so we
> |chose
> |> to simply send out binary data in the same format gmond expects.  In
> |the
> |> case of pnnl, where they have a monitoring hierarchy, we've completely
> |> replaced all the monitoring gmonds with a dozen that act only as
> |aggregators.  There are about 190 nodes running collectl sending UDP
> |> messages to each aggregator gmonds and it seems to run just fine.
> |Does
> |> this make sense?  Is there anything to watch out for?
> |>
> |> If anyone else is interested in trying this out while we're shaking
> |out
> |> the code, I'd be happy to share some pre-release code with a few
> |people.
> |>
> |
> |Which version of Ganglia are you targeting (3.0.x or 3.1.x)?  Ganglia
> |uses XDR to pack and unpack the metric packets.  However the actual
> |format changed significantly between 3.0.x and 3.1.x.  You can see the
> |XDR packet layout in the file lib/protocol.x for 3.0.x or
> |lib/gm_protocol.x for 3.1.x.  The 3.1.x version is a bit more complex
> |than the 3.0.x version.  The 3.0.x version is a very simple XDR packet
> |that basically contains a metric ID and a value.  Gmond 3.0.x can get
> |away with just sending an ID in the packet because every 3.0.x gmond
> |hardcodes the metric metadata.  Gmond 3.1.x made this more flexible by
> |splitting the packets into metadata and value packets.  Probably the
> |easiest way to communicate directly with gmond is to use the message
> |creation and sending APIs that are part of the ganglia library.  Take a
> |look at the gmetric utility code for an example of how to use these
> |APIs.  Gmetric is basically doing what you want to do but as a
> |standalone utility.  For Ganglia 3.0.x you will have to include
> |lib/ganglia.h, for Ganglia 3.1.x you will include include/ganglia.h.
> |The libraries for Ganglia 3.1.x version have been made a little more
> |developer friendly by putting the public headers in the include/
> |directory and converting the library to be a .so rather than static.
> |
> |Brad
> |




Re: [Ganglia-developers] Building a ganglia interface into collectl

2009-03-31 Thread Brad Nicholes

>>> On 3/31/2009 at 9:56 AM, in message <49d23d46.2090...@hp.com>, Mark Seger
 wrote:
> 
> This then leads to my question, which is what is the best way to send 
> data to ganglia.  I want to keep my messages very dense and so we chose 
> to simply send out binary data in the same format gmond expects.  In the 
> case of pnnl, where they have a monitoring hierarchy, we've completely 
> replaced all the monitoring gmonds with a dozen that act only as 
> aggregators.  There are about 190 nodes running collectl sending UDP 
> messages to each aggregator gmonds and it seems to run just fine.  Does 
> this make sense?  Is there anything to watch out for?
> 
> If anyone else is interested in trying this out while we're shaking out 
> the code, I'd be happy to share some pre-release code with a few people.
> 

Which version of Ganglia are you targeting (3.0.x or 3.1.x)?  Ganglia uses XDR 
to pack and unpack the metric packets.  However the actual format changed 
significantly between 3.0.x and 3.1.x.  You can see the XDR packet layout in 
the file lib/protocol.x for 3.0.x or lib/gm_protocol.x for 3.1.x.  The 3.1.x 
version is a bit more complex than the 3.0.x version.  The 3.0.x version is a 
very simple XDR packet that basically contains a metric ID and a value.  Gmond 
3.0.x can get away with just sending an ID in the packet because every 3.0.x 
gmond hardcodes the metric metadata.  Gmond 3.1.x made this more flexible by 
splitting the packets into metadata and value packets.  Probably the easiest 
way to communicate directly with gmond is to use the message creation and 
sending APIs that are part of the ganglia library.  Take a look at the gmetric 
utility code for an example of how to use these APIs.  Gmetric is basically 
doing what you want to do but as a standalone utility.  For Ganglia 3.0.x you 
will have to include lib/ganglia.h, for Ganglia 3.1.x you will include 
include/ganglia.h.  The libraries for Ganglia 3.1.x version have been made a 
little more developer friendly by putting the public headers in the include/ 
directory and converting the library to be a .so rather than static.
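
For anyone who wants to see what the 3.1.x metadata/value split looks like on 
the wire, here is a rough Python sketch that packs the two packets by hand.  
The field order and the message ids (128 for metadata, 133 for a string 
value) are inferred from the hex dump Mark posted in his follow-up in this 
thread, not copied from lib/gm_protocol.x, so treat them as assumptions and 
verify against that file.  The function names are made up for this example.  
The XDR rules themselves are standard: integers are 4 bytes big-endian, and a 
string is a 4-byte length followed by the bytes, zero padded to a multiple of 4.

```python
import struct

GMETADATA_FULL = 128  # message id seen in the dump (0x80); assumed meaning
GMETRIC_STRING = 133  # message id seen in the dump (0x85); assumed meaning


def xdr_uint(n):
    """XDR unsigned int: 4 bytes, big-endian."""
    return struct.pack(">I", n)


def xdr_string(s):
    """XDR string: 4-byte length, the bytes, zero padding to a 4-byte boundary."""
    data = s.encode("ascii")
    pad = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * pad


def metric_id(host, name, spoof=False):
    """Common prefix of both packets: host, metric name, spoof flag."""
    return xdr_string(host) + xdr_string(name) + xdr_uint(1 if spoof else 0)


def metadata_packet(host, name, mtype, units, slope, tmax, dmax):
    """Metadata packet: describes the metric, carries no value."""
    return (xdr_uint(GMETADATA_FULL) + metric_id(host, name)
            + xdr_string(mtype) + xdr_string(name) + xdr_string(units)
            + xdr_uint(slope) + xdr_uint(tmax) + xdr_uint(dmax)
            + xdr_uint(0))  # zero extra-data elements


def value_packet(host, name, value):
    """Value packet: the value travels as a string with a "%s" format."""
    return (xdr_uint(GMETRIC_STRING) + metric_id(host, name)
            + xdr_string("%s") + xdr_string(str(value)))


if __name__ == "__main__":
    host, name = "cag-dl585-02", "ctxint.int"
    print(metadata_packet(host, name, "double", "intrpts/sec", 3, 60, 0).hex(" "))
    print(value_packet(host, name, 1072).hex(" "))
```

This is exactly the kind of hand crafting that breaks if the packet 
definition changes, so it is better suited to understanding or debugging the 
format than to production use; the library APIs that gmetric uses remain the 
safer route.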

Brad


Re: [Ganglia-developers] optional metrics

2009-03-18 Thread Brad Nicholes

>>> On 3/18/2009 at 5:18 AM, in message
 wrote:

> 
> Hi,
> 
> gmond refuses to start if any of the metrics fail to initialise.
> 
> In this new era of modular metrics, it is likely that some modules will
> relate to things that are not static (e.g. a NIC that existed before but
> has now gone away).
> 
> Maybe there needs to be a configuration option to declare certain
> metrics to be non-essential at startup time?
> 
> Beyond that, it may also be nice to support metrics that only exist
> intermittently (e.g. if someone temporarily mounts an iSCSI LUN for 30
> minutes, and they want metrics without reconfiguring and restarting
> gmond)
> 

This goes back to previous discussions that we have had on this list around 
making gmond recognize new metrics without having to stop and restart gmond.  
In other words, being able to dynamically add and remove metric module 
configuration.  Currently gmond can't handle adding or removing metric 
configuration on the fly.  It reads all of the configuration at startup and 
creates an internal list of metrics and associated data.  Gmond depends on that 
internal list for everything and changing it on the fly would cause major 
problems.  

In order to do what you are suggesting, which I think would be great BTW, the 
whole mechanism for tracking metrics and metric data will have to be reworked.  
This can certainly be done, but it will take some well thought out design and 
coding.

Brad

