Re: [Ganglia-developers] Bugfix not submitted because Bugzilla not accessible from my area
There are links to Bugzilla on the ganglia.info web site. For example, the last sentence on the ganglia.info support tab has a link (http://ganglia.info/?page_id=68).

Brad

>>> Bernard Li 9/25/2013 12:55 PM >>>
Hello:

We are no longer using Bugzilla, please submit the bugs in GitHub Issues:

https://github.com/ganglia/ganglia-web/issues

Just curious though, where did you see the link to Bugzilla?

Thanks!

Bernard

On Wed, Sep 25, 2013 at 10:07 AM, 田甲 wrote:
> Will someone help me submit this bugfix? Thanks.
>
> Module: ganglia-web
> File: stacked.php
> Description: This bug happens when showing a stacked graph of custom metrics
> on a few selected hosts. In this case, count($hosts) is regarded as the count
> of valid ones, which results in an overwhelming value of $cx, leading to
> blank color 0xFF.
> Diff:
>
> $ diff stacked.php stacked.php2
> 98d97
> < $i = 0;
> 100,101c99
> < $cx = $i/(1+count($hosts));
> < $i++;
> ---
> > $cx = $index/(1+count($hosts));
>
> --
> Regards,
> Hydrogenesis
> oxygenera...@gmail.com

___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
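The bug report above turns on how the colour fraction $cx is computed when only a few hosts are selected. The following is a minimal sketch of that failure mode, not the ganglia-web code: it assumes the selected hosts keep their keys from a larger host list, and the host names and function names are made up for illustration. The thread does not say which of the two diffed files is the fixed one, so the sketch only contrasts the two ways of computing the fraction.

```python
def fractions_by_key(selected):
    """Fraction computed from each host's original key (hypothetical data layout)."""
    n = len(selected)
    return [key / (1 + n) for key in selected]


def fractions_by_counter(selected):
    """Fraction computed from a running counter over the selected hosts only."""
    n = len(selected)
    return [i / (1 + n) for i, _ in enumerate(selected)]


# Three hosts picked out of a larger list, so their keys are not 0, 1, 2.
selected = {40: "web1", 41: "web2", 57: "db1"}

print(fractions_by_key(selected))      # values far above 1
print(fractions_by_counter(selected))  # values stay within [0, 1)
```

With a running counter the fraction always stays below 1, which is what a colour scale normally expects; with sparse keys it can grow without bound.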
Re: [Ganglia-developers] Trac Wiki, Bugzilla and GitHub
I'm not sure that just abandoning the issues in Bugzilla is a good idea without at least trying to follow up with the person who submitted the issue. Some of the issues may still be valid and we certainly don't want to abandon those. How much effort would it be to try to either validate or follow up on the open issues?

OTOH, I guess leaving Bugzilla as read-only means that the issues will still be there and not necessarily be lost. We could close out each ticket in Bugzilla with a note to the submitter saying that if the bug is still an issue, they should resubmit it to GitHub Issues.

I guess what I am saying is that I don't have a strong opinion either way.

Brad

>>> On 5/14/2012 at 11:37 AM, Bernard Li wrote:
> I spoke with Vladimir briefly on IRC and he recommends that we just
> move to GitHub Issues, reason being it works better with the GitHub
> workflow (as Alex Dean also mentioned in his email).
>
> I am okay with this, as long as we take the effort to go through
> bugzilla.ganglia.info and close out obsolete tickets and move all the
> relevant open ones to GitHub Issues. We can leave the old bugs in
> Bugzilla for archival purposes and in read-only mode.
>
> Another option which Vladimir suggests is to just forget about the old
> tickets in Bugzilla and start fresh in GitHub Issues.
>
> I am leaning towards option 1 -- what do you guys think?
>
> Thanks,
>
> Bernard
>
> On Sat, May 12, 2012 at 2:12 AM, Daniel Pocock wrote:
>>
>>
>> On 12/05/12 00:44, Bernard Li wrote:
>>> Hi Daniel:
>>>
>>> On Fri, May 11, 2012 at 3:08 AM, Daniel Pocock wrote:
>>>> If I host it, it would purely be on a voluntary basis, so I would be
>>>> hoping for upstream and/or Debian to be providing convenient packages
>>>> and security updates. Although I am quite capable of installing it
>>>> manually, time spent maintaining such an install of bugzilla would cut
>>>> into time spent maintaining any other open source packages I contribute to
>>>
>>> Thanks to Ben Hartshorne, I was able to find this:
>>>
>>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=638705
>>>
>>> So yeah, bugzilla is temporarily removed from Debian. However, it's
>>
>> Yes, that was the same link I posted - it doesn't say temporary or
>> permanent, it just says they need at least 2 people willing to support
>> the package in some sense. It also suggests that the way upstream
>> distributes the tarball makes it necessary to do a lot of patching, that
>> deters people from maintaining a package.
>>
>>> still available in EPEL:
>>>
>>> http://dl.fedoraproject.org/pub/epel/6/x86_64/
>>>
>>> Is this really an issue?
>>
>> Yes, definitely, because if something like that is publicly accessible,
>> it needs security updates. Debian and RHEL often put out security
>> updates for supported packages within a matter of hours (much faster
>> than the non-Linux platform vendor)
>>
>> The reason for using Debian is that I already have a VM running for
>> reSIProcate, it could be shared for the Ganglia project, used to
>> bootstrap releases, etc. The physical server is under a commercial
>> hosting contract in Telehouse, one of London's most well connected data
>> centres:
>> http://en.wikipedia.org/wiki/Telehouse_Europe#London
Re: [Ganglia-developers] Trac Wiki, Bugzilla and GitHub
+1 for sticking with Bugzilla. If we can move it to somewhere that is more maintainable, that would be better. But I would hate to just abandon everything there.

Brad

>>> On 5/10/2012 at 10:01 AM, Bernard Li wrote:
> Hi Daniel:
>
> Just for the record, I actually like Bugzilla and would like to keep using
> it. However because we do not have direct ownership to the server (it is
> being hosted at UC Berkeley) it makes it hard to maintain. For instance it
> has currently been down for at least two days and so far I have not been
> able to get ahold of the admins who could tell us what's going on. This is
> not the first time it has happened.
>
> So either we move the Bugzilla instance to somewhere we have more control
> or we move them to GitHub Issues, it just can't stay where it is.
>
> I agree however that there are probably more bugs in Bugzilla than GitHub
> Issues so perhaps moving from GitHub Issues -> Bugzilla and disabling
> GitHub Issues is the way to go. But I am also under the impression some
> folks like GitHub Issues better.
>
> Anybody else have any comments?
>
> Thanks!
>
> Bernard
>
> On Thursday, May 10, 2012, Daniel Pocock wrote:
>
>>
>> > This is our request for help. We need someone to take charge of
>> > managing our documentation making sure they are up to date and in one
>> > canonical location. We'll also need someone to help with importing the
>> > bugs in Bugzilla to GitHub Issues.
>>
>>
>> We definitely have to abandon bugzilla?
>>
>> Can we just turn off the issue tracker in github to avoid people opening
>> issues in the wrong place?
Re: [Ganglia-developers] 3.3.3 tagged
>>> On 3/21/2012 at 12:48 PM, Vladimir Vuksan wrote:
> I agree with Alex. We are churning through too many versions. I would
> personally be OK with overriding the existing 3.3.2 tag and going with
> 3.3.2 instead of 3.3.4.
>

The problem with reusing a version number is that you end up with different snapshots of the code under the same version number in the wild. Whether or not a specific snapshot of the code has actually been designated as a release doesn't matter; the fact is that the version-stamped snapshot has been posted and made available to the public. All it takes is for one user or distributor to get hold of the wrong snapshot, and the Ganglia project ends up wasting time answering and debunking bugs that don't exist in the actual release. It is much easier to explain to a user that a specific version number was never released than to tell them that the 3.3.2 version of their code is the wrong one.

Skipping version numbers is actually a common practice. The Apache HTTPD project is doing the same thing as we speak due to a bad release candidate.

Brad
[Ganglia-developers] Other project related stuff (was:Re: releasing 3.3.2 today?)
All,

My comments on this subject probably have more to do with the overall way that the Ganglia project works than with versioning alone. Right now we have a wiki page hosted at SourceForge (http://sourceforge.net/apps/trac/ganglia) which describes how the Ganglia project used to work when the repository was SVN. Now that things have moved to GitHub and some of the people who were running the project at the time are not as involved anymore (namely me :), it seems as though things are getting a little confusing.

For example, the versioning rules for Ganglia releases are also described on the SourceForge wiki (http://sourceforge.net/apps/trac/ganglia/wiki/how_project_works). Although in some ways they might be similar to what has been discussed recently, they are different. People who are trying to figure out how the Ganglia project works will probably run across the older wiki page first (since it is still linked from ganglia.info) and then be confused by how versioning is actually handled now. Also, since the procedures and policies on the older wiki page were modeled around SVN, the Git way of doing things obviously makes a lot of the older wiki information obsolete.

I think it is important that the Ganglia project gits on the same (wiki) page with regards to how the project works and what information the project wants to provide to new users and developers (all puns intended ;-) . Especially in light of the fact that many of us are working on the Ganglia Monitoring book. Hopefully once the book is released, it will generate more interest in the Ganglia project, and it would probably look bad if we had two different wiki pages providing conflicting information.

Since I haven't been as involved with the project, especially since the source code was moved to Git, is there someone who could review the SourceForge wiki page and determine what information is still valid and what isn't? At that point we could decide whether to just update the SourceForge wiki or move it all to the GitHub wiki.

Comments?

Brad
Re: [Ganglia-developers] "bnichols" in GitHub
>>> On 3/13/2012 at 11:42 PM, Bernard Li wrote:
> Not sure who added "bnichols" to GitHub, but he's not Brad Nicholes:
>
> https://github.com/bnichols
>
> I've revoked his membership.
>
> Brad, could you please confirm your id on GitHub? It's "bnicholes", right?
>
> https://github.com/bnicholes
>

Yes, this one is mine. Thanks for catching that.

Brad
Re: [Ganglia-developers] Ganglia REST interface (was:Re: Gauging interest in writing a Ganglia eBook)
>>> On 12/5/2011 at 12:09 PM, in message <20111205190901.go17...@mail.nih.gov>, Jesse Becker wrote:
> On Mon, Dec 05, 2011 at 01:17:19PM -0500, Brad Nicholes wrote:
>>All,
>> I just wanted to get some feedback on how much interest there would be
>> for a REST interface for Ganglia. I spent a few days putting together a POC
>> of a REST interface and was able to get something that implements the
>> following REST URLs:
>>
>>/clusters
>>/clusters/{cid}
>>/clusters/{cid}/hosts
>>/hosts
>>/hosts/{hid}
>>/hosts/{hid}/metrics
>>/hosts/{hid}/metrics/{mid}
>>/hosts/{hid}/metrics/{mid}/data
>>/hosts/{hid}/metrics/{mid}/graph
>>/hosts/{hid}/metrics/{mid}/info
>
> For [chm]id, what is considered valid? Are the "common" names of these
> valid, or do we have to use some sort of unique ID instead?
>

As it stands right now, the identifier is just the common name found in the XML. We would need to do something else in gmond like we have talked about before (something like a GUID) in order to change this. But since the URL itself qualifies the resource uniquely, common names shouldn't be a problem.

> It's certainly a good idea I think, and could be used to simplify a lot
> of the frontend UI code.
>
> For things like '/hosts/{hid}/metrics/{mid}/graph', how would various
> graphing options be passed?

I have taken a very simplistic approach for now. A graphing URL would look something like

/hosts/foo.org/metrics/cpu_user/graph?from=now-2day&to=now-1day&title="cpu user"&vlabel="percentage"&arealabel="cpu"

Obviously there are many more graphing options defined in rrdtool than these. We would just need to figure out the best way to expose them. Options like CDEFs and VDEFs might be interesting. Then there are the stacked graphs. I'm not sure how to handle those. But if it can be expressed on a command line, we should be able to figure out how to express it in REST.

>
> For /hosts/{hid}/metrics/{mid}/data, could that be expanded to return
> LAST, MIN,MAX, AVERAGE (etc) values as well?
>
>

Absolutely, it would just be a matter of defining more query parameters.

Brad
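The example graphing URL in this message carries a space and quote characters in its query string. As a minimal sketch of how a client might build such a URL with the options percent-encoded, here is one way to do it with the Python standard library. The helper name graph_url and the trailing-underscore convention for the "from" option are assumptions made for this example and are not part of the POC described above.

```python
from urllib.parse import quote, urlencode


def graph_url(host, metric, **options):
    """Build a /hosts/{hid}/metrics/{mid}/graph URL with encoded query options."""
    path = "/hosts/{}/metrics/{}/graph".format(
        quote(host, safe=""), quote(metric, safe="")
    )
    # "from" is a Python keyword, so callers pass from_ and the trailing
    # underscore is stripped here (a convention invented for this sketch).
    params = {key.rstrip("_"): value for key, value in options.items()}
    # quote_via=quote encodes spaces as %20 instead of "+".
    query = urlencode(params, quote_via=quote)
    return path + ("?" + query if query else "")


url = graph_url(
    "foo.org",
    "cpu_user",
    from_="now-2day",
    to="now-1day",
    title="cpu user",
    vlabel="percentage",
    arealabel="cpu",
)
print(url)
```

Encoding the options this way avoids the need for literal quotes around values such as the title, at the cost of a slightly less readable URL.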
[Ganglia-developers] Ganglia REST interface (was:Re: Gauging interest in writing a Ganglia eBook)
All,

I just wanted to get some feedback on how much interest there would be for a REST interface for Ganglia. I spent a few days putting together a POC of a REST interface and was able to get something that implements the following REST URLs:

/clusters
/clusters/{cid}
/clusters/{cid}/hosts
/hosts
/hosts/{hid}
/hosts/{hid}/metrics
/hosts/{hid}/metrics/{mid}
/hosts/{hid}/metrics/{mid}/data
/hosts/{hid}/metrics/{mid}/graph
/hosts/{hid}/metrics/{mid}/info

/clusters/... and /hosts/... pull data directly from the XML produced by gmetad. .../data, .../graph, .../info pull data or graphs directly from the rrd's through rrdtool. .../graph actually produces an rrdtool graph for the specified metric given some query params that affect the attributes of the graph.

I would show you all a live demo but I don't have access to a web server outside our firewall. Before I can contribute the code, I need to get permission from my employer first, so before going to that trouble, I just wanted to see if I was headed in the right direction.

comments (on at least the little bit of information I gave you :)

Brad

>>> On 12/2/2011 at 5:31 PM, Matt Massie wrote:
> Brad-
>
> Can you open a new thread on the developer's list? I think there's going
> to be quite a bit of interest in a REST interface to Ganglia. It would be
> really useful to have. I know I've been tempted to write one myself.
>
> -Matt
>
>
> On Fri, Dec 2, 2011 at 4:21 PM, Vladimir Vuksan wrote:
>
>> I am sure lots of people would appreciate REST interface to Ganglia.
>> Myself and Jeff Buchbinder have been talking on how we could implement
>> it but if you already have it completed that would be an awesome
>> addition ;-).
>>
>> Vladimir
>>
>> On 02.12.2011 10:45, Brad Nicholes wrote:
>> > Hey Matt,
>> > How are you? It's been a while. I know I haven't been biggest
>> > contributor to the Ganglia project lately but I still monitor the
>> > mailing lists and this book sounds like a great idea. Count me in
>> > anywhere I can help.
>> >
>> > On a slightly different note:
>> >
>> > I have managed to carve out a little time over the past few weeks to
>> > get back into a little Ganglia development. Since we are gauging
>> > interest, would anybody be interested in a REST interface for
>> > Ganglia?
>> > I have worked up a POC that allows a user to query metrics from
>> > gmetad through REST as well as pull data and graphs directly from the
>> > RRD files. I still have to get permission from my employer before I
>> > can contribute the REST code to the Ganglia project, but before I go
>> > to that effort I just wanted to see if this is something that the
>> > Ganglia community would be interested in.
>> >
>> > Brad
>> >
>> >
>> >
>> >>>> On 12/1/2011 at 12:31 PM, Matt Massie wrote:
>> >> There's an O'reilly editor who's interested in publishing a ~50-page
>> >> eBook on ganglia.
>> >>
>> >> I have no doubt the ganglia community would benefit from a book
>> >> covering topics like:
>> >>
>> >>- Ganglia's components and overall architecture
>> >>- Typical deployment configurations including simple steps for
>> >>verifying an installation (e.g. unicast/multicast, single
>> >>cluster/multiple distributed clusters/datacenter)
>> >>- Navigating and using the new web interface
>> >>- Tips for extending ganglia's functionality (e.g. gmetric,
>> >>modules)
>> >>- Common integration points (e.g. Hadoop metrics, Nagios)
>> >>- A simple step-by-step checklist for debugging common ganglia
>> >>issues with pointers to our web site, mailing lists, irc channel, etc.
>> >>- Supported platforms and core metrics
>> >>- Scaling to clusters > 1000 nodes
>> >>
>> >> These are just ideas off the top of my head and not meant to final
>> >> or comprehensive but meant to provide a list for discussion. Of
>> >> course, let me know if there's topics the community would like to
>> >> know more (or less) about. The purpose of the book is to serve as a
>> >> first-read book for people new to ganglia. Keep in mind, for much of
>> >> the book, we
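The URL list at the top of this message maps naturally onto a small route table. The following is only a sketch of how such paths could be matched and tied to the data source named in the message (gmetad XML versus rrdtool); the route names, the regular expressions, and the resolve and source_for helpers are all assumptions for illustration and say nothing about how the POC is actually implemented.

```python
import re

# Route names and patterns are hypothetical; only the URL shapes come from the thread.
ROUTES = [
    ("cluster_list", r"^/clusters$"),
    ("cluster", r"^/clusters/(?P<cid>[^/]+)$"),
    ("cluster_hosts", r"^/clusters/(?P<cid>[^/]+)/hosts$"),
    ("host_list", r"^/hosts$"),
    ("host", r"^/hosts/(?P<hid>[^/]+)$"),
    ("host_metrics", r"^/hosts/(?P<hid>[^/]+)/metrics$"),
    ("metric", r"^/hosts/(?P<hid>[^/]+)/metrics/(?P<mid>[^/]+)$"),
    (
        "metric_view",
        r"^/hosts/(?P<hid>[^/]+)/metrics/(?P<mid>[^/]+)/(?P<view>data|graph|info)$",
    ),
]


def resolve(path):
    """Return (route_name, captured_ids) for a path, or None if nothing matches."""
    for name, pattern in ROUTES:
        match = re.match(pattern, path)
        if match:
            return name, match.groupdict()
    return None


def source_for(route_name):
    """.../data, .../graph and .../info read the RRDs; the rest read gmetad's XML."""
    return "rrdtool" if route_name == "metric_view" else "gmetad XML"


name, ids = resolve("/hosts/foo.org/metrics/cpu_user/graph")
print(name, ids, source_for(name))
```

Because each identifier is captured as a plain path segment, common names from the XML work as identifiers as long as they contain no slash.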
Re: [Ganglia-developers] Gauging interest in writing a Ganglia eBook
Hey Matt,

How are you? It's been a while. I know I haven't been the biggest contributor to the Ganglia project lately but I still monitor the mailing lists and this book sounds like a great idea. Count me in anywhere I can help.

On a slightly different note:

I have managed to carve out a little time over the past few weeks to get back into a little Ganglia development. Since we are gauging interest, would anybody be interested in a REST interface for Ganglia? I have worked up a POC that allows a user to query metrics from gmetad through REST as well as pull data and graphs directly from the RRD files. I still have to get permission from my employer before I can contribute the REST code to the Ganglia project, but before I go to that effort I just wanted to see if this is something that the Ganglia community would be interested in.

Brad

>>> On 12/1/2011 at 12:31 PM, Matt Massie wrote:
> There's an O'Reilly editor who's interested in publishing a ~50-page eBook
> on ganglia.
>
> I have no doubt the ganglia community would benefit from a book covering
> topics like:
>
>- Ganglia's components and overall architecture
>- Typical deployment configurations including simple steps for verifying
>an installation (e.g. unicast/multicast, single cluster/multiple
>distributed clusters/datacenter)
>- Navigating and using the new web interface
>- Tips for extending ganglia's functionality (e.g. gmetric, modules)
>- Common integration points (e.g. Hadoop metrics, Nagios)
>- A simple step-by-step checklist for debugging common ganglia issues
>with pointers to our web site, mailing lists, irc channel, etc.
>- Supported platforms and core metrics
>- Scaling to clusters > 1000 nodes
>
> These are just ideas off the top of my head and not meant to be final or
> comprehensive but meant to provide a list for discussion. Of course, let
> me know if there's topics the community would like to know more (or less)
> about. The purpose of the book is to serve as a first-read book for people
> new to ganglia. Keep in mind, for much of the book, we won't be starting
> from scratch. We already have a good amount of documentation that just
> needs to be organized and edited.
>
> I'll be happy to contribute time to make this eBook a reality; however, I
> want the book authors to be the leaders and experts in the ganglia
> community. I think it best we divide and conquer and write the book as a
> team. Who is interested in helping write the book?
>
> -Matt
Re: [Ganglia-developers] AUTHORS file
>>> On 2/22/2011 at 5:50 PM, Bernard Li wrote:
> Hi all:
>
> I'd like to propose that we replace AUTHORS file with the following
> contents:
>
> --- cut ---
> Ganglia Development Team
>
> For a full list of current/past developers and contributors, please
> see: http://ganglia.info/?page_id=325
> --- cut ---
>
> Thoughts?
>

+1
Re: [Ganglia-developers] dmax for python and C gmond modules
>>> On 2/1/2011 at 12:02 PM, Bernard Li wrote:
> Hi all:
>
> When I was writing a Python module for monitoring GPU metrics, I
> noticed that the interface does not provide a way to set dmax.
> According to the gmetric man page, dmax is:
>
>-d, --dmax=INT
> The lifetime in seconds of this metric (default=*0*)
>
> So I was wondering why this isn't available for the Python or C module
> interfaces.
>
> Right now, if I had added a new metric via the interfaces and no
> longer want it, I would have to:
>
> 1) Comment it out in the conf
> 2) Restart the gmond where the module originates from
> 3) Restart the collector gmond
> 4) Restart gmetad
>

I would probably have to go back and figure out what I was thinking at the time, but I vaguely recall that dmax was hardcoded for all of the standard metrics in the 3.0 version of gmond. So at the time I was probably thinking that exposing dmax just wasn't necessary. That was probably a wrong assumption. Thinking back now, it would have made sense for dmax to be hardcoded in 3.0 because there was no way to add or remove metrics in that version. But in the 3.1 version, where you can do that, dmax shouldn't have been hardcoded and should have been exposed.

Brad
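To make the role of dmax in this exchange concrete, here is a small sketch of the expiry rule as I read it from the gmetric man page excerpt quoted above: a metric has a lifetime in seconds, and I am assuming that the default of 0 means it never expires. This is not gmond code; the dictionary layout, the metric names, and the is_expired and prune helpers are invented for illustration.

```python
def is_expired(last_update, now, dmax):
    """True if a metric has outlived its dmax; dmax == 0 is taken to mean 'never'."""
    if dmax == 0:
        return False
    return (now - last_update) > dmax


def prune(metrics, now):
    """Drop every metric whose lifetime has run out (hypothetical data layout)."""
    return {
        name: m
        for name, m in metrics.items()
        if not is_expired(m["last_update"], now, m["dmax"])
    }


metrics = {
    "gpu_temp": {"last_update": 100, "dmax": 60},  # removed 60 s after its last update
    "cpu_user": {"last_update": 100, "dmax": 0},   # kept until services are restarted
}

print(sorted(prune(metrics, now=150)))  # both metrics still present
print(sorted(prune(metrics, now=200)))  # only the dmax=0 metric remains
```

With a non-zero dmax, a retired module metric would age out on its own, which is the alternative to the four restart steps listed in the quoted message.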
Re: [Ganglia-developers] send_metadata_interval
>>> On 1/10/2011 at 4:52 PM, Bernard Li wrote:
> Hi Brad:
>
> Thanks for your reply.
>
> On Mon, Jan 10, 2011 at 8:06 AM, Brad Nicholes wrote:
>
>> The purpose of setting the send_metadata_interval to 0 by default was to
>> avoid unnecessary traffic for our default configuration of multicast.
>> Setting the directive to anything other than 0 will cause each gmond to
>> start sending all of its metric metadata on that interval. If you are
>> going to set it by default, IMO 30 seconds is too low. The problem is
>> that people only notice this in the first few minutes after restarting a
>> gmond. They expect metrics to start showing up immediately. After the
>> gmond node finally does send its metadata, rebroadcasting the metadata
>> at any interval is just consuming unnecessary bandwidth on the network.
>> Especially in a multicast environment where it isn't needed at all. Also
>> consider that the more gmond nodes you have the more traffic you are
>> going to put on the network where 99% of the time the extra traffic is
>> totally unnecessary.
>
> I have a perhaps naive question. It sounds like
> send_metadata_interval is only relevant to unicast configuration, so
> why is multicast affected as well? How difficult of a code change
> would it be if we make the send_metadata_interval directive to only
> affect unicast?
>

We could add code to gmond to always disable resending metadata based on an interval. But then that is what the default value of 0 was doing.

> Also multicast is the default configuration due to historic reasons
> but not because it is more common. It is however easier to set up if
> your environment supports it. Is it time for us to evaluate whether
> we should switch to unicast as the default? And if so how? What is
> the actual spread between unicast and multicast users? If it turns
> out that the majority of our (new) users are using unicast, should we
> spend more time/effort making it easier for them to use Ganglia?
>

Actually I think this is a good idea. In my experience, unicast seems to be more the norm than the exception now. If we were to make unicast the default, then that would make the suggestion above more relevant. We would probably want to put something in the code to automatically disable the send metadata for multicast.

>> 300 or 600 seconds is probably good enough for a default. But no matter
>> what the default is, users still have to understand what that directive
>> is for and how to optimize it. The value of send_metadata_interval will
>> probably be different for every installation when you take into
>> consideration the number of nodes, the number of metrics and any other
>> network related variables.
>
> A couple more ideas came out of a brief brainstorming session on IRC
> between Vladimir, Jesse and myself:
>
> 1) Collector gmond should request metadata from all gmonds when it has
> been freshly (re)started

This already happens in multicast mode. Whenever a gmond node receives a metric packet for which it has no metadata, it automatically sends out a request on the channel for metadata. The end result is that all gmond nodes are constantly resyncing themselves until all nodes in a cluster have a complete metadata picture. However, the same cannot be done for unicast because, by definition, there is no two-way communication. In order to make the same functionality work for unicast, we would have to introduce a new listen port on every gmond that would accept commands and respond to them. Doing that opens up a security risk that would have to be dealt with correctly.

> 2) Add a configuration check for gmond so upon starting, if
> configuration is unicast-based, and send_metadata_interval is 0, warn
> the user to set it to a sane number

This would be a good idea no matter what else we do.

> 3) Find a middle ground of default send_metadata_interval which does
> not hurt new users in HPC space wanting to use unicast
>
> 2) and 3) are workarounds which could be implemented relatively
> quickly, 1) maybe not so much.

agreed

Brad
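Idea 2) in this message, the startup check, is easy to picture. The following is a minimal sketch of such a check written against a plain dictionary rather than gmond's real configuration parser. The dictionary layout is hypothetical, and the rule that a send channel with a "host" and no "mcast_join" counts as unicast is an assumption based on how gmond.conf send channels are usually written.

```python
def is_unicast_channel(channel):
    """Assumption: unicast channels name a 'host', multicast ones set 'mcast_join'."""
    return "host" in channel and "mcast_join" not in channel


def metadata_interval_warnings(config):
    """Return a list of warnings for a parsed configuration (hypothetical layout)."""
    interval = config.get("send_metadata_interval", 0)
    channels = config.get("udp_send_channels", [])
    if interval == 0 and any(is_unicast_channel(c) for c in channels):
        return [
            "send_metadata_interval is 0 with a unicast send channel; "
            "hosts may appear offline after a collector restart"
        ]
    return []


unicast = {
    "send_metadata_interval": 0,
    "udp_send_channels": [{"host": "collector.example.org", "port": 8649}],
}
multicast = {
    "send_metadata_interval": 0,
    "udp_send_channels": [{"mcast_join": "239.2.11.71", "port": 8649}],
}

print(metadata_interval_warnings(unicast))    # one warning
print(metadata_interval_warnings(multicast))  # no warnings
```

A check like this only warns; it leaves the choice of interval to the administrator, which matches the point that the right value differs per installation.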
Re: [Ganglia-developers] send_metadata_interval
>>> On 1/7/2011 at 9:10 PM, Jesse Becker wrote:
> On Fri, Jan 7, 2011 at 15:25, Bernard Li wrote:
>> Hi all:
>>
>> Since the release of Ganglia 3.1, we have introduced the new configuration option send_metadata_interval in gmond.conf. This is set to 0 by default, and the user must set it to a sane number if using unicast; otherwise, if gmonds are restarted, hosts may appear to be offline (this is documented in the release notes). A bug has already been filed:
>>
>> http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=242
>>
>> We have recently had a lot of users hitting this issue, and Vladimir recommended that we just set a sane number as the default and be done with it, since we end up spending a lot of time on IRC and the mailing list solving the same problem over and over again.
>>
>> Since there have been some commits to the 3.1 branch since tagging 3.1.7, I propose we just copy the 3.1.7 tag, update send_metadata_interval in the configuration file and release that as 3.1.8.
>>
>> This is not the normal procedure for making a release, so I'd like to get some feedback from other developers.
>>
>> BTW I am thinking of setting send_metadata_interval to 30 seconds. Also, does anybody know if this setting affects multicast setups in any way?
>
> I think that it's fine to set this to a non-zero value, but I wonder if 30 seconds is too frequent. I did a quick check of the actual packets that are sent, specifically the metadata packets. I haven't been able to really delve into the code to figure out exactly what's going on (this part of the code isn't terribly transparent to me), but I *think* that they are really large, on the order of several KB when fully assembled, as compared to less than 100-120 bytes for a typical metric packet. I think that size will increase with the number of metrics stored, since each one must be described in full XML each time.
>
> The reason for the large size is that an entire XML description of the metrics appears to be sent each time. Metadata packets also appear to go over TCP, not UDP.
>
> My testing was pretty simple:
> 1) set up a gmond (from SVN, well after 3.1 came out) in unicast mode.
> 2) set 'send_metadata_interval' to 1
> 3) disable all modules, except for 'mod_core'
> 4) remove all collection groups.
> 5) start gmond, and run tcpdump.
>
> On a large cluster, with lots of metrics per host, I can see problems if the metadata packets are sent too frequently. I have hosts that send well over 300 metrics (lots of CPU cores makes for lots of metrics...). Each of these needs to be described in the metadata packets.
>
> So I think that setting a non-zero default is fine. But I think that something like 300 or 600 seconds would be preferable.

The purpose of setting send_metadata_interval to 0 by default was to avoid unnecessary traffic for our default configuration of multicast. Setting the directive to anything other than 0 will cause each gmond to start sending all of its metric metadata on that interval. If you are going to set it by default, IMO 30 seconds is too low. The problem is that people only notice this in the first few minutes after restarting a gmond. They expect metrics to start showing up immediately. After the gmond node finally does send its metadata, rebroadcasting the metadata at any interval is just consuming unnecessary bandwidth on the network, especially in a multicast environment where it isn't needed at all. Also consider that the more gmond nodes you have, the more traffic you are going to put on the network, where 99% of the time the extra traffic is totally unnecessary.

300 or 600 seconds is probably good enough for a default. But no matter what the default is, users still have to understand what that directive is for and how to optimize it. The value of send_metadata_interval will probably be different for every installation when you take into consideration the number of nodes, the number of metrics and any other network-related variables.

Brad

___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
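To put rough numbers on the trade-off discussed in this message, here is a back-of-the-envelope sketch in Python. The cluster size and the per-metric metadata size are assumptions chosen for illustration, not measurements from Ganglia; only the candidate intervals (30, 300 and 600 seconds) and the "well over 300 metrics" per-host figure come from the thread.

```python
def metadata_bytes_per_second(nodes, metrics_per_node, bytes_per_metric, interval_s):
    """Average cluster-wide metadata traffic in bytes per second.

    interval_s == 0 mirrors send_metadata_interval = 0, i.e. metadata is
    not resent periodically, so the periodic cost is zero.
    """
    if interval_s == 0:
        return 0.0
    return nodes * metrics_per_node * bytes_per_metric / interval_s


# Hypothetical cluster: 500 nodes, 300 metrics each, and an assumed
# ~200 bytes of metadata per metric (not a measured value).
for interval in (30, 300, 600):
    rate = metadata_bytes_per_second(500, 300, 200, interval)
    print(f"interval={interval:>3}s -> {rate / 1024:.1f} KiB/s of metadata")
```

The cost scales linearly with the number of nodes and metrics and inversely with the interval, which is why a single default is unlikely to suit every installation.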
Re: [Ganglia-developers] Ganglia Web top-level project + versioning
>>> On 11/4/2010 at 6:21 PM, Bernard Li wrote:
> Hi Brad:
>
> [I've changed the subject line to be more reflective of the current discussions]
>
> On Thu, Nov 4, 2010 at 8:50 AM, Brad Nicholes wrote:
>
>> I'm not sure that we need to physically split the web frontend from the backend as far as the Ganglia project goes. IMO, why not just follow the pattern that we already have in SVN under trunk? Right now we have trunk/monitor-core which includes everything. Could we just create a new directory under trunk called web-frontend and move everything that has to do with the web frontend out of monitor-core and into web-frontend? From that point on, they could both be treated as separate projects with their own release cycles without physically splitting the code into different repositories. Tagging and branches would also work the same way.
>
> That's fine.
>
> How about versioning? Or am I thinking too much? One potential issue is that ganglia-core would be at 4.0 and ganglia-web will be at 3.5 -- this might cause confusion as to what combination is supported, or vice versa.

As far as versioning goes, I think that ganglia-web would just follow its own version scheme. The frontend might have to include some kind of check on the version of the backend to make sure that it is compatible. I'm not sure how flexible the frontend could be, but since all it is doing is consuming XML, I am guessing that it could be fairly flexible when it comes to backward compatibility. I am guessing that the most likely scenario is that a user would upgrade the frontend a lot more frequently than the backend. So there probably wouldn't be much need to worry about an older frontend having to support a newer backend. I think it would be a natural thing for a Ganglia user to automatically upgrade the frontend whenever the backend is upgraded. But they would probably upgrade the frontend routinely without a backend upgrade.

Anyway, yes I think you are thinking too much :-) Documenting compatibility would probably be sufficient. Of course we, as the Ganglia developers, wouldn't be able to test every new release of the frontend with every previous release of the backend. But like I said, since the frontend is just consuming XML, it should be flexible enough to handle backward compatibility. Also, since the XML schema isn't expected to change, at least not drastically, within a major version of the backend, backward compatibility should be simple.

Brad
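The "check on the version of the backend" mentioned above could look roughly like the following sketch. It is written in Python for brevity even though the real frontend is PHP. The compatibility table is entirely hypothetical, and the root element name and its VERSION attribute are my assumption about what the backend XML looks like; verify both against an actual gmond/gmetad dump.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: frontend version -> backend major.minor releases
# it has been documented to work with. Not real project data.
SUPPORTED_BACKENDS = {"3.5": {"3.1", "3.2", "4.0"}}


def backend_version(xml_text):
    """Return the VERSION attribute of the root element, or None if absent."""
    root = ET.fromstring(xml_text)
    return root.get("VERSION")


def is_compatible(frontend_version, xml_text):
    """True if the backend's major.minor is listed for this frontend."""
    version = backend_version(xml_text)
    if version is None:
        return False
    major_minor = ".".join(version.split(".")[:2])
    return major_minor in SUPPORTED_BACKENDS.get(frontend_version, set())


# Simplified stand-in for the start of a backend XML dump (assumed shape).
sample = '<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond"></GANGLIA_XML>'
print(is_compatible("3.5", sample))
```

Comparing only major.minor matches the expectation stated above that the XML schema should not change drastically within a major version of the backend.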
Re: [Ganglia-developers] [Ganglia-general] IRC chat on Ganglia Web Frontend re-write 10/13/2010 (Wed) 9-10am PDT
I'm not sure that we need to physically split the web frontend from the backend as far as the Ganglia project goes. IMO, why not just follow the pattern that we already have in SVN under trunk. Right now we have trunk/monitor-core which includes everything. Could we just create a new directory under trunk called web-frontend and move everything that has to do with the web frontend out of monitor-core and into web-frontend. From that point on, they could both be treated as separate projects with their own release cycles without physically splitting the code into different repositories. Tagging and branches would also work the same way. The only purpose I see for splitting them into two different projects is to try to grow two different communities (ie. developers with rights to the web project who don't necessarily have rights to the monitor-core project and vice-versa). Given the fact that we don't really have a large developer community, I'm not sure that it would be a good idea to split the community that we have. Brad >>> On 11/4/2010 at 1:15 AM, in message , Bernard Li wrote: > Hi all: > > The other day we were talking on IRC regarding how to proceed with > this "re-write" effort for the frontend. In the beginning, I was > gung-ho on this re-write from scratch, however, recently Vladimir has > been hacking away adding new features to the existing code in trunk. > You can get a taste of it here: > > http://ec2-184-72-167-114.compute-1.amazonaws.com/ganglia-new/ > > Which got me to thinking... is a re-write from scratch the best > approach, or should we just try to keep extending what we have? > > Another administrative issue that cropped up, is whether to split out > Ganglia-Web as a separate project such that it doesn't need to follow > the main Ganglia release cycle (since the frontend code is usually > backward/forward compatible with Ganglia releases anyway). 
> > My idea is to create a new project for the frontend, give it a new > name and start with a new version. With that, we can tell users that > after Ganglia X, we will no longer be shipping the web component, use > Y for that. > > Another approach is to retain the Ganglia name, but say that after > Ganglia 3.2, there are 2 separate projects, ganglia and ganglia-web, > in which case ganglia-web will be on a different release cycle than > ganglia. > > Sounds confusing? Yes it is! :) > > I don't really care either way, as long as it causes the least > confusion to the users -- feel free to offer Plan C. > > Another plan I have in mind is after we create branch-3.2 from trunk, > we remove the web component from the code base, in which case all > future bug fixes to ganglia-web goes into that branch only, and we > will move development to GitHub (just for the frontend). > > Thanks! > > Bernard > > On Thu, Oct 21, 2010 at 11:29 PM, Bernard Li wrote: >> Hi all: >> >> Sorry for the delay in posting the log, but I have finally uploaded >> it. Thanks Jesse for logging: >> >> http://therealms.org/oss/ganglia/ganglia_frontend_rewrite_irc_101310.txt >> >> I have left the log as is, I just filtered out people's hostnames and >> stuff. I chopped off at the end when we started discussing outside >> the scope of the frontend re-write. >> >> I will try to summarize the log in the next few days, but if anybody >> else who was there would like to take a stab at it, please feel free. >> >> I think Erik and Vladimir have been hard at work hacking at a Ganglia >> installation on an AWS instance. We will try to schedule another time >> to sync up and discuss further (would a phone teleconference be better >> this time, or should we stick with IRC)? >> >> It doesn't look like the hackathon would happen next month. It might >> become a virtual hackathon but I would really like to put all the >> developers in a room, but anyway, we'll see. 
>> >> Thanks again for all who showed up, and for all the great discussions. >> >> Cheers, >> >> Bernard >> >> On Wed, Oct 13, 2010 at 11:52 AM, Jesse Becker wrote: >>> I have a log that I will try to clean up and post later today. >>> >>> On Wed, Oct 13, 2010 at 14:46, Dave Josephsen wrote: Hey all, Did anyone take minutes? I wasn't able to attend but am interested in > hearing about the chat. Thanks -dave - Original Message - From: "Bernard Li" To: ganglia-developers@lists.sourceforge.net, "Ganglia" > Sent: Thursday, October 7, 2010 1:55:26 PM GMT -06:00 US/Canada Central Subject: [Ganglia-general] IRC chat on Ganglia Web Frontend re-write > 10/13/2010 (Wed) 9-10am PDT Dear all: I've been talking to people on and off about doing a web frontend re-write, in fact I have been thinking about it since almost three years ago when I started the "wishlist" thread: > http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg03070.h > > tml I've mana
Re: [Ganglia-developers] sFlow counters in Ganglia
Apparently DNSSD is not working for me. I don't have access to the DNS server so my guess is the #4 below is not set properly. I turned off DNSSD and instead manually added a collector which pointed to my Ganglia server running the enabled gmond. After restarting hsflowd, I started to see packets showing up on the gmond server. After fixing another gmetad configuration problem, my box running sflow showed up. Very cool. Brad >>> On 10/18/2010 at 12:50 PM, in message <4d8a91b2-20d9-45a2-aea8-1de79d656...@inmon.com>, Neil McKee wrote: > It sounds like you are on the right track, but here is a little hsflowd > troubleshooting checklist... > > On the source (box running hsflowd): > > (1). Any error messages in /var/log/messages? > (2). Check /etc/hsflowd.conf, is the collector set, or is DNSSD=on? > (3). If using DNSSD, is the "search" setting correct in /etc/resolv.conf? > (4). If using DNSSD, are the SRV and TXT records in the zone file on the > DNS server? > (5). You can run hsflowd at the shell prompt with debug logging: "root> > hsflowd -dd" > > On the destination (box running gmond): > > (6). sFlow enabled in gmond.conf? > (7). check firewall settings (e.g. "root> iptables --list") > (8). Watch for an sFlow packet arriving every 20 seconds or so: "root> > /usr/sbin/tcpdump -n udp port 6343" > (Just remember that tcpdump sees packets before they hit the firewall, so > check (7) may still apply) > > On the UI: > > Should look exactly the same as is if gmond were running on the source too. > > Regards, > Neil > > > On Oct 18, 2010, at 10:46 AM, Bernard Li wrote: > >> Hi Brad: >> >> On Mon, Oct 18, 2010 at 9:55 AM, Brad Nicholes wrote: >> >>> I built gmond with the sflow patch and got it up and running. Then I > downloaded hsflowd from the sourceforge project as described in the gmond > doc. hsflowd build and installed as expected and everything seemed to be up > and running. I also added the extra udp_recv_channel block to the gmond.conf > file. 
But now after everything is up and running, I don't see anything > different in Ganglia. The web front end is just showing the same monitored > computers as it did before with the same metrics. If I query gmond through > telnet, I am not seeing any new metrics or spoofed nodes. What am I missing? >>> >>> I also tried to trace the network traffic on one of the machines that is > running hsflowd for anything from port 6343. I'm not seeing anything there > either even though the box says that hsflowd is running. I currently have > hsflowd running on two different boxes. One is a SLED 10 box and the other > is a SLES 10 box. >> >> You will need 2 hosts to test this, one running gmond and the other >> running hsflowd (don't run gmond on this host). >> >> On gmond.conf, add the extra udp_recv_channel as you had done and on >> the hsflowd.conf, follow these instructions: >> >> http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=276#c5 >> >> Then the hsflowd host should show up on the gmond XML stream as if >> it's running gmond. >> >> Cheers, >> >> Bernard
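As a complement to the tcpdump check in step (8) of the checklist, here is a minimal Python sketch that waits for one UDP datagram on the collector and reports who sent it. The function names are hypothetical. It assumes that an sFlow datagram begins with a 32-bit big-endian version field (5 for sFlow version 5). Since it binds the port itself, it is meant to be run with gmond stopped.

```python
import socket
import struct

SFLOW_PORT = 6343  # IANA-registered sFlow port, as noted in the thread


def open_listener(port=SFLOW_PORT, bind_addr="0.0.0.0"):
    """Bind a UDP socket on the given port and return it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    return sock


def read_datagram(sock, timeout=30.0):
    """Wait for one datagram.

    Returns (sender_ip, version), where version is the first 32-bit
    big-endian word of the payload (None if the payload is shorter than
    4 bytes), or None if nothing arrives before the timeout.
    """
    sock.settimeout(timeout)
    try:
        data, (sender, _port) = sock.recvfrom(65535)
    except socket.timeout:
        return None
    if len(data) < 4:
        return (sender, None)
    (version,) = struct.unpack("!I", data[:4])
    return (sender, version)


# Typical use on the gmond host (with gmond stopped):
#   sock = open_listener()
#   print(read_datagram(sock))
```

Unlike tcpdump, this only sees packets that actually reach a socket, so a None result while tcpdump shows traffic would point at the firewall, check (7).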
Re: [Ganglia-developers] sFlow counters in Ganglia
I built gmond with the sflow patch and got it up and running. Then I downloaded hsflowd from the sourceforge project as described in the gmond doc. hsflowd build and installed as expected and everything seemed to be up and running. I also added the extra udp_recv_channel block to the gmond.conf file. But now after everything is up and running, I don't see anything different in Ganglia. The web front end is just showing the same monitored computers as it did before with the same metrics. If I query gmond through telnet, I am not seeing any new metrics or spoofed nodes. What am I missing? I also tried to trace the network traffic on one of the machines that is running hsflowd for anything from port 6343. I'm not seeing anything there either even though the box says that hsflowd is running. I currently have hsflowd running on two different boxes. One is a SLED 10 box and the other is a SLES 10 box. Brad >>> On 10/11/2010 at 4:33 PM, in message <9dd4668f-8a92-484e-9aac-8bece1c2a...@inmon.com>, Neil McKee wrote: > As suggested, I moved the sFlow receiver into a new file "sflow.c" and > eliminated any C99 assumptions. This time there is a "--disable-sflow" > configure option too: > > http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=276 > > In order for sflow.c to feed data directly into the repository, I had to > expose 1 structure definition and 5 functions that were previously private to > gmond.c. Hence the new .h file, "gmond_internal.h". > > Neil > > > On Oct 7, 2010, at 2:56 PM, Peter Phaal wrote: > >> Brad, >> >> Thanks for the feedback. My comments are in-line. >> >> On Oct 7, 2010, at 12:27 PM, Brad Nicholes wrote: >> >>> Sorry to jump into this thread so late but I thought that I would throw my >>> 2 > cents in. >>> >>> I finally got a chance to take a look at the code. I was able to compile >>> it > but ran into some C99 issues with variable declarations. Once I got the code > to compile, I was able to take a closer look at what it was doing. 
From what > I could tell, it looks like the sflow integration is based around reading XDR > packets from an sflow agent and turning them into gmond spoofing metrics. My > first question after seeing this is why does this code have to be built into > gmond.c? Why can't it just do the same thing in a module that would be > plugged into gmond? >>> >>> The reason why I ask this is because we went to a lot of work to pull all >>> of > the metric gathering out of gmond and into modules (including all of the > standard metrics). Some of the main reasons for this is so that metric > gathering could be pluggable without having to affect the gmond code itself. > That way if a bug was ever found and fixed for a specific metrics, we > wouldn't have to re-release all of Ganglia just for one metric fix. Also, > modules give the user the ability to customize each gmond agent to conform to > the specific needs of the node where gmond is running. Regarding sflow, it > seems that in order to integrate the sflow metrics into the Ganglia > monitoring system, only a single gmond node needs to be configured to gather > the sflow metrics. All of the other gmond agents can continue to be > configured and run as they were. Given that, it would make more sense to > integrate sflow as a module that could be loaded under a single gmond agent > rather than replacing all the gmond agents or even upgrading just a single > agent. It would also seem to follow the way that other metric modules and > spoofing modules have been implemented as well. >> >> >> I am not very familiar with gmod modules, but it looks like they are > designed around a polling model and used to retrieve metrics from the server > that the particular instance of gmond is running on: >> 1. a module is loaded in the modules section of the gmond.conf file and > registers a set of metrics it can provide >> 2. 
metrics are then included in collection_group sections and polled at the > specified intervals >> >> With sFlow, the counters are being pushed by remote servers. There may be > hundreds of sFlow agents sending XDR packets to the single gmond instance. > Our code acts as a gateway, translating the metrics from the remote hosts and > presenting them as if they had arrived in the form of Ganglia XDR datagrams > from remote gmond instances. This function needs to be part of the main > datagram processing loop. I don't see a way for a module to inject code into > the packet processing loop(?) >> >> We do of course plan to limit the changes t
Re: [Ganglia-developers] 3.0.x release, make 3.1 maintenance branch, create 3.2 release branch
>>> On 10/8/2010 at 1:31 AM, in message <20101008073146.gc8...@sajinet.com.pe>, Carlo Marcelo Arenas Belon wrote:
> On Wed, Oct 06, 2010 at 03:23:40PM -0700, Bernard Li wrote:
>> Hi all:
>>
>> I'd like to request that we make one last 3.0.x bug-fix release and EOL that branch.
>
> Mostly everyone using EPEL packages in CentOS/RHEL is using 3.0, so EOLing that branch (as was done for 2.5, which until recently was all that Debian provided and is still what Novell distributes) would IMHO be a bad idea.
>
>> This will make way for shifting 3.1 to maintenance branch and creating a 3.2 release branch from trunk.
>
> If all you want is to have a 3.2 release branch (what would be the main feature in it, though?) then just do so; why affect 3.0 or 3.1 by that decision?

I think that Bernard is just stating the obvious and would like to make the obvious official. There haven't been any significant changes to the 3.0.x branch in 2 years, nor has there been a release in that long. The last release from the 3.0.x branch was done by Bernard, and I have a feeling that he isn't really interested in doing any more. Since there really isn't anybody willing to maintain the 3.0.x branch and there are no release-worthy bug fixes or enhancements to it, the branch is effectively EOL anyway.

As far as what would be new in 3.2, the sflow stuff is looking really cool. I would rather see it implemented as a module, but if it isn't and ends up in gmond itself, that may be reason enough to call it 3.2.

Brad
Re: [Ganglia-developers] sFlow counters in Ganglia
>>> On 10/7/2010 at 3:56 PM, in message <6b7a9ac5-465d-4311-ac85-fd42c219d...@inmon.com>, Peter Phaal wrote: > Brad, > > Thanks for the feedback. My comments are in-line. > > On Oct 7, 2010, at 12:27 PM, Brad Nicholes wrote: > >> Sorry to jump into this thread so late but I thought that I would throw my 2 > cents in. >> >> I finally got a chance to take a look at the code. I was able to compile it > but ran into some C99 issues with variable declarations. Once I got the code > to compile, I was able to take a closer look at what it was doing. From what > I could tell, it looks like the sflow integration is based around reading XDR > packets from an sflow agent and turning them into gmond spoofing metrics. My > first question after seeing this is why does this code have to be built into > gmond.c? Why can't it just do the same thing in a module that would be > plugged into gmond? >> >> The reason why I ask this is because we went to a lot of work to pull all of > the metric gathering out of gmond and into modules (including all of the > standard metrics). Some of the main reasons for this is so that metric > gathering could be pluggable without having to affect the gmond code itself. > That way if a bug was ever found and fixed for a specific metrics, we > wouldn't have to re-release all of Ganglia just for one metric fix. Also, > modules give the user the ability to customize each gmond agent to conform to > the specific needs of the node where gmond is running. Regarding sflow, it > seems that in order to integrate the sflow metrics into the Ganglia > monitoring system, only a single gmond node needs to be configured to gather > the sflow metrics. All of the other gmond agents can continue to be > configured and run as they were. Given that, it would make more sense to > integrate sflow as a module that could be loaded under a single gmond agent > rather than replacing all the gmond agents or even upgrading just a single > agent. 
It would also seem to follow the way that other metric modules and > spoofing modules have been implemented as well. > > > I am not very familiar with gmod modules, but it looks like they are > designed around a polling model and used to retrieve metrics from the server > that the particular instance of gmond is running on: > 1. a module is loaded in the modules section of the gmond.conf file and > registers a set of metrics it can provide > 2. metrics are then included in collection_group sections and polled at the > specified intervals > > With sFlow, the counters are being pushed by remote servers. There may be > hundreds of sFlow agents sending XDR packets to the single gmond instance. > Our code acts as a gateway, translating the metrics from the remote hosts and > presenting them as if they had arrived in the form of Ganglia XDR datagrams > from remote gmond instances. This function needs to be part of the main > datagram processing loop. I don't see a way for a module to inject code into > the packet processing loop(?) > The way that I would envision an sflow module working would be similar to the spoofing example module that is currently checked into the Ganglia SVN repository. The spoofing module can be found at http://ganglia.svn.sourceforge.net/viewvc/ganglia/trunk/monitor-core/gmond/python_modules/example/spfexample.py?revision=1895&view=markup . Unfortunately it is a python module example rather than a C module but hopefully you can get the idea of what I am talking about from the code. One application of this kind of spoofing module would be to load it under a gmond instance running on a VM host. It would then query each VM running on the box and register a set of spoofed metrics for each VM. From that point on, the module just reports the metrics for each spoofed VM and returns them as if gmond were running on each of the VMs. I actually have another python module that does exactly that, but I haven't been able to release the source code for it yet. 
You can also look at the modpython.c module to get an idea of how to do the spoofing in C code. But then you guys have already worked with the spoofing code as part of the patch that you already did so you probably already know how that works. Basically an sflow module would be loaded like any other module and a collection interval would be set in the configuration file. In the sflow module itself, register a spoofed metric for each managed sflow monitored node. How you get the list of nodes to register is up to you. It could be from the gmond.conf file, some other configuration file or by listening to the sflow data packets themselves. The module would then start a thread that would read the XDR packets in exactly t
Re: [Ganglia-developers] sFlow counters in Ganglia
>>> On 9/13/2010 at 3:07 PM, in message , "Peter Phaal" wrote: > We have started this project and the pieces seem to be falling into place > nicely. We already have the first metrics showing up in the web interface. > > The changes needed to implement sFlow support are contained entirely within > the gmond.c file and are limited to the process_udp_recv_channel method. > > Adding the following lines to the gmond.conf file enables sFlow support: > /* sFlow channel */ > /* Note: 6343 is the IANA registered port for the sFlow protocol */ > udp_recv_channel { > port = 6343 > } > > Our initial goal is to populate all the standard metrics from libmetrics. > Once we have that working, we will send a patch containing the changes to > gmond.c. > Sorry to jump into this thread so late but I thought that I would throw my 2 cents in. I finally got a chance to take a look at the code. I was able to compile it but ran into some C99 issues with variable declarations. Once I got the code to compile, I was able to take a closer look at what it was doing. From what I could tell, it looks like the sflow integration is based around reading XDR packets from an sflow agent and turning them into gmond spoofing metrics. My first question after seeing this is why does this code have to be built into gmond.c? Why can't it just do the same thing in a module that would be plugged into gmond? The reason why I ask this is because we went to a lot of work to pull all of the metric gathering out of gmond and into modules (including all of the standard metrics). Some of the main reasons for this is so that metric gathering could be pluggable without having to affect the gmond code itself. That way if a bug was ever found and fixed for a specific metrics, we wouldn't have to re-release all of Ganglia just for one metric fix. Also, modules give the user the ability to customize each gmond agent to conform to the specific needs of the node where gmond is running. 
Regarding sflow, it seems that in order to integrate the sflow metrics into the Ganglia monitoring system, only a single gmond node needs to be configured to gather the sflow metrics. All of the other gmond agents can continue to be configured and run as they were. Given that, it would make more sense to integrate sflow as a module that could be loaded under a single gmond agent rather than replacing all the gmond agents or even upgrading just a single agent. It would also seem to follow the way that other metric modules and spoofing modules have been implemented. Implementing the sflow integration as a module would also allow it to change whenever a newer version of sflow is released or whenever the sflow spec or transport changes. A user could simply upgrade his Ganglia sflow module and be up to date with the latest spec without having to wait for the Ganglia project to re-release Ganglia. Anyway, the more I learn about sflow and how it relates to Ganglia, the more this seems like a really cool idea. I am looking forward to seeing this integration done, especially if it is through a pluggable module. Brad
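To illustrate how a module could bridge pushed data into gmond's polling model, here is a minimal, self-contained Python sketch. It is not the sFlow patch and not taken from spfexample.py: the wire format ("host metric value" as ASCII text) is a deliberately simple stand-in for sFlow XDR, and every name in it is hypothetical. A background thread stores the latest pushed sample per host and metric, and poll-style callbacks read from that store.

```python
import socket
import threading

_latest = {}              # (host, metric) -> most recent pushed value
_lock = threading.Lock()  # guards _latest across the two threads


def _receiver(sock):
    """Background thread: record every well-formed pushed sample."""
    while True:
        try:
            data, _addr = sock.recvfrom(65535)
        except OSError:
            return  # socket was closed; stop the thread
        try:
            host, metric, value = data.decode("ascii").split()
            sample = float(value)
        except ValueError:
            continue  # ignore malformed datagrams
        with _lock:
            _latest[(host, metric)] = sample


def start_receiver(port, bind_addr="127.0.0.1"):
    """Bind a UDP socket and start the receiver thread; returns the socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    threading.Thread(target=_receiver, args=(sock,), daemon=True).start()
    return sock


def make_callback(host, metric):
    """Build a poll-style callback returning the last pushed value (0.0 if none)."""
    def call_back(name):
        with _lock:
            return _latest.get((host, metric), 0.0)
    return call_back
```

In a real gmond Python module, callbacks like these would be referenced from the metric descriptors returned by `metric_init`, one per spoofed host and metric. How the spoofed host is declared in the descriptor is left out here, since I have not verified the exact keys against spfexample.py.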
Re: [Ganglia-developers] 3.0.x release, make 3.1 maintenance branch, create 3.2 release branch
>>> On 10/6/2010 at 4:23 PM, Bernard Li wrote:
> Hi all:
>
> I'd like to request that we make one last 3.0.x bug-fix release and EOL that branch. This will make way for shifting 3.1 to maintenance branch and creating a 3.2 release branch from trunk.
>
> Thoughts?

+1. I don't know how many people are still on 3.0.x, but most of the activity, at least on the mailing lists, seems to be on 3.1.

Brad
Re: [Ganglia-developers] Replaced TemplatePower with Dwoo in trunk web code
>>> On 9/9/2010 at 09:19 AM, in message <716533.84440...@web113303.mail.gq1.yahoo.com>, Martin Knoblauch wrote:
> Hi,
>
> my € 0.02 are:
>
> +1 for Trunk
> -1 for 3.1.x and 3.0.x.
>
> Why am I opposed to the backports? Dwoo introduces considerable new infrastructure that *I* view as not suitable for the "stable" and "legacy" trees. Both of them are bug-fix only in my opinion. It is fine for trunk, no question.
>
> Does the GPL licensing cause any real issues to end users? Just curious. The situation is pretty old by now anyway.

The GPL could cause problems for end users if they decide to modify or customize the frontend and then redistribute it. In that case they would be required to release the source code. That may not be an issue since all of the code is PHP script code anyway. But the bigger issue is that there is a license conflict, which can bring heartache to some customers. Having the option in trunk, whether or not we backport it to the 3.1 or 3.0 branches, gives end users an alternative if they feel nervous about TemplatePower. At the very least, we should probably backport the Dwoo code to 3.1 and 3.0 and make it available, even if we don't put the backport in the main repository branch.

Brad
Re: [Ganglia-developers] Replaced TemplatePower with Dwoo in trunk web code
>>> On 9/9/2010 at 7:30 AM, in message <20100909133028.gb31...@sajinet.com.pe>, Carlo Marcelo Arenas Belon wrote: > On Wed, Sep 08, 2010 at 05:00:28PM -0700, Bernard Li wrote: >> >> Just a quick note saying that I have replaced TemplatePower with Dwoo >> as our PHP templating engine in the trunk web code: > > why would we want to do that and throw away useful and time > tested code? why would we do this in trunk destabilizing the > development branch instead of in an independent branch which > could be tested and validated before it gets merged into trunk > if proven to at least be as usefull as the old code?, what is > the scope of the work that is required on the templates and > the rest of the PHP code to make this transition? > >> Please test and report any issues (especially security related). I'd >> like to get this backported to 3.1.x and 3.0.x branches soon. > > -1 in both accounts, they are both maintenance branches and shouldn't > have any major rearchitecture done in them. > >> Dwoo is modified/new BSD-licensed, which is the same as Ganglia. > > and this doesn't make a difference at all AFAIK with the fact that > templatepower is GPL and some of the PHP code LGPL (for a discusion > on that read old threads on this issue) specially considering that > the frontend code is not "linked" with anything else. > Bernard is talking about the Ganglia license as a whole which includes the front end code. The fact that templatepower is GPL has a direct affect on the front end code and therefore affects Ganglia as a project. It is true that the front end code does not link with any of the backend (ie. gmond, gmetad), but it does link with all of the PHP code. Therefore removing templatepower not only from trunk but from the 3.1 and 3.0 branches as well, relieves our end users from having to worry about licensing and any modifications that they make to their customized PHP code. Basically this move just brings Ganglia more inline with regards to licensing. 

I don't see any harm in replacing templatepower in trunk and then after a sufficient amount of testing, backporting this change to the 3.1 and 3.0 branches. That is exactly the purpose of trunk and complies with the guidelines that we have established on the wiki.

BTW, the 3.1.x branch is not a maintenance branch. It is currently our release branch. Any new releases of Ganglia will be produced from the 3.1.x branch. In addition, trunk is our development branch which should allow for new contributions at any time. Therefore being able to move forward with a change like incorporating Dwoo rather than TemplatePower in trunk, for whatever reason, is the appropriate thing to do.

Brad
Re: [Ganglia-developers] [Ganglia-general] Spoofing functionality in 3.1.x branch...
I don't think that I intended for the spoofing example module to be included in the distribution tarball. If I remember right, I think I checked it in just so that the knowledge on how to do it didn't get lost. If you want to include it, you can. But I don't think that it is really necessary and it certainly shouldn't be enabled since it really does nothing at all.

Brad

>>> On 6/24/2010 at 4:37 PM, in message , Bernard Li wrote:
> Hi Brad:
>
> I am doing some cleanup in the trunk repo and found that the
> spfexample (python module) was not included in the distribution
> tarball and/or backported to the 3.1.x branch. Should it be? I'm not
> saying we should activate it by default, but should just include it
> much like the other example python module.
>
> Thanks,
>
> Bernard
>
> On Thu, Dec 4, 2008 at 9:14 AM, Brad Nicholes wrote:
>>
>> For those that are interested in the module based spoofing feature, all of
>> the functionality should be complete and has been backported to the 3.1.x
>> branch. I have also added some spoofing module examples to trunk that can be
>> downloaded from monitor-core/gmond/python_modules/example/spfexample.py in
>> the trunk repository. There is also a small .pyconf file in
>> monitor-core/gmond/python_modules/conf.d/spfexample.pyconf. This example
>> module should give you enough guidance so that you can build your own
>> spoofing module. Please let me know if anything is missing.
>>
>> Brad
Re: [Ganglia-developers] RFE suggestions for Ganglia 3.1.7
>>> On 6/1/2010 at 10:56 AM, in message , Jesse Becker wrote:
> On Tue, Jun 1, 2010 at 11:52, Art Peck wrote:
>>
>> I am very impressed with Ganglia 3.1.7. I would like to create an addon
>> package to facilitate monitoring of the Oracle Sun Ray Server Software and
>> associated desktop devices.
>>
>> I've already created one Python module and integrated it into gmond and
>> gmetad. For the most part, it is working as I wanted. However, I would really
>> like to be able to manipulate the formatting of the resulting graph(s). For
>> example, I would greatly prefer a line graph to an area and I really need to
>> STACK several metrics on a graph. So I have the following RFEs:
>
> Writing custom graphing modules has been more easily supported for
> several releases now. If you have need to create customized charts
> (like many of us do), there is a well documented framework for doing
> so. Take a look at the various *_report.php files in the web/graph.d
> directory of your ganglia installation. I've specifically written a
> storage report script that uses stacked graphs.
>
>> (1) Extend the descriptor dictionary to include key=value pairs that get
>> passed to gmetad and the web frontend allowing for specification of more of
>> the rrdgraph formatting options. Maybe something like 'graph_type' = 'Line',
>> 'line_color' = 'Red', 'background_color' = 'Lt Blue', etc
>
> I believe that you could "fake" this already using string metrics.
>

This is true. You can actually add anything that you want to the metric definition in your module. If gmond does not recognize the extra elements in the definition, it will simply add the key/value pair to the XML that is produced by gmond. Both gmond and gmetad will ignore extra data and just continue to pass it through in the XML. At that point you could write a specialized graph module in the web front end that does understand the extra data tags in the XML and reacts to them appropriately.

Brad
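
For anyone who wants to try the approach described above, here is a minimal sketch of a gmond Python module whose descriptor carries extra key/value pairs. It assumes the usual module entry points (metric_init, metric_cleanup) and the standard descriptor keys; the metric name, the callback and the extra keys graph_type and line_color are hypothetical, taken from the suggestion in the quoted RFE, and nothing in the stock web front end is known to interpret them.

```python
# Sketch of a gmond Python metric module with extra descriptor keys.
# The metric name, callback value and the extra keys are hypothetical.


def get_sessions(name):
    """Callback invoked by gmond; returns a placeholder value here."""
    return 0


def metric_init(params):
    """Return the list of metric descriptors for this module."""
    descriptors = [{
        # Standard descriptor keys understood by gmond.
        'name': 'sunray_sessions',
        'call_back': get_sessions,
        'time_max': 90,
        'value_type': 'uint',
        'units': 'sessions',
        'slope': 'both',
        'format': '%u',
        'description': 'Active sessions (illustrative)',
        'groups': 'sunray',
        # Extra keys (assumption): not interpreted by gmond, expected to be
        # passed through in the XML for a custom graph script to read.
        'graph_type': 'line',
        'line_color': 'red',
    }]
    return descriptors


def metric_cleanup():
    """Called by gmond on shutdown; nothing to release in this sketch."""
    pass
```

A custom *_report.php script in web/graph.d would then have to look for those extra values itself; how it does so is outside the scope of this sketch.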
Re: [Ganglia-developers] [Ganglia-general] ganglia using ports < 1024?
>>> On 3/24/2010 at 07:22 PM, in message <57153a00-be68-4cee-b867-4cea86925...@crackpot.org>, Alex Dean wrote:
> On Mar 24, 2010, at 3:40 PM, Brad Nicholes wrote:
>
>> [Moved to the dev list]
>>
>>>>> On 3/22/2010 at 7:49 AM, in message , Winnie Lacesso wrote:
>>
>>> Dear All,
>>>
>>> I'm new to ganglia, but it looks wonderful.
>>> Platform is Scientific Linux 4 & 5.
>>> Due to no firewall > 1024 it might be more prudent for ganglia to use
>>> ports < 1024 - I'm nervous to have anything php-related listening on
>>> unfirewalled network. Am trying to configure ganglia using ports 849,
>>> 851 for xml & 852 for interactive.
>>> Since gmetad & gmond run as user ganglia, I think this is the
>>> problem - it can't create ports < 1024?
>>> Errors are
>>> tcp_listen() on xml_port failed: Permission denied
>>>
>>> I can't seem to find any example where ganglia is used with ports <
>>> 1024. Impossible?
>>> Pointers gratefully dereferenced
>>>
>>
>> The problem here is that the socket creation is happening after
>> gmond has dropped root. Therefore unless the runas user has
>> sufficient rights to create a port <1024 you will run into problems
>> trying to use ports <1024.
>>
>> I know there was a lot of discussion several months ago about when
>> to daemonize and when to setuid. Can somebody who is more familiar
>> with the discussion respond with what the outcome was? Carlo,
>> Daniel maybe??
>>
>> Can we just move the calls to setup_listen_channels_pollset and
>> Ganglia_udp_send_channels_create before the call to
>> setid_if_necessary? It would still be happening after the call to
>> daemonize_if_necessary.
>
> There's nothing written in PHP listening anywhere is there? No daemon
> processes written in PHP. PHP is run as an Apache module, so it will
> be on whatever port Apache is listening on and (presumably) will be
> running as an unprivileged user.

No, this is an issue with gmond only.

Brad
Re: [Ganglia-developers] [Ganglia-general] ganglia using ports < 1024?
[Moved to the dev list]

>>> On 3/22/2010 at 7:49 AM, in message , Winnie Lacesso wrote:
> Dear All,
>
> I'm new to ganglia, but it looks wonderful.
> Platform is Scientific Linux 4 & 5.
> Due to no firewall > 1024 it might be more prudent for ganglia to use
> ports < 1024 - I'm nervous to have anything php-related listening on
> unfirewalled network. Am trying to configure ganglia using ports 849,
> 851 for xml & 852 for interactive.
> Since gmetad & gmond run as user ganglia, I think this is the problem - it
> can't create ports < 1024?
> Errors are
> tcp_listen() on xml_port failed: Permission denied
>
> I can't seem to find any example where ganglia is used with ports < 1024.
> Impossible?
> Pointers gratefully dereferenced
>

The problem here is that the socket creation is happening after gmond has dropped root. Therefore unless the runas user has sufficient rights to create a port <1024 you will run into problems trying to use ports <1024.

I know there was a lot of discussion several months ago about when to daemonize and when to setuid. Can somebody who is more familiar with the discussion respond with what the outcome was? Carlo, Daniel maybe??

Can we just move the calls to setup_listen_channels_pollset and Ganglia_udp_send_channels_create before the call to setid_if_necessary? It would still be happening after the call to daemonize_if_necessary.

Brad
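
The ordering proposed above (create the listening sockets first, switch user afterwards) can be sketched as follows. This is an illustration in Python only, not gmond code: gmond is written in C, and the function names open_listeners, drop_privileges and start are made up for this example. It assumes a Unix-like system, where binding a port below 1024 normally requires root.

```python
import os
import socket


def open_listeners(ports, host="127.0.0.1"):
    """Bind and listen on each TCP port; ports < 1024 normally need root."""
    listeners = []
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(5)
        listeners.append(sock)
    return listeners


def drop_privileges(uid, gid):
    """Switch to an unprivileged user; does nothing unless running as root."""
    if os.getuid() != 0:
        return False
    os.setgroups([])   # drop supplementary groups first
    os.setgid(gid)     # group before user, while we still have the right
    os.setuid(uid)
    return True


def start(ports, uid, gid):
    """Open sockets while still privileged, then drop privileges."""
    listeners = open_listeners(ports)
    dropped = drop_privileges(uid, gid)
    return listeners, dropped
```

Sockets that were bound before the user switch stay open and usable afterwards, which is the whole point of the reordering: only the bind step needs the extra rights.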
Re: [Ganglia-developers] gmond fails with "apr_pollset_create failed: Invalid argument" when no udp_recv_channels or tcp_accept_channels are defined
>>> On 3/19/2010 at 4:03 PM, in message , Bernard Li wrote:
> Dear all:
>
> Looks like we have a bug in setup_listen_channels_pollset() in gmond.c.
>
> If your gmond.conf has no udp_recv_channel or tcp_accept_channel
> defined, gmond will fail to run with the error message:
>
> apr_pollset_create failed: Invalid argument
>
> The error checking for apr_pollset_create() was recently implemented
> since r2041.
>
> The issue seems to be that on certain platforms, apr_pollset_create()
> will fail if "total_listen_channels" = 0 (this is the "size" argument
> according to the apr_pollset_create definition).
>
> Previously, since there was no error checking, the code would continue
> merrily without erroring out. listen_channels will still be NULL and
> thus would set deaf = 1 in main(). Now since we have error checking,
> it actually bombs out.
>
> One fix is basically to check whether total_listen_channels is 0 prior
> to the apr_pollset_create() call and if so just return. This should
> give the same behaviour as before.
>
> So far I have been able to reproduce it on CentOS 5 x86_64. However,
> there have been conflicting reports regarding whether this fails on
> Ubuntu 9.04 or not. So if you guys could test this out and report
> back what platforms you encounter this bug, that would be great.
>
> To reproduce the bug, simply comment out the udp_recv_channel and
> tcp_accept_channel clauses and run gmond. It should fail with the
> error message mentioned.
>

Fails on SLED-10 [glibc-2.4]
Works on OpenSuse 11.2 [glibc-2.10]

+1 on the fix that you suggested.

Brad
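
The control flow of the suggested fix (return early when there is nothing to listen on, instead of asking for a zero-sized poll set) looks roughly like the sketch below. It is a Python model of the idea, not the gmond.c patch: the real code is C and calls apr_pollset_create(), whereas this uses Python's selectors module, and the names setup_listen_channels and is_deaf are invented for the example.

```python
import selectors
import socket


def setup_listen_channels(channels):
    """Return a selector watching `channels`, or None if there are none."""
    if not channels:
        # Nothing configured: skip creating the poll set entirely, so the
        # caller falls back to "deaf" mode as it did before error checking.
        return None
    sel = selectors.DefaultSelector()
    for chan in channels:
        sel.register(chan, selectors.EVENT_READ)
    return sel


def is_deaf(channels):
    """Model of main(): deaf when no listen channels could be set up."""
    return setup_listen_channels(channels) is None


if __name__ == "__main__":
    a, b = socket.socketpair()
    print(is_deaf([]))    # no channels configured
    print(is_deaf([a]))   # one channel configured
    a.close()
    b.close()
```

With the guard in place the empty-configuration case never reaches the call that fails on some platforms, which should restore the behaviour from before r2041.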
Re: [Ganglia-developers] some changes
That's not an uncommon situation to be in. I haven't been able to contribute as much to the project as I did in the past for many of the same reasons. Just don't wander off too far. It's always nice to have a good developer around.

Brad

>>> On 3/8/2010 at 2:10 PM, in message <4b9567bb.1030...@pocock.com.au>, Daniel Pocock wrote:
>
> Hi everyone,
>
> As some of you are aware, I have been employed to work full time on a
> project involving the customization, deployment and reporting from
> Ganglia in a large enterprise. This has allowed me to dedicate some of
> my time to collaborating with the open source community to (hopefully)
> improve what was already quite a neat product before I was first
> introduced to it.
>
> In the near future I am making a transition back to the world of IT
> consulting and contracting. Although I do see myself using and
> contributing to Ganglia in the future, it may not be on the same scale
> as what I have done over the last couple of years.
>
> I think it's quite important that I make people aware of this because of
> my role as the release manager for the most current release, 3.1.7. I
> am very aware that some of the changes I have introduced are a little
> controversial and some people may have preferred alternative solutions
> (or maintained the status quo). I am also aware that some of these
> changes will lead to some additional requests for clarification and
> support on the user's email list.
>
> One final comment: I just want to thank all those who have contributed
> to a project that is being used very successfully on an enormously large
> scale. When participating in this project through the mailing list and
> IRC, I can't help noticing that there is an observable difference
> between the quality of the support and interaction here as compared to
> support from some large, well paid commercial vendors that I have dealt
> with over the years.
>
> Regards,
>
> Daniel
Re: [Ganglia-developers] Problems building trunk r2290
Makes sense to me

Brad

>>> On 3/3/2010 at 1:54 PM, in message , Bernard Li wrote:
> Hi Brad:
>
> Thanks for the confirmation.
>
> However, I have another issue related to the first problem. Basically
> my x86_64 CentOS is detected as "x86_64-unknown-linux-gnu" and thus it
> is setting LIB_SUFFIX to "lib" instead of "lib64".
>
> Do RHEL hosts really identify themselves as x86_64-redhat-linux*? How
> about Fedora? I do confirm that this works as expected on an openSUSE
> box.
>
> Perhaps we should reverse the logic and make a special case for Debian
> instead?
>
> Cheers,
>
> Bernard
>
> On Wed, Mar 3, 2010 at 11:28 AM, Brad Nicholes wrote:
>> I am seeing the same thing. To get past it I just hardcoded the path to the
>> sed utility. I'm guessing that either some platforms or some version of
>> libtool isn't setting the SED environment variable but the configure script
>> assumes that it is.
>>
>> Brad
>>
>>>>> On 3/2/2010 at 6:21 PM, in message , Bernard Li wrote:
>>> Hi all:
>>>
>>> I am having problems building trunk r2290.
>>>
>>> Specifically I have 2 issues:
>>>
>>> 1) During ./configure
>>>
>>> ./configure: line 20056: syntax error near unexpected token `)'
>>> ./configure: line 20056: `x86_64-suse-linux*)'
>>>
>>> 2) During make -C web conf.php
>>>
>>> make: Entering directory `/root/code/ganglia.trunk/web'
>>> ../scripts/fixconfig conf.php.in
>>> ../scripts/fixconfig: line 60: @SED@: command not found
>>> ../scripts/fixconfig: line 67: : No such file or directory
>>> make: *** [conf.php] Error 1
>>> make: Leaving directory `/root/code/ganglia.trunk/web'
>>>
>>> The first issue could be fixed by the following patch:
>>>
>>> Index: configure.in
>>> ===
>>> --- configure.in (revision 2290)
>>> +++ configure.in (working copy)
>>> @@ -341,8 +341,7 @@
>>>  # (insert others here)
>>>  LIB_SUFFIX=lib
>>>  case $host in
>>> -x86_64-redhat-linux*)
>>> -x86_64-suse-linux*)
>>> +x86_64-redhat-linux* | x86_64-suse-linux*)
>>>    LIB_SUFFIX=lib64
>>>    ;;
>>>  esac
>>>
>>> The second issue... does it have something to do with the old
>>> autotools that I'm using?
>>>
>>> Thanks,
>>>
>>> Bernard
Re: [Ganglia-developers] Problems building trunk r2290
I am seeing the same thing. To get past it I just hardcoded the path to the sed utility. I'm guessing that either some platforms or some version of libtool isn't setting the SED environment variable but the configure script assumes that it is.

Brad

>>> On 3/2/2010 at 6:21 PM, in message , Bernard Li wrote:
> Hi all:
>
> I am having problems building trunk r2290.
>
> Specifically I have 2 issues:
>
> 1) During ./configure
>
> ./configure: line 20056: syntax error near unexpected token `)'
> ./configure: line 20056: `x86_64-suse-linux*)'
>
> 2) During make -C web conf.php
>
> make: Entering directory `/root/code/ganglia.trunk/web'
> ../scripts/fixconfig conf.php.in
> ../scripts/fixconfig: line 60: @SED@: command not found
> ../scripts/fixconfig: line 67: : No such file or directory
> make: *** [conf.php] Error 1
> make: Leaving directory `/root/code/ganglia.trunk/web'
>
> The first issue could be fixed by the following patch:
>
> Index: configure.in
> ===
> --- configure.in (revision 2290)
> +++ configure.in (working copy)
> @@ -341,8 +341,7 @@
>  # (insert others here)
>  LIB_SUFFIX=lib
>  case $host in
> -x86_64-redhat-linux*)
> -x86_64-suse-linux*)
> +x86_64-redhat-linux* | x86_64-suse-linux*)
>    LIB_SUFFIX=lib64
>    ;;
>  esac
>
> The second issue... does it have something to do with the old
> autotools that I'm using?
>
> Thanks,
>
> Bernard
Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.7 ready for testing
>>> On 3/2/2010 at 4:23 AM, in message <4b8cf534.7090...@pocock.com.au>, Daniel Pocock wrote:
> Thanks to those who provided feedback - any objections to making 3.1.7
> generally available? I would like to make it GA within the next 1-2
> days now.
>

+1

> Michael Perzl wrote:
>> I have successfully compiled and tested 3.1.7 on
>> - AIX 5.1 ML04
>> - AIX 5.3 ML00
>> - AIX 5.3 TL07
>> - AIX 6.1 TL03
>>
>> Regards,
>> Michael
>>
>> On 02/22/2010 12:15 PM, Daniel Pocock wrote:
>>
>>> Just a reminder - any feedback is welcome, or feel free to discuss 3.1.7
>>> on IRC
>>>
>>> It would be good to have positive confirmation of which platforms this
>>> has been tested on, so far, I have tested
>>> - Debian lenny,
>>> - RHEL3/4/5,
>>> - CentOS 5,
>>> - Solaris 8 and
>>> - Cygwin.
>>>
>>> and Brad has done some testing on SLES10
>>>
>>> Regards,
>>>
>>> Daniel
>>>
>>> Daniel Pocock wrote:
>>>
>>>> I've tagged 3.1.7 and built a tarball:
>>>>
>>>> http://ganglia.info/testing/ganglia-3.1.7.tar.gz
>>>>
>>>> The md5sum for 3.1.7 is: 6aa5e2109c2cc8007a6def0799cf1b4c
>>>>
>>>> Since 3.1.6, only two things have changed and may need to be tested
>>>> again by those who tested 3.1.6:
>>>> - the build system (support for commas in CFLAGS)
>>>> - the multicpu module - percentages reported differently
>>>>
>>>> This is not confirmation that the release is in GA status - a further
>>>> notification will be sent when the testing period has elapsed without
>>>> any serious defect. Users are invited to test the tarball and submit
>>>> feedback.
>>>>
>>>> Please do not commit on branches/monitor-core-3.1 until after 3.1.7
>>>> goes GA, in case further tweaks are needed to facilitate a successful
>>>> release.
>>>>
>>>> Below are the release notes from the STATUS file. Other documentation
>>>> has also changed since 3.1.2 and should be reviewed:
>>>>
>>>> GANGLIA 3.1 STATUS: -*-text-*-
>>>> Last modified at [$Date: 2010-02-17 11:01:08 + (Wed, 17 Feb 2010) $]
>>>>
>>>> The current version of this file can be found at:
>>>>
>>>> * http://ganglia.svn.sourceforge.net/svnroot/ganglia/branches/monitor-core-3.1/STATUS
>>>>
>>>> Release history:
>>>>
>>>> 3.1.7 : Tagged: Feb 17, 2010
>>>> 3.1.6 : Tagged: Feb 4, 2010 (not released for GA)
>>>> 3.1.5(hargrave) : Tagged: Nov 24, 2009 (not released for GA)
>>>> 3.1.4(hargrave) : Tagged: Oct 26, 2009 (not released for GA)
>>>> 3.1.3(avenger): Tagged: Sep 19, 2009 (not released for GA)
>>>> 3.1.2(langley): Released: Feb 17, 2009
>>>> 3.1.1(wien) : Released: Sep 10, 2008
>>>> 3.1.0(amelia) : Released: Jul 30, 2008
>>>>
>>>> Contributors looking for a mission:
>>>>
>>>> * Just do an egrep on "TODO", "XXX" or "FIXME" in the source.
>>>> * Review the bug database at: http://bugzilla.ganglia.info/
>>>> * Open bugs in the bug database.
>>>> * Implement a feature from the wishlist at:
>>>>   http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_wish-list
>>>>
>>>> CURRENT RELEASE NOTES:
>>>> (Please update this area with a brief description of bug fixes and
>>>> enhancements that have been backported for the current release)
>>>>
>>>> Note: 3.1.3, 3.1.4, 3.1.5 and 3.1.6 never became GA, therefore,
>>>> the release notes for all of them are combined below.
>>>>
>>>> 3.1.7:
>>>>
>>>> * Fix build support for RHEL5/issue with commas in CFLAGS
>>>> * multicpu module: show CPU utilization as a value between 0-100% for
>>>>   each core
>>>>
>>>> 3.1.6:
>>>>
>>>> * Merge commit 1966 from trunk to fix "contrib/removespikes.pl"
>>>> * Bootstrapping with Debian 5.0 (lenny) versions of autotools for
>>>>   this and future releases.
>>>>   http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg05352.html
>>>>   http://www.mail-archive.com/ganglia-gene...@lists.sourceforge.net/msg04688.html
>>>> * Require user to explicitly specify sysconfdir when building from
>>>>   source, due to the fact that the old behavior was not consistent with
>>>>   the documented behavior.
>>>> * Configuration files and scripts are now created during the install
>>>>   phase rather than during configure. This allows values such as
>>>>   @sysconfdir@ to be used in the template configuration files.
>>>> * Abolish the use of release names - only release numbers will be used
>>>>   to distinguish versions in future
>>>> * libmetrics: workaround system header conflict in DFBSD >= 2.4 (BUG245)
>>>> * Use PCRE regex matching to configure metrics using the name_match
>>>>   directive
>>>> * rrdcached support
>>>> * gmetad now uses apr and the sleep intervals between polls are
>>>>   randomized in a way that supports shorter polling intervals
>>>> * FreeBSD support: fixes for crashes and disk statistics (BUG153)
Re: [Ganglia-developers] Ganglia 3.1.7 ready for testing
Gmond, so far so good on SUSE Linux 10. Built, installed, gathering metrics without a problem :) Still need to install Gmetad on my server.

Brad

>>> On 2/17/2010 at 4:31 AM, in message <4b7bd38c.3010...@pocock.com.au>, Daniel Pocock wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> I've tagged 3.1.7 and built a tarball:
>
> http://ganglia.info/testing/ganglia-3.1.7.tar.gz
>
> The md5sum for 3.1.7 is: 6aa5e2109c2cc8007a6def0799cf1b4c
>
> Since 3.1.6, only two things have changed and may need to be tested
> again by those who tested 3.1.6:
> - the build system (support for commas in CFLAGS)
> - the multicpu module - percentages reported differently
>
> This is not confirmation that the release is in GA status - a further
> notification will be sent when the testing period has elapsed without
> any serious defect. Users are invited to test the tarball and submit
> feedback.
>
> Please do not commit on branches/monitor-core-3.1 until after 3.1.7
> goes GA, in case further tweaks are needed to facilitate a successful
> release.
>
> Below are the release notes from the STATUS file. Other documentation
> has also changed since 3.1.2 and should be reviewed:
>
> GANGLIA 3.1 STATUS: -*-text-*-
> Last modified at [$Date: 2010-02-17 11:01:08 + (Wed, 17 Feb 2010) $]
>
> The current version of this file can be found at:
>
> * http://ganglia.svn.sourceforge.net/svnroot/ganglia/branches/monitor-core-3.1/STATUS
>
> Release history:
>
> 3.1.7 : Tagged: Feb 17, 2010
> 3.1.6 : Tagged: Feb 4, 2010 (not released for GA)
> 3.1.5(hargrave) : Tagged: Nov 24, 2009 (not released for GA)
> 3.1.4(hargrave) : Tagged: Oct 26, 2009 (not released for GA)
> 3.1.3(avenger): Tagged: Sep 19, 2009 (not released for GA)
> 3.1.2(langley): Released: Feb 17, 2009
> 3.1.1(wien) : Released: Sep 10, 2008
> 3.1.0(amelia) : Released: Jul 30, 2008
>
> Contributors looking for a mission:
>
> * Just do an egrep on "TODO", "XXX" or "FIXME" in the source.
> * Review the bug database at: http://bugzilla.ganglia.info/
> * Open bugs in the bug database.
> * Implement a feature from the wishlist at:
>   http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_wish-list
>
> CURRENT RELEASE NOTES:
> (Please update this area with a brief description of bug fixes and
> enhancements that have been backported for the current release)
>
> Note: 3.1.3, 3.1.4, 3.1.5 and 3.1.6 never became GA, therefore,
> the release notes for all of them are combined below.
>
> 3.1.7:
>
> * Fix build support for RHEL5/issue with commas in CFLAGS
> * multicpu module: show CPU utilization as a value between 0-100% for
>   each core
>
> 3.1.6:
>
> * Merge commit 1966 from trunk to fix "contrib/removespikes.pl"
> * Bootstrapping with Debian 5.0 (lenny) versions of autotools for
>   this and future releases.
>   http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg05352.html
>   http://www.mail-archive.com/ganglia-gene...@lists.sourceforge.net/msg04688.html
> * Require user to explicitly specify sysconfdir when building from
>   source, due to the fact that the old behavior was not consistent with
>   the documented behavior.
> * Configuration files and scripts are now created during the install
>   phase rather than during configure. This allows values such as
>   @sysconfdir@ to be used in the template configuration files.
> * Abolish the use of release names - only release numbers will be used
>   to distinguish versions in future
> * libmetrics: workaround system header conflict in DFBSD >= 2.4 (BUG245)
> * Use PCRE regex matching to configure metrics using the name_match
>   directive
> * rrdcached support
> * gmetad now uses apr and the sleep intervals between polls are
>   randomized in a way that supports shorter polling intervals
> * FreeBSD support: fixes for crashes and disk statistics (BUG153)
> * Further tweaks to Solaris build support (remove C99 hack)
> * Eliminate conflict with ncpus symbol name on older Solaris
> * AIX support: determine if the host is a virtual server (BUG226)
> * AIX support: setting linker flags (BUG227), add -lm
> * AIX support: tweaks for AIX >= v6.1
> * AIX support: revised init scripts for gmond and gmetad
> * Check for Python.h explicitly
> * Include the necessary Python files in the distribution tarball,
>   regardless of how BUILD_PYTHON is set (r2215).
> * Remove references to GNU toolchain in documentation
> * Fortify write_data_to_rrd against overflows
> * Web interface: minor formatting changes
> * mcast_if implementation tweaked so that the send channel will be bound
>   to the IP of the outgoing interface
> * Documentation updates relating to the options for multihomed hosts,
>   particularly bind, bind_hostname and mcast_if
Re: [Ganglia-developers] multicpu module: r2116 and other issues
I'm not sure that this is the kind of response that you are looking for on this issue, but I would tend to agree with your alternative options below. I don't think that it is mandatory for every module to work on every platform. One of the main purposes for modules is to make it easier for metrics to be added or removed from the overall gathering system. With Ganglia 3.0.x and below, this was not an option so every built-in metric was required to work on every supported platform. With the modular concept, I don't believe that requiring every module to also work on every platform is necessary any more. What it does mean is that every module will work on every platform that has a developer interested in porting the module to a new platform. If there is nobody willing to step up to the plate or no real need for a particular metric on a certain platform, then why require the effort out of the few developers that we have? Where we know that some modules don't work on some platforms, it should be a simple matter of documenting that and also noting that we could use the help to extend the module.

Brad

>>> On 2/16/2010 at 8:03 AM, in message <4b7ab3ca.4090...@pocock.com.au>, Daniel Pocock wrote:
> Some further multicpu comments, I've been looking at this discussion
> about `Irix' mode:
>
> http://www.mail-archive.com/ganglia-gene...@lists.sourceforge.net/msg04567.html
>
> and I feel that there may be some confusion about Irix mode and Solaris
> mode in top.
>
> The top man page says that Irix mode and Solaris mode should only impact
> the display of per-task CPU stats. It doesn't appear to say that these
> modes should impact the per-core stats.
>
> I also notice Carlo's patch on trunk (r2116) appears to be an attempt to
> address this issue, although Carlo has mentioned more work is required.
>
> Can anyone else make any comment on this specific issue, what else they
> expect from multicpu, or what flaws are outstanding?
>
>
> Daniel Pocock wrote:
>> I've been contemplating the multicpu module, which currently only works
>> on Linux and Cygwin.
>>
>> Carlo has indicated that promoting its use (as a consequence of the
>> PCRE patch) may not be ideal for two reasons:
>>
>> a) bugs on the supported platforms (Linux and Cygwin)
>>
>> b) not functional on other platforms (e.g. Solaris) where it gives no
>> meaningful error if a user tries to load it
>>
>> For the Solaris platform, I was considering the idea of a generic kstat
>> module. It would generate thousands of metric names (gmond -m output),
>> but CPU metrics could then be selectively enabled with the PCRE
>> support. So a dedicated multicpu module for Solaris may not be needed.
>>
>> I don't think it is necessary for every module to run on every platform
>> - maybe this one just shouldn't be compiled at all except on Linux and
>> Cygwin.
>>
>> Maybe it is also possible to consider some other options:
>>
>> a) mark some modules as experimental/beta, and have a single configure
>> option for enabling all experimental modules, a separate package for
>> them, etc
>>
>> b) split the development of some modules from the monitor-core-3.1
>> branch so that they don't hold back releases
Re: [Ganglia-developers] versioning confusion
>>> On 2/8/2010 at 1:39 PM, in message <20100208203937.ga22...@gentoo.org>, Justin Bronder wrote:
> On 08/02/10 20:04 +, Daniel Pocock wrote:
>> >> So, why not put the "rc" or "pre" tag into a GANGLIA_EXTRA_VERSION and
>> >> embed that into the code? That way there would be no confusion about
>> >> what is in the tarball. Then we could have as many testing releases as
>> >> needed before the final one. SVN tags are cheap. What am I missing? I
>> >> mean, now we are confusing people with skipped "releases".
>> >
>> > Basically for the reasons that I mentioned above. Agreed that SVN tags
>> > are cheap, but the major reasons are to reduce the number of publicly
>> > available tarballs and to make sure that the release process itself does
>> > not allow problems to creep into the code. By releasing exactly what we
>> > are testing, we reduce the number of steps in the testing and release
>> > process and at the same time ensure that an officially released tarball
>> > is exactly the same tarball that was tested and approved by the community
>> > during the testing period. Also remember that we haven't ever skipped a
>> > "release". We have only skipped revision numbers. The Ganglia web site
>> > and the SourceForge project site are still the definitive authority on
>> > what our current release is. By simply checking those sites, there should
>> > be no question or confusion about what our current release is. It would
>> > be a big mistake for someone to pull a tarball from the testing download
>> > area and deploy that into their production environment. Like every other
>> > project, the only official download area, as far as the Ganglia project
>> > is concerned, is the SourceForge download web page, and currently the
>> > latest release available on that site is 3.1.2.
If, hopefully > in a few weeks, we release 3.1.6 or whatever the final revision number is, > that will become the official Ganglia release and it really doesn't matter > what happened to any of the previous revisions. >> > >> > >> >> >> I have tested on several platforms, and for 3.1.6, I provided snapshots >> every few days for other people to do testing, but one issue slipped >> through the cracks, so 3.1.7 will be released imminently to fix that. >> Maybe there needs to be a sign-off process, e.g. a RHEL user, a Solaris >> user, etc who must test the final snapshot before a tag is done, and >> maybe we should do that before 3.1.7 is tagged. >> >> I agree with Brad's point about releasing the tarball that has actually >> been tested. If we went through the process of signing-off the >> snapshot, then the process would need to be repeated for the tag too. >> >> There is another factor as well: I have been quite aggressive about >> fixing bugs and backporting minor functionality improvements. This was >> done between 3.1.2 and 3.1.3. There was then another whole bunch of >> stuff done between 3.1.5 and 3.1.6. At this stage, the intention is to >> make the minimum possible changes to provide a usable release (hopefully >> 3.1.7), and then some more pro-active bug fixing will resume again. > > > I think Martin's point is being missed here. Speaking as just a distro > maintainer, the use of rc and pre tags do provide some significant benefits. > Consider the following simplified workflow: > > - Development happens in branches/3.1.1. > - When a release is being considered, the version is updated to report > 3.1.1_rc1 and a copy of the branch is created in tags/3.1.1_rc1. > - Should bugs be found in rc1, development is again done in branches/3.1.1 > and when complete, tags/3.1.1_rc2 is created. > - And so on, until an rc is declared to be stable. Then only the version is > updated and tags/3.1.1 is created. 
>
> If any changes that need to be backported are found in the above process,
> they are committed to branches/3.1.0, and the same rc tagging process is
> used.
>
> While a number of extra svn 'branches' are created during this process, you
> can now create multiple tarballs with no confusion as to which svn
> revision they originated from. Backporting from branch to branch should also
> be fairly simple.
>
> This process has one major advantage. Each rc tarball can be packaged and
> released by the distros for wider testing. As it stands now, that cannot be
> done. For instance, in Gentoo I'd have no problem pushing rc's to overlays
> that are marked for testing newer releases, but this requires the tarball and
> version to change correctly whenever what is packaged changes.
>
> If the above came off as preachy, I apologize. Ganglia is a great product and
> this is just my suggestion.

Thanks Justin. Really, the one thing that you suggested that we aren't doing is actually creating the tag in SVN. What we have been doing is, whenever we create snapshots, adding a fourth revision number which happens to match the SVN revision that the tarball was created from. T
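For packagers, the rc workflow described above depends on release-candidate tags sorting before the final release they lead up to. The following is only a rough sketch of that ordering: the `_rc` suffix format is taken from the tag names in the message, while `version_key` and the sample tag list are hypothetical and not part of Ganglia.

```python
import re

def version_key(version):
    """Return a sortable key for strings such as '3.1.1' or '3.1.1_rc2'.

    Hypothetical helper for illustration; Ganglia itself ships no such function.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:_rc(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    numbers = tuple(int(part) for part in match.group(1).split("."))
    rc = match.group(2)
    if rc is None:
        # A final release sorts after every release candidate of that version.
        return (numbers, 1, 0)
    return (numbers, 0, int(rc))

# Sample tag names, made up for the example.
tags = ["3.1.1", "3.1.1_rc2", "3.1.0", "3.1.1_rc1"]
ordered = sorted(tags, key=version_key)
```

A distro package manager would apply its own comparison rules; the point is only that an rc suffix gives each tarball a distinct, orderable name.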
Re: [Ganglia-developers] versioning confusion
>>> On 2/5/2010 at 3:58 AM, in message <815639.84803...@web113311.mail.gq1.yahoo.com>, Martin Knoblauch wrote: > - Original Message ---- > >> From: Brad Nicholes >> To: Martin Knoblauch ; Ramon Bastiaans > >> Cc: "ganglia-developers@lists.sourceforge.net" > >> Sent: Thu, February 4, 2010 4:33:31 PM >> Subject: Re: [Ganglia-developers] versioning confusion >> >> >>> On 2/4/2010 at 6:50 AM, in message <4b6ad096.8030...@sara.nl>, Ramon >> Bastiaans >> wrote: >> > Ahh, I see. >> > >> > On 02/04/2010 12:11 PM, Martin Knoblauch wrote: >> >> >> >> If we were to make release candidates publically available with a release > number >> other than major.minor.revision (for example 3.1.3rc1), we would also be >> required to put this same release number in the source code itself to ensure > >> that there is a differentiation between a release candidate and the official >> release since both would be made public (one during the testing period and > the >> other being an official release). In order to transition the release > candidate, >> in this case to an official release, we would be required to explode the >> tarball, change the version number, retag SVN with the changed file and > revision >> number, re-boot strap the source code, recreate the tarball and then finally >> make the new tarball publically available under the final release number. > All >> of this leaves the final tarball open to potential problems. It just makes > more >> sense from a testing and release prospective to release the tarball in the > exact >> condition as it was tested. This leaves no possibility for errors or > problems >> creeping into the final released tarball. > > So, why not put the "rc" or "pre" Tag into an GANGLIA_EXTRA_VERSION and > embed > that into the code. That way there would be no confusion about what is in > the tarball. Then we could have as many testing releases before the final > one. SVN tags are cheap. What am I missing? 
> I mean, now we are confusing people with skipped "releases".

Basically for the reasons that I mentioned above. Agreed that SVN tags are cheap, but the major reasons are to reduce the number of publicly available tarballs and to make sure that the release process itself does not allow problems to creep into the code. By releasing exactly what we are testing, we reduce the number of steps in the testing and release process and at the same time ensure that an officially released tarball is exactly the same tarball that was tested and approved by the community during the testing period.

Also remember that we haven't ever skipped a "release". We have only skipped revision numbers. The Ganglia web site and the SourceForge project site are still the definitive authority on what our current release is. By simply checking those sites, there should be no question or confusion about what our current release is. It would be a big mistake for someone to pull a tarball from the testing download area and deploy that into their production environment. Like every other project, the only official download area, as far as the Ganglia project is concerned, is the SourceForge download web page, and currently the latest release available on that site is 3.1.2. If, hopefully in a few weeks, we release 3.1.6 or whatever the final revision number is, that will become the official Ganglia release and it really doesn't matter what happened to any of the previous revisions.

>> Another option would be to tag and tar the source code under the final release
>> version number and make it available for testing. Then if bugs are found during
>> testing, fix the bugs, retag and retar under the same version number. The
>> problem with this is that we could end up with multiple different tarballs all
>> with the same version number publicly available. The only way to tell which
>> one was the real release would be by the date on the tarball rather than version
>> number.
>>
>
> much too convoluted and confusing.

Agreed.

>> Anyway, you can read more about this process on the Ganglia wiki page at
>> http://sourceforge.net/apps/trac/ganglia/wiki/how_project_works This release
>> process was basically patterned after the way that the Apache httpd project
>> produces testing and official tarballs.
>
> As I said in the past, that process may work for Apache. I do not see many
> skipped releases there. Maybe they have stricter project management.
> It is
Re: [Ganglia-developers] versioning confusion
>>> On 2/4/2010 at 8:42 AM, in message <4b6aeb00.1070...@pocock.com.au>, Daniel Pocock wrote:
>> available. The only way to tell which one was the real release would be by
>> the date on the tarball rather than version number.
>
> Not quite - we could digitally sign the release tarball at the point
> where it is confirmed to be stable. People could make sure they had a
> stable release by comparing the md5sum, something that they should do
> anyway.

True, but I think that the version number is still the primary differentiator. Once the tarball has been exploded and deployed, we are back to file dates again.

Brad
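The checksum comparison suggested above can be sketched in a few lines. This is only an illustration under assumptions: `file_digest` and `matches_published` are hypothetical helper names, and the published checksum is assumed to be available as a hex string from the project's download page.

```python
import hashlib

def file_digest(path, algorithm="md5", chunk_size=65536):
    """Hash a file in chunks so that large tarballs need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, published_hex, algorithm="md5"):
    """Return True when the local file hashes to the published checksum."""
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

As the reply notes, a check like this identifies the tarball itself; once the tarball has been unpacked and deployed, the version number is what remains to tell installations apart.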
Re: [Ganglia-developers] Ganglia SLED and SLES testing...
>>> On 2/4/2010 at 12:49 AM, in message <4b6a7c0c.8070...@pocock.com.au>, Daniel Pocock wrote:
>>> One other problem. After doing a configure and make I tried to do a make
>>> distdir just to get the built files in the web directory. In the resulting
>>> dist directory, nothing was built in the web directory. conf.php.in was
>>> there but no conf.php etc. Is there something different I need to do now to
>>> get all of the .in files resolved in the web directory?
>>>
>>> Try
>>>
>>> make -C web conf.php version.php
>>>
>>> You'll notice that I've put something like that in the spec file
>>
>> OK, that worked but it begs the question, shouldn't this just happen on a
>> "make", "make install" or "make dist*" rather than it being a separate
>> command? I'm sure there must be a good reason for it, just curious. Is this
>> documented somewhere other than .spec file? If I wanted to just pull the
>> tarball and build it myself, I shouldn't have to be guessing at the build
>> steps other than configure/make/make install.
>
> `make install' has never actually installed the web files anywhere
>
> What has changed though is that conf.php and version.php are only
> generated at the last moment rather than at the configure stage.
>
> However, I can probably tweak this a little more to ensure they are
> generated for you, but you still have to copy them to where you want them.

It would be nice to end up with a deployable web directory after a make dist or make distdir (or any of the other make dist* targets).

I also ran into one other issue, which is probably not a regression but is annoying. If you explode the tarball and then run configure/make/make dist*, the make dist* will fail looking for ../scripts/svn2cl.sh. I think that this script is stripped out when the final tarball is created. Even if it were there, the script would fail anyway, since the exploded tarball is not under source control.

Sorry for the late testing reports,

Brad
Re: [Ganglia-developers] versioning confusion
>>> On 2/4/2010 at 6:50 AM, in message <4b6ad096.8030...@sara.nl>, Ramon Bastiaans wrote:
> Ahh, I see.
>
> On 02/04/2010 12:11 PM, Martin Knoblauch wrote:
>>
>> 3.1.3 .. 3.1.5 were canned during testing. Apparently our process does not
>> allow for fixing bugs/regressions between tagging and final release, so it
>> was decided to never publish the intermediates.
>>
> Perhaps instead of tagging the "public beta" releases as a final
> version, they could be tagged as "release candidate". I.e. call it
> 3.1.6rc1 or 3.1.6pre1 or something similar.
>
>> One of the reasons might be lack of good beta testing (which I am guilty
>> of myself :-(, but I do not really understand why we couldn't just keep
>> 3.1.3 as the name of the release.
>>
> The public betas are a good way to counter that, but it seems a bit
> silly to me to skip entire version levels just because of release
> procedures.

The reason for skipping revision numbers is to make sure that we don't end up with confusion about a version in relation to what has already been released (I know, that statement in itself seems confusing but let me explain :). Each testing candidate is tagged with a release number and the tarball is built as if it were an official release. The tarball is then made available on the Ganglia site for testing. If the testing proves that the release candidate is valid, then the tarball is simply copied to the official release download site and becomes the official release. The status file and web page are also updated to reflect the release and corresponding release number. Under this process nothing has to be done to the actual physical tarball in order to transition it from a release candidate to an official release.

BTW, the Ganglia web site is correct: the current release is 3.1.2 and we are preparing 3.1.6 to go to testing.
If we were to make release candidates publicly available with a release number other than major.minor.revision (for example 3.1.3rc1), we would also be required to put this same release number in the source code itself, to ensure that there is a differentiation between a release candidate and the official release, since both would be made public (one during the testing period and the other as an official release). In order to transition the release candidate to an official release, we would be required to explode the tarball, change the version number, retag SVN with the changed file and revision number, re-bootstrap the source code, recreate the tarball and then finally make the new tarball publicly available under the final release number. All of this leaves the final tarball open to potential problems. It just makes more sense from a testing and release perspective to release the tarball in the exact condition in which it was tested. This leaves no possibility for errors or problems creeping into the final released tarball.

Another option would be to tag and tar the source code under the final release version number and make it available for testing. Then if bugs are found during testing, fix the bugs, retag and retar under the same version number. The problem with this is that we could end up with multiple different tarballs, all with the same version number, publicly available. The only way to tell which one was the real release would be by the date on the tarball rather than the version number.

Anyway, you can read more about this process on the Ganglia wiki page at http://sourceforge.net/apps/trac/ganglia/wiki/how_project_works This release process was basically patterned after the way that the Apache httpd project produces testing and official tarballs.

Brad
Re: [Ganglia-developers] Ganglia SLED and SLES testing...
>>> On 2/3/2010 at 3:41 PM, in message <4b69fbb1.6090...@pocock.com.au>, Daniel Pocock wrote: > Brad Nicholes wrote: >>>>> On 2/3/2010 at 03:06 PM, in message <4b69f36a.40...@pocock.com.au>, Daniel >>>>> >> Pocock wrote: >> >> >>>> I have tried a quick test of your latest snap shot and so far the only >>>> thing > >>>> >>> that I am seeing is that the include path to the conf.d directory in the >>> gmond.conf file is not getting set correctly. It is still pointing to >>> ./conf.d/*.conf rather than the value that was passed in with --sysconfdir. >>> >>> I'm not sure if this is a regression or not, but it is a problem. That is > as >>> far as I have tested so far. >>> >>>> >>>> >>> Some things to check: >>> >>> In your source tree, what is in lib/default_conf.h ? >>> >>> >> >> include ('" SYSCONFDIR "/conf.d/*.conf')\n >> >> >>> Does gmond/gmond.conf exist in the source tree? If so, it is used >>> instead of auto-generating >>> >>> >> >> No, after doing a make install, gmond.conf did not exist anywhere. I > created it using ./gmond -t > gmond.conf command >> >> > >>> What is the output of gmond -t? >>> >>> >> >> The result I got with the invalid gmond.conf file was with the above > command. But I just did it again and came up with a correct gmond.conf. So > I am thing user error right now. >> >> > Ok, please let me know if you can reproduce the problem >>> Remember, the spec file also tries to copy /etc/gmond.conf to >>> /etc/ganglia - you didn't have an old /etc/gmond.conf on the box? >>> >> >> I was actually trying to install into a test area rather than /etc/. That > might be why I didn't see a gmond.conf after the build if /etc/gmond.conf is > essentially hard coded. 
>>
> It is only hard coded in the spec file, not in any binary
>
> The spec file looks for /etc/gmond.conf and moves it to
> /etc/ganglia/gmond.conf
>
> rpm would also leave an existing gmond.conf intact
>
> You only get the fresh gmond.conf if neither /etc/gmond.conf nor
> /etc/ganglia/gmond.conf exists
>
>> One other problem. After doing a configure and make I tried to do a make
>> distdir just to get the built files in the web directory. In the resulting
>> dist directory, nothing was built in the web directory. conf.php.in was
>> there but no conf.php etc. Is there something different I need to do now to
>> get all of the .in files resolved in the web directory?
>
> Try
>
> make -C web conf.php version.php
>
> You'll notice that I've put something like that in the spec file

OK, that worked, but it raises the question: shouldn't this just happen on a "make", "make install" or "make dist*" rather than being a separate command? I'm sure there must be a good reason for it, I'm just curious. Is this documented somewhere other than the .spec file? If I wanted to just pull the tarball and build it myself, I shouldn't have to guess at build steps beyond configure/make/make install.

Brad
Re: [Ganglia-developers] Ganglia SLED and SLES testing...
>>> On 2/3/2010 at 03:06 PM, in message <4b69f36a.40...@pocock.com.au>, Daniel Pocock wrote:
>> I have tried a quick test of your latest snapshot and so far the only thing
>> that I am seeing is that the include path to the conf.d directory in the
>> gmond.conf file is not getting set correctly. It is still pointing to
>> ./conf.d/*.conf rather than the value that was passed in with --sysconfdir.
>> I'm not sure if this is a regression or not, but it is a problem. That is as
>> far as I have tested so far.
>
> Some things to check:
>
> In your source tree, what is in lib/default_conf.h ?

include ('" SYSCONFDIR "/conf.d/*.conf')\n

> Does gmond/gmond.conf exist in the source tree? If so, it is used
> instead of auto-generating

No, after doing a make install, gmond.conf did not exist anywhere. I created it using the ./gmond -t > gmond.conf command.

> What is the output of gmond -t?

The result I got with the invalid gmond.conf file was from the above command. But I just did it again and came up with a correct gmond.conf. So I am thinking user error right now.

> Remember, the spec file also tries to copy /etc/gmond.conf to
> /etc/ganglia - you didn't have an old /etc/gmond.conf on the box?

I was actually trying to install into a test area rather than /etc/. That might be why I didn't see a gmond.conf after the build, if /etc/gmond.conf is essentially hard coded.

One other problem. After doing a configure and make I tried to do a make distdir just to get the built files in the web directory. In the resulting dist directory, nothing was built in the web directory. conf.php.in was there but no conf.php etc. Is there something different I need to do now to get all of the .in files resolved in the web directory?

Brad
Re: [Ganglia-developers] Policy on updating files in 3.1.x/contrib
>>> On 2/3/2010 at 5:04 AM, in message <4b696663.6010...@pocock.com.au>, Daniel Pocock wrote:
>> what is the policy for updating files in the "contrib" directory of 3.0.x
>> and 3.1.x? Do I need to do the backport approval dance (*)? Or can I just go
>> ahead. The "removespikes.pl" file needs an update in the 3.1.x branch.
>
> Any updates to 3.1 require co-ordination from the release manager
> (myself) when a release is imminent (as it is now). Generally, let me
> know the commit number(s) on trunk and then I will let you know if you
> can backport it on 3.1.6 or wait for 3.1.7. According to the policies,
> the release manager has the final say, but I am open to consider anyone
> who has an opinion for/against a particular patch.

Do we include the contrib directory with the release? I didn't think we did, but even if we do, the contrib directory is not under the same rules as the standard release. AFAIUI, the contrib directory is basically a "use at your own risk" kind of thing. It holds user contributions that the Ganglia project does not maintain or support. We documented that in the README.contrib file located in the directory. You should still commit any changes into trunk first and then backport to the 3.1.x branch, just to make sure that the two areas are in sync. But other than that, there are no other guidelines.

Just as a side note, now that Groundwork Opensource is maintaining their monitorforge site, we have been encouraging people to post their contributions there. One main reason for that is so that contributors can have complete control over their contributions and any updates, rather than having to rely on a Ganglia developer to commit changes on their behalf.

Brad
Re: [Ganglia-developers] tcpconn.py issues
>>> On 2/2/2010 at 10:28 AM, in message <4b6860c0.9070...@pocock.com.au>, Daniel Pocock wrote:
>> If tcpconn is functioning normally after the initial startup, then that
>> basically answers the questions. It appears that, at least on CentOS/RHEL5,
>> python is not yielding after calling start() and therefore not allowing the
>> threading module to call the thread's run() method. The result is that by not
>> yielding, it is opening the window wider and allowing multiple calls to the
>> start() function. Adding a try...catch block around the start() call and
>> ignoring the exceptions will probably fix the problem. I wouldn't call this a
>> showstopper or a regression, for the same reason that you stated. It is more
>> of a cosmetic annoyance which can be fixed. Nothing about this issue
>> actually prevents tcpconn from functioning normally. I can add the
>> try...catch block and check that code in, or you can just tag and we fix it
>> next time.
>
> Can you please try this on trunk, and we will aim to deliver the fix
> with 3.1.7. To get something tagged before Friday, I would prefer not
> to include any last minute changes unless we find an issue that is a
> serious regression or showstopper.

Checked into trunk r2264
Re: [Ganglia-developers] tcpconn.py issues
>>> On 2/2/2010 at 9:57 AM, in message <4b68598e.6050...@pocock.com.au>, Daniel Pocock wrote: > Brad Nicholes wrote: >>>>> On 2/2/2010 at 6:23 AM, in message <4b682769.6000...@pocock.com.au>, >>>>> Daniel >>>>> >> Pocock wrote: >> >> >>> I've just been testing r2258 on CentOS 5. rpmbuild runs successfully >>> and the packages install and run. >>> >>> However, I notice that some of the tcpconn metrics are failing. >>> tcpconn.py doesn't appear to have changed since r1658 (August 2008). It >>> is the only python module that is loaded by default. >>> >>> The commit mentions moving the netstat thread start - are you able to >>> have a look at this Brad? >>> >>> You can get my tarball from http://www.pocock.com.au/ganglia/test if you >>> need to. It is bootstrapped on Debian 5. >>> >>> >>> metric 'tcp_established' being collected now >>> metric 'tcp_established' has value_threshold 1.00 >>> metric 'tcp_listen' being collected now >>> [PYTHON] Can't call the metric handler function for [tcp_listen] in the >>> python module [tcpconn]. >>> >>> Traceback (most recent call last): >>> File "/usr/lib/ganglia/python_modules/tcpconn.py", line 67, in >>> TCP_Connections >>> _WorkerThread.start() >>> File "/usr/lib/python2.4/threading.py", line 410, in start >>> assert not self.__started, "thread already started" >>> AssertionError: thread already started >>> metric 'tcp_listen' has value_threshold 1.00 >>> metric 'tcp_timewait' being collected now >>> [PYTHON] Can't call the metric handler function for [tcp_timewait] in >>> the python module [tcpconn]. >>> >>> >> >> I can't reproduce the problem so all I can do is take a guess at what might > be happening and leave it to somebody who is seeing the issue to verify what > is happening. The exception that you are seeing is a result of a thread > trying to be started multiple times. There is an if statement in > TCP_connections() that is suppose to prevent this from happening. 
This if > statement checks two thread variables that should indicate what state the > thread is in. The running thread variable is set to false during thread > initialization and is set to true as soon as the threads run method is > called. The run method is of the thread is called as a result of calling the > start() method on the thread object. Each time that one of the tcpconn > metrcs is gathered, the metric callback hits the thread start if statement. > If the run thread variable is set to true, then no other metric invocation > should be allowed to start the thread again. >> >> > When you say you can't reproduce the problem, are you trying on a > CentOS5/RHEL5 box, or something different? > Something different. All I have available is SLES and SLED boxes. I don't have access to CentOS or RHEL5. >> There is a very small window where, on initial startup, two metric callbacks > could get past the if statement in TCP_connections() and try to start the > thread a second time. The windows would be caused by a delay between the > time that the start() method is called and when the threading module finally > calls the threads run() method. We could add a try...catch block around the > start() call to catch and ignore the exception if the thread is started a > second time. But the part that bothers me is that in the list of exceptions, > the thread was obviously attempted more than just a second time. >> >> So my questions are, is the thread really running when the second or more > attempts are made? Is the thread bailing out somewhere before the "running" > thread variable is set? If we added the try...catch block and ignored the > thread, does this leave the thread running and in a functional state? > Without being able to reproduce the problem, I can't really answer these > questions. >> >> > I don't know exactly how to check those things > > What I can see is that the errors only appear when the daemon starts > (maybe the first time it collects each metric). 
After that, the values > are transmitted. Can you give any examples of how to debug this for > someone who is not a Python expert? > > Do you think this is a showstopper for 3.1.6? I don't believe it can be > a regression on this release o
Re: [Ganglia-developers] tcpconn.py issues
>>> On 2/2/2010 at 6:23 AM, in message <4b682769.6000...@pocock.com.au>, Daniel Pocock wrote: > > I've just been testing r2258 on CentOS 5. rpmbuild runs successfully > and the packages install and run. > > However, I notice that some of the tcpconn metrics are failing. > tcpconn.py doesn't appear to have changed since r1658 (August 2008). It > is the only python module that is loaded by default. > > The commit mentions moving the netstat thread start - are you able to > have a look at this Brad? > > You can get my tarball from http://www.pocock.com.au/ganglia/test if you > need to. It is bootstrapped on Debian 5. > > > metric 'tcp_established' being collected now > metric 'tcp_established' has value_threshold 1.00 > metric 'tcp_listen' being collected now > [PYTHON] Can't call the metric handler function for [tcp_listen] in the > python module [tcpconn]. > > Traceback (most recent call last): > File "/usr/lib/ganglia/python_modules/tcpconn.py", line 67, in > TCP_Connections > _WorkerThread.start() > File "/usr/lib/python2.4/threading.py", line 410, in start > assert not self.__started, "thread already started" > AssertionError: thread already started > metric 'tcp_listen' has value_threshold 1.00 > metric 'tcp_timewait' being collected now > [PYTHON] Can't call the metric handler function for [tcp_timewait] in > the python module [tcpconn]. > I can't reproduce the problem so all I can do is take a guess at what might be happening and leave it to somebody who is seeing the issue to verify what is happening. The exception that you are seeing is a result of a thread trying to be started multiple times. There is an if statement in TCP_connections() that is suppose to prevent this from happening. This if statement checks two thread variables that should indicate what state the thread is in. The running thread variable is set to false during thread initialization and is set to true as soon as the threads run method is called. 
The run method of the thread is called as a result of calling the start() method on the thread object. Each time that one of the tcpconn metrics is gathered, the metric callback hits the thread start if statement. If the running thread variable is set to true, then no other metric invocation should be allowed to start the thread again. There is a very small window where, on initial startup, two metric callbacks could get past the if statement in TCP_Connections() and try to start the thread a second time. The window would be caused by a delay between the time that the start() method is called and when the threading module finally calls the thread's run() method. We could add a try...except block around the start() call to catch and ignore the exception if the thread is started a second time. But the part that bothers me is that, judging by the list of exceptions, starting the thread was obviously attempted more than just a second time. So my questions are: is the thread really running when the second or later attempts are made? Is the thread bailing out somewhere before the "running" thread variable is set? If we added the try...except block and ignored the exception, does this leave the thread running and in a functional state? Without being able to reproduce the problem, I can't really answer these questions. Brad
Re: [Ganglia-developers] [RFC] two step gmond initialization
>>> On 12/11/2009 at 6:21 AM, in message <4b224750.2090...@pocock.com.au>, >>> Daniel Pocock wrote: >> it replaces apr_proc_detach with an inline implementation of it on plain >> POSIX and that should be most likely as portable (at least for the platforms >> we care of) and doesn't intentionally include any error checking to make it >> > How about Cygwin and mingw? I'm not sure if the use of pipe(), fork(), > etc is possible there > > I think we need to take a broader decision about the way we support the > Windows platform anyway, we may not need to support detach on that > platform. With Cygwin or with mingw, we should be able to include > native code for running as a service. > > So my proposal would be that we extend Carlo's concept so that there are > two variations of it, using #ifdef : > > - a UNIX variation of gmond that has detach functionality implemented > with fork, pipe, etc > > - a Windows variation of gmond that has built in support for running as > a service > > The cygrunsrv source code here provides us with an example of how to go > about it: > > http://sourceware.org/cgi-bin/cvsweb.cgi/cygrunsrv/?cvsroot=cygwin-apps#dirlis > > t > > What do people think about having this type of native code in gmond > rather than just using apr? Or should we try to patch apr to provide > the functionality? > I have to admit that I haven't dug into this issue to understand exactly why we are having problems with APR. APR is designed to solve these problems in a cross platform way and we are proposing that we abandon the cross platform solution in favor of a platform specific solution. I know that httpd doesn't have these issues and they detach and run just fine across a wide variety of platforms including windows, BSD, solaris, etc. Why are we having these problems when httpd doesn't? Is the real solution as simple as going to the APR mailing list and asking why this issue exists in APR and if there is a workaround? 
I haven't really seen this issue show up on the APR mailing list so far or did I miss it? One of the problems that we already have with gmond is that there is already too much platform specific code in it which is why we have to rely on cygwin in order to run on windows. It is also the reason why gmetad doesn't really run on windows because it wasn't built on top of a cross platform solution. My gut feel is that we should be moving ganglia more towards APR rather than away from it. My 2 cents Brad
Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.5 beta ready for final testing
>>> On 12/2/2009 at 7:21 AM, in message <4b1677e4.8000...@pocock.com.au>, Daniel Pocock wrote: > I would like gmond to return a non-zero return code if it fails to > initialise, e.g. if it is unable to bind or if it is unable to resolve a > hostname mentioned in gmond.conf > > Otherwise, the init-script always says that it started '[OK]' even if > the daemon process has died on startup. > > That is why this change was made. However, I see a few solutions going > forward: > > - we can discard the patch completely > > - we can discard the patch, and I could write another patch that does > some tests (e.g. resolving host names) before daemonizing > > - we can #ifdef the patch so that on BSD systems, it daemonizes earlier, > and on other systems it does so later > > - we can modify the init script to sleep and then call `ps -C gmond' and > determine if it kept running > > - post the problem on the apr dev list and discuss it there before > making any decision > > I'm not sure that I have anything to add as far as the discussion of this issue goes, but I have commit rights on the APR project. If you go with the last option and take this discussion to the APR-dev list, I can certainly get whatever patch is agreed upon committed and backported in APR. The downside to that option is that we would have to bundle the latest APR RPMs or tarball with Ganglia rather than using the distro version. So even if we do find a solution in APR, we will probably still have to build in a workaround in gmond. Brad
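For readers following the "two step initialization" idea, here is a rough POSIX-only sketch of how a parent process can return a non-zero code when the forked child fails to initialise. It is written in Python for brevity and is not the gmond patch under discussion; launch_with_status and init_fn are hypothetical names. The child reports its initialisation result over a pipe before the parent exits:

```python
import os


def launch_with_status(init_fn):
    """Fork, run init_fn in the child, and return 0 or 1 in the parent.

    The child writes a single status byte to a pipe once initialisation
    (binding sockets, resolving hostnames, ...) has succeeded or failed.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: initialise, then report the outcome to the parent.
        os.close(read_fd)
        status = 1
        try:
            init_fn()
            status = 0
        except Exception:
            status = 1
        finally:
            os.write(write_fd, bytes([status]))
            os.close(write_fd)
            # A real daemon would call setsid() and enter its main loop on
            # success; the sketch exits so the parent can reap it.
            os._exit(status)
    # Parent: wait for the report; an empty read means the child died early.
    os.close(write_fd)
    report = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return 0 if report == b"\x00" else 1
```

With something along these lines, the init script can trust the exit code instead of sleeping and polling with `ps -C gmond`. It relies on fork() and pipe(), so it does not address the Cygwin/mingw question raised elsewhere in the thread.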
Re: [Ganglia-developers] Fwd: Getting started with developing a C++ DSO module
>>> On 11/25/2009 at 11:12 AM, in message <75fb37ae0911251012i328f8f00u5586dab199c97...@mail.gmail.com>, Sylvester Steele wrote: >> I don't know why you would be getting a segfault on this line. Gmond > expects the array to be NULL terminated so all you are doing is adding one > extra entry and filling it will NULLs. With the array being NULL terminated, > gmond doesn't have to keep track of the metric count, it only has to look for > a NULL entry. >> > > > More modifications- and I can't figure out where the problem is. > > I tried > > gmi->name=NULL > > that didn't work either > > Then I thought I should remove the null metric and have only one metric > like: > > > > gmi = (Ganglia_25metric*)apr_array_push(metric_info); > > gmi->name= apr_pstrdup (pool,"Random_Numbers_2"); > gmi->tmax=90; > gmi->type=GANGLIA_VALUE_UNSIGNED_INT; > gmi->msg_size= UDP_HEADER_SIZE+8; > gmi->units= apr_pstrdup (pool,"Num"); > gmi->slope=apr_pstrdup (pool,"both"); > gmi->fmt=apr_pstrdup (pool,"%u"); > gmi->desc= apr_pstrdup (pool,"Example module metric (random numbers) 2"); > MMETRIC_INIT_METADATA(gmi, pool); > MMETRIC_ADD_METADATA(gmi,MGROUP,"example_2"); > > printf ("\n First metric done"); > > /* > gmi = (Ganglia_25metric*)apr_array_push(metric_info); > printf ("\nStarted second metric.."); > > gmi->name= apr_pstrdup (pool,"Constant_Number_2"); > gmi->tmax=90; > gmi->type=GANGLIA_VALUE_UNSIGNED_INT; > gmi->msg_size= UDP_HEADER_SIZE+8; > gmi->units= apr_pstrdup (pool,"Num"); > gmi->slope=apr_pstrdup (pool,"zero"); > gmi->fmt=apr_pstrdup (pool,"%u"); > gmi->desc= apr_pstrdup (pool,"Example module metric (constant number) 2"); > MMETRIC_INIT_METADATA(gmi, pool); > MMETRIC_ADD_METADATA(gmi,MGROUP,"example_2"); > > printf ("\nSecond metric done"); > */ > //gmi = (Ganglia_25metric*)apr_array_push(metric_info); > //printf ("\nStarted null metric"); > > //gmi->name= apr_pstrdup (pool,NULL); > > //memset (gmi, 0, sizeof(*gmi)); > printf ("\nMetric initing done"); > return 0; > > And this 
gives a segfault too! And here is the output: > > In the ex_metric_init function > Got first GMI > First metric done > Segmentation fault > > > ie there is a segfault between the last and second last printf > statements! Any clues? > Sorry, my only suggestion would be to run it in the debugger to get a better idea of what is happening. Brad
Re: [Ganglia-developers] Fwd: Getting started with developing a C++ DSO module
>>> On 11/25/2009 at 10:19 AM, in message <008b01ca6df3$823a2690$86ae73...@com>, "Sylvester Steele" wrote: >> >My guess is because you have static string pointers being passed from a > DSO module to gmond. I would suggest using apr_pstrdup(p, >>here>) to allocate the memory from an APR memory pool before handing the > pointers back to gmond. > > Thanks Brad- that helped, but I am still getting a seg fault from the second > line: > > gmi = (Ganglia_25metric*)apr_array_push(metric_info); > memset (gmi, 0, sizeof(*gmi)); > > I am doing this to set the last metric to null. Why should this be > happening? BTW- my metric_info has size=10 and I am putting in only two > metrics before this (The null metric is the third) > > I don't know why you would be getting a segfault on this line. Gmond expects the array to be NULL terminated, so all you are doing is adding one extra entry and filling it with NULLs. With the array being NULL terminated, gmond doesn't have to keep track of the metric count; it only has to look for a NULL entry. Brad
Re: [Ganglia-developers] Fwd: Getting started with developing a C++ DSO module
>>> On 11/25/2009 at 9:16 AM, in message <75fb37ae0911250816k5e7e0373x25ad2ee613930...@mail.gmail.com>, Sylvester Steele wrote: > Ok, so I tried to make a dynamically initializing module. I am > basically trying to convert the example module to a dynamically > initializing one.. > > My metrc_init function looks like this: > > static int ex_metric_init ( apr_pool_t *p ) > { > > > Ganglia_25metric* gmi; >apr_pool_create(&pool, p); > > metric_info = apr_array_make(pool, 10, sizeof(Ganglia_25metric)); >// metric_mapping_info = apr_array_make(pool, 10, sizeof(mapped_info_t)); > > gmi = (Ganglia_25metric*)apr_array_push(metric_info); > > gmi->name= "Random_Numbers_2"; > gmi->tmax=90; > gmi->type=GANGLIA_VALUE_UNSIGNED_INT; > gmi->msg_size= UDP_HEADER_SIZE+8; > gmi->units= "Num"; > gmi->slope="both"; > gmi->fmt="%u"; > gmi->desc= "Example module metric (random numbers) 2"; > MMETRIC_INIT_METADATA(gmi, pool); > MMETRIC_ADD_METADATA(gmi,MGROUP,"example_2"); > > gmi = (Ganglia_25metric*)apr_array_push(metric_info); > > gmi->name= "Constant_Number_2"; > gmi->tmax=90; > gmi->type=GANGLIA_VALUE_UNSIGNED_INT; > gmi->msg_size= UDP_HEADER_SIZE+8; > gmi->units= "Num"; > gmi->slope="zero"; > gmi->fmt="%u"; > gmi->desc= "Example module metric (constant number) 2"; > MMETRIC_INIT_METADATA(gmi, pool); > MMETRIC_ADD_METADATA(gmi,MGROUP,"example_2"); > > > gmi = (Ganglia_25metric*)apr_array_push(metric_info); > > memset (gmi, 0, sizeof(*gmi)); > > > return 0; > } > > > Q1. For some reason this gives me a segmentation fault. Any ideas why? > My guess is because you have static string pointers being passed from a DSO module to gmond. I would suggest using apr_pstrdup(p, ) to allocate the memory from an APR memory pool before handing the pointers back to gmond. > Q2. How do I put printf / cout statements so I can see them when gmond > runs- which will be very helpful for debugging. > You should be able to just use printf statements and then run gmond in debug mode. 
With gmond running in debug mode, all printf statements should print out on the console. Brad
Re: [Ganglia-developers] [Ganglia-general] 3.1.4 to go GA?
>>> On 11/20/2009 at 8:46 AM, in message <4b06b9d6.5080...@pocock.com.au>, >>> Daniel Pocock wrote: > Brad Nicholes wrote: >>>>> On 11/20/2009 at 8:07 AM, in message <4b06b0af.1050...@pocock.com.au>, >>>>> Daniel >>>>> >> Pocock wrote: >> >>> Brad Nicholes wrote: >>> >>>> I've been running it on a very small set of machines. It all looks good >>>> to >>>> >>> me. >>> >>>> >>>> >>> No complaints from anyone... is that sufficient to go live? I'm not >>> sure if I have the access level to put the release on the SF site though. >>> >> >> You are the release manager. The decision to go live is your call. :) >> > Ok, 3.1.4 is now GA and ready for distribution. > > As usual, any feedback is still welcome and will be used to shape 3.1.5 > BTW, besides posting the tarball on SF for distribution, you will also need to fix up the Current Release Notes page on the Wiki and fix the configuration and installation pages with the latest documentation. Brad
Re: [Ganglia-developers] 3.1.4 to go GA?
>>> On 11/20/2009 at 8:07 AM, in message <4b06b0af.1050...@pocock.com.au>, >>> Daniel Pocock wrote: > Brad Nicholes wrote: >> I've been running it on a very small set of machines. It all looks good to > me. >> > > No complaints from anyone... is that sufficient to go live? I'm not > sure if I have the access level to put the release on the SF site though. You are the release manager. The decision to go live is your call. :) Brad
Re: [Ganglia-developers] gmetad using apr as of r2106, shorter polling intervals
Up until now, gmetad hasn't really used APR for any of its base functionality. If we are going to start putting gmetad on top of APR, there are a number of places where gmetad could really be improved. One of the most glaring areas is the metric_hash code. This code is currently generated by the gperf tool, which produces C code that is very specific to a certain set of tags and how they should be hashed. Furthermore, if there are changes to any of the metric tags, this hashing code has to be manually regenerated before any of the autotools can be run. It would really be nice to remove gmetad's dependence on the gperf tool and instead put all of the hashing functionality on top of the apr_hash_* table functions. These functions are much more flexible and would remove a significant amount of complex code. In addition to that, there are many other areas such as threading and memory management which could really benefit from APR, not to mention portability. Just a thought in case anybody is looking for someplace where they could really contribute to Ganglia. Brad >>> On 11/20/2009 at 8:05 AM, in message <4b06b01d.3050...@pocock.com.au>, >>> Daniel Pocock wrote: > > As discussed previously on the list, I've adapted gmetad to use apr's > sleep functionality. For anyone using trunk, please run autoreconf && > ./configure to get the newest gmetad/Makefile > > Changing the sleep code to randomize intervals using a percentage rather > than absolute value should be helpful for shorter polling intervals - I > would be interested in any feedback from people using Ganglia for > polling intervals smaller than 15 seconds. 
> > The change is in trunk but may be backported for 3.1.5 > > > r2106 | d_pocock | 2009-11-20 14:58:09 + (Fri, 20 Nov 2009) | 1 line > > Rewrite gmetad sleep code in various places to use apr, remove magic > numbers, sleep as a percentage of the step rather than an absolute > random adjustment
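To illustrate why a percentage-based adjustment helps short polling intervals, here is a small sketch. It is not the r2106 code; the function names, the 5 second absolute jitter and the 10% fraction are assumptions chosen only to show the effect:

```python
import random


def sleep_absolute(step, jitter_seconds=5.0, rng=random):
    """Step plus or minus a fixed number of seconds (assumed old behaviour)."""
    return max(0.0, step + rng.uniform(-jitter_seconds, jitter_seconds))


def sleep_percentage(step, jitter_fraction=0.1, rng=random):
    """Step plus or minus a fraction of the step itself."""
    return step * (1.0 + rng.uniform(-jitter_fraction, jitter_fraction))
```

With a 5 second step, the absolute variant can land anywhere between 0 and 10 seconds, while the percentage variant stays between 4.5 and 5.5 seconds, so the randomization no longer swamps a short interval.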
Re: [Ganglia-developers] Fwd: Getting started with developing a C++ DSO module
>>> On 11/19/2009 at 9:13 AM, in message <75fb37ae0911190813q66cf1f96w1afe84f8bdbe1...@mail.gmail.com>, Sylvester Steele wrote: >> I'm not sure what you are looking for. The purpose of the code that I > referred to was to show how a module would generate the metric definitions > during the initialization phase of gmond. Basically what happens is that > when gmond is started it loads each module and calls the metric_init function > for each module. At that point each module has the opportunity to tell gmond > what metrics it supports by passing back an array of metric definitions. > That is how gmond determines which module supports which metrics. The > contents of the metric definition array is completely up to the module > itself. However once the module returns the list of metric definitions, that > list can not be changed until the next time that gmond stops and restarts. > There is no way to alter the list of metrics that gmond is monitoring on the > fly during normal gmond operation. If the latter is what you are looking > for, gmond does not support on-the-fly functionality yet. >> >> >> > > > I don't want to change anything after a module starts. Changes to > which metrics are being collected can wait until a restart. But- every > time gmond restarts- this module may be collecting a different number > of standard metrics. So I don't need to change the metadata of the > metrics themselves- I just need to say at start: "OK we are collecting > these X metrics"-where X is variable and will change only at a > restart- but varies between restarts. No changes need be made after > initialization. I hope this clarifies things a bit. > So the mod_python code that I referred to is doing that. By creating a metric_info array in your metric_init function using the apr array calls you can create a dynamic array rather than using a hard coded static array in your code. However you still have another problem. 
Your configuration file still needs to match up with the metrics that are being gathered. In other words, you still need to have a corresponding metric block within a collection_group in your gmond.conf configuration file whose metric name matches a metric definition that is being returned by one of the loaded modules. Right now there isn't a way to dynamically generate the gmond configuration for a metric, even though the metric module has the ability to collect data for the given metric. Basically what this means is that if you expect that on a given restart of gmond X new metrics are going to be collected by your metric module, you have to manually enter their corresponding configuration into gmond.conf. Adding functionality to have the metric configuration be completely driven by a metric module still needs to be done. Basically a cool feature that is looking for somebody to implement it. Brad
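Until gmond can take its metric configuration from the module, one workaround is to generate the matching gmond.conf stanza with a small script before each restart. The sketch below is only an illustration; render_collection_group and its default values are hypothetical, and the output follows the usual collection_group / metric block layout of gmond.conf:

```python
def render_collection_group(metric_names, collect_every=20,
                            time_threshold=90, value_threshold=1.0):
    """Return a gmond.conf collection_group covering the given metric names."""
    lines = [
        "collection_group {",
        f"  collect_every = {collect_every}",
        f"  time_threshold = {time_threshold}",
    ]
    for name in metric_names:
        # One metric block per name; the name must match what the module returns.
        lines += [
            "  metric {",
            f'    name = "{name}"',
            f"    value_threshold = {value_threshold}",
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"
```

For example, render_collection_group(["Random_Numbers_2", "Constant_Number_2"]) produces a block with two metric entries that could be written to a file included from gmond.conf.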
Re: [Ganglia-developers] gmetad no summary for spoof'd data patch
>>> On 11/18/2009 at 8:19 AM, in message <20091118151950.ga13...@porcupine.cita.utoronto.ca>, Robin Humble wrote: > Hi Brad, > > I appreciate you taking the time to look at the patch. > > On Tue, Nov 17, 2009 at 09:54:11AM -0700, Brad Nicholes wrote: >> On 11/7/2009 at 12:06 AM, in message > <20091107070643.ga20...@porcupine.cita.utoronto.ca>, Robin Humble > wrote: >>> turns out that there's a SPOOF_HOST EXTRA_ELEMENT attached to each >>> spoof'd metric, and when 100's of hosts (>40 or so should trigger it) >>> have spoof'd entries, then those add up and then corrupt the summary >>> Metric structure enough to destroy the .type and stop the rrd being >>> generated. >>> I'm guessing it's the same as the MAX_EXTRA_ELEMENTS problem, except >>> for the summary table instead of the host table. >>I took a look at this patch and since I am not able to reproduce the >>problem, it makes it a little unclear as to what is happening. I can't >>really figure out how this patch fixes a problem with the hash table. >>According to the source code, whenever an extra element is parsed, the >>code inserts the extra element into a list of extra data on a per >>metric basis. This means that only one extra element for a spoof host >>is ever stored for a metric. > > yes, it's the summary table that's the problem, not the host table. > >> Then when the code moves into the summary >>data portion, it specifically checks to make sure that it is not >>duplicating an extra element value before it inserts it into the >>summary node (check the for loop at around line #827 in the 3.1.2 >>version of the source code). If it detects a duplicate value, then it >>skips the insert and just updates the rest of the summary node in the >>hash table. 
> > in this loop -> > > for (i = 0; i < sum_metric.ednameslen; i++) { > char *chk_name = getfield(sum_metric.strings, sum_metric.ednames[i]); > char *chk_value = getfield(sum_metric.strings, > sum_metric.edvalues[i]); > > if (!strcasecmp(chk_name, new_name) && !strcasecmp(chk_value, > new_value)) { > found = TRUE; > break; > } > } > > here's an example of what happens for a spoof'd metric -> > > (chk_name == new_name) 1 && (chk_value == new_value) 0 ==> 0 - chk_name > SPOOF_HOST new_name SPOOF_HOST chk_value 10.1.1.30:v30 new_value > 10.1.1.37:v37 > (chk_name == new_name) 1 && (chk_value == new_value) 0 ==> 0 - chk_name > SPOOF_HOST new_name SPOOF_HOST chk_value 10.1.1.31:v31 new_value > 10.1.1.37:v37 > (chk_name == new_name) 1 && (chk_value == new_value) 0 ==> 0 - chk_name > SPOOF_HOST new_name SPOOF_HOST chk_value 10.1.1.32:v32 new_value > 10.1.1.37:v37 > (chk_name == new_name) 1 && (chk_value == new_value) 0 ==> 0 - chk_name > SPOOF_HOST new_name SPOOF_HOST chk_value 10.1.1.33:v33 new_value > 10.1.1.37:v37 > (chk_name == new_name) 1 && (chk_value == new_value) 0 ==> 0 - chk_name > SPOOF_HOST new_name SPOOF_HOST chk_value 10.1.1.34:v34 new_value > 10.1.1.37:v37 > (chk_name == new_name) 1 && (chk_value == new_value) 0 ==> 0 - chk_name > SPOOF_HOST new_name SPOOF_HOST chk_value 10.1.1.35:v35 new_value > 10.1.1.37:v37 > (chk_name == new_name) 1 && (chk_value == new_value) 0 ==> 0 - chk_name > SPOOF_HOST new_name SPOOF_HOST chk_value 10.1.2.80:v176 new_value > 10.1.1.37:v37 > (chk_name == new_name) 1 && (chk_value == new_value) 0 ==> 0 - chk_name > SPOOF_HOST new_name SPOOF_HOST chk_value 10.1.1.36:v36 new_value > 10.1.1.37:v37 > (chk_name == new_name) 1 && (chk_value == new_value) 0 ==> 0 - chk_name > SPOOF_HOST new_name SPOOF_HOST chk_value 10.1.2.81:v177 new_value > 10.1.1.37:v37 > ... 
> > you can see that every EXTRA_ELEMENT "name" field matches, but as > each spoof'd entry comes from a different host, then every "value" is > different, so 'found' is always FALSE. > > so a new EXTRA_ELEMENT is always inserted for every spoof'd host. > ie. for one spoof'd metric on N hosts then there would be N > EXTRA_ELEMENT's stored next to it in the summary table. > > when the number of spoofed hosts is > few * MAX_EXTRA_ELEMENTS, then > corruption occurs in the summary hash. the upshot of which is that the > summary table gets corrupted and the checks in gmetad.c mean that > (unless you get very lucky) the __SummaryInfo__/* rrd file for the > spoof'd metric is never written. > Now I get it. I'll take a look at it from that angle. Brad
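The behaviour Robin describes can be modelled in a few lines. This is a Python model of the duplicate check, not the gmetad C code and not Robin's patch; add_extra_element is a hypothetical name and the value of MAX_EXTRA_ELEMENTS here is an assumption. It shows that N distinct SPOOF_HOST values never count as duplicates, and that an explicit bound check is what keeps the summary entry from overflowing:

```python
MAX_EXTRA_ELEMENTS = 32  # assumed limit for the sketch; check gmetad's headers


def add_extra_element(summary, name, value, limit=MAX_EXTRA_ELEMENTS):
    """Insert (name, value) into a summary's extra-element list.

    Mirrors the case-insensitive name AND value comparison from the loop
    quoted above, then refuses to grow past the fixed limit.
    """
    for chk_name, chk_value in summary:
        if chk_name.lower() == name.lower() and chk_value.lower() == value.lower():
            return "duplicate"
    if len(summary) >= limit:
        return "dropped"          # without this check the C array would overflow
    summary.append((name, value))
    return "inserted"
```

Feeding it 100 hosts with values such as "10.1.1.37:v37" fills the list to the limit and drops the rest, because the names all match but the values never do.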
Re: [Ganglia-developers] Fwd: Getting started with developing a C++ DSO module
>>> On 11/19/2009 at 6:32 AM, in message <75fb37ae0911190532t17685eb0uc1db8390546b4...@mail.gmail.com>, Sylvester Steele wrote: >> Take a look at the pyth_metric_init() function in the mod_python.c module. > At the end of the function, mod_python takes all of the metric definitions > and pushes them into an APR array. Then it sets the metric_info field of the > module structure with the metric_info->elts value. >> >>python_module.metrics_info = (Ganglia_25metric *)metric_info->elts; >> >> Basically it is just a matter of calling the APR function > apr_array_push(metric_info); for each metric definition and then filling in > the structure that is returned. >> >> Brad >> > > Thanks Brad, > > I went through that function you mentioned- if I understood it right- > that function adds different metadata for different metrics. So for > example: > > {0, "cpu_num",1200, GANGLIA_VALUE_UNSIGNED_SHORT, "CPUs", "zero", > "%hu", UDP_HEADER_SIZE+8, "Total number of CPUs"}, >{0, "cpu_speed", 1200, GANGLIA_VALUE_UNSIGNED_INT, "MHz", "zero", > "%u", UDP_HEADER_SIZE+8, "CPU Speed in terms of MHz"}, > > I guess this adds more metadata to say the cpu_speed metric. I don't > want to do that, All my metrics will have the same meta-data- its just > that I don't know how many metrics I will have at compile time. The > number of metrics will be determined at runtime by reading a file. > > I was wondering if I could just run a loop and put in the metadata this way: > > static Ganglia_25metric *mem_metric_info; > > mem_metric_info= new Ganglia_25metric[num_of_metrics]; > > for (i=0 to mem_metric_info[i]= {0, some name,1200, > GANGLIA_VALUE_UNSIGNED_SHORT, "CPUs", "zero", "%hu", DP_HEADER_SIZE+8, > "Total number of CPUs"}, > } > I'm not sure what you are looking for. The purpose of the code that I referred to was to show how a module would generate the metric definitions during the initialization phase of gmond. 
Basically what happens is that when gmond is started it loads each module and calls the metric_init function for each module. At that point each module has the opportunity to tell gmond what metrics it supports by passing back an array of metric definitions. That is how gmond determines which module supports which metrics. The contents of the metric definition array is completely up to the module itself. However once the module returns the list of metric definitions, that list can not be changed until the next time that gmond stops and restarts. There is no way to alter the list of metrics that gmond is monitoring on the fly during normal gmond operation. If the latter is what you are looking for, gmond does not support on-the-fly functionality yet. Brad
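The same "build the list at init time" idea is easiest to see in a gmond Python module, where metric_init returns a list of descriptor dictionaries. The sketch below is an illustration rather than a drop-in answer for the C++ DSO case; the metric_list parameter and the helper names are hypothetical, and the constant callback stands in for real data collection:

```python
def make_callback(value):
    """Return a metric handler that always reports the given value."""
    def callback(name):
        return value
    return callback


def build_descriptors(metric_names):
    """Create one descriptor per name, all sharing the same metadata."""
    descriptors = []
    for name in metric_names:
        descriptors.append({
            "name": name,
            "call_back": make_callback(0),
            "time_max": 90,
            "value_type": "uint",
            "units": "Num",
            "slope": "both",
            "format": "%u",
            "description": "Runtime-defined metric " + name,
            "groups": "example_2",
        })
    return descriptors


def metric_init(params):
    """Called once by gmond at startup; the returned list is fixed until restart."""
    # "metric_list" is a hypothetical module parameter holding comma-separated
    # names; reading the names from a file would work the same way.
    raw = params.get("metric_list", "")
    names = [n.strip() for n in raw.split(",") if n.strip()]
    return build_descriptors(names)


def metric_cleanup():
    """Called by gmond on shutdown; nothing to release in this sketch."""
    pass
```

The number of descriptors is decided at startup and can differ between restarts, which matches the constraint described above: the list cannot change while gmond is running.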
Re: [Ganglia-developers] 3.1.4 to go GA?
I've been running it on a very small set of machines. It all looks good to me. Brad >>> On 11/18/2009 at 9:42 AM, in message , Bernard Li wrote: > I haven't had a chance to test it out yet -- has anybody else been > able to give it a spin? > > Cheers, > > Bernard > > On Wed, Nov 18, 2009 at 7:22 AM, Daniel Pocock wrote: >> >> >> How do people feel about making 3.1.4 GA?
Re: [Ganglia-developers] gmetad no summary for spoof'd data patch
>>> On 11/7/2009 at 12:06 AM, in message <20091107070643.ga20...@porcupine.cita.utoronto.ca>, Robin Humble wrote: > Hi, > > I spoof a bunch of temperature and power metrics via ILOM for a few > hundred nodes and I noticed that gmetad wasn't making a summary table > (.../__SummaryInfo__/*) for most of the spoof'd values. > > turns out that there's a SPOOF_HOST EXTRA_ELEMENT attached to each > spoof'd metric, and when 100's of hosts (>40 or so should trigger it) > have spoof'd entries, then those add up and then corrupt the summary > Metric structure enough to destroy the .type and stop the rrd being > generated. > I'm guessing it's the same as the MAX_EXTRA_ELEMENTS problem, except > for the summary table instead of the host table. > > attached is a simplistic patch that fixes the problem. > it could probably be done better, but works for me. it's against 3.1.2, > but should apply to 3.1.4 as well. > > apologies if I have some of the ganglia/gmetad terminology wrong - I've > been using it for years, but this my first dive into the code. > I took a look at this patch and since I am not able to reproduce the problem, it makes it a little unclear as to what is happening. I can't really figure out how this patch fixes a problem with the hash table. According to the source code, whenever an extra element is parsed, the code inserts the extra element into a list of extra data on a per metric basis. This means that only one extra element for a spoof host is ever stored for a metric. Then when the code moves into the summary data portion, it specifically checks to make sure that it is not duplicating an extra element value before it inserts it into the summary node (check the for loop at around line #827 in the 3.1.2 version of the source code). If it detects a duplicate value, then it skips the insert and just updates the rest of the summary node in the hash table. 
Since I am not able to duplicate the problem, could you step further through the original source code to make sure that the check for a duplicate value is actually happening and that the code is not taking some other path that could be causing the problem? You might also want to check the source code at the point where the summary table is actually written, to see if there is some clue there as to why your summary rrd files are not being created or updated. Brad
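The duplicate check described in this reply can be pictured with a minimal sketch. The types, the function name and the limit below are hypothetical stand-ins, not gmetad's actual structures or its MAX_EXTRA_ELEMENTS value; the sketch only illustrates the idea of "insert an extra element into the summary node unless an identical one is already there", with an explicit bound so that a full table drops the element instead of writing past the end.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins: these are NOT gmetad's real types or limits. */
#define SKETCH_MAX_EXTRA 32

typedef struct {
    const char *name;
    const char *value;
} extra_elem_t;

typedef struct {
    extra_elem_t extra[SKETCH_MAX_EXTRA];
    int count;
} summary_metric_t;

/*
 * Add (name, value) to the summary node unless an identical pair is
 * already present. Returns 1 if added, 0 if it was a duplicate, and
 * -1 if the table is full (the element is dropped and nothing is
 * written past the end of the array).
 */
int summary_add_extra(summary_metric_t *m, const char *name, const char *value)
{
    int i;

    assert(m != NULL && name != NULL && value != NULL);

    for (i = 0; i < m->count; i++) {
        if (strcmp(m->extra[i].name, name) == 0 &&
            strcmp(m->extra[i].value, value) == 0)
            return 0;               /* duplicate: skip the insert */
    }
    if (m->count >= SKETCH_MAX_EXTRA)
        return -1;                  /* full: refuse rather than overflow */

    m->extra[m->count].name = name;
    m->extra[m->count].value = value;
    m->count++;
    return 1;
}
```

If every spoofed host contributes a SPOOF_HOST element with a different value, a check of this kind would never see a duplicate, so only the bound would protect the summary node. That would be consistent with the report of trouble starting at a few dozen hosts, but it is an assumption until someone steps through the real code.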
Re: [Ganglia-developers] Fwd: Getting started with developing a C++ DSO module
>>> On 11/16/2009 at 3:04 PM, in message <002c01ca6708$d7770020$866500...@com>, "Sylvester Steele" wrote:
> Kim,
>
> I got the tarball to which you'd put up the link earlier on in the mailing
> list. I got your module to work no problem there!
>
> But, I have a question:
>
> All the Ganglia modules have a metric array. The mod_cpu has this:
>
> static Ganglia_25metric cpu_metric_info[] =
> {
>     {0, "cpu_num", 1200, GANGLIA_VALUE_UNSIGNED_SHORT, "CPUs", "zero", "%hu", UDP_HEADER_SIZE+8,
>      "Total number of CPUs"},
>     {0, "cpu_speed", 1200, GANGLIA_VALUE_UNSIGNED_INT, "MHz", "zero", "%u", UDP_HEADER_SIZE+8,
>      "CPU Speed in terms of MHz"},
>     {0, "cpu_user", 90, GANGLIA_VALUE_FLOAT, "%", "both", "%.1f", UDP_HEADER_SIZE+8,
>      "Percentage of CPU utilization that occurred while executing at the user level"},
>     {0, "cpu_nice", 90, GANGLIA_VALUE_FLOAT, "%", "both", "%.1f", UDP_HEADER_SIZE+8,
>      "Percentage of CPU utilization that occurred while executing at the user level with nice priority"},
>     {0, "cpu_system", 90, GANGLIA_VALUE_FLOAT, "%", "both", "%.1f", UDP_HEADER_SIZE+8,
>      "Percentage of CPU utilization that occurred while executing at the system level"},
>     {0, "cpu_idle", 90, GANGLIA_VALUE_FLOAT, "%", "both", "%.1f", UDP_HEADER_SIZE+8,
>      "Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request"},
>     {0, "cpu_aidle", 3800, GANGLIA_VALUE_FLOAT, "%", "both", "%.1f", UDP_HEADER_SIZE+8,
>      "Percent of time since boot idle CPU"},
>     {0, "cpu_wio", 90, GANGLIA_VALUE_FLOAT, "%", "both", "%.1f", UDP_HEADER_SIZE+8,
>      "Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request"},
>     {0, "cpu_intr", 90, GANGLIA_VALUE_FLOAT, "%", "both", "%.1f", UDP_HEADER_SIZE+8,
>      "cpu_intr"},
>     {0, "cpu_sintr", 90, GANGLIA_VALUE_FLOAT, "%", "both", "%.1f", UDP_HEADER_SIZE+8,
>      "cpu_sintr"},
>     {0, NULL}
> };
>
> Now, what if the number of metrics that my module monitors changes? I.e., if I
> want to monitor 5 metrics today but 10 tomorrow, is it possible to
> dynamically initialize the metrics somehow?
>
> Will the following work:
>
> For (how many ever metrics)
> {
>     cpu_metric_info[i]= appropriate string
> }
>
> And put this for loop in the metric_init function..
>
> Will that do the trick?
>

Take a look at the pyth_metric_init() function in the mod_python.c module. At the end of the function, mod_python takes all of the metric definitions and pushes them into an APR array. Then it sets the metrics_info field of the module structure to the metric_info->elts value:

python_module.metrics_info = (Ganglia_25metric *)metric_info->elts;

Basically it is just a matter of calling the APR function apr_array_push(metric_info) once for each metric definition and then filling in the structure that is returned.

Brad
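The approach in this reply (push one slot per metric into a growable array, fill it in, then publish the array's element pointer) can be sketched without APR so that it compiles on its own. Everything below is an assumption made for illustration: sketch_metric is a cut-down stand-in for Ganglia_25metric, metric_array plays the role of the APR array, and metric_array_push() only imitates what apr_array_push() is used for here. The zeroed terminating entry mirrors the {0, NULL} terminator of the static table in the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for Ganglia_25metric; only a few fields are kept. */
typedef struct {
    const char *name;   /* NULL marks the end of the table */
    int tmax;
    const char *units;
    const char *fmt;
} sketch_metric;

/* Growable array playing the role of the APR array in mod_python. */
typedef struct {
    sketch_metric *elts;
    size_t nelts;
    size_t nalloc;
} metric_array;

/* Plays the role of apr_array_push(): grow if needed and return the
 * new slot (zeroed here so unset fields are well defined). */
sketch_metric *metric_array_push(metric_array *a)
{
    assert(a != NULL);
    if (a->nelts == a->nalloc) {
        size_t n = a->nalloc ? a->nalloc * 2 : 4;
        sketch_metric *p = realloc(a->elts, n * sizeof *p);
        if (p == NULL)
            return NULL;
        a->elts = p;
        a->nalloc = n;
    }
    memset(&a->elts[a->nelts], 0, sizeof a->elts[0]);
    return &a->elts[a->nelts++];
}

/*
 * Build a table with one entry per name plus a terminating entry whose
 * name is NULL. Returns the element pointer (what a module would
 * publish as its metric table), or NULL on allocation failure.
 */
sketch_metric *build_metric_table(metric_array *a,
                                  const char *const *names, size_t count)
{
    size_t i;
    sketch_metric *m;

    for (i = 0; i < count; i++) {
        m = metric_array_push(a);
        if (m == NULL)
            return NULL;
        m->name = names[i];
        m->tmax = 90;        /* illustrative values only */
        m->units = "%";
        m->fmt = "%.1f";
    }
    m = metric_array_push(a);    /* zeroed slot: name == NULL */
    if (m == NULL)
        return NULL;
    return a->elts;
}
```

A metric_init function would call build_metric_table() with however many names it discovered at startup, so monitoring 5 metrics today and 10 tomorrow needs no change to the table code. In a real module the array would come from apr_array_make() and the slots from apr_array_push(), as described above.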
[Ganglia-developers] Contributing the source code (was:Getting started with developing a C++ DSO module)
>>> On 11/11/2009 at 4:13 PM, in message , JB Kim wrote: > I've written the iostat standalone DSO module a while back in C. I do > have the whole build process documented (to some degree) and provided > template for creating standalone DSO. > > I think you can search for "iostat" in the archives. If you can't find > it, I'll dig it up and reply again. > The Groundwork Open Source people started a monitoring project repository for monitoring related projects and components. There are already a couple of small Ganglia related components in the repository. Would you be interested in posting your iostat module as a Ganglia component in Monitoring Forge? http://monitoringforge.org/ Brad
Re: [Ganglia-developers] Fwd: Getting started with developing a C++ DSO module
>>> On 11/10/2009 at 8:30 PM, in message <75fb37ae0911101930g55978c94u1a16c48fd5cc2...@mail.gmail.com>, Sylvester Steele wrote: > -- Forwarded message -- > From: Sylvester Steele > Date: Tue, Nov 10, 2009 at 10:19 PM > Subject: Re: [Ganglia-developers] Getting started with developing a > C++ DSO module > To: Brad Nicholes > > >>> 1. Compile to .so >>> 2. Place compiled .so in the /usr/lib/ganglia folder >>> 3. Make appropriate changes to the gmond.conf file - so it can pick up >>> the new .so and the mmodule in it >>> 4. Restart gmond >>> > >> It sounds like you are trying to build your module inside of the Ganglia > build environment. Since your module isn't part of the standard Ganglia > modules, you should be creating your own make file and building your module > outside of the Ganglia environment as a stand-alone build. Unfortunately I > don't have a good example of a stand-alone module build to point you to but > there may be others that have done this in the past for a C module. I did > this at one point a couple of years ago just to make sure that we had > everything in place to build a module outside of Ganglia, but I can't seem to > find the makefiles that I used. You might want to search back through the > develop mailing list to see if there were discussions about this. I know > that there was a module contribution about 6 months ago where they were > building their own C module. The code never made it into the repository but > it should still be on the mailing list somewhere. >> > > Oh ok. I'll see if I can find it in the mailing list archives. Will > the 4 steps I've listed above be enough to make Ganglia pick up my > module and start sending the metrics? > Sure. As long as gmond can find the module and load it with all of its dependencies, things should be good. Brad
Re: [Ganglia-developers] Storing Ganglia data in MySql.
>>> On 11/10/2009 at 9:30 AM, in message <4af9952a.9070...@pocock.com.au>, >>> Daniel Pocock wrote: > Brad Nicholes wrote: >>>>> On 11/10/2009 at 4:11 AM, in message >>>>> >> , Himanshu Sharma >> wrote: >> >>> Hello all, >>> >>> We were looking to store Ganglia data in MySql rather than just an >>> RRD. There was a discussion earlier on the same issue - >>> http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg02100. >>> html. >>> It would be great if there was some reusable code available or if >>> there was any outcome out of it as to what could be the best possible >>> approach. >>> >>> >> >> One solution to this was the rewrite of gmetad in python. We did this a > couple of years ago and added it to the SVN repository. One of the new > features of the python rewrite was the introduction of gmetad plugins. The > plugin interface allows you to plug in a python module where you can do > anything you want with the data that is being gathered from the gmond agents. > There are examples of plugins that store data in an RRD database as well as > one that generates email alerts. You should be able to use the RRD database > plugin as an example to easily create a plugin that stores the data in a > MySQL database instead or in addition to RRD. >> >> > Which gmetad is intended to be on the future roadmap? > > For a large site, do you believe it is fair to say that the C > implementation is best for performance? > > I was thinking of patching gmetad so that it can get the metrics from a > local gmond instance using shared memory rather than XML, and some > various other optimizations too > This is yet-to-be-determined. Basically the Ganglia community needs to decide which gmetad will be going forward. The python rewrite adds some new functionality which is not available in the C version. However if the C version continues to be good enough, then the python rewrite may never really see the light of day. 
However, if there are more and more requests for metric data to be stored or used in different ways outside of just trending, then the python rewrite of gmetad will probably be the way to go rather than trying to enhance the C version with the same features. The python rewrite of gmetad has obviously not had the same level of testing and stabilization as the C version, which puts the python version at a disadvantage. However, with the plugin interface, gmetad and Ganglia could really grow up to include not just trending but alerts, health and complex data analysis too (or anything else you can dream up for a plugin). So the bottom line is that the future roadmap of gmetad is up to the community and who wants to step up to make things happen. Brad
Re: [Ganglia-developers] Storing Ganglia data in MySql.
>>> On 11/10/2009 at 4:11 AM, in message , Himanshu Sharma wrote: > Hello all, > > We were looking to store Ganglia data in MySql rather than just an > RRD. There was a discussion earlier on the same issue - > http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg02100. > html. > It would be great if there was some reusable code available or if > there was any outcome out of it as to what could be the best possible > approach. > One solution to this was the rewrite of gmetad in python. We did this a couple of years ago and added it to the SVN repository. One of the new features of the python rewrite was the introduction of gmetad plugins. The plugin interface allows you to plug in a python module where you can do anything you want with the data that is being gathered from the gmond agents. There are examples of plugins that store data in an RRD database as well as one that generates email alerts. You should be able to use the RRD database plugin as an example to easily create a plugin that stores the data in a MySQL database instead or in addition to RRD. Brad
Re: [Ganglia-developers] Getting started with developing a C++ DSO module
>>> On 11/2/2009 at 9:00 PM, in message <007e01ca5c3a$2ded5eb0$89c81c...@com>, "Sylvester Steele" wrote: > Hi Folks, > > I want to develop a C++ DSO for ganglia. While I did see a bit of > documentation for a python based thing, I haven't seen much for a C++ DSO. > So where should I begin if I want to develop a C++ DSO? > > Also, How do I search the mailing list archive? > > Thanks, > Sylvester Since gmond is built on top of the APR (Apache Portable Runtime) and the ganglia modules were patterned after apache modules, building a ganglia module in C++ should be very similar to building an Apache module in C++. You could probably start by looking at http://marc.info/?l=apache-modules&m=115406581404410&w=2 and there should be a lot more information about building Apache modules in C++. The bottom line is that you would need to put extern "C" {} around the module structure. Keep in mind that as you find information about how to build an apache module in C++, Ganglia modules are much simpler than Apache modules, so you wouldn't have to worry about the hooks or anything else like that in a ganglia module. Brad
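The linkage point in this reply can be shown with the usual guard idiom, which is valid C and takes effect only when the same file is compiled as C++. This is a sketch under stated assumptions: sketch_module, sketch_init, sketch_read and example_module are hypothetical names for illustration and are not Ganglia's real mmodule structure or callbacks. The point is only where the extern "C" block goes, so that the loader can find the module symbol under its unmangled name.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __cplusplus
extern "C" {            /* only seen by a C++ compiler */
#endif

/* Hypothetical, simplified module descriptor; not Ganglia's real mmodule. */
typedef struct {
    const char *name;
    int (*init)(void);
    double (*read_metric)(int index);
} sketch_module;

int sketch_init(void);
double sketch_read(int index);

/* The symbol a host program would look up by name after loading the DSO. */
extern sketch_module example_module;

#ifdef __cplusplus
}
#endif

/* Callbacks; in a C++ build these bodies could freely use C++ internally. */
int sketch_init(void)
{
    return 0;
}

double sketch_read(int index)
{
    assert(index >= 0);
    return index == 0 ? 42.0 : 0.0;
}

sketch_module example_module = { "example", sketch_init, sketch_read };
```

Because the declarations inside the guard already carry C linkage, the definitions that follow keep it even in a C++ translation unit. Only the descriptor and the callbacks it points to need this treatment; helper classes and functions behind them can stay ordinary C++.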
Re: [Ganglia-developers] release names?
>>> On 10/27/2009 at 4:23 AM, in message <4ae6ca27.9080...@pocock.com.au>, >>> Daniel Pocock wrote: > > In the wiki, it says `version numbers are cheap' > http://sourceforge.net/apps/trac/ganglia/wiki/how_project_works > > However, the convention of naming the releases puts a little bit more > emphasis on the significance of each tag and release. Skipping a > release (e.g. 3.1.3) certainly doesn't give due credit to some of those > people the releases are named after. > > To encourage more frequent releases (maybe even every 4-6 weeks?) maybe > release names should be dropped, and only the version number used? > +1 release names are nice, but I haven't really found any use for them. The version number is the most important thing. Brad
Re: [Ganglia-developers] Feeble attempt at gmond aliasing
Unless I am misunderstanding the issue, a missing configuration option shouldn't be a problem for libconfuse. Follow the 'Title' configuration directive on a metric. Every metric can optionally have a title that is ultimately passed up through the XML. The code in gmond.c asks libconfuse for the title when the metric definition is read. If no title has been given in the configuration file, then libconfuse returns NULL when asked for the title.

Brad

>>> On 10/21/2009 at 8:58 AM, in message , Jesse Becker wrote:
> Minor update on this:
>
> It appears that libconfuse is completely unable to handle
> missing/default values for configuration options[1]. So adding an
> 'alias' option to gmond will mean that every gmond.conf file has to be
> updated to include an "alias=" line.
>
> The libconfuse documentation is...limited. Could someone more
> familiar with it than I am offer suggestions as to how to set a
> default value and handle the case where the "alias=" line is not
> present?
>
> [1] This is a really stupid design decision, IMO. :-(
>
> On Thu, Oct 1, 2009 at 22:08, Jesse Becker wrote:
>> Here's my poor attempt at a patch to add aliasing to gmond, in an
>> effort to stimulate some discussion on the topic. The patch is
>> against trunk. I've done some basic testing (e.g. no immediate core
>> dumps), but that's it for the moment.
>>
>> Comments? Improvements?
>>
>> Index: lib/libgmond.c
>> ===
>> --- lib/libgmond.c (revision 2093)
>> +++ lib/libgmond.c (working copy)
>> @@ -66,6 +66,7 @@
>>    CFG_BOOL("gexec", 0, CFGF_NONE),
>>    CFG_INT("send_metadata_interval", 0, CFGF_NONE),
>>    CFG_STR("module_dir", NULL, CFGF_NONE),
>> +  CFG_STR("alias",NULL,CFGF_NONE),
>>    CFG_END()
>>  };
>>
>> Index: gmond/gmond.c
>> ===
>> --- gmond/gmond.c (revision 2093)
>> +++ gmond/gmond.c (working copy)
>> @@ -301,6 +301,18 @@
>>  }
>>
>>  static void
>> +handle_alias( void ) {
>> +  cfg_t *tmp = cfg_getsec( config_file, "globals");
>> +  char *tmp_myname;
>> +  /* Allow for hostname aliases */
>> +  tmp_myname = cfg_getstr(tmp, "alias");
>> +  if (tmp_myname) {
>> +    strncpy(myname, tmp_myname, APRMAXHOSTLEN);
>> +    debug_msg("Aliasing hostname to [%s]", myname);
>> +  }
>> +}
>> +
>> +static void
>>  daemonize_if_necessary( char *argv[] )
>>  {
>>    int should_daemonize;
>> @@ -2630,6 +2642,8 @@
>>
>>    gmond_argv = argv;
>>
>> +  myname[0] = '\0';
>> +
>>    if (cmdline_parser (argc, argv, &args_info) != 0)
>>      exit(1) ;
>>
>> @@ -2658,6 +2672,7 @@
>>    }
>>
>>    process_configuration_file();
>> +  handle_alias();
>>
>>    if(args_info.metrics_flag)
>>    {
>> @@ -2686,7 +2701,8 @@
>>    load_metric_modules();
>>
>>    /* Collect my hostname */
>> -  apr_gethostname( myname, APRMAXHOSTLEN+1, global_context);
>> +  if (!*myname)
>> +    apr_gethostname( myname, APRMAXHOSTLEN+1, global_context);
>>
>>    apr_signal( SIGPIPE, SIG_IGN );
>>    apr_signal( SIGINT, sig_handler );
>>
>>
>> --
>> Jesse Becker
>>
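The behaviour described in the reply (an option that is absent from the file simply reads back as NULL) makes the "alias or real hostname" decision a small, testable function. The following is a minimal sketch, not the patch's code: choose_hostname and its parameters are hypothetical names, the alias argument stands for whatever the configuration lookup returned, and the fallback stands for the result of the hostname lookup. It does not call libconfuse or APR, so that it stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Fill `out` (capacity `outlen`, including the terminator) with the alias
 * if one was configured, otherwise with the fallback hostname.
 * `alias` may be NULL, which is how a string option with a NULL default
 * reads when the line is absent from the configuration file.
 * Returns 1 if the alias was used, 0 if the fallback was used.
 */
int choose_hostname(char *out, size_t outlen,
                    const char *alias, const char *fallback)
{
    const char *src;
    int used_alias;

    assert(out != NULL && outlen > 0 && fallback != NULL);

    used_alias = (alias != NULL && alias[0] != '\0');
    src = used_alias ? alias : fallback;

    strncpy(out, src, outlen - 1);
    out[outlen - 1] = '\0';   /* strncpy does not terminate on truncation */
    return used_alias;
}
```

Writing the terminator explicitly means the result is a valid string even when the alias is longer than the buffer, without relying on how the destination was initialised. Treating an empty alias the same as a missing one is a design choice of this sketch, not something stated in the thread.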
Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.3 beta ready for testing
>>> On 10/11/2009 at 10:36 PM, in message <4ad2b254.9090...@pocock.com.au>, >>> Daniel Pocock wrote: > Bernard Li wrote: >> Hi Brad: >> >> On Thu, Oct 1, 2009 at 3:57 PM, Brad Nicholes wrote: >> >> >>> If this is just a simple fix, then I would vote for scrapping 3.1.3, rolling > 3.1.4 with the fix and resetting the test period. The other option, since > this isn't a regression, would be to release 3.1.3 as is with the defect > noted in the release notes. Then release 3.1.4 next month with the fixes. I > would vote for the first option, but I'm OK with the second if that is the > way everybody else wants to go. >>> >> >> Since Daniel is the Release Manager on 3.1.3, I'd rather defer this >> decision to him. However he's on vacation for another week so perhaps >> we can hold off on the release until then. >> > > Another issue I found: the gmond binary built on RHEL3 can't run > properly because APR_POLLSET_THREADSAFE is not supported on that > platform. The fix for this is relatively trivial, we will only use that > option for kernel >= 2.6. > > I think it is best to pass over 3.1.3 - although people should still > test it and report their results - and aim to release 3.1.4 to beta by > the end of this week. I'm happy to volunteer as release manager for > 3.1.4 as well, given that it follows on from the 3.1.3 evaluation process. Sounds good - your call. Brad
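The "kernel >= 2.6" condition mentioned in the quoted message can be expressed as a small version comparison. This is only a sketch of one way to make that decision at run time; the thread does not say whether the actual fix checks at run time or at configure time, and kernel_at_least is a hypothetical helper name. The release string is assumed to look like the one reported by uname, for example "2.6.18-92.el5".

```c
#include <assert.h>
#include <stdio.h>

/*
 * Return 1 if a kernel release string such as "2.6.18-92.el5" is at least
 * major.minor, 0 if it is older, and -1 if it cannot be parsed.
 */
int kernel_at_least(const char *release, int major, int minor)
{
    int maj = 0, min = 0;

    assert(release != NULL);
    if (sscanf(release, "%d.%d", &maj, &min) != 2)
        return -1;
    if (maj != major)
        return maj > major;
    return min >= minor;
}
```

A caller could add APR_POLLSET_THREADSAFE to the flags it passes to apr_pollset_create() only when this returns 1, and fall back to plain flags when it returns 0 or -1. Treating an unparseable release string as "not supported" is the conservative choice assumed here.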
Re: [Ganglia-developers] Feeble attempt at gmond aliasing
>>> On 10/2/2009 at 6:34 AM, in message , Jesse Becker wrote: > On Fri, Oct 2, 2009 at 01:43, Rick Cobb wrote: >> Well, as far as generating discussion goes, I think we're better off >> only aliasing/spoofing IP addresses @ the gmond level, and resolving >> all names with gmetad. That removes all issues of, e.g., whether the >> host thinks it should send a FQDN or just a basename, or how well >> dns / resolv.conf is set up on every machine in every cluster, etc. >> Only the gmetad servers need to have well-configured resolvers, and >> there are orders of magnitude fewer of those in many networks. >> Besides: fewer system calls on the boxes that are doing the real work >> our clusters are built to do. > > All good points. Sending only the IP address also potentially could > make the packets just slightly smaller, as an IPv4 address will fit > into 32 bits total, instead of one byte per character. (Of course, > this nicely avoids the whole IPv6 and wide-char hostname issue.) > > In my (again feeble) defense, there's also nothing stopping anyone > from setting IP addresses in the "alias=" field. > > There are, it seems, two issues related to this. The first is many > people have requested aliasing abilities for gmond for various > reasons. The other is a broader shift in what gmond actually reports > (i.e. sending FQDN or just IP). Fixing the first issue doesn't > prevent fixing the 2nd issue; do it in stages. > >> Never did get that patch finished, though, so I probably should stay >> out of the discussion :-) > > Incorrect! :-) Finish your patch, and let's see it. I'm not deeply > attached to what I posted. How well does this fit into the previous discussions of using a GUID to identify a box rather than an IP or FQDN? Are aliasing and GUID identifiers related or are they two separate issues? Brad
Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.3 beta ready for testing
>>> On 10/1/2009 at 4:33 PM, in message , Bernard Li wrote: > So has anybody else given 3.1.3 a test run? > > I have found some minor issues. > > It looks like there are new configure options added in regards to > setuid and setgid: > > --enable-debug turn on debugging output and compile options > --enable-gexec turn on gexec support (platform-specific) > "--enable-setuid=USER turn on setuid support (default setuid=nobody)" > "--enable-setgid=GROUP turn on setgid support (default setgid=daemon)" > > There are 2 issues: > > - extra quotation marks in the text > - --enable-setuid is OFF by default. This is the opposite behaviour > from previous released versions > > On top of that, our spec file has not been updated with this new > configure option and therefore the RPMs I posted do *not* setuid. > > I'm not sure if we should consider this as show stopper, but a simple > fix would simply be to change the default configure option so that it > reflects the previous behaviour. > > Please let me know what you guys think. > If this is just a simple fix, then I would vote for scrapping 3.1.3, rolling 3.1.4 with the fix and resetting the test period. The other option, since this isn't a regression, would be to release 3.1.3 as is with the defect noted in the release notes. Then release 3.1.4 next month with the fixes. I would vote for the first option, but I'm OK with the second if that is the way everybody else wants to go. Brad
Re: [Ganglia-developers] Ganglia 3.1.3 beta ready for testing
Up and running on my SLES-10.2 test machine. Everything is looking good so far. thanks, Brad >>> On 9/18/2009 at 11:09 PM, in message , Bernard Li wrote: > Dear all: > > The Ganglia 3.1.3 beta is now ready for testing at: > > http://ganglia.info/testing > > Changelog for this release: > > * gmond: Fix the allow_extra_data configuration directive(BUG199) > * gmond: Ensure that a complete XML dump is delivered before closing > the send socket. Submitted by: Jerry > * gmond: add bind and bind_hostname parameters for udp_send_channel() > * gmetad: BUG232: eliminate case-sensitive hostname bug, user can > choose to maintain legacy behavior though > * gmond: BUG237: revise fix for segfault on Solaris where first CPU > not in slot 0 > * gmond: support for HUP signal on platforms with execve > * gmond: delay daemonization until after other initialization steps are done > * gmond: status module: return gmond version info as string metrics > * gmond: Check return status of apr_pollset_create. Use > APR_POLLSET_THREADSAFE on Linux. > * build: various configure options: Solaris 8 with Sun Studio 11 > support, extra modules for static linking, default setuid, release > number, build multicpu and status during static builds, support for > SYSCONFDIR (BUG16) > * RPM: include status module, allow packager to supply own gmond.conf > * build: Look in lib64 rather than lib for apr, confuse and expat on > x86_64 Linux builds > * Bug fixes and Enhancements > > Special thanks to Daniel Pocock for volunteering to be the release > manager and other developers for their hard work in putting this > release together. > > The window for testing will be two weeks and if no major bugs are > found this will be released as the official 3.1.3 release. > > Thanks for your continual support, and please let us know if you run > into any issues! > > Cheers, > > Bernard > > (on behalf of the Ganglia team)
Re: [Ganglia-developers] Preparing for 3.1.3
>>> On 9/16/2009 at 8:26 AM, in message <4ab0f594.4050...@pocock.com.au>, Daniel Pocock wrote: > > As discussed a few weeks back, I'm volunteering to manage the 3.1.3 release. > > Most of the changes were made a few weeks ago now, and I've been running > some of these patches on several platforms for some time, so I don't > think we need to allow a lot of time for additional testing. What I'd > propose is that 3.1.3 is tagged Friday as a beta, and if it is good, > then we make it GA two weeks later. > > If problems are found, then 3.1.3 will not be made GA, and we will aim > to gather feedback and release 3.1.4 in 2-3 weeks. I'm also happy to be > the release manager for that follow-up release if it becomes necessary. > > I note that part of the release process involves setting the release > name for the next release - is this up to the release manager's > initiative, or are there some rules for this? > > This sounds like a good plan. I am backporting one patch from the status file that fixes the allow_extra_data configuration directive. I should have it committed in the next few minutes but well before the Friday tag. ;) As far as the release name goes, I think it is basically up to the Release Manager. In the past the release name has kind of been following an aviation theme. But I'm not sure if that is a real rule or not. Bernard is probably the best person to answer this question. Brad
Re: [Ganglia-developers] gmetad compiled with -O0
>>> On 9/16/2009 at 8:03 AM, in message <4ab0f04c.9060...@pocock.com.au>, Daniel Pocock wrote: > > I notice in gmetad/Makefile.am that AM_CFLAGS includes -O0 > > This is the case for both 3.1 and trunk. > > Is this intended for some reason? I've looked through the SVN history, > I can see that it has always been this way since AM_CFLAGS was added to > Makefile.am in r384 > > It refuses to build on Solaris 8/Sun Studio 11 with this setting, so > I'll be removing it for 3.1.3 > If it's not there already, the -O0 should probably be moved to the --enable-debug option. Brad
Re: [Ganglia-developers] bugzilla/roadmap/3.1.3?
>>> On 8/19/2009 at 8:42 AM, in message <4a8c0f3a.5080...@pocock.com.au>, Daniel Pocock wrote:
> Bernard Li wrote:
>> On Tue, Aug 18, 2009 at 8:22 AM, Brad Nicholes wrote:
>>
>>> I'm not sure that there has been any definitive issue tracker for releases,
>>> at least not for the 3.1.x releases. The road map going forward has been
>>> basically left up to the community. For 3.1.0, .1 and .2 releases, I
>>> volunteered as the release manager with a lot of help from Bernard. For me,
>>> it was just a matter of recognizing that there was enough new functionality
>>> or bug fixes to warrant a new release. At the time it was basically being
>>> driven by the modular metric functionality. The 3.1.2 release of Ganglia
>>> basically finished off all of the functionality that I had in mind. But I'm
>>> sure that there is more that could be done in that area. At one point we had
>>> created a wish list which was published on the wiki site
>>> http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_wish-list. I don't
>>> think that these items were ever entered into bugzilla as enhancements and
>>> I'm also not sure how accurate the list is anymore. It should probably be
>>> updated. With all of the work that you and others have done recently, it
>>> might be a good time to produce a 3.1.3 release. You might want to take a
>>> look at the Ganglia wiki page
>>> http://sourceforge.net/apps/trac/ganglia/wiki/how_project_works under the
>>> section "Release Manager and Additional Release Information" for an idea of
>>> how it was done in the past. Anyway, if you think it is time to release
>>> 3.1.3, I'll support that.
>>
>> I could help with the release when 3.1.3 is ready.
>
> I'll obviously be willing to support those features that I've added,
> particularly when any release candidate is issued and if any problems arise.
>
> Don't release just yet though, there's probably one other change I'd
> like to include for 3.1.3 - eliminating calls to sleep() in the gmetad
> threads, and making the randomization more reliable for lower intervals
> (making it a percentage of the interval rather than an absolute value).
> Any preference for nanosleep or apr_sleep? I notice that apr doesn't
> appear to be used from within gmetad/*.c, and I wasn't sure why.

It would be nice to put gmond and gmetad fully on top of APR. That would make cross platform compatibility much easier and probably eliminate the need for cygwin on Windows.

BTW, whenever you think that 3.1.3 is ready, just volunteer to be the Release Manager and Bernard and I can help you through producing the release candidate(s) and final release. It is actually very simple.

Brad
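Editorial sketch: the randomization Daniel describes above (a percentage of the interval rather than an absolute value) could look roughly like the following. This is an illustration in Python, not gmetad's C code; the function name `jittered_interval` and the 10% default are assumptions made for the example.

```python
import random


def jittered_interval(interval, fraction=0.1, rng=random):
    """Return a sleep time within +/- `fraction` of `interval` (seconds).

    Scaling the jitter with the interval keeps it meaningful for short
    polling intervals, where a fixed number of seconds could be as large
    as the interval itself.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    spread = interval * fraction
    return interval + rng.uniform(-spread, spread)


if __name__ == "__main__":
    # Hypothetical polling intervals, in seconds.
    for interval in (1, 15, 60):
        print(interval, round(jittered_interval(interval), 3))
```

With a fixed jitter of, say, 5 seconds, a 1 second interval would be dominated by the random part; with a proportional jitter the spread stays at the chosen fraction for every interval.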
Re: [Ganglia-developers] bugzilla/roadmap/3.1.3?
>>> On 8/18/2009 at 5:09 AM, in message <4a8a8bd8.4080...@pocock.com.au>, Daniel Pocock wrote:
> Hi,
>
> I've just been looking at Bugzilla to try and establish what is pending
> for 3.1.3
>
> I did a search for items that are blocking, critical or major, 12 items
> found
>
> Most of them appear to have been there for a while, despite several
> other releases. Some of them are no longer bugs.
>
> Is some other mechanism being used to track the issues that must be
> satisfied for the next release, e.g. 3.1.3?

I'm not sure that there has been any definitive issue tracker for releases, at least not for the 3.1.x releases. The road map going forward has been basically left up to the community. For 3.1.0, .1 and .2 releases, I volunteered as the release manager with a lot of help from Bernard. For me, it was just a matter of recognizing that there was enough new functionality or bug fixes to warrant a new release. At the time it was basically being driven by the modular metric functionality. The 3.1.2 release of Ganglia basically finished off all of the functionality that I had in mind. But I'm sure that there is more that could be done in that area.

At one point we had created a wish list which was published on the wiki site http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_wish-list. I don't think that these items were ever entered into bugzilla as enhancements and I'm also not sure how accurate the list is anymore. It should probably be updated.

With all of the work that you and others have done recently, it might be a good time to produce a 3.1.3 release. You might want to take a look at the Ganglia wiki page http://sourceforge.net/apps/trac/ganglia/wiki/how_project_works under the section "Release Manager and Additional Release Information" for an idea of how it was done in the past. Anyway, if you think it is time to release 3.1.3, I'll support that.

Brad
Re: [Ganglia-developers] Ganglia Gmond Memory
>>> On 7/30/2009 at 2:30 PM, in message <669f1ab30907301330s2944e0cxa31c21fea1a5...@mail.gmail.com>, Mahendra Kutare wrote:
> On Thu, Jul 30, 2009 at 12:25 PM, Brad Nicholes wrote:
>
>> >>> On 7/30/2009 at 9:08 AM, in message
>> <669f1ab30907300808y67c403eev9a1653240c27c...@mail.gmail.com>, Mahendra Kutare wrote:
>> > On Thu, Jul 30, 2009 at 10:31 AM, Brad Nicholes wrote:
>> >
>> >> >>> On 7/29/2009 at 11:23 PM, in message
>> >> <669f1ab30907292223t2734f551lc8d9b98201d7f...@mail.gmail.com>, Mahendra Kutare wrote:
>> >> > Hi All,
>> >> >
>> >> > If I have configured gmond.conf with a udp_recv_channel with just a port
>> >> > number will that configure ganglia gmond to listen on that particular port
>> >> > any incoming data and thus making it essentially unicast communication
>> >> > channel ?
>> >>
>> >> Yes, specifying just a port will configure gmond's recv channel in unicast
>> >> mode
>> >>
>> >> > What happens if the sending side sends data every 1 sec will that be
>> >> > transferred immediately to gmond or it waits to collects some packets of
>> >> > data and then delivers to gmond listening side ?
>> >> >
>> >> > I started sending some data from outside of gmond interface to gmond which
>> >> > is configured as mentioned above to a udp_recv_channel on port 8108.
>> >> >
>> >> > Now even though the sending side is pushing data in every 1sec. I do not see
>> >> > gmond showing in debug mode on the console that its processing Ganglia
>> >> > message from sender side every 1 sec.
>> >> >
>> >> > Is it just the display part of the problem or ganglia does some
>> >> > sophisticated processing of incoming data i.e waiting for a message size
>> >> > before delivering it ?
>> >>
>> >> How did you configure gmond to send data every 1 sec.? Gmond sends its
>> >> data in collection groups and each collection group is configured with a
>> >> send time threshold. At the very worst, the collection group will send all
>> >> of the metric values within that group once the group's collection threshold
>> >> has been exceeded. In addition, each metric is assigned a value threshold
>> >> which is a percent of change differential. If any of the metrics within the
>> >> collection group, differential change exceeds the value threshold, the
>> >> entire group of metrics is immediately sent. So even though a collection
>> >> group is set to collect every 1 second, that doesn't mean that the metrics
>> >> are sent every 1 second. Also, by default the rrd files are configured by
>> >> gmetad to store metrics at an interval of every 15 seconds. So even if the
>> >> metrics were sent every 1 second, you will still only be seeing 15 second
>> >> averages in the front end.
>> >
>> > Thanks Brad. I am trying to do it to understand the ganglia protocol and
>> > this helps.
>> > Right now its fine with me even if Gmetad sees only 15 seconds average in
>> > frontend as you described.
>> >
>> > So as I see there are other configuration in collection groups such as -
>> >
>> > 1. collect_once and collect_every
>> >
>> > I understand that collect_once with make some collection to be collected
>> > only once and just send it other gmond every time_threshold.
>> > Also, If I am not wrong If I configured collect_every = 20 and
>> > time_threshold=90, gmond will collect every 20 sec and send every 90 sec to
>> > other gmond.
>>
>> Under normal circumstances it will send every 90 seconds but if one of the
>> metric value_thresholds has been exceeded, the entire collection group will
>> be sent immediately. The purpose for this is to make sure that
>> abnormalities or spikes are caught and reported.
>>
>> > Now the part I am not clear is if I am collecting more frequently than I am
>> > sending does that mean we are keeping more in memory ? I mean say after
>> > first
Re: [Ganglia-developers] Ganglia Gmond
>>> On 7/30/2009 at 9:08 AM, in message <669f1ab30907300808y67c403eev9a1653240c27c...@mail.gmail.com>, Mahendra Kutare wrote:
> On Thu, Jul 30, 2009 at 10:31 AM, Brad Nicholes wrote:
>
>> >>> On 7/29/2009 at 11:23 PM, in message
>> <669f1ab30907292223t2734f551lc8d9b98201d7f...@mail.gmail.com>, Mahendra Kutare wrote:
>> > Hi All,
>> >
>> > If I have configured gmond.conf with a udp_recv_channel with just a port
>> > number will that configure ganglia gmond to listen on that particular port
>> > any incoming data and thus making it essentially unicast communication
>> > channel ?
>>
>> Yes, specifying just a port will configure gmond's recv channel in unicast
>> mode
>>
>> > What happens if the sending side sends data every 1 sec will that be
>> > transferred immediately to gmond or it waits to collects some packets of
>> > data and then delivers to gmond listening side ?
>> >
>> > I started sending some data from outside of gmond interface to gmond which
>> > is configured as mentioned above to a udp_recv_channel on port 8108.
>> >
>> > Now even though the sending side is pushing data in every 1sec. I do not see
>> > gmond showing in debug mode on the console that its processing Ganglia
>> > message from sender side every 1 sec.
>> >
>> > Is it just the display part of the problem or ganglia does some
>> > sophisticated processing of incoming data i.e waiting for a message size
>> > before delivering it ?
>>
>> How did you configure gmond to send data every 1 sec.? Gmond sends its
>> data in collection groups and each collection group is configured with a
>> send time threshold. At the very worst, the collection group will send all
>> of the metric values within that group once the group's collection threshold
>> has been exceeded. In addition, each metric is assigned a value threshold
>> which is a percent of change differential. If any of the metrics within the
>> collection group, differential change exceeds the value threshold, the
>> entire group of metrics is immediately sent. So even though a collection
>> group is set to collect every 1 second, that doesn't mean that the metrics
>> are sent every 1 second. Also, by default the rrd files are configured by
>> gmetad to store metrics at an interval of every 15 seconds. So even if the
>> metrics were sent every 1 second, you will still only be seeing 15 second
>> averages in the front end.
>
> Thanks Brad. I am trying to do it to understand the ganglia protocol and
> this helps.
> Right now its fine with me even if Gmetad sees only 15 seconds average in
> frontend as you described.
>
> So as I see there are other configuration in collection groups such as -
>
> 1. collect_once and collect_every
>
> I understand that collect_once with make some collection to be collected
> only once and just send it other gmond every time_threshold.
> Also, If I am not wrong If I configured collect_every = 20 and
> time_threshold=90, gmond will collect every 20 sec and send every 90 sec to
> other gmond.

Under normal circumstances it will send every 90 seconds, but if one of the metric value_thresholds has been exceeded, the entire collection group will be sent immediately. The purpose of this is to make sure that abnormalities or spikes are caught and reported.

> Now the part I am not clear is if I am collecting more frequently than I am
> sending does that mean we are keeping more in memory ? I mean say after
> first occurance of collect in 20 sec if I am not sending it across to gmonds
> am I just keeping it in memory hash ? If not, whats the behaviour ?

No. If you are collecting every 20 seconds but the collection group is only sending every 90 seconds, the only metric value that is sent or reported is the last one collected within the 90 second interval. This is the purpose of the metric value_threshold. If, for example, you collected a metric 4 times within a 90 second period and the delta between the collected values only varied by 5 percent, storing and reporting each of them would just end up being noise on the wire, because the percent of change between the values is insignificant. So just sending the last value collected is good enough in this case. However, if the metric saw a spike within the 90 second period but then immediately dropped back to normal, you want to make sure that the spike is sent and recorded, so gmond sends it immediately.

> 2. What does this configuation *cleanup_threshold* = 300 /*secs *
Re: [Ganglia-developers] Ganglia Gmond
>>> On 7/29/2009 at 11:23 PM, in message <669f1ab30907292223t2734f551lc8d9b98201d7f...@mail.gmail.com>, Mahendra Kutare wrote:
> Hi All,
>
> If I have configured gmond.conf with a udp_recv_channel with just a port
> number will that configure ganglia gmond to listen on that particular port
> any incoming data and thus making it essentially unicast communication
> channel ?

Yes, specifying just a port will configure gmond's recv channel in unicast mode.

> What happens if the sending side sends data every 1 sec will that be
> transferred immediately to gmond or it waits to collects some packets of
> data and then delivers to gmond listening side ?
>
> I started sending some data from outside of gmond interface to gmond which
> is configured as mentioned above to a udp_recv_channel on port 8108.
>
> Now even though the sending side is pushing data in every 1sec. I do not see
> gmond showing in debug mode on the console that its processing Ganglia
> message from sender side every 1 sec.
>
> Is it just the display part of the problem or ganglia does some
> sophisticated processing of incoming data i.e waiting for a message size
> before delivering it ?

How did you configure gmond to send data every 1 sec.? Gmond sends its data in collection groups and each collection group is configured with a send time threshold. At the very worst, the collection group will send all of the metric values within that group once the group's collection threshold has been exceeded. In addition, each metric is assigned a value threshold which is a percent of change differential. If the differential change of any metric within the collection group exceeds its value threshold, the entire group of metrics is sent immediately. So even though a collection group is set to collect every 1 second, that doesn't mean that the metrics are sent every 1 second.

Also, by default the rrd files are configured by gmetad to store metrics at an interval of every 15 seconds. So even if the metrics were sent every 1 second, you would still only see 15 second averages in the front end.

Brad
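Editorial sketch: the send behaviour described in the reply above can be modelled in a few lines of Python. This is a simplified model of a single metric, not gmond's actual code. It follows the description in this thread, which treats value_threshold as a percentage change; check the gmond.conf documentation for the actual units. The function names and the sample values are assumptions made for the example.

```python
def percent_change(old, new):
    """Percentage change from `old` to `new` (model assumption, see lead-in)."""
    if old == 0:
        return 0.0 if new == 0 else float("inf")
    return abs(new - old) / abs(old) * 100.0


def simulate(samples, collect_every, time_threshold, value_threshold_pct):
    """Return (time, value, reason) for every sample that would be sent.

    `samples` are the values collected at t = 0, collect_every,
    2 * collect_every, ... seconds.
    """
    sent = []
    last_sent_time = None
    last_sent_value = None
    for i, value in enumerate(samples):
        now = i * collect_every
        if last_sent_time is None:
            reason = "first"
        elif percent_change(last_sent_value, value) > value_threshold_pct:
            reason = "value_threshold"   # spike: send immediately
        elif now - last_sent_time >= time_threshold:
            reason = "time_threshold"    # nothing notable: periodic send
        else:
            continue                     # collected but not sent
        sent.append((now, value, reason))
        last_sent_time, last_sent_value = now, value
    return sent


if __name__ == "__main__":
    # Hypothetical metric: steady around 10 with one spike to 50.
    samples = [10, 10.2, 10.1, 10.3, 50, 10.2, 10.1, 10.0, 10.1, 10.2, 10.1]
    for event in simulate(samples, collect_every=20, time_threshold=90,
                          value_threshold_pct=10):
        print(event)
```

In this model the small fluctuations between collections are never put on the wire, the spike and the drop back to normal are each sent as soon as they are collected, and a quiet metric is still sent once the time threshold has elapsed. Real gmond sends the whole collection group, not just the one metric that crossed its threshold.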
Re: [Ganglia-developers] Code pointing to metrics collection->call back handler->pushing to channel
>>> On 7/29/2009 at 2:17 PM, in message <669f1ab30907291317h4162d8c5m3dfd008d4187b...@mail.gmail.com>, Mahendra Kutare wrote:
> Hi,
>
> Can someone point me to the ganglia code files where the core metrics
> collection happens , which then initiates call to call back handler, stores
> the data in memory hash or sends it to the wire to say gmetad ?
>
> Thanks
> Mahendra

All of that is done in the main gmond.c file.

Brad
Re: [Ganglia-developers] mod_gstatus disabled?
>>> On 7/28/2009 at 9:32 AM, in message <4a6f1a0a.7070...@pocock.com.au>, Daniel Pocock wrote:
> Brad Nicholes wrote:
>>>>> On 7/28/2009 at 7:27 AM, in message <4a6efcac.4060...@pocock.com.au>, Daniel Pocock wrote:
>>> I noticed a few things about mod_gstatus:
>>>
>>> - the spec file doesn't include it at all, and deliberately removes the
>>> config file for it
>>>
>>> - gmond/modules/Makefile.am excludes it from static builds
>>>
>>> Given that Ganglia is modular, is there a good reason for not having
>>> this module in the RPM along with all the other modules?
>>>
>>> I successfully compiled it on Cygwin (static build), so is there also a
>>> reason for not having it on static builds, or in other words, does
>>> anyone object if I tweak Makefile.am so it will be in the static build
>>> from now on?
>>>
>>> Also, I'm adding some extra metrics to mod_gstatus - for instance, a
>>> string metric with the Ganglia version - does this seem like the best
>>> place to add this?
>>
>> The only reason for removing it from the RPM and static builds is basically
>> due to its likely usefulness to the general user. When I wrote mod_gstatus
>> it was mainly for debugging purposes. I needed something that would monitor
>> the XDR packets that were being sent between the gmond nodes and using
>> ganglia to monitor itself seemed like the most obvious idea. If the
>> community thinks that mod_gstatus would be generally useful, I don't have a
>> problem with including it as a standard module.
>
> My only concern about enabling it by default is the config file - all
> the metrics should probably be commented out, and people can uncomment
> them if needed.
>
> It is probably quite useful for a UDP collector that is under heavy
> load, and I also think it is a good place to put things like a string
> metric reporting the package version. That's probably something that
> could be enabled by default on any node.

I'm OK with that as long as the community thinks it is useful. I just didn't want to load up extra modules and use up more memory if the metrics don't mean anything to the end user.
Re: [Ganglia-developers] mod_gstatus disabled?
>>> On 7/28/2009 at 7:27 AM, in message <4a6efcac.4060...@pocock.com.au>, Daniel Pocock wrote:
> I noticed a few things about mod_gstatus:
>
> - the spec file doesn't include it at all, and deliberately removes the
> config file for it
>
> - gmond/modules/Makefile.am excludes it from static builds
>
> Given that Ganglia is modular, is there a good reason for not having
> this module in the RPM along with all the other modules?
>
> I successfully compiled it on Cygwin (static build), so is there also a
> reason for not having it on static builds, or in other words, does
> anyone object if I tweak Makefile.am so it will be in the static build
> from now on?
>
> Also, I'm adding some extra metrics to mod_gstatus - for instance, a
> string metric with the Ganglia version - does this seem like the best
> place to add this?

The only reason for removing it from the RPM and static builds is basically its limited usefulness to the general user. When I wrote mod_gstatus it was mainly for debugging purposes. I needed something that would monitor the XDR packets that were being sent between the gmond nodes, and using ganglia to monitor itself seemed like the most obvious idea. If the community thinks that mod_gstatus would be generally useful, I don't have a problem with including it as a standard module.

Brad
Re: [Ganglia-developers] backports
>>> On 7/28/2009 at 7:03 AM, in message <4a6ef728.2080...@pocock.com.au>, Daniel Pocock wrote:
> Is it preferred to raise backport proposals for 3.1 all in a single
> email, or start a separate thread for each?
>
> I've just fixed bug 237, this is an essential backport I believe, as it
> fixes a seg fault/coding error. (trunk r2006)
>
> My fix for bug 232 is also backwards compatible and safe to backport
> now, thanks to a configuration option that allows people to decide when
> they want to adopt the new behavior. I believe that backporting this to
> 3.1 provides people with the opportunity to migrate to lowercase
> hostname directories independently of when they migrate to 3.2 or
> trunk. (trunk r2004 and r2005 contain this fix)

IMO, starting separate threads would be preferable. That way if we have to go back into the email archives to review any discussion, it will be easier to find. Also, it is a good idea to put the backport into the STATUS file for the 3.1 branch so that we can track what has been backported. For an explanation of how to use the STATUS file, please see http://ganglia.wiki.sourceforge.net/how_project_works . Describing and noting the backport in the STATUS file just makes it easier for us to compile a list of what changed when we do the next release. Make sure that you follow the backporting guidelines that are posted on the wiki with the most important guideline being "don't break backward compatibility" :)

Brad
Re: [Ganglia-developers] Replacing core metrics with Python metric modules
>>> On 7/22/2009 at 3:02 PM, in message <20090722210201.gm14...@alcatraz.americas.sgi.com>, Martin Hicks wrote:
> I have a situation where there is already a mechanism that is collecting
> metrics on a compute host in a cluster (Performance Co-Pilot) and
> pushing them up to the head node.
>
> I was wondering if is possible to write a Python metric module that
> could replace the core set of metrics that gmond usually collects on the
> compute node, and instead grab the data from PCP that is running on the
> head node.
>
> Are there any real differences between the metrics that are normally
> collected by gmond, and those user-defined metrics collected by a Python
> module?
>
> The goal is to not have to double collect these metrics on each compute
> host.

There isn't any difference other than the fact that the core metrics are implemented in C based modules rather than python. If you wanted to replace some of the core metrics, you could do it by simply not loading the metric module that implements the set of core metrics that you want to replace. Then reimplement the core metrics with the same metric names and definitions but as python modules.

You can tell which metric module loads which metrics by starting gmond with a -m parameter. With the -m parameter, gmond will list all of the metrics that it will collect along with the module that implements the collection.

Brad
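Editorial sketch: a Python module that reimplements one core metric under its original name might look roughly like this. It uses the usual gmond Python module entry points (`metric_init` returning a list of descriptor dictionaries, per-metric callbacks taking the metric name, and `metric_cleanup`). The choice of `load_one`, the descriptor values, and the `_fetch_load_one` helper are assumptions for illustration; in the scenario above the helper would query PCP, and the definition should be copied from what the C module reports.

```python
import os


def _fetch_load_one():
    """Placeholder data source (assumption): stands in for a PCP query."""
    try:
        return float(os.getloadavg()[0])
    except (AttributeError, OSError):
        # Platform without getloadavg(); return a neutral value.
        return 0.0


def load_one_handler(name):
    """Callback invoked by gmond with the metric name; returns the value."""
    return _fetch_load_one()


def metric_init(params):
    """Called once by gmond; `params` holds the module's config values."""
    descriptors = [{
        "name": "load_one",            # same name as the core metric
        "call_back": load_one_handler,
        "time_max": 70,                # assumed value, seconds
        "value_type": "float",
        "units": " ",
        "slope": "both",
        "format": "%.2f",
        "description": "One minute load average",
        "groups": "load",
    }]
    return descriptors


def metric_cleanup():
    """Called once when gmond shuts the module down."""
    pass


if __name__ == "__main__":
    # Stand-alone check outside gmond: print each metric once.
    for d in metric_init({}):
        print(d["name"], d["format"] % d["call_back"](d["name"]))
```

The C module that normally provides the same metric would have to be left out of the gmond configuration, as described in the reply above, so that the name is not defined twice.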
Re: [Ganglia-developers] fix for bug 232
>>> On 7/16/2009 at 9:30 AM, in message <4a5f47a1.3020...@pocock.com.au>, Daniel Pocock wrote:
> Brad Nicholes wrote:
>>>>> On 7/16/2009 at 9:10 AM, in message <4a5f42d8.9060...@pocock.com.au>, Daniel Pocock wrote:
>>> Brad Nicholes wrote:
>>>>>>> On 7/16/2009 at 8:07 AM, in message <4a5f3430.20...@pocock.com.au>, Daniel Pocock wrote:
>>>>> I tried to attach this solution to the bug report, but I get this error:
>>>>>
>>>>> You did not enter a valid attachment number.
>>>>>
>>>>> Anyhow, this is a solution for bug 232:
>>>>>
>>>>> http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=232
>>>>>
>>>>> As a consequence of applying this patch:
>>>>> - whenever an RRD is created/updated, the hostname directory name will
>>>>> be converted to lowercase
>>>>> - any capitalization can be used with the `h' parameter to the web
>>>>> interface
>>>>> - whenever gmetad receives a hostname in the XML, it will use a
>>>>> non-case-sensitive comparison to decide if it already has data for that
>>>>> host
>>>>> - the XML emitted by gmetad will show the capitalization that was
>>>>> received in the XML, not the lowercase version
>>>>>
>>>>> Anyone applying this patch needs to rename all their hostname
>>>>> directories to lowercase.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Daniel
>>>>
>>>> This patch seems reasonable to me. The only part that bothers me is the
>>>> fact that an upgrade from a previous version might break existing installs
>>>> unless they rename all of their rrd directories. That could be a problem for
>>>> some users that have a large number of monitored boxes.
>>>
>>> Maybe we just make it a part of trunk and the 3.2 release? People
>>> (should) look more closely at the readme file when going from 3.1 to 3.2.
>>
>> I haven't actually tested the patch yet, but I'm OK with just putting it in
>> trunk and not backporting it to 3.1. There was a big change between 3.0 and
>> 3.1. I would expect that there would be some incompatible changes between
>> 3.1 and 3.2 as well. I also think that when 3.2 is released, we should also
>> have a helper script like hawson suggested in our /contrib repository. We
>> just need to make sure that this in doc'ed somewhere so that when 3.2 is
>> released, we can include the doc in the upgrade notes.
>
> One way to make it non-disruptive for 3.1 would be making this new
> behavior configurable (as I suggested in the bug) - is it worth the
> extra effort of adding a config option for this, or is 3.2 intended to
> be released in the new future?

There isn't a planned date for a 3.2 release so far. I'm not sure we have any new functionality that is significant enough to call for a 3.2 release yet.

> Here's something that can be used as the basis for the helper script
> and/or the %post section of the spec file:
>
> killall gmetad
> cd $RRDROOT
> find . -type d -name '*[A-Z]*' ! -name __SummaryInfo__ -mindepth 2
> -maxdepth 2 | while read ;
> do
> OLD_NAME=`echo "$REPLY" | cut -f3 -d/`
> NEW_NAME=`echo "$OLD_NAME" | tr [A-Z] [a-z]`
> CLUSTER_NAME=`echo "$REPLY" | cut -f2 -d/`
> echo mv "$REPLY" "${CLUSTER_NAME}/${NEW_NAME}"
> #mv "$REPLY" "${CLUSTER_NAME}/${NEW_NAME}"
> done

Sounds good, add it to the patch. :)

Brad
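Editorial sketch: the same rename pass as the shell snippet quoted above, written in Python with an explicit dry-run step. It assumes the layout implied by the snippet (`<rrd root>/<cluster>/<host>/`), skips `__SummaryInfo__`, and refuses to overwrite an existing lowercase directory. The function names and the example path are made up for the illustration, and gmetad should be stopped first, as in the original snippet.

```python
import os


def plan_lowercase_renames(rrd_root):
    """Return (old_path, new_path) pairs for host dirs containing uppercase."""
    renames = []
    for cluster in sorted(os.listdir(rrd_root)):
        cluster_dir = os.path.join(rrd_root, cluster)
        if not os.path.isdir(cluster_dir):
            continue
        for host in sorted(os.listdir(cluster_dir)):
            host_dir = os.path.join(cluster_dir, host)
            if host == "__SummaryInfo__" or not os.path.isdir(host_dir):
                continue
            lower = host.lower()
            if lower != host:
                renames.append((host_dir, os.path.join(cluster_dir, lower)))
    return renames


def apply_renames(renames, dry_run=True):
    """Print each planned move; perform it only when dry_run is False."""
    for old, new in renames:
        if os.path.exists(new):
            # Do not merge into or overwrite an existing directory.
            print("skip (target exists):", old)
            continue
        print("mv", old, new)
        if not dry_run:
            os.rename(old, new)


if __name__ == "__main__":
    root = "/var/lib/ganglia/rrds"  # assumed location; adjust to your install
    if os.path.isdir(root):
        apply_renames(plan_lowercase_renames(root), dry_run=True)
```

Like the `echo mv` line in the shell version, the default only prints what would be moved; passing `dry_run=False` performs the renames.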
Re: [Ganglia-developers] fix for bug 232
>>> On 7/16/2009 at 9:10 AM, in message <4a5f42d8.9060...@pocock.com.au>, Daniel Pocock wrote:
> Brad Nicholes wrote:
>>>>> On 7/16/2009 at 8:07 AM, in message <4a5f3430.20...@pocock.com.au>, Daniel Pocock wrote:
>>> I tried to attach this solution to the bug report, but I get this error:
>>>
>>> You did not enter a valid attachment number.
>>>
>>> Anyhow, this is a solution for bug 232:
>>>
>>> http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=232
>>>
>>> As a consequence of applying this patch:
>>> - whenever an RRD is created/updated, the hostname directory name will
>>> be converted to lowercase
>>> - any capitalization can be used with the `h' parameter to the web interface
>>> - whenever gmetad receives a hostname in the XML, it will use a
>>> non-case-sensitive comparison to decide if it already has data for that host
>>> - the XML emitted by gmetad will show the capitalization that was
>>> received in the XML, not the lowercase version
>>>
>>> Anyone applying this patch needs to rename all their hostname
>>> directories to lowercase.
>>>
>>> Regards,
>>>
>>> Daniel
>>
>> This patch seems reasonable to me. The only part that bothers me is the
>> fact that an upgrade from a previous version might break existing installs
>> unless they rename all of their rrd directories. That could be a problem for
>> some users that have a large number of monitored boxes.
>
> Maybe we just make it a part of trunk and the 3.2 release? People
> (should) look more closely at the readme file when going from 3.1 to 3.2.

I haven't actually tested the patch yet, but I'm OK with just putting it in trunk and not backporting it to 3.1. There was a big change between 3.0 and 3.1. I would expect that there would be some incompatible changes between 3.1 and 3.2 as well. I also think that when 3.2 is released, we should have a helper script like hawson suggested in our /contrib repository. We just need to make sure that this is doc'ed somewhere so that when 3.2 is released, we can include the doc in the upgrade notes.

> Or maybe someone can suggest a clever way to handle existing installs?
> However it is done, it would involve scanning all the cluster
> directories, I'm not sure I would want gmetad to do that every time it
> starts.
>
>> BTW, I attached the patch file to the bug
>
> I tried to do it from iceweasel (Firefox on Debian lenny) - any idea why
> it worked for you and not me?
>
> Incidentally, I often find that when I'm in the bug tracking system, I
> have to re-enter my password each time I submit a bug.

I'm not sure why you would have had a problem attaching the patch file. Bernard knows a lot more about our Bugzilla system than I do. Maybe he has some idea.

Brad
Re: [Ganglia-developers] fix for bug 232
>>> On 7/16/2009 at 8:07 AM, in message <4a5f3430.20...@pocock.com.au>, Daniel Pocock wrote:
> I tried to attach this solution to the bug report, but I get this error:
>
> You did not enter a valid attachment number.
>
> Anyhow, this is a solution for bug 232:
>
> http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=232
>
> As a consequence of applying this patch:
> - whenever an RRD is created/updated, the hostname directory name will
> be converted to lowercase
> - any capitalization can be used with the `h' parameter to the web interface
> - whenever gmetad receives a hostname in the XML, it will use a
> non-case-sensitive comparison to decide if it already has data for that host
> - the XML emitted by gmetad will show the capitalization that was
> received in the XML, not the lowercase version
>
> Anyone applying this patch needs to rename all their hostname
> directories to lowercase.
>
> Regards,
>
> Daniel

This patch seems reasonable to me. The only part that bothers me is the fact that an upgrade from a previous version might break existing installs unless they rename all of their rrd directories. That could be a problem for some users that have a large number of monitored boxes.

BTW, I attached the patch file to the bug.

Brad
Re: [Ganglia-developers] [Ganglia-svn] SF.net SVN: ganglia:[2001]tags/monitor-core-3.1.2/STATUS
>>> On 7/9/2009 at 9:21 AM, in message , wrote: > Revision: 2001 > http://ganglia.svn.sourceforge.net/ganglia/?rev=2001&view=rev > Author: hawson > Date: 2009-07-09 15:21:20 + (Thu, 09 Jul 2009) > > Log Message: > --- > Add backport proposal for r2000 > > Modified Paths: > -- > tags/monitor-core-3.1.2/STATUS > Hawson, This shouldn't be committed to the STATUS file in the tag; it should be committed to the STATUS file in the 3.1 branch. The tag should reflect the source code of the 3.1.2 release (in this case). BTW, I am +1 for this backport as well. Brad > Modified: tags/monitor-core-3.1.2/STATUS > === > --- tags/monitor-core-3.1.2/STATUS 2009-07-09 15:12:35 UTC (rev 2000) > +++ tags/monitor-core-3.1.2/STATUS 2009-07-09 15:21:20 UTC (rev 2001) > @@ -162,6 +162,12 @@ > +1: bernardli, carenas > carenas: patched generated files; cleanup to follow > > + > + * Force null termination in string metrics for python modules. > +http://ganglia.svn.sourceforge.net/viewvc/ganglia?view=rev&revision=2000 > ++1: hawson > + > + > BACKPORT PROPOSALS FUTURE VERSION: >These proposals are too unstabilizing for the current version, incomplete > or >haven't been tested enough to be considered for a stable release. > > > This was sent by the SourceForge.net collaborative development platform, the > world's largest Open Source development site.
Re: [Ganglia-developers] Patch for multithread gmond
>>> On 7/13/2009 at 8:17 AM, in message , utopia zh wrote: > Hi, > > While trying to use gmond to monitor our applications, we found some issues: > - Metric collecting may take long time to finish, such examples > include collecting master/slave status from LDAP, parsing web pages to > get statistics. > - Receiving metrics updating from other node may fail due to blocking > metric collecting. > > To avoid this blocking behavior, we tried to change gmond to multithread: > - Main thread to collect metrics, we also have the metric collection > python script changed to be non-blocking > - Dedicate thread to receive updates from other nodes. > - Dedicated thread to server clients (gmetad/telnet, etc). > > Could you help review the patch? We'd like to contribute the patch to > the community if multi-thread is a generally required features. > > Any comments will be appreciated. Thanks. > > p.s. Just to be curious, why ganglia was changed from multi-thread > into single thread from 2.5.7 to 3.0? > > > Cheers, > Hang I haven't reviewed the patch yet, but in most cases this issue can be solved within the python module itself. The tcpconn.py module is an example of this. Basically, it creates its own thread that simply gathers the metrics on some module-defined interval. This thread updates a metrics cache within the module itself. Then, whenever gmond queries the module for its metrics, the metric values are simply read from the module's metric cache rather than being gathered on the main gmond thread. This ensures that nothing blocks the main gmond thread when it queries a module that might take a long time to acquire the metric value from the metric source. I haven't looked that closely at the 2.5.x code so I have no idea what level of threading it may have supported or why it might have been changed in 3.0. 
If multithreading gmond 3.1.x would improve its performance, then I am sure this patch is something we should add to the code base. Thanks, Brad
Re: [Ganglia-developers] Disk IO as gmond core metric
>>> On 7/10/2009 at 2:41 PM, in message >>> <20090710204117.gl10...@pi941c2n1.ms.com>, JB Kim wrote: > On 07/10/09 14:07:14, Brad Nicholes wrote: >> >>> On 7/9/2009 at 5:43 PM, in message >> <8121824c0907091643od6832c5y3c4ffa37696e4...@mail.gmail.com>, JB Kim >> wrote: >> > Ok I've isolated iostat code into its own module and managed to get >> > the whole autoconf/automake work. >> > >> > http://www.remnantone.com/pkgs/ganglia/modiostat.tar.gz >> > >> > Provided that ganglia 3.1.x is already installed, it should just be a >> > matter of running ./configure & make. >> > >> > Also, here's my attempt at making the independent DSO build/deployment >> > template. >> > Within autoconf file, I've retained much of the standard checks that >> > ganglia does. In addition, it will >> > check for existing ganglia installation (libganglia.so and headers) >> > along with other necessary libs such as apr,confuse,expat. >> > I've also provided setup.sh script in the template to make it simpler >> > to deploy. Hopefully it will be useful for folks >> > trying to write their own DSO modules. >> > >> > http://www.remnantone.com/pkgs/ganglia/dso_template.tar.gz >> > >> > Here's the readme doc on how to use this build template: >> > >> > http://www.remnantone.com/pkgs/ganglia/README_DSO >> > >> > Let me know how it works out.. >> > >> >> I downloaded the tarball and gave it a try. Everything built and loaded as > expected but I am not seeing any metrics. I'm not exactly sure why. I > tested this on a SLED 10.2 box so I don't know if that has anything to do > with it. Others may want to download the tarball and give it a try. > > Hurm... I tested this on ubuntu 8.10 (or something). > I have #ifdef LINUX on that module, so it would compile on platforms that > doesn't have that config macro defined, but report just zeros... > That is what I am seeing, just zeros. Like I say, I haven't spent the time to try to debug it. I was more concerned with making sure that it built and installed. 
Brad
Re: [Ganglia-developers] Disk IO as gmond core metric
>>> On 7/10/2009 at 2:23 PM, in message , Jesse Becker wrote: > On Fri, Jul 10, 2009 at 14:07, Brad Nicholes wrote: > >> Anyway everything else worked just fine. I have a couple of suggestions. You > should probably include a sample .conf file so that the user doesn't have to > go figure everything out like all of the module information. I have attached > the one that I used. This way they can just drop the .conf file in the > conf.d directory and everything just works. You also might want to update > the COPYING and INSTALL files to reflect the current state of things. The > INSTALL file should contain information about building and installing the > module. > > Would it be possible, in the future--I know you can't do this now--to > allow for module configuration directly in gmond.conf? > You can already do module configuration directly in gmond.conf. That is where the base metric modules are still being configured. All you do is put the configuration in gmond.conf rather than in a separate .conf file. Using a separate .conf file just means that you can copy the .conf file to a conf.d directory and then restart gmond without having to edit anything. Removing the module is then simply a matter of deleting or renaming the corresponding .conf file and restarting gmond. Brad
Re: [Ganglia-developers] Disk IO as gmond core metric
>>> On 7/9/2009 at 5:43 PM, in message <8121824c0907091643od6832c5y3c4ffa37696e4...@mail.gmail.com>, JB Kim wrote: > Ok I've isolated iostat code into its own module and managed to get > the whole autoconf/automake work. > > http://www.remnantone.com/pkgs/ganglia/modiostat.tar.gz > > Provided that ganglia 3.1.x is already installed, it should just be a > matter of running ./configure & make. > > Also, here's my attempt at making the independent DSO build/deployment > template. > Within autoconf file, I've retained much of the standard checks that > ganglia does. In addition, it will > check for existing ganglia installation (libganglia.so and headers) > along with other necessary libs such as apr,confuse,expat. > I've also provided setup.sh script in the template to make it simpler > to deploy. Hopefully it will be useful for folks > trying to write their own DSO modules. > > http://www.remnantone.com/pkgs/ganglia/dso_template.tar.gz > > Here's the readme doc on how to use this build template: > > http://www.remnantone.com/pkgs/ganglia/README_DSO > > Let me know how it works out.. > I downloaded the tarball and gave it a try. Everything built and loaded as expected but I am not seeing any metrics. I'm not exactly sure why. I tested this on a SLED 10.2 box so I don't know if that has anything to do with it. Others may want to download the tarball and give it a try. Anyway, everything else worked just fine. I have a couple of suggestions. You should probably include a sample .conf file so that the user doesn't have to figure out all of the module information on their own. I have attached the one that I used. This way they can just drop the .conf file in the conf.d directory and everything just works. You also might want to update the COPYING and INSTALL files to reflect the current state of things. The INSTALL file should contain information about building and installing the module. 
I haven't looked at the dso_template.tar.gz file yet but I am thinking that we should add this to the wiki and use the text from your README_DSO file to explain how to use it. Bernard, Jesse, what do you think? Brad modiostat.conf Description: Binary data
[Ganglia-developers] Metric modules for Perl and Ruby (was:Re: Disk IO as gmond core metric)
>>> On 7/9/2009 at 5:43 PM, in message <8121824c0907091643od6832c5y3c4ffa37696e4...@mail.gmail.com>, JB Kim wrote: > > Lastly, is anyone already working on a perl equivalent module of mod_python? > With the 3.1.x gmond framework, it would be definitely possible to > further extend DSO functionality > by running embedded interpreters like perl and R. > > Not that I know of but this is something that we have been talking about since the introduction of mod_python. In fact one of the reasons why the python interface was embedded this way was to allow for other interpreters to do the same. The intention was to someday write a mod_perl, mod_ruby, mod_ in order to support other languages. Brad
Re: [Ganglia-developers] PATCH: ensure string metrics are null terminated for python-based user metrics
>>> On 7/9/2009 at 12:16 PM, in message , Greg Bruno wrote: > On Thu, Jul 9, 2009 at 10:52 AM, Brad Nicholes wrote: >> >> One of the new features in Ganglia 3.1 is the ability to add extra data to > the metric definition that is passed on with the metric metadata. Would > anything like that help you? > > > yes, that may help. > > do you have an example of how add extra data from a user-defined python > metric? Yes, the disk python module under gmond/python_modules/disk adds a 'mount' extra data element to the definition. Basically, in a python module, any extra properties that you add to the metric definition dictionary will show up as extra data in the metadata for the metric. Of course, it is up to you to modify the web front end to take advantage of the extra metadata. You should be able to see in the front end PHP code where extra data like 'Group' and 'Title' are being used. Brad
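As a rough illustration of the extra data mechanism described above (not the actual disk module code), a metric definition dictionary with one additional property might look like this. The metric name, the placeholder callback and the `extra_data` helper are made up for the example; `STANDARD_KEYS` is my understanding of the keys gmond itself interprets, and the `mount` value is kept as a string on the assumption that extra data travels as text.

```python
# Keys that gmond's python module interface interprets itself.
STANDARD_KEYS = frozenset([
    "name", "call_back", "time_max", "value_type", "units",
    "slope", "format", "description", "groups",
])


def read_disk_total(name):
    """Placeholder callback; a real module would query the file system."""
    return 0.0


def build_descriptor(mount_point):
    """Return a metric definition carrying a 'mount' extra data element."""
    return {
        "name": "disk_total_example",
        "call_back": read_disk_total,
        "time_max": 90,
        "value_type": "double",
        "units": "GB",
        "slope": "both",
        "format": "%.3f",
        "description": "Example disk metric",
        "groups": "disk",
        # Anything beyond the standard keys is passed along as extra data.
        "mount": mount_point,
    }


def extra_data(descriptor):
    """Show which properties would end up as extra data in the metadata."""
    return {k: v for k, v in descriptor.items() if k not in STANDARD_KEYS}
```

For the 'cluster top' use case, this only helps with information that is fixed per metric; the metric value itself is still subject to the string size limit discussed elsewhere in this thread.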
Re: [Ganglia-developers] PATCH: ensure string metrics are null terminated for python-based user metrics
>>> On 7/9/2009 at 9:27 AM, in message <4a560c5a.2090...@mail.nih.gov>, Jesse Becker wrote: > Greg Bruno wrote: >> also, regarding MAX_G_STRING_SIZE, would it be possible to increase it >> in future releases? i've currently set it to 128 in >> include/gm_value.h. > > I don't object, although I'm not the most familiar with this part of the > code. > I'm curious: what metrics are you trying to use that are that long? > I would have to verify this, but I think making MAX_G_STRING_SIZE larger would affect the XDR packets. Any change to the XDR packets would cause gmond to be incompatible with previous versions. It would probably have to wait for the next major Ganglia release. Brad
Re: [Ganglia-developers] PATCH: ensure string metrics are null terminated for python-based user metrics
>>> On 7/9/2009 at 10:03 AM, in message , Greg Bruno wrote: > On Thu, Jul 9, 2009 at 8:27 AM, Jesse Becker wrote: >> >>> also, regarding MAX_G_STRING_SIZE, would it be possible to increase it >>> in future releases? i've currently set it to 128 in >>> include/gm_value.h. >> >> I don't object, although I'm not the most familiar with this part of the >> code. I'm curious: what metrics are you trying to use that are that long? > > we're sending info about the top CPU intensive processes so we can > roll them up for a rudimentary 'cluster top' web page. an example > metric value would be: > > pid=11790, cmd=dd, user=root, %cpu=0.66, %mem=0.00, size=40, > data=220, shared=508, vm=63340 > > we are also sending info about the queue management system (in our > case, SGE) and those values are also longer than 32 bytes. > One of the new features in Ganglia 3.1 is the ability to add extra data to the metric definition that is passed on with the metric metadata. Would anything like that help you? Brad
Re: [Ganglia-developers] Disk IO as gmond core metric
Creating a standard template for building a module independent of the core is something that we really haven't gotten around to doing. So I think in this instance, you would be the template creator :). You should be able to start with the Ganglia autoconf stuff and derive something from there. Sorry, I wish we had a standard template because I know that would make life easier. Brad >>> On 7/1/2009 at 7:46 PM, in message <8121824c0907011846q7934d103t980e76728842f...@mail.gmail.com>, JB Kim wrote: > Thanks for the feedback. Creating these metrics into independent > module makes sense. I'll refactor it to isolate the code within > mod_iostat.c and see how it works out. > Before I do so, is there a "standard" template for configure/make > files to build these DSO modules independently? > Having such deployment template (perhaps also .spec) structure would > encourage other developers to contribute code more easily, I think. > > On Wed, Jul 1, 2009 at 11:15 AM, Brad Nicholes wrote: >> Thanks for the new module code. I haven't had a chance to actually look > at the code yet but considering that this is a new metric module, it might be > better to decouple it from the rest of the Ganglia code as an independent > module rather than having the code integrated into metrics.c and the core > build system. If the module is independently buildable then it really > doesn't matter if it is linux only or cross platform. Even if the module > remained linux only, only those who are interested would download, build and > use it which is fine. It would also make it much easier for us to simply > drop the tarball into the /contrib directory of SVN in order to make it > available immediately rather than having to integrate it into the core build > and re-release the whole project. The ideal situation would be to also > include a .spec file that would allow the module to be built and packaged > into its own RPM. But just a buildable source tarball would be great. 
>> >> It is good to see people contributing new modules to the project even if > they are only for a single platform. The more modules we have to offer, the > better the whole project is for everybody. >> >> Brad >> >>>>> On 6/30/2009 at 9:00 PM, in message >> <8121824c0906302000y181b23adr87ebd98124450...@mail.gmail.com>, JB Kim >> wrote: >>> Hi folks >>> >>> Wow, it didn't occur to me this thread was more than a year ago. Time >>> surely flies when you have a newborn at home. :-) >>> In any case, I've made necessary modifications to 3.1.2 release to >>> allow iostat-related metrics for linux. >>> >>> Here is the tarball that compiles on linux and reports 7 extra metrics >>> from a new DSO module called iostat. >>> >>> http://www.remnantone.com/pkgs/ganglia/ganglia-3.1.2_io.tar.gz >>> >>> The mod_iostat contains the following metrics: >>> >>> - io_readtot >>> - io_readkbtot >>> - io_writetot >>> - io_writekbtot >>> - io_svctmax >>> - io_queuemax >>> - io_busymax >>> >>> The code changes have been made to: >>> >>> - libmetrics/libmetrics.h >>> - libmetrics/linux/metrics.c >>> - gmond/modules/mod_iostat.c >>> - Makefile.am changes to include a new module build >>> >>> There are couple of points: >>> >>> * The new set of metrics are only for linux at this point. (supports >>> 2.4 and 2.6 kernels) >>> As you can see, all of the metric functions are implemented within >>> libmetrics/linux/metrics.c >>> * These metric functions report aggregated values. 4 of them are sums >>> across disks, 3 of them are max across the disks. >>> These metrics would be ideal for cluster computing nodes which often >>> has 1 or 2 disks, not for large servers with multitude of disks. >>> * I had thought about making things isolated to modules/iostat/mod_iostat.c. 
>>> However, since the current implementation only works on linux, I >>> decided it was best to place it in libmetrics/linux, which is >>> already os-dependent code, rather than trying to support multi-os >>> build with bunch of #ifdef/#endif inside of >>> modules/iostat/mod_iostat.c. >>> * Future improvements should consider reporting independent io metrics >>> for user supplied list of disks instead of aggregating the whole. >>> * Lastly, apologies for ugly code... >>> >>> If there's sufficient interest and you would like these me
Re: [Ganglia-developers] Disk IO as gmond core metric
Thanks for the new module code. I haven't had a chance to actually look at the code yet but considering that this is a new metric module, it might be better to decouple it from the rest of the Ganglia code as an independent module rather than having the code integrated into metrics.c and the core build system. If the module is independently buildable then it really doesn't matter if it is linux only or cross platform. Even if the module remained linux only, only those who are interested would download, build and use it which is fine. It would also make it much easier for us to simply drop the tarball into the /contrib directory of SVN in order to make it available immediately rather than having to integrate it into the core build and re-release the whole project. The ideal situation would be to also include a .spec file that would allow the module to be built and packaged into its own RPM. But just a buildable source tarball would be great. It is good to see people contributing new modules to the project even if they are only for a single platform. The more modules we have to offer, the better the whole project is for everybody. Brad >>> On 6/30/2009 at 9:00 PM, in message <8121824c0906302000y181b23adr87ebd98124450...@mail.gmail.com>, JB Kim wrote: > Hi folks > > Wow, it didn't occur to me this thread was more than a year ago. Time > surely flies when you have a newborn at home. :-) > In any case, I've made necessary modifications to 3.1.2 release to > allow iostat-related metrics for linux. > > Here is the tarball that compiles on linux and reports 7 extra metrics > from a new DSO module called iostat. 
> > http://www.remnantone.com/pkgs/ganglia/ganglia-3.1.2_io.tar.gz > > The mod_iostat contains the following metrics: > > - io_readtot > - io_readkbtot > - io_writetot > - io_writekbtot > - io_svctmax > - io_queuemax > - io_busymax > > The code changes have been made to: > > - libmetrics/libmetrics.h > - libmetrics/linux/metrics.c > - gmond/modules/mod_iostat.c > - Makefile.am changes to include a new module build > > There are couple of points: > > * The new set of metrics are only for linux at this point. (supports > 2.4 and 2.6 kernels) > As you can see, all of the metric functions are implemented within > libmetrics/linux/metrics.c > * These metric functions report aggregated values. 4 of them are sums > across disks, 3 of them are max across the disks. > These metrics would be ideal for cluster computing nodes which often > has 1 or 2 disks, not for large servers with multitude of disks. > * I had thought about making things isolated to modules/iostat/mod_iostat.c. > However, since the current implementation only works on linux, I > decided it was best to place it in libmetrics/linux, which is > already os-dependent code, rather than trying to support multi-os > build with bunch of #ifdef/#endif inside of > modules/iostat/mod_iostat.c. > * Future improvements should consider reporting independent io metrics > for user supplied list of disks instead of aggregating the whole. > * Lastly, apologies for ugly code... > > If there's sufficient interest and you would like these metrics to be > included in the subsequent release, I'll enhance/modify the code as > necessary. > > Thanks! > > On Tue, Apr 29, 2008 at 8:58 PM, JB Kim wrote: >> Sure, sounds like a plan. I'll take a crack at it and let you know. 
>> >> On Tue, Apr 29, 2008 at 12:18 PM, Brad Nicholes wrote: >>> >>> On 4/28/2008 at 8:26 PM, in message >>> <8121824c0804281926wb285fe4u5f269cfbf58a0...@mail.gmail.com>, "JB Kim" >>> >>> wrote: >>> > Folks, >>> > >>> > I've made some changes to ganglia 3.0.7 gmond code to provide aggregated >>> > disk IO >>> > statistics for linux hosts. Since a given host can have one or more disks, >>> > the >>> > values from each individual disk are aggregated to a sum or to a max. >>> > >>> > It seems like a lot of folks are using a wrapper for iostat command to >>> > send >>> > data >>> > via gmetric. While this is also a useful approach, I thought it would be >>> > nice >>> > and convenient to have this reported from gmond, although the data >>> > would be summarized >>> > for an entire host. The code simply reads from either /proc/partitions >>> > or /proc/diskstats, and >>> > maintains the old and the new values for each disk to calculate the diff. >>> > >>> > These are the new metrics that were
Re: [Ganglia-developers] 3.1.2 not mentioned on http://www.ganglia.info/
>>> On 4/1/2009 at 7:49 AM, in message <49d370cc.1010...@sara.nl>, Ramon >>> Bastiaans wrote: > FYI, I noticed; > > At a first glance, version 3.1.1 is the latest version mentioned on > http://www.ganglia.info/ > > 3.1.2 is only found in the Sourceforge downloads, but mentioned no place > else. Good point. I will see about getting something posted. Brad -- ___ Ganglia-developers mailing list Ganglia-developers@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ganglia-developers
Re: [Ganglia-developers] Building a ganglia interface into collectl
The only thing that I would suggest is that you use the APIs to create and send the packets. The problem with hand crafting the packets is that if the gmond XDR packet definition ever changes, your interface will be broken. I don't foresee the XDR packet definition changing very frequently, and hopefully not in the near future since it has been crafted to be expandable, but it could cause you problems if it does. Basically all you would need to do is either dynamically link the ganglia library or use dlopen() to look for it. If that succeeds, use dlsym() to import the functions and start calling the APIs. The downside is that you would probably have to deploy the ganglia library yourself since you are trying to use collectl rather than gmond to gather metrics. Brad >>> On 3/31/2009 at 11:32 AM, in message <9fce3a46fe7c8045a6207ae4b42e9f9a4794075...@gvw1119exc.americas.hpqcorp.net>, "Seger, Mark" wrote: > This uses 3.1 Here's an example of the output I generate when debugging is > turned on: > > I have a routine I pass 3 parameters to: name of the variable, its units > and the value. I generate 2 packets, the first with a header and the second > with the data, which looks like this: > > 13:41:45.014 Name: ctxint.ctx Units: switches/sec Val: > 562 TTL: 60 sent > 00 00 00 80 00 00 00 0c 63 61 67 2d 64 6c 35 38 35 2d 30 32 00 00 00 0a 63 > 74 78 69 6e 74 2e 69 6e 74 00 00 00 00 00 00 00 00 00 06 64 6f 75 62 6c 65 00 > 00 00 00 00 0a 63 74 78 69 6e 74 2e 69 6e 74 00 00 00 00 00 0b 69 6e 74 72 70 > 74 73 2f 73 65 63 00 00 00 00 03 00 00 00 3c 00 00 00 00 00 00 00 00 > 00 00 00 85 00 00 00 0c 63 61 67 2d 64 6c 35 38 35 2d 30 32 00 00 00 0a 63 > 74 78 69 6e 74 2e 69 6e 74 00 00 00 00 00 00 00 00 00 02 25 73 00 00 00 00 00 > 04 31 30 37 32 > > Not sure if seeing the binary is of much value without the mappings, but as > I said a V3.1 gmond is very happy with what I'm sending. 
We actually did > think of xdr, but that would but additional requirements on collectl and it > feels like doing things in binary will help minimize overhead. > > -mark > > |-Original Message- > |From: Brad Nicholes [mailto:bnicho...@novell.com] > |Sent: Tuesday, March 31, 2009 1:13 PM > |To: Seger, Mark; ganglia-developers@lists.sourceforge.net > |Cc: Evan J Felix > |Subject: Re: [Ganglia-developers] Building a ganglia interface into > |collectl > | > |>>> On 3/31/2009 at 9:56 AM, in message <49d23d46.2090...@hp.com>, Mark > |Seger > | wrote: > |> > |> This then leads to my question, which is what is the best way to send > |> data to ganglia. I want to keep my messages very dense and so we > |chose > |> to simply send out binary data in the same format gmond expects. In > |the > |> case of pnnl, where they have a monitoring hierarchy, we've completely > |> replaced all the monitoring gmonds with a dozen that act only as > |> aggregators. There are about 190 nodes running collectl sending UPD > |> messages to each aggregator gmonds and it seems to run just fine. > |Does > |> this make sense? Is there anything to watch out for? > |> > |> If anyone else is interested in trying this out while we're shaking > |out > |> the code, I'd be happy to share some pre-release code with a few > |people. > |> > | > |Which version of Ganglia are you targeting (3.0.x or 3.1.x). Ganglia > |uses XDR to pack and unpack the metric packets. However the actual > |format changed significantly between 3.0.x and 3.1.x. You can see the > |XDR packet layout in the file lib/protocol.x for 3.0.x or > |lib/gm_protocol.x for 3.1.x. The 3.1.x version is a bit more complex > |than the 3.0.x version. The 3.0.x version is a very simple XDR packet > |that basically contains a metric ID and a value. Gmond 3.0.x can get > |away with just sending an ID in the packet because every 3.0.x gmond > |hardcodes the metric metadata. 
Gmond 3.1.x made this more flexible by > |splitting the packets in to metadata and value packets. Probably the > |easiest way to communicate directly with gmond is to use the message > |creation and sending APIs that are part of the ganglia library. Take a > |look at the gmetric utility code for an example of how to use these > |APIs. Gmetric is basically doing what you want to do but as a > |standalone utility. For Ganglia 3.0.x you will have to include > |lib/ganglia.h, for Ganglia 3.1.x you will include include/ganglia.h. > |The libraries for Ganglia 3.1.x version have been made a little more > |developer friendly by putting the public headers in the include/ > |directory and converting the library to be a .so rather than static. > | > |Brad > | -- ___ Ganglia-developers mailing list Ganglia-developers@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ganglia-developers
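The "look for the library at run time, then import the functions" idea from this thread can be sketched with Python's ctypes, which sits on top of dlopen() and dlsym() on Linux. This only probes for the library and checks that symbols resolve; it does not call into Ganglia. The three function names are the ones I recall gmetric using and should be checked against include/ganglia.h, and the helper names are invented for the example.

```python
import ctypes
import ctypes.util

# Assumed API names; verify against include/ganglia.h before relying on them.
REQUIRED_SYMBOLS = (
    "Ganglia_metric_create",
    "Ganglia_metric_set",
    "Ganglia_metric_send",
)


def load_optional_library(name):
    """Return a ctypes handle for lib<name>, or None if it is not installed."""
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)      # dlopen() under the hood
    except OSError:
        return None


def missing_symbols(library, symbols):
    """List the symbols that cannot be resolved (dlsym() would fail)."""
    return [symbol for symbol in symbols if not hasattr(library, symbol)]


if __name__ == "__main__":
    handle = load_optional_library("ganglia")
    if handle is None:
        print("libganglia not found; fall back to another transport")
    else:
        print("unresolved symbols:", missing_symbols(handle, REQUIRED_SYMBOLS))
```

A caller would only switch to the library-based sender when the handle loads and the list of missing symbols is empty, so a host without libganglia degrades gracefully instead of failing at start-up.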
Re: [Ganglia-developers] Building a ganglia interface into collectl
>>> On 3/31/2009 at 9:56 AM, in message <49d23d46.2090...@hp.com>, Mark Seger wrote: > > This then leads to my question, which is what is the best way to send > data to ganglia. I want to keep my messages very dense and so we chose > to simply send out binary data in the same format gmond expects. In the > case of pnnl, where they have a monitoring hierarchy, we've completely > replaced all the monitoring gmonds with a dozen that act only as > aggregators. There are about 190 nodes running collectl sending UDP > messages to each aggregator gmonds and it seems to run just fine. Does > this make sense? Is there anything to watch out for? > > If anyone else is interested in trying this out while we're shaking out > the code, I'd be happy to share some pre-release code with a few people. > Which version of Ganglia are you targeting, 3.0.x or 3.1.x? Ganglia uses XDR to pack and unpack the metric packets. However, the actual format changed significantly between 3.0.x and 3.1.x. You can see the XDR packet layout in the file lib/protocol.x for 3.0.x or lib/gm_protocol.x for 3.1.x. The 3.1.x version is a bit more complex than the 3.0.x version. The 3.0.x version is a very simple XDR packet that basically contains a metric ID and a value. Gmond 3.0.x can get away with just sending an ID in the packet because every 3.0.x gmond hardcodes the metric metadata. Gmond 3.1.x made this more flexible by splitting the packets into metadata and value packets. Probably the easiest way to communicate directly with gmond is to use the message creation and sending APIs that are part of the ganglia library. Take a look at the gmetric utility code for an example of how to use these APIs. Gmetric is basically doing what you want to do but as a standalone utility. For Ganglia 3.0.x you will have to include lib/ganglia.h, for Ganglia 3.1.x you will include include/ganglia.h. 
The libraries for Ganglia 3.1.x version have been made a little more developer friendly by putting the public headers in the include/ directory and converting the library to be a .so rather than static. Brad -- ___ Ganglia-developers mailing list Ganglia-developers@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ganglia-developers
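For anyone who wants to see what "XDR packing" means in practice before reading the .x files, here is a minimal sketch in Python. It only demonstrates the generic XDR encoding rules (4-byte big-endian integers, IEEE single-precision floats, length-prefixed strings padded to a multiple of 4 bytes). The message layout in pack_value_message() is a made-up stand-in for "a metric ID and a value" and is NOT the layout defined in lib/protocol.x or lib/gm_protocol.x; the function names and the host/port arguments are likewise assumptions for illustration.

```python
import socket
import struct


def xdr_uint(value: int) -> bytes:
    """XDR unsigned int: 4 bytes, big-endian."""
    return struct.pack(">I", value)


def xdr_float(value: float) -> bytes:
    """XDR float: IEEE 754 single precision, big-endian."""
    return struct.pack(">f", value)


def xdr_string(text: str) -> bytes:
    """XDR string: 4-byte length, then the bytes, zero-padded to a multiple of 4."""
    data = text.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding


def pack_value_message(metric_id: int, value: float) -> bytes:
    """Hypothetical 'ID + value' message; not the real Ganglia wire format."""
    return xdr_uint(metric_id) + xdr_float(value)


def send_message(payload: bytes, host: str, port: int) -> None:
    """Send one datagram over UDP; host and port are supplied by the caller."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

A sender would build the payload with pack_value_message() and pass it to send_message() along with the address of the aggregating gmond. For real traffic, the field order and types have to come from the protocol file that matches the gmond version, or from the library APIs that gmetric uses.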
Re: [Ganglia-developers] optional metrics
>>> On 3/18/2009 at 5:18 AM, in message wrote:
> Hi,
>
> gmond refuses to start if any of the metrics fail to initialise.
>
> In this new era of modular metrics, it is likely that some modules will relate to things that are not static (e.g. a NIC that existed before but has now gone away).
>
> Maybe there needs to be a configuration option to declare certain metrics to be non-essential at startup time?
>
> Beyond that, it may also be nice to support metrics that only exist intermittently (e.g. if someone temporarily mounts an iSCSI LUN for 30 minutes, and they want metrics without reconfiguring and restarting gmond)

This goes back to previous discussions that we have had on this list around making gmond recognize new metrics without having to stop and restart gmond. In other words, being able to dynamically add and remove metric module configuration.

Currently gmond can't handle adding or removing metric configuration on the fly. It reads all of the configuration at startup and creates an internal list of metrics and associated data. Gmond depends on that internal list for everything, and changing it on the fly would cause major problems.

In order to do what you are suggesting, which I think would be great BTW, the whole mechanism for tracking metrics and metric data will have to be reworked. This can certainly be done, but it will take some well-thought-out design and coding.

Brad
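To make the "non-essential at startup" idea concrete, here is a small sketch in Python of how a startup pass over the metric list might treat an optional metric. This is not gmond code and does not reflect its internal data structures; MetricSpec, the optional flag, and build_metric_list() are all hypothetical names chosen for illustration. It covers only the startup case, not adding or removing metrics while running.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MetricSpec:
    name: str
    init: Callable[[], None]  # raises an exception if the metric cannot initialise
    optional: bool = False    # hypothetical flag: tolerate failure at startup


def build_metric_list(specs: List[MetricSpec]) -> List[str]:
    """Initialise every metric once at startup and return the names that succeeded.

    A failing required metric aborts startup; a failing optional metric is skipped.
    """
    active = []
    for spec in specs:
        try:
            spec.init()
        except Exception as exc:
            if not spec.optional:
                raise RuntimeError(
                    f"metric {spec.name!r} failed to initialise"
                ) from exc
            continue  # optional metric: leave it out of the internal list
        active.append(spec.name)
    return active
```

With this shape, a metric tied to a NIC that has gone away would be dropped from the list when marked optional, while the same failure on a required metric would still stop startup as it does today.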