I am seeing two major problems.

First, what exactly is the purpose of the new GRID tag? It seems to be a mandatory part of the new gmetad system, and, more importantly, its authority attribute is mandatory as well. Is this correct? It appears that if you set up one gmetad to collect data from another gmetad that adds this GRID tag and authority attribute, then the second gmetad writes only summary RRDs. Is that right?

This is a problem for us because of the way we have been using, and would like to continue using, Ganglia at our computer facility. We would like to have an internal Ganglia monitoring host that monitors our whole facility, which spans five separate experiment clusters. We would then let the individual experiments get the XML data from our gmetad collector and reproduce it on their own web servers, without links redirecting them to our main web frontend. They might also add clusters from outside BNL that are part of their experiment. Our main facility gmetad would monitor everything, but its web frontend would not be visible outside the BNL firewall, so passing around the authority attribute would not work for us. It should be optional.

I would suggest adding the ability to turn this new authority option off. If the authority is present, gmetad can assume it may redirect you there to find the RRD graphs; if it is off, the second gmetad that polls it should reproduce all of the RRDs locally. Does this make sense and sound reasonable?
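To make the proposal concrete, here is a minimal Python sketch of the decision a polling gmetad could make. It assumes the upstream XML has a GANGLIA_XML root whose GRID child carries an AUTHORITY attribute. The `honor_authority` switch, the `rrd_plan` function, the sample URL, and the "summary"/"full" labels are all hypothetical illustrations, not existing gmetad code or settings.

```python
import xml.etree.ElementTree as ET

# Hypothetical switch standing in for the proposed "authority off" option.
# This is NOT an existing gmetad setting.
HONOR_AUTHORITY = True

# Made-up example of XML polled from an upstream gmetad (hypothetical URL).
SAMPLE_XML = (
    '<GANGLIA_XML VERSION="2.5.3" SOURCE="gmetad">'
    '<GRID NAME="BNL" AUTHORITY="http://internal.example/ganglia/" LOCALTIME="1047659775">'
    '<CLUSTER NAME="atlas" LOCALTIME="1047659775"/>'
    '</GRID></GANGLIA_XML>'
)


def rrd_plan(xml_text, honor_authority=HONOR_AUTHORITY):
    """Decide how a polling gmetad might handle RRDs for one upstream source.

    Returns ("summary", url) when a GRID element carries a non-empty AUTHORITY
    and we honor it (graph links would redirect to that URL). Otherwise it
    returns ("full", None), meaning every host RRD is reproduced locally.
    """
    root = ET.fromstring(xml_text)
    grid = root.find("GRID")  # direct child of the GANGLIA_XML root, if any
    authority = grid.get("AUTHORITY") if grid is not None else None
    if honor_authority and authority:
        return "summary", authority
    return "full", None


if __name__ == "__main__":
    print(rrd_plan(SAMPLE_XML))                         # authority honored
    print(rrd_plan(SAMPLE_XML, honor_authority=False))  # authority turned off
```

With the switch off (or with no GRID tag or an empty AUTHORITY), the sketch falls back to reproducing all RRDs locally, which is the behavior our experiments would need behind the BNL firewall.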
The second problem shows up when I do a quick hack to turn off the GRID tag in the gmetad XML output. In this case I think the new timestamp patch is screwing up the second collector when it tries to write to the RRDs. When it first starts, I get a lot of errors trying to create and update the RRDs. Each time I get an error, it looks like it aborts the XML parsing and stops creating the RRDs. After waiting a long time, all of the RRDs finally get created, but I still get frequent errors like:

  Mar 14 11:36:51 www /usr/sbin/gmetad[32213]: RRD_update: illegal attempt to update using time 1047659775 when last update time is 1047659775 (minimum one second step)

These cause the now famous gaps in the RRD graphs at the hour resolution. Version 2.5.2 of gmetad has no problem getting data from a gmetad with the GRID tag removed, but 2.5.3 does, so I can only assume it is related to the new timestamp patch. Does anyone have an idea what might be wrong? I probably won't have time to investigate this further until next week.

Sorry for the long email,
~Jason

--
/------------------------------------------------------------------\
| Jason A. Smith                          Email:  [EMAIL PROTECTED] |
| Atlas Computing Facility, Bldg. 510M        Phone: (631)344-4226 |
| Brookhaven National Lab, P.O. Box 5000        Fax: (631)344-7616 |
| Upton, NY 11973-5000                                             |
\------------------------------------------------------------------/