Matt/Martin and all,

I am finding that I still get occasional truncated XML from gmond, even after the EAGAIN patches to gmond.c. Interestingly, when the data was truncated, it ended with a </HOST> tag, i.e. on a host boundary.
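For reference, the kind of EAGAIN retry loop meant here can be sketched as follows. This is only a minimal illustration and not the gmond.c patch itself: gmond writes through APR sockets, whereas the sketch uses plain POSIX write() and poll() so that it compiles on its own. The name write_all and its timeout_ms parameter are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not from gmond.c): write all len bytes of buf to fd.
 * Retries on partial writes, on EINTR, and on EAGAIN/EWOULDBLOCK from a
 * non-blocking descriptor. Returns 0 on success, -1 on error or timeout. */
static int write_all(int fd, const char *buf, size_t len, int timeout_ms)
{
    assert(buf != NULL || len == 0);

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n > 0) {
            done += (size_t)n;      /* partial write: send the rest */
            continue;
        }
        if (n < 0 && errno == EINTR) {
            continue;               /* interrupted by a signal: retry */
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* Receiver is slow: wait until the descriptor is writable. */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
            int rc = poll(&pfd, 1, timeout_ms);
            if (rc > 0 || (rc < 0 && errno == EINTR)) {
                continue;           /* writable again, or poll interrupted */
            }
            return -1;              /* timed out or poll failed */
        }
        return -1;                  /* genuine write error */
    }
    return 0;
}
```

A caller that instead treats the first EAGAIN as fatal stops sending partway through the document, which is one way a truncated stream can arise.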
Looking at the code, I see this:

<snip>
    /* Walk the host hash */
    for(hi = apr_hash_first(client_context, hosts);
        hi;
        hi = apr_hash_next(hi))
      {
        apr_hash_this(hi, NULL, NULL, &val);
        status = print_host_start(client, (Ganglia_host *)val);
        if(status != APR_SUCCESS)
          {
            goto close_accept_socket;
          }
</snip>

Ahh. This is another place where we need the EAGAIN retry loop. In fact, to be safe, the print_xml_header code should also be protected. Do you guys agree with the analysis?

My gmonds run on Windows. For some reason Windows/Cygwin often gives me EAGAIN returns, while the Linux daemons never seem to.

regards,
Richard

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin Knoblauch
Sent: 21 March 2006 12:14
To: Grevis, Richard: IT (LDN); ganglia-developers@lists.sourceforge.net
Subject: Re: [Ganglia-developers] Possible bug in hosts up calculation when federating clusters.

Hi Richard,

oops. You are both right and wrong :-)

Looking at the code and comments for "old", it seems that the whole logic is pre-3.0. It was used to distinguish between 2.5.x and prior versions. For that purpose the code is right. Of course, it now fails when it encounters the 3.0.X string.

Something like this should solve it. Care to test? I am not sure about setting old to 0 in the other case. In any case, the whole xmldata_t structure is initialized early on.

diff -u -r1.45 process_xml.c
--- process_xml.c 18 Nov 2004 20:14:31 -0000 1.45
+++ process_xml.c 21 Mar 2006 12:09:16 -0000
@@ -821,7 +821,7 @@
       if (xt->tag == VERSION_TAG)
          {
             /* Process the version tag later */
-            if(! strstr( attr[i+1], "2.5." ) )
+            if( strcmp( attr[i+1], "2.5." ) < 0 )
                {
                   debug_msg("[%s] is an OLD version", xmldata->ds->name);
                   xmldata->old = 1;

Cheers
Martin

--- [EMAIL PROTECTED] wrote:

> All,
>
> when I had debugging turned on in gmetad, the daemon was announcing
> data sources as old, when they were not.
> Looking at the code, 3.0.2 or 3.0.3 gmetad/process_xml.c, line 821 or so,
> function startElement_GANGLIA_XML:
> <snip>
>       if (xt->tag == VERSION_TAG)
>          {
>             /* Process the version tag later */
>             if(! strstr( attr[i+1], "2.5." ) )
>                {
>                   debug_msg("[%s] is an OLD version",
>                             xmldata->ds->name);
>                   xmldata->old = 1;
>                }
>          }
>    }
> </snip>
>
> It seems there are two problems here. First, is not the strstr test the
> wrong way round? Second, if it is a new version of ganglia, xmldata->old
> should be explicitly set to zero. This seemed to make it better:
> <snip>
>       if (xt->tag == VERSION_TAG)
>          {
>             /* Process the version tag later */
>             if( strstr( attr[i+1], "2.5." ) )
>                {
>                   debug_msg("[%s] version %s is an OLD version",
>                             xmldata->ds->name, attr[i+1]);
>                   xmldata->old = 1;
>                } else {
>                   debug_msg("[%s] version %s is a NEW version",
>                             xmldata->ds->name, attr[i+1]);
>                   xmldata->old = 0;
>                }
>          }
>    }
> </snip>
>
> You actually don't notice the problem if your clocks are well
> synchronised everywhere, because when clusters/grids are tagged as old,
> it does the up/down calculation from the current time, and it still works.
>
> What do you guys all think?
>
> kind regards,
>
> Richard Grevis
> CTO wallah,
> Barclays Capital

------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de
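On the VERSION test discussed in the quoted thread: a plain strcmp() compares the strings character by character, so a hypothetical two-digit component such as "2.10" would sort before "2.5." and be flagged as old. One possible alternative, shown purely as a sketch, is to parse the major and minor numbers and compare them numerically. The helper name version_is_old is an assumption and does not exist in process_xml.c; the 2.5 threshold is taken from the "pre-2.5 is old" logic described above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of gmetad): classify a VERSION attribute.
 * Returns 1 if the release is older than 2.5, 0 if it is 2.5 or newer,
 * and -1 if the string does not start with "<major>.<minor>". */
static int version_is_old(const char *version)
{
    int major = 0, minor = 0;

    assert(version != NULL);

    /* sscanf returns the number of fields converted; we need both. */
    if (sscanf(version, "%d.%d", &major, &minor) != 2) {
        return -1;
    }
    if (major != 2) {
        return major < 2;   /* 1.x is old, 3.x and later are not */
    }
    return minor < 5;       /* within 2.x, anything before 2.5 is old */
}
```

With this, "2.4.1" is classed as old while "2.5.4" and "3.0.2" are not, and an unparseable string is reported separately so the caller can decide what to do with it.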