On Thu, Aug 07, 2008 at 05:09:49PM -0600, Brad Nicholes wrote:
> >>> On 8/7/2008 at 11:44 AM, in message <[EMAIL PROTECTED]>, Carlo
> Marcelo Arenas Belon <[EMAIL PROTECTED]> wrote:
> > On Thu, Aug 07, 2008 at 09:51:14AM -0600, Brad Nicholes wrote:
> >> I reverted the workaround in hash.c
> > 
> > I think we should leave that one in there anyway (maybe as an assert?) as a
> > hash lookup in a NULL pointer hash is invalid anyway and will result in a
> > segfault (if we make a mistake somewhere else that goes that path)
> 
> I could go either way on that.  If we leave it in, then we run the risk
> of masking any future problems rather than quickly detecting it because
> of a segfault.  It may make problems harder to find.

agree, but having a segfault as a way to return an error is a bug on itself,
and having a production user having to handle it (like windows crash reports)
is IMHO a bad user experience as well.

will reimplement it then as an assert or similar so that it will abort the
program with an explanation, at least for debug builds, while keeping
production builds from crashing which should improve both: debugability and
stability.

> >> and fixed process_xml.c to process the EXTRA_ELEMENT tag correctly.
> >> The problem was the fact that the EXTRA_ELEMENT data should only be
> >> processing this tag if gmetad is in authority mode.  It was missing
> >> a simple check for authority mode that the other tags included.
> > 
> > nice, does this mean that all other tags were confirmed to have the check?
> > and could we in this cases get rid as well of the "EXTRA_DATA" tags?
> 
> All of the other tags from CLUSTER on down to EXTRA_ELEMENT all include
> a check for the authority flag.

great.

> I'm not sure what you mean by "get rid as well of the "EXTRA_DATA" tags".

I mean that when there are no EXTRA_ELEMENT there is also no reason to have
EXTRA_DATA and a quick look at the DTD seem to imply that is also a valid
interface (it was valid for 3.0 anyway and is part of the XML compatibility
layer because of that)

> The EXTRA_DATA tag is used to encapsulate multiple EXTRA_ELEMENT tags.
> Even though the EXTRA_DATA tag does carry a value, it is still needed.

as explained before it is not needed by the DTD, and if the code needs it
that is a bug as well.

agree this bug is not directly related to the original one (the segfault)
and so there shouldn't be a need to fix them together either but it should
be fixed regardless as it will also save bandwidth and processing time
(helping with scalability) and is wasteful otherwise.

> >> Everything should be good now with gmetad and once the backport has been
> >> reviewed, we can included it in the next 3.1.1 release.
> > 
> > verified it works as expected, at least in the configuration that was 
> > reported
> > with problems, will be interesting to see how it behaves with sources that
> > have no authority information (like 2.5 gmetad) though.
> > 
> > tried it with "scalable off" and seems to be doing the right thing (even if 
> > it
> > is triggering a double update bug while updating rrds), but that is probably
> > not a regression specifically from this fix.
> > 
> > in any case, since 3.1.0 has packages published in fedora, debian and gentoo
> > as soon as we get an official backport patch will be a good idea to forward
> > that to them, so that fixed 3.1.0 packages could be used instead of the
> > current broken ones.
> 
> The backport is already in the STATUS file.

voted, but testing with 2.5 leaf gmetads (as used currently in SuSE and Debian)
will be a good idea.

> It does include the revert of hash.c, but I can create a patch that just
> applies to process_xml.c.  Where should be put the patch file?

we might end up having to upload it to their corresponding bug systems
but at least Kostas and Stu (who are most likely already in the loop for this
issue) are probably already making their own based on the information
provided, and in case they are not, I have it attached to this email and have
it uploaded to BUG188 as well for completeness.

> Should it go into www.ganglia.info/releases with the gmond.conf patch
> file or should we create a www.ganglia.info/patches/3.1.0 location?

having a "patches" repository might be a good idea (specially considering we
are not meant to release packages with nano versions) and considering with
the current development model we are going to take some time to stabilize the
next release and which will include some patches important enough to expedite
(like this one).

not sure about the location though as I would expect it to hang from releases
and the current releases tree doesn't seem to be aligned with the fact that we
are doing releases and development in at least 2 branches (3.0 and 3.1) or
include a release package at all.

Carlo
Index: gmetad/process_xml.c
===================================================================
--- gmetad/process_xml.c        (revision 1634)
+++ gmetad/process_xml.c        (working copy)
@@ -724,7 +724,11 @@
     
     if (!xmldata->host_alive) 
         return 0;
-    
+
+    /* Only keep extra element details if we are the authority on this 
cluster. */
+    if (!authority_mode(xmldata))
+       return 0;
+
     hashkey.data = (void*) name;
     hashkey.size =  strlen(name) + 1;
     
-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers

Reply via email to