On Mon, Nov 30, 2009 at 01:29:34PM +0000, Daniel Pocock wrote:
> Carlo Marcelo Arenas Belon wrote:
>>
>> Your call, even though a fix for this feature would probably be preferred, as
>> there is nothing special about the BSDs that should make only them affected,
>> so the problem may well be more generic.
>>   
> It may be that this bug is revealing a more serious issue in the way  
> initialisation is done, so I would prefer to know the real cause rather  
> than just revert the change that forces the problem to show itself.

Agreed, and as I said before, that is the reason why I didn't just revert it
from trunk or 3.1 as a "fix", even if doing so seems to "resolve" the problem.

>> At least a revert would be needed for 3.1, as this counts as a regression,
>> but I haven't done so either; I am waiting for you to first revert it on
>> trunk and then decide how to proceed from there, depending on how critical
>> this feature is for the release.
>>   
> I agree that it is a regression, but reverting it may cause the real  
> culprit to remain hidden.  I'd rather hold the release while we look  
> more closely.

Not sure I understand what you mean here, since it seems obvious to me that
3.1.5 can't be released unless a fix (even if it is just reverting the
change) is committed.

Are you saying you want to hold off on deciding whether to release 3.1.5, or
to see what will be in 3.1.6? If the latter, I would suggest also pulling in
some other fixes, and of course that would also require us to agree on a
bootstrapping environment, for this release at least.

>>> The change has been working on Linux, Solaris and Cygwin.
>>
>> Other than doing a manual bisect (using git instead of svn would have been
>> useful here) to find where the problem was introduced, and validating that
>> reverting it corrects the problem, I haven't done much analysis of it. But
>> the fact that it broke in such a strange way (I was indeed expecting the
>> culprit to be somewhere else, especially considering all the recent changes
>> in the networking code and the fact that it originally seemed to be
>> triggered by a TCP request) probably points to a bigger issue which just
>> happens not to have been visible on the configurations used to test Linux,
>> Solaris and Cygwin, especially considering how pervasive it was (it broke
>> every BSD I had access to for testing, at least).
>>   
> Can you provide output from strace/truss and also a stack trace from the  
> point where it is in the infinite loop?

Filed BUG246 with the trace information (collected with ktrace on OpenBSD 4.5
amd64), but as for the stack trace, you got me there.

From the way the problem presents itself, it isn't really obvious where the
offending code is, and it is difficult to debug as well, since it disappears
when running in debug mode or when not running as a daemon, which is why
I haven't been able to capture a backtrace yet either.

> There is a good reason for moving the daemonize code the way I did - an  
> alternative would be to daemonize, but make the original process hang  
> around until the daemon process has entered the main loop.

OK, and I assume it is probably related to the cases where gmond "suddenly"
dies at startup without notification, but some clarification on the problem
you were trying to solve would probably be useful too.

Carlo

_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
