Hey team,

I am getting more and more concerned about the length of time that
master has been cursed.

It seems that sometime recently we have introduced serious instability
in cmd/jujud/agent, and it is often getting wedged and killed by the
test timeout.

I have spent some time looking, but I have not yet found a definitive
cause.  At least some of the time the agent is failing to stop and is
deadlocked.

This is an intermittent failure, but frequent enough that often at
least one of the unit test runs fails with this problem, cursing the
entire run.

One thing I have considered to aid in the debugging is to add some code
to the juju base suites somewhere (or in testing) that adds a goroutine
that will dump the gocheck log just before the test gets killed due to
timeout - perhaps a minute before. Not sure if we have access to the
timeout or not, but we can at least make a sensible guess.

This would give us at least some logging to work through in those
situations where the test is getting killed due to running too long.

If no one looks at this and fixes it overnight, I'll start poking it
with a long stick tomorrow.

Cheers,
Tim

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev
