On Thursday 14 October 2010 12:42:27 Jonathan Lange wrote:
> On Thu, Oct 14, 2010 at 12:25 PM, Steve Kowalik
> <[email protected]> wrote:
> > Hi guys,
> >
> > I seem to constantly get thread-based failures when submitting a
> > branch to ec2, or when Hudson performs a build. I got sick enough of it
> > today to actually sit down and talk to Robert and Maris about it, and
> > did a little bit of debugging.
> >
> > It does seem like certain tests will leave a thread hanging
> > around, which then zope gets caught up in.
> >
> > test: lp.codehosting.puller.tests.test_worker.TestWorkerProgressReporting.test_network
> > Thread Name: MainThread
> > Is Daemon?: False
> > Thread target: None
> >
> > Thread Name: Thread-18
> > Is Daemon?: True
> > Thread target: <bound method HttpServer._http_start of
> > HttpServer(127.0.0.1:37111)>
> >
> > Thread Name: Thread-20
> > Is Daemon?: 1
> > Thread target: <bound method
> > TestingThreadingHTTPServer.process_request_thread of
> > <bzrlib.tests.http_server.TestingThreadingHTTPServer instance at
> > 0x6e78128>>
> >
> > time: 2010-10-14 10:53:44.596568Z
> > successful: lp.codehosting.puller.tests.test_worker.TestWorkerProgressReporting.test_network
> > test: lp.codehosting.puller.tests.test_worker.TestWorkerProgressReporting.test_network
> > tags: zope:threads
> > error: lp.codehosting.puller.tests.test_worker.TestWorkerProgressReporting.test_network [ multipart
> > Content-Type: text/plain;charset=utf8
> > garbage
> > 34
> > [<Thread(Thread-18, started daemon 47971215480592)>]0
> > ]
> > time: 2010-10-14 10:53:44.596847Z
> >
> > So it looks like the HttpServer instance needs to be killed in the test
> > or in the teardown? I'm at a little bit of a loss, personally, so
> > thought I'd throw it out there first.
>
> This seems a lot like https://bugs.edge.launchpad.net/bzr/+bug/193253,
> although there it's a socket leaking check rather than a thread
> leaking check. I don't know what's caused it to regress.
>
> Specifically, there's code hidden by bzrlib that isn't cleaning up
> after itself. Whether it should or not is an open question. From one
> point of view, our thread checker is being overzealous, catching a
> leak in something that's never going to affect production. From
> another point of view, HttpServer.stop_server() should darn well stop
> the server.
>
> Anyway, fixes are:
> * Fix bzrlib.tests.http_server to clean up its thread in stop_server
> * Find some way of getting the thread leaking checker to ignore the
> thread
>
> Perhaps there are more fundamental issues that could be address. Them,
> I leave to Rob.
>
> CCing vila because of the history.
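
To make the first suggested fix concrete, here is a rough sketch of a test server whose stop_server() joins its serving thread, plus a small helper that reports threads left over since a snapshot. This is not bzrlib's code: SketchTestServer, QuietHandler and leaked_threads are hypothetical names, and it uses the Python 3 standard library rather than bzrlib.tests.http_server, so treat it as an illustration of the idea only.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class QuietHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a tiny body."""

    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep test output clean


class SketchTestServer:
    """Hypothetical stand-in for a test HTTP server; not bzrlib's class."""

    def start_server(self):
        # Port 0 asks the OS for any free port.
        self._httpd = ThreadingHTTPServer(("127.0.0.1", 0), QuietHandler)
        self.port = self._httpd.server_address[1]
        self._thread = threading.Thread(target=self._httpd.serve_forever)
        self._thread.daemon = True
        self._thread.start()

    def stop_server(self):
        self._httpd.shutdown()        # ask serve_forever() to return
        self._httpd.server_close()    # release the listening socket
        self._thread.join(timeout=5)  # wait so the thread does not outlive the test


def leaked_threads(before):
    """Return live threads that were not in the `before` snapshot."""
    return [t for t in threading.enumerate()
            if t not in before and t.is_alive()]
```

A test would take `before = set(threading.enumerate())` in setUp, call stop_server() in tearDown, and then check that leaked_threads(before) is empty. The second suggested fix would instead amount to filtering known threads out of that list, which hides the leak rather than removing it.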

This and other failures have been occurring on Steve's Hudson instance ever
since he started it:

https://hudson.wedontsleep.org/job/devel/104/

I guess we've all been too busy to notice or deal with this, but I do find it
disturbing that we can't get a consistent test run across different
environments. The failures above don't appear in buildbot, but they do appear
in ec2.

There seems to be a pattern involving threads and/or external processes; I hope
someone with more knowledge than I have can diagnose them. For now I am copying
jml and looking at Rob. :-)

Cheers.

