Processed: Re: Bug#654783: race condition in libpthread causes hangs in python2.7 testsuite
Processing commands for cont...@bugs.debian.org:

> tag 654783 pending
Bug #654783 [libc0.1] race condition in libpthread causes hangs in python2.7 testsuite
Added tag(s) pending.
> thanks
Stopping processing here.

Please contact me if you need assistance.
--
654783: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=654783
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems

--
To UNSUBSCRIBE, email to debian-glibc-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/handler.s.c.133494699830980.transcr...@bugs.debian.org
Bug#654783: race condition in libpthread causes hangs in python2.7 testsuite
tag 654783 pending
thanks

On 20 April 2012 at 8:47, Petr Salinger wrote:
> In the original (plain linuxthreads) code, with each thread implemented
> as a FreeBSD process, the wakeup signal is sent to the thread manager by
> the kernel, after the thread exits.
>
> In the current variant, with each thread implemented as a FreeBSD kernel
> thread, the wakeup signal is sent to the thread manager from userspace, a
> few moments before exit. This is an expected race condition, and it is the
> reason "|| main_thread_exiting" was added. I expected that losing a
> wakeup would not matter: the "child thread" would be "eaten" only slightly
> later, when another thread exits and sends a wakeup. The only problem
> should be when there is no other thread, and that should be solved by
> "|| main_thread_exiting". But it does not suffice.
>
> Trying to "eat dead child" every time is just a workaround.

Yep, eating dead children every time doesn't sound like the cleanest option to me either ;-)

> A better way might be to add an atomic counter
> [using gcc's __sync_fetch_and_add()] to track the number of expected
> "dead or soon to be dead" children, and to "try to eat dead child" only
> when the number is above zero.

Thanks for the heads-up. I notice you already fixed this in pkg-glibc SVN. Maybe it's not worth improving further... (IMHO time would be better spent on NPTL).

Thank you!

--
Robert Millan

Archive: http://lists.debian.org/CAOfDtXOQpP8BOjXXS5vHTLOFW=yr-kyvvw8fdg8syntxhi5...@mail.gmail.com
Bug#654783: race condition in libpthread causes hangs in python2.7 testsuite
> That's really nice. Petr, could you give some explanation on that
> one-line patch you provided? Is it supposed to be the correct fix or
> is more work necessary? I'm not familiar with the whole picture but
> if you give some pointers I may be able to help.

In the original (plain linuxthreads) code, with each thread implemented as a FreeBSD process, the wakeup signal is sent to the thread manager by the kernel, after the thread exits.

In the current variant, with each thread implemented as a FreeBSD kernel thread, the wakeup signal is sent to the thread manager from userspace, a few moments before exit. This is an expected race condition, and it is the reason "|| main_thread_exiting" was added. I expected that losing a wakeup would not matter: the "child thread" would be "eaten" only slightly later, when another thread exits and sends a wakeup. The only problem should be when there is no other thread, and that should be solved by "|| main_thread_exiting". But it does not suffice.

Trying to "eat dead child" every time is just a workaround. A better way might be to add an atomic counter [using gcc's __sync_fetch_and_add()] to track the number of expected "dead or soon to be dead" children, and to "try to eat dead child" only when the number is above zero.

And (of course) in the long term, do not use a manager thread anymore.

Petr

Archive: http://lists.debian.org/alpine.lrh.2.02.1204200833090.20...@sci.felk.cvut.cz
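[Editorial sketch] The atomic-counter idea described above could look roughly like the following. This is only a minimal illustration under stated assumptions, not the change committed to pkg-glibc SVN and not actual linuxthreads code: the names pending_exits, announce_exit, should_reap and reaped_one are hypothetical. Only __sync_fetch_and_add() and __sync_fetch_and_sub() are real GCC builtins.

```c
#include <stdbool.h>

/* Hypothetical counter (not a linuxthreads symbol): number of threads that
   have announced their exit but have not yet been reaped by the manager. */
static int pending_exits = 0;

/* Assumed to be called by an exiting thread BEFORE it sends the wakeup
   signal to the manager, so the count survives even if the wakeup is lost. */
void announce_exit(void)
{
    __sync_fetch_and_add(&pending_exits, 1);
}

/* Assumed to be called by the manager on every loop iteration, including
   poll timeouts. Returns true when some thread is dead or soon to be dead. */
bool should_reap(void)
{
    /* Adding zero gives an atomic read with a full memory barrier. */
    return __sync_fetch_and_add(&pending_exits, 0) > 0;
}

/* Assumed to be called by the manager after it has reaped one thread. */
void reaped_one(void)
{
    __sync_fetch_and_sub(&pending_exits, 1);
}
```

With such a counter, the manager would attempt to reap only while should_reap() is true, instead of on every iteration, and a lost wakeup would be picked up on the next timeout because the count stays above zero until reaped_one() runs.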
Bug#654783: race condition in libpthread causes hangs in python2.7 testsuite
On 19/04/12 20:54, Robert Millan wrote:
> CCing #575302
>
> On 19 April 2012 at 1:12, Steven Chamberlain wrote:
>> Also, perhaps related, I got through the (Python-powered) iceweasel
>> 10.0.3esr test suite for the first time, without hangs (see #575302).
>> Maybe this helped.
>
> That's really nice. Petr, could you give some explanation on that
> one-line patch you provided? Is it supposed to be the correct fix or
> is more work necessary? I'm not familiar with the whole picture but
> if you give some pointers I may be able to help.

I only thought to test iceweasel because in #658704 you mentioned an infinite poll() loop (but you didn't show the timing, which you would get from kdump -T).

Maybe if __pthread_sig_cancel is missed somehow, Petr's diff works around that by checking for terminated child threads anyway every couple of seconds. Just guessing.

Python 2.7.3~rc2 fixed something else that could have been causing iceweasel's test harness to hang (like waf in #668240), so maybe that also helped here.

Regards,
--
Steven Chamberlain
ste...@pyro.eu.org

Archive: http://lists.debian.org/4f907e97.3040...@pyro.eu.org
Bug#654783: race condition in libpthread causes hangs in python2.7 testsuite
On 19/04/12 20:51, Robert Millan wrote:
> CCing #663056
>
> On 19 April 2012 at 1:12, Steven Chamberlain wrote:
>> For now I still have Petr's change applied. I notice that libsoup2.4's
>> connection-test (see #663056) has stopped failing. (Just had 100/100
>> test passes, was previously seeing about 50% failures.)
>
> Are you sure? You mean you tried 100 times?

It passed 100 times in a row. And another 100 times just now.

I'm not sure that Petr's patch is what really fixed it, but I can try to narrow it down. You say the cause was well-known...?

Regards,
--
Steven Chamberlain
ste...@pyro.eu.org

Archive: http://lists.debian.org/4f907c84.8000...@pyro.eu.org
Bug#654783: race condition in libpthread causes hangs in python2.7 testsuite
CCing #575302

On 19 April 2012 at 1:12, Steven Chamberlain wrote:
> Also, perhaps related, I got through the (Python-powered) iceweasel
> 10.0.3esr test suite for the first time, without hangs (see #575302).
> Maybe this helped.

That's really nice. Petr, could you give some explanation on that one-line patch you provided? Is it supposed to be the correct fix or is more work necessary? I'm not familiar with the whole picture but if you give some pointers I may be able to help.

--
Robert Millan

Archive: http://lists.debian.org/CAOfDtXPRvgHCxe3igp43�7ny-vrkwdawkfgrpp6vd+80p...@mail.gmail.com
Bug#654783: race condition in libpthread causes hangs in python2.7 testsuite
CCing #663056

On 19 April 2012 at 1:12, Steven Chamberlain wrote:
> For now I still have Petr's change applied. I notice that libsoup2.4's
> connection-test (see #663056) has stopped failing. (Just had 100/100
> test passes, was previously seeing about 50% failures.)

Are you sure? You mean you tried 100 times?

I don't know about connection-test, but the context-test failure was a race condition. I'm also 100% sure Petr's change doesn't fix that (the reason for connection-test failure is well-known).

After fixing context-test I got a connection-test pass, but I only tried once (at that time I assumed it was the same issue as context-test).

--
Robert Millan

Archive: http://lists.debian.org/caofdtxpxc99mw9+gz8jqug0fs3to6rmdmh2tqy_wd2ufhz4...@mail.gmail.com
Bug#654783: race condition in libpthread causes hangs in python2.7 testsuite
On 18/04/12 19:59, Robert Millan wrote:
> On 18 April 2012 at 15:46, Steven Chamberlain wrote:
>> With it, I hit a tst-timer5 regression during build.
>
> Don't worry about tst-timer5, it's a fake regression. Previously it
> "succeeded" by exiting without testing anything.

Oh okay.

For now I still have Petr's change applied. I notice that libsoup2.4's connection-test (see #663056) has stopped failing. (Just had 100/100 test passes, was previously seeing about 50% failures.)

Also, perhaps related, I got through the (Python-powered) iceweasel 10.0.3esr test suite for the first time, without hangs (see #575302). Maybe this helped.

Regards,
--
Steven Chamberlain
ste...@pyro.eu.org

Archive: http://lists.debian.org/4f8f4a47.4000...@pyro.eu.org