Re: Possible libc_r pthread bug
Alfred Perlstein wrote:
> * Daniel Eischen [EMAIL PROTECTED] [011130 16:17] wrote:
> > On Fri, 30 Nov 2001, Louis-Philippe Gagnon wrote:
> > > If at first you don't succeed...
> > >
> > > I've encountered a problem using pthread_cancel, pthread_join and pthread_setcanceltype, I'm hoping someone can shed some light. (in a nutshell: pthread_setcanceltype doesn't seem to work in FreeBSD 4.4)
> > >
> > > (posted to -current and -hackers; if there's a more appropriate mailing list for this, please let me know)
> > >
> > > I recently encountered a situation where, after calling pthread_cancel to cancel a thread, the call to pthread_join hangs indefinitely. I quickly figured out that it was because the thread being cancelled was never reaching a cancellation point (in fact it was an infinite loop with no function calls at all). Sure enough, adding a pthread_testcancel() in the loop allowed pthread_join to return. However this solution isn't acceptable for my requirements.
>
> please test the following patch:

There are already cancellation tests when resuming threads whose contexts are not saved as a result of a signal interrupt (ctxtype != CTX_UC). You shouldn't test for cancellation when ctxtype == CTX_UC because you are running on the scheduler stack, not the thread's stack.

You also have a bug in the way you changed the check for cancellation flags.

The only clean way to fix this is to add a return frame to the interrupted context so that it can check for cancellation (and other things) before returning to the thread's interrupted context.

--
Dan Eischen

To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Possible libc_r pthread bug
* Dan Eischen [EMAIL PROTECTED] [011204 06:26] wrote:
> There are already cancellation tests when resuming threads whose contexts are not saved as a result of a signal interrupt (ctxtype != CTX_UC). You shouldn't test for cancellation when ctxtype == CTX_UC because you are running on the scheduler stack, not the thread's stack.

That makes sense, but why?

> You also have a bug in the way you changed the check for cancellation flags.

What?

> The only clean way to fix this is to add a return frame to the interrupted context so that it can check for cancellation (and other things) before returning to the thread's interrupted context.

No way to work around this? Shouldn't the thread exit library know which stack exactly to clean up even in the context of a signal handler?

--
-Alfred Perlstein [[EMAIL PROTECTED]]
'Instead of asking why a piece of software is using 1970s technology, start asking why software is ignoring 30 years of accumulated wisdom.'
http://www.morons.org/rants/gpl-harmful.php3
Re: Possible libc_r pthread bug
* Alfred Perlstein [EMAIL PROTECTED] [011204 11:45] wrote:
> * Dan Eischen [EMAIL PROTECTED] [011204 06:26] wrote:
> > There are already cancellation tests when resuming threads whose contexts are not saved as a result of a signal interrupt (ctxtype != CTX_UC). You shouldn't test for cancellation when ctxtype == CTX_UC because you are running on the scheduler stack, not the thread's stack.
>
> That makes sense, but why?
>
> > You also have a bug in the way you changed the check for cancellation flags.
>
> What?
>
> > The only clean way to fix this is to add a return frame to the interrupted context so that it can check for cancellation (and other things) before returning to the thread's interrupted context.
>
> No way to work around this? Shouldn't the thread exit library know which stack exactly to clean up even in the context of a signal handler?

Are you sure this is 100% needed? Here's a recap of that patch. Are you saying that the problem is that the thread will use the current sp if it exits, rather than some value stashed away in the private pthread struct?

Also, I think my tests for cancellation are correct. Although I sort of think the PTHREAD_AT_CANCEL_POINT test should be removed because the code will catch this when it leaves the cancellation point.

Index: uthread_kern.c
===================================================================
RCS file: /home/ncvs/src/lib/libc_r/uthread/uthread_kern.c,v
retrieving revision 1.39
diff -u -r1.39 uthread_kern.c
--- uthread_kern.c	7 Oct 2001 02:34:43 -0000	1.39
+++ uthread_kern.c	4 Dec 2001 17:58:31 -0000
@@ -180,7 +180,7 @@
 	struct pthread	*curthread = _get_curthread();
 	pthread_t	pthread, pthread_h;
 	unsigned int	current_tick;
-	int		add_to_prioq;
+	int		add_to_prioq, cfl;
 
 	/* If the currently running thread is a user thread, save it: */
 	if ((curthread->flags & PTHREAD_FLAGS_PRIVATE) == 0)
@@ -604,6 +604,15 @@
 			 */
 			_thread_kern_in_sched = 0;
 
+			/*
+			 * test for async cancel:
+			 */
+			cfl = curthread->cancelflags;
+
+			cfl &= (PTHREAD_CANCEL_ASYNCHRONOUS|
+			    PTHREAD_AT_CANCEL_POINT);
+			if (cfl != 0)
+				pthread_testcancel();
 #if NOT_YET
 			_setcontext(&curthread->ctx.uc);
 #else
@@ -1078,6 +1087,8 @@
 		curthread->sig_defer_count--;
 	}
 	else if (curthread->sig_defer_count == 1) {
+		int cfl;
+
 		/* Reenable signals: */
 		curthread->sig_defer_count = 0;
 
@@ -1091,8 +1102,9 @@
 		 * Check for asynchronous cancellation before delivering any
 		 * pending signals:
 		 */
-		if (((curthread->cancelflags & PTHREAD_AT_CANCEL_POINT) == 0) &&
-		    ((curthread->cancelflags & PTHREAD_CANCEL_ASYNCHRONOUS) != 0))
+		cfl = curthread->cancelflags;
+		cfl &= (PTHREAD_CANCEL_ASYNCHRONOUS|PTHREAD_AT_CANCEL_POINT);
+		if (cfl != 0)
 			pthread_testcancel();
 
 		/*

--
-Alfred Perlstein [[EMAIL PROTECTED]]
Re: Possible libc_r pthread bug
Alfred Perlstein wrote:
> * Dan Eischen [EMAIL PROTECTED] [011204 06:26] wrote:
> > There are already cancellation tests when resuming threads whose contexts are not saved as a result of a signal interrupt (ctxtype != CTX_UC). You shouldn't test for cancellation when ctxtype == CTX_UC because you are running on the scheduler stack, not the thread's stack.
>
> That makes sense, but why?

Because when a thread gets cancelled, pthread_exit gets called, which then calls the scheduler again. It is also possible to get interrupted during this process, and the thread's context (which is operating on the scheduler stack) could get saved. The scheduler could get entered again, and if the thread gets resumed, it'll longjmp to the saved context, which is the scheduler stack (and which was just trashed by entering the scheduler again). It is too confusing to try to handle conditions like this, and the threads library doesn't need to get any more confusing ;-)

Once the scheduler is entered, no pthread routines should be called and the scheduler should not be recursively entered. The only way out of the scheduler should be a longjmp or sigreturn to a saved thread's context.

> > You also have a bug in the way you changed the check for cancellation flags.
>
> What?

When a thread is at a cancellation point, you want to let the cancellable routine handle the cancel. The check as coded before avoided calling pthread_testcancel() when at a cancellation point. I think you check for either PTHREAD_AT_CANCEL_POINT or PTHREAD_CANCEL_ASYNCHRONOUS being set when you really want

	((flags & PTHREAD_AT_CANCEL_POINT) == 0) &&
	((flags & PTHREAD_CANCEL_ASYNCHRONOUS) != 0)

> > The only clean way to fix this is to add a return frame to the interrupted context so that it can check for cancellation (and other things) before returning to the thread's interrupted context.
>
> No way to work around this? Shouldn't the thread exit library know which stack exactly to clean up even in the context of a signal handler?

It assumes that you're running on the current thread's stack.

I don't view this particular bug as a big problem. It is a somewhat perverse program that has a CPU bound thread that never gets to any sort of blocking condition and yet still wants to be cancelled. The submitter of the problem doesn't even want to upgrade to get a fix. It can be worked around easily enough by checking for cancellation, or by using pthread_kill to send a signal to the thread and have the signal handler exit the thread or longjmp back to the thread at a place that can exit and clean up.

There is already a minor race condition in trying to resume a thread that was interrupted by a signal. Adding some code to munge the stack of an interrupted context so that it calls a wrapper function would solve both problems. The signal handling code already does this to install a signal handler wrapper on a thread's stack.

--
Dan Eischen
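The difference between the two flag tests discussed above can be checked with a small truth table. The following is only a sketch: the DEMO_ bit values are made up for illustration (the real PTHREAD_AT_CANCEL_POINT flag is private to libc_r), and patched_check assumes the patch masks the flags with `&=`, which is how the description above reads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this illustration. */
#define DEMO_CANCEL_ASYNCHRONOUS 0x0002
#define DEMO_AT_CANCEL_POINT     0x0004

/* Test as it stood before the patch: asynchronous type set AND
 * the thread is not currently inside a cancellation point. */
bool original_check(int flags)
{
    return (flags & DEMO_AT_CANCEL_POINT) == 0 &&
           (flags & DEMO_CANCEL_ASYNCHRONOUS) != 0;
}

/* Test as (assumed) in the patch: true if EITHER bit is set. */
bool patched_check(int flags)
{
    int cfl = flags & (DEMO_CANCEL_ASYNCHRONOUS | DEMO_AT_CANCEL_POINT);
    return cfl != 0;
}

/* Returns how many of the four flag combinations the two tests disagree on. */
int count_disagreements(void)
{
    int combos[] = {
        0,
        DEMO_CANCEL_ASYNCHRONOUS,
        DEMO_AT_CANCEL_POINT,
        DEMO_CANCEL_ASYNCHRONOUS | DEMO_AT_CANCEL_POINT
    };
    int n = 0;

    for (int i = 0; i < 4; i++)
        if (original_check(combos[i]) != patched_check(combos[i]))
            n++;
    assert(n <= 4);
    return n;
}
```

Under these assumptions the two tests agree only when neither bit is set or when only the asynchronous bit is set. Whenever the thread is at a cancellation point the patched test fires while the original one holds back, which is the case the reply says should be left to the cancellable routine.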
Re: Possible libc_r pthread bug
* Daniel Eischen [EMAIL PROTECTED] [011204 12:32] wrote:
> Alfred Perlstein wrote:
> > * Dan Eischen [EMAIL PROTECTED] [011204 06:26] wrote:
> > > There are already cancellation tests when resuming threads whose contexts are not saved as a result of a signal interrupt (ctxtype != CTX_UC). You shouldn't test for cancellation when ctxtype == CTX_UC because you are running on the scheduler stack, not the thread's stack.
> >
> > That makes sense, but why?
>
> Because when a thread gets cancelled, pthread_exit gets called, which then calls the scheduler again. It is also possible to get interrupted during this process, and the thread's context (which is operating on the scheduler stack) could get saved. The scheduler could get entered again, and if the thread gets resumed, it'll longjmp to the saved context, which is the scheduler stack (and which was just trashed by entering the scheduler again). It is too confusing to try to handle conditions like this, and the threads library doesn't need to get any more confusing ;-)
>
> Once the scheduler is entered, no pthread routines should be called and the scheduler should not be recursively entered. The only way out of the scheduler should be a longjmp or sigreturn to a saved thread's context.

Ok, for the sake of beating a clue into me...

in uthread_kern.c:_thread_kern_sched

	/* Save the state of the current thread: */
	if (_setjmp(curthread->ctx.jb) == 0) {
		/* Flag the jump buffer was the last state saved: */
		curthread->ctxtype = CTX_JB_NOSIG;
		curthread->longjmp_val = 1;
	} else {
		DBG_MSG("Returned from ___longjmp, thread %p\n", curthread);
		/*
		 * This point is reached when a longjmp() is called
		 * to restore the state of a thread.
		 *
		 * This is the normal way out of the scheduler.
		 */
		_thread_kern_in_sched = 0;

		if (curthread->sig_defer_count == 0) {
			if (((curthread->cancelflags & PTHREAD_AT_CANCEL_POINT) == 0) &&
			    ((curthread->cancelflags & PTHREAD_CANCEL_ASYNCHRONOUS) != 0))
				/*
				 * Cancellations override signals.
				 *
				 * Stick a cancellation point at the
				 * start of each async-cancellable
				 * thread's resumption.
				 *
				 * We allow threads woken at cancel
				 * points to do their own checks.
				 */
				pthread_testcancel();
		}

Why isn't this working, shouldn't it be doing the right thing? What if curthread->sig_defer_count wasn't tested? Maybe this should be a test against curthread->sig_defer_count = 1?

I'll play with this some more when I get back to my box at home, it just seems bizarro to me.

--
-Alfred Perlstein [[EMAIL PROTECTED]]
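The excerpt above relies on setjmp returning twice: once with 0 when the state is saved, and once with a non-zero value when something later jumps back to it. A minimal standalone sketch of that control flow follows, using the standard setjmp/longjmp rather than libc_r's internal _setjmp/___longjmp; the names saved_ctx and resume_path_taken are invented for this illustration.

```c
#include <assert.h>
#include <setjmp.h>

/* Stands in for curthread->ctx.jb in the excerpt (illustrative only). */
static jmp_buf saved_ctx;

/* Returns 1 if the "resumed via longjmp" branch ran, 0 otherwise. */
int resume_path_taken(void)
{
    volatile int resumed = 0;   /* volatile: modified between setjmp and longjmp */

    if (setjmp(saved_ctx) == 0) {
        /* First return: the state has just been saved. */
        assert(resumed == 0);
        /* Pretend a scheduler now decides to resume this context. */
        longjmp(saved_ctx, 1);
    } else {
        /* Second return: reached only through longjmp. Any check
         * placed here runs only for contexts resumed this way. */
        resumed = 1;
    }
    return resumed;
}
```

The point of the sketch is that code in the else branch is reached only by a jump back to the saved buffer; a context restored by some other mechanism never passes through it.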
Re: Possible libc_r pthread bug
Alfred Perlstein wrote:
> * Daniel Eischen [EMAIL PROTECTED] [011204 12:32] wrote:
> > Alfred Perlstein wrote:
> > > * Dan Eischen [EMAIL PROTECTED] [011204 06:26] wrote:
> > > > There are already cancellation tests when resuming threads whose contexts are not saved as a result of a signal interrupt (ctxtype != CTX_UC). You shouldn't test for cancellation when ctxtype == CTX_UC because you are running on the scheduler stack, not the thread's stack.
> > >
> > > That makes sense, but why?
> >
> > Because when a thread gets cancelled, pthread_exit gets called, which then calls the scheduler again. It is also possible to get interrupted during this process, and the thread's context (which is operating on the scheduler stack) could get saved. The scheduler could get entered again, and if the thread gets resumed, it'll longjmp to the saved context, which is the scheduler stack (and which was just trashed by entering the scheduler again). It is too confusing to try to handle conditions like this, and the threads library doesn't need to get any more confusing ;-)
> >
> > Once the scheduler is entered, no pthread routines should be called and the scheduler should not be recursively entered. The only way out of the scheduler should be a longjmp or sigreturn to a saved thread's context.
>
> Ok, for the sake of beating a clue into me...
>
> in uthread_kern.c:_thread_kern_sched
>
> 	/* Save the state of the current thread: */
> 	if (_setjmp(curthread->ctx.jb) == 0) {
> 		/* Flag the jump buffer was the last state saved: */
> 		curthread->ctxtype = CTX_JB_NOSIG;
> 		curthread->longjmp_val = 1;
> 	} else {
> 		DBG_MSG("Returned from ___longjmp, thread %p\n", curthread);
> 		/*
> 		 * This point is reached when a longjmp() is called
> 		 * to restore the state of a thread.
> 		 *
> 		 * This is the normal way out of the scheduler.
> 		 */
> 		_thread_kern_in_sched = 0;
>
> 		if (curthread->sig_defer_count == 0) {
> 			if (((curthread->cancelflags & PTHREAD_AT_CANCEL_POINT) == 0) &&
> 			    ((curthread->cancelflags & PTHREAD_CANCEL_ASYNCHRONOUS) != 0))
> 				/*
> 				 * Cancellations override signals.
> 				 *
> 				 * Stick a cancellation point at the
> 				 * start of each async-cancellable
> 				 * thread's resumption.
> 				 *
> 				 * We allow threads woken at cancel
> 				 * points to do their own checks.
> 				 */
> 				pthread_testcancel();
> 		}
>
> Why isn't this working, shouldn't it be doing the right thing? What if curthread->sig_defer_count wasn't tested? Maybe this should be a test against curthread->sig_defer_count = 1?

Because this is the normal way into the scheduler -- when a thread hits a blocking condition or yields. A signal interrupting a thread does not go through this section. The interrupted thread's context is argument 3 of the signal handler, and this context gets stored in curthread->ctx.uc. This is the crux of the problem. When you resume this context, you are not in the thread scheduling code and so you can't check for cancellation.

I'm suggesting that the proper way to handle this is to munge this interrupted context (a ucontext_t) so that it first returns to a small wrapper function that can check for cancellation (and clear the in scheduler flag, which is the other problem I mentioned) before returning to the interrupted context.

There is another way to handle this, but it is more complicated although probably better than the above method. This would involve changing the signal handling code to not use an alternate signal stack, so an interrupted thread could do something like:

	void
	sighandler(int sig, siginfo_t *info, ucontext_t *ucp)
	{
		...
		{
			ucontext_t uc;

			/* Save interrupted context on stack: */
			uc = *ucp;
			uc.uc_sigmask = _process_sigmask;

			/* Enter the scheduler: */
			_thread_kern_sched(NULL);

			/*
			 * After return from the scheduler, the
			 * in scheduler flag
Re: Possible libc_r pthread bug
* Daniel Eischen [EMAIL PROTECTED] [011130 16:17] wrote:
> On Fri, 30 Nov 2001, Louis-Philippe Gagnon wrote:
> > If at first you don't succeed...
> >
> > I've encountered a problem using pthread_cancel, pthread_join and pthread_setcanceltype, I'm hoping someone can shed some light. (in a nutshell: pthread_setcanceltype doesn't seem to work in FreeBSD 4.4)
> >
> > (posted to -current and -hackers; if there's a more appropriate mailing list for this, please let me know)
> >
> > I recently encountered a situation where, after calling pthread_cancel to cancel a thread, the call to pthread_join hangs indefinitely. I quickly figured out that it was because the thread being cancelled was never reaching a cancellation point (in fact it was an infinite loop with no function calls at all). Sure enough, adding a pthread_testcancel() in the loop allowed pthread_join to return. However this solution isn't acceptable for my requirements.

please test the following patch:

Index: uthread/uthread_kern.c
===================================================================
RCS file: /home/ncvs/src/lib/libc_r/uthread/uthread_kern.c,v
retrieving revision 1.39
diff -u -r1.39 uthread_kern.c
--- uthread/uthread_kern.c	7 Oct 2001 02:34:43 -0000	1.39
+++ uthread/uthread_kern.c	4 Dec 2001 08:22:22 -0000
@@ -579,6 +579,18 @@
 			    curthread);
 		}
 		/*
+		 * If the currently running thread is a user thread,
+		 * test for async cancel:
+		 */
+		if ((curthread->flags & PTHREAD_FLAGS_PRIVATE) == 0) {
+			int cfl = curthread->cancelflags;
+
+			cfl &= (PTHREAD_CANCEL_ASYNCHRONOUS|
+			    PTHREAD_AT_CANCEL_POINT);
+			if (cfl != 0)
+				pthread_testcancel();
+		}
+		/*
 		 * Continue the thread at its current frame:
 		 */
 		switch(curthread->ctxtype) {
@@ -1078,6 +1090,8 @@
 		curthread->sig_defer_count--;
 	}
 	else if (curthread->sig_defer_count == 1) {
+		int cfl;
+
 		/* Reenable signals: */
 		curthread->sig_defer_count = 0;
 
@@ -1091,8 +1105,9 @@
 		 * Check for asynchronous cancellation before delivering any
 		 * pending signals:
 		 */
-		if (((curthread->cancelflags & PTHREAD_AT_CANCEL_POINT) == 0) &&
-		    ((curthread->cancelflags & PTHREAD_CANCEL_ASYNCHRONOUS) != 0))
+		cfl = curthread->cancelflags;
+		cfl &= (PTHREAD_CANCEL_ASYNCHRONOUS|PTHREAD_AT_CANCEL_POINT);
+		if (cfl != 0)
 			pthread_testcancel();
 
 		/*

--
-Alfred Perlstein [[EMAIL PROTECTED]]
Re: Possible libc_r pthread bug
* Louis-Philippe Gagnon [EMAIL PROTECTED] [011130 15:57] wrote:
> If at first you don't succeed...
>
> I've encountered a problem using pthread_cancel, pthread_join and pthread_setcanceltype, I'm hoping someone can shed some light.

Provide me with minimal sample code and a makefile and I should have the problem fixed in a couple of days, either that or a scathing flame for you misusing the API. :)

-Alfred
Re: Possible libc_r pthread bug
On Fri, 30 Nov 2001, Louis-Philippe Gagnon wrote:
> If at first you don't succeed...
>
> I've encountered a problem using pthread_cancel, pthread_join and pthread_setcanceltype, I'm hoping someone can shed some light. (in a nutshell: pthread_setcanceltype doesn't seem to work in FreeBSD 4.4)
>
> (posted to -current and -hackers; if there's a more appropriate mailing list for this, please let me know)
>
> I recently encountered a situation where, after calling pthread_cancel to cancel a thread, the call to pthread_join hangs indefinitely. I quickly figured out that it was because the thread being cancelled was never reaching a cancellation point (in fact it was an infinite loop with no function calls at all). Sure enough, adding a pthread_testcancel() in the loop allowed pthread_join to return. However this solution isn't acceptable for my requirements.
>
> I discovered the pthread_setcanceltype function and its PTHREAD_CANCEL_ASYNCHRONOUS parameter, which looked like they would give me exactly what I needed: allow threads to be cancelled regardless of what they are doing (basically a pthread equivalent to TerminateThread).
>
> Unfortunately, my tests have been less than conclusive: pthread_setcanceltype doesn't seem to do anything at all. It tells me it succeeds, subsequent calls properly report the previous cancellation type as ASYNCHRONOUS. But pthread_join still hangs, and adding pthread_testcancel calls still makes it work...
>
> I'm working on a FreeBSD 4.4-release machine; I ran the same test under FreeBSD 4.3-release and got the same results. However, running it on a Linux box (Mandrake release, 2.4.x kernel), I get exactly the results I was expecting (that is, setting the cancellation type to asynchronous allows the thread to be cancelled at any time)
>
> see the end of this message for my test program
>
> So the questions are
> -am I doing something wrong or misinterpreting the man pages?

No, not really.

> -if not, is this a known bug?

Or feature?

> -if so, is there a workaround (or is it already fixed)?

Not fixed. Work-around could be to use pthread_signal and exit the thread from there.

> -if not, can someone investigate? (I once had a look at the libc_r code and ran away screaming)

Since your thread is compute bound, it is only woken up from the thread library's scheduling signal handler. In this case, it can only resume the thread from the interrupted context, and so there is no check for the thread being canceled.

--
Dan Eischen
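For reference, the situation described above can be reduced to a few lines. This is not the reporter's test program (which is not reproduced in this thread), only a minimal sketch of a compute-bound thread that becomes cancellable once pthread_testcancel() is placed in its loop. The names spin, spin_count and run_cancel_demo are made up for the illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static volatile unsigned long spin_count;

/* Compute-bound worker: it never blocks, so the only cancellation
 * point it ever reaches is the explicit pthread_testcancel(). */
static void *spin(void *arg)
{
    (void)arg;
    for (;;) {
        spin_count++;
        pthread_testcancel();   /* remove this and a deferred cancel is never acted on */
    }
    return NULL;                /* not reached */
}

/* Returns 1 if the worker was cancelled and joined, 0 otherwise. */
int run_cancel_demo(void)
{
    pthread_t tid;
    void *result = NULL;
    int rc;

    rc = pthread_create(&tid, NULL, spin, NULL);
    assert(rc == 0);
    rc = pthread_cancel(tid);
    assert(rc == 0);
    rc = pthread_join(tid, &result);   /* returns once the worker hits the cancellation point */
    assert(rc == 0);
    return result == PTHREAD_CANCELED;
}
```

Replacing the pthread_testcancel() call with a pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL) call at the top of spin gives the variant that the report says hangs in pthread_join under libc_r on FreeBSD 4.3 and 4.4.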