Re: Possible libc_r pthread bug

2001-12-04 Thread Dan Eischen

Alfred Perlstein wrote:
 
 * Daniel Eischen [EMAIL PROTECTED] [011130 16:17] wrote:
  On Fri, 30 Nov 2001, Louis-Philippe Gagnon wrote:
   If at first you don't succeed...
  
   I've encountered a problem using pthread_cancel, pthread_join and
   pthread_setcanceltype, I'm hoping someone can shed some light.
  
   (in a nutshell : pthread_setcanceltype doesn't seem to work in FreeBSD 4.4)
  
   (posted to -current and -hackers; if there's a more appropriate mailing list
   for this, please let me know)
  
   I recently encountered a situation where, after calling pthread_cancel to
   cancel a thread, the call to pthread_join hangs indefinitely. I quickly figured
   out that it was because the thread being cancelled was never reaching a
   cancellation point (in fact it was an infinite loop with no function calls at all).
   Sure enough, adding a pthread_testcancel() in the loop allowed
   pthread_join to return. However this solution isn't acceptable for my requirements.
 
 please test the following patch:

There are already cancellation tests when resuming threads
whose contexts are not saved as a result of a signal interrupt
(ctxtype != CTX_UC). You shouldn't test for cancellation when
ctxtype == CTX_UC because you are running on the scheduler
stack, not the thread's stack.  You also have a bug in the
way you changed the check for cancellation flags.

The only clean way to fix this is to add a return frame
to the interrupted context so that it can check for cancellation
(and other things) before returning to the thread's interrupted
context.

-- 
Dan Eischen

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Possible libc_r pthread bug

2001-12-04 Thread Alfred Perlstein

* Dan Eischen [EMAIL PROTECTED] [011204 06:26] wrote:
 
 There are already cancellation tests when resuming threads
 whose contexts are not saved as a result of a signal interrupt
 (ctxtype != CTX_UC). You shouldn't test for cancellation when
 ctxtype == CTX_UC because you are running on the scheduler
 stack, not the thread's stack.

That makes sense, but why?

  You also have a bug in the
 way you changed the check for cancellation flags.

What?

 The only clean way to fix this is to add a return frame
 to the interrupted context so that it can check for cancellation
 (and other things) before returning to the thread's interrupted
 context.

No way to work around this?  Shouldn't the thread exit library
know which stack exactly to clean up even in the context of a 
signal handler?

-- 
-Alfred Perlstein [[EMAIL PROTECTED]]
'Instead of asking why a piece of software is using 1970s technology,
 start asking why software is ignoring 30 years of accumulated wisdom.'
   http://www.morons.org/rants/gpl-harmful.php3




Re: Possible libc_r pthread bug

2001-12-04 Thread Alfred Perlstein

* Alfred Perlstein [EMAIL PROTECTED] [011204 11:45] wrote:
 * Dan Eischen [EMAIL PROTECTED] [011204 06:26] wrote:
  
  There are already cancellation tests when resuming threads
  whose contexts are not saved as a result of a signal interrupt
  (ctxtype != CTX_UC). You shouldn't test for cancellation when
  ctxtype == CTX_UC because you are running on the scheduler
  stack, not the thread's stack.
 
 That makes sense, but why?
 
   You also have a bug in the
  way you changed the check for cancellation flags.
 
 What?
 
  The only clean way to fix this is to add a return frame
  to the interrupted context so that it can check for cancellation
  (and other things) before returning to the thread's interrupted
  context.
 
 No way to work around this?  Shouldn't the thread exit library
 know which stack exactly to clean up even in the context of a 
 signal handler?

Are you sure this is 100% needed?

Here's a recap of that patch.  Are you saying that the problem
is that the thread will use the current sp if it exits rather
than some value stashed away in the private pthread struct?

Also, I think my tests for cancellation are correct, although
I sort of think the PTHREAD_AT_CANCEL_POINT test should be
removed because the code will catch this when it leaves the
cancellation point.

Index: uthread_kern.c
===================================================================
RCS file: /home/ncvs/src/lib/libc_r/uthread/uthread_kern.c,v
retrieving revision 1.39
diff -u -r1.39 uthread_kern.c
--- uthread_kern.c  7 Oct 2001 02:34:43 -   1.39
+++ uthread_kern.c  4 Dec 2001 17:58:31 -
@@ -180,7 +180,7 @@
struct pthread  *curthread = _get_curthread();
pthread_t   pthread, pthread_h;
unsigned int    current_tick;
-   int add_to_prioq;
+   int add_to_prioq, cfl;
 
/* If the currently running thread is a user thread, save it: */
if ((curthread->flags & PTHREAD_FLAGS_PRIVATE) == 0)
@@ -604,6 +604,15 @@
 */
_thread_kern_in_sched = 0;
 
+   /*
+    * test for async cancel:
+    */
+   cfl = curthread->cancelflags;
+
+   cfl &= (PTHREAD_CANCEL_ASYNCHRONOUS|
+   PTHREAD_AT_CANCEL_POINT);
+   if (cfl != 0)
+   pthread_testcancel();
 #if NOT_YET
_setcontext(&curthread->ctx.uc);
 #else
@@ -1078,6 +1087,8 @@
curthread->sig_defer_count--;
}
else if (curthread->sig_defer_count == 1) {
+   int cfl;
+
/* Reenable signals: */
curthread->sig_defer_count = 0;
 
@@ -1091,8 +1102,9 @@
 * Check for asynchronous cancellation before delivering any
 * pending signals:
 */
-   if (((curthread->cancelflags & PTHREAD_AT_CANCEL_POINT) == 0) &&
-   ((curthread->cancelflags & PTHREAD_CANCEL_ASYNCHRONOUS) != 0))
+   cfl = curthread->cancelflags;
+   cfl &= (PTHREAD_CANCEL_ASYNCHRONOUS|PTHREAD_AT_CANCEL_POINT);
+   if (cfl != 0)
pthread_testcancel();
 
/*

-- 
-Alfred Perlstein [[EMAIL PROTECTED]]



Re: Possible libc_r pthread bug

2001-12-04 Thread Daniel Eischen

Alfred Perlstein wrote:
 
 * Dan Eischen [EMAIL PROTECTED] [011204 06:26] wrote:
 
  There are already cancellation tests when resuming threads
  whose contexts are not saved as a result of a signal interrupt
  (ctxtype != CTX_UC). You shouldn't test for cancellation when
  ctxtype == CTX_UC because you are running on the scheduler
  stack, not the thread's stack.
 
 That makes sense, but why?

Because when a thread gets cancelled, pthread_exit gets called,
which then calls the scheduler again.  It is also possible to
get interrupted during this process, and the thread's context
(which is operating on the scheduler stack) could get saved.
The scheduler could get entered again, and if the thread
gets resumed, it'll longjmp to the saved context, which is on the
scheduler stack (and which was just trashed by entering the
scheduler again).

It is too confusing to try to handle conditions like this, and
the threads library doesn't need to get any more confusing ;-)
Once the scheduler is entered, no pthread routines should
be called and the scheduler should not be recursively
entered.  The only way out of the scheduler should be a
longjmp or sigreturn to a thread's saved context.

 
   You also have a bug in the
  way you changed the check for cancellation flags.
 
 What?

When a thread is at a cancellation point, you want to let the
cancellable routine handle the cancel.  The check as coded before
avoided calling pthread_testcancel() when at a cancellation
point.  I think you check for either PTHREAD_AT_CANCEL_POINT
or PTHREAD_CANCEL_ASYNCHRONOUS being set when you really want:

    ((flags & PTHREAD_AT_CANCEL_POINT) == 0) &&
    ((flags & PTHREAD_CANCEL_ASYNCHRONOUS) != 0)
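To see exactly where the two tests part ways, the following C sketch evaluates both over every combination of the two bits. The `CF_*` names and their values are illustrative stand-ins chosen for this example, not the definitions from the libc_r headers.

```c
#include <stdbool.h>

/* Illustrative bit values only; the real flags live in the libc_r headers. */
enum {
    CF_CANCEL_ASYNCHRONOUS = 0x02, /* stands in for PTHREAD_CANCEL_ASYNCHRONOUS */
    CF_AT_CANCEL_POINT     = 0x04  /* stands in for PTHREAD_AT_CANCEL_POINT */
};

/* The patched test: true if EITHER bit is set. */
static bool patched_check(int cancelflags) {
    int cfl = cancelflags;

    cfl &= (CF_CANCEL_ASYNCHRONOUS | CF_AT_CANCEL_POINT);
    return cfl != 0;
}

/* The original test: async cancellation enabled AND not at a cancellation point. */
static bool original_check(int cancelflags) {
    return ((cancelflags & CF_AT_CANCEL_POINT) == 0) &&
           ((cancelflags & CF_CANCEL_ASYNCHRONOUS) != 0);
}
```

The two functions agree except when the at-cancel-point bit is set: there the patched test would call pthread_testcancel() from the scheduler instead of leaving the cancel to the cancellable routine.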

 
  The only clean way to fix this is to add a return frame
  to the interrupted context so that it can check for cancellation
  (and other things) before returning to the thread's interrupted
  context.
 
 No way to work around this?  Shouldn't the thread exit library
 know which stack exactly to clean up even in the context of a
 signal handler?

It assumes that you're running on the current thread's stack.

I don't view this particular bug as a big problem.  It is a
somewhat perverse program that has a CPU-bound thread that
never gets to any sort of blocking condition and yet still
wants to be cancelled.  The submitter of the problem doesn't
even want to upgrade to get a fix.  It can be worked around
easily enough by checking for cancellation, or by using
pthread_kill to send a signal to the thread and having the
signal handler exit the thread or longjmp back to the thread
at a place where it can exit and clean up.

There is already a minor race condition in trying to resume
a thread that was interrupted by a signal.  Adding some code
to munge the stack of an interrupted context so that it calls
a wrapper function would solve both problems.  The signal
handling code already does this to install a signal handler
wrapper on a thread's stack.

-- 
Dan Eischen




Re: Possible libc_r pthread bug

2001-12-04 Thread Alfred Perlstein

* Daniel Eischen [EMAIL PROTECTED] [011204 12:32] wrote:
 Alfred Perlstein wrote:
  
  * Dan Eischen [EMAIL PROTECTED] [011204 06:26] wrote:
  
   There are already cancellation tests when resuming threads
   whose contexts are not saved as a result of a signal interrupt
   (ctxtype != CTX_UC). You shouldn't test for cancellation when
   ctxtype == CTX_UC because you are running on the scheduler
   stack, not the thread's stack.
  
  That makes sense, but why?
 
 Because when a thread gets cancelled, pthread_exit gets called
 which then calls the scheduler again.  It is also possible to
 get interrupted during this process and the thread's context
 (which is operating on the scheduler stack) could get saved.
 The scheduler could get entered again, and if the thread
 gets resumed, it'll longjmp to the saved context which is the
 scheduler stack (and which was just trashed by entering the
 scheduler again).
 
 It is too confusing to try to handle conditions like this, and
 the threads library doesn't need to get any more confusing ;-)
 Once the scheduler is entered, no pthread routines should
 be called and the scheduler should not be recursively
 entered.  The only way out of the scheduler should be a
 longjmp or sigreturn to a thread's saved context.

Ok, for the sake of beating a clue into me...

in uthread_kern.c:_thread_kern_sched

/* Save the state of the current thread: */
if (_setjmp(curthread->ctx.jb) == 0) {
    /* Flag the jump buffer was the last state saved: */
    curthread->ctxtype = CTX_JB_NOSIG;
    curthread->longjmp_val = 1;
} else {
    DBG_MSG("Returned from ___longjmp, thread %p\n",
        curthread);
    /*
     * This point is reached when a longjmp() is called
     * to restore the state of a thread.
     *
     * This is the normal way out of the scheduler.
     */
    _thread_kern_in_sched = 0;

    if (curthread->sig_defer_count == 0) {
        if (((curthread->cancelflags &
            PTHREAD_AT_CANCEL_POINT) == 0) &&
            ((curthread->cancelflags &
            PTHREAD_CANCEL_ASYNCHRONOUS) != 0))
            /*
             * Cancellations override signals.
             *
             * Stick a cancellation point at the
             * start of each async-cancellable
             * thread's resumption.
             *
             * We allow threads woken at cancel
             * points to do their own checks.
             */
            pthread_testcancel();
    }

Why isn't this working?  Shouldn't it be doing the right thing?
What if curthread->sig_defer_count wasn't tested?
Maybe this should be a test against curthread->sig_defer_count = 1?

I'll play with this some more when I get back to my box at home;
it just seems bizarro to me.


-- 
-Alfred Perlstein [[EMAIL PROTECTED]]



Re: Possible libc_r pthread bug

2001-12-04 Thread Daniel Eischen

Alfred Perlstein wrote:
 
 * Daniel Eischen [EMAIL PROTECTED] [011204 12:32] wrote:
  Alfred Perlstein wrote:
  
   * Dan Eischen [EMAIL PROTECTED] [011204 06:26] wrote:
   
    There are already cancellation tests when resuming threads
    whose contexts are not saved as a result of a signal interrupt
    (ctxtype != CTX_UC). You shouldn't test for cancellation when
    ctxtype == CTX_UC because you are running on the scheduler
    stack, not the thread's stack.
  
   That makes sense, but why?
 
  Because when a thread gets cancelled, pthread_exit gets called
  which then calls the scheduler again.  It is also possible to
  get interrupted during this process and the thread's context
  (which is operating on the scheduler stack) could get saved.
  The scheduler could get entered again, and if the thread
  gets resumed, it'll longjmp to the saved context which is the
  scheduler stack (and which was just trashed by entering the
  scheduler again).
 
  It is too confusing to try to handle conditions like this, and
  the threads library doesn't need to get any more confusing ;-)
  Once the scheduler is entered, no pthread routines should
  be called and the scheduler should not be recursively
  entered.  The only way out of the scheduler should be a
  longjmp or sigreturn to a thread's saved context.
 
 Ok, for the sake of beating a clue into me...
 
 in uthread_kern.c:_thread_kern_sched
 
 /* Save the state of the current thread: */
 if (_setjmp(curthread->ctx.jb) == 0) {
     /* Flag the jump buffer was the last state saved: */
     curthread->ctxtype = CTX_JB_NOSIG;
     curthread->longjmp_val = 1;
 } else {
     DBG_MSG("Returned from ___longjmp, thread %p\n",
         curthread);
     /*
      * This point is reached when a longjmp() is called
      * to restore the state of a thread.
      *
      * This is the normal way out of the scheduler.
      */
     _thread_kern_in_sched = 0;

     if (curthread->sig_defer_count == 0) {
         if (((curthread->cancelflags &
             PTHREAD_AT_CANCEL_POINT) == 0) &&
             ((curthread->cancelflags &
             PTHREAD_CANCEL_ASYNCHRONOUS) != 0))
             /*
              * Cancellations override signals.
              *
              * Stick a cancellation point at the
              * start of each async-cancellable
              * thread's resumption.
              *
              * We allow threads woken at cancel
              * points to do their own checks.
              */
             pthread_testcancel();
     }
 
 Why isn't this working, shouldn't it be doing the right thing?
 What if curthread->sig_defer_count wasn't tested?
 Maybe this should be a test against curthread->sig_defer_count = 1?

Because this is the normal way into the scheduler -- when a thread
hits a blocking condition or yields.  A signal interrupting a thread
does not go through this section.  The interrupted thread's context is
argument 3 of the signal handler, and this context gets stored in
curthread->ctx.uc.  This is the crux of the problem.  When you resume
this context, you are not in the thread scheduling code and so you
can't check for cancellation.  I'm suggesting that the proper way
to handle this is to munge this interrupted context (a ucontext_t)
so that it first returns to a small wrapper function that can
check for cancellation (and clear the in-scheduler flag, which
is the other problem I mentioned) before returning to the interrupted
context.

There is another way to handle this, but it is more complicated
although probably better than the above method.  This would involve
changing the signal handling code to not use an alternate signal
stack, so an interrupted thread could do something like:

void
sighandler(int sig, siginfo_t *info, ucontext_t *ucp)
{
...
{
ucontext_t  uc;

/* Save interrupted context on stack: */
uc = *ucp;
uc.uc_sigmask = _process_sigmask;

/* Enter the scheduler: */
_thread_kern_sched(NULL);

/*
 * After return from the scheduler, the
 * in scheduler flag 

Re: Possible libc_r pthread bug

2001-12-04 Thread Alfred Perlstein

* Daniel Eischen [EMAIL PROTECTED] [011130 16:17] wrote:
 On Fri, 30 Nov 2001, Louis-Philippe Gagnon wrote:
  If at first you don't succeed...
  
  I've encountered a problem using pthread_cancel, pthread_join and 
  pthread_setcanceltype, I'm hoping someone can shed some light.
  
  (in a nutshell : pthread_setcanceltype doesn't seem to work in FreeBSD 4.4)
  
  (posted to -current and -hackers; if there's a more appropriate mailing list 
  for this, please let me know)
  
  I recently encountered a situation where, after calling pthread_cancel to 
  cancel a thread, the call to pthread_join hangs indefinitely. I quickly figured
  out that it was because the thread being cancelled was never reaching a 
  cancellation point (in fact it was an infinite loop with no function calls at all).
  Sure enough, adding a pthread_testcancel() in the loop allowed
  pthread_join to return. However this solution isn't acceptable for my requirements.

please test the following patch:


Index: uthread/uthread_kern.c
===================================================================
RCS file: /home/ncvs/src/lib/libc_r/uthread/uthread_kern.c,v
retrieving revision 1.39
diff -u -r1.39 uthread_kern.c
--- uthread/uthread_kern.c  7 Oct 2001 02:34:43 -   1.39
+++ uthread/uthread_kern.c  4 Dec 2001 08:22:22 -
@@ -579,6 +579,18 @@
curthread);
}
/*
+    * If the currently running thread is a user thread,
+    * test for async cancel:
+    */
+   if ((curthread->flags & PTHREAD_FLAGS_PRIVATE) == 0) {
+   int cfl = curthread->cancelflags;
+
+   cfl &= (PTHREAD_CANCEL_ASYNCHRONOUS|
+   PTHREAD_AT_CANCEL_POINT);
+   if (cfl != 0)
+   pthread_testcancel();
+   }
+   /*
 * Continue the thread at its current frame:
 */
switch(curthread->ctxtype) {
@@ -1078,6 +1090,8 @@
curthread->sig_defer_count--;
}
else if (curthread->sig_defer_count == 1) {
+   int cfl;
+
/* Reenable signals: */
curthread->sig_defer_count = 0;
 
@@ -1091,8 +1105,9 @@
 * Check for asynchronous cancellation before delivering any
 * pending signals:
 */
-   if (((curthread->cancelflags & PTHREAD_AT_CANCEL_POINT) == 0) &&
-   ((curthread->cancelflags & PTHREAD_CANCEL_ASYNCHRONOUS) != 0))
+   cfl = curthread->cancelflags;
+   cfl &= (PTHREAD_CANCEL_ASYNCHRONOUS|PTHREAD_AT_CANCEL_POINT);
+   if (cfl != 0)
pthread_testcancel();
 
/*

-- 
-Alfred Perlstein [[EMAIL PROTECTED]]



Re: Possible libc_r pthread bug

2001-11-30 Thread Alfred Perlstein

* Louis-Philippe Gagnon [EMAIL PROTECTED] [011130 15:57] wrote:
 If at first you don't succeed...
 
 I've encountered a problem using pthread_cancel, pthread_join and 
 pthread_setcanceltype, I'm hoping someone can shed some light.

Provide me with minimal sample code and a Makefile, and I should
have the problem fixed in a couple of days; either that, or a scathing
flame for your misuse of the API. :)

-Alfred




Re: Possible libc_r pthread bug

2001-11-30 Thread Daniel Eischen

On Fri, 30 Nov 2001, Louis-Philippe Gagnon wrote:
 If at first you don't succeed...
 
 I've encountered a problem using pthread_cancel, pthread_join and 
 pthread_setcanceltype, I'm hoping someone can shed some light.
 
 (in a nutshell : pthread_setcanceltype doesn't seem to work in FreeBSD 4.4)
 
 (posted to -current and -hackers; if there's a more appropriate mailing list 
 for this, please let me know)
 
 I recently encountered a situation where, after calling pthread_cancel to 
 cancel a thread, the call to pthread_join hangs indefinitely. I quickly figured
 out that it was because the thread being cancelled was never reaching a 
 cancellation point (in fact it was an infinite loop with no function calls at all). 
 Sure enough, adding a pthread_testcancel() in the loop allowed
 pthread_join to return. However this solution isn't acceptable for my requirements.
 
 I discovered the pthread_setcanceltype function and its 
 PTHREAD_CANCEL_ASYNCHRONOUS parameter, which looked like 
 they would give me exactly what I needed : allow threads to be cancelled 
 regardless of what they are doing (basically a pthread equivalent to 
 TerminateThread).
 
 Unfortunately, my tests have been less than conclusive : pthread_setcanceltype
 doesn't seem to do anything at all. It tells me it succeeds, subsequent calls 
 properly report the previous cancellation type as ASYNCHRONOUS. 
 But pthread_join still hangs, and adding pthread_testcancel calls still 
 makes it work...
 
 I'm working on a FreeBSD 4.4-release machine; I ran the same test under 
 FreeBSD 4.3-release and got the same results. However, running it on a 
 Linux box (Mandrake release, 2.4.x kernel), I get exactly the results I 
 was expecting (that is, setting the cancellation type to asynchronous allows
 the thread to be cancelled at any time)
 
 see the end of this message for my test program
 
 So the questions are
 -am I doing something wrong or misinterpreting the man pages?

No, not really.

 -if not, is this a known bug?

Or feature?

 -if so, is there a workaround (or is it already fixed)?

Not fixed.  A work-around could be to use pthread_kill() to send the
thread a signal and exit the thread from the signal handler.

 -if not, can someone investigate? (I once had a look at the libc_r code 
  and ran away screaming)

Since your thread is compute-bound, it is only woken up from the
thread library's scheduling signal handler.  In this case, the handler
can only resume the thread from the interrupted context, and so there
is no check for the thread being cancelled.
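A minimal program of the kind described in the report might look like the following. It is a hypothetical reconstruction, since the original test program is not included in this archive. The thread switches itself to asynchronous cancellation and then spins without calling any function. On an implementation that delivers asynchronous cancels by interrupting the thread (glibc's NPTL, for example), pthread_join() returns PTHREAD_CANCELED; under the FreeBSD 4.x libc_r behaviour explained above, the join would be expected to hang.

```c
#include <pthread.h>

static volatile unsigned long counter;

static void *busy_loop(void *arg) {
    int oldtype;

    (void)arg;
    /* Ask to be cancellable at any time, not just at cancellation points. */
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &oldtype);
    for (;;)
        counter++;              /* no function calls, so no cancellation points */
}

/* Returns 1 if pthread_join() reports the thread as cancelled, -1 on error. */
static int cancel_busy_thread(void) {
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, busy_loop, NULL) != 0)
        return -1;
    pthread_cancel(tid);
    pthread_join(tid, &result); /* hangs if the asynchronous cancel is never acted on */
    return result == PTHREAD_CANCELED;
}
```

Adding a pthread_testcancel() call inside the loop is the deferred-mode alternative the reporter found to work; it adds an explicit cancellation point on every iteration.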

-- 
Dan Eischen
