On 13/07/2023 19:53, Corinna Vinschen wrote:
On Jul 13 20:37, Corinna Vinschen wrote:
On Jul 13 20:16, Corinna Vinschen wrote:
On Jul 13 12:39, Jon Turney wrote:
These tests async thread cancellation of a thread that doesn't have any
cancellation points.

Unfortunately, since 450f557f the async cancellation silently fails when
the thread is inside the kernel function Sleep(), so it just exits

I'm not sure how this patch should be the actual culprit.  It only
handles thread priorities, not thread cancellation.  Isn't this rather
2b165a453ea7b or some such?

Yeah, I messed up there somehow. I will fix the commit id.

normally after 10 seconds. (See the commentary in pthread::cancel() in
thread.cc, where it checks if the target thread is inside the kernel,
and silently converts the cancellation into a deferred one)

Nevertheless, I think this is ok to do.  The description of pthread_cancel
contains this:

   Asynchronous cancelability means that the thread can be canceled at
   any time (usually immediately, but the system does not guarantee this).

And

   The above steps happen asynchronously with respect to the
   pthread_cancel() call; the return status of pthread_cancel() merely
   informs the caller whether the cancellation request was successfully
   queued.

So any assumption *when* the cancallation takes place is may be wrong.

Yeah.

I think the flakiness is when we happen to try to async cancel while in the Windows kernel, which implicitly converts to a deferred cancellation, but there are no cancellation points in the the thread, so it arrives at pthread_exit() and returns a exit code other than PTHREAD_CANCELED.

I did consider making the test non-flaky by adding a final call to pthread_testcancel(), to notice any failed async cancellation which has been converted to a deferred one.

But then that is just the same as the deferred cancellation tests, and confirms the cancellation happens, but not that it's async, which is part of the point of the test.

I guess this could also check that not all of the threads ran for all 10 seconds, which would indicate that at least some of them were cancelled asynchronously.

I wonder, though, if we can't come up with a better solution than just
waiting for the next cancellation point.

No solution comes to mind if the user code calls a Win32 function, but
maybe _sigbe could check if the thread's cancel_event is set?  It's all
in assembler, that complicates it a bit, but that would make it at least
working for POSIX functions which are no cancellation points.

I think you'd need to record the "pending async cancellation" as different to "deferred cancellation" so this doesn't turn everything into a cancellation point.

Alternatively, maybe we can utilize the existing signal handler and
just send a Cygwin-only signal outside the maskable signal range.
wait_sig calls sigpacket::process like for any other standard signal,
The signal handler is basically pthread::static_cancel_self().
Something like that.
I'm not sure this is worth lots of effort, as thread cancellation seems to be regarded as mis-specified in such as way as to make it unsafe for serious use.

Reply via email to