Re: [OpenAFS-devel] volserver hangs possible fix

Derrick J Brashear Mon, 11 Apr 2005 11:11:07 -0700

to BreakVolumeCallbacksLater, which doesn't do any callbacks itself.
Rather, it walks the callback hash chains, and sets the callback
delayed flag for all the callbacks for the requested volumeId.
BreakVolumeCallbacksLater and BreakLaterCallBacks both need H_LOCK,
but MultiBreakCallBack_r drops H_LOCK before the multirx loop.  So, if
I'm reading the code correctly, recent 1.3's aren't vulnerable to this
deadlock.


That was in fact the point of the code.

While auditing the callback code, I noticed some funky condition
variable usage between FsyncCheckLWP and BreakVolumeCallbacksLater.
One issue is BreakVolumeCallbacksLater calls pthread_cond_broadcast on
fsync_cond without holding the associated lock.  Here's a quick patch
for that race:


There was a reason we did this:


diff -uNr openafs-1.3.81-dist/src/viced/callback.c
openafs-1.3.81-callback-race-fix/src/viced/callback.c
--- openafs-1.3.81-dist/src/viced/callback.c    2005-03-11
02:03:46.000000000 -0500
+++ openafs-1.3.81-callback-race-fix/src/viced/callback.c
2005-04-06 05:21:02.358508897 -0400
@@ -1408,7 +1408,9 @@

    ViceLog(25, ("Fsync thread wakeup\n"));
#ifdef AFS_PTHREAD_ENV
+    FSYNC_LOCK;
    assert(pthread_cond_broadcast(&fsync_cond) == 0);
+    FSYNC_UNLOCK;
#else
    LWP_NoYieldSignal(fsync_wait);
#endif


Basically, FsyncCheckLWP treats the cond var like a semaphore.  One
solution that comes to mind is to add a global counter of outstanding
delayed callbacks that is protected by FSYNC_LOCK, and have
BreakVolumeCallbacksLater and other related functions increment it,
and have BreakLaterCallBacks decrement it.  Then, have FsyncCheckLWP
call pthread_cond_timedwait only if the counter is zero.


that sounds like a decent way to deal. maybe later this week.
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel

Re: [OpenAFS-devel] volserver hangs possible fix

Reply via email to