to BreakVolumeCallbacksLater, which doesn't do any callbacks itself.
Rather, it walks the callback hash chains, and sets the callback
delayed flag for all the callbacks for the requested volumeId.
BreakVolumeCallbacksLater and BreakLaterCallBacks both need H_LOCK,
but MultiBreakCallBack_r drops H_LOCK before the multirx loop.  So, if
I'm reading the code correctly, recent 1.3's aren't vulnerable to this
deadlock.

That was in fact the point of the code.

While auditing the callback code, I noticed some funky condition
variable usage between FsyncCheckLWP and BreakVolumeCallbacksLater.
One issue is BreakVolumeCallbacksLater calls pthread_cond_broadcast on
fsync_cond without holding the associated lock.  Here's a quick patch
for that race:

There was a reason we did this:


diff -uNr openafs-1.3.81-dist/src/viced/callback.c openafs-1.3.81-callback-race-fix/src/viced/callback.c --- openafs-1.3.81-dist/src/viced/callback.c 2005-03-11 02:03:46.000000000 -0500 +++ openafs-1.3.81-callback-race-fix/src/viced/callback.c 2005-04-06 05:21:02.358508897 -0400 @@ -1408,7 +1408,9 @@

    ViceLog(25, ("Fsync thread wakeup\n"));
#ifdef AFS_PTHREAD_ENV
+    FSYNC_LOCK;
    assert(pthread_cond_broadcast(&fsync_cond) == 0);
+    FSYNC_UNLOCK;
#else
    LWP_NoYieldSignal(fsync_wait);
#endif


Basically, FsyncCheckLWP treats the cond var like a semaphore. One solution that comes to mind is to add a global counter of outstanding delayed callbacks that is protected by FSYNC_LOCK, and have BreakVolumeCallbacksLater and other related functions increment it, and have BreakLaterCallBacks decrement it. Then, have FsyncCheckLWP call pthread_cond_timedwait only if the counter is zero.

that sounds like a decent way to deal. maybe later this week. _______________________________________________ OpenAFS-devel mailing list [email protected] https://lists.openafs.org/mailman/listinfo/openafs-devel

Reply via email to