to BreakVolumeCallbacksLater, which doesn't do any callbacks itself. Rather, it walks the callback hash chains, and sets the callback delayed flag for all the callbacks for the requested volumeId. BreakVolumeCallbacksLater and BreakLaterCallBacks both need H_LOCK, but MultiBreakCallBack_r drops H_LOCK before the multirx loop. So, if I'm reading the code correctly, recent 1.3's aren't vulnerable to this deadlock.
That was in fact the point of the code.
While auditing the callback code, I noticed some funky condition variable usage between FsyncCheckLWP and BreakVolumeCallbacksLater. One issue is BreakVolumeCallbacksLater calls pthread_cond_broadcast on fsync_cond without holding the associated lock. Here's a quick patch for that race:
There was a reason we did this:
diff -uNr openafs-1.3.81-dist/src/viced/callback.c openafs-1.3.81-callback-race-fix/src/viced/callback.c --- openafs-1.3.81-dist/src/viced/callback.c 2005-03-11 02:03:46.000000000 -0500 +++ openafs-1.3.81-callback-race-fix/src/viced/callback.c 2005-04-06 05:21:02.358508897 -0400 @@ -1408,7 +1408,9 @@
ViceLog(25, ("Fsync thread wakeup\n")); #ifdef AFS_PTHREAD_ENV + FSYNC_LOCK; assert(pthread_cond_broadcast(&fsync_cond) == 0); + FSYNC_UNLOCK; #else LWP_NoYieldSignal(fsync_wait); #endif
Basically, FsyncCheckLWP treats the cond var like a semaphore. One solution that comes to mind is to add a global counter of outstanding delayed callbacks that is protected by FSYNC_LOCK, and have BreakVolumeCallbacksLater and other related functions increment it, and have BreakLaterCallBacks decrement it. Then, have FsyncCheckLWP call pthread_cond_timedwait only if the counter is zero.
that sounds like a decent way to deal. maybe later this week. _______________________________________________ OpenAFS-devel mailing list [email protected] https://lists.openafs.org/mailman/listinfo/openafs-devel
