Hello Todd Lipcon, I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/4614 to review the following change. Change subject: mem_tracker: fix race between FindTracker() and destructor ...................................................................... mem_tracker: fix race between FindTracker() and destructor In very rare cases, it's possible for an interleaving between these two functions to lead to a recursive lock acquisition of child_trackers_lock_ in the destructor. For example: 1. The tracker hierarchy contains one parent (P) and one child (C1). 2. Thread 1 creates a second child (C2) parented to P. It has the sole ref to C2. 3. Thread 2 calls FindTracker() looking for C1. 4. Thread 2 runs as far as the loop in FindTrackerUnlocked(), getting descheduled just as it has locked a ref to C2. It also holds P's child_trackers_lock_. 5. Thread 1 is rescheduled and drops its ref to C2. 6. Thread 2 is rescheduled. It also drops its ref to C2, which was the last ref, so it runs C2's destructor. This acquires P's child_trackers_lock_, which it already owns. Boom. The fix is simple: when looking up a tracker, make a local copy of the list of children while holding child_trackers_lock_, then iterate without it. It gets a little complicated due to the unusual requirements of FindOrCreateTracker(); I ended up introducing a special static lock to handle that case. The rest of the changes are basically code motion. Unfortunately, the new test does not fail reliably without the fix. In my testing, it failed maybe once every few hundred runs. But it doesn't fail at all with the fix in place. Change-Id: I69610f782c7a00d161bfb48d2487c29c0309c985 --- M src/kudu/consensus/log_cache.cc M src/kudu/util/cache.cc M src/kudu/util/mem_tracker-test.cc M src/kudu/util/mem_tracker.cc M src/kudu/util/mem_tracker.h 5 files changed, 85 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/14/4614/1 -- To view, visit http://gerrit.cloudera.org:8080/4614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I69610f782c7a00d161bfb48d2487c29c0309c985 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org>