https://bugs.openldap.org/show_bug.cgi?id=10095

          Issue ID: 10095
           Summary: Race condition causing corruption of mutexes when
                    closing the database
           Product: LMDB
           Version: 0.9.30
          Hardware: x86_64
                OS: Linux
            Status: UNCONFIRMED
          Keywords: needs_review
          Severity: normal
          Priority: ---
         Component: liblmdb
          Assignee: b...@openldap.org
          Reporter: pe...@peterzhu.ca
  Target Milestone: ---

We're running into a race condition across multiple processes that corrupts the
shared mutexes when a process closes the database. The race was introduced by
the fix for https://bugs.openldap.org/show_bug.cgi?id=9278 (commit
https://git.openldap.org/openldap/openldap/-/commit/f683ffdc81d0edb20437cb7d655cf15a60e31249).

Here's the interleaving of two processes (p0 and p1) that can cause this
situation.

p0: Opens the database environment using mdb_env_create and mdb_env_open.

...some things happen in between...

p0: Begins closing the database using mdb_env_close:
  p0: Calls mdb_env_close0:
    p0: Acquires the write lock on the lock file using mdb_env_excl_lock.
    p0: Calls pthread_mutex_destroy on the mutexes.

SWITCH TO p1

p1: Begins opening the database with mdb_env_create, then calls mdb_env_open.
In mdb_env_open:
  p1: Calls mdb_env_setup_locks:
    p1: Calls mdb_env_excl_lock, but it can't acquire the write lock because p0
holds it, so it blocks waiting for a read lock.

SWITCH TO p0

    p0: Calls close on the lock file descriptor, which releases the write lock.

SWITCH TO p1

    p1: Acquires the read lock.
    p1: Does NOT call pthread_mutex_init because it did not acquire the write
lock, so the destroyed mutexes are left in place.

...some things happen in between...

p1: Tries to lock the mutex with pthread_mutex_lock. The call fails with
EINVAL because the mutex has been destroyed.

I'm not sure how to properly fix this. For now we're mitigating it by
reverting the commit linked above, so the mutexes are never destroyed.
