[Issue 10095] Race condition causing corruption of mutexes when closing the database

2023-10-23 Thread openldap-its
https://bugs.openldap.org/show_bug.cgi?id=10095

--- Comment #10 from Quanah Gibson-Mount  ---
re0.9: commit ce200dca1d648f696157e3e49b1800480fef1acb
Author: Howard Chu 

-- 
You are receiving this mail because:
You are on the CC list for the issue.

[Issue 10095] Race condition causing corruption of mutexes when closing the database

2023-09-18 Thread openldap-its
https://bugs.openldap.org/show_bug.cgi?id=10095

Quanah Gibson-Mount  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |0.9.32
           Keywords|needs_review                |

[Issue 10095] Race condition causing corruption of mutexes when closing the database

2023-08-27 Thread openldap-its
https://bugs.openldap.org/show_bug.cgi?id=10095

--- Comment #9 from Peter Zhu  ---
Thank you for fixing this issue!

[Issue 10095] Race condition causing corruption of mutexes when closing the database

2023-08-27 Thread openldap-its
https://bugs.openldap.org/show_bug.cgi?id=10095

--- Comment #8 from Howard Chu  ---
The FreeBSD team acknowledges this was a bug in their threads library, and it
has since been fixed. See discussion in ITS#9278.

[Issue 10095] Race condition causing corruption of mutexes when closing the database

2023-08-27 Thread openldap-its
https://bugs.openldap.org/show_bug.cgi?id=10095

Howard Chu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |TEST

--- Comment #7 from Howard Chu  ---
Fixed in 3dde6c46e6c55458eadaf7f81492c822414be2c7

[Issue 10095] Race condition causing corruption of mutexes when closing the database

2023-08-27 Thread openldap-its
https://bugs.openldap.org/show_bug.cgi?id=10095

Howard Chu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jiri.novo...@gmail.com

--- Comment #6 from Howard Chu  ---
*** Issue 10058 has been marked as a duplicate of this issue. ***

[Issue 10095] Race condition causing corruption of mutexes when closing the database

2023-08-27 Thread openldap-its
https://bugs.openldap.org/show_bug.cgi?id=10095

Howard Chu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugs.openldap.org/show_bug.cgi?id=10058

[Issue 10095] Race condition causing corruption of mutexes when closing the database

2023-08-26 Thread openldap-its
https://bugs.openldap.org/show_bug.cgi?id=10095

Howard Chu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugs.openldap.org/show_bug.cgi?id=9278

[Issue 10095] Race condition causing corruption of mutexes when closing the database

2023-08-25 Thread openldap-its
https://bugs.openldap.org/show_bug.cgi?id=10095

--- Comment #5 from Peter Zhu  ---
> We can add a flag to the lockfile for "mutex is valid"

I think this would guarantee that the bug does not occur, but there is a
chance of a livelock: p1 and p2 can get stuck in a cycle where each acquires a
read lock, sees that the "mutex is valid" flag is not set, tries to acquire a
write lock, fails because both hold a read lock, releases its read lock, and
tries again. It might be possible to mitigate this with random backoff, but
that's probably bad for performance.

> Probably we should revert the ITS#9278 patch.

That's what we did in our production systems, and it seems to have resolved
the issue. AFAIK Linux does not allocate any memory in `pthread_mutex_init`, so
not calling `pthread_mutex_destroy` shouldn't leak memory (although according
to the specification we're supposed to call `pthread_mutex_destroy` when we're
done using the mutex).
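
The retry cycle with random backoff described above can be sketched roughly
as follows. This is only an illustration, not LMDB code: the function names
`region_lock` and `acquire_with_backoff`, the `mutex_valid` pointer standing
in for the proposed lockfile flag, and the 1 ms backoff ceiling are all
assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Apply one fcntl lock operation to byte 0 of fd (hypothetical helper). */
static int region_lock(int fd, short type, int cmd)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 1;
    return fcntl(fd, cmd, &fl);
}

/*
 * Returns  1 if the caller now holds the write lock (it should initialize
 *            the mutexes and set the flag),
 *          0 if the caller holds a read lock and the flag is already set,
 *         -1 on error or after max_attempts failed rounds.
 * mutex_valid is a stand-in for the proposed "mutex is valid" flag.
 */
int acquire_with_backoff(int fd, const volatile int *mutex_valid,
                         int max_attempts)
{
    assert(max_attempts > 0);
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (region_lock(fd, F_RDLCK, F_SETLKW) < 0)
            return -1;
        if (*mutex_valid)
            return 0;
        if (region_lock(fd, F_WRLCK, F_SETLK) == 0)
            return 1;
        /* Another reader is in the way: drop our read lock and retry. */
        if (region_lock(fd, F_UNLCK, F_SETLK) < 0)
            return -1;
        /* Sleep a random interval of up to 1 ms so peers fall out of step. */
        struct timespec delay = {0, (rand() % 1000 + 1) * 1000L};
        nanosleep(&delay, NULL);
    }
    return -1;
}
```

The bounded attempt count turns a potential livelock into a visible failure,
at the cost of the extra open latency mentioned above.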

[Issue 10095] Race condition causing corruption of mutexes when closing the database

2023-08-25 Thread openldap-its
https://bugs.openldap.org/show_bug.cgi?id=10095

--- Comment #4 from Howard Chu  ---
Probably we should revert the ITS#9278 patch. Instead, the fact that the
FreeBSD thread library breaks if the mutex is unmapped should be treated as a
FreeBSD bug.

[Issue 10095] Race condition causing corruption of mutexes when closing the database

2023-08-25 Thread openldap-its
https://bugs.openldap.org/show_bug.cgi?id=10095

--- Comment #3 from Howard Chu  ---
We can add a flag to the lockfile for "mutex is valid" but we still wouldn't
have a good way to resolve which of p1 or p2 should do the initialization then.

And p0 has no way to know that other processes are waiting to open the env, in
which case it could just skip the mutex_destroy.

[Issue 10095] Race condition causing corruption of mutexes when closing the database

2023-08-25 Thread openldap-its
https://bugs.openldap.org/show_bug.cgi?id=10095

--- Comment #2 from Peter Zhu  ---
Thank you for the quick reply. I considered that approach: try to acquire the
write lock, fall back to acquiring a read lock, then try again to acquire the
write lock. But I think there's still an issue if two or more processes (e.g.
p1 and p2) attempt to connect to the database at the same time. The sequence
looks like the following:

p0: Opens a connection to the database using mdb_env_create and mdb_env_open.

...some things happen in between...

p0: Begins closing the database using mdb_env_close:
  p0: Calls mdb_env_close0:
    p0: Acquires the write file lock using mdb_env_excl_lock.
    p0: Calls pthread_mutex_destroy on the mutexes.

SWITCH TO p1 and p2

p1, p2: Begin opening the database using mdb_env_create, then call
mdb_env_open. In mdb_env_open:
  p1, p2: Call mdb_env_setup_locks:
    p1, p2: Call mdb_env_excl_lock, which is unable to acquire a write file
    lock because p0 holds the write file lock. Each waits to acquire a read
    file lock.

SWITCH TO p0

p0: Calls close on the file descriptor, which releases the write file lock.

SWITCH TO p1, p2

    p1, p2: Acquire the read file lock.
    p1, p2: Fail to acquire the write file lock because both p1 and p2 hold
    a read file lock.
    p1, p2: Do NOT call pthread_mutex_init, since neither acquired a write
    file lock.

...some things happen in between...

p1, p2: Try to lock the mutex using pthread_mutex_lock. This call fails with
EINVAL because the mutex has been destroyed.
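
The key step in the sequence above, where neither p1 nor p2 can get the write
file lock because each holds a read file lock, can be reproduced in isolation.
The sketch below is not LMDB code. It assumes POSIX fcntl byte-range locks on
a scratch file, and the names `set_lock` and `two_readers_block_upgrade` are
made up for the example. The parent plays p1 and a forked child plays p2.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lock byte 0 of fd; wait != 0 blocks (F_SETLKW), otherwise try once. */
static int set_lock(int fd, short type, int wait)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 1;
    return fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl);
}

/* Returns 1 if neither process could upgrade, 0 if one could, -1 on error. */
int two_readers_block_upgrade(const char *path)
{
    int to_child[2], to_parent[2];
    char msg = 'E';

    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0 || pipe(to_child) < 0 || pipe(to_parent) < 0)
        return -1;
    if (set_lock(fd, F_RDLCK, 1) < 0)        /* p1 takes its read lock */
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                          /* child acts as p2 */
        int cfd = open(path, O_RDWR);
        if (cfd >= 0 && set_lock(cfd, F_RDLCK, 1) == 0)
            msg = (set_lock(cfd, F_WRLCK, 0) == 0) ? 'Y' : 'N';
        if (write(to_parent[1], &msg, 1) != 1)
            _exit(1);
        /* Keep holding the read lock until the parent has tried too. */
        if (read(to_child[0], &msg, 1) != 1)
            _exit(1);
        _exit(0);
    }

    if (read(to_parent[0], &msg, 1) != 1)
        msg = 'E';
    int parent_upgraded = (set_lock(fd, F_WRLCK, 0) == 0);
    int released = (write(to_child[1], "x", 1) == 1);
    waitpid(pid, NULL, 0);
    close(fd);
    close(to_child[0]); close(to_child[1]);
    close(to_parent[0]); close(to_parent[1]);
    if (msg == 'E' || !released)
        return -1;
    return (msg == 'N' && !parent_upgraded) ? 1 : 0;
}
```

In this state neither process would reinitialize the mutexes, which matches
the point in the trace where pthread_mutex_init is skipped.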

[Issue 10095] Race condition causing corruption of mutexes when closing the database

2023-08-25 Thread openldap-its
https://bugs.openldap.org/show_bug.cgi?id=10095

--- Comment #1 from Howard Chu  ---
We had a discussion of this problem before, but I don't recall where. My
suggestion was to immediately attempt to change the readlock to a writelock
after acquiring the readlock. (Again, with no wait.) The only objection I
recall was that this may significantly delay env open operations, but I don't
think that's true, if we use F_SETLK and not F_SETLKW.
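
A rough sketch of the suggestion above, assuming plain POSIX fcntl locks on
byte 0 of the lock file. The helper names `lock_op`, `lock_shared`, and
`try_upgrade` are invented for the example and are not LMDB functions. The
point is that the upgrade attempt uses F_SETLK, so it returns immediately
whether or not another process holds a conflicting lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Apply one fcntl lock operation to byte 0 of fd, retrying on EINTR. */
static int lock_op(int fd, short type, int cmd)
{
    struct flock fl = {0};
    int rc;
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 1;
    do {
        rc = fcntl(fd, cmd, &fl);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

/* Block (F_SETLKW) until a read lock is held. Returns 0 on success. */
int lock_shared(int fd)
{
    assert(fd >= 0);
    return lock_op(fd, F_RDLCK, F_SETLKW);
}

/*
 * Try once, without waiting (F_SETLK), to turn the read lock into a write
 * lock. Returns 1 if the write lock was obtained, 0 if another process
 * holds a conflicting lock, -1 on any other error.
 */
int try_upgrade(int fd)
{
    if (lock_op(fd, F_WRLCK, F_SETLK) == 0)
        return 1;
    return (errno == EAGAIN || errno == EACCES) ? 0 : -1;
}
```

A caller would invoke `lock_shared(fd)` and then `try_upgrade(fd)`, and only
a result of 1 would mean it is the sole opener and may initialize the mutexes.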
