The branch, master has been updated
       via  ec96ea6... tdb: handle processes dying during transaction commit.
       via  1bf482b... patch tdb-refactor-tdb_lock-and-tdb_lock_nonblock.patch
       via  ececeff... tdb: add -k option to tdbtorture
       via  8c3fda4... tdb: don't truncate tdb on recovery
       via  9f295ee... tdb: remove lock ops
       via  a84222b... tdb: rename tdb_release_extra_locks() to tdb_release_transaction_locks()
       via  dd1b508... tdb: cleanup: remove ltype argument from _tdb_transaction_cancel.
       via  fca1621... tdb: tdb_allrecord_lock/tdb_allrecord_unlock/tdb_allrecord_upgrade
       via  caaf5c6... tdb: suppress record write locks when allrecord lock is taken.
       via  9341f23... tdb: cleanup: always grab allrecord lock to infinity.
       via  1ab8776... tdb: remove num_locks
       via  d48c3e4... tdb: use tdb_nest_lock() for seqnum lock.
       via  4738d47... tdb: use tdb_nest_lock() for active lock.
       via  9136818... tdb: use tdb_nest_lock() for open lock.
       via  e8fa70a... tdb: use tdb_nest_lock() for transaction lock.
       via  ce41411... tdb: cleanup: find_nestlock() helper.
       via  db27073... tdb: cleanup: tdb_release_extra_locks() helper
       via  fba42f1... tdb: cleanup: tdb_have_extra_locks() helper
       via  b754f61... tdb: don't suppress the transaction lock because of the allrecord lock.
       via  5d9de60... tdb: cleanup: tdb_nest_lock/tdb_nest_unlock
       via  e9114a7... tdb: cleanup: rename global_lock to allrecord_lock.
       via  7ab422d... tdb: cleanup: rename GLOBAL_LOCK to OPEN_LOCK.
       via  a6e0ef8... tdb: make _tdb_transaction_cancel static.
       via  452b4a5... tdb: cleanup: split brlock and brunlock methods.
      from  fffdce6... s4/schema: Move msDS-IntId implementation to samldb.c module
http://gitweb.samba.org/?p=samba.git;a=shortlog;h=master

- Log -----------------------------------------------------------------
commit ec96ea690edbe3398d690b4a953d487ca1773f1c
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 24 13:23:58 2010 +1030

    tdb: handle processes dying during transaction commit.

    tdb transactions were designed to be robust against the machine
    powering off, but interestingly were never designed to handle the case
    where an administrator kill -9's a process during commit.  Because
    recovery is only done on tdb_open, processes with the tdb already
    mapped will simply use it despite it being corrupt and needing
    recovery.

    The solution to this is to check for recovery every time we grab a
    data lock: we could have gained the lock because a process just died.
    This has no measurable cost: here is the time for tdbtorture -s 0 -n 1
    -l 10000:

    Before: 2.75 2.50 2.81 3.19 2.91 2.53 2.72 2.50 2.78 2.77 = Avg 2.75
    After:  2.81 2.57 3.42 2.49 3.02 2.49 2.84 2.48 2.80 2.43 = Avg 2.74

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit 1bf482b9ef9ec73dd7ee4387d7087aa3955503dd
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 24 13:18:06 2010 +1030

    patch tdb-refactor-tdb_lock-and-tdb_lock_nonblock.patch

commit ececeffd85db1b27c07cdf91a921fd203006daf6
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 24 10:53:05 2010 +1030

    tdb: add -k option to tdbtorture

    To test the case of death of a process during transaction commit, add
    a -k (kill random) option to tdbtorture.  The easiest way to do this
    is to make every worker a child (unless there's only one child), which
    is why this patch is bigger than you might expect.

    Using -k without -t (always transactions) you expect corruption,
    though it doesn't happen every time.  With -t, we currently get
    corruption, but the next patch fixes that.
    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit 8c3fda4318adc71899bc41486d5616da3a91a688
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 24 10:50:41 2010 +1030

    tdb: don't truncate tdb on recovery

    The current recovery code truncates the tdb file on recovery.  This is
    fine if recovery is only done on first open, but is a really bad idea
    as we move to allowing recovery on "live" databases.

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit 9f295eecffd92e55584fc36539cd85cd32c832de
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 24 10:49:22 2010 +1030

    tdb: remove lock ops

    Now that the transaction code uses the standard allrecord lock, that
    lock stops us from trying to grab any per-record locks anyway.  We
    don't need to have special noop lock ops for transactions.

    This is a nice simplification: if you see brlock, you know it's really
    going to grab a lock.

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit a84222bbaf9ed2c7b9c61b8157b2e3c85f17fa32
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 24 11:02:55 2010 +1030

    tdb: rename tdb_release_extra_locks() to tdb_release_transaction_locks()

    tdb_release_extra_locks() is too general: it carefully skips over the
    transaction lock, even though the only caller then drops it.  Change
    this, and rename it to show that it's clearly transaction-specific.

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit dd1b508c63034452673dbfee9956f52a1b6c90a5
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 24 12:42:24 2010 +1030

    tdb: cleanup: remove ltype argument from _tdb_transaction_cancel.

    Now that the transaction allrecord lock is the standard one, and thus
    is cleaned up in tdb_release_extra_locks(), _tdb_transaction_cancel()
    doesn't need to know what type it is.
    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit fca1621965c547e2d076eca2a2599e9629f91266
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 17 15:42:15 2010 +1030

    tdb: tdb_allrecord_lock/tdb_allrecord_unlock/tdb_allrecord_upgrade

    Centralize locking of all chains of the tdb; rename _tdb_lockall to
    tdb_allrecord_lock, _tdb_unlockall to tdb_allrecord_unlock, and
    tdb_brlock_upgrade to tdb_allrecord_upgrade.

    Then we use this in the transaction code.  Unfortunately, if the
    transaction code records that it has grabbed the allrecord lock
    read-only, write locks will fail, so we treat this upgradable lock as
    a write lock, and mark it as upgradable using the otherwise-unused
    offset field.

    One subtlety: now that the transaction code is using the
    allrecord_lock, the tdb_release_extra_locks() function drops it for
    us, so we no longer need to do it manually in _tdb_transaction_cancel.

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit caaf5c6baa1a4f340c1f38edd99b3a8b56621b8b
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 24 10:45:26 2010 +1030

    tdb: suppress record write locks when allrecord lock is taken.

    Records themselves get (read) locked by the traversal code against
    delete.  Interestingly, this locking isn't done when the allrecord
    lock has been taken, though until recently the allrecord lock didn't
    cover the actual records (it now goes to end of file).

    The write record lock, grabbed by the delete code, is not suppressed
    by the allrecord lock.  This is now bad: it causes us to punch a hole
    in the allrecord lock when we release the write record lock.  Make
    this consistent: *no* record locks of any kind when the allrecord lock
    is taken.

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit 9341f230f8968b4b18e451d15dda5ccbe7787768
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 24 10:45:14 2010 +1030

    tdb: cleanup: always grab allrecord lock to infinity.
    We were previously inconsistent with our "global" lock: the
    transaction code grabbed it from FREELIST_TOP to end of file, while
    the rest of the code grabbed it from FREELIST_TOP to the end of the
    hash chains.  Change it to always grab to end of file, for simplicity
    and so we can merge the two.

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit 1ab8776247f89b143b6e58f4b038ab4bcea20d3a
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 17 15:01:07 2010 +1030

    tdb: remove num_locks

    This was redundant before this patch series: it mirrored num_lockrecs
    exactly.  It still does.

    Also, skip the useless branch when locks == 1: an unconditional
    assignment is cheaper anyway.

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit d48c3e4982a38fb6b568ed3903e55e07a0fe5ca6
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 17 12:40:57 2010 +1030

    tdb: use tdb_nest_lock() for seqnum lock.

    This is pure overhead, but it centralizes the locking.  Realloc
    (especially as most implementations are lazy) is fast compared to the
    fcntl anyway.

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit 4738d474c412cc59d26fcea64007e99094e8b675
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 24 10:44:40 2010 +1030

    tdb: use tdb_nest_lock() for active lock.

    Use our newly-generic nested lock tracking for the active lock.

    Note that the tdb_have_extra_locks() and tdb_release_extra_locks()
    functions have to skip over this lock now that it is tracked.

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit 9136818df30c7179e1cffa18201cdfc990ebd7b7
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Mon Feb 22 13:58:07 2010 +1030

    tdb: use tdb_nest_lock() for open lock.

    This never nests, so it's overkill, but it centralizes the locking
    into lock.c and removes the ugly flag in the transaction code that
    tracked whether we have the lock or not.

    Note that we have a temporary hack so this places a real lock, despite
    the fact that we are in a transaction.
    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit e8fa70a321d489b454b07bd65e9b0d95084168de
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 17 12:37:34 2010 +1030

    tdb: use tdb_nest_lock() for transaction lock.

    Rather than a boutique lock and a separate nest count, use our
    newly-generic nested lock tracking for the transaction lock.

    Note that the tdb_have_extra_locks() and tdb_release_extra_locks()
    functions have to skip over this lock now that it is tracked.

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit ce41411c84760684ce539b6a302a0623a6a78a72
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 17 12:35:54 2010 +1030

    tdb: cleanup: find_nestlock() helper.

    Factor out the two loops which find locks; we are going to introduce a
    couple more, so a helper makes sense.

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit db270734d8b4208e00ce9de5af1af7ee11823f6d
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 24 10:41:15 2010 +1030

    tdb: cleanup: tdb_release_extra_locks() helper

    Move locking intelligence back into lock.c, rather than open-coding
    the lock release in transaction.c.

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit fba42f1fb4f81b8913cce5a23ca5350ba45f40e1
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 17 12:34:26 2010 +1030

    tdb: cleanup: tdb_have_extra_locks() helper

    In many places we check whether locks are held: add a helper to do
    this.  The _tdb_lockall() case has already checked for the allrecord
    lock, so the extra work done by tdb_have_extra_locks() is merely
    redundant.

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit b754f61d235bdc3e410b60014d6be4072645e16f
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 17 12:31:49 2010 +1030

    tdb: don't suppress the transaction lock because of the allrecord lock.

    tdb_transaction_lock() and tdb_transaction_unlock() do nothing if we
    hold the allrecord lock.  However, the two locks don't overlap, so
    this is wrong.
    This simplification makes the transaction lock a straightforward
    nested lock.

    There are two callers of these functions:

    1) The transaction code, which already makes sure the allrecord_lock
       isn't held.

    2) The traverse code, which wants to stop transactions whether it has
       the allrecord lock or not.  There have been deadlocks here before;
       however, this should not bring them back (I hope!).

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit 5d9de604d92d227899e9b861c6beafb2e4fa61e0
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 17 12:26:13 2010 +1030

    tdb: cleanup: tdb_nest_lock/tdb_nest_unlock

    Because fcntl locks don't nest, we track them in the tdb->lockrecs
    array and only place/release them when the count goes to 1/0.  We only
    do this for record locks, so we simply place the list number (or -1
    for the free list) in the structure.

    To generalize this:

    1) Put the offset rather than the list number in struct tdb_lock_type.

    2) Rename _tdb_lock() to tdb_nest_lock(), make it non-static, and move
       the allrecord check out to the callers (except the mark case, which
       doesn't care).

    3) Rename _tdb_unlock() to tdb_nest_unlock(), make it non-static, and
       move the allrecord check out to the callers (except mark again).

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit e9114a758538d460d4f9deae5ce631bf44b1eff8
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 17 12:19:47 2010 +1030

    tdb: cleanup: rename global_lock to allrecord_lock.

    The word "global" is overloaded in tdb.  The global_lock inside struct
    tdb_context is used to indicate we hold a lock across all the chains.
    Rename it to allrecord_lock.

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit 7ab422d6fbd4f8be02838089a41f872d538ee7a7
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 17 12:18:33 2010 +1030

    tdb: cleanup: rename GLOBAL_LOCK to OPEN_LOCK.

    The word "global" is overloaded in tdb.
    The GLOBAL_LOCK offset is used at open time to serialize
    initialization (and by the transaction code to block open).  Rename
    it to OPEN_LOCK.

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit a6e0ef87d25734760fe77b87a9fd11db56760955
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 24 10:39:59 2010 +1030

    tdb: make _tdb_transaction_cancel static.

    Now that tdb_open() calls tdb_transaction_cancel() instead of
    _tdb_transaction_cancel(), we can make it static.

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

commit 452b4a5a6efeecfb5c83475f1375ddc25bcddfbe
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed Feb 17 12:17:19 2010 +1030

    tdb: cleanup: split brlock and brunlock methods.

    This is taken from the CCAN code base: rather than using tdb_brlock
    for both locking and unlocking, we split it into brlock and brunlock
    functions.  For extra debugging information, brunlock says what kind
    of lock it is unlocking (even though fcntl locks don't need this).

    This requires an extra argument to tdb_transaction_unlock() so we know
    whether the lock was upgraded to a write lock or not.

    We also use a "flags" argument to tdb_brlock:

    1) TDB_LOCK_NOWAIT replaces lck_type = F_SETLK (vs F_SETLKW).

    2) TDB_LOCK_MARK_ONLY replaces setting the TDB_MARK_LOCK bit in ltype.

    3) TDB_LOCK_PROBE replaces the "probe" argument.
    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

-----------------------------------------------------------------------

Summary of changes:
 lib/tdb/common/io.c          |    1 -
 lib/tdb/common/lock.c        |  578 +++++++++++++++++++++++++++++-------------
 lib/tdb/common/open.c        |   32 ++-
 lib/tdb/common/tdb.c         |    7 +-
 lib/tdb/common/tdb_private.h |   39 ++-
 lib/tdb/common/transaction.c |  107 +++-----
 lib/tdb/common/traverse.c    |    4 +-
 lib/tdb/tools/tdbtorture.c   |  199 +++++++++++----
 8 files changed, 636 insertions(+), 331 deletions(-)


Changeset truncated at 500 lines:

diff --git a/lib/tdb/common/io.c b/lib/tdb/common/io.c
index d549715..5b20fa1 100644
--- a/lib/tdb/common/io.c
+++ b/lib/tdb/common/io.c
@@ -461,7 +461,6 @@ static const struct tdb_methods io_methods = {
 	tdb_next_hash_chain,
 	tdb_oob,
 	tdb_expand_file,
-	tdb_brlock
 };
 
 /*
diff --git a/lib/tdb/common/lock.c b/lib/tdb/common/lock.c
index 0984e51..65d6843 100644
--- a/lib/tdb/common/lock.c
+++ b/lib/tdb/common/lock.c
@@ -27,13 +27,104 @@
 #include "tdb_private.h"
 
-#define TDB_MARK_LOCK 0x80000000
-
 void tdb_setalarm_sigptr(struct tdb_context *tdb, volatile sig_atomic_t *ptr)
 {
 	tdb->interrupt_sig_ptr = ptr;
 }
 
+static int fcntl_lock(struct tdb_context *tdb,
+		      int rw, off_t off, off_t len, bool waitflag)
+{
+	struct flock fl;
+
+	fl.l_type = rw;
+	fl.l_whence = SEEK_SET;
+	fl.l_start = off;
+	fl.l_len = len;
+	fl.l_pid = 0;
+
+	if (waitflag)
+		return fcntl(tdb->fd, F_SETLKW, &fl);
+	else
+		return fcntl(tdb->fd, F_SETLK, &fl);
+}
+
+static int fcntl_unlock(struct tdb_context *tdb, int rw, off_t off, off_t len)
+{
+	struct flock fl;
+#if 0 /* Check they matched up locks and unlocks correctly. */
+	char line[80];
+	FILE *locks;
+	bool found = false;
+
+	locks = fopen("/proc/locks", "r");
+
+	while (fgets(line, 80, locks)) {
+		char *p;
+		int type, start, l;
+
+		/* eg. 1: FLOCK ADVISORY WRITE 2440 08:01:2180826 0 EOF */
+		p = strchr(line, ':') + 1;
+		if (strncmp(p, " POSIX ADVISORY ", strlen(" POSIX ADVISORY ")))
+			continue;
+		p += strlen(" FLOCK ADVISORY ");
+		if (strncmp(p, "READ ", strlen("READ ")) == 0)
+			type = F_RDLCK;
+		else if (strncmp(p, "WRITE ", strlen("WRITE ")) == 0)
+			type = F_WRLCK;
+		else
+			abort();
+		p += 6;
+		if (atoi(p) != getpid())
+			continue;
+		p = strchr(strchr(p, ' ') + 1, ' ') + 1;
+		start = atoi(p);
+		p = strchr(p, ' ') + 1;
+		if (strncmp(p, "EOF", 3) == 0)
+			l = 0;
+		else
+			l = atoi(p) - start + 1;
+
+		if (off == start) {
+			if (len != l) {
+				fprintf(stderr, "Len %u should be %u: %s",
+					(int)len, l, line);
+				abort();
+			}
+			if (type != rw) {
+				fprintf(stderr, "Type %s wrong: %s",
+					rw == F_RDLCK ? "READ" : "WRITE", line);
+				abort();
+			}
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		fprintf(stderr, "Unlock on %...@%u not found!\n",
+			(int)off, (int)len);
+		abort();
+	}
+
+	fclose(locks);
+#endif
+
+	fl.l_type = F_UNLCK;
+	fl.l_whence = SEEK_SET;
+	fl.l_start = off;
+	fl.l_len = len;
+	fl.l_pid = 0;
+
+	return fcntl(tdb->fd, F_SETLKW, &fl);
+}
+
+/* list -1 is the alloc list, otherwise a hash chain. */
+static tdb_off_t lock_offset(int list)
+{
+	return FREELIST_TOP + 4*list;
+}
+
 /* a byte range locking function - return 0 on success
    this functions locks/unlocks 1 byte at the specified offset.
@@ -42,30 +133,36 @@ void tdb_setalarm_sigptr(struct tdb_context *tdb, volatile sig_atomic_t *ptr)
 
    note that a len of zero means lock to end of file
 */
-int tdb_brlock(struct tdb_context *tdb, tdb_off_t offset,
-	       int rw_type, int lck_type, int probe, size_t len)
+int tdb_brlock(struct tdb_context *tdb,
+	       int rw_type, tdb_off_t offset, size_t len,
+	       enum tdb_lock_flags flags)
 {
-	struct flock fl;
 	int ret;
 
 	if (tdb->flags & TDB_NOLOCK) {
 		return 0;
 	}
 
+	if (flags & TDB_LOCK_MARK_ONLY) {
+		return 0;
+	}
+
 	if ((rw_type == F_WRLCK) && (tdb->read_only || tdb->traverse_read)) {
 		tdb->ecode = TDB_ERR_RDONLY;
 		return -1;
 	}
 
-	fl.l_type = rw_type;
-	fl.l_whence = SEEK_SET;
-	fl.l_start = offset;
-	fl.l_len = len;
-	fl.l_pid = 0;
+	/* Sanity check */
+	if (tdb->transaction && offset >= lock_offset(-1) && len != 0) {
+		tdb->ecode = TDB_ERR_RDONLY;
+		TDB_LOG((tdb, TDB_DEBUG_TRACE, "tdb_brlock attempted in transaction at offset %d rw_type=%d flags=%d len=%d\n",
+			 offset, rw_type, flags, (int)len));
+		return -1;
+	}
 
 	do {
-		ret = fcntl(tdb->fd,lck_type,&fl);
-
+		ret = fcntl_lock(tdb, rw_type, offset, len,
+				 flags & TDB_LOCK_WAIT);
 		/* Check for a sigalarm break. */
 		if (ret == -1 && errno == EINTR &&
 				tdb->interrupt_sig_ptr &&
@@ -79,15 +176,34 @@ int tdb_brlock(struct tdb_context *tdb, tdb_off_t offset,
 		/* Generic lock error. errno set by fcntl.
 		 * EAGAIN is an expected return from non-blocking
 		 * locks.
 		 */
-		if (!probe && lck_type != F_SETLK) {
-			TDB_LOG((tdb, TDB_DEBUG_TRACE,"tdb_brlock failed (fd=%d) at offset %d rw_type=%d lck_type=%d len=%d\n",
-				 tdb->fd, offset, rw_type, lck_type, (int)len));
+		if (!(flags & TDB_LOCK_PROBE) && errno != EAGAIN) {
+			TDB_LOG((tdb, TDB_DEBUG_TRACE,"tdb_brlock failed (fd=%d) at offset %d rw_type=%d flags=%d len=%d\n",
+				 tdb->fd, offset, rw_type, flags, (int)len));
 		}
 		return -1;
 	}
 	return 0;
 }
 
+int tdb_brunlock(struct tdb_context *tdb,
+		 int rw_type, tdb_off_t offset, size_t len)
+{
+	int ret;
+
+	if (tdb->flags & TDB_NOLOCK) {
+		return 0;
+	}
+
+	do {
+		ret = fcntl_unlock(tdb, rw_type, offset, len);
+	} while (ret == -1 && errno == EINTR);
+
+	if (ret == -1) {
+		TDB_LOG((tdb, TDB_DEBUG_TRACE,"tdb_brunlock failed (fd=%d) at offset %d rw_type=%d len=%d\n",
+			 tdb->fd, offset, rw_type, (int)len));
+	}
+	return ret;
+}
 
 /*
   upgrade a read lock to a write lock. This needs to be handled in a
@@ -95,12 +211,29 @@ int tdb_brlock(struct tdb_context *tdb, tdb_off_t offset,
   deadlock detection and claim a deadlock when progress can be made. For those
   OSes we may loop for a while.
 */
-int tdb_brlock_upgrade(struct tdb_context *tdb, tdb_off_t offset, size_t len)
+int tdb_allrecord_upgrade(struct tdb_context *tdb)
 {
 	int count = 1000;
+
+	if (tdb->allrecord_lock.count != 1) {
+		TDB_LOG((tdb, TDB_DEBUG_ERROR,
+			 "tdb_allrecord_upgrade failed: count %u too high\n",
+			 tdb->allrecord_lock.count));
+		return -1;
+	}
+
+	if (tdb->allrecord_lock.off != 1) {
+		TDB_LOG((tdb, TDB_DEBUG_ERROR,
+			 "tdb_allrecord_upgrade failed: already upgraded?\n"));
+		return -1;
+	}
+
 	while (count--) {
 		struct timeval tv;
-		if (tdb_brlock(tdb, offset, F_WRLCK, F_SETLKW, 1, len) == 0) {
+		if (tdb_brlock(tdb, F_WRLCK, FREELIST_TOP, 0,
+			       TDB_LOCK_WAIT|TDB_LOCK_PROBE) == 0) {
+			tdb->allrecord_lock.ltype = F_WRLCK;
+			tdb->allrecord_lock.off = 0;
 			return 0;
 		}
 		if (errno != EDEADLK) {
@@ -111,57 +244,46 @@ int tdb_brlock_upgrade(struct tdb_context *tdb, tdb_off_t offset, size_t len)
 		tv.tv_usec = 1;
 		select(0, NULL, NULL, NULL, &tv);
 	}
-	TDB_LOG((tdb, TDB_DEBUG_TRACE,"tdb_brlock_upgrade failed at offset %d\n", offset));
+	TDB_LOG((tdb, TDB_DEBUG_TRACE,"tdb_allrecord_upgrade failed\n"));
 	return -1;
 }
 
-
-/* lock a list in the database. list -1 is the alloc list */
-static int _tdb_lock(struct tdb_context *tdb, int list, int ltype, int op)
+static struct tdb_lock_type *find_nestlock(struct tdb_context *tdb,
+					   tdb_off_t offset)
 {
-	struct tdb_lock_type *new_lck;
-	int i;
-	bool mark_lock = ((ltype & TDB_MARK_LOCK) == TDB_MARK_LOCK);
-
-	ltype &= ~TDB_MARK_LOCK;
+	unsigned int i;
 
-	/* a global lock allows us to avoid per chain locks */
-	if (tdb->global_lock.count &&
-	    (ltype == tdb->global_lock.ltype || ltype == F_RDLCK)) {
-		return 0;
+	for (i=0; i<tdb->num_lockrecs; i++) {
+		if (tdb->lockrecs[i].off == offset) {
+			return &tdb->lockrecs[i];
+		}
 	}
+	return NULL;
+}
 
-	if (tdb->global_lock.count) {
-		tdb->ecode = TDB_ERR_LOCK;
-		return -1;
-	}
+/* lock an offset in the database. */
+int tdb_nest_lock(struct tdb_context *tdb, uint32_t offset, int ltype,
+		  enum tdb_lock_flags flags)
+{
+	struct tdb_lock_type *new_lck;
 
-	if (list < -1 || list >= (int)tdb->header.hash_size) {
+	if (offset >= lock_offset(tdb->header.hash_size)) {
 		tdb->ecode = TDB_ERR_LOCK;
-		TDB_LOG((tdb, TDB_DEBUG_ERROR,"tdb_lock: invalid list %d for ltype=%d\n",
-			 list, ltype));
+		TDB_LOG((tdb, TDB_DEBUG_ERROR,"tdb_lock: invalid offset %u for ltype=%d\n",
+			 offset, ltype));
 		return -1;
 	}
 
 	if (tdb->flags & TDB_NOLOCK)
 		return 0;
 
-	for (i=0; i<tdb->num_lockrecs; i++) {
-		if (tdb->lockrecs[i].list == list) {
-			if (tdb->lockrecs[i].count == 0) {
-				/*
-				 * Can't happen, see tdb_unlock(). It should
-				 * be an assert.
-				 */
-				TDB_LOG((tdb, TDB_DEBUG_ERROR, "tdb_lock: "
-					 "lck->count == 0 for list %d", list));
-			}
-			/*
-			 * Just increment the in-memory struct, posix locks
-			 * don't stack.
-			 */
-			tdb->lockrecs[i].count++;
-			return 0;
-		}
+	new_lck = find_nestlock(tdb, offset);
+	if (new_lck) {
+		/*
+		 * Just increment the in-memory struct, posix locks
+		 * don't stack.
+		 */
+		new_lck->count++;
+		return 0;
 	}
 
 	new_lck = (struct tdb_lock_type *)realloc(
@@ -175,27 +297,89 @@ static int _tdb_lock(struct tdb_context *tdb, int list, int ltype, int op)
 
 	/* Since fcntl locks don't nest, we do a lock for the first one,
 	   and simply bump the count for future ones */
-	if (!mark_lock &&
-	    tdb->methods->tdb_brlock(tdb,FREELIST_TOP+4*list, ltype, op,
-				     0, 1)) {
+	if (tdb_brlock(tdb, ltype, offset, 1, flags)) {
 		return -1;
 	}
 
-	tdb->num_locks++;
-
-	tdb->lockrecs[tdb->num_lockrecs].list = list;
+	tdb->lockrecs[tdb->num_lockrecs].off = offset;
 	tdb->lockrecs[tdb->num_lockrecs].count = 1;
 	tdb->lockrecs[tdb->num_lockrecs].ltype = ltype;
-	tdb->num_lockrecs += 1;
+	tdb->num_lockrecs++;
 
 	return 0;
 }
 
+static int tdb_lock_and_recover(struct tdb_context *tdb)
+{
+	int ret;
+
+	/* We need to match locking order in transaction commit.
+	 */
+	if (tdb_brlock(tdb, F_WRLCK, FREELIST_TOP, 0, TDB_LOCK_WAIT)) {
+		return -1;
+	}
+
+	if (tdb_brlock(tdb, F_WRLCK, OPEN_LOCK, 1, TDB_LOCK_WAIT)) {
+		tdb_brunlock(tdb, F_WRLCK, FREELIST_TOP, 0);
+		return -1;
+	}
+
+	ret = tdb_transaction_recover(tdb);
+
+	tdb_brunlock(tdb, F_WRLCK, OPEN_LOCK, 1);
+	tdb_brunlock(tdb, F_WRLCK, FREELIST_TOP, 0);
+
+	return ret;
+}
+
+static bool have_data_locks(const struct tdb_context *tdb)
+{
+	unsigned int i;
+
+	for (i = 0; i < tdb->num_lockrecs; i++) {
+		if (tdb->lockrecs[i].off >= lock_offset(-1))
+			return true;
+	}
+	return false;
+}
+
+static int tdb_lock_list(struct tdb_context *tdb, int list, int ltype,
+			 enum tdb_lock_flags waitflag)
+{
+	int ret;
+	bool check = false;
+
+	/* a allrecord lock allows us to avoid per chain locks */
+	if (tdb->allrecord_lock.count &&
+	    (ltype == tdb->allrecord_lock.ltype || ltype == F_RDLCK)) {
+		return 0;
+	}
+
+	if (tdb->allrecord_lock.count) {
+		tdb->ecode = TDB_ERR_LOCK;
+		ret = -1;
+	} else {
+		/* Only check when we grab first data lock. */
+		check = !have_data_locks(tdb);
+		ret = tdb_nest_lock(tdb, lock_offset(list), ltype, waitflag);
+
+		if (ret == 0 && check && tdb_needs_recovery(tdb)) {
+			tdb_nest_unlock(tdb, lock_offset(list), ltype, false);
+
+			if (tdb_lock_and_recover(tdb) == -1) {
+				return -1;
+			}
+			return tdb_lock_list(tdb, list, ltype, waitflag);
+		}
+	}
+	return ret;
+}
+
 /* lock a list in the database. list -1 is the alloc list */
 int tdb_lock(struct tdb_context *tdb, int list, int ltype)
 {
 	int ret;
-	ret = _tdb_lock(tdb, list, ltype, F_SETLKW);
+
+	ret = tdb_lock_list(tdb, list, ltype, TDB_LOCK_WAIT);
 	if (ret) {
 		TDB_LOG((tdb, TDB_DEBUG_ERROR, "tdb_lock failed on list %d "
 			 "ltype=%d (%s)\n",  list, ltype, strerror(errno)));
@@ -206,49 +390,26 @@ int tdb_lock(struct tdb_context *tdb, int list, int ltype)
 
 /* lock a list in the database. list -1 is the alloc list. non-blocking lock */
 int tdb_lock_nonblock(struct tdb_context *tdb, int list, int ltype)
 {
-	return _tdb_lock(tdb, list, ltype, F_SETLK);
+	return tdb_lock_list(tdb, list, ltype, TDB_LOCK_NOWAIT);
 }
 
-/* unlock the database: returns void because it's too late for errors. */
-	/* changed to return int it may be interesting to know there
-	   has been an error  --simo */
-int tdb_unlock(struct tdb_context *tdb, int list, int ltype)
+int tdb_nest_unlock(struct tdb_context *tdb, uint32_t offset, int ltype,
+		    bool mark_lock)
 {
 	int ret = -1;
-	int i;
-	struct tdb_lock_type *lck = NULL;
-	bool mark_lock = ((ltype & TDB_MARK_LOCK) == TDB_MARK_LOCK);
-
-	ltype &= ~TDB_MARK_LOCK;
-
-	/* a global lock allows us to avoid per chain locks */
-	if (tdb->global_lock.count &&
-	    (ltype == tdb->global_lock.ltype || ltype == F_RDLCK)) {
-		return 0;
-	}
-
-	if (tdb->global_lock.count) {
-		tdb->ecode = TDB_ERR_LOCK;
-		return -1;
-	}
+	struct tdb_lock_type *lck;
 
 	if (tdb->flags & TDB_NOLOCK)
 		return 0;
 
 	/* Sanity checks */
-	if (list < -1 || list >= (int)tdb->header.hash_size) {
-		TDB_LOG((tdb, TDB_DEBUG_ERROR, "tdb_unlock: list %d invalid (%d)\n", list, tdb->header.hash_size));
+	if (offset >= lock_offset(tdb->header.hash_size)) {
+		TDB_LOG((tdb, TDB_DEBUG_ERROR, "tdb_unlock: offset %u invalid (%d)\n", offset, tdb->header.hash_size));
 		return ret;
 	}
 
-	for (i=0; i<tdb->num_lockrecs; i++) {
-		if (tdb->lockrecs[i].list == list) {
-			lck = &tdb->lockrecs[i];
-			break;
-		}
-	}
-
+	lck = find_nestlock(tdb, offset);
 	if ((lck == NULL) || (lck->count == 0)) {
 		TDB_LOG((tdb, TDB_DEBUG_ERROR, "tdb_unlock: count is 0\n"));
 		return -1;
@@ -269,20 +430,14 @@ int tdb_unlock(struct tdb_context *tdb, int list, int ltype)
 	if (mark_lock) {
 		ret = 0;
 	} else {
-		ret = tdb->methods->tdb_brlock(tdb, FREELIST_TOP+4*list, F_UNLCK,
-					       F_SETLKW, 0, 1);
+		ret = tdb_brunlock(tdb, ltype, offset, 1);
 	}
-	tdb->num_locks--;
 
 	/*
 	 * Shrink the array by overwriting the element just unlocked with the
 	 * last array element.
 	 */
-
-	if (tdb->num_lockrecs > 1) {
-		*lck = tdb->lockrecs[tdb->num_lockrecs-1];
-	}
-	tdb->num_lockrecs -= 1;
+	*lck = tdb->lockrecs[--tdb->num_lockrecs];

-- 
Samba Shared Repository