The branch, master has been updated via 8278d38 tdb: change version to 1.2.13. via 3034a5a tdb: Reduce freelist contention via 1461362 tdb: Make "tdb_purge_dead" internally public via 92ce9fd tdb: Make "tdb_find_dead" internally public via 4ca0186 tdb: Add "last_ptr" to tdb_find_dead via cb09d79 tdb: Move adding tailer space to tdb_find_dead via 255edd1 tdb: Do a best fit search for dead records via d1ce011 tdb: Don't purge records to a blocked freelist via 5f7b481 tdb: Fix a tdb corruption from 9c9df40 dsdb: Further assert that we always have an objectClass and an rDN
http://gitweb.samba.org/?p=samba.git;a=shortlog;h=master - Log ----------------------------------------------------------------- commit 8278d3823aac83bc5edb14353c8de772878ae915 Author: Michael Adam <ob...@samba.org> Date: Tue Mar 18 13:05:42 2014 +0100 tdb: change version to 1.2.13. * internal code cleanups * always open internal TDBs with incompatible hash * avoid reallocations in locking code * systematize output format in tdbtool dump * reduce freelist contention when allocating new records - try to find dead records also in other chains - don't do blocking locks on the freelist Signed-off-by: Michael Adam <ob...@samba.org> Reviewed-by: Stefan Metzmacher <me...@samba.org> Autobuild-User(master): Michael Adam <ob...@samba.org> Autobuild-Date(master): Tue Mar 18 15:42:48 CET 2014 on sn-devel-104 commit 3034a5a62b8eaac7bdc52bfb7af1beb29e888b34 Author: Volker Lendecke <v...@samba.org> Date: Tue Mar 18 08:04:42 2014 +0100 tdb: Reduce freelist contention In a metadata-intensive benchmark we have seen the locking.tdb freelist to be one of the central contention points. This patch removes most of the contention on the freelist. Ages ago we already reduced freelist contention by using the even much older DEAD records: If TDB_VOLATILE is set, don't directly put deleted records on the freelist, but just mark a few of them just as DEAD. The next new record can them re-use that space without consulting the freelist. This patch builds upon the DEAD records: If we need space and the freelist is busy, instead of doing a blocking wait on the freelist, start looking into other chains for DEAD records and steal them from there. This way every hash chain becomes a small freelist. Just wander around the hash chains as long as the freelist is still busy. With this patch and the tdb mutex patch (following hopefully some time soon) you can see a heavily busy clustered smbd run without locking.tdb futex syscalls. Signed-off-by: Volker Lendecke <v...@samba.org> Reviewed-by: Michael Adam <ob...@samba.org> Reviewed-by: Stefan Metzmacher <me...@samba.org> commit 1461362e936e5beebeaae1555cf96f6731287c35 Author: Volker Lendecke <v...@samba.org> Date: Tue Mar 18 08:03:16 2014 +0100 tdb: Make "tdb_purge_dead" internally public Signed-off-by: Volker Lendecke <v...@samba.org> Reviewed-by: Michael Adam <ob...@samba.org> Reviewed-by: Stefan Metzmacher <me...@samba.org> commit 92ce9fd9afd080954c0509cf62def6b355d79e94 Author: Volker Lendecke <v...@samba.org> Date: Tue Mar 18 08:01:40 2014 +0100 tdb: Make "tdb_find_dead" internally public Signed-off-by: Volker Lendecke <v...@samba.org> Reviewed-by: Michael Adam <ob...@samba.org> Reviewed-by: Stefan Metzmacher <me...@samba.org> commit 4ca018692f1bd9fe85b6d8be546bbaf704ba038d Author: Volker Lendecke <v...@samba.org> Date: Tue Mar 18 08:00:45 2014 +0100 tdb: Add "last_ptr" to tdb_find_dead Will be used soon to unlink a dead record from a chain Signed-off-by: Volker Lendecke <v...@samba.org> Reviewed-by: Michael Adam <ob...@samba.org> Reviewed-by: Stefan Metzmacher <me...@samba.org> commit cb09d7937c93a6cdf855b84bd5f58f30a46cbfc7 Author: Volker Lendecke <v...@samba.org> Date: Tue Mar 18 07:52:59 2014 +0100 tdb: Move adding tailer space to tdb_find_dead This aligns the tdb_find_dead API with the tdb_allocate API and thus makes it a bit easier to understand, at least for me. Signed-off-by: Volker Lendecke <v...@samba.org> Reviewed-by: Michael Adam <ob...@samba.org> Reviewed-by: Stefan Metzmacher <me...@samba.org> commit 255edd1b417480ab033c51165702c19fb5fff56f Author: Volker Lendecke <v...@samba.org> Date: Tue Mar 18 07:46:38 2014 +0100 tdb: Do a best fit search for dead records Hash chains are (or can be made) short enough that a full search for the best-fitting dead record is feasible. The freelist can become much longer, there we don't do the full search but accept records which are too large. Signed-off-by: Volker Lendecke <v...@samba.org> Reviewed-by: Michael Adam <ob...@samba.org> Reviewed-by: Stefan Metzmacher <me...@samba.org> commit d1ce0110f0a1666df4d1eb81e76631a90e9e5a73 Author: Volker Lendecke <v...@samba.org> Date: Mon Mar 17 06:47:11 2014 +0100 tdb: Don't purge records to a blocked freelist If the freelist is heavily contended, we should avoid accessing it Signed-off-by: Volker Lendecke <v...@samba.org> Reviewed-by: Michael Adam <ob...@samba.org> Reviewed-by: Stefan Metzmacher <me...@samba.org> commit 5f7b481349796cc0e90563ed01353809b403e429 Author: Volker Lendecke <v...@samba.org> Date: Sun Mar 16 20:08:32 2014 +0000 tdb: Fix a tdb corruption tdb_purge_dead can change the next pointer of "rec" if we purge the record right behind the current record to be deleted. Just overwrite the magic, not the whole record with stale data. Signed-off-by: Volker Lendecke <v...@samba.org> Reviewed-by: Michael Adam <ob...@samba.org> Reviewed-by: Stefan Metzmacher <me...@samba.org> ----------------------------------------------------------------------- Summary of changes: .../tdb-1.2.11.sigs => lib/tdb/ABI/tdb-1.2.13.sigs | 0 lib/tdb/common/freelist.c | 100 ++++++++++++++++++-- lib/tdb/common/tdb.c | 78 ++++++++------- lib/tdb/common/tdb_private.h | 7 +- lib/tdb/wscript | 2 +- 5 files changed, 138 insertions(+), 49 deletions(-) copy ctdb/lib/tdb/ABI/tdb-1.2.11.sigs => lib/tdb/ABI/tdb-1.2.13.sigs (100%) Changeset truncated at 500 lines: diff --git a/ctdb/lib/tdb/ABI/tdb-1.2.11.sigs b/lib/tdb/ABI/tdb-1.2.13.sigs similarity index 100% copy from ctdb/lib/tdb/ABI/tdb-1.2.11.sigs copy to lib/tdb/ABI/tdb-1.2.13.sigs diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c index ea14dd0..6f8f812 100644 --- a/lib/tdb/common/freelist.c +++ b/lib/tdb/common/freelist.c @@ -273,7 +273,8 @@ static tdb_off_t tdb_allocate_ofs(struct tdb_context *tdb, 0 is returned if the space could not be allocated */ -tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_record *rec) +static tdb_off_t tdb_allocate_from_freelist( + struct tdb_context *tdb, tdb_len_t length, struct tdb_record *rec) { tdb_off_t rec_ptr, last_ptr, newrec_ptr; struct { @@ -282,9 +283,6 @@ tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_rec } bestfit; float multiplier = 1.0; - if (tdb_lock(tdb, -1, F_WRLCK) == -1) - return 0; - /* over-allocate to reduce fragmentation */ length *= 1.25; @@ -297,7 +295,7 @@ tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_rec /* read in the freelist top */ if (tdb_ofs_read(tdb, FREELIST_TOP, &rec_ptr) == -1) - goto fail; + return 0; bestfit.rec_ptr = 0; bestfit.last_ptr = 0; @@ -310,7 +308,7 @@ tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_rec */ while (rec_ptr) { if (tdb_rec_free_read(tdb, rec_ptr, rec) == -1) { - goto fail; + return 0; } if (rec->rec_len >= length) { @@ -344,12 +342,11 @@ tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_rec if (bestfit.rec_ptr != 0) { if (tdb_rec_free_read(tdb, bestfit.rec_ptr, rec) == -1) { - goto fail; + return 0; } newrec_ptr = tdb_allocate_ofs(tdb, length, bestfit.rec_ptr, rec, bestfit.last_ptr); - tdb_unlock(tdb, -1, F_WRLCK); return newrec_ptr; } @@ -357,12 +354,95 @@ tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_rec database and if we can then try again */ if (tdb_expand(tdb, length + sizeof(*rec)) == 0) goto again; - fail: - tdb_unlock(tdb, -1, F_WRLCK); + return 0; } +static bool tdb_alloc_dead( + struct tdb_context *tdb, int hash, tdb_len_t length, + tdb_off_t *rec_ptr, struct tdb_record *rec) +{ + tdb_off_t last_ptr; + + *rec_ptr = tdb_find_dead(tdb, hash, rec, length, &last_ptr); + if (*rec_ptr == 0) { + return false; + } + /* + * Unlink the record from the hash chain, it's about to be moved into + * another one. + */ + return (tdb_ofs_write(tdb, last_ptr, &rec->next) == 0); +} + +/* + * Chain "hash" is assumed to be locked + */ + +tdb_off_t tdb_allocate(struct tdb_context *tdb, int hash, tdb_len_t length, + struct tdb_record *rec) +{ + tdb_off_t ret; + int i; + + if (tdb->max_dead_records == 0) { + /* + * No dead records to expect anywhere. Do the blocking + * freelist lock without trying to steal from others + */ + goto blocking_freelist_allocate; + } + + /* + * The following loop tries to get the freelist lock nonblocking. If + * it gets the lock, allocate from there. If the freelist is busy, + * instead of waiting we try to steal dead records from other hash + * chains. + * + * Be aware that we do nonblocking locks on the other hash chains as + * well and fail gracefully. This way we avoid deadlocks (we block two + * hash chains, something which is pretty bad normally) + */ + + for (i=1; i<tdb->hash_size; i++) { + + int list; + + if (tdb_lock_nonblock(tdb, -1, F_WRLCK) == 0) { + /* + * Under the freelist lock take the chance to give + * back our dead records. + */ + tdb_purge_dead(tdb, hash); + + ret = tdb_allocate_from_freelist(tdb, length, rec); + tdb_unlock(tdb, -1, F_WRLCK); + return ret; + } + + list = BUCKET(hash+i); + + if (tdb_lock_nonblock(tdb, list, F_WRLCK) == 0) { + bool got_dead; + got_dead = tdb_alloc_dead(tdb, list, length, &ret, rec); + tdb_unlock(tdb, list, F_WRLCK); + + if (got_dead) { + return ret; + } + } + } + +blocking_freelist_allocate: + + if (tdb_lock(tdb, -1, F_WRLCK) == -1) { + return 0; + } + ret = tdb_allocate_from_freelist(tdb, length, rec); + tdb_unlock(tdb, -1, F_WRLCK); + return ret; +} /* return the size of the freelist - used to decide if we should repack diff --git a/lib/tdb/common/tdb.c b/lib/tdb/common/tdb.c index 1e41e84..ba1c98e 100644 --- a/lib/tdb/common/tdb.c +++ b/lib/tdb/common/tdb.c @@ -345,13 +345,16 @@ static int tdb_count_dead(struct tdb_context *tdb, uint32_t hash) /* * Purge all DEAD records from a hash chain */ -static int tdb_purge_dead(struct tdb_context *tdb, uint32_t hash) +int tdb_purge_dead(struct tdb_context *tdb, uint32_t hash) { int res = -1; struct tdb_record rec; tdb_off_t rec_ptr; - if (tdb_lock(tdb, -1, F_WRLCK) == -1) { + if (tdb_lock_nonblock(tdb, -1, F_WRLCK) == -1) { + /* + * Don't block the freelist if not strictly necessary + */ return -1; } @@ -394,6 +397,8 @@ static int tdb_delete_hash(struct tdb_context *tdb, TDB_DATA key, uint32_t hash) if (tdb->max_dead_records != 0) { + uint32_t magic = TDB_DEAD_MAGIC; + /* * Allow for some dead records per hash chain, mainly for * tdb's with a very high create/delete rate like locking.tdb. @@ -410,8 +415,9 @@ static int tdb_delete_hash(struct tdb_context *tdb, TDB_DATA key, uint32_t hash) /* * Just mark the record as dead. */ - rec.magic = TDB_DEAD_MAGIC; - ret = tdb_rec_write(tdb, rec_ptr, &rec); + ret = tdb_ofs_write( + tdb, rec_ptr + offsetof(struct tdb_record, magic), + &magic); } else { ret = tdb_do_delete(tdb, rec_ptr, &rec); @@ -439,13 +445,21 @@ _PUBLIC_ int tdb_delete(struct tdb_context *tdb, TDB_DATA key) /* * See if we have a dead record around with enough space */ -static tdb_off_t tdb_find_dead(struct tdb_context *tdb, uint32_t hash, - struct tdb_record *r, tdb_len_t length) +tdb_off_t tdb_find_dead(struct tdb_context *tdb, uint32_t hash, + struct tdb_record *r, tdb_len_t length, + tdb_off_t *p_last_ptr) { - tdb_off_t rec_ptr; + tdb_off_t rec_ptr, last_ptr; + tdb_off_t best_rec_ptr = 0; + tdb_off_t best_last_ptr = 0; + struct tdb_record best = { .rec_len = UINT32_MAX }; + + length += sizeof(tdb_off_t); /* tailer */ + + last_ptr = TDB_HASH_TOP(hash); /* read in the hash top */ - if (tdb_ofs_read(tdb, TDB_HASH_TOP(hash), &rec_ptr) == -1) + if (tdb_ofs_read(tdb, last_ptr, &rec_ptr) == -1) return 0; /* keep looking until we find the right record */ @@ -453,16 +467,23 @@ static tdb_off_t tdb_find_dead(struct tdb_context *tdb, uint32_t hash, if (tdb_rec_read(tdb, rec_ptr, r) == -1) return 0; - if (TDB_DEAD(r) && r->rec_len >= length) { - /* - * First fit for simple coding, TODO: change to best - * fit - */ - return rec_ptr; + if (TDB_DEAD(r) && (r->rec_len >= length) && + (r->rec_len < best.rec_len)) { + best_rec_ptr = rec_ptr; + best_last_ptr = last_ptr; + best = *r; } + last_ptr = rec_ptr; rec_ptr = r->next; } - return 0; + + if (best.rec_len == UINT32_MAX) { + return 0; + } + + *r = best; + *p_last_ptr = best_last_ptr; + return best_rec_ptr; } static int _tdb_store(struct tdb_context *tdb, TDB_DATA key, @@ -500,15 +521,16 @@ static int _tdb_store(struct tdb_context *tdb, TDB_DATA key, tdb_delete_hash(tdb, key, hash); if (tdb->max_dead_records != 0) { + tdb_off_t last_ptr; /* * Allow for some dead records per hash chain, look if we can * find one that can hold the new record. We need enough space * for key, data and tailer. If we find one, we don't have to * consult the central freelist. */ - rec_ptr = tdb_find_dead( - tdb, hash, &rec, - key.dsize + dbuf.dsize + sizeof(tdb_off_t)); + rec_ptr = tdb_find_dead(tdb, hash, &rec, + key.dsize + dbuf.dsize, + &last_ptr); if (rec_ptr != 0) { rec.key_len = key.dsize; @@ -528,26 +550,8 @@ static int _tdb_store(struct tdb_context *tdb, TDB_DATA key, } } - /* - * We have to allocate some space from the freelist, so this means we - * have to lock it. Use the chance to purge all the DEAD records from - * the hash chain under the freelist lock. - */ - - if (tdb_lock(tdb, -1, F_WRLCK) == -1) { - goto fail; - } - - if ((tdb->max_dead_records != 0) - && (tdb_purge_dead(tdb, hash) == -1)) { - tdb_unlock(tdb, -1, F_WRLCK); - goto fail; - } - /* we have to allocate some space */ - rec_ptr = tdb_allocate(tdb, key.dsize + dbuf.dsize, &rec); - - tdb_unlock(tdb, -1, F_WRLCK); + rec_ptr = tdb_allocate(tdb, hash, key.dsize + dbuf.dsize, &rec); if (rec_ptr == 0) { goto fail; diff --git a/lib/tdb/common/tdb_private.h b/lib/tdb/common/tdb_private.h index 7227b43..a672159 100644 --- a/lib/tdb/common/tdb_private.h +++ b/lib/tdb/common/tdb_private.h @@ -255,7 +255,8 @@ int tdb_ofs_read(struct tdb_context *tdb, tdb_off_t offset, tdb_off_t *d); int tdb_ofs_write(struct tdb_context *tdb, tdb_off_t offset, tdb_off_t *d); void *tdb_convert(void *buf, uint32_t size); int tdb_free(struct tdb_context *tdb, tdb_off_t offset, struct tdb_record *rec); -tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_record *rec); +tdb_off_t tdb_allocate(struct tdb_context *tdb, int hash, tdb_len_t length, + struct tdb_record *rec); int tdb_ofs_read(struct tdb_context *tdb, tdb_off_t offset, tdb_off_t *d); int tdb_ofs_write(struct tdb_context *tdb, tdb_off_t offset, tdb_off_t *d); int tdb_lock_record(struct tdb_context *tdb, tdb_off_t off); @@ -272,6 +273,10 @@ int tdb_parse_data(struct tdb_context *tdb, TDB_DATA key, void *private_data); tdb_off_t tdb_find_lock_hash(struct tdb_context *tdb, TDB_DATA key, uint32_t hash, int locktype, struct tdb_record *rec); +tdb_off_t tdb_find_dead(struct tdb_context *tdb, uint32_t hash, + struct tdb_record *r, tdb_len_t length, + tdb_off_t *p_last_ptr); +int tdb_purge_dead(struct tdb_context *tdb, uint32_t hash); void tdb_io_init(struct tdb_context *tdb); int tdb_expand(struct tdb_context *tdb, tdb_off_t size); tdb_off_t tdb_expand_adjust(tdb_off_t map_size, tdb_off_t size, int page_size); diff --git a/lib/tdb/wscript b/lib/tdb/wscript index 00a1c34..fc396d6 100644 --- a/lib/tdb/wscript +++ b/lib/tdb/wscript @@ -1,7 +1,7 @@ #!/usr/bin/env python APPNAME = 'tdb' -VERSION = '1.2.12' +VERSION = '1.2.13' blddir = 'bin' -- Samba Shared Repository