Mark Zealey wrote:
On 22/08/13 23:37, Howard Chu wrote:
1) Can you update the documentation to explain what happens when I do an
mdb_cursor_del()? I am assuming it advances the cursor to the next
record (this seems to be the behaviour), but something breaks that
assumption. Basically I have a loop which jumps (MDB_SET_RANGE) to a key
and then deletes until the key no longer matches. So I do
while(..) { mdb_cursor_del(); mdb_cursor_get(..., MDB_GET_CURRENT); }.
This works fine mostly, but roughly 1% of the time MDB_GET_CURRENT
returns EINVAL after a delete. It always seems to happen on the same
records - I don't know the on-disk structure, but could hitting a page
boundary somehow invalidate the cursor?
That's exactly what it does, yes.
Any idea about the EINVAL issue?
Yes, as I said already, it does exactly what you said. When you've deleted the
last item on the page the cursor no longer points at a valid node, so
GET_CURRENT returns EINVAL.
None of the memory behavior you just described makes any sense to me.
LMDB uses a shared memory map, exclusively. All of the memory growth
you see in the process should be shared memory. If it's anywhere else
then I'm pretty sure you have a memory leak. With all the valgrind
sessions we've run I'm also pretty sure that *we* don't have a memory
leak.
As for the random I/O, it also seems a bit suspect. Are you doing a
commit on every key, or batching multiple keys per commit?
I'm not doing *any* commits just one big txn for all the data...
The C below works fine up until i=4m (i.e. 500mb of resident memory
shown in top), then there is a massive slowdown: shared memory (again,
as seen in top) increases, it waits about 20-30 seconds, and then the
disks get hammered writing 10mb/sec (200 txns) when they are capable of
100-200mb/sec streaming writes... Does it do the same for you?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <lmdb.h>

int main(int argc, char *argv[]) {
    int i = 0, rc;
    MDB_env *env; MDB_dbi dbi; MDB_val key, data; MDB_txn *txn;
    char buf[40];
    int count = 100000000;

    rc = mdb_env_create(&env);
    rc = mdb_env_set_mapsize(env, (size_t)1024*1024*1024*10);
    rc = mdb_env_open(env, "./testdb", 0, 0664);
    rc = mdb_txn_begin(env, NULL, 0, &txn);
    rc = mdb_open(txn, NULL, 0, &dbi);

    for (i = 0; i < count; i++) {
        /* random leading component so inserts arrive in random key order */
        sprintf(buf, "blah foo %9d%9d%9d",
                (int)(random() * (float)count / RAND_MAX) - i, i, i);
        if (i % 100000 == 0)
            printf("%s\n", buf);
        key.mv_size = strlen(buf); key.mv_data = buf;
        data.mv_size = strlen(buf); data.mv_data = buf;
        rc = mdb_put(txn, dbi, &key, &data, 0);
    }
    rc = mdb_txn_commit(txn);
    mdb_close(env, dbi);
    mdb_env_close(env);
    return 0;
}
By the way, I've just generated our biggest database (~4.5gb) from
scratch using our standard perl script. Using kyoto (treedb) with
various tunings it did it in 18 min real time vs lmdb at 50 minutes
(both ssd-backed in a box with 24gb free memory).
Kyoto writes async by default. You should do the same here: use
MDB_NOSYNC on the env_open.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/