I'm currently working on importing a very large dataset (800M records) into Riak
and running into some serious performance problems. Hopefully these are
just configuration issues and nothing deeper...
Hardware:
* 8-processor box
* 32 GB RAM
* 5 TB disk, RAID 10
I have a cluster of 4 of these boxes, all running Riak. The Riak
configuration options that differ from stock are:
* Listening on all IP addresses ("0.0.0.0")
* {storage_backend, riak_kv_innostore_backend},
* innostore section - {buffer_pool_size, 17179869184}, %% 16GB
* innostore section - {flush_method, "O_DIRECT"}
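For reference, here is a sketch of how the storage settings above might look in app.config. The section placement is an assumption (storage_backend under riak_kv, the Innostore options in their own innostore section); the listen-address change is left out because its key name depends on the Riak version, and all other settings are elided.

```erlang
%% app.config excerpt (sketch; section placement assumed, other settings elided)
[
 {riak_kv, [
   %% Use the Innostore (Embedded InnoDB) backend instead of the default
   {storage_backend, riak_kv_innostore_backend}
 ]},
 {innostore, [
   {buffer_pool_size, 17179869184}, %% 16 GB (16 * 1024^3 bytes) buffer pool
   {flush_method, "O_DIRECT"}       %% open data files with O_DIRECT, bypassing the OS page cache
 ]}
].
```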
What I see is that my import script runs at about 200-300 keys per
second for keys it has seen recently (e.g. re-runs), then drops to
roughly 20 keys per second for new keys.
STATS: 1000 keys handled in 3 seconds 250.75 keys/sec
STATS: 1000 keys handled in 3 seconds 258.20 keys/sec
STATS: 1000 keys handled in 4 seconds 240.11 keys/sec
STATS: 1000 keys handled in 5 seconds 177.63 keys/sec
STATS: 1000 keys handled in 4 seconds 246.26 keys/sec
STATS: 1000 keys handled in 5 seconds 184.79 keys/sec
STATS: 1000 keys handled in 5 seconds 195.95 keys/sec
STATS: 1000 keys handled in 47 seconds 21.02 keys/sec
STATS: 1000 keys handled in 44 seconds 22.63 keys/sec
STATS: 1000 keys handled in 42 seconds 23.64 keys/sec
STATS: 1000 keys handled in 43 seconds 22.88 keys/sec
STATS: 1000 keys handled in 45 seconds 22.12 keys/sec
STATS: 1000 keys handled in 43 seconds 22.83 keys/sec
STATS: 1000 keys handled in 43 seconds 23.11 keys/sec
Of course, with 800M records to import, a rate of 20 keys/sec is not
usable, and an insert rate that low will only become more of a problem
as the import goes on.
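To put the two rates from the stats above in perspective, a back-of-the-envelope estimate of total import time for 800M records, assuming each rate stayed constant for the whole run (which it probably would not):

```latex
\frac{800 \times 10^{6}\ \text{keys}}{20\ \text{keys/s}} = 4 \times 10^{7}\ \text{s} \approx 463\ \text{days},
\qquad
\frac{800 \times 10^{6}\ \text{keys}}{250\ \text{keys/s}} = 3.2 \times 10^{6}\ \text{s} \approx 37\ \text{days}
```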
Questions:
Are there additional settings I should change for imports and datasets
on this scale?
Is there a way to get additional debugging output to see where the
performance issues are?
Thanks,
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com