Guillaume - You said earlier "My data are stored on an openstack volume that support up to 3000IOPS". Your write load may well be exceeding the capacity of your virtual environment, especially if some Riak nodes share physical disks or server infrastructure.

Some suggestions:

* If you're not using Riak Search, set "search = off" in riak.conf
* Be sure to carefully read and apply all tunings:
  http://docs.basho.com/riak/kv/2.1.4/using/performance/
* You may wish to increase the memory dedicated to leveldb:
  http://docs.basho.com/riak/kv/2.1.4/configuring/backend/#leveldb

--
Luke Bakken
Engineer
lbak...@basho.com

On Tue, May 3, 2016 at 7:33 AM, Guillaume Boddaert
<guilla...@lighthouse-analytics.co> wrote:
> Hi,
>
> Sorry for the delay, I've spent a lot of time trying to understand if the
> problem was elsewhere.
> I've simplified my infrastructure and now have a simple layout that no
> longer relies on a load balancer, and I've also corrected some minor
> performance issues on my workers.
>
> At the moment, I have up to 32 workers calling Riak for writes, each of
> them set to:
> w=1
> dw=0
> timeout=1000
> using protobuf
> a timed-out attempt is retried 180s later
>
> From my application server's perspective, 23% of the calls are rejected by
> timeout (75446 tries, 57564 success, 17578 timeout).
>
> Here is a sample riak-admin stat for one of my 5 hosts:
>
> node_put_fsm_time_100 : 999331
> node_put_fsm_time_95 : 773682
> node_put_fsm_time_99 : 959444
> node_put_fsm_time_mean : 156242
> node_put_fsm_time_median : 20235
> vnode_put_fsm_time_100 : 5267527
> vnode_put_fsm_time_95 : 2437457
> vnode_put_fsm_time_99 : 4819538
> vnode_put_fsm_time_mean : 175567
> vnode_put_fsm_time_median : 6928
>
> I am using leveldb, so I can't tune the bitcask backend as suggested.
>
> I've changed the vm.dirty settings and enabled them:
> admin@riak1:~$ sudo sysctl -a | grep dirty
> vm.dirty_background_ratio = 0
> vm.dirty_background_bytes = 209715200
> vm.dirty_ratio = 40
> vm.dirty_bytes = 0
> vm.dirty_writeback_centisecs = 100
> vm.dirty_expire_centisecs = 200
>
> I've seen less idle time between writes; iostat is showing near-constant
> writes between 20 and 500 kb/s, with some surges around 4000 kb/s. That's
> better, but not that great.
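
A side note on reading the numbers quoted above: Riak reports the put FSM timings in microseconds. The sketch below is only an illustration (the variable and function names are mine, not from Riak or from Guillaume's workers); it converts the node_put_fsm_time values to milliseconds, compares them with the 1000 ms client timeout, and recomputes the timeout rate from the reported counts.

```python
# Values copied from the riak-admin output quoted above (microseconds).
node_put_fsm_time_us = {
    "median": 20235,
    "mean": 156242,
    "95": 773682,
    "99": 959444,
    "100": 999331,
}

CLIENT_TIMEOUT_MS = 1000  # per-request timeout used by the workers


def to_ms(microseconds):
    """Convert microseconds to milliseconds."""
    return microseconds / 1000.0


def summarize(stats_us, timeout_ms):
    """Map each statistic to (latency in ms, fraction of the client timeout)."""
    return {
        name: (to_ms(value), to_ms(value) / timeout_ms)
        for name, value in stats_us.items()
    }


summary = summarize(node_put_fsm_time_us, CLIENT_TIMEOUT_MS)
for name, (ms, fraction) in summary.items():
    print(f"{name:>6}: {ms:8.1f} ms ({fraction:.0%} of the timeout)")

# Timeout rate as seen by the application servers (counts from the message).
tries, timeouts = 75446, 17578
timeout_rate = timeouts / tries
print(f"timeout rate: {timeout_rate:.1%}")
```

Read this way, the 95th percentile is roughly 774 ms and the maximum roughly 999 ms, which is consistent with slow puts running right up against the 1000 ms timeout while the median stays near 20 ms.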
>
> Here is the current configuration for my "activity_fr" bucket type and
> "tweet" bucket:
>
>
> admin@riak1:~$ http localhost:8098/types/activity_fr/props
> HTTP/1.1 200 OK
> Content-Encoding: gzip
> Content-Length: 314
> Content-Type: application/json
> Date: Tue, 03 May 2016 14:30:21 GMT
> Server: MochiWeb/1.1 WebMachine/1.10.8 (that head fake, tho)
> Vary: Accept-Encoding
>
> {
>     "props": {
>         "active": true,
>         "allow_mult": false,
>         "basic_quorum": false,
>         "big_vclock": 50,
>         "chash_keyfun": {
>             "fun": "chash_std_keyfun",
>             "mod": "riak_core_util"
>         },
>         "claimant": "r...@riak2.lighthouse-analytics.co",
>         "dvv_enabled": false,
>         "dw": "quorum",
>         "last_write_wins": true,
>         "linkfun": {
>             "fun": "mapreduce_linkfun",
>             "mod": "riak_kv_wm_link_walker"
>         },
>         "n_val": 3,
>         "notfound_ok": true,
>         "old_vclock": 86400,
>         "postcommit": [],
>         "pr": 0,
>         "precommit": [],
>         "pw": 0,
>         "r": "quorum",
>         "rw": "quorum",
>         "search_index": "activity_fr.20160422104506",
>         "small_vclock": 50,
>         "w": "quorum",
>         "young_vclock": 20
>     }
> }
>
> admin@riak1:~$ http localhost:8098/types/activity_fr/buckets/tweet/props
> HTTP/1.1 200 OK
> Content-Encoding: gzip
> Content-Length: 322
> Content-Type: application/json
> Date: Tue, 03 May 2016 14:30:02 GMT
> Server: MochiWeb/1.1 WebMachine/1.10.8 (that head fake, tho)
> Vary: Accept-Encoding
>
> {
>     "props": {
>         "active": true,
>         "allow_mult": false,
>         "basic_quorum": false,
>         "big_vclock": 50,
>         "chash_keyfun": {
>             "fun": "chash_std_keyfun",
>             "mod": "riak_core_util"
>         },
>         "claimant": "r...@riak2.lighthouse-analytics.co",
>         "dvv_enabled": false,
>         "dw": "quorum",
>         "last_write_wins": true,
>         "linkfun": {
>             "fun": "mapreduce_linkfun",
>             "mod": "riak_kv_wm_link_walker"
>         },
>         "n_val": 3,
>         "name": "tweet",
>         "notfound_ok": true,
>         "old_vclock": 86400,
>         "postcommit": [],
>         "pr": 0,
>         "precommit": [],
>         "pw": 0,
>         "r": "quorum",
>         "rw": "quorum",
>         "search_index": "activity_fr.20160422104506",
>         "small_vclock": 50,
>         "w": "quorum",
>         "young_vclock": 20
>     }
> }
>
> I really don't know what to do. Can you help?
>
> Guillaume
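
One more note on the bucket properties quoted above. The sketch below is only an illustration with hypothetical helper names: it parses a trimmed copy of the "activity_fr" props, checks whether a search index is attached (which bears on the "search = off" suggestion), and works out what "quorum" means for this n_val, assuming the usual majority rule of floor(n_val / 2) + 1.

```python
import json

# Trimmed copy of the "activity_fr" props from the quoted message.
props_json = """
{"props": {"n_val": 3, "w": "quorum", "dw": "quorum",
           "allow_mult": false, "last_write_wins": true,
           "search_index": "activity_fr.20160422104506"}}
"""


def search_index_of(props_doc):
    """Return the search index named in the props, or None if absent."""
    return props_doc["props"].get("search_index")


def quorum(n_val):
    """Majority of replicas: floor(n_val / 2) + 1 (assumed quorum rule)."""
    return n_val // 2 + 1


doc = json.loads(props_json)
index = search_index_of(doc)
default_w = quorum(doc["props"]["n_val"])

if index is not None:
    print(f"search index attached: {index}")
else:
    print("no search index attached")
print(f"default w/dw for n_val={doc['props']['n_val']}: {default_w}")
```

With these props an index is attached to the bucket type, so it looks as though Riak Search is in use here and each write is indexed as well as stored. The bucket defaults of w and dw at quorum are overridden per request by the workers' w=1 and dw=0.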