
Sorry for the delay; I've spent a lot of time trying to work out whether the problem was elsewhere. I've simplified my infrastructure to a simple layout that no longer relies on a load balancer, and I've also corrected some minor performance issues in my workers.

At the moment, I have up to 32 workers calling Riak for writes. Each of them is set up as follows:
using protobuf
a timed-out attempt is rerun 180s later
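
For illustration only, the retry policy described above could be sketched in Python as follows. The names put_fn, max_attempts and sleep are hypothetical and do not come from the actual workers; the only value taken from this message is the 180s delay. How many times a worker retries is not stated here, so the limit is an assumption.

```python
import time

RETRY_DELAY_S = 180  # delay between attempts, as stated in the message


def put_with_retry(put_fn, key, value, max_attempts=3, sleep=time.sleep):
    """Call put_fn(key, value) and retry after a fixed delay on timeout.

    put_fn is a hypothetical stand-in for the real protobuf write call and
    is expected to raise TimeoutError when the write times out.
    max_attempts is an assumed limit; the message does not give one.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            put_fn(key, value)
            return attempt
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            sleep(RETRY_DELAY_S)
```

With a fixed delay and 32 workers, retries of writes that timed out together may also arrive together, which is one reason retry delays are often randomized.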

From my application server's perspective, 23% of the calls fail with a timeout (75446 tries, 57564 successes, 17578 timeouts).
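
As a quick check of the figures above, here is a small Python sketch that recomputes the timeout rate from the three counters. The variable names are only illustrative; the numbers are the ones quoted in this message.

```python
# Counters quoted in the message above.
tries = 75446
successes = 57564
timeouts = 17578

timeout_rate = timeouts / tries
# Calls that are counted as neither a success nor a timeout.
unaccounted = tries - successes - timeouts

print(f"timeout rate: {timeout_rate:.1%}")
print(f"tries not counted as success or timeout: {unaccounted}")
```

The rate comes out at about 23.3%. The success and timeout counters do not quite add up to the number of tries, so a few hundred calls presumably ended some other way or were still in flight when the counters were read.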

Here is a sample of the riak-admin status output for one of my 5 hosts:

node_put_fsm_time_100 : 999331
node_put_fsm_time_95 : 773682
node_put_fsm_time_99 : 959444
node_put_fsm_time_mean : 156242
node_put_fsm_time_median : 20235
vnode_put_fsm_time_100 : 5267527
vnode_put_fsm_time_95 : 2437457
vnode_put_fsm_time_99 : 4819538
vnode_put_fsm_time_mean : 175567
vnode_put_fsm_time_median : 6928
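
To make these figures easier to read, the following Python sketch parses the lines above and prints them in milliseconds. It assumes the values are in microseconds, which is how the Riak documentation describes the FSM time statistics; parse_stats and to_ms are illustrative helper names, not part of any Riak tool.

```python
# Stats copied from the riak-admin status sample above.
STATS = """\
node_put_fsm_time_100 : 999331
node_put_fsm_time_95 : 773682
node_put_fsm_time_99 : 959444
node_put_fsm_time_mean : 156242
node_put_fsm_time_median : 20235
vnode_put_fsm_time_100 : 5267527
vnode_put_fsm_time_95 : 2437457
vnode_put_fsm_time_99 : 4819538
vnode_put_fsm_time_mean : 175567
vnode_put_fsm_time_median : 6928
"""


def parse_stats(text):
    """Turn 'name : value' lines into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" : ")
        if sep:
            stats[name.strip()] = int(value)
    return stats


def to_ms(microseconds):
    """Convert microseconds to milliseconds (unit is an assumption)."""
    return microseconds / 1000.0


stats = parse_stats(STATS)
for name, value in stats.items():
    print(f"{name:28s} {to_ms(value):10.1f} ms")
```

Read this way, the median put is fast (about 20 ms at the node level) while the 95th percentile and above are in the range of hundreds of milliseconds to seconds, so the problem looks like a long tail rather than uniformly slow writes.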

I am using LevelDB, so I can't tune the Bitcask backend as suggested.

I've changed the vm.dirty_* settings and enabled them:
admin@riak1:~$ sudo sysctl -a | grep dirty
vm.dirty_background_ratio = 0
vm.dirty_background_bytes = 209715200
vm.dirty_ratio = 40
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 100
vm.dirty_expire_centisecs = 200
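
For readers less familiar with these knobs, the Python sketch below converts the values above into more readable units. parse_sysctl is an illustrative helper, not an existing tool. The *_bytes and *_ratio settings are alternatives to each other, which is why the ratio reads 0 once the bytes value is set, and the centisecs values are in hundredths of a second.

```python
# Output copied from the sysctl command above.
SYSCTL_OUTPUT = """\
vm.dirty_background_ratio = 0
vm.dirty_background_bytes = 209715200
vm.dirty_ratio = 40
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 100
vm.dirty_expire_centisecs = 200
"""


def parse_sysctl(text):
    """Turn 'key = value' lines into a dict of integers."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = int(value.strip())
    return settings


settings = parse_sysctl(SYSCTL_OUTPUT)

# Amount of dirty memory at which background writeback starts.
background_mib = settings["vm.dirty_background_bytes"] / (1024 * 1024)
# How often the flusher threads wake up, and how old dirty data must be
# before it becomes eligible for writeout.
writeback_s = settings["vm.dirty_writeback_centisecs"] / 100
expire_s = settings["vm.dirty_expire_centisecs"] / 100

print(f"background writeback starts at {background_mib:.0f} MiB of dirty data")
print(f"flusher wakes every {writeback_s:.0f}s, data expires after {expire_s:.0f}s")
print(f"writers are throttled at {settings['vm.dirty_ratio']}% dirty memory")
```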

I'm seeing less idle time between writes: iostat shows near-constant writes between 20 and 500 kB/s, with some surges around 4000 kB/s. That's better, but still not great.

Here is the current configuration for my "activity_fr" bucket type and "tweet" bucket:

admin@riak1:~$ http localhost:8098/types/activity_fr/props
HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 314
Content-Type: application/json
Date: Tue, 03 May 2016 14:30:21 GMT
Server: MochiWeb/1.1 WebMachine/1.10.8 (that head fake, tho)
Vary: Accept-Encoding

{
    "props": {
        "active": true,
        "allow_mult": false,
        "basic_quorum": false,
        "big_vclock": 50,
        "chash_keyfun": {
            "fun": "chash_std_keyfun",
            "mod": "riak_core_util"
        },
        "claimant": "r...@riak2.lighthouse-analytics.co",
        "dvv_enabled": false,
        "dw": "quorum",
        "last_write_wins": true,
        "linkfun": {
            "fun": "mapreduce_linkfun",
            "mod": "riak_kv_wm_link_walker"
        },
        "n_val": 3,
        "notfound_ok": true,
        "old_vclock": 86400,
        "postcommit": [],
        "pr": 0,
        "precommit": [],
        "pw": 0,
        "r": "quorum",
        "rw": "quorum",
        "search_index": "activity_fr.20160422104506",
        "small_vclock": 50,
        "w": "quorum",
        "young_vclock": 20
    }
}

admin@riak1:~$ http localhost:8098/types/activity_fr/buckets/tweet/props
HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 322
Content-Type: application/json
Date: Tue, 03 May 2016 14:30:02 GMT
Server: MochiWeb/1.1 WebMachine/1.10.8 (that head fake, tho)
Vary: Accept-Encoding

{
    "props": {
        "active": true,
        "allow_mult": false,
        "basic_quorum": false,
        "big_vclock": 50,
        "chash_keyfun": {
            "fun": "chash_std_keyfun",
            "mod": "riak_core_util"
        },
        "claimant": "r...@riak2.lighthouse-analytics.co",
        "dvv_enabled": false,
        "dw": "quorum",
        "last_write_wins": true,
        "linkfun": {
            "fun": "mapreduce_linkfun",
            "mod": "riak_kv_wm_link_walker"
        },
        "n_val": 3,
        "name": "tweet",
        "notfound_ok": true,
        "old_vclock": 86400,
        "postcommit": [],
        "pr": 0,
        "precommit": [],
        "pw": 0,
        "r": "quorum",
        "rw": "quorum",
        "search_index": "activity_fr.20160422104506",
        "small_vclock": 50,
        "w": "quorum",
        "young_vclock": 20
    }
}

I really don't know what to do next. Can you help?


On 02/05/2016 17:53, Luke Bakken wrote:
Guillaume -

Some colleagues had me carefully re-read those stats. You'll notice
that those "put" stats are only for consistent or write_once
operations, so they don't apply to you.

Your read stats show objects well within Riak's recommended object size:

node_get_fsm_objsize_100 : 10916
node_get_fsm_objsize_95 : 7393
node_get_fsm_objsize_99 : 8845
node_get_fsm_objsize_mean : 4098
node_get_fsm_objsize_median : 3891

So that is not the issue.

Are you using Bitcask? If so, please apply these sysctl settings:


If you are using the default "vm.dirty_*" settings Linux will appear
to pause as it flushes disk buffers to the underlying device. The
settings in the document change this so that flushes happen more often
and asynchronously.

Luke Bakken

On Mon, May 2, 2016 at 8:43 AM, Guillaume Boddaert
<guilla...@lighthouse-analytics.co> wrote:
Here is a complete round of my hosts; objsize is 0 on all of them.

Here is a sample response (headers only; they are followed by the full
JSON content) from the RIAK5 host:

HTTP/1.1 200 OK
X-Riak-Vclock: a85hYGBgzGDKBVI8xTxKnGbpn7QYuPafyWBKZMxjZXjyYfYFviwA
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.10.8 (that head fake, tho)
Link: </buckets/twitter>; rel="up"
Last-Modified: Mon, 02 May 2016 15:40:20 GMT
ETag: "2l2QODpewyBZQFqDnyEy3F"
Date: Mon, 02 May 2016 15:40:20 GMT
Content-Type: application/json
Content-Length: 10722

Below is the riak-admin status output.

admin@riak1:~$ sudo riak-admin status | grep -e 'objsize' | grep 'put'
consistent_put_objsize_100 : 0
consistent_put_objsize_95 : 0
consistent_put_objsize_99 : 0
consistent_put_objsize_mean : 0
consistent_put_objsize_median : 0
write_once_put_objsize_100 : 0
write_once_put_objsize_95 : 0
write_once_put_objsize_99 : 0
write_once_put_objsize_mean : 0
write_once_put_objsize_median : 0

admin@riak2:~$ sudo riak-admin status | grep -e 'objsize' | grep 'put'
consistent_put_objsize_100 : 0
consistent_put_objsize_95 : 0
consistent_put_objsize_99 : 0
consistent_put_objsize_mean : 0
consistent_put_objsize_median : 0
write_once_put_objsize_100 : 0
write_once_put_objsize_95 : 0
write_once_put_objsize_99 : 0
write_once_put_objsize_mean : 0
write_once_put_objsize_median : 0

admin@riak3:~$ sudo riak-admin status | grep -e 'objsize' | grep 'put'
consistent_put_objsize_100 : 0
consistent_put_objsize_95 : 0
consistent_put_objsize_99 : 0
consistent_put_objsize_mean : 0
consistent_put_objsize_median : 0
write_once_put_objsize_100 : 0
write_once_put_objsize_95 : 0
write_once_put_objsize_99 : 0
write_once_put_objsize_mean : 0
write_once_put_objsize_median : 0

admin@riak4:~$ sudo riak-admin status | grep -e 'objsize' | grep 'put'
consistent_put_objsize_100 : 0
consistent_put_objsize_95 : 0
consistent_put_objsize_99 : 0
consistent_put_objsize_mean : 0
consistent_put_objsize_median : 0
write_once_put_objsize_100 : 0
write_once_put_objsize_95 : 0
write_once_put_objsize_99 : 0
write_once_put_objsize_mean : 0
write_once_put_objsize_median : 0

admin@riak5:~$ sudo riak-admin status | grep -e 'objsize' | grep 'put'
consistent_put_objsize_100 : 0
consistent_put_objsize_95 : 0
consistent_put_objsize_99 : 0
consistent_put_objsize_mean : 0
consistent_put_objsize_median : 0
write_once_put_objsize_100 : 0
write_once_put_objsize_95 : 0
write_once_put_objsize_99 : 0
write_once_put_objsize_mean : 0
write_once_put_objsize_median : 0

On 02/05/2016 17:32, Luke Bakken wrote:
Could you please check the objsize stats on every Riak node? If they
are all zero then ... ????
Luke Bakken
