Hi,

Sorry for the delay; I've spent a lot of time trying to understand whether the problem was elsewhere. I've simplified my infrastructure to a simple layout that no longer relies on a load balancer, and I also corrected some minor performance issues in my workers.

At the moment, I have up to 32 workers calling Riak for writes, each of them configured with:
w=1
dw=0
timeout=1000
using protobuf
a timed-out attempt is retried 180s later

From my application server's perspective, 23% of the calls fail with a timeout (75446 tries, 57564 successes, 17578 timeouts).
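
The rate quoted above can be recomputed with a short Python sketch. The counts are the ones from this message; the variable names are only illustrative. The success and timeout counts do not quite add up to the number of tries, so the sketch reports the remainder rather than assuming where it went.

```python
# Counts copied from the message above.
tries = 75446
success = 57564
timeouts = 17578

# Fraction of all attempted calls that ended in a timeout.
timeout_rate = timeouts / tries

# Calls that are counted as neither success nor timeout.
unaccounted = tries - success - timeouts

print(f"timeout rate: {timeout_rate:.1%}")
print(f"unaccounted calls: {unaccounted}")
```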

Here is a sample riak-admin stat for one of my 5 hosts:

node_put_fsm_time_100 : 999331
node_put_fsm_time_95 : 773682
node_put_fsm_time_99 : 959444
node_put_fsm_time_mean : 156242
node_put_fsm_time_median : 20235
vnode_put_fsm_time_100 : 5267527
vnode_put_fsm_time_95 : 2437457
vnode_put_fsm_time_99 : 4819538
vnode_put_fsm_time_mean : 175567
vnode_put_fsm_time_median : 6928
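
To make these numbers easier to compare with the 1000 ms client timeout, here is a minimal Python sketch that converts them to milliseconds. It assumes the raw values are microseconds, which is how the Riak documentation describes the FSM time stats; the helper name `parse_stats` is hypothetical.

```python
# Stat lines copied from the riak-admin output above.
STATS = """\
node_put_fsm_time_100 : 999331
node_put_fsm_time_95 : 773682
node_put_fsm_time_99 : 959444
node_put_fsm_time_mean : 156242
node_put_fsm_time_median : 20235
vnode_put_fsm_time_100 : 5267527
vnode_put_fsm_time_95 : 2437457
vnode_put_fsm_time_99 : 4819538
vnode_put_fsm_time_mean : 175567
vnode_put_fsm_time_median : 6928
"""

TIMEOUT_MS = 1000  # client-side timeout from the worker settings


def parse_stats(text):
    """Return {stat_name: milliseconds}, assuming raw values are microseconds."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        stats[name.strip()] = int(value) / 1000.0
    return stats


stats_ms = parse_stats(STATS)
for name, ms in stats_ms.items():
    flag = "  <-- above client timeout" if ms > TIMEOUT_MS else ""
    print(f"{name:28s} {ms:10.1f} ms{flag}")
```

Under that assumption, the node-level maximum (about 999 ms) lands just under the 1000 ms client timeout, while the vnode-level 95th percentile is already above it.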

I am using LevelDB, so I can't tune the Bitcask backend as suggested.

I've changed the vm.dirty settings and applied them:
admin@riak1:~$ sudo sysctl -a | grep dirty
vm.dirty_background_ratio = 0
vm.dirty_background_bytes = 209715200
vm.dirty_ratio = 40
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 100
vm.dirty_expire_centisecs = 200
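
For readability, the values above can be converted into more familiar units with a small Python sketch. The helper name `parse_sysctl` is hypothetical. The unit conversions follow the setting names (bytes, hundredths of a second). The Linux kernel documentation also notes that each `*_bytes` setting is the counterpart of the matching `*_ratio` setting, and that the one not in use reads as 0.

```python
# Output lines copied from the sysctl listing above.
SYSCTL = """\
vm.dirty_background_ratio = 0
vm.dirty_background_bytes = 209715200
vm.dirty_ratio = 40
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 100
vm.dirty_expire_centisecs = 200
"""


def parse_sysctl(text):
    """Parse 'key = value' lines into a {key: int} dictionary."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = int(value)
    return settings


settings = parse_sysctl(SYSCTL)

# Background flushing starts once this much dirty data has accumulated.
background_mib = settings["vm.dirty_background_bytes"] / (1024 * 1024)
# Centiseconds are hundredths of a second.
writeback_s = settings["vm.dirty_writeback_centisecs"] / 100
expire_s = settings["vm.dirty_expire_centisecs"] / 100

print(f"background flush threshold: {background_mib:.0f} MiB")
print(f"flusher wakes up every:     {writeback_s:.0f} s")
print(f"dirty data expires after:   {expire_s:.0f} s")
```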

I now see less idle time between writes: iostat shows near-constant writes between 20 and 500 kb/s, with some surges around 4000 kb/s. That's better, but still not great.

Here is the current configuration for my "activity_fr" bucket type and "tweet" bucket:


admin@riak1:~$ http localhost:8098/types/activity_fr/props
HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 314
Content-Type: application/json
Date: Tue, 03 May 2016 14:30:21 GMT
Server: MochiWeb/1.1 WebMachine/1.10.8 (that head fake, tho)
Vary: Accept-Encoding

{
    "props": {
        "active": true,
        "allow_mult": false,
        "basic_quorum": false,
        "big_vclock": 50,
        "chash_keyfun": {
            "fun": "chash_std_keyfun",
            "mod": "riak_core_util"
        },
        "claimant": "r...@riak2.lighthouse-analytics.co",
        "dvv_enabled": false,
        "dw": "quorum",
        "last_write_wins": true,
        "linkfun": {
            "fun": "mapreduce_linkfun",
            "mod": "riak_kv_wm_link_walker"
        },
        "n_val": 3,
        "notfound_ok": true,
        "old_vclock": 86400,
        "postcommit": [],
        "pr": 0,
        "precommit": [],
        "pw": 0,
        "r": "quorum",
        "rw": "quorum",
        "search_index": "activity_fr.20160422104506",
        "small_vclock": 50,
        "w": "quorum",
        "young_vclock": 20
    }
}

admin@riak1:~$ http localhost:8098/types/activity_fr/buckets/tweet/props
HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 322
Content-Type: application/json
Date: Tue, 03 May 2016 14:30:02 GMT
Server: MochiWeb/1.1 WebMachine/1.10.8 (that head fake, tho)
Vary: Accept-Encoding

{
    "props": {
        "active": true,
        "allow_mult": false,
        "basic_quorum": false,
        "big_vclock": 50,
        "chash_keyfun": {
            "fun": "chash_std_keyfun",
            "mod": "riak_core_util"
        },
        "claimant": "r...@riak2.lighthouse-analytics.co",
        "dvv_enabled": false,
        "dw": "quorum",
        "last_write_wins": true,
        "linkfun": {
            "fun": "mapreduce_linkfun",
            "mod": "riak_kv_wm_link_walker"
        },
        "n_val": 3,
        "name": "tweet",
        "notfound_ok": true,
        "old_vclock": 86400,
        "postcommit": [],
        "pr": 0,
        "precommit": [],
        "pw": 0,
        "r": "quorum",
        "rw": "quorum",
        "search_index": "activity_fr.20160422104506",
        "small_vclock": 50,
        "w": "quorum",
        "young_vclock": 20
    }
}
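
Since the workers send w=1 and dw=0 while the bucket defaults are "quorum", the following Python sketch shows how the two combine. It assumes the usual Riak behaviour that per-request options override bucket properties and that "quorum" means a majority of n_val; the function names are hypothetical and this is not client library code.

```python
def quorum(n_val):
    """Majority of n_val replicas."""
    return n_val // 2 + 1


def resolve(value, n_val):
    """Turn a symbolic or numeric quorum setting into a replica count."""
    symbolic = {"one": 1, "quorum": quorum(n_val), "all": n_val}
    return symbolic.get(value, value)


# Bucket defaults, from the props output above.
bucket_props = {"n_val": 3, "w": "quorum", "dw": "quorum"}
# Per-request options, from the worker settings earlier in this message.
request_opts = {"w": 1, "dw": 0}

n_val = bucket_props["n_val"]
effective = {}
for key in ("w", "dw"):
    # Assumption: a value sent with the request wins over the bucket default.
    raw = request_opts.get(key, bucket_props[key])
    effective[key] = resolve(raw, n_val)

print("bucket default w/dw:", resolve(bucket_props["w"], n_val),
      resolve(bucket_props["dw"], n_val))
print("effective w/dw:     ", effective["w"], effective["dw"])
```

Under those assumptions, each write needs only one vnode acknowledgement and no durable-write acknowledgement before the client gets a reply, instead of two of each with the bucket defaults.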

I really don't know what to do. Can you help?

Guillaume


On 02/05/2016 17:53, Luke Bakken wrote:
Guillaume -

Some colleagues had me carefully re-read those stats. You'll notice
that those "put" stats are only for consistent or write_once
operations, so they don't apply to you.

Your read stats show objects well within Riak's recommended object size:

node_get_fsm_objsize_100 : 10916
node_get_fsm_objsize_95 : 7393
node_get_fsm_objsize_99 : 8845
node_get_fsm_objsize_mean : 4098
node_get_fsm_objsize_median : 3891

So that is not the issue.

Are you using Bitcask? If so, please apply these sysctl settings:

http://docs.basho.com/riak/kv/2.1.4/using/performance/#optional-i-o-settings

If you are using the default "vm.dirty_*" settings Linux will appear
to pause as it flushes disk buffers to the underlying device. The
settings in the document change this so that flushes happen more often
and asynchronously.

--
Luke Bakken
Engineer
lbak...@basho.com


On Mon, May 2, 2016 at 8:43 AM, Guillaume Boddaert
<guilla...@lighthouse-analytics.co> wrote:
Here is a complete round of my hosts; all of them report objsize : 0

Here is a sample response from the RIAK5 host (headers only; they are followed by the full
JSON content)

HTTP/1.1 200 OK
X-Riak-Vclock: a85hYGBgzGDKBVI8xTxKnGbpn7QYuPafyWBKZMxjZXjyYfYFviwA
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.10.8 (that head fake, tho)
Link: </buckets/twitter>; rel="up"
Last-Modified: Mon, 02 May 2016 15:40:20 GMT
ETag: "2l2QODpewyBZQFqDnyEy3F"
Date: Mon, 02 May 2016 15:40:20 GMT
Content-Type: application/json
Content-Length: 10722

Below is the riak-admin status output.


admin@riak1:~$ sudo riak-admin status | grep -e 'objsize' | grep 'put'
consistent_put_objsize_100 : 0
consistent_put_objsize_95 : 0
consistent_put_objsize_99 : 0
consistent_put_objsize_mean : 0
consistent_put_objsize_median : 0
write_once_put_objsize_100 : 0
write_once_put_objsize_95 : 0
write_once_put_objsize_99 : 0
write_once_put_objsize_mean : 0
write_once_put_objsize_median : 0

admin@riak2:~$ sudo riak-admin status | grep -e 'objsize' | grep 'put'
consistent_put_objsize_100 : 0
consistent_put_objsize_95 : 0
consistent_put_objsize_99 : 0
consistent_put_objsize_mean : 0
consistent_put_objsize_median : 0
write_once_put_objsize_100 : 0
write_once_put_objsize_95 : 0
write_once_put_objsize_99 : 0
write_once_put_objsize_mean : 0
write_once_put_objsize_median : 0

admin@riak3:~$ sudo riak-admin status | grep -e 'objsize' | grep 'put'
consistent_put_objsize_100 : 0
consistent_put_objsize_95 : 0
consistent_put_objsize_99 : 0
consistent_put_objsize_mean : 0
consistent_put_objsize_median : 0
write_once_put_objsize_100 : 0
write_once_put_objsize_95 : 0
write_once_put_objsize_99 : 0
write_once_put_objsize_mean : 0
write_once_put_objsize_median : 0

admin@riak4:~$ sudo riak-admin status | grep -e 'objsize' | grep 'put'
consistent_put_objsize_100 : 0
consistent_put_objsize_95 : 0
consistent_put_objsize_99 : 0
consistent_put_objsize_mean : 0
consistent_put_objsize_median : 0
write_once_put_objsize_100 : 0
write_once_put_objsize_95 : 0
write_once_put_objsize_99 : 0
write_once_put_objsize_mean : 0
write_once_put_objsize_median : 0

admin@riak5:~$ sudo riak-admin status | grep -e 'objsize' | grep 'put'
consistent_put_objsize_100 : 0
consistent_put_objsize_95 : 0
consistent_put_objsize_99 : 0
consistent_put_objsize_mean : 0
consistent_put_objsize_median : 0
write_once_put_objsize_100 : 0
write_once_put_objsize_95 : 0
write_once_put_objsize_99 : 0
write_once_put_objsize_mean : 0
write_once_put_objsize_median : 0

On 02/05/2016 17:32, Luke Bakken wrote:
Could you please check the objsize stats on every Riak node? If they
are all zero then ... ????
--
Luke Bakken
Engineer
lbak...@basho.com

