Hi,
Sorry for the delay, I've spent a lot of time trying to work out whether
the problem was elsewhere.
I've simplified my infrastructure to a simple layout that no longer
relies on a load balancer, and I've also corrected some minor performance
issues in my workers.
At the moment, I have up to 32 workers calling Riak for writes,
each of them configured with:
w=1
dw=0
timeout=1000
using protobuf
a timed-out attempt is retried 180s later
From my application server's perspective, 23% of the calls fail with a
timeout (75446 tries, 57564 successes, 17578 timeouts).
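For reference, the 23% figure can be reproduced from the counts above. This is only a minimal sketch; the variable names are illustrative and not taken from the workers' code. It also shows that successes and timeouts do not quite add up to the number of tries, without trying to explain the gap.

```python
# Counts reported by the application server in the message above.
tries = 75446
successes = 57564
timeouts = 17578

# Fraction of all calls that ended in a timeout.
timeout_rate = timeouts / tries

# Calls that are neither counted as a success nor as a timeout.
unaccounted = tries - successes - timeouts

print(f"timeout rate: {timeout_rate:.1%}")
print(f"unaccounted calls: {unaccounted}")
```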
Here is a sample of the riak-admin status output for one of my 5 hosts:
node_put_fsm_time_100 : 999331
node_put_fsm_time_95 : 773682
node_put_fsm_time_99 : 959444
node_put_fsm_time_mean : 156242
node_put_fsm_time_median : 20235
vnode_put_fsm_time_100 : 5267527
vnode_put_fsm_time_95 : 2437457
vnode_put_fsm_time_99 : 4819538
vnode_put_fsm_time_mean : 175567
vnode_put_fsm_time_median : 6928
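To make these numbers easier to compare with the 1000 ms client timeout, here is a small sketch that converts them to milliseconds. It assumes the `*_fsm_time_*` values are reported in microseconds; `parse_stats` and `CLIENT_TIMEOUT_MS` are hypothetical names introduced for this example only.

```python
# Sample values copied from the riak-admin output above.
# Assumption: these timing stats are expressed in microseconds.
STATS = """\
node_put_fsm_time_100 : 999331
node_put_fsm_time_95 : 773682
node_put_fsm_time_99 : 959444
node_put_fsm_time_mean : 156242
node_put_fsm_time_median : 20235
vnode_put_fsm_time_100 : 5267527
vnode_put_fsm_time_95 : 2437457
vnode_put_fsm_time_99 : 4819538
vnode_put_fsm_time_mean : 175567
vnode_put_fsm_time_median : 6928
"""

CLIENT_TIMEOUT_MS = 1000  # per-request timeout used by the workers


def parse_stats(text):
    """Parse 'name : value' lines into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            stats[name.strip()] = int(value.strip())
    return stats


# Convert microseconds to milliseconds.
stats_ms = {name: value / 1000 for name, value in parse_stats(STATS).items()}

for name, ms in stats_ms.items():
    flag = "  <-- at or above client timeout" if ms >= CLIENT_TIMEOUT_MS else ""
    print(f"{name:<28} {ms:>9.1f} ms{flag}")
```

Under that assumption, the slowest node-level put (about 999 ms) sits right at the client timeout, while the median is about 20 ms.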
I am using LevelDB, so I can't tune the Bitcask backend as suggested.
I've changed the vm.dirty_* settings and enabled them:
admin@riak1:~$ sudo sysctl -a | grep dirty
vm.dirty_background_ratio = 0
vm.dirty_background_bytes = 209715200
vm.dirty_ratio = 40
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 100
vm.dirty_expire_centisecs = 200
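As a reading aid, the byte and centisecond values above translate to more familiar units as follows. This is a minimal sketch; the variable names simply mirror the sysctl keys.

```python
# Values copied from the sysctl output above.
dirty_background_bytes = 209715200
dirty_writeback_centisecs = 100
dirty_expire_centisecs = 200

# Amount of dirty memory at which background writeback starts, in MiB.
background_mib = dirty_background_bytes / 2**20

# How often the kernel flusher threads wake up, in seconds.
writeback_s = dirty_writeback_centisecs / 100

# Age at which dirty data becomes eligible for writeout, in seconds.
expire_s = dirty_expire_centisecs / 100

print(f"background writeback threshold: {background_mib:.0f} MiB")
print(f"flusher wakeup interval: {writeback_s:.0f} s")
print(f"dirty data expiry: {expire_s:.0f} s")
```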
I'm seeing less idle time between writes; iostat shows near-constant
writes between 20 and 500 kb/s, with some surges around 4000 kb/s.
That's better, but not that great.
Here is the current configuration for my "activity_fr" bucket type and
"tweet" bucket:
admin@riak1:~$ http localhost:8098/types/activity_fr/props
HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 314
Content-Type: application/json
Date: Tue, 03 May 2016 14:30:21 GMT
Server: MochiWeb/1.1 WebMachine/1.10.8 (that head fake, tho)
Vary: Accept-Encoding
{
"props": {
"active": true,
"allow_mult": false,
"basic_quorum": false,
"big_vclock": 50,
"chash_keyfun": {
"fun": "chash_std_keyfun",
"mod": "riak_core_util"
},
"claimant": "r...@riak2.lighthouse-analytics.co",
"dvv_enabled": false,
"dw": "quorum",
"last_write_wins": true,
"linkfun": {
"fun": "mapreduce_linkfun",
"mod": "riak_kv_wm_link_walker"
},
"n_val": 3,
"notfound_ok": true,
"old_vclock": 86400,
"postcommit": [],
"pr": 0,
"precommit": [],
"pw": 0,
"r": "quorum",
"rw": "quorum",
"search_index": "activity_fr.20160422104506",
"small_vclock": 50,
"w": "quorum",
"young_vclock": 20
}
}
admin@riak1:~$ http localhost:8098/types/activity_fr/buckets/tweet/props
HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 322
Content-Type: application/json
Date: Tue, 03 May 2016 14:30:02 GMT
Server: MochiWeb/1.1 WebMachine/1.10.8 (that head fake, tho)
Vary: Accept-Encoding
{
"props": {
"active": true,
"allow_mult": false,
"basic_quorum": false,
"big_vclock": 50,
"chash_keyfun": {
"fun": "chash_std_keyfun",
"mod": "riak_core_util"
},
"claimant": "r...@riak2.lighthouse-analytics.co",
"dvv_enabled": false,
"dw": "quorum",
"last_write_wins": true,
"linkfun": {
"fun": "mapreduce_linkfun",
"mod": "riak_kv_wm_link_walker"
},
"n_val": 3,
"name": "tweet",
"notfound_ok": true,
"old_vclock": 86400,
"postcommit": [],
"pr": 0,
"precommit": [],
"pw": 0,
"r": "quorum",
"rw": "quorum",
"search_index": "activity_fr.20160422104506",
"small_vclock": 50,
"w": "quorum",
"young_vclock": 20
}
}
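To relate these defaults to what the workers actually send: with n_val = 3, "quorum" means 2 replicas, and to my understanding the per-request w=1 / dw=0 values take precedence over the bucket defaults. The sketch below only illustrates that arithmetic; `quorum` and `resolve` are hypothetical helpers, not part of any Riak client.

```python
def quorum(n_val):
    """Majority of replicas: floor(n_val / 2) + 1."""
    return n_val // 2 + 1


def resolve(value, n_val):
    """Turn a symbolic or numeric quorum setting into a replica count."""
    symbolic = {"one": 1, "quorum": quorum(n_val), "all": n_val}
    return symbolic.get(value, value)


# Bucket defaults taken from the props output above.
bucket_props = {"n_val": 3, "w": "quorum", "dw": "quorum"}

# Per-request values used by the workers.
request_opts = {"w": 1, "dw": 0}

n_val = bucket_props["n_val"]
effective = {}
for key in ("w", "dw"):
    default = resolve(bucket_props[key], n_val)
    # Assumption: a value supplied with the request overrides the bucket default.
    effective[key] = resolve(request_opts.get(key, bucket_props[key]), n_val)
    print(f"{key}: bucket default = {default}, effective for this request = {effective[key]}")
```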
I really don't know what to do. Can you help?
Guillaume
On 02/05/2016 17:53, Luke Bakken wrote:
Guillaume -
Some colleagues had me carefully re-read those stats. You'll notice
that those "put" stats are only for consistent or write_once
operations, so they don't apply to you.
Your read stats show objects well within Riak's recommended object size:
node_get_fsm_objsize_100 : 10916
node_get_fsm_objsize_95 : 7393
node_get_fsm_objsize_99 : 8845
node_get_fsm_objsize_mean : 4098
node_get_fsm_objsize_median : 3891
So that is not the issue.
Are you using Bitcask? If so, please apply these sysctl settings:
http://docs.basho.com/riak/kv/2.1.4/using/performance/#optional-i-o-settings
If you are using the default "vm.dirty_*" settings Linux will appear
to pause as it flushes disk buffers to the underlying device. The
settings in the document change this so that flushes happen more often
and asynchronously.
--
Luke Bakken
Engineer
lbak...@basho.com
On Mon, May 2, 2016 at 8:43 AM, Guillaume Boddaert
<guilla...@lighthouse-analytics.co> wrote:
Here is a complete round of my hosts; all of them report objsize : 0.
Here is a sample response (headers only; they are followed by the full
JSON content) from the RIAK5 host:
HTTP/1.1 200 OK
X-Riak-Vclock: a85hYGBgzGDKBVI8xTxKnGbpn7QYuPafyWBKZMxjZXjyYfYFviwA
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.10.8 (that head fake, tho)
Link: </buckets/twitter>; rel="up"
Last-Modified: Mon, 02 May 2016 15:40:20 GMT
ETag: "2l2QODpewyBZQFqDnyEy3F"
Date: Mon, 02 May 2016 15:40:20 GMT
Content-Type: application/json
Content-Length: 10722
Below is the riak-admin status output.
admin@riak1:~$ sudo riak-admin status | grep -e 'objsize' | grep 'put'
consistent_put_objsize_100 : 0
consistent_put_objsize_95 : 0
consistent_put_objsize_99 : 0
consistent_put_objsize_mean : 0
consistent_put_objsize_median : 0
write_once_put_objsize_100 : 0
write_once_put_objsize_95 : 0
write_once_put_objsize_99 : 0
write_once_put_objsize_mean : 0
write_once_put_objsize_median : 0
admin@riak2:~$ sudo riak-admin status | grep -e 'objsize' | grep 'put'
consistent_put_objsize_100 : 0
consistent_put_objsize_95 : 0
consistent_put_objsize_99 : 0
consistent_put_objsize_mean : 0
consistent_put_objsize_median : 0
write_once_put_objsize_100 : 0
write_once_put_objsize_95 : 0
write_once_put_objsize_99 : 0
write_once_put_objsize_mean : 0
write_once_put_objsize_median : 0
admin@riak3:~$ sudo riak-admin status | grep -e 'objsize' | grep 'put'
consistent_put_objsize_100 : 0
consistent_put_objsize_95 : 0
consistent_put_objsize_99 : 0
consistent_put_objsize_mean : 0
consistent_put_objsize_median : 0
write_once_put_objsize_100 : 0
write_once_put_objsize_95 : 0
write_once_put_objsize_99 : 0
write_once_put_objsize_mean : 0
write_once_put_objsize_median : 0
admin@riak4:~$ sudo riak-admin status | grep -e 'objsize' | grep 'put'
consistent_put_objsize_100 : 0
consistent_put_objsize_95 : 0
consistent_put_objsize_99 : 0
consistent_put_objsize_mean : 0
consistent_put_objsize_median : 0
write_once_put_objsize_100 : 0
write_once_put_objsize_95 : 0
write_once_put_objsize_99 : 0
write_once_put_objsize_mean : 0
write_once_put_objsize_median : 0
admin@riak5:~$ sudo riak-admin status | grep -e 'objsize' | grep 'put'
consistent_put_objsize_100 : 0
consistent_put_objsize_95 : 0
consistent_put_objsize_99 : 0
consistent_put_objsize_mean : 0
consistent_put_objsize_median : 0
write_once_put_objsize_100 : 0
write_once_put_objsize_95 : 0
write_once_put_objsize_99 : 0
write_once_put_objsize_mean : 0
write_once_put_objsize_median : 0
On 02/05/2016 17:32, Luke Bakken wrote:
Could you please check the objsize stats on every Riak node? If they
are all zero then ... ????
--
Luke Bakken
Engineer
lbak...@basho.com
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com