For the mailing list's reference, this issue was resolved by the following changes:

* Increase pb_backlog to 256 in the Riak app.config on all nodes
* Increase +zdbbl to 96000 in the Riak vm.args on all nodes
* Switch proxies from tengine (patched nginx) to HAProxy
* Reduce ring size from 256 to 128, since there are 8 nodes (with a
plan to expand later, but not beyond 16 nodes)
* Increase memory per node from 14GB to 32GB
* Tune leveldb memory usage according to this spreadsheet:
https://github.com/basho/basho_docs/raw/master/source/data/leveldb_sizing_1.4.xls
* Increase net.core.somaxconn to 40000 (all other Linux tunings were
present from http://docs.basho.com/riak/latest/ops/tuning/linux/)
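
To illustrate the ring size change above, here is a rough sketch of the
arithmetic: how many vnodes (ring partitions) each node hosts on average
for a given ring size and node count. It assumes partitions are spread
evenly across nodes, and the function name is made up for this example;
it is not part of any Riak tooling.

```python
def vnodes_per_node(ring_size: int, node_count: int) -> float:
    """Average number of ring partitions (vnodes) hosted by each node.

    Assumption: partitions are distributed evenly across the cluster.
    """
    if ring_size <= 0 or node_count <= 0:
        raise ValueError("ring_size and node_count must be positive")
    return ring_size / node_count


if __name__ == "__main__":
    # Old and new ring sizes, at the current and the planned maximum node count.
    for ring_size in (256, 128):
        for node_count in (8, 16):
            avg = vnodes_per_node(ring_size, node_count)
            print(f"ring_size={ring_size} nodes={node_count} -> {avg:g} vnodes/node")
```

With 8 nodes, this gives 32 vnodes per node at ring size 256 and 16 at
ring size 128; at 16 nodes, ring size 128 leaves 8 per node.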

--
Luke Bakken
CSE
lbak...@basho.com


On Tue, Apr 1, 2014 at 9:32 PM, Stanislav Vlasov
<stanislav....@gmail.com> wrote:
>
> Hello!
>
> I have an 8-node cluster of riak + riak-cs on Debian. Config templates are attached.
> Versions:
> ii  riak     1.4.8-1  amd64  Riak is a distributed data store
> ii  riak-cs  1.4.5-1  amd64  Riak CS
>
> Every riak-cs connects to its local node. Between the clients and
> riak-cs there is a frontend (Tengine version: Tengine/1.5.1
> (nginx/1.2.9)); its config is attached.
> Clients: s3cmd plus a number of PHP clients (read-only).
>
> When 1-3 clients write to riak-cs, the write speed is about 3-4MB/sec.
> When 30-40 clients write, the write speed drops below 100kB/sec.
>
> In riak-cs crash.log:
>
> 2014-04-02 03:52:11 =ERROR REPORT====
> webmachine error:
> path="/buckets/test/objects/win.img/uploads/PuqEyz0BRCCk6rDxtH7tRQ=="
> {error,{error,{badmatch,{error,closed}},[{webmachine_request,recv_unchunked_body,3,[{file,"src/webmachine_request.erl"},{line,474}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,193}]},{wrq,stream_req_body,2,[{file,"src/wrq.erl"},{line,121}]},{riak_cs_wm_object_upload_part,accept_body,2,[{file,"src/riak_cs_wm_object_upload_part.erl"},{line,235}]},{riak_cs_wm_common,accept_body,2,[{file,"src/riak_cs_wm_common.erl"},{line,337}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]}}
> [{webmachine_request,recv_unchunked_body,3,[{file,"src/webmachine_request.erl"},{line,474}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,193}]},{wrq,stream_req_body,2,[{file,"src/wrq.erl"},{line,121}]},{riak_cs_wm_object_upload_part,accept_body,2,[{file,"src/riak_cs_wm_object_upload_part.erl"},{line,235}]},{riak_cs_wm_common,accept_body,2,[{file,"src/riak_cs_wm_common.erl"},{line,337}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]
>
> After this event, s3cmd throttles down to a lower speed:
>
> $ s3cmd put win.img s3://test/
> win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
>    184320 of 15728640     1% in    0s     2.16 MB/s  failed
> WARNING: Upload failed:
> /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
> Connection reset by peer)
> WARNING: Retrying on lower speed (throttle=0.00)
> WARNING: Waiting 3 sec...
> win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
>  13799424 of 15728640    87% in    2s     5.18 MB/s  failed
> WARNING: Upload failed:
> /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
> Connection reset by peer)
> WARNING: Retrying on lower speed (throttle=0.01)
> WARNING: Waiting 6 sec...
> win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
>    167936 of 15728640     1% in    0s   249.46 kB/s  failed
> WARNING: Upload failed:
> /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
> Connection reset by peer)
> WARNING: Retrying on lower speed (throttle=0.05)
> WARNING: Waiting 9 sec...
> win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
>   6225920 of 15728640    39% in   76s    79.51 kB/s  failed
> WARNING: Upload failed:
> /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
> Connection reset by peer)
> WARNING: Retrying on lower speed (throttle=0.25)
> WARNING: Waiting 12 sec...
> win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
>  15728640 of 15728640   100% in  962s    15.96 kB/s  done
>
> I think that even on a 1Gbit network between nodes the write speed
> should be higher, but I don't understand where the bottleneck is.
>
> --
> Stanislav
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>

