For the mailing list's reference, this issue has been resolved by the following:
* Increase pb_backlog to 256 in the Riak app.config on all nodes
* Increase +zdbbl to 96000 in the Riak vm.args on all nodes
* Switch proxies from tengine (patched nginx) to HAProxy
* Reduce ring size from 256 to 128, since there are 8 nodes (with a plan to
  expand later, but not beyond 16 nodes). Node memory increased to 32GB from
  14GB per node
* Tune leveldb memory usage according to this spreadsheet:
  https://github.com/basho/basho_docs/raw/master/source/data/leveldb_sizing_1.4.xls
* Increase net.core.somaxconn to 40000 (all other Linux tunings were present
  from http://docs.basho.com/riak/latest/ops/tuning/linux/)

--
Luke Bakken
CSE
lbak...@basho.com

On Tue, Apr 1, 2014 at 9:32 PM, Stanislav Vlasov <stanislav....@gmail.com> wrote:
>
> Hello!
>
> I have an 8-node cluster of riak+riak-cs on Debian. Config templates attached.
> Versions:
> ii  riak     1.4.8-1  amd64  Riak is a distributed data store
> ii  riak-cs  1.4.5-1  amd64  Riak CS
>
> Every riak-cs connects to its local node. Between the clients and riak-cs
> there is a frontend (Tengine version: Tengine/1.5.1 (nginx/1.2.9)), config
> attached.
> Clients: s3cmd plus a number of PHP clients (read-only).
>
> When 1-3 clients write to riak-cs, write speed is around 3-4MB/sec.
> When 30-40 clients write, write speed drops below 100kB/sec.
>
> In riak-cs crash.log:
>
> 2014-04-02 03:52:11 =ERROR REPORT====
> webmachine error:
> path="/buckets/test/objects/win.img/uploads/PuqEyz0BRCCk6rDxtH7tRQ=="
> {error,{error,{badmatch,{error,closed}},[{webmachine_request,recv_unchunked_body,3,[{file,"src/webmachine_request.erl"},{line,474}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,193}]},{wrq,stream_req_body,2,[{file,"src/wrq.erl"},{line,121}]},{riak_cs_wm_object_upload_part,accept_body,2,[{file,"src/riak_cs_wm_object_upload_part.erl"},{line,235}]},{riak_cs_wm_common,accept_body,2,[{file,"src/riak_cs_wm_common.erl"},{line,337}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]}}
> [{webmachine_request,recv_unchunked_body,3,[{file,"src/webmachine_request.erl"},{line,474}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,193}]},{wrq,stream_req_body,2,[{file,"src/wrq.erl"},{line,121}]},{riak_cs_wm_object_upload_part,accept_body,2,[{file,"src/riak_cs_wm_object_upload_part.erl"},{line,235}]},{riak_cs_wm_common,accept_body,2,[{file,"src/riak_cs_wm_common.erl"},{line,337}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]
>
> After this event s3cmd throttles to a slower speed:
>
> $ s3cmd put win.img s3://test/
> win.img -> s3://test/win.img [part 1 of 1366, 15MB]
> 184320 of 15728640 1% in 0s 2.16 MB/s failed
> WARNING: Upload failed: /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104] Connection reset by peer)
> WARNING: Retrying on lower speed (throttle=0.00)
> WARNING: Waiting 3 sec...
> win.img -> s3://test/win.img [part 1 of 1366, 15MB]
> 13799424 of 15728640 87% in 2s 5.18 MB/s failed
> WARNING: Upload failed: /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104] Connection reset by peer)
> WARNING: Retrying on lower speed (throttle=0.01)
> WARNING: Waiting 6 sec...
> win.img -> s3://test/win.img [part 1 of 1366, 15MB]
> 167936 of 15728640 1% in 0s 249.46 kB/s failed
> WARNING: Upload failed: /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104] Connection reset by peer)
> WARNING: Retrying on lower speed (throttle=0.05)
> WARNING: Waiting 9 sec...
> win.img -> s3://test/win.img [part 1 of 1366, 15MB]
> 6225920 of 15728640 39% in 76s 79.51 kB/s failed
> WARNING: Upload failed: /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104] Connection reset by peer)
> WARNING: Retrying on lower speed (throttle=0.25)
> WARNING: Waiting 12 sec...
> win.img -> s3://test/win.img [part 1 of 1366, 15MB]
> 15728640 of 15728640 100% in 962s 15.96 kB/s done
>
> I think that even on a 1Gbit network between nodes, write speed should be
> higher, but I don't understand where the bottleneck is.
>
> --
> Stanislav
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
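
For anyone finding this thread later, here is a rough sanity check on two of the numbers in the list of changes at the top of this message. It is only a sketch: the function names are made up for illustration, and it assumes partitions are spread evenly across nodes, which a real cluster only approximates. It shows the average number of ring partitions per node before and after the ring size change (and at the planned 16-node maximum), and converts the +zdbbl value, which the Erlang VM takes in kilobytes, into MiB.

```python
def partitions_per_node(ring_size: int, node_count: int) -> float:
    """Average number of ring partitions (vnodes) each node would own,
    assuming an even spread (an idealization, not measured data)."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return ring_size / node_count


def zdbbl_kb_to_mib(zdbbl_kb: int) -> float:
    """The +zdbbl flag is specified in kilobytes; convert to MiB."""
    return zdbbl_kb / 1024


if __name__ == "__main__":
    # Old ring on 8 nodes, new ring on 8 nodes, new ring at the 16-node cap.
    for ring_size, nodes in [(256, 8), (128, 8), (128, 16)]:
        avg = partitions_per_node(ring_size, nodes)
        print(f"ring_size={ring_size} nodes={nodes}: {avg:g} partitions per node")

    print(f"+zdbbl 96000 is {zdbbl_kb_to_mib(96000):.2f} MiB")
```

With a ring size of 128, each of the 8 nodes carries half as many partitions as before, and the per-node count stays a whole number if the cluster grows to 16 nodes.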