Hi,

Sorry for my long silence. Kazuhiro, thank you for your answer; the situation is clearer now. I think I need to extend my Riak cluster with more nodes to increase performance. My reasoning is based on the following HAProxy log lines:

Nov 19 11:42:40 localhost haproxy[24678]: 172.18.103.31:49608 [19/Nov/2015:11:42:40.137] riak riak_backend/viper 3/5/113 1471 -- 8191/2594/2594/138/0 0/0
Nov 19 11:42:41 localhost haproxy[24678]: 172.18.108.170:44517 [19/Nov/2015:11:41:42.264] riak riak_backend/serpent 1/0/58806 5982 cD 8191/2849/2849/155/0 0/0
Nov 19 11:59:46 localhost haproxy[24678]: 172.18.102.39:42919 [19/Nov/2015:11:42:14.566] riak riak_backend/mussurana 1/0/1052250 1484789 -- 3134/2888/2888/154/0 0/0
Nov 19 12:07:26 localhost haproxy[24678]: 172.18.40.2:44946 [19/Nov/2015:11:42:14.508] riak riak_backend/rattler 1/0/1511814 2471638 cD 3172/2888/2888/161/0 0/0
Nov 19 12:17:56 localhost haproxy[24678]: 172.18.103.30:58654 [19/Nov/2015:11:42:40.141] riak riak_backend/mamba 3/1/2116572 3383878 cD 2988/2886/2886/166/0 0/0
Nov 19 12:23:55 localhost haproxy[24678]: 172.18.40.4:59089 [19/Nov/2015:11:41:39.831] riak riak_backend/eggeater 1/0/2535854 4109579 CD 3020/2888/2888/153/0 0/0
Nov 19 12:38:54 localhost haproxy[24678]: 172.18.40.4:37536 [19/Nov/2015:11:41:47.533] riak riak_backend/cobra 1/0/3427457 3387298 -- 2983/2886/2886/159/0 0/0
Nov 19 12:50:37 localhost haproxy[24678]: 172.18.102.39:51870 [19/Nov/2015:11:41:49.413] riak riak_backend/lora 1/0/4128262 6445878 -- 2989/2889/2889/164/0 0/0

I think this is not an HAProxy timeout issue. Am I right?

Regarding the HAProxy config, I have the following for Riak PB:

frontend riak
    bind 172.18.108.170:8087
    mode tcp
    option tcplog
    option contstats
    timeout client 30s
    default_backend riak_backend

backend riak_backend
    mode tcp
    balance roundrobin
    option tcpka
    option srvtcpka
    option httpchk GET /ping
    timeout server 60s
    server rinkhals rinkhals.pleiad.uaprom:8087 weight 1 maxconn 1024 check port 8090
    server chuckwalla chuckwalla.pleiad.uaprom:8087 weight 1 maxconn 1024 check port 8090

and so on...
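
As a cross-check of the log lines above, here is a minimal Python sketch that splits one line into its fields, assuming the default `option tcplog` layout (Tw/Tc/Tt timers, bytes read, two-character termination state, connection counters, queue lengths). The names `TCPLOG_RE`, `CAUSES`, `parse_tcplog` and `SAMPLE` are only illustrative, and `CAUSES` covers just a few of the documented first-character codes.

```python
import re

# Assumed layout: HAProxy's default "option tcplog" format.
TCPLOG_RE = re.compile(
    r"haproxy\[(?P<pid>\d+)\]: "
    r"(?P<client>\S+) "
    r"\[(?P<accept_date>[^\]]+)\] "
    r"(?P<frontend>\S+) "
    r"(?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tt>\+?\d+) "
    r"(?P<bytes_read>\+?\d+) "
    r"(?P<term_state>\S\S) "
    r"(?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)/"
    r"(?P<srv_conn>\d+)/(?P<retries>\+?\d+) "
    r"(?P<srv_queue>\d+)/(?P<backend_queue>\d+)"
)

# Partial mapping of the first termination-state character (the cause).
CAUSES = {
    "-": "none reported",
    "c": "client-side timeout expired",
    "C": "aborted by client",
    "s": "server-side timeout expired",
    "S": "aborted or refused by server",
}

# One of the log lines quoted above.
SAMPLE = (
    "Nov 19 11:42:41 localhost haproxy[24678]: 172.18.108.170:44517 "
    "[19/Nov/2015:11:41:42.264] riak riak_backend/serpent 1/0/58806 5982 cD "
    "8191/2849/2849/155/0 0/0"
)


def parse_tcplog(line):
    """Return a dict of tcplog fields, or None if the line does not match."""
    match = TCPLOG_RE.search(line)
    if match is None:
        return None
    rec = match.groupdict()
    # Timers are in milliseconds; bytes_read is a byte count.
    for key in ("tw", "tc", "tt", "bytes_read"):
        rec[key] = int(rec[key])
    rec["cause"] = CAUSES.get(rec["term_state"][0], "other (see HAProxy docs)")
    return rec


if __name__ == "__main__":
    rec = parse_tcplog(SAMPLE)
    print(rec["server"], rec["tt"], rec["term_state"], rec["cause"])
```

According to the HAProxy documentation, `cD` means the client-side timeout expired while the session was in the data phase, and `CD` means the client aborted during the data phase, so the sessions carrying those flags may be worth comparing against the `timeout client 30s` setting in the frontend.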

And the config for Riak CS:

frontend riakcs
    bind 193.34.169.1:80
    mode http
    option contstats
    option httplog
    option http-server-close
    timeout client 30s
    default_backend riakcs_backend

backend riakcs_backend
    mode http
    balance roundrobin
    option httpchk GET /riak-cs/ping
    option redispatch
    retries 3
    timeout server 60s
    timeout connect 60s
    timeout http-request 60s
    server rinkhals rinkhals.pleiad.uaprom:8080 weight 1 maxconn 1024 check port 8000
    server chuckwalla chuckwalla.pleiad.uaprom:8080 weight 1 maxconn 1024 check port 8000

and so on...

And the defaults section:

defaults
    log global
    option dontlognull
    retries 3
    option redispatch
    maxconn 8192
    timeout connect 5000
    timeout client 4h
    timeout server 4h
    balance leastconn

I want to mention again that I have about 1000 rps to Riak CS, and the average object size is 10 Kb.

Thank you.

On Mon, Nov 16, 2015 at 4:01 AM Kazuhiro Suzuki <k...@basho.com> wrote:

> Hi,
>
> ha_proxy's timeout settings often causes disconnected errors on a Riak
> CS deployment by high work load. termination_stat [1] in tcplog [2]
> lets you know if timeout happens or not.
>
> > 2015-11-13 13:13:09.514 [error]
> > <0.11264.1387>@riak_cs_wm_common:maybe_create_user:222 Retrieval of user
> > record for s3 failed. Reason: disconnected
>
> This means Riak CS failed to read a user data from Riak for
> authentication due to a disconnected error.
>
> > Riak CS adds, removes, gets properties through Stanchion service. Am I
> > right? I can't exactly understand where is my bottleneck - Riak, Riak CS or
> > Stanchion.
>
> Mainly Stanchion is only used to update/delete data of users and
> buckets. To inspect a node, Riak S2/CS 2.1 introduced new metrics
> including various latencies and counters, which help to identify
> bottleneck.
>
> > When we need authenticated access for reading object from bucket do we
> > need Stanchion? If not I can't understand why I had a lot of error during
> > getting objects from Riak CS.
>
> Authenticated access is always necessary but a read request of user
> data for auth is issued from Riak CS to Riak directly, not through
> Stanchion.
>
> > P. S. Sometimes when there is some issues with Riak CS - Stanchion
> > connectivity I need to restart Riak CS.
>
> Riak CS 1.5.0 has connection pool leak problem [3]. You might hit the
> issue...
>
> [1]: https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#8.5
> [2]: https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#8.2.2
> [3]: http://docs.basho.com/riakcs/latest/cookbooks/Riak-CS-Release-Notes/#Riak-CS-1-5-2
>
> On Sat, Nov 14, 2015 at 2:04 AM, Vladyslav Zakhozhai
> <v.zakhoz...@smartweb.com.ua> wrote:
> >
> > Hello.
> >
> > I have Riak CS cluster with 18 nodes. On each node there is Riak CS and Riak
> > service and one Stanchion node.
> >
> > Versions:
> > Riak 1.4.12
> > Riak CS 1.5.0
> > Stanchion 1.5.0
> >
> > Riak CS and Riak allocated behind HAProxy balancers:
> >
> > WAN -> HAProxy -> Riak CS nodes -> HAProxy -> Riak nodes.
> > ans
> > Stanchion -> HAProxy -> Riak
> >
> > Today due a spike of traffic load (about 1000 rps) on the cluster 50% of
> > Riak CS returned HTTP 500 and 503 (querying /riak-cs/ping resource also was
> > not successful).
> >
> > In Riak CS logs I've seen the following messages:
> >
> > 2015-11-13 13:13:09.514 [error]
> > <0.11264.1387>@riak_cs_wm_common:maybe_create_user:222 Retrieval of user
> > record for s3 failed. Reason: disconnected
> >
> > In Riak CS logs I see the following:
> > 2015-11-13 17:31:52.995 [error] <0.11254.6534> Lager event handler
> > error_logger_lager_h exited with reason
> > {'EXIT',{{badmatch,["/buckets/uaprom-image/objects/272547384_cid1322007_pid183135512-26a7c1f3.jpg",{error,{error,{badmatch,{error,closed}},[{webmachine_request,recv_unchunked_body,3,[{file,"src/webmachine_request.erl"},{line,471}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,193}]},{wrq,stream_req_body,2,[{file,"src/wrq.erl"},{line,121}]},{riak_cs_wm_object,handle_normal_put,2,[{file,"src/riak_cs_wm_object.erl"},{line,341}]},{riak_cs_wm_common,accept_body,2,[{file,...},...]},...]}},...]},...}}
> >
> > I suspect that there were problem between Riak CS - Stanhion or Stanhion -
> > Riak. I have no clear idea in Stanchion troubleshooting. The main reason is
> > the following. Stanhion works fine, service is up (answers on ping command).
> > But it is very laconic: there is almost nothing in console and error logs
> > (even with debug log level).
> >
> > Riak CS adds, removes, gets properties through Stanchion service. Am I
> > right? I can't exactly understand where is my bottleneck - Riak, Riak CS or
> > Stanhion.
> >
> > When we need authenticated access for reading object from bucket do we need
> > Stanchion? If not I can't understand why I had a lot of error during getting
> > objects from Riak CS.
> >
> > Thank you in advance.
> >
> > P. S. Sometimes when there is some issues with Riak CS - Stanchion
> > connectivity I need to restart Riak CS.
> >
> >
> > _______________________________________________
> > riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >
> >
>
> --
> Kazuhiro Suzuki | Basho Japan KK
>