Cluster balance problem
Hello!

We have a Riak cluster for Riak CS. The failed node riak@192.168.0.133 was replaced by riak@192.168.0.141 (a new node) via 'riak-admin replace'; the old node was then cleaned up, prepared, and joined to the cluster as a new node.

Member status after the last 'riak-admin cluster commit', once all transfers had completed:

= Membership ==
Status     Ring    Pending    Node
---
valid       8.6%      --      'riak@192.168.0.130'
valid       8.2%      --      'riak@192.168.0.131'
valid       8.2%      --      'riak@192.168.0.132'
valid       8.2%      --      'riak@192.168.0.133'
valid       8.2%      --      'riak@192.168.0.134'
valid       8.2%      --      'riak@192.168.0.135'
valid       8.2%      --      'riak@192.168.0.136'
valid       8.2%      --      'riak@192.168.0.137'
valid       8.2%      --      'riak@192.168.0.138'
valid       8.2%      --      'riak@192.168.0.139'
valid       8.2%      --      'riak@192.168.0.140'
valid       9.4%      --      'riak@192.168.0.141'
---
Valid:12 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

Every node has a 3.6T RAID. Free disk space and used %:

Node            Free    Used
192.168.0.130   996G    73%
192.168.0.131   1.2T    69%
192.168.0.132   1.2T    68%
192.168.0.133   1.1T    70%
192.168.0.134   1.1T    70%
192.168.0.135   1.2T    69%
192.168.0.136   1.2T    68%
192.168.0.137   1.2T    69%
192.168.0.138   1.2T    69%
192.168.0.139   1.2T    69%
192.168.0.140   1.2T    68%
192.168.0.141   808G    78%

Problem: 192.168.0.141 has much less free space than the other nodes, which may affect merges on this node once more data is uploaded to the cluster.

Software versions:

ii  riak      2.1.3-1  amd64  Riak is a distributed data store
ii  riak-cs   2.0.0-1  amd64  Riak CS

Can I rebalance Riak without adding/removing nodes or making any hardware changes?

--
Stanislav

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
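The percentages in the member-status output can be turned back into partition counts. The sketch below is only an illustration: it assumes a ring size of 256, which the message does not state (the figures 8.2%, 8.6% and 9.4% happen to match 21, 22 and 24 partitions out of 256), and the helper name `partitions_owned` is made up for this example.

```python
# Ring ownership as reported by member-status in the message above.
ring_pct = {f"riak@192.168.0.{i}": 8.2 for i in range(131, 141)}
ring_pct["riak@192.168.0.130"] = 8.6
ring_pct["riak@192.168.0.141"] = 9.4

RING_SIZE = 256  # assumption: the ring size is not given in the message


def partitions_owned(pct, ring_size=RING_SIZE):
    """Convert a rounded ownership percentage back to a partition count."""
    return round(pct / 100 * ring_size)


owned = {node: partitions_owned(pct) for node, pct in ring_pct.items()}

for node in sorted(owned):
    print(node, owned[node])
print("total partitions:", sum(owned.values()))
```

Under that assumption the counts add up to exactly 256, and 192.168.0.141 owns 24 partitions against 21 on most nodes. That is about 14% more, which is roughly in line with its higher disk usage (78% against 68-70%), though other factors may contribute as well.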
Re: s3cmd error: access to bucket was denied
> http://docs.basho.com/riakcs/1.4.2/cookbooks/configuration/Configuring-an-S3-Client/#Sample-s3cmd-Configuration-File-for-Production-Use
>
> There's no "signature_v2" parameter in "s3cfg". However, I added this
> parameter to "s3cfg" and tried again with same errors.
>
> On Thu, Aug 20, 2015 at 10:31 PM, Kazuhiro Suzuki wrote:
>> Hi Changmao,
>>
>> It seems your s3cmd config should include 2 items:
>>
>> signature_v2 = True
>> host_base = api2.cloud-datayes.com
>>
>> Riak CS requires "signature_v2 = True" since Riak CS has not supported
>> s3 authentication version 4 yet.
>> You can find a sample configuration of s3cmd here to interact with
>> Riak CS [1].
>>
>> [1]: http://docs.basho.com/riakcs/2.0.1/cookbooks/configuration/Configuring-an-S3-Client/#Sample-s3cmd-Configuration-File-for-Production-Use
>>
>> Thanks,
>>
>> On Thu, Aug 20, 2015 at 7:44 PM, changmao wang wrote:
>>> Just now, I used "admin_key" and "admin_secret" from
>>> /etc/riak-cs/app.config
>>> to run "s3cmd -c s3-stock ls s3://stock/XSHE/0/000600"
>>> and I got the below error:
>>> ERROR: Access to bucket 'stock' was denied
>>>
>>> Below is abstract from "/var/log/riak-cs/console.log"
>>> 2015-08-20 18:40:22.790 [debug] <0.28085.18>@riak_cs_s3_auth:calculate_signature:129 STS: ["GET","\n",[],"\n",[],"\n","\n",[["x-amz-date",":",<<"Thu, 20 Aug 2015 10:40:22 +">>,"\n"]],["/stock/",[]]]
>>> 2015-08-20 18:40:32.861 [error] <0.28153.18>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user record for s3 failed. Reason: no_user_key
>>> 2015-08-20 18:40:32.861 [debug] <0.
Re: s3cmd error: access to bucket was denied
2015-08-20 14:47 GMT+05:00 changmao wang :
> what's your meaning of domain name of /etc/riak-cs/app.config and ~/.s3cfg?
> I guess it's cs_root_host parameter from /etc/riak-cs/app.config and
> host_base from '~/.s3cfg'.
> If so, there're same as "api2.cloud-datayes.com".

Yes, that is what I meant, but I see that it is not your case.
Try setting {level, debug} in the lager_file_backend section for console.log.

> However, I can not ping this host from localhost.

That is OK as long as you set the proper proxy_host and proxy_port in .s3cfg.

--
Stanislav
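The settings discussed in this thread (host_base matching cs_root_host, signature_v2, and proxy_host/proxy_port) can be checked mechanically. The following is a minimal sketch, not part of s3cmd: the function name `check_s3cfg`, the sample keys, and the proxy address 127.0.0.1:8080 are placeholders chosen for illustration. It relies on the s3cmd config being INI-style with a `[default]` section, and it disables interpolation because real s3cfg files contain values such as `%(bucket)s`.

```python
import configparser

# Hypothetical sample; keys and proxy address are placeholders.
SAMPLE_S3CFG = """\
[default]
access_key = EXAMPLE-ACCESS-KEY
secret_key = EXAMPLE-SECRET-KEY
host_base = api2.cloud-datayes.com
host_bucket = %(bucket)s.api2.cloud-datayes.com
proxy_host = 127.0.0.1
proxy_port = 8080
signature_v2 = True
"""


def check_s3cfg(text, cs_root_host):
    """Return a list of problems found in an s3cfg text (empty if none)."""
    # interpolation=None so that '%(bucket)s' is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    cfg = parser["default"]

    problems = []
    if cfg.get("host_base") != cs_root_host:
        problems.append("host_base does not match cs_root_host")
    if cfg.get("signature_v2", fallback="False").strip().lower() != "true":
        problems.append("signature_v2 is not enabled")
    if not cfg.get("proxy_host") or not cfg.get("proxy_port"):
        problems.append("proxy_host/proxy_port are not set")
    return problems


print(check_s3cfg(SAMPLE_S3CFG, "api2.cloud-datayes.com"))
```

Passing a different `cs_root_host` reports the mismatch. A clean result here does not rule out the "no_user_key" error seen in the log, which concerns the user record rather than the client configuration.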
Re: s3cmd error: access to bucket was denied
2015-08-20 13:57 GMT+05:00 changmao wang :
> somebody watching on this?

Did you set the same domain in riak-cs.conf and in .s3cfg?
I got this error in that case.

> On Wed, Aug 19, 2015 at 9:01 AM, changmao wang wrote:
>>
>> Matthew,
>>
>> I used s3cmd --configure to generate ".s3cfg" config file and then access
>> RIAK service by s3cmd.
>> The access_key and secret_key from ".s3cfg" is same as admin_key and
>> admin_secret from "/etc/riak-cs/app.config".
>>
>> However, I got error as below using s3cmd to access one bucket.
>>
>> root@cluster-s3-hd1:~# s3cmd -c /root/.s3cfg ls s3://pipeline/article/111.pdf
>> ERROR: Access to bucket 'pipeline' was denied
>>
>> By the way, I used Riak and Riak-CS 1.4.2 on Ubuntu. Current production
>> cluster is a legacy system without documents for co-workers.
>>
>> Attached file is "s3cfg" generated by "s3cmd --configure".
>> --
>> Amao Wang
>> Best & Regards
>
> --
> Amao Wang
> Best & Regards

--
Stanislav
Re: riak 2.0 + riak-cs 2.0 trouble
2015-04-17 17:31 GMT+05:00 John Daily :
> Thanks, that helps. You’re right that the documentation is a bit buggy, or at
> least incomplete.
> The problem is that the example provided in the docs is just a snippet. To
> make a fully-functional advanced.config file requires a bit more syntactical
> structure.
> You’ll need to wrap what you provided in an Erlang array, with square
> brackets and a period terminating it.
> See http://pastebin.com/90gh6amg.

Thank you.
Please add a template for this config to the next riak package.

> -John

--
Stanislav
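The two parse errors reported in this thread correspond to two structural problems: a trailing comma before a closing bracket, and a file that is not wrapped in square brackets with a terminating period. The sketch below is a naive structural check written for illustration only. It is not an Erlang parser, the function name `check_erlang_config` is hypothetical, and the `some_app`/`some_key` terms are placeholders rather than real Riak settings.

```python
import re

# Placeholder configs; 'some_app' and 'some_key' are not real Riak settings.
WRAPPED = """\
[
 %% a complete file: one list, terminated by a period
 {some_app, [
   {some_key, some_value}
 ]}
].
"""

TRAILING_COMMA = """\
[
 {some_app, [
   {some_key, some_value}
 ]},
].
"""

UNWRAPPED = """\
{some_app, [
  {some_key, some_value}
]}
"""


def check_erlang_config(text):
    """Return structural problems in an advanced.config-style text."""
    # Drop %-comments (naive: assumes no '%' inside quoted strings).
    body = "\n".join(line.split("%", 1)[0] for line in text.splitlines()).strip()

    problems = []
    if not body.startswith("["):
        problems.append("file should start with '['")
    if not body.endswith("]."):
        problems.append("file should end with '].'")
    if re.search(r",\s*[\]}]", body):
        problems.append("trailing comma before a closing bracket")
    if body.count("[") != body.count("]") or body.count("{") != body.count("}"):
        problems.append("unbalanced brackets")
    return problems


for name, text in [("wrapped", WRAPPED), ("trailing comma", TRAILING_COMMA),
                   ("unwrapped", UNWRAPPED)]:
    print(name, "->", check_erlang_config(text))
```

A real validation would still be 'riak config generate -l debug', as used in the original report; this check only catches the two mistakes discussed here.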
Re: riak 2.0 + riak-cs 2.0 trouble
2015-04-17 17:04 GMT+05:00 John Daily :
> Unfortunately it’s very easy to introduce syntax errors into Erlang
> configuration files (and tricky to diagnose them without Erlang experience),
> which is why we’re moving toward the newer sysctl-style files like riak.conf.
>
> The example in the documentation looks ok; can we see a copy of your
> advanced.config file? Please redact any sensitive information, and I’d
> suggest Pastebin or a GitHub gist.

It was copied from the site by cut-and-paste, with no additions:
http://pastebin.com/Hyv3tvMS

I removed the last comma on the line before the second '%%'.

> -John

--
Stanislav
riak 2.0 + riak-cs 2.0 trouble
I am having trouble setting up a test Riak node for Riak CS. Here's how to reproduce my problem:

1) Install riak 2.0.5 and riak-cs 2.0.0 on Debian 7 from the apt repository, as described in
http://docs.basho.com/riak/latest/ops/building/installing/debian-ubuntu/
and
http://docs.basho.com/riakcs/latest/cookbooks/installing/Installing-Riak-CS/
2) Create advanced.config in /etc/riak as described in
http://docs.basho.com/riakcs/latest/cookbooks/configuration/Configuring-Riak/#Setting-up-the-Proper-Riak-Backend

After that I get an error in advanced.config.

Last lines of 'riak config generate -l debug':

10:54:47.488 [info] /etc/riak/advanced.config detected, overlaying proplists
10:54:47.488 [error] Error parsing /etc/riak/advanced.config: 17: syntax error before: ']'

If I remove the last comma in advanced.config, I get another error:

10:58:21.398 [info] /etc/riak/advanced.config detected, overlaying proplists
10:58:21.399 [error] Error parsing /etc/riak/advanced.config: 17: syntax error before:

I think it is a bug either in the documentation or in the config generator.

--
Stanislav
Re: Storage statistic calculation errors
2015-03-13 7:08 GMT+05:00 Kota Uenishi :
> Which version of Riak CS are you using? If it's 1.5.3 or later,
> "storage_calc_timeout" in "riak_cs" section should be used instead of
> "mapred_timeout" because it won't work [1]. This is undocumented, but
> we should've documented this.

It is already in use since the upgrade to 1.5.3:

{storage_calc_timeout, 60}

The value has now been increased to 90.
The bucket calculation errors are gone, but the whole calculation still takes too long and puts too much load on the cluster since the cluster was extended.
I think it may be a Riak issue rather than a Riak CS one.

> Moreover, in CS 1.5.3 or later "mapred_timeout" or "timeout" in
> "riakc" section won't work due to the same reason - all names of
> configuration knobs can be found in code [2]. I hope if you don't read
> Erlang, this code might be simple enough to understand what's
> configurable.
>
> [1] https://github.com/basho/riak_cs/blob/release/1.5/RELEASE-NOTES.md#additions
> [2] https://github.com/basho/riak_cs/blob/release/1.5/src/riak_cs_config.erl#L418
>
> On Fri, Mar 13, 2015 at 10:45 AM, Kazuhiro Suzuki wrote:
>> Hi Stanislav,
>>
>> You can change the timeout for a MapReduce job the storage calculation
>> uses. Could you try to add riakc section which contains mapred_timeout
>> into Riak CS's app.config like this ? :
>>
>> ```
>> [
>> %% riakc section
>> {riakc, [
>> {mapred_timeout, 180}], %% msec
>> },
>>
>> [
>> %% Riak CS section
>> {riak_cs, [
>>
>>
>> ```
>>
>> Thanks,
>> Kaz
>>
>> --
>> Kazuhiro Suzuki | Basho Japan KK
>
> --
> Kota UENISHI / @kuenishi
> Basho Japan KK

--
Stanislav
Re: Storage statistic calculation errors
2015-03-13 6:45 GMT+05:00 Kazuhiro Suzuki :
> You can change the timeout for a MapReduce job the storage calculation
> uses. Could you try to add riakc section which contains mapred_timeout
> into Riak CS's app.config like this ? :
> {riakc, [
> {mapred_timeout, 180}], %% msec
> },

The current config contains:

{riakc, [
  %% increase timeout for LARGE bucket statistic calculation
  {mapred_timeout, 360},
  %% default mapred_call_timeout == 6
  { mapred_call_timeout, 18 }
]}

It helps the calculation finish without errors, but it does not improve the calculation speed.

> --
> Kazuhiro Suzuki | Basho Japan KK

--
Stanislav
Storage statistic calculation errors
Our riak-cs cluster can't calculate storage statistics for some buckets, and the whole calculation takes too long:

riak-cs/console.log:
2015-03-11 01:25:56.791 [error] <0.485.0>@riak_cs_storage:maybe_sum_bucket:75 failed to calculate usage of bucket 'x' of user ''. Reason: {error,{timeout,[]}}
2015-03-11 01:37:36.212 [info] <0.485.0>@riak_cs_storage_d:calculating:150 Finished storage calculation in 5794 seconds.

This bucket contains over 500 files. That could be the cause of this error, but several days ago, after a cluster restart, all storage statistics were calculated without errors:

2015-02-27 01:23:59.777 [info] <0.483.0>@riak_cs_storage_d:calculating:150 Finished storage calculation in 1138 seconds.

Please advise how to fix this.

Our node config files and latest logs can be found at http://ovh.to/MuavQVP

12 nodes in the cluster; node hardware configuration:
CPU: 6 cores of Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
RAM: 60G

--
Stanislav
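To follow how the calculation time changes between runs, the "Finished storage calculation" lines can be pulled out of console.log. This is a small sketch that assumes the log line format shown in the message above; the function name `calculation_durations` is made up for the example, and the two sample lines are copied from the message.

```python
import re

# Sample lines copied from the message above.
SAMPLE_LOG = """\
2015-02-27 01:23:59.777 [info] <0.483.0>@riak_cs_storage_d:calculating:150 Finished storage calculation in 1138 seconds.
2015-03-11 01:25:56.791 [error] <0.485.0>@riak_cs_storage:maybe_sum_bucket:75 failed to calculate usage of bucket 'x' of user ''. Reason: {error,{timeout,[]}}
2015-03-11 01:37:36.212 [info] <0.485.0>@riak_cs_storage_d:calculating:150 Finished storage calculation in 5794 seconds.
"""

FINISHED = re.compile(
    r"^(\S+ \S+) \[info\] .*Finished storage calculation in (\d+) seconds\."
)


def calculation_durations(log_text):
    """Return (timestamp, seconds) for every finished storage calculation."""
    results = []
    for line in log_text.splitlines():
        match = FINISHED.match(line)
        if match:
            results.append((match.group(1), int(match.group(2))))
    return results


durations = calculation_durations(SAMPLE_LOG)
for timestamp, seconds in durations:
    print(f"{timestamp}: {seconds} s ({seconds / 60:.1f} min)")
```

On the two runs quoted above this shows the calculation going from about 19 minutes to about 97 minutes, roughly a fivefold slowdown.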
Re: some problems with storage statistic calculation
2014-08-11 17:03 GMT+06:00 Stanislav Vlasov :
> We have 8 nodes with riak+riak-cs, about 7 Tb data in cluster.
> Some time riak process dead by OOM on several nodes when large
> (~100Gb) file was writed via s3cmd, because riak-cs eated all memory.
> After recovering we experiencing the following problems:
>
> 1) very slow storage statistic calculation (about two hour). Before
> oom it was done in 40 minutes.

After a full riak-cs restart the statistics are calculated fine (about 10 minutes from start to finish).
But 15 minutes after the restart the statistics calculation slows down again, and we see this in the console log:

2014-08-13 08:18:31.854 [warning] <0.4043.0>@riak_cs_manifest:maybe_warn_bloated_manifests:145 Large manifest size (54195024 bytes) for bucket=<<"test">> key= <<"u7850.netangels.ru-20140328-full.tar.lzo">>
2014-08-13 08:18:43.620 [warning] <0.4081.0>@riak_cs_manifest:maybe_warn_bloated_manifests:145 Many manifest siblings (21 siblings) for bucket=<<"test">> key= <<"u7850.netangels.ru-20140328-full.tar.lzo">>

The file was ~120Gb and was uploaded before the OOM.
The file no longer exists (it was deleted, uploaded again with zero size, and deleted again), but nothing changed.

I guess it is a garbage collector issue, because it happens exactly every 15 minutes ({gc_interval,900}) after a restart.

What can I do about it?

--
Stanislav
some problems with storage statistic calculation
Hello!

We have 8 nodes with riak+riak-cs and about 7 Tb of data in the cluster.
At some point the riak process was killed by the OOM killer on several nodes while a large (~100Gb) file was being written via s3cmd, because riak-cs ate all the memory.
After recovering we are experiencing the following problems:

1) Very slow storage statistics calculation (about two hours). Before the OOM it took 40 minutes.

2) There are 400+ files in the bucket touristerru, and no storage statistics are counted for it:

riak-cs/error.log:
2014-08-11 00:12:28.096 [error] <0.5987.49>@riak_cs_storage:maybe_sum_bucket:74 failed to calculate usage of bucket 'touristerru' of user 'OF6DQ0FRBTEVLKGY-X0 P'. Reason: {error,<<"{\"phase\":0,\"error\":\"[{vnode_proxy_timeout,

After a statistics request for this user we see something like this:

{u'Access': u'not_requested', u'Storage': {u'Errors': [], u'Samples': [{u'EndTime': u'20140811T091836Z', u'StartTime': u'20140811T091113Z', u'touristerru': u'{error,<<"{\\"phase\\":0,\\"error\\":\\"[{vnode_proxy_timeout,{228359630832953580969325755111919221821239459840,\'riak@192.168.0.8\'}}]\\",\\"input\\":\\"{<<48,111,58,185,253,24,64,48,197,1,20,36,130,111,222,189,75,202,107>>,<<\\"files/3/8/6/5/3/5/2/clones/870_527_fixedwidth.jpg\\">>}\\",\\"type\\":\\"result\\",\\"stack\\":\\"[{gen,do_call,4,[{file,\\"gen.erl\\"},{line,234}]},{riak_core_vnode_proxy,call,2,[{file,\\"src/riak_core_vnode_proxy.erl\\"},{line,109}]},{riak_pipe_vnode,queue_work_send,4,[{file,\\"src/riak_pipe_vnode.erl\\"},{line,333}]},{riak_pipe_vnode,queue_work_erracc,6,[{file,\\"src/riak_pipe_vnode.erl\\"},{line,281}]},{riak_kv_pipe_get,process,3,[{file,\\"src/riak_kv_pipe_get.erl\\"},{line,92}]},{riak_pipe_vnode_worker,process_input,3,[{file,\\"src/riak_pipe_vnode_worker.erl\\"},{line,445}]},{riak_pipe_vnode_worker,wait_for_input,...},...]\\"}">>}'}]}}

3) The calculation process crashes:

riak-cs/console.log:
2014-08-11 09:22:51.580 [warning] <0.24095.1>@riak_cs_storage_d:read_storage_schedule1:300 No storage schedule defined. Calculation must be triggered manually.
2014-08-11 09:22:51.580 [error] <0.438.0> gen_fsm riak_cs_storage_d in state calculating terminated with reason: no match of right hand value false in riak_cs_storage:sum_bucket/1 line 104
2014-08-11 09:22:51.580 [error] <0.438.0> CRASH REPORT Process riak_cs_storage_d with 1 neighbours exited with reason: no match of right hand value false in riak_cs_storage:sum_bucket/1 line 104 in gen_fsm:terminate/7 line 611
2014-08-11 09:22:51.581 [error] <0.153.0> Supervisor riak_cs_sup had child riak_cs_storage_d started with riak_cs_storage_d:start_link() at <0.438.0> exit with reason no match of right hand value false in riak_cs_storage:sum_bucket/1 line 104 in context child_terminated

What has been done:

System:
1) RAM upgraded from 30 to 61Gb on every node.
2) Added some swap on an additional SSD (only to avoid OOM; sysctl vm.swappiness=0 is set).

Riak configs:
1) increased cache_size in the backend config
2) set {mapred_reduce_phase_batch_size, 5000}
3) set {mapred_always_prereduce, true}

Riak-CS configs:
1) set {storage_archive_period, 14400}
2) upgraded from 1.4.8 to 1.5.0

Configs: http://ovh.to/iwTiMby
Last logs: http://ovh.to/AHaASw

I don't know what to do next.

--
Stanislav
Re: very slow write speed to riak-cs
2014-04-03 22:54 GMT+06:00 Luke Bakken :
> Before you go down the path of changing proxies, could you provide logs from
> one instance of your proxy server? They may provide more insight into what's
> going on here. In addition, the config and logs from one Riak CS node would
> be helpful - the command I gave earlier didn't have the final arguments:
>
> tar -czf /tmp/riak-cs-$(hostname).tgz /etc/riak-cs /var/log/riak-cs

Archive is here: http://ovh.to/qZEjGRt

--
Stanislav
Re: very slow write speed to riak-cs
2014-04-02 19:38 GMT+06:00 Luke Bakken :
> In your Riak /etc/riak/app.config files, please use the following value:
>
> {pb_backlog, 256},

I even tried {pb_backlog, 512}: no change.

> After changing this, you will have to restart Riak in a rolling fashion.
> Could you please run riak-debug on one node in your cluster and make the
> generated archive available? (dropbox, for instance).

# riak-debug
./usr/sbin/riak-debug: 538: [: =: argument expected
Unable to locate be_default LevelDB data directory. Aborting.
Using riak_kv_eleveldb_backend data_root:

Generated data is in this archive: http://ovh.to/QmPhSAy

> Also, could you run
> tar -czf /tmp/riak-cs-$(hostname).tgz and make the archive available?

What data do you need in the archive, if not the debug info?

> Thanks
> --
> Luke Bakken
> CSE
> lbak...@basho.com

--
Stanislav
Re: very slow write speed to riak-cs
2014-04-03 1:36 GMT+06:00 Seth Thomas:
> Could you also include your riak app.config and vm.args. It seems like
> you're load balancing Riak CS but I'm curious how the underlying Riak
> topology looks as well since that will likely be where the performance
> bottlenecks are uncovered.

Config templates attached.

> On Wed, Apr 2, 2014 at 6:38 AM, Luke Bakken wrote:
>>
>> Hi Stanislav,
>>
>> In your Riak /etc/riak/app.config files, please use the following value:
>>
>> {pb_backlog, 256},
>>
>> After changing this, you will have to restart Riak in a rolling fashion.
>>
>> Could you please run riak-debug on one node in your cluster and make the
>> generated archive available? (dropbox, for instance). Also, could you run
>> tar -czf /tmp/riak-cs-$(hostname).tgz and make the archive available?
>>
>> Thanks
>> --
>> Luke Bakken
>> CSE
>> lbak...@basho.com
>>
>>
>> On Tue, Apr 1, 2014 at 9:32 PM, Stanislav Vlasov wrote:
>>>
>>> Hello!
>>>
>>> I have 8x cluster of riak+riak-cs on debian. Config templates attached
>>> Versions:
>>> ii  riak     1.4.8-1  amd64  Riak is a distributed data store
>>> ii  riak-cs  1.4.5-1  amd64  Riak CS
>>>
>>> Every riak-cs connect to local node. Between clients and riak-cs exist
>>> frontend (Tengine version: Tengine/1.5.1 (nginx/1.2.9)), config
>>> attached
>>> Clients - s3cmd + some numbers of php (read-only)
>>>
>>> When 1-3 clients wants write to riak-cs, write speed is near 3-4MB/sec.
>>> If 30-40 clients wants write, write speed slow down to lower than
>>> 100kB/sec.
>>>
>>> In riak-cs crash.log:
>>>
>>> 2014-04-02 03:52:11 =ERROR REPORT
>>> webmachine error:
>>> path="/buckets/test/objects/win.img/uploads/PuqEyz0BRCCk6rDxtH7tRQ=="
>>>
>>> {error,{error,{badmatch,{error,closed}},[{webmachine_request,recv_unchunked_body,3,[{file,"src/webmachine_request.erl"},{line,474}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,193}]},{wrq,stream_req_body,2,[{file,"src/wrq.erl"},{line,121}]},{riak_cs_wm_object_upload_part,accept_body,2,[{file,"src/riak_cs_wm_object_upload_part.erl"},{line,235}]},{riak_cs_wm_common,accept_body,2,[{file,"src/riak_cs_wm_common.erl"},{line,337}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]}}
>>>
>>> [{webmachine_request,recv_unchunked_body,3,[{file,"src/webmachine_request.erl"},{line,474}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,193}]},{wrq,stream_req_body,2,[{file,"src/wrq.erl"},{line,121}]},{riak_cs_wm_object_upload_part,accept_body,2,[{file,"src/riak_cs_wm_object_upload_part.erl"},{line,235}]},{riak_cs_wm_common,accept_body,2,[{file,"src/riak_cs_wm_common.erl"},{line,337}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]
>>>
>>> After this event s3cmd makes throttling to slower speed:
>>>
>>> $ s3cmd put win.img s3://test/
>>> win.img -> s3://test/win.img [part 1 of 1366, 15MB]
>>>    184320 of 15728640     1% in    0s     2.16 MB/s  failed
>>> WARNING: Upload failed:
>>> /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
>>> Connection reset by peer)
>>> WARNING: Retrying on lower speed (throttle=0.00)
>>> WARNING: Waiting 3 sec...
>>> win.img -> s3://test/win.img [part 1 of 1366, 15MB]
>>>  13799424 of 15728640    87% in    2s     5.18 MB/s  failed
>>> WARNING: Upload failed:
>>> /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
>>> Connection reset by peer)
>>> WARNING: Retrying on lower speed (throttle=0.01)
>>> WARNING: Waiting 6 sec...
>>> win.img -> s3://test/win.img [part 1 of 1366, 15MB]
>>>    167936 of 15728640     1% in    0s   249.46 kB/s  failed
>>> WARNING: Upload failed:
>>> /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
>>> Connec
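On the {pb_backlog, 256} suggestion in the reply above: as far as I understand it, that value sets the listen backlog of Riak's protocol buffers listener, i.e. how many connections may wait in the kernel queue before the server accepts them. Below is a minimal Python sketch of what a listen backlog does. It is not Riak code: the helper names are made up for illustration, and the idea that pb_backlog maps directly onto this socket option is an assumption.

```python
import socket


def listener_with_backlog(backlog, host="127.0.0.1"):
    """Open a TCP listener whose pending-connection queue is sized by `backlog`.

    Assumption: this plays the same role that {pb_backlog, N} plays for
    Riak's protocol buffers port.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, 0))   # port 0: let the OS pick a free port
    srv.listen(backlog)   # requested length of the pending-connection queue
    return srv


def connect_without_accept(srv, n, timeout=2.0):
    """Open n client connections while the server never calls accept().

    The connections complete and wait in the backlog queue, which is what
    happens when many clients arrive faster than the server accepts them.
    """
    addr = srv.getsockname()
    return [socket.create_connection(addr, timeout=timeout) for _ in range(n)]


if __name__ == "__main__":
    server = listener_with_backlog(256)
    queued = connect_without_accept(server, 40)
    print(len(queued), "connections queued without a single accept()")
    for c in queued:
        c.close()
    server.close()
```

On Linux the effective queue length is also capped by the net.core.somaxconn sysctl, so a larger backlog only helps if that limit allows it.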
very slow write speed to riak-cs
Hello!

I have an 8-node cluster of riak + riak-cs on Debian. Config templates are attached.

Versions:

    ii  riak     1.4.8-1  amd64  Riak is a distributed data store
    ii  riak-cs  1.4.5-1  amd64  Riak CS

Every riak-cs instance connects to its local riak node. Between the clients and riak-cs there is a frontend (Tengine version: Tengine/1.5.1 (nginx/1.2.9)); its config is attached.
Clients: s3cmd plus a number of PHP clients (read-only).

When 1-3 clients write to riak-cs, the write speed is around 3-4 MB/sec. When 30-40 clients write, the write speed drops below 100 kB/sec.

In the riak-cs crash.log:

    2014-04-02 03:52:11 =ERROR REPORT
    webmachine error: path="/buckets/test/objects/win.img/uploads/PuqEyz0BRCCk6rDxtH7tRQ=="
    {error,{error,{badmatch,{error,closed}},[{webmachine_request,recv_unchunked_body,3,[{file,"src/webmachine_request.erl"},{line,474}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,193}]},{wrq,stream_req_body,2,[{file,"src/wrq.erl"},{line,121}]},{riak_cs_wm_object_upload_part,accept_body,2,[{file,"src/riak_cs_wm_object_upload_part.erl"},{line,235}]},{riak_cs_wm_common,accept_body,2,[{file,"src/riak_cs_wm_common.erl"},{line,337}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]}}
    [{webmachine_request,recv_unchunked_body,3,[{file,"src/webmachine_request.erl"},{line,474}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,193}]},{wrq,stream_req_body,2,[{file,"src/wrq.erl"},{line,121}]},{riak_cs_wm_object_upload_part,accept_body,2,[{file,"src/riak_cs_wm_object_upload_part.erl"},{line,235}]},{riak_cs_wm_common,accept_body,2,[{file,"src/riak_cs_wm_common.erl"},{line,337}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]

After this event s3cmd throttles down to a lower speed:

    $ s3cmd put win.img s3://test/
    win.img -> s3://test/win.img [part 1 of 1366, 15MB]
       184320 of 15728640     1% in    0s     2.16 MB/s  failed
    WARNING: Upload failed: /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104] Connection reset by peer)
    WARNING: Retrying on lower speed (throttle=0.00)
    WARNING: Waiting 3 sec...
    win.img -> s3://test/win.img [part 1 of 1366, 15MB]
     13799424 of 15728640    87% in    2s     5.18 MB/s  failed
    WARNING: Upload failed: /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104] Connection reset by peer)
    WARNING: Retrying on lower speed (throttle=0.01)
    WARNING: Waiting 6 sec...
    win.img -> s3://test/win.img [part 1 of 1366, 15MB]
       167936 of 15728640     1% in    0s   249.46 kB/s  failed
    WARNING: Upload failed: /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104] Connection reset by peer)
    WARNING: Retrying on lower speed (throttle=0.05)
    WARNING: Waiting 9 sec...
    win.img -> s3://test/win.img [part 1 of 1366, 15MB]
      6225920 of 15728640    39% in   76s    79.51 kB/s  failed
    WARNING: Upload failed: /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104] Connection reset by peer)
    WARNING: Retrying on lower speed (throttle=0.25)
    WARNING: Waiting 12 sec...
    win.img -> s3://test/win.img [part 1 of 1366, 15MB]
     15728640 of 15728640   100% in  962s    15.96 kB/s  done

I think that even on a 1 Gbit network between the nodes the write speed should be higher, but I don't understand where the bottleneck is.

--
Stanislav

Attachments: app.config.template, vm.args.template, riak-cs-nginx
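To put the numbers from the s3cmd output above in perspective, here is a small back-of-the-envelope sketch in Python. It uses only figures from the log (15728640 bytes per part, 1366 parts, 962 s for the one part that finished). The function names are illustrative, and it assumes that s3cmd's kB means 1024 bytes and that parts are uploaded one after another at a constant rate.

```python
PART_BYTES = 15728640   # size of one multipart chunk, from the s3cmd output
PARTS = 1366            # "part 1 of 1366"


def rate_kib_per_s(nbytes, seconds):
    """Average transfer rate in KiB/s for nbytes moved in the given time."""
    return nbytes / seconds / 1024.0


def seconds_per_part_at(mib_per_s):
    """Time to upload one part at a steady rate given in MiB/s."""
    return PART_BYTES / (mib_per_s * 1024 * 1024)


def total_hours(parts, seconds_per_part):
    """Total upload time in hours if every part takes seconds_per_part."""
    return parts * seconds_per_part / 3600.0


if __name__ == "__main__":
    throttled = rate_kib_per_s(PART_BYTES, 962)
    print("throttled rate: %.1f KiB/s" % throttled)
    print("whole file, throttled: %.0f hours" % total_hours(PARTS, 962))
    print("whole file at 3 MiB/s: %.1f hours"
          % total_hours(PARTS, seconds_per_part_at(3)))
```

Under those assumptions the throttled rate works out to roughly 16 kB/s, which means about 365 hours (around 15 days) for the whole file, against roughly 1.9 hours at a steady 3 MB/s.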