Cluster balance problem

2016-02-08 Thread Stanislav Vlasov
Hello!

We have a Riak cluster for Riak CS.
The failed node riak@192.168.0.133 was replaced by riak@192.168.0.141
(a new node) via 'riak-admin replace'; .133 was then cleaned up,
prepared and joined to the cluster as a new node.
Member-status after the last 'riak-admin cluster commit', once all
transfers had completed:

================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid       8.6%      --      'riak@192.168.0.130'
valid       8.2%      --      'riak@192.168.0.131'
valid       8.2%      --      'riak@192.168.0.132'
valid       8.2%      --      'riak@192.168.0.133'
valid       8.2%      --      'riak@192.168.0.134'
valid       8.2%      --      'riak@192.168.0.135'
valid       8.2%      --      'riak@192.168.0.136'
valid       8.2%      --      'riak@192.168.0.137'
valid       8.2%      --      'riak@192.168.0.138'
valid       8.2%      --      'riak@192.168.0.139'
valid       8.2%      --      'riak@192.168.0.140'
valid       9.4%      --      'riak@192.168.0.141'
-------------------------------------------------------------------------------
Valid:12 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

Every node has a 3.6T RAID.
Free disk space and used %:

192.168.0.130 996G 73%
192.168.0.131 1.2T 69%
192.168.0.132 1.2T 68%
192.168.0.133 1.1T 70%
192.168.0.134 1.1T 70%
192.168.0.135 1.2T 69%
192.168.0.136 1.2T 68%
192.168.0.137 1.2T 69%
192.168.0.138 1.2T 69%
192.168.0.139 1.2T 69%
192.168.0.140 1.2T 68%
192.168.0.141 808G 78%

Problem: 192.168.0.141 has much less free space than the rest,
which may affect merges on this node once more data is uploaded
into the cluster.
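
One way to read the member-status output above is to turn the ring percentages into partition counts. This is a minimal sketch, assuming a ring size of 256; the ring size is not stated in this post and is only a guess that happens to fit the percentages. The two dictionaries copy the Ring and used-% columns from the listings above.

```python
# Ring ownership (%) and disk usage (%) copied from the listings above,
# keyed by the last octet of each node address (192.168.0.x).
RING_PCT = {130: 8.6, 131: 8.2, 132: 8.2, 133: 8.2, 134: 8.2, 135: 8.2,
            136: 8.2, 137: 8.2, 138: 8.2, 139: 8.2, 140: 8.2, 141: 9.4}
DISK_USED_PCT = {130: 73, 131: 69, 132: 68, 133: 70, 134: 70, 135: 69,
                 136: 68, 137: 69, 138: 69, 139: 69, 140: 68, 141: 78}

ASSUMED_RING_SIZE = 256  # assumption: not given in the post


def partitions_owned(ring_pct, ring_size=ASSUMED_RING_SIZE):
    """Convert ring ownership percentages into whole partition counts."""
    return {node: round(pct * ring_size / 100) for node, pct in ring_pct.items()}


owned = partitions_owned(RING_PCT)
fair_share = ASSUMED_RING_SIZE / len(owned)
for node in sorted(owned):
    print("192.168.0.%d: %2d partitions (fair share %.1f), disk %d%% used"
          % (node, owned[node], fair_share, DISK_USED_PCT[node]))
```

Under that assumption 192.168.0.141 owns 24 partitions against 21 for most nodes, which is roughly in line with its higher disk usage.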

Software versions:
ii riak 2.1.3-1 amd64 Riak is a distributed data store
ii riak-cs 2.0.0-1 amd64 Riak CS

Can I rebalance Riak without adding/removing nodes or making any hardware changes?

-- 
Stanislav

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: s3cmd error: access to bucket was denied

2015-08-25 Thread Stanislav Vlasov
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> > http://docs.basho.com/riakcs/1.4.2/cookbooks/configuration/Configuring-an-S3-Client/#Sample-s3cmd-Configuration-File-for-Production-Use
>>> >> >> >> >
>>> >> >> >> > There's no "signature_v2" parameter in "s3cfg". However, I
>>> >> >> >> > added
>>> >> >> >> > this
>>> >> >> >> > parameter to "s3cfg" and tried again with same errors.
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> > On Thu, Aug 20, 2015 at 10:31 PM, Kazuhiro Suzuki
>>> >> >> >> > 
>>> >> >> >> > wrote:
>>> >> >> >> >>
>>> >> >> >> >> Hi Changmao,
>>> >> >> >> >>
>>> >> >> >> >> It seems your s3cmd config should include 2 items:
>>> >> >> >> >>
>>> >> >> >> >> signature_v2 = True
>>> >> >> >> >> host_base  = api2.cloud-datayes.com
>>> >> >> >> >>
>>> >> >> >> >> Riak CS requires "signature_v2 = True" since Riak CS has not
>>> >> >> >> >> supported
>>> >> >> >> >> s3 authentication version 4 yet.
>>> >> >> >> >> You can find a sample configuration of s3cmd here to interact
>>> >> >> >> >> with
>>> >> >> >> >> Riak
>>> >> >> >> >> CS
>>> >> >> >> >> [1].
>>> >> >> >> >>
>>> >> >> >> >> [1]:
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> http://docs.basho.com/riakcs/2.0.1/cookbooks/configuration/Configuring-an-S3-Client/#Sample-s3cmd-Configuration-File-for-Production-Use
>>> >> >> >> >>
>>> >> >> >> >> Thanks,
>>> >> >> >> >>
>>> >> >> >> >> On Thu, Aug 20, 2015 at 7:44 PM, changmao wang
>>> >> >> >> >> 
>>> >> >> >> >> wrote:
>>> >> >> >> >> > Just now, I used "admin_key" and "admin_secret" from
>>> >> >> >> >> > /etc/riak-cs/app.config
>>> >> >> >> >> > to run "s3cmd -c s3-stock ls  s3://stock/XSHE/0/000600"
>>> >> >> >> >> > and I got the below error:
>>> >> >> >> >> > ERROR: Access to bucket 'stock' was denied
>>> >> >> >> >> >
>>> >> >> >> >> > Below is abstract from "/var/log/riak-cs/console.log"
>>> >> >> >> >> > 2015-08-20 18:40:22.790 [debug]
>>> >> >> >> >> > <0.28085.18>@riak_cs_s3_auth:calculate_signature:129 STS:
>>> >> >> >> >> > ["GET","\n",[],"\n",[],"\n","\n",[["x-amz-date",":",<<"Thu,
>>> >> >> >> >> > 20
>>> >> >> >> >> > Aug
>>> >> >> >> >> > 2015
>>> >> >> >> >> > 10:40:22 +">>,"\n"]],["/stock/",[]]]
>>> >> >> >> >> > 2015-08-20 18:40:32.861 [error]
>>> >> >> >> >> > <0.28153.18>@riak_cs_wm_common:maybe_create_user:223
>>> >> >> >> >> > Retrieval
>>> >> >> >> >> > of
>>> >> >> >> >> > user
>>> >> >> >> >> > record for s3 failed. Reason: no_user_key
>>> >> >> >> >> > 2015-08-20 18:40:32.861 [debug]
>>> >> >> >> >> > <0.

Re: s3cmd error: access to bucket was denied

2015-08-20 Thread Stanislav Vlasov
2015-08-20 14:47 GMT+05:00 changmao wang :

> what's your meaning of domain name of /etc/riak-cs/app.config and ~/.s3cfg?
> I guess it's cs_root_host parameter from /etc/riak-cs/app.config and
> host_base from '~/.s3cfg'.
> If so, there're same as "api2.cloud-datayes.com".

Yes, that is what I meant, but I see that it is not your case.
Try setting {level, debug} in the lager_file_backend section for console.log.

> However, I can not ping this host from localhost.

That's OK, as long as you set the proper proxy_host and proxy_port in .s3cfg.
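
The two checks discussed in this thread (host_base matching cs_root_host, and proxy_host/proxy_port being set) can be sketched as a small script. This is only an illustration: the proxy address and port in the sample are hypothetical values, not taken from the thread, and the function names are made up for this example.

```python
import configparser

# Sample .s3cfg content. host_base and signature_v2 come from the thread;
# the proxy_host and proxy_port values are hypothetical placeholders.
SAMPLE_S3CFG = """\
[default]
host_base = api2.cloud-datayes.com
proxy_host = 127.0.0.1
proxy_port = 8080
signature_v2 = True
"""


def read_s3cfg(text):
    """Parse the [default] section of an s3cmd config into a dict."""
    # interpolation=None keeps values such as %(bucket)s from being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["default"])


def check_s3cfg(settings, cs_root_host):
    """Return a list of problems found in the parsed s3cmd settings."""
    problems = []
    if settings.get("host_base") != cs_root_host:
        problems.append("host_base %r differs from cs_root_host %r"
                        % (settings.get("host_base"), cs_root_host))
    if not (settings.get("proxy_host") and settings.get("proxy_port")):
        problems.append("proxy_host/proxy_port not set")
    return problems


print(check_s3cfg(read_s3cfg(SAMPLE_S3CFG), "api2.cloud-datayes.com"))
```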

> On Thu, Aug 20, 2015 at 5:23 PM, Stanislav Vlasov 
> wrote:
>>
>> 2015-08-20 13:57 GMT+05:00 changmao wang :
>> > somebody watching on this?
>>
>> Do you set up same domain in riak-cs.conf and in .s3cfg?
>> I got such error in this case.
>>
>> > On Wed, Aug 19, 2015 at 9:01 AM, changmao wang 
>> > wrote:
>> >>
>> >> Matthew,
>> >>
>> >> I used s3cmd --configure to generate ".s3cfg" config file and then
>> >> access
>> >> RIAK service by s3cmd.
>> >> The access_key and secret_key from ".s3cfg" is same as admin_key and
>> >> admin_secret from "/etc/riak-cs/app.config".
>> >>
>> >> However, I got error as below using s3cmd to access one bucket.
>> >>
>> >> root@cluster-s3-hd1:~# s3cmd -c /root/.s3cfg ls
>> >> s3://pipeline/article/111.pdf
>> >> ERROR: Access to bucket 'pipeline' was denied
>> >>
>> >> By the way, I used Riak and Riak-CS 1.4.2 on Ubuntu. Current production
>> >> cluster is a legacy system without documents for co-workers.
>> >>
>> >> Attached file is "s3cfg" generated by "s3cmd --configure".
>> >> --
>> >> Amao Wang
>> >> Best & Regards
>> >
>> >
>> >
>> >
>> > --
>> > Amao Wang
>> > Best & Regards
>> >
>> > ___
>> > riak-users mailing list
>> > riak-users@lists.basho.com
>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> >
>>
>>
>>
>> --
>> Stanislav
>
>
>
>
> --
> Amao Wang
> Best & Regards



-- 
Stanislav



Re: s3cmd error: access to bucket was denied

2015-08-20 Thread Stanislav Vlasov
2015-08-20 13:57 GMT+05:00 changmao wang :
> somebody watching on this?

Did you set up the same domain in riak-cs.conf and in .s3cfg?
I got this error when they did not match.

> On Wed, Aug 19, 2015 at 9:01 AM, changmao wang 
> wrote:
>>
>> Matthew,
>>
>> I used s3cmd --configure to generate ".s3cfg" config file and then access
>> RIAK service by s3cmd.
>> The access_key and secret_key from ".s3cfg" is same as admin_key and
>> admin_secret from "/etc/riak-cs/app.config".
>>
>> However, I got error as below using s3cmd to access one bucket.
>>
>> root@cluster-s3-hd1:~# s3cmd -c /root/.s3cfg ls
>> s3://pipeline/article/111.pdf
>> ERROR: Access to bucket 'pipeline' was denied
>>
>> By the way, I used Riak and Riak-CS 1.4.2 on Ubuntu. Current production
>> cluster is a legacy system without documents for co-workers.
>>
>> Attached file is "s3cfg" generated by "s3cmd --configure".
>> --
>> Amao Wang
>> Best & Regards
>
>
>
>
> --
> Amao Wang
> Best & Regards
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>



-- 
Stanislav



Re: riak 2.0 + riak-cs 2.0 trouble

2015-04-17 Thread Stanislav Vlasov
2015-04-17 17:31 GMT+05:00 John Daily :
> Thanks, that helps. You’re right that the documentation is a bit buggy, or at 
> least incomplete.

> The problem is that the example provided in the docs is just a snippet. To 
> make a fully-functional advanced.config file requires a bit more syntactical 
> structure.

> You’ll need to wrap what you provided in an Erlang array, with square 
> brackets and a period terminating it.

> See http://pastebin.com/90gh6amg.

Thank you.

Please add a template for this config to the next Riak package.
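
John's point (the file must be one Erlang list in square brackets, terminated by a period) can be turned into a rough pre-flight check. This is a minimal sketch and not an Erlang parser: it only strips comments and strings, then looks at the outer brackets, the final period, trailing commas and bracket balance. The names some_app and some_key are hypothetical; the Pastebin contents are not reproduced here.

```python
import re


def strip_comments_and_strings(text):
    """Drop %-comments and double-quoted strings, keeping the structure."""
    out = []
    for line in text.splitlines():
        in_string = False
        i = 0
        while i < len(line):
            ch = line[i]
            if in_string:
                if ch == "\\":      # skip the escaped character
                    i += 2
                    continue
                if ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "%":         # Erlang comment runs to end of line
                break
            else:
                out.append(ch)
            i += 1
        out.append("\n")
    return "".join(out)


def check_advanced_config(text):
    """Return a list of structural problems; an empty list means none found."""
    code = strip_comments_and_strings(text).strip()
    problems = []
    if not code.startswith("["):
        problems.append("file should start with '['")
    if not code.endswith("]."):
        problems.append("file should end with '].'")
    if re.search(r",\s*[\]}]", code):
        problems.append("trailing comma before a closing bracket")
    closers = {"]": "[", "}": "{", ")": "("}
    stack = []
    for ch in code:
        if ch in "[{(":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                problems.append("unbalanced '%s'" % ch)
                break
    else:
        if stack:
            problems.append("unclosed '%s'" % stack[-1])
    return problems


SNIPPET = "{some_app, [\n  {some_key, 1}, %% note\n]}\n"
WRAPPED = "[\n {some_app, [\n  {some_key, 1} %% note\n ]}\n].\n"
print(check_advanced_config(SNIPPET))
print(check_advanced_config(WRAPPED))
```

The authoritative check remains 'riak config generate -l debug', as used earlier in this thread.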

> -John
>
> On Apr 17, 2015, at 8:24 AM, Stanislav Vlasov  wrote:
>
>> 2015-04-17 17:04 GMT+05:00 John Daily :
>>> Unfortunately it’s very easy to introduce syntax errors into Erlang 
>>> configuration files (and tricky to diagnose them without Erlang 
>>> experience), which is why we’re moving toward the newer sysctl-style files 
>>> like riak.conf.
>>>
>>> The example in the documentation looks ok; can we see a copy of your 
>>> advanced.config file? Please redact any sensitive information, and I’d 
>>> suggest Pastebin or a GitHub gist.
>>
>> It was copied from site by cut-n-paste, no any additions:
>> http://pastebin.com/Hyv3tvMS
>>
>> Last comma in line before second '%%' removed by me.
>>
>>> -John
>>>
>>> On Apr 17, 2015, at 7:57 AM, Stanislav Vlasov  
>>> wrote:
>>>
>>>> I have troubles setting up a test riak node for riak-cs. Here's how to
>>>> reproduce my problem:
>>>>
>>>> 1) install on Debian 7 riak 2.0.5 and riak 2.0.0 from apt repository
>>>> as in 
>>>> http://docs.basho.com/riak/latest/ops/building/installing/debian-ubuntu/
>>>> and 
>>>> http://docs.basho.com/riakcs/latest/cookbooks/installing/Installing-Riak-CS/
>>>> 2) create advanced.config in /etc/riak as in
>>>> http://docs.basho.com/riakcs/latest/cookbooks/configuration/Configuring-Riak/#Setting-up-the-Proper-Riak-Backend
>>>>
>>>> After that I get an error in advanced.config
>>>>
>>>> last lines of 'riak config generate -l debug':
>>>> 10:54:47.488 [info] /etc/riak/advanced.config detected, overlaying 
>>>> proplists
>>>> 10:54:47.488 [error] Error parsing /etc/riak/advanced.config: 17:
>>>> syntax error before: ']'
>>>>
>>>> If i remove last comma in advanced.config, I get another error:
>>>>
>>>> 10:58:21.398 [info] /etc/riak/advanced.config detected, overlaying 
>>>> proplists
>>>> 10:58:21.399 [error] Error parsing /etc/riak/advanced.config: 17:
>>>> syntax error before:
>>>>
>>>> I think, it is a bug either in documentation or in config generator
>>>>
>>>> --
>>>> Stanislav
>>>>
>>>> ___
>>>> riak-users mailing list
>>>> riak-users@lists.basho.com
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>
>>
>>
>> --
>> Stanislav
>



-- 
Stanislav



Re: riak 2.0 + riak-cs 2.0 trouble

2015-04-17 Thread Stanislav Vlasov
2015-04-17 17:04 GMT+05:00 John Daily :
> Unfortunately it’s very easy to introduce syntax errors into Erlang 
> configuration files (and tricky to diagnose them without Erlang experience), 
> which is why we’re moving toward the newer sysctl-style files like riak.conf.
>
> The example in the documentation looks ok; can we see a copy of your 
> advanced.config file? Please redact any sensitive information, and I’d 
> suggest Pastebin or a GitHub gist.

It was copied from the site by cut-and-paste, without any additions:
http://pastebin.com/Hyv3tvMS

I removed the last comma in the line before the second '%%'.

> -John
>
> On Apr 17, 2015, at 7:57 AM, Stanislav Vlasov  wrote:
>
>> I have troubles setting up a test riak node for riak-cs. Here's how to
>> reproduce my problem:
>>
>> 1) install on Debian 7 riak 2.0.5 and riak 2.0.0 from apt repository
>> as in 
>> http://docs.basho.com/riak/latest/ops/building/installing/debian-ubuntu/
>> and 
>> http://docs.basho.com/riakcs/latest/cookbooks/installing/Installing-Riak-CS/
>> 2) create advanced.config in /etc/riak as in
>> http://docs.basho.com/riakcs/latest/cookbooks/configuration/Configuring-Riak/#Setting-up-the-Proper-Riak-Backend
>>
>> After that I get an error in advanced.config
>>
>> last lines of 'riak config generate -l debug':
>> 10:54:47.488 [info] /etc/riak/advanced.config detected, overlaying proplists
>> 10:54:47.488 [error] Error parsing /etc/riak/advanced.config: 17:
>> syntax error before: ']'
>>
>> If i remove last comma in advanced.config, I get another error:
>>
>> 10:58:21.398 [info] /etc/riak/advanced.config detected, overlaying proplists
>> 10:58:21.399 [error] Error parsing /etc/riak/advanced.config: 17:
>> syntax error before:
>>
>> I think, it is a bug either in documentation or in config generator
>>
>> --
>> Stanislav
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>



-- 
Stanislav



riak 2.0 + riak-cs 2.0 trouble

2015-04-17 Thread Stanislav Vlasov
I am having trouble setting up a test riak node for riak-cs. Here's how to
reproduce my problem:

1) install riak 2.0.5 and riak-cs 2.0.0 on Debian 7 from the apt repository
as in http://docs.basho.com/riak/latest/ops/building/installing/debian-ubuntu/
and http://docs.basho.com/riakcs/latest/cookbooks/installing/Installing-Riak-CS/
2) create advanced.config in /etc/riak as in
http://docs.basho.com/riakcs/latest/cookbooks/configuration/Configuring-Riak/#Setting-up-the-Proper-Riak-Backend

After that I get an error in advanced.config

last lines of 'riak config generate -l debug':
10:54:47.488 [info] /etc/riak/advanced.config detected, overlaying proplists
10:54:47.488 [error] Error parsing /etc/riak/advanced.config: 17:
syntax error before: ']'

If I remove the last comma in advanced.config, I get another error:

10:58:21.398 [info] /etc/riak/advanced.config detected, overlaying proplists
10:58:21.399 [error] Error parsing /etc/riak/advanced.config: 17:
syntax error before:

I think it is a bug either in the documentation or in the config generator.

-- 
Stanislav



Re: Storage statistic calculation errors

2015-03-12 Thread Stanislav Vlasov
2015-03-13 7:08 GMT+05:00 Kota Uenishi :
> Which version of Riak CS are you using? If it's 1.5.3 or later,
> "storage_calc_timeout" in "riak_cs" section should be used instead of
> "mapred_timeout" because it won't work [1]. This is undocumented, but
> we should've documented this.

It has already been in use since the upgrade to 1.5.3:

{storage_calc_timeout, 60}

The value has now been increased to 90.
The bucket calculation errors are gone, but since the cluster was extended
the whole calculation takes too long and generates too much load.

I think it may be a Riak issue, not a Riak CS one.

> Moreover, in CS 1.5.3 or later "mapred_timeout" or "timeout" in
> "riakc" section won't work due to the same reason - all names of
> configuration knobs can be found in code [2]. I hope if you don't read
> Erlang, this code might be simple enough to understand what's
> configurable.
>
> [1] 
> https://github.com/basho/riak_cs/blob/release/1.5/RELEASE-NOTES.md#additions
> [2] 
> https://github.com/basho/riak_cs/blob/release/1.5/src/riak_cs_config.erl#L418
>
> On Fri, Mar 13, 2015 at 10:45 AM, Kazuhiro Suzuki  wrote:
>> Hi Stanislav,
>>
>> You can change the timeout for a MapReduce job the storage calculation
>> uses. Could you try to add riakc section which contains mapred_timeout
>> into Riak CS's app.config like this ? :
>>
>> ```
>> [
>> %% riakc section
>>  {riakc, [
>>      {mapred_timeout, 180} %% msec
>>  ]},
>>
>> [
>>  %% Riak CS section
>>  {riak_cs, [
>>
>>
>> ```
>>
>> Thanks,
>> Kaz
>>
>>
>> 2015-03-11 17:26 GMT+09:00 Stanislav Vlasov :
>>> Our riak-cs cluster can't calculate storage statistic for some buckets
>>> and all calculation takes too long:
>>>
>>> riak-cs/console.log:
>>> 2015-03-11 01:25:56.791 [error]
>>> <0.485.0>@riak_cs_storage:maybe_sum_bucket:75 failed to calculate
>>> usage of bucket 'x' of user ''.
>>> Reason: {error,{timeout,[]}}
>>> 2015-03-11 01:37:36.212 [info]
>>> <0.485.0>@riak_cs_storage_d:calculating:150 Finished storage
>>> calculation in 5794 seconds.
>>>
>>> This bucket contains over 500 files. It could be the cause of this
>>> error, but several days ago, after cluster restart, all storage
>>> statistic was calculated without errors:
>>>
>>> 2015-02-27 01:23:59.777 [info]
>>> <0.483.0>@riak_cs_storage_d:calculating:150 Finished storage
>>> calculation in 1138 seconds.
>>>
>>> Please advise anything to fix it
>>>
>>>
>>> Our node config files and last logs can be found at http://ovh.to/MuavQVP
>>>
>>> 12 nodes in cluster, node hardware configuration:
>>> CPU: 6 cores of Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
>>> RAM: 60G
>>>
>>> --
>>> Stanislav
>>>
>>> ___
>>> riak-users mailing list
>>> riak-users@lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>> --
>> Kazuhiro Suzuki | Basho Japan KK
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
>
> --
> Kota UENISHI / @kuenishi
> Basho Japan KK



-- 
Stanislav



Re: Storage statistic calculation errors

2015-03-12 Thread Stanislav Vlasov
2015-03-13 6:45 GMT+05:00 Kazuhiro Suzuki :

> You can change the timeout for a MapReduce job the storage calculation
> uses. Could you try to add riakc section which contains mapred_timeout
> into Riak CS's app.config like this ? :

>  {riakc, [
>      {mapred_timeout, 180} %% msec
>  ]},
>

The current config contains:

 {riakc, [ %% increase timeout for LARGE bucket statistic calculation
 {mapred_timeout, 360},
 %% default mapred_call_timeout == 6
 { mapred_call_timeout, 18 }
 ]}

This helps the calculation finish without errors, but does not improve its speed.

>
> 2015-03-11 17:26 GMT+09:00 Stanislav Vlasov :
>> Our riak-cs cluster can't calculate storage statistic for some buckets
>> and all calculation takes too long:
>>
>> riak-cs/console.log:
>> 2015-03-11 01:25:56.791 [error]
>> <0.485.0>@riak_cs_storage:maybe_sum_bucket:75 failed to calculate
>> usage of bucket 'x' of user ''.
>> Reason: {error,{timeout,[]}}
>> 2015-03-11 01:37:36.212 [info]
>> <0.485.0>@riak_cs_storage_d:calculating:150 Finished storage
>> calculation in 5794 seconds.
>>
>> This bucket contains over 500 files. It could be the cause of this
>> error, but several days ago, after cluster restart, all storage
>> statistic was calculated without errors:
>>
>> 2015-02-27 01:23:59.777 [info]
>> <0.483.0>@riak_cs_storage_d:calculating:150 Finished storage
>> calculation in 1138 seconds.
>>
>> Please advise anything to fix it
>>
>>
>> Our node config files and last logs can be found at http://ovh.to/MuavQVP
>>
>> 12 nodes in cluster, node hardware configuration:
>> CPU: 6 cores of Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
>> RAM: 60G
>>
>> --
>> Stanislav
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
> --
> Kazuhiro Suzuki | Basho Japan KK



-- 
Stanislav



Storage statistic calculation errors

2015-03-11 Thread Stanislav Vlasov
Our riak-cs cluster can't calculate storage statistics for some buckets,
and the whole calculation takes too long:

riak-cs/console.log:
2015-03-11 01:25:56.791 [error]
<0.485.0>@riak_cs_storage:maybe_sum_bucket:75 failed to calculate
usage of bucket 'x' of user ''.
Reason: {error,{timeout,[]}}
2015-03-11 01:37:36.212 [info]
<0.485.0>@riak_cs_storage_d:calculating:150 Finished storage
calculation in 5794 seconds.

This bucket contains over 500 files. That could be the cause of this
error, but several days ago, after a cluster restart, all storage
statistics were calculated without errors:

2015-02-27 01:23:59.777 [info]
<0.483.0>@riak_cs_storage_d:calculating:150 Finished storage
calculation in 1138 seconds.

Please advise how to fix it.


Our node config files and last logs can be found at http://ovh.to/MuavQVP

12 nodes in cluster, node hardware configuration:
CPU: 6 cores of Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
RAM: 60G
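
The two "Finished storage calculation" entries quoted above can be compared with a short script. This is a minimal sketch: the log text is copied from this message, and the regular expression assumes the console.log layout shown here (timestamp, level, process, message), with wrapped lines joined back together first.

```python
import re

# Log excerpts copied from the message above (lines are wrapped as posted).
LOG = """\
2015-03-11 01:25:56.791 [error]
<0.485.0>@riak_cs_storage:maybe_sum_bucket:75 failed to calculate
usage of bucket 'x' of user ''.
Reason: {error,{timeout,[]}}
2015-03-11 01:37:36.212 [info]
<0.485.0>@riak_cs_storage_d:calculating:150 Finished storage
calculation in 5794 seconds.
2015-02-27 01:23:59.777 [info]
<0.483.0>@riak_cs_storage_d:calculating:150 Finished storage
calculation in 1138 seconds.
"""

FINISHED = re.compile(
    r"(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2}\.\d{3} \[info\] \S+ "
    r"Finished storage calculation in (\d+) seconds"
)


def calculation_times(log_text):
    """Return (date, seconds) for every finished storage calculation."""
    flat = " ".join(log_text.split())  # undo the line wrapping
    return [(day, int(sec)) for day, sec in FINISHED.findall(flat)]


times = calculation_times(LOG)
for day, seconds in times:
    print("%s: %d s (%.1f min)" % (day, seconds, seconds / 60))
```

By these two entries the run on 2015-03-11 took about five times as long as the one on 2015-02-27.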

-- 
Stanislav



Re: some problems with storage statistic calculation

2014-08-13 Thread Stanislav Vlasov
2014-08-11 17:03 GMT+06:00 Stanislav Vlasov :

> We have 8 nodes with riak+riak-cs, about 7 Tb data in cluster.
> Some time riak process dead by OOM on several nodes when large
> (~100Gb) file was writed via s3cmd, because riak-cs eated all memory.
> After recovering we experiencing the following problems:
>
> 1) very slow storage statistic calculation (about two hour). Before
> oom it was done in 40 minutes.

After a full riak-cs restart the statistics are calculated fine (about 10 min
from beginning to end). But 15 minutes after the restart the statistics
calculation slows down again and we see this in the console log:

2014-08-13 08:18:31.854 [warning]
<0.4043.0>@riak_cs_manifest:maybe_warn_bloated_manifests:145 Large
manifest size (54195024 bytes) for bucket=<<"test">> key=
<<"u7850.netangels.ru-20140328-full.tar.lzo">>
2014-08-13 08:18:43.620 [warning]
<0.4081.0>@riak_cs_manifest:maybe_warn_bloated_manifests:145 Many
manifest siblings (21 siblings) for bucket=<<"test">> key=
<<"u7850.netangels.ru-20140328-full.tar.lzo">>

The file size was ~120 GB, and it was uploaded before the OOM. The file no
longer exists (it was deleted, uploaded with zero size and deleted again),
but nothing has changed.

I guess it's a garbage collector issue, because it happens exactly
15 minutes ({gc_interval,900}) after each restart.
What can I do about it?
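
The numbers in the two warnings above can be pulled out with a short script, which also shows where the 15-minute figure comes from. This is a minimal sketch: the log text is copied from this message and the patterns assume the warning wording shown here.

```python
import re

GC_INTERVAL_SECONDS = 900  # from {gc_interval,900} in this message

# Warnings copied from the message above (lines are wrapped as posted).
WARNINGS = """\
2014-08-13 08:18:31.854 [warning]
<0.4043.0>@riak_cs_manifest:maybe_warn_bloated_manifests:145 Large
manifest size (54195024 bytes) for bucket=<<"test">> key=
<<"u7850.netangels.ru-20140328-full.tar.lzo">>
2014-08-13 08:18:43.620 [warning]
<0.4081.0>@riak_cs_manifest:maybe_warn_bloated_manifests:145 Many
manifest siblings (21 siblings) for bucket=<<"test">> key=
<<"u7850.netangels.ru-20140328-full.tar.lzo">>
"""


def manifest_stats(log_text):
    """Return (manifest sizes in bytes, sibling counts) found in the log."""
    flat = " ".join(log_text.split())  # undo the line wrapping
    sizes = [int(n) for n in
             re.findall(r"Large manifest size \((\d+) bytes\)", flat)]
    siblings = [int(n) for n in
                re.findall(r"Many manifest siblings \((\d+) siblings\)", flat)]
    return sizes, siblings


sizes, siblings = manifest_stats(WARNINGS)
print("gc interval: %d min" % (GC_INTERVAL_SECONDS // 60))
print("largest manifest: %.1f MiB, most siblings: %d"
      % (max(sizes) / 2**20, max(siblings)))
```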

-- 
Stanislav



some problems with storage statistic calculation

2014-08-11 Thread Stanislav Vlasov
Hello!

We have 8 nodes with riak+riak-cs and about 7 TB of data in the cluster.
Some time ago the riak process was killed by the OOM killer on several nodes
while a large (~100 GB) file was being written via s3cmd, because riak-cs
ate all the memory.
Since recovering we have been experiencing the following problems:

1) very slow storage statistics calculation (about two hours). Before
the OOM it finished in 40 minutes.

2) 400+ files in bucket touristerru, no storage statistics counted:
riak-cs/error.log:
2014-08-11 00:12:28.096 [error]
<0.5987.49>@riak_cs_storage:maybe_sum_bucket:74 failed to calculate
usage of bucket 'touristerru' of user 'OF6DQ0FRBTEVLKGY-X0
P'. Reason: {error,<<"{\"phase\":0,\"error\":\"[{vnode_proxy_timeout,

When we request statistics for this user, we see something like this:

{u'Access': u'not_requested',
 u'Storage': {u'Errors': [],
  u'Samples': [{u'EndTime': u'20140811T091836Z',
u'StartTime': u'20140811T091113Z',
u'touristerru':
u'{error,<<"{\\"phase\\":0,\\"error\\":\\"[{vnode_proxy_timeout,{228359630832953580969325755111919221821239459840,\'riak@192.168.0.8\'}}]\\",\\"input\\":\\"{<<48,111,58,185,253,24,64,48,197,1,20,36,130,111,222,189,75,202,107>>,<<\\"files/3/8/6/5/3/5/2/clones/870_527_fixedwidth.jpg\\">>}\\",\\"type\\":\\"result\\",\\"stack\\":\\"[{gen,do_call,4,[{file,\\"gen.erl\\"},{line,234}]},{riak_core_vnode_proxy,call,2,[{file,\\"src/riak_core_vnode_proxy.erl\\"},{line,109}]},{riak_pipe_vnode,queue_work_send,4,[{file,\\"src/riak_pipe_vnode.erl\\"},{line,333}]},{riak_pipe_vnode,queue_work_erracc,6,[{file,\\"src/riak_pipe_vnode.erl\\"},{line,281}]},{riak_kv_pipe_get,process,3,[{file,\\"src/riak_kv_pipe_get.erl\\"},{line,92}]},{riak_pipe_vnode_worker,process_input,3,[{file,\\"src/riak_pipe_vnode_worker.erl\\"},{line,445}]},{riak_pipe_vnode_worker,wait_for_input,...},...]\\"}">>}'}]}}

3) The calculation process crashes:
riak-cs/console.log
2014-08-11 09:22:51.580 [warning]
<0.24095.1>@riak_cs_storage_d:read_storage_schedule1:300 No storage
schedule defined. Calculation must be triggered manually.
2014-08-11 09:22:51.580 [error] <0.438.0> gen_fsm riak_cs_storage_d in
state calculating terminated with reason: no match of right hand value
false in riak_cs_storage:sum_bucket/1 line 104
2014-08-11 09:22:51.580 [error] <0.438.0> CRASH REPORT Process
riak_cs_storage_d with 1 neighbours exited with reason: no match of
right hand value false in riak_cs_storage:sum_bucket/1 line 104 in
gen_fsm:terminate/7 line 611
2014-08-11 09:22:51.581 [error] <0.153.0> Supervisor riak_cs_sup had
child riak_cs_storage_d started with riak_cs_storage_d:start_link() at
<0.438.0> exit with reason no match of right hand value false in
riak_cs_storage:sum_bucket/1 line 104 in context child_terminated
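
The statistics response in item 2 carries the failure as a string stored under the bucket name, so failed buckets can be listed by scanning each storage sample. This is a minimal sketch, assuming the response has already been decoded into a Python dict like the one shown above; the error text in the sample is shortened, and the function name is made up for this example.

```python
# Shape copied from the response in item 2; the error text is shortened here.
USAGE = {
    "Access": "not_requested",
    "Storage": {
        "Errors": [],
        "Samples": [{
            "EndTime": "20140811T091836Z",
            "StartTime": "20140811T091113Z",
            "touristerru": '{error,<<"{\\"phase\\":0,\\"error\\":'
                           '\\"[{vnode_proxy_timeout, (shortened)',
        }],
    },
}

TIME_KEYS = ("StartTime", "EndTime")


def failed_buckets(usage):
    """Map bucket name to error text for buckets whose sample is an error."""
    failures = {}
    for sample in usage.get("Storage", {}).get("Samples", []):
        for key, value in sample.items():
            if key in TIME_KEYS:
                continue
            # A failed bucket holds an "{error,..." string instead of counters.
            if isinstance(value, str) and value.startswith("{error"):
                failures[key] = value
    return failures


for bucket, error in failed_buckets(USAGE).items():
    print(bucket, "->", error[:60])
```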

What has been done:

System:
1) RAM upgrade from 30 to 61Gb on every node.
2) add some swap on additional ssd (only to avoid OOM, sysctl
vm.swappiness=0 is set)

Riak configs:
1) increase cache_size in backend config
2) set {mapred_reduce_phase_batch_size, 5000}
3) set {mapred_always_prereduce, true}

Riak-CS configs:
1) set {storage_archive_period, 14400}
2) upgrade to 1.5.0 from 1.4.8

Configs: http://ovh.to/iwTiMby
Last logs: http://ovh.to/AHaASw

I don't know what to do next.

-- 
Stanislav



Re: very slow write speed to riak-cs

2014-04-03 Thread Stanislav Vlasov
2014-04-03 22:54 GMT+06:00 Luke Bakken :

> Before you go down the path of changing proxies, could you provide logs from
> one instance of your proxy server? They may provide more insight into what's
> going on here. In addition, the config and logs from one Riak CS node would
> be helpful - the command I gave earlier didn't have the final arguments:
>
> tar -czf /tmp/riak-cs-$(hostname).tgz /etc/riak-cs /var/log/riak-cs

Archive is here:
http://ovh.to/qZEjGRt
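
For anyone who wants to script the collection step, the tar command quoted above can be approximated with Python's standard tarfile module. This is a minimal sketch; bundle_dirs is a name made up for this example, and directories that do not exist on the host are simply skipped.

```python
import os
import tarfile


def bundle_dirs(out_path, dirs):
    """Write a gzip-compressed tar of each existing directory in dirs.

    Returns the list of directories that were actually added.
    """
    added = []
    with tarfile.open(out_path, "w:gz") as archive:
        for path in dirs:
            if os.path.isdir(path):
                archive.add(path)  # adds the directory recursively
                added.append(path)
    return added
```

Calling bundle_dirs("/tmp/riak-cs-node.tgz", ["/etc/riak-cs", "/var/log/riak-cs"]) would collect the same two directories as the command above; the output file name here is only an example.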


-- 
Stanislav



Re: very slow write speed to riak-cs

2014-04-02 Thread Stanislav Vlasov
2014-04-02 19:38 GMT+06:00 Luke Bakken :

> In your Riak /etc/riak/app.config files, please use the following value:
>
> {pb_backlog, 256},

I even tried {pb_backlog, 512}: no change.

> After changing this, you will have to restart Riak in a rolling fashion.

> Could you please run riak-debug on one node in your cluster and make the
> generated archive available? (dropbox, for instance).

# riak-debug
./usr/sbin/riak-debug:
538: [: =: argument expected
Unable to locate be_default LevelDB data directory. Aborting.
Using riak_kv_eleveldb_backend data_root:

Generated data in archive: http://ovh.to/QmPhSAy

> Also, could you run
> tar -czf /tmp/riak-cs-$(hostname).tgz and make the archive available?

What data do you need in the archive, if not the debug info?

> Thanks
> --
> Luke Bakken
> CSE
> lbak...@basho.com
>
>
> On Tue, Apr 1, 2014 at 9:32 PM, Stanislav Vlasov 
> wrote:
>>
>> Hello!
>>
>> I have 8x cluster of riak+riak-cs on debian. Config templates attached
>> Versions:
>> ii  riak     1.4.8-1  amd64  Riak is a distributed data store
>> ii  riak-cs  1.4.5-1  amd64  Riak CS
>>
>> Every riak-cs connect to local node. Between clients and riak-cs exist
>> frontend (Tengine version: Tengine/1.5.1 (nginx/1.2.9)), config
>> attached
>> Clients - s3cmd + some numbers of php (read-only)
>>
>> When 1-3 clients wants write to riak-cs, write speed is near 3-4MB/sec.
>> If 30-40 clients wants write, write speed slow down to lower than
>> 100kB/sec.
>>
>> In riak-cs crash.log:
>>
>> 2014-04-02 03:52:11 =ERROR REPORT
>> webmachine error:
>> path="/buckets/test/objects/win.img/uploads/PuqEyz0BRCCk6rDxtH7tRQ=="
>>
>> {error,{error,{badmatch,{error,closed}},[{webmachine_request,recv_unchunked_body,3,[{file,"src/webmachine_request.erl"},{line,474}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,193}]},{wrq,stream_req_body,2,[{file,"src/wrq.erl"},{line,121}]},{riak_cs_wm_object_upload_part,accept_body,2,[{file,"src/riak_cs_wm_object_upload_part.erl"},{line,235}]},{riak_cs_wm_common,accept_body,2,[{file,"src/riak_cs_wm_common.erl"},{line,337}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]}}
>>
>> [{webmachine_request,recv_unchunked_body,3,[{file,"src/webmachine_request.erl"},{line,474}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,193}]},{wrq,stream_req_body,2,[{file,"src/wrq.erl"},{line,121}]},{riak_cs_wm_object_upload_part,accept_body,2,[{file,"src/riak_cs_wm_object_upload_part.erl"},{line,235}]},{riak_cs_wm_common,accept_body,2,[{file,"src/riak_cs_wm_common.erl"},{line,337}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]
>>
>> After this event s3cmd makes throttling to slower speed:
>>
>> $ s3cmd put win.img s3://test/
>> win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
>>    184320 of 15728640     1% in    0s     2.16 MB/s  failed
>> WARNING: Upload failed:
>> /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
>> Connection reset by peer)
>> WARNING: Retrying on lower speed (throttle=0.00)
>> WARNING: Waiting 3 sec...
>> win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
>>  13799424 of 15728640    87% in    2s     5.18 MB/s  failed
>> WARNING: Upload failed:
>> /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
>> Connection reset by peer)
>> WARNING: Retrying on lower speed (throttle=0.01)
>> WARNING: Waiting 6 sec...
>> win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
>>    167936 of 15728640     1% in    0s   249.46 kB/s  failed
>> WARNING: Upload failed:
>> /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
>> Connection reset by peer)
>> WARNING: Retrying on lower speed (throttle=0.05)
>> WARNING: Waiting 9 sec...
>> win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
>>   6225920 of 15728640    39% in   76s    79.51 kB/s  failed
>> WARNING: Upload failed:
>> /win.img?partNumber

Re: very slow write speed to riak-cs

2014-04-02 Thread Stanislav Vlasov
2014-04-03 1:36 GMT+06:00 Seth Thomas :

> Could you also include your riak app.config and vm.args. It seems like
> you're load balancing Riak CS but I'm curious how the underlying Riak
> topology looks as well since that will likely be where the performance
> bottlenecks are uncovered.

Config templates attached.

> On Wed, Apr 2, 2014 at 6:38 AM, Luke Bakken  wrote:
>>
>> Hi Stanislav,
>>
>> In your Riak /etc/riak/app.config files, please use the following value:
>>
>> {pb_backlog, 256},
>>
>> After changing this, you will have to restart Riak in a rolling fashion.
>>
>> Could you please run riak-debug on one node in your cluster and make the
>> generated archive available? (dropbox, for instance). Also, could you run
>> tar -czf /tmp/riak-cs-$(hostname).tgz and make the archive available?
>>
>> Thanks
>> --
>> Luke Bakken
>> CSE
>> lbak...@basho.com
>>
>>
>> On Tue, Apr 1, 2014 at 9:32 PM, Stanislav Vlasov 
>> wrote:
>>>
>>> Hello!
>>>
>>> I have 8x cluster of riak+riak-cs on debian. Config templates attached
>>> Versions:
>>> ii  riak     1.4.8-1  amd64  Riak is a distributed data store
>>> ii  riak-cs  1.4.5-1  amd64  Riak CS
>>>
>>> Every riak-cs connect to local node. Between clients and riak-cs exist
>>> frontend (Tengine version: Tengine/1.5.1 (nginx/1.2.9)), config
>>> attached
>>> Clients - s3cmd + some numbers of php (read-only)
>>>
>>> When 1-3 clients wants write to riak-cs, write speed is near 3-4MB/sec.
>>> If 30-40 clients wants write, write speed slow down to lower than
>>> 100kB/sec.
>>>
>>> In riak-cs crash.log:
>>>
>>> 2014-04-02 03:52:11 =ERROR REPORT
>>> webmachine error:
>>> path="/buckets/test/objects/win.img/uploads/PuqEyz0BRCCk6rDxtH7tRQ=="
>>>
>>> {error,{error,{badmatch,{error,closed}},[{webmachine_request,recv_unchunked_body,3,[{file,"src/webmachine_request.erl"},{line,474}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,193}]},{wrq,stream_req_body,2,[{file,"src/wrq.erl"},{line,121}]},{riak_cs_wm_object_upload_part,accept_body,2,[{file,"src/riak_cs_wm_object_upload_part.erl"},{line,235}]},{riak_cs_wm_common,accept_body,2,[{file,"src/riak_cs_wm_common.erl"},{line,337}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]}}
>>>
>>> [{webmachine_request,recv_unchunked_body,3,[{file,"src/webmachine_request.erl"},{line,474}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,193}]},{wrq,stream_req_body,2,[{file,"src/wrq.erl"},{line,121}]},{riak_cs_wm_object_upload_part,accept_body,2,[{file,"src/riak_cs_wm_object_upload_part.erl"},{line,235}]},{riak_cs_wm_common,accept_body,2,[{file,"src/riak_cs_wm_common.erl"},{line,337}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]
>>>
>>> After this error, s3cmd throttles itself to a lower speed:
>>>
>>> $ s3cmd put win.img s3://test/
>>> win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
>>>    184320 of 15728640     1% in    0s     2.16 MB/s  failed
>>> WARNING: Upload failed:
>>> /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
>>> Connection reset by peer)
>>> WARNING: Retrying on lower speed (throttle=0.00)
>>> WARNING: Waiting 3 sec...
>>> win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
>>>  13799424 of 15728640    87% in    2s     5.18 MB/s  failed
>>> WARNING: Upload failed:
>>> /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
>>> Connection reset by peer)
>>> WARNING: Retrying on lower speed (throttle=0.01)
>>> WARNING: Waiting 6 sec...
>>> win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
>>>    167936 of 15728640     1% in    0s   249.46 kB/s  failed
>>> WARNING: Upload failed:
>>> /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
>>> Connec

very slow write speed to riak-cs

2014-04-01 Thread Stanislav Vlasov
Hello!

I have an 8-node cluster of riak+riak-cs on Debian. Config templates are attached.
Versions:
ii  riak     1.4.8-1  amd64  Riak is a distributed data store
ii  riak-cs  1.4.5-1  amd64  Riak CS

Each riak-cs instance connects to its local Riak node. Between the
clients and riak-cs there is a frontend (Tengine version:
Tengine/1.5.1 (nginx/1.2.9)); config attached.
Clients: s3cmd plus a number of PHP clients (read-only).

When 1-3 clients write to riak-cs, write speed is about 3-4 MB/sec.
When 30-40 clients write, write speed drops below 100 kB/sec.

In riak-cs crash.log:

2014-04-02 03:52:11 =ERROR REPORT
webmachine error:
path="/buckets/test/objects/win.img/uploads/PuqEyz0BRCCk6rDxtH7tRQ=="
{error,{error,{badmatch,{error,closed}},[{webmachine_request,recv_unchunked_body,3,[{file,"src/webmachine_request.erl"},{line,474}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,193}]},{wrq,stream_req_body,2,[{file,"src/wrq.erl"},{line,121}]},{riak_cs_wm_object_upload_part,accept_body,2,[{file,"src/riak_cs_wm_object_upload_part.erl"},{line,235}]},{riak_cs_wm_common,accept_body,2,[{file,"src/riak_cs_wm_common.erl"},{line,337}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]}}
[{webmachine_request,recv_unchunked_body,3,[{file,"src/webmachine_request.erl"},{line,474}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,193}]},{wrq,stream_req_body,2,[{file,"src/wrq.erl"},{line,121}]},{riak_cs_wm_object_upload_part,accept_body,2,[{file,"src/riak_cs_wm_object_upload_part.erl"},{line,235}]},{riak_cs_wm_common,accept_body,2,[{file,"src/riak_cs_wm_common.erl"},{line,337}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]

After this error, s3cmd throttles itself to a lower speed:

$ s3cmd put win.img s3://test/
win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
   184320 of 15728640     1% in    0s     2.16 MB/s  failed
WARNING: Upload failed:
/win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
Connection reset by peer)
WARNING: Retrying on lower speed (throttle=0.00)
WARNING: Waiting 3 sec...
win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
 13799424 of 15728640    87% in    2s     5.18 MB/s  failed
WARNING: Upload failed:
/win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
Connection reset by peer)
WARNING: Retrying on lower speed (throttle=0.01)
WARNING: Waiting 6 sec...
win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
   167936 of 15728640     1% in    0s   249.46 kB/s  failed
WARNING: Upload failed:
/win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
Connection reset by peer)
WARNING: Retrying on lower speed (throttle=0.05)
WARNING: Waiting 9 sec...
win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
  6225920 of 15728640    39% in   76s    79.51 kB/s  failed
WARNING: Upload failed:
/win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
Connection reset by peer)
WARNING: Retrying on lower speed (throttle=0.25)
WARNING: Waiting 12 sec...
win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
 15728640 of 15728640   100% in  962s    15.96 kB/s  done

I think that even on a 1 Gbit network between nodes the write speed
should be higher, but I don't understand where the bottleneck is.
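
One rough way to look for the bottleneck in the s3cmd output above is to
check whether the reported rates are just the client's own throttle. The
sketch below assumes (this is not stated anywhere in the thread) that s3cmd
sends the body in 4096-byte chunks and sleeps for the `throttle` value after
each chunk; `CHUNK_BYTES` and `throttle_ceiling` are illustrative names.
Under that assumption the throttle alone caps the rate at about 80 kB/s for
throttle=0.05 and 16 kB/s for throttle=0.25, close to the 79.51 kB/s and
15.96 kB/s that s3cmd printed.

```python
# ASSUMPTION (not from the thread): the client sends fixed-size chunks and
# sleeps `throttle` seconds after each one. 4096 bytes is an assumed chunk size.
CHUNK_BYTES = 4096


def throttle_ceiling(throttle_s, chunk_bytes=CHUNK_BYTES):
    """Upper bound on upload rate (bytes/s) imposed by the per-chunk sleep."""
    return chunk_bytes / throttle_s


# (throttle in seconds, rate reported by s3cmd in kB/s), copied from the output above
observed = [(0.05, 79.51), (0.25, 15.96)]

for throttle, reported_kb in observed:
    ceiling_kb = throttle_ceiling(throttle) / 1024
    print(f"throttle={throttle:.2f}s  ceiling={ceiling_kb:.2f} kB/s  "
          f"reported={reported_kb:.2f} kB/s")
```

If the assumption holds, the slow final rates reflect the client backing off
after each connection reset rather than the raw write speed of the cluster,
so the resets themselves are the thing to investigate.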

-- 
Stanislav


app.config.template
Description: Binary data


vm.args.template
Description: Binary data


riak-cs-nginx
Description: Binary data