Choose/Wants claim functions.

2014-04-10 Thread Guido Medina

Hi,

What's the latest non-standard version of this function? v3, right? If 
Basho adds more versions to this, is it documented somewhere?


For our nodes the standard choose/wants claim functions were producing a weird 
distribution; the numbers even out a bit better (just a bit better) by 
using v3, so it would be nice to know whether improvements are being made in 
this area and where they are documented.


As of the latest 
http://docs.basho.com/riak/latest/ops/advanced/configs/configuration-files/ 
both parameters show no default, although my understanding is that the 
default for both is v2.


Regards,

Guido.

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: RIAK 1.4.6 - Mass key deletion

2014-04-10 Thread Edgar Veiga
Hi Matthew!

I have the possibility of moving the anti-entropy directory data to a
7200 RPM mechanical disk that exists in each of the machines. I was thinking of
changing the anti_entropy data dir setting in the app.config file and restarting
the riak process.

Is there any problem with using a mechanical disk to store the anti-entropy data?

Best regards!


On 8 April 2014 23:58, Edgar Veiga  wrote:

> I'll wait a few more days, see if the AAE maybe "stabilises" and only
> after that make a decision regarding this.
> The cluster expanding was on the roadmap, but not right now :)
>
> I've attached a few screenshots; you can clearly observe the evolution of
> one of the machines after the anti-entropy data removal and consequent
> restart (5th of April).
>
> https://cloudup.com/cB0a15lCMeS
>
> Best regards!
>
>
> On 8 April 2014 23:44, Matthew Von-Maszewski  wrote:
>
>> No.  I do not see a problem with your plan.  But ...
>>
>> I would prefer to see you add servers to your cluster.  Scalability is one
>> of Riak's fundamental characteristics.  As your database needs grow, we
>> grow with you ... just add another server and migrate some of the vnodes
>> there.
>>
>> I obviously cannot speak to your budgetary constraints.  All of the
>> engineers at Basho, I am just one, are focused upon providing you
>> performance and features along with your scalability needs.  This seems to
>> be a situation where you might be sacrificing data integrity where another
>> server or two would address the situation.
>>
>> And if 2.0 makes things better ... sell the extra servers on eBay.
>>
>> Matthew
>>
>>
>> On Apr 8, 2014, at 6:31 PM, Edgar Veiga  wrote:
>>
>> Thanks Matthew!
>>
>> Today this situation has become unsustainable. In two of the machines I
>> have an anti-entropy dir of 250G... It just keeps growing and growing and
>> I'm almost reaching the max size of the disks.
>>
>> Maybe I'll just turn off aae in the cluster, remove all the data in the
>> anti-entropy directory and wait for the v2 of riak. Do you see any problem
>> with this?
>>
>> Best regards!
>>
>>
>> On 8 April 2014 22:11, Matthew Von-Maszewski  wrote:
>>
>>> Edgar,
>>>
>>> Today we disclosed a new feature for Riak's leveldb, Tiered Storage.
>>>  The details are here:
>>>
>>> https://github.com/basho/leveldb/wiki/mv-tiered-options
>>>
>>> This feature might give you another option in managing your storage
>>> volume.
>>>
>>>
>>> Matthew
>>>
>>> On Apr 8, 2014, at 11:07 AM, Edgar Veiga  wrote:
>>>
>>> It makes sense; I do a lot, and I really mean a LOT, of updates per key,
>>> maybe thousands a day! The cluster is experiencing far more updates per
>>> key than new keys being inserted.
>>>
>>> The hash trees will rebuild during the next weekend (normally it takes
>>> about two days to complete the operation) so I'll come back and give you
>>> some feedback (hopefully good) on the next Monday!
>>>
>>> Again, thanks a lot, You've been very helpful.
>>> Edgar
>>>
>>>
>>> On 8 April 2014 15:47, Matthew Von-Maszewski  wrote:
>>>
 Edgar,

 The test I have running currently has reached 1 billion keys.  It is
 running against a single node with N=1.  It has 42G of AAE data.  Here is
 my extrapolation to compare with your numbers:

 You have ~2.5 billion keys.  I assume you are running N=3 (the
 default).  AAE is therefore actually tracking ~7.5 billion keys.  You have
 six nodes, so roughly ~1.25 billion keys are tracked per node.

 Raw math would suggest that my 42G of AAE data for 1 billion keys would
 extrapolate to 52.5G of AAE data for you.  Yet you have ~120G of AAE data.
 Is something wrong?  No.  My data is still loading and has experienced zero
 key/value updates/edits.

 AAE hashes get rewritten every time a user updates the value of a key.
 AAE's leveldb is just like the user leveldb: all prior values of a key
 accumulate in the .sst table files until compaction removes the duplicates.
 Similarly, a user delete of a key causes a delete tombstone in the AAE
 hash tree.  Those delete tombstones also have to await compaction before
 leveldb recovers the disk space.

 AAE's hash trees rebuild weekly.  I am told that the rebuild operation
 will actually destroy the existing files and start over.  That is when you
 should see AAE space usage dropping dramatically.

 Matthew


 On Apr 8, 2014, at 9:31 AM, Edgar Veiga  wrote:

 Thanks a lot Matthew!

 A little bit more info: I've gathered a sample of the contents of the
 anti-entropy data on one of my machines:
 - 44 folders with names equal to the names of the folders in the level-db
 dir (i.e. 393920363186844927172086927568060657641638068224/)
 - each folder has 5 files (log, current, etc.) and 5 sst_*
 folders.
 - The biggest sst folder is sst_3 with 4.3G
 - Inside the sst_3 folder there are 1219 files named 00*.sst.
 - Each of the 00*.sst files is ~3.7M


Re: RIAK 1.4.6 - Mass key deletion

2014-04-10 Thread Matthew Von-Maszewski
Yes, you can send the AAE (active anti-entropy) data to a different disk.  

AAE calculates a hash each time you PUT new data to the regular database.  AAE 
then buffers around 1,000 hashes (I forget the exact value) to write as a block 
to the AAE database.  The AAE write is NOT in series with the user database 
writes.  Your throughput should not be impacted.  But this is not something I 
have personally measured/validated.

Matthew



Re: Rebuilding AAE hashes - small question

2014-04-10 Thread Engel Sanchez
Hey there. There are a couple of things to keep in mind when deleting
invalid AAE trees from the 1.4.3-1.4.7 series after upgrading to 1.4.8:

* If AAE is disabled, you don't have to stop the node to delete the data in
the anti_entropy directories
* If AAE is enabled, deleting the AAE data in a rolling manner may trigger
an avalanche of read repairs between nodes with the bad trees and nodes
with good trees as the data seems to diverge.

If your nodes are already up, with AAE enabled and with old incorrect trees
in the mix, there is a better way.  You can dynamically disable AAE with
some console commands. At that point, without stopping the nodes, you can
delete all AAE data across the cluster.  At a convenient time, re-enable
AAE.  I say convenient because all trees will start to rebuild, and that
can be problematic in an overloaded cluster.  Doing this over the weekend
might be a good idea unless your cluster can take the extra load.

To dynamically disable AAE from the Riak console, you can run this command:

> riak_core_util:rpc_every_member_ann(riak_kv_entropy_manager, disable, [], 6).

and re-enable it with the similar command:

> riak_core_util:rpc_every_member_ann(riak_kv_entropy_manager, enable, [], 6).

That last number is just a timeout for the RPC operation.  I hope this
saves you some extra load on your clusters.
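
For completeness, the static counterpart that survives restarts is the
anti_entropy setting in the riak_kv section of app.config. A minimal sketch,
assuming the 1.4-series {on | off, Opts} format of that setting:

  {riak_kv, [
      %% ... existing riak_kv settings ...
      %% keep active anti-entropy switched off until you are ready
      %% to let the trees rebuild
      {anti_entropy, {off, []}}
  ]},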


On Wed, Apr 9, 2014 at 11:02 AM, Luke Bakken  wrote:

> Hi Guido,
>
> I specifically meant riak-admin transfers; however, using riak-admin
> wait-for-service riak_kv riak@node is a good first step before waiting
> for transfers.
>
> Thanks!
>
> --
> Luke Bakken
> CSE
> lbak...@basho.com
>
>
> On Wed, Apr 9, 2014 at 7:54 AM, Guido Medina wrote:
>
>>  What do you mean by "wait for handoff" to finish?
>>
>> Are you referring to wait for the service to be fully started? i.e.
>> "riak-admin wait-for-service riak_kv riak@node"
>>
>> Or do you mean to check for "riak-admin transfers" on the started node
>> and wait until those handoffs/transfers are gone?
>>
>> Guido.
>>
>>
>>
>> On 09/04/14 15:46, Luke Bakken wrote:
>>
>> Hi Guido,
>>
>>  That is the correct process. Be sure to use the rolling restart
>> procedure when restarting nodes (i.e. wait for handoff to finish before
>> moving on).
>>
>> --
>> Luke Bakken
>> CSE
>> lbak...@basho.com
>>
>>
>> On Wed, Apr 9, 2014 at 6:34 AM, Guido Medina wrote:
>>
>>>  Hi,
>>>
>>> If nodes are already upgraded to 1.4.8 (and they went all the way from
>>> 1.4.0 to 1.4.8 including AAE buggy versions)
>>>
>>> Will the following command (as root) on Ubuntu Servers 12.04:
>>>
>>> riak stop; rm -Rf /var/lib/riak/anti_entropy/*; riak start
>>>
>>> executed on each node be enough to rebuild AAE hashes?
>>>
>>> Regards,
>>>
>>> Guido.
>>>
>>> ___
>>> riak-users mailing list
>>> riak-users@lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>>
>>
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: RIAK 1.4.6 - Mass key deletion

2014-04-10 Thread Edgar Veiga
Thanks, I'll start the process and give you guys some feedback in the
meantime.

The plan is:
1 - Disable AAE in the cluster via riak attach:

a.
rpc:multicall(riak_kv_entropy_manager, disable, []).
rpc:multicall(riak_kv_entropy_manager, cancel_exchanges, []).
z.

2 - Update the app.config, changing the AAE data dir to the mechanical disk
(see the sketch below);

3 - Restart the riak process on each machine, one by one;

4 - Remove the old AAE data;
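
For reference, a minimal app.config sketch of step 2, assuming the 1.4-series
riak_kv setting name anti_entropy_data_dir and a purely hypothetical mount
point /mnt/spindle/anti_entropy:

  {riak_kv, [
      %% ... existing riak_kv settings ...
      %% point the AAE hash trees at the mechanical disk
      {anti_entropy_data_dir, "/mnt/spindle/anti_entropy"}
  ]},

The new directory must exist and be writable by the riak user before the
restart in step 3.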

By the way, I've seen different ways of disabling AAE via riak attach
here on the list... The one above is the most complete. What do the
a. and z. stand for? I've been disabling AAE by just running
"rpc:multicall(riak_kv_entropy_manager, disable, []).", is there any
difference if we ignore the a., z. and the cancel_exchanges?

Best regards!



Re: Choose/Wants claim functions.

2014-04-10 Thread Alex Moore
Hi Guido,

> What's the latest non-standard version of this function? v3, right? If Basho 
> adds more versions to this, is it documented somewhere?

> For our nodes the standard choose/wants claim functions were producing a weird 
> distribution; the numbers even out a bit better (just a bit better) by 
> using v3, so it would be nice to know whether improvements are being made in this 
> area and where they are documented.

v3 would be the “latest non-standard” version of this function. It works better 
than v2 for balancing nodes but it has a performance caveat with larger ring 
sizes, which is why we still default to v2.  I will address the documentation 
issue of this, but for now the source code is the best documentation (see below 
for links).

> As of the latest 
> http://docs.basho.com/riak/latest/ops/advanced/configs/configuration-files/ 
> both parameters show no default, although my understanding is that the default 
> for both is v2.

So in the typical case, the default will be v2, via the `default_wants_claim` 
and `default_choose_claim` functions in  `riak_core_claim.erl`.  If you’re 
running a legacy ring, it will default to v1 instead.
https://github.com/basho/riak_core/blob/1.4.4/src/riak_core_claim.erl#L119-L125
https://github.com/basho/riak_core/blob/1.4.4/src/riak_core_claim.erl#L140-L146
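
For anyone who wants to opt into v3 explicitly, here is a minimal app.config
sketch, assuming the riak_core setting names wants_claim_fun and
choose_claim_fun that riak_core_claim reads:

  {riak_core, [
      %% ... existing riak_core settings ...
      %% override the default (v2) claim functions with v3
      {wants_claim_fun,  {riak_core_claim, wants_claim_v3}},
      {choose_claim_fun, {riak_core_claim, choose_claim_v3}}
  ]},

Treat this as a sketch rather than a recommendation; as noted above, v3 carries
a performance caveat on larger ring sizes.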

I’ve put in a docs issue to get the documentation clarified.
https://github.com/basho/basho_docs/issues/1017

Thanks,
Alex

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Rebuilding AAE hashes - small question

2014-04-10 Thread Guido Medina

Thanks Engel,

That approach looks very accurate; I would only suggest adding a 
riak-admin cluster stop-aae command, and a similar one for start, for the dummies ;-)


Guido.




___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: very slow write speed to riak-cs

2014-04-10 Thread Luke Bakken
For the mailing list's reference, this issue has been resolved by the following:

* Increase pb_backlog to 256 in the Riak app.config on all nodes
* Increase +zdbbl to 96000 in the Riak vm.args on all nodes
* Switch proxies from tengine (patched nginx) to HAProxy
* Reduce ring size from 256 to 128, since there are 8 nodes (with a
plan to expand later, but not beyond 16 nodes). Node memory increased
to 32GB from 14GB per node
* Tune leveldb memory usage according to this spreadsheet:
https://github.com/basho/basho_docs/raw/master/source/data/leveldb_sizing_1.4.xls
* Increase net.core.somaxconn to 4 (all other Linux tunings were
present from http://docs.basho.com/riak/latest/ops/tuning/linux/)
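
For readers applying the first two items, a minimal sketch of the corresponding
config fragment follows, assuming the 1.4-series layout where the protocol
buffers settings live under the riak_api section of app.config (the +zdbbl
change is a separate line in vm.args, i.e. "+zdbbl 96000"):

  {riak_api, [
      %% ... existing riak_api settings (pb listeners, etc.) ...
      %% deeper TCP listen backlog for protocol buffers connections
      {pb_backlog, 256}
  ]},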

--
Luke Bakken
CSE
lbak...@basho.com


On Tue, Apr 1, 2014 at 9:32 PM, Stanislav Vlasov
 wrote:
>
> Hello!
>
> I have an 8-node cluster of riak+riak-cs on Debian. Config templates attached.
> Versions:
> ii  riak      1.4.8-1   amd64   Riak is a distributed data store
> ii  riak-cs   1.4.5-1   amd64   Riak CS
>
> Every riak-cs connects to its local node. Between the clients and riak-cs there is a
> frontend (Tengine version: Tengine/1.5.1 (nginx/1.2.9)); config
> attached.
> Clients: s3cmd + some PHP clients (read-only)
>
> When 1-3 clients want to write to riak-cs, the write speed is around 3-4 MB/sec.
> If 30-40 clients want to write, the write speed drops to below 100 kB/sec.
>
> In riak-cs crash.log:
>
> 2014-04-02 03:52:11 =ERROR REPORT
> webmachine error:
> path="/buckets/test/objects/win.img/uploads/PuqEyz0BRCCk6rDxtH7tRQ=="
> {error,{error,{badmatch,{error,closed}},[{webmachine_request,recv_unchunked_body,3,[{file,"src/webmachine_request.erl"},{line,474}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,193}]},{wrq,stream_req_body,2,[{file,"src/wrq.erl"},{line,121}]},{riak_cs_wm_object_upload_part,accept_body,2,[{file,"src/riak_cs_wm_object_upload_part.erl"},{line,235}]},{riak_cs_wm_common,accept_body,2,[{file,"src/riak_cs_wm_common.erl"},{line,337}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]}}
> [{webmachine_request,recv_unchunked_body,3,[{file,"src/webmachine_request.erl"},{line,474}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,193}]},{wrq,stream_req_body,2,[{file,"src/wrq.erl"},{line,121}]},{riak_cs_wm_object_upload_part,accept_body,2,[{file,"src/riak_cs_wm_object_upload_part.erl"},{line,235}]},{riak_cs_wm_common,accept_body,2,[{file,"src/riak_cs_wm_common.erl"},{line,337}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]
>
> After this event s3cmd throttles down to a slower speed:
>
> $ s3cmd put win.img s3://test/
> win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
>    184320 of 15728640     1% in    0s     2.16 MB/s  failed
> WARNING: Upload failed:
> /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
> Connection reset by peer)
> WARNING: Retrying on lower speed (throttle=0.00)
> WARNING: Waiting 3 sec...
> win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
>  13799424 of 15728640    87% in    2s     5.18 MB/s  failed
> WARNING: Upload failed:
> /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
> Connection reset by peer)
> WARNING: Retrying on lower speed (throttle=0.01)
> WARNING: Waiting 6 sec...
> win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
>    167936 of 15728640     1% in    0s   249.46 kB/s  failed
> WARNING: Upload failed:
> /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
> Connection reset by peer)
> WARNING: Retrying on lower speed (throttle=0.05)
> WARNING: Waiting 9 sec...
> win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
>   6225920 of 15728640    39% in   76s    79.51 kB/s  failed
> WARNING: Upload failed:
> /win.img?partNumber=1&uploadId=PuqEyz0BRCCk6rDxtH7tRQ== ([Errno 104]
> Connection reset by peer)
> WARNING: Retrying on lower speed (throttle=0.25)
> WARNING: Waiting 12 sec...
> win.img -> s3://test/win.img  [part 1 of 1366, 15MB]
>  15728640 of 15728640   100% in  962s    15.96 kB/s  done
>
> I think that even on a 1 Gbit network between the nodes the write speed should be
> higher, but I don't understand where the bottleneck is.
>
> --
> Stanislav
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Choose/Wants claim functions.

2014-04-10 Thread Guido Medina
What would be a high ring size that would degrade performance for v3: 
128+? 256+?


I should have asked by replying to the original response, but I deleted it by accident.

Guido.




___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


power outage

2014-04-10 Thread Gerlach, Sebastian
Dear mailing list,

Our production Riak cluster went down due to a power outage. We restarted Riak and
found a lot of the following messages:

[error] <0.22012.0> Hintfile
'/var/lib/riak/bitcask/174124218510127105489110888272838406638695088128/142
.bitcask.hint' has bad CRC 28991324 expected 0
[error] <0.5063.0> Bad datafile entry, discarding(3450/46927 bytes)
'/var/lib/riak/bitcask/302576510853663494784356625523292968913142284288/140
.bitcask.hint' contains pointer 949927320 13881 that is greater than total
data size 949927936


Our cluster contains 12 servers.
We run version 1.2.0-1 on Debian 6.

We are not sure whether the errors are fixed by Riak itself. We found an old
mailing list entry with the hint to run
"riak_kv:repair(22835963083295358096932575511191922182123945984)."

We also checked the documentation and found:
http://docs.basho.com/riak/1.2.0/cookbooks/Repairing-KV-Indexes/
"[riak_kv_vnode:repair(P) || P <- Partitions]."

Neither hint works on our system.

Does anyone have some hints for me?

Thanks a lot

Sebastian


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: power outage

2014-04-10 Thread Brian Sparrow
Sebastian,

Those errors are normal following an outage of this sort. Hintfiles will be
regenerated during the next bitcask merge and any data file that is
incomplete or has invalid entries will be truncated. This does result in a
loss of replicas but as long as at least 1 replica of the data is not lost
the replicas can be repaired by manual repair or read-repair. The manual
repair procedure is outlined here:
http://docs.basho.com/riak/1.2.0/cookbooks/Repairing-KV-Indexes/ .

I recommend that you manually repair any partition that discarded data file
entries but other than that your cluster should be working fine.
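
As a rough sketch of what that documented procedure boils down to when run from
riak attach on a node, assuming you want to repair every partition that node
owns (trim the list if you only care about the partitions that logged discarded
entries):

  {ok, Ring} = riak_core_ring_manager:get_my_ring().
  %% partitions owned by this node
  Partitions = [P || {P, Owner} <- riak_core_ring:all_owners(Ring), Owner =:= node()].
  %% kick off a KV repair for each of them
  [riak_kv_vnode:repair(P) || P <- Partitions].

Repairs add load, so a quiet period is the best time to run them.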

Thanks,
Brian




___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: power outage

2014-04-10 Thread Gerlach, Sebastian
Hi Brian,

Thanks a lot for the instant answer. I will try the steps from the 
documentation on a test cluster, and if everything works fine I will repair my 
production cluster.

Best regards
Sebastian

Sent from my iPad


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: power outage

2014-04-10 Thread Gerlach, Sebastian
Hi Brian, hi mailing list,

Is it possible to read all keys from the cluster and in this way trigger a read 
repair on the failed nodes?

Thanks and best regards
Sebastian



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com