Re: Riak transfer limit.

2014-01-28 Thread Jeppe Toustrup
On 27 January 2014 18:00, Guido Medina guido.med...@temetra.com wrote:

 What's a good value for the transfer limit when re-arranging, adding or
 removing nodes?
 Or is there a generic rule of thumb based on physical nodes, processors,
 etc.?

 Once the transfer is completed, is it good practice to set it back to its
 default value, or should the calculated (guessed?) transfer limit stay?


I have just removed a node from our Riak cluster. I turned the transfer
limit on the leaving node up high, and set the other machines in the
cluster to 1. That way the node leaving the cluster got rid of its
data as fast as possible, while the nodes serving clients only had one
transfer each, to make sure they weren't overloaded. It worked fine for me,
but it may depend on how much load you have on the cluster during the
data migration and how important response times are for your system.
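
In other words, something along these lines (the node names and the high
limit are placeholders, adjust them to your cluster):

  # node that is leaving the cluster: allow many concurrent handoffs
  riak-admin transfer-limit riak@leaving-node 16

  # every node still serving clients: only one handoff at a time
  riak-admin transfer-limit riak@serving-node-1 1
  riak-admin transfer-limit riak@serving-node-2 1

  # keep an eye on progress
  riak-admin transfers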

-- 
*Jeppe Toustrup*
Operations Engineer

*Falcon Social*
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Stalled handoffs on a prod cluster after server crash

2013-12-10 Thread Jeppe Toustrup
What does riak-admin transfers tell you? Are there any transfers in progress?
You can try to set the number of allowed transfers per host to 0 and
then back to 2 (the default), or whatever you want, in order to restart
any transfers which may be in progress. You can do that with the
riak-admin transfer-limit <number> command.
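
Something like this (a rough sketch, assuming you want the default limit of 2
afterwards):

  riak-admin transfers          # is anything moving at all?
  riak-admin transfer-limit 0   # pause all handoffs...
  riak-admin transfer-limit 2   # ...then allow them again, which should kick
                                # any stuck transfers back into action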

-- 
Jeppe Fihl Toustrup
Operations Engineer
Falcon Social

On 9 December 2013 15:48, Ivaylo Panitchkov ipanitch...@hibernum.com wrote:


 Hello,

 We have a prod cluster of four machines running Riak (1.1.4 2012-06-19) on
 Debian x86_64.
 Two days ago one of the servers went down because of a hardware failure.
 I force-removed the machine in question to re-balance the cluster before 
 adding the new machine.
 Since then the cluster is operating properly, but I noticed some handoffs are 
 stalled now.
 I had a similar situation a while ago that was solved by simply forcing the
 handoffs, but this time the same approach didn't work.
 Any ideas, solutions or just hints are greatly appreciated.



Re: Stalled handoffs on a prod cluster after server crash

2013-12-10 Thread Jeppe Toustrup
Try to take a look at this thread from November where I experienced a
similar problem:
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-November/014027.html

The following mails in the thread mention things you can try to correct
the problem, and what I ended up doing with the help of Basho
employees.

-- 
Jeppe Fihl Toustrup
Operations Engineer
Falcon Social

On 10 December 2013 22:03, Ivaylo Panitchkov ipanitch...@hibernum.com wrote:
 Hello,
 Below is the transfers info:

 ~# riak-admin transfers

 Attempting to restart script through sudo -u riak
 'r...@ccc.ccc.ccc.ccc' waiting to handoff 7 partitions
 'r...@bbb.bbb.bbb.bbb' waiting to handoff 7 partitions
 'r...@aaa.aaa.aaa.aaa' waiting to handoff 5 partitions


 ~# riak-admin member_status
 Attempting to restart script through sudo -u riak
 ================================ Membership =================================
 Status     Ring     Pending    Node
 -----------------------------------------------------------------------------
 valid      45.3%    34.4%      'r...@aaa.aaa.aaa.aaa'
 valid      26.6%    32.8%      'r...@bbb.bbb.bbb.bbb'
 valid      28.1%    32.8%      'r...@ccc.ccc.ccc.ccc'
 -----------------------------------------------------------------------------

 It's been stuck with all those handoffs for a few days now.
 riak-admin ring_status gives me the same info as the one I mentioned when I
 opened the case.
 I noticed AAA.AAA.AAA.AAA experiences more load than the other servers, as it's
 responsible for almost half of the data.
 Is it safe to add another machine to the cluster in order to relieve
 AAA.AAA.AAA.AAA even though the issue with handoffs is not yet resolved?

 Thanks,
 Ivaylo



Re: Ownership handoff never completes

2013-11-20 Thread Jeppe Toustrup
Hi

Thank you for the guide. I stopped two of the nodes (the source and
the destination of the partition transfers), renamed the folders
inside the merge_index folder and started them again. However, the
ownership handoff does not seem to be retried.
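
Roughly, what I did on each of the two nodes was the following (the partition
index is a placeholder for the two indexes in question):

  riak stop
  mv /var/lib/riak/merge_index/<partition_index> \
     /var/lib/riak/merge_index/<partition_index>.old
  riak start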

Looking at the logs, it seems the last attempt was 48 hours ago.
Is there any logic inside Riak which causes it to give up after a
certain number of tries?
Is there a way I can retrigger the handoffs?
I have tried to set the transfer-limit on the cluster to 0 and then
back to 2, but it doesn't seem to do anything.

I wonder if we need the merge_index folder at all, as we have disabled
Riak search since the initial configuration of the cluster. We found a
better way to query our data so that we don't need Riak search
anymore. We disabled it by resetting the properties on the buckets
where search was enabled, and then disabled search in app.config
followed by a restart of each of the nodes. This was done after the
ownership handoff issue first occurred.
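
For what it's worth, the per-bucket part of that was essentially the following
(the bucket name is just an example, and the exact properties may differ
between versions):

  # clear the precommit hooks (this is where the riak_search_kv_hook lives)
  curl -X PUT http://127.0.0.1:8098/buckets/example_bucket/props \
       -H "Content-Type: application/json" \
       -d '{"props": {"precommit": []}}'

  # app.config: set {enabled, false} in the riak_search section, then restart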

-- 
Jeppe Fihl Toustrup
Operations Engineer
Falcon Social


On 19 November 2013 23:17, Mark Phillips m...@basho.com wrote:
 Hi Jeppe,

 As you suspected, this looks like index corruption in Search that's
 preventing handoff from finishing. Specifically, you'll need to delete the
 segment files for the two partitions' indexes and rebuild those indexes
 post-transfer.

 Here's the full process:

 - Stop each node that owns the partitions in question.
 - Delete the data directory for each partition (which contains the segment
   files). It should be something like:

   rm -rf /var/lib/riak/merge_index/<partition>

 - Restart each node
 - Wait for the transfers to complete
 - Rebuild the indexes in question [1]

 Let us know if you run into any further issues.

 Mark

 [1] http://docs.basho.com/riak/latest/ops/running/recovery/repairing-indexes/



 On Tue, Nov 19, 2013 at 4:26 AM, Jeppe Toustrup je...@falconsocial.com
 wrote:

 Hi

 I have recently added two extra nodes to the now seven-node Riak
 cluster. The rebalancing following the expansion worked fine, except
 for two partitions which do not seem to be able to go through. Running
 riak-admin ring-status shows the following:

 ============================== Ownership Handoff ==============================
 Owner:  riak@10.0.0.96
 Next Owner: riak@10.0.0.93

 Index: 239777612374601260017792042867515182912301432832
   Waiting on: []
   Complete:   [riak_kv_vnode,riak_pipe_vnode]

 Index: 696496874040508421956443553091353626554780352512
   Waiting on: []
   Complete:   [riak_kv_vnode,riak_pipe_vnode]


 ---

 I can see from the log file on the source node (10.0.0.96) that it has
 made numerous attempts to transfer the partitions, but it ends up
 failing all the time. Here's an excerpt of the log file showing the
 lines from when the transfer attempt ends up failing:

 2013-11-18 12:29:03.694 [error] emulator Error in process <0.5745.8>
 on node 'riak@10.0.0.96' with exit value:
 {badarg,[{erlang,binary_to_term,[29942 bytes],[]},{mi_segment,iterate_all_bytes,2,[{file,src/mi_segment.erl},{line,167}]},{mi_server,'-group_iterator/2-fun-1-',2,[{file,src/mi_server.erl},{line,725}]},{mi_server,'-group_iterator/2-fun-0-'...
 2013-11-18 12:29:03.885 [error] <0.3269.0>@mi_server:handle_info:524
 lookup/range failure:
 {badarg,[{erlang,binary_to_term,[131,109,0,0,244,240,108,109,102,97,111,111,111,111,111,111,111,111,111,...

Re: Ownership handoff never completes

2013-11-20 Thread Jeppe Toustrup
I've got the problem solved thanks to Brian Sparrow on the IRC channel.

Here are the steps we tried during the troubleshooting session:

1. We first tried to delete the data folders on the receiving node for
the two partitions, while the node was stopped, to see if it would
retrigger the ownership handoff. It didn't change anything.

2. We then tried to run the following Erlang code on the sending
node, in order to see if it would retrigger the ownership handoff. The
partition IDs are those of the partitions needing to be transferred:
IdxList = [696496874040508421956443553091353626554780352512,
           239777612374601260017792042867515182912301432832],
Mod = riak_kv,
Ring = riak_core_ring_manager:get_my_ring(),
riak_core_ring_manager:ring_trans(
  fun(Ring, _) ->
      Ring2 = lists:foldl(
                fun(Idx, Ring) ->
                    riak_core_ring:handoff_complete(Ring, Idx, Mod)
                end,
                Ring,
                IdxList),
      {new_ring, Ring2}
  end, []).

That piece of code didn't help either. The output of the command
showed the two partitions to be in the awaiting state:

[{239777612374601260017792042867515182912301432832,
  'riak@10.0.0.96','riak@10.0.0.93',
  [riak_kv,riak_kv_vnode,riak_pipe_vnode],
  awaiting},
 {696496874040508421956443553091353626554780352512,
  'riak@10.0.0.96','riak@10.0.0.93',
  [riak_kv,riak_kv_vnode,riak_pipe_vnode],
  awaiting}],

3. Brian suggested that I should run
riak_core_ring_events:force_update(). in the Erlang console as well,
but that didn't have any effect.

4. I sent the ring directories from the source and destination nodes
to Brian, and he came back with the following Erlang code, which
solved the problem for us:

IdxList = [696496874040508421956443553091353626554780352512,
           239777612374601260017792042867515182912301432832],
Mod = riak_kv_vnode,
Ring = riak_core_ring_manager:get_my_ring(),
riak_core_ring_manager:ring_trans(
  fun(Ring, _) ->
      %% Strip riak_kv from the module list of each pending transfer
      %% (element 7 of the ring holds the pending transfer entries),
      %% then mark the two handoffs as complete for riak_kv_vnode.
      Ring1 = begin
                A = element(7, Ring),
                B = [{B1, B2, B3,
                      [B4E || B4E <- B4, B4E /= riak_kv],
                      B5} || {B1, B2, B3, B4, B5} <- A],
                setelement(7, Ring, B)
              end,
      Ring2 = lists:foldl(
                fun(Idx, R) ->
                    riak_core_ring:handoff_complete(R, Idx, Mod)
                end,
                Ring1,
                IdxList),
      {new_ring, Ring2}
  end, []).

The output of the command showed the handoffs were complete:

[{239777612374601260017792042867515182912301432832,
  'riak@10.0.0.96','riak@10.0.0.93',
  [riak_kv_vnode,riak_pipe_vnode],
  complete},
 {696496874040508421956443553091353626554780352512,
  'riak@10.0.0.96','riak@10.0.0.93',
  [riak_kv_vnode,riak_pipe_vnode],
  complete}],

And I could confirm that with the usual ring-status, member-status
and transfers commands. There were no pending transfers, no pending
ownership handoffs and the cluster didn't show the rebalancing to be
in progress any more.
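
For completeness, the commands in question were:

  riak-admin ring-status
  riak-admin member-status
  riak-admin transfers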

Thanks a lot to Brian for helping solve this issue. I hope anybody
else who may encounter it can use the above info.

-- 
Jeppe Fihl Toustrup
Operations Engineer
Falcon Social


On 20 November 2013 17:52, Mark Phillips m...@basho.com wrote:
 Hmm. The fact that you've disabled Search probably changes things but I'm
 not entirely sure how.

 Ryan et al - any ideas?

 Mark

 On Wednesday, November 20, 2013, Jeppe Toustrup wrote:

 Hi

 Thank you for the guide. I stopped two of the nodes (the source and
 the destination of the partition transfers), renamed the folders
 inside the merge_index folder and started them again. However, the
 ownership handoff does not seem to be retried.

 Looking at the logs, it seems the last attempt was 48 hours ago.
 Is there any logic inside Riak which causes it to give up after a
 certain number of tries?
 Is there a way I can retrigger the handoffs?
 I have tried to set the transfer-limit on the cluster to 0 and then
 back to 2, but it doesn't seem to do anything.

 I wonder if we need the merge_index folder at all, as we have disabled
 Riak search since the initial configuration of the cluster. We found a
 better way to query our data so that we don't need Riak search
 anymore. We disabled it by resetting the properties on the buckets
 where search was enabled, and then disabled search in app.config
 followed by a restart of each of the nodes. This was done after the
 ownership handoff issue first occurred.

 --
 Jeppe Fihl Toustrup
 Operations Engineer
 Falcon

Ownership handoff never completes

2013-11-19 Thread Jeppe Toustrup
Hi

I have recently added two extra nodes to the now seven-node Riak
cluster. The rebalancing following the expansion worked fine, except
for two partitions which do not seem to be able to go through. Running
riak-admin ring-status shows the following:

== Ownership Handoff ==
Owner:  riak@10.0.0.96
Next Owner: riak@10.0.0.93

Index: 239777612374601260017792042867515182912301432832
  Waiting on: []
  Complete:   [riak_kv_vnode,riak_pipe_vnode]

Index: 696496874040508421956443553091353626554780352512
  Waiting on: []
  Complete:   [riak_kv_vnode,riak_pipe_vnode]

---

I can see from the log file on the source node (10.0.0.96) that it has
made numerous attempts to transfer the partitions, but it ends up
failing all the time. Here's an excerpt of the log file showing the
lines from when the transfer attempt ends up failing:

2013-11-18 12:29:03.694 [error] emulator Error in process <0.5745.8>
on node 'riak@10.0.0.96' with exit value:
{badarg,[{erlang,binary_to_term,[29942 bytes],[]},{mi_segment,iterate_all_bytes,2,[{file,src/mi_segment.erl},{line,167}]},{mi_server,'-group_iterator/2-fun-1-',2,[{file,src/mi_server.erl},{line,725}]},{mi_server,'-group_iterator/2-fun-0-'...
2013-11-18 12:29:03.885 [error] <0.3269.0>@mi_server:handle_info:524
lookup/range failure:
{badarg,[{erlang,binary_to_term,[131,109,0,0,244,240,108,109,102,97,111,111,111,111,111,111,111,111,111,...