Re: Hinted handoff failed because of tcp errors
Hi Alexander,

Excellent! Thanks for the feedback - I will see what I can find there.

Regards,
Ryan

On Tue, Nov 1, 2016 at 11:06 AM, Alexander Sicular wrote:
> Hi Ryan, yes, you can change a number of settings. Have you had a look at
> http://docs.basho.com/riak/kv/2.1.4/using/admin/riak-admin/#transfer-limit
> and
> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2014-July/015529.html
> ?
>
> -Alexander

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Hinted handoff failed because of tcp errors
Hi Ryan, yes, you can change a number of settings. Have you had a look at
http://docs.basho.com/riak/kv/2.1.4/using/admin/riak-admin/#transfer-limit
and
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2014-July/015529.html
?

-Alexander

On Tue, Nov 1, 2016 at 2:43 AM, Ryan Maclear wrote:
> Here are some questions:
>
> 1. What is the default tcp timeout?
> 2. Is there any way to increase this timeout?
> 3. Is there any way to increase the rate of handoff?
> 4. Are there any other parameters we can tune to try and avoid this?
Hinted handoff failed because of tcp errors
Good Day,

We have a 4-node Riak cluster running inside AWS. The Riak version is
riak-kv 2.1.2, with AAE enabled, on Ubuntu 14.04.4 LTS.

We are in the process of replacing one node with another using the
process described here:

http://docs.basho.com/riak/kv/2.1.4/using/cluster-operations/replacing-node/

We have successfully replaced two of the nodes so far, but we are having
a problem with the third. If we look at /var/log/riak/console.log we see
the start of the hinted handoff, and some time later (sometimes minutes
and sometimes hours) we see:

2016-10-31 06:30:40.090 [error] <0.19834.2101>@riak_core_handoff_sender:start_fold:272 hinted transfer of riak_kv_vnode from 'r...@aew54.miranetworks.net' 274031556999544297163190906134303066185487351808 to 'r...@aew75.miranetworks.net' 274031556999544297163190906134303066185487351808 failed because of TCP recv timeout
2016-10-31 06:30:40.090 [error] <0.187.0>@riak_core_handoff_manager:handle_info:303 An outbound handoff of partition riak_kv_vnode 274031556999544297163190906134303066185487351808 was terminated for reason: {shutdown,timeout}

So the handoff was terminated due to a TCP timeout. The handoff then
starts again.

This has been going on for some time (about two weeks now).

The current member status is as follows:

riak-admin member-status
=========================== Membership ===========================
Status     Ring    Pending    Node
------------------------------------------------------------------
leaving     0.0%      --      'r...@aew54.miranetworks.net'
valid      25.0%      --      'r...@aew59.miranetworks.net'
valid      25.0%      --      'r...@aew73.miranetworks.net'
valid      25.0%      --      'r...@aew74.miranetworks.net'
valid      25.0%      --      'r...@aew75.miranetworks.net'
------------------------------------------------------------------
Valid:4 / Leaving:1 / Exiting:0 / Joining:0 / Down:0

Here are some questions:

1. What is the default TCP timeout?
2. Is there any way to increase this timeout?
3. Is there any way to increase the rate of handoff?
4. Are there any other parameters we can tune to try and avoid this?

The output from riak-admin transfers is as follows:

'r...@aew54.miranetworks.net' waiting to handoff 1 partitions

Active Transfers:

transfer type: hinted
vnode type: riak_kv_vnode
partition: 274031556999544297163190906134303066185487351808
started: 2016-11-01 05:30:47 [2.10 hr ago]
last update: 2016-11-01 07:36:51 [3.03 s ago]
total size: 78393086512 bytes
objects transferred: 11440967

                         1513 Objs/s
riak@aew54.miranetworks.n  =======>  riak@aew75.miranetworks.n
           et                                   et
        |==                                   |  15%
                         1.53 MB/s

Thanks,
Ryan Maclear
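
To put the numbers above in context, here is a minimal Python sketch (not part of the original thread). It estimates how long the active transfer would still need at the reported pace, and builds the `riak-admin transfer-limit` command that the reply points to. The helper names are made up for illustration, the node name in the usage comment is a placeholder, and treating 1 MB as 1024 * 1024 bytes is an assumption about how the rate is printed. Both estimates come out at roughly 12 hours, so the transfer would have to run for about half a day without hitting the TCP recv timeout that has so far appeared after minutes to hours.

```python
import subprocess
from typing import List, Optional

# Figures copied from the `riak-admin transfers` output in the message above.
TOTAL_BYTES = 78_393_086_512
PERCENT_DONE = 15.0        # progress bar reading
ELAPSED_HOURS = 2.10       # "started: ... [2.10 hr ago]"
REPORTED_MB_PER_S = 1.53   # rate printed under the progress bar


def hours_left_from_progress(percent_done: float, elapsed_hours: float) -> float:
    """Remaining hours if the average pace so far continues."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_hours * (100.0 - percent_done) / percent_done


def hours_left_from_rate(total_bytes: int, percent_done: float, mb_per_s: float,
                         bytes_per_mb: int = 1024 * 1024) -> float:
    """Remaining hours at the reported rate.

    ASSUMPTION: 1 MB = 1024 * 1024 bytes; pass bytes_per_mb=10**6 otherwise.
    """
    remaining_bytes = total_bytes * (100.0 - percent_done) / 100.0
    return remaining_bytes / (mb_per_s * bytes_per_mb) / 3600.0


def transfer_limit_cmd(limit: int, node: Optional[str] = None) -> List[str]:
    """Build argv for `riak-admin transfer-limit [<node>] <limit>`."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    cmd = ["riak-admin", "transfer-limit"]
    if node is not None:
        cmd.append(node)       # apply to one node only
    cmd.append(str(limit))
    return cmd


def set_transfer_limit(limit: int, node: Optional[str] = None) -> None:
    """Run the command; ASSUMES riak-admin is on PATH on this machine."""
    subprocess.run(transfer_limit_cmd(limit, node), check=True)


if __name__ == "__main__":
    by_progress = hours_left_from_progress(PERCENT_DONE, ELAPSED_HOURS)
    by_rate = hours_left_from_rate(TOTAL_BYTES, PERCENT_DONE, REPORTED_MB_PER_S)
    print(f"remaining, from progress: {by_progress:.1f} h")
    print(f"remaining, from rate:     {by_rate:.1f} h")
    # Placeholder node name; substitute a real cluster member.
    print(" ".join(transfer_limit_cmd(4, "riak@node1.example.com")))
```

Note that the transfer limit controls how many handoffs a node runs concurrently, so with a single partition waiting it may not, by itself, speed up one slow transfer.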