Overview

The default handoff configuration may cause data loss in the OSS release of
Riak TS 1.3.0. If you are using Riak TS Enterprise, you are not impacted by
this bug, but you SHOULD upgrade to Riak TS Enterprise 1.3.1 as soon as it
is available, because it includes other handoff bug fixes.

Description

In Riak TS 1.3.0, the default configuration for handoff.ip causes vnodes
marked for transfer during handoff to be removed without transferring data
to their new destination nodes. A mandatory configuration change (in
riak.conf) will resolve this issue. All open source users are impacted by
this issue, and we strongly recommend that all 1.3.0 users upgrade to 1.3.1,
which will be released soon.

NOTE: This is known to occur for ownership handoff and fallback transfers
(hinted handoffs).

Affected Users

All open source users of TS 1.3.0 using riak.conf to configure their
clusters are potentially impacted.

To verify whether you are affected, run the following command on each
node in your cluster:
riak config effective | grep handoff.ip

Affected nodes will report a handoff IP of 127.0.0.1:
handoff.ip = 127.0.0.1
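
The check above can also be scripted so that it gives a clear answer on each
node. The following is a minimal sketch, not part of the advisory: the
function name handoff_ip_status and its AFFECTED/OK/UNKNOWN labels are
hypothetical, and it assumes the "key = value" output format shown above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Riak): reads the output of
# `riak config effective` on stdin and reports whether handoff.ip is the
# loopback address described in this advisory.
handoff_ip_status() {
    # Extract the value of the handoff.ip line; keep the last one if repeated.
    ip=$(sed -n 's/^[[:space:]]*handoff\.ip[[:space:]]*=[[:space:]]*//p' \
        | tail -n 1 | tr -d '[:space:]')
    case "$ip" in
        127.0.0.1) echo "AFFECTED handoff.ip=$ip"; return 1 ;;
        "")        echo "UNKNOWN handoff.ip not found"; return 2 ;;
        *)         echo "OK handoff.ip=$ip"; return 0 ;;
    esac
}
```

On a node, it would be used as:
riak config effective | handoff_ip_status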

Impact

This bug impacts vnodes that are in the process of handoff. During
ownership handoff, handoff data is looped back to the source node rather
than being transferred to the destination node. Once ownership handoff
completes, the data is removed from the source node. Replica loss can be
triggered by cluster membership changes or by any other Riak cluster
activity that triggers handoff. In the event of significant ownership
handoff, which can happen during cluster expansion or contraction, all
replicas of an object may be lost, and that is when data loss occurs. Data
loss is mitigated as long as at least one replica still exists and the
steps below are followed.

Mitigation

You can immediately mitigate the issue by setting the transfer limit to zero
across the cluster. Issue the following on any node:
riak-admin transfer-limit 0

Then set handoff.ip in riak.conf to an external IP address or 0.0.0.0
on all nodes.
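
If you prefer to script the riak.conf change, the following is one possible
sketch. It is an illustration only: the function name set_handoff_ip is
hypothetical, the path to riak.conf varies by platform and must be supplied
by you, and you should review the resulting file before restarting a node.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Riak): set handoff.ip in riak.conf.
#   $1 = path to riak.conf (platform dependent; supplied by the caller)
#   $2 = address to use for handoff (the node's external IP or 0.0.0.0)
set_handoff_ip() {
    conf=$1
    addr=$2
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
    tmp=$(mktemp) || return 1
    # Drop any active handoff.ip line (commented-out lines are left alone).
    # grep exits non-zero when nothing is left to print, which is fine here.
    grep -v '^[[:space:]]*handoff\.ip[[:space:]]*=' "$conf" > "$tmp" || true
    # Append the new setting.
    printf 'handoff.ip = %s\n' "$addr" >> "$tmp"
    # Write back in place so ownership and permissions are preserved.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

For example, with the path to your riak.conf as the first argument:
set_handoff_ip /path/to/riak.conf 0.0.0.0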

Perform a rolling restart
<http://docs.basho.com/riak/kv/2.1.4/using/repair-recovery/rolling-restart/>
of Riak across your cluster to activate the new setting.

For additional repair work, you will need to have Riak TS 1.3.1 or higher
installed across your cluster.

The advisory can be found on our Product Advisories page at
https://docs.basho.com/community/productadvisories/130-dataloss/

Let us know if you have questions.
-- 
Seema Jethani
Director of Product Management, Basho <http://basho.com>
4083455739 | @seemaj <http://twitter.com/seemaj>