Hi, I am investigating the feasibility of using Riak for an application where there is a firm requirement for the system to remain available in the face of node failures and network partitions. Up to now, I have found Riak to be quite resilient to node failures, but network partitions are causing more problems.
My setup involves a cluster of five nodes, each running Riak 1.0 on a separate Linux box on a common subnet. Let's call the cluster [A,B,C,D,E]. On a sixth box, I run the Java PB Cluster Client 1.0.1. The client fires fetch and store requests at the cluster using various keys and a single bucket, and everything works fine under normal operating conditions. The bucket has the default properties N=3, W=2, R=2.
During the run, I use iptables to simulate a network partition in which one node in the cluster is disconnected from the other four (but all five remain connected to the client). To disconnect from node A, for example, I run:
    sudo /sbin/iptables -A INPUT -s <node A> -j REJECT
    sudo /sbin/iptables -A OUTPUT -d <node A> -j REJECT
Once node E has been cut off from the other four in this way, we have in effect a majority cluster [A,B,C,D] and a minority cluster [E]. My expectation was that any request from the client to the minority cluster would fail, because a quorum cannot be obtained for either read or write requests. On the other hand, I expect requests to the majority cluster to succeed, because there will always be at least two of the three copies stored within the majority cluster. Therefore I expect very little impact on the performance of the system, beyond the need to retry 20% of the requests. Is this a reasonable expectation, or am I missing something important here?
What actually happens after the partition is that the whole system freezes up for some time. For one minute there appears to be no processing done and nothing is written to the logs. Then, after one minute, the nodes in the majority cluster show the following in their logs:
    2011-10-24 08:4:10.098 [error] <0.19280.0> ** Node 'riak@<node E>' not responding **
    ** Removing (timedout) connection **
The node in the minority cluster has similar log entries, except that it finds four nodes not responding. If the network remains partitioned, processing stops for a total of two minutes before resuming.
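The quorum expectation described above can be sanity-checked with a minimal Python sketch. This is only a simplified model of my reasoning, not Riak's actual behaviour: it assumes that each key's three replicas sit on three distinct physical nodes, that a request needs R (or W) of those replicas to be reachable inside the partition that receives it, and that the client spreads requests evenly over the five nodes. It deliberately ignores vnodes, fallback nodes and hinted handoff. All names in it (quorum_ok, replica_sets and so on) are made up for illustration.

```python
from itertools import combinations

NODES = ["A", "B", "C", "D", "E"]
N, R, W = 3, 2, 2  # bucket defaults quoted above

# Assumed partition: E is isolated from the other four nodes.
majority = {"A", "B", "C", "D"}
minority = {"E"}


def quorum_ok(replica_set, partition, quorum):
    """True if at least `quorum` replicas of a key live inside `partition`."""
    return len(set(replica_set) & partition) >= quorum


# Assumption: any 3 distinct nodes may hold the replicas of a key.
replica_sets = list(combinations(NODES, N))

# Every possible replica set keeps at least 2 copies in the majority side.
majority_ok = all(quorum_ok(rs, majority, max(R, W)) for rs in replica_sets)

# No replica set can reach a quorum of 2 inside a single isolated node.
minority_ok = any(quorum_ok(rs, minority, min(R, W)) for rs in replica_sets)

# Fraction of requests that land on the minority side and need a retry,
# assuming the client picks one of the five nodes uniformly.
retry_fraction = len(minority) / len(NODES)

print(majority_ok, minority_ok, retry_fraction)
```

Under these assumptions the majority side can always satisfy R=2 and W=2, the isolated node never can, and one request in five has to be retried, which is where my 20% figure comes from.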
When the partition is repaired, as expected I see a lot of handoff activity in the logs and the whole cluster soon becomes consistent. The main concern I have is that processing stops altogether for two minutes when the network is partitioned. Can you explain why this is happening, and whether there is something I can do to allow processing to continue? As far as I can see, it ought to be possible for the majority partition to continue processing requests without interruption, and without any errors being generated.
Thanks in advance for your help.
Malcolm
_______________________________________________ riak-users mailing list [email protected] http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
