Hi all,
I have a 3-node Riak cluster, and I am simulating the scenario of physical
nodes crashing.
When 2 nodes go down, and I query the remaining one, it fails with:
{error,
{exit,
{{{error,
{no_candidate_nodes,exhausted_prefist,
[{riak_kv_mapred_planner,claim_keys,3},
{riak_kv_map_phase,schedule_input,5},
{riak_kv_map_phase,handle_input,3},
{luke_phase,executing,3},
{gen_fsm,handle_msg,7},
{proc_lib,init_p_do_apply,3}],
[]}},
{gen_fsm,sync_send_event,
[<0.31566.2330>,
{inputs,
(...)
Here I'm running a MapReduce job whose inputs are fed by Riak Search.
(1) All of the involved buckets have N=3, and all involved requests use R=1 (I
don't really need quorum for this use case).
Why is it failing? I'm sure I'm missing something basic here.
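For reference, this is roughly how the bucket properties are set — a sketch against Riak's old HTTP API, where the node address and bucket name are placeholders for my cluster; it only builds and prints the PUT request rather than sending it:

```python
# Sketch: building the request that sets n_val/r on a bucket via
# Riak's HTTP API. Host, port, and bucket name are placeholders.
import json

RIAK_URL = "http://127.0.0.1:8098"  # any reachable node
BUCKET = "my_bucket"                # placeholder bucket name

def bucket_props_request(n_val=3, r=1):
    """Build the PUT request that sets n_val and r on a bucket."""
    url = f"{RIAK_URL}/riak/{BUCKET}"
    body = json.dumps({"props": {"n_val": n_val, "r": r}})
    headers = {"Content-Type": "application/json"}
    return url, headers, body

url, headers, body = bucket_props_request()
print(url)
print(body)
```

(Sending it is just an HTTP PUT with that body; I've left the transport out to keep the sketch self-contained.)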
(2) Probably worth noting: those 3 nodes are spread across *two* physical
servers (1 on the small one, 2 on the beefier one). I've heard this is "not a
good idea", though I'm not sure why. These two servers are still definitely
enough for our current load; should I consider adding a third one?
(3) To overcome the aforementioned error, I added a new node to the cluster
(installed on the small server). Now the setup looks like: 4 nodes = 2 on the
small server, 2 on the beefier one.
When 2 nodes go down, this works. Which brings me to another topic... could
you point me to good strategies to proactively invoke read repair? Is it up to
clients to scan the keyspace, forcing reads? It's a disaster usability-wise
when the first users start getting 404s all over the place.
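For now I've been experimenting with a crude warm-up along these lines — a sketch, not production code: walk a list of keys and issue one read per key, which (as I understand it) triggers read repair on any stale replicas. The `fetch` callable is a stand-in for whatever client call you actually use (HTTP GET, protobuf client, etc.):

```python
# Sketch of a read-repair "warm-up": touch every key with a plain read,
# letting Riak repair divergent replicas in the background.
# `fetch` is a stand-in for the real client call (HTTP or protobuf).

def warm_up(keys, fetch):
    """Issue one read per key; return the keys whose read failed (e.g. 404)."""
    failed = []
    for key in keys:
        try:
            fetch(key)          # any successful read triggers read repair
        except Exception:
            failed.append(key)  # retry these later, or alert on them
    return failed

# Usage with a stubbed fetch standing in for a real client:
store = {"a": 1, "b": 2}
missing = warm_up(["a", "b", "c"], lambda k: store[k])
print(missing)
```

Whether this is sane to run cluster-wide (listing keys is expensive), or whether there's a better built-in mechanism, is exactly what I'm asking about.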
Francisco
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com