Hi all,
I have a 3-node Riak cluster, and I am simulating the scenario of physical
nodes crashing.
When 2 nodes go down, and I query the remaining one, it fails with:
{error,
{exit,
{{{error,
{no_candidate_nodes,exhausted_prefist,
[{riak_kv_mapred_planner,claim_keys,3},
{riak_kv_map_phase,schedule_input,5},
{riak_kv_map_phase,handle_input,3},
{luke_phase,executing,3},
{gen_fsm,handle_msg,7},
{proc_lib,init_p_do_apply,3}],
[]}},
{gen_fsm,sync_send_event,
[<0.31566.2330>,
{inputs,
(...)
Here I'm running a MapReduce job whose inputs are fed by Riak Search.
(1) All of the involved buckets have N=3, and all involved requests use R=1 (I
don't really need quorum for this use case).
Why is it failing? I'm sure I'm missing something basic here.
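For reference, this is roughly how the bucket properties are set — a sketch against Riak's old HTTP API, where the node address and bucket name are placeholders for my cluster; it only builds and prints the PUT request rather than sending it:

```python
# Sketch: building the request that sets n_val/r on a bucket via
# Riak's HTTP API. Host, port, and bucket name are placeholders.
import json

RIAK_URL = "http://127.0.0.1:8098"  # any reachable node
BUCKET = "my_bucket"                # placeholder bucket name

def bucket_props_request(n_val=3, r=1):
    """Build the PUT request that sets n_val and r on a bucket."""
    url = f"{RIAK_URL}/riak/{BUCKET}"
    body = json.dumps({"props": {"n_val": n_val, "r": r}})
    headers = {"Content-Type": "application/json"}
    return url, headers, body

url, headers, body = bucket_props_request()
print(url)
print(body)
```

(Sending it is just an HTTP PUT with that body; I've left the transport out to keep the sketch self-contained.)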
(2) Probably worth noting: those 3 nodes are spread across *two* physical
servers (1 on the small one, 2 on the beefier one). I've heard this is "not a
good idea", though I'm not sure why. These two servers are still definitely
enough for our current load; should I consider adding a third one?
(3) To overcome the aforementioned error, I added a new node to the cluster
(installed on the small server). Now the setup looks like: 4 nodes = 2 on the
small server, 2 on the beefier one.
When 2 nodes go down, this works. Which brings me to another topic... could
you point me to good strategies to proactively invoke read repair? Is it up to
clients to scan the keyspace, forcing reads? It's a disaster usability-wise
when the first users start getting 404s all over the place.
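For now I've been experimenting with a crude warm-up along these lines — a sketch, not production code: walk a list of keys and issue one read per key, which (as I understand it) triggers read repair on any stale replicas. The `fetch` callable is a stand-in for whatever client call you actually use (HTTP GET, protobuf client, etc.):

```python
# Sketch of a read-repair "warm-up": touch every key with a plain read,
# letting Riak repair divergent replicas in the background.
# `fetch` is a stand-in for the real client call (HTTP or protobuf).

def warm_up(keys, fetch):
    """Issue one read per key; return the keys whose read failed (e.g. 404)."""
    failed = []
    for key in keys:
        try:
            fetch(key)          # any successful read triggers read repair
        except Exception:
            failed.append(key)  # retry these later, or alert on them
    return failed

# Usage with a stubbed fetch standing in for a real client:
store = {"a": 1, "b": 2}
missing = warm_up(["a", "b", "c"], lambda k: store[k])
print(missing)
```

Whether this is sane to run cluster-wide (listing keys is expensive), or whether there's a better built-in mechanism, is exactly what I'm asking about.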
Francisco
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com