Hi Jonathan,

Sorry for the late reply. It looks like riak_ensemble still thinks those old
nodes are part of the cluster. Did you remove them with 'riak-admin cluster
leave'? If so, they should have been removed from the root ensemble as well,
and the machines shouldn't actually have left the cluster until all of the
ensembles had been reconfigured via joint consensus.
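For reference, the removal flow we'd expect is the standard staged workflow;
the node name below is just a placeholder:

    # On any running member, stage the leave for the departing node
    riak-admin cluster leave riak@<old-node-ip>

    # Review the staged transition, then commit it
    riak-admin cluster plan
    riak-admin cluster commit

If the old nodes were instead removed with 'riak-admin cluster force-remove'
(or simply shut down and never heard from again), riak_ensemble wouldn't have
had a chance to reconfigure the root ensemble, which would explain the stale
peers you're seeing.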
Can you paste the results from the following commands?

    riak-admin member-status
    riak-admin ring-status
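It may also help to see what the root ensemble itself thinks its membership
is. From 'riak attach' on any node, something along these lines should show
it -- this is from memory against the 2.x riak_ensemble_manager API, so
double-check the exports on your version before relying on it:

    %% Nodes riak_ensemble considers part of the consensus cluster
    riak_ensemble_manager:cluster().

    %% Peers of the root ensemble (should match 'ensemble-status root')
    riak_ensemble_manager:get_members(root).

(Be careful detaching afterwards; depending on the version, Ctrl-D is safer
than Ctrl-C, which can drop you into the Erlang break menu.)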
Thanks,
Andrew

On Mon, Mar 23, 2015 at 11:25 AM, Jonathan Koff <jonat...@projexity.com> wrote:

> Hi all,
>
> I recently used Riak’s Strong Consistency functionality to get
> auto-incrementing IDs for a feature of an application I’m working on, and
> although this worked great in dev (5 nodes in 1 VM) and staging (3 servers
> across NA) environments, I’ve run into some odd behaviour in production
> (originally 3 servers, now 4) that prevents it from working.
>
> I initially noticed that consistent requests were immediately failing as
> timeouts, and upon checking `riak-admin ensemble-status` saw that many
> ensembles were at 0 / 3, from the vantage point of the box I was SSH’d
> into. Interestingly, SSH-ing into different boxes showed different results.
> Here’s a brief snippet of what I see now, after adding a fourth server in a
> troubleshooting attempt:
>
> *Machine 1* (104.131.39.61)
>
> ============================== Consensus System ===============================
>  Enabled:     true
>  Active:      true
>  Ring Ready:  true
>  Validation:  strong (trusted majority required)
>  Metadata:    best-effort replication (asynchronous)
>
> ================================== Ensembles ==================================
>  Ensemble     Quorum        Nodes      Leader
> -------------------------------------------------------------------------------
>    root       0 / 6         3 / 6      --
>    2          0 / 3         3 / 3      --
>    3          3 / 3         3 / 3      riak@104.131.130.237
>    4          3 / 3         3 / 3      riak@104.131.130.237
>    5          3 / 3         3 / 3      riak@104.131.130.237
>    6          0 / 3         3 / 3      --
>    7          0 / 3         3 / 3      --
>    8          0 / 3         3 / 3      --
>    9          3 / 3         3 / 3      riak@104.131.130.237
>    10         3 / 3         3 / 3      riak@104.131.130.237
>    11         0 / 3         3 / 3      --
>
> *Machine 2* (104.236.79.78)
>
> ============================== Consensus System ===============================
>  Enabled:     true
>  Active:      true
>  Ring Ready:  true
>  Validation:  strong (trusted majority required)
>  Metadata:    best-effort replication (asynchronous)
>
> ================================== Ensembles ==================================
>  Ensemble     Quorum        Nodes      Leader
> -------------------------------------------------------------------------------
>    root       0 / 6         3 / 6      --
>    2          3 / 3         3 / 3      riak@104.236.79.78
>    3          3 / 3         3 / 3      riak@104.131.130.237
>    4          3 / 3         3 / 3      riak@104.131.130.237
>    5          3 / 3         3 / 3      riak@104.131.130.237
>    6          3 / 3         3 / 3      riak@104.236.79.78
>    7          0 / 3         3 / 3      --
>    8          0 / 3         3 / 3      --
>    9          3 / 3         3 / 3      riak@104.131.130.237
>    10         3 / 3         3 / 3      riak@104.131.130.237
>    11         3 / 3         3 / 3      riak@104.236.79.78
>
> *Machine 3* (104.131.130.237)
>
> ============================== Consensus System ===============================
>  Enabled:     true
>  Active:      true
>  Ring Ready:  true
>  Validation:  strong (trusted majority required)
>  Metadata:    best-effort replication (asynchronous)
>
> ================================== Ensembles ==================================
>  Ensemble     Quorum        Nodes      Leader
> -------------------------------------------------------------------------------
>    root       0 / 6         3 / 6      --
>    2          0 / 3         3 / 3      --
>    3          3 / 3         3 / 3      riak@104.131.130.237
>    4          3 / 3         3 / 3      riak@104.131.130.237
>    5          3 / 3         3 / 3      riak@104.131.130.237
>    6          0 / 3         3 / 3      --
>    7          0 / 3         3 / 3      --
>    8          0 / 3         3 / 3      --
>    9          3 / 3         3 / 3      riak@104.131.130.237
>    10         3 / 3         3 / 3      riak@104.131.130.237
>    11         0 / 3         3 / 3      --
>
> *Machine 4* (162.243.5.87)
>
> ============================== Consensus System ===============================
>  Enabled:     true
>  Active:      true
>  Ring Ready:  true
>  Validation:  strong (trusted majority required)
>  Metadata:    best-effort replication (asynchronous)
>
> ================================== Ensembles ==================================
>  Ensemble     Quorum        Nodes      Leader
> -------------------------------------------------------------------------------
>    root       0 / 6         3 / 6      --
>    2          3 / 3         3 / 3      riak@104.236.79.78
>    3          3 / 3         3 / 3      riak@104.131.130.237
>    4          3 / 3         3 / 3      riak@104.131.130.237
>    5          3 / 3         3 / 3      riak@104.131.130.237
>    6          3 / 3         3 / 3      riak@104.236.79.78
>    7          3 / 3         3 / 3      riak@162.243.5.87
>    8          3 / 3         3 / 3      riak@162.243.5.87
>    9          3 / 3         3 / 3      riak@104.131.130.237
>    10         3 / 3         3 / 3      riak@104.131.130.237
>    11         3 / 3         3 / 3      riak@104.236.79.78
>
> Interestingly, Machine 4 has full quora for all ensembles except for root,
> while Machine 3 only sees itself as a leader.
>
> Another interesting point is the output of `riak-admin ensemble-status root`:
>
> ================================= Ensemble #1 =================================
>  Id:           root
>  Leader:       --
>  Leader ready: false
>
> ==================================== Peers ====================================
>  Peer  Status     Trusted  Epoch  Node
> -------------------------------------------------------------------------------
>   1    (offline)  --       --     riak@104.131.45.32
>   2    probe      no       8      riak@104.131.130.237
>   3    (offline)  --       --     riak@104.131.141.237
>   4    (offline)  --       --     riak@104.131.199.79
>   5    probe      no       8      riak@104.236.79.78
>   6    probe      no       8      riak@162.243.5.87
>
> This is consistent across all 4 machines, and seems to include some old
> IPs from machines that left the cluster quite a while back, almost
> definitely before I’d used Riak's Strong Consistency. Note that the reason
> I added the fourth machine (104.131.39.61) was to see if this output would
> change, perhaps resulting in a quorum for the root ensemble.
>
> For reference, here’s the status of a sample ensemble that isn’t “Leader
> ready”, from the perspective of Machine 2:
>
> ================================ Ensemble #62 =================================
>  Id:           {kv,1370157784997721485815954530671515330927436759040,3}
>  Leader:       --
>  Leader ready: false
>
> ==================================== Peers ====================================
>  Peer  Status     Trusted  Epoch  Node
> -------------------------------------------------------------------------------
>   1    following  yes      43     riak@104.131.130.237
>   2    following  yes      43     riak@104.236.79.78
>   3    leading    yes      43     riak@162.243.5.87
>
> My config consists of riak.conf with:
>
>     strong_consistency = on
>
> and advanced.config with:
>
>     [
>      {riak_core, [
>        {target_n_val, 5}
>      ]},
>      {riak_ensemble, [
>        {ensemble_tick, 5000}
>      ]}
>     ].
>
> though I’ve experimented with the latter in an attempt to get this
> resolved.
>
> I didn’t see any relevant-looking log output on any of the servers.
>
> Has anyone come across this before?
>
> Thanks!
>
> *Jonathan Koff* B.CS.
> co-founder of Projexity
> www.projexity.com
>
> follow us on facebook at: www.facebook.com/projexity
> follow us on twitter at: twitter.com/projexity
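P.S. Since the per-node views differ, it would also be useful to capture
`riak-admin ensemble-status` from every node at roughly the same time. A
quick loop along these lines (adjust the SSH user and hosts to your setup)
makes that easy to collect:

    # Snapshot ensemble status across all four nodes (hosts from your mail)
    for host in 104.131.39.61 104.236.79.78 104.131.130.237 162.243.5.87; do
      echo "=== $host ==="
      ssh "$host" 'riak-admin ensemble-status'
    done > ensemble-status-snapshot.txt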
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com