Hi Jonathan,

Sorry for the late reply. It looks like riak_ensemble still thinks that
those old nodes are part of the cluster. Did you remove them with
'riak-admin cluster leave'? If so, they should also have been removed from
the root ensemble, and the machines shouldn't have actually left the
cluster until all the ensembles were reconfigured via joint consensus. Can
you paste the output of the following commands:

riak-admin member-status
riak-admin ring-status
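
For context, here is a rough sketch of why the root ensemble in your output cannot elect a leader. It assumes a simple-majority quorum rule; the helper names (majority, has_quorum, ROOT_PEERS) are made up for illustration and are not part of Riak. The peer list is copied from your `riak-admin ensemble-status root` output.

```python
def majority(peer_count: int) -> int:
    """Smallest number of peers that forms a strict majority."""
    return peer_count // 2 + 1


def has_quorum(online: int, peer_count: int) -> bool:
    """True when enough peers are reachable to form a majority."""
    return online >= majority(peer_count)


# Root ensemble peers as listed by `riak-admin ensemble-status root`.
# True means the peer is reachable, False means it is shown as (offline).
# Assumption: a quorum is a simple majority of all listed peers.
ROOT_PEERS = {
    "riak@104.131.45.32": False,
    "riak@104.131.130.237": True,
    "riak@104.131.141.237": False,
    "riak@104.131.199.79": False,
    "riak@104.236.79.78": True,
    "riak@162.243.5.87": True,
}

online = sum(ROOT_PEERS.values())
needed = majority(len(ROOT_PEERS))
print(f"online {online} of {len(ROOT_PEERS)}, need {needed}")
print("root quorum possible:", has_quorum(online, len(ROOT_PEERS)))
```

With three of the six listed peers offline, the three live nodes fall one short of a majority, which would match the "0 / 6" quorum you see for root.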

Thanks,
Andrew


On Mon, Mar 23, 2015 at 11:25 AM, Jonathan Koff <jonat...@projexity.com>
wrote:

> Hi all,
>
> I recently used Riak’s Strong Consistency functionality to get
> auto-incrementing IDs for a feature of an application I’m working on, and
> although this worked great in dev (5 nodes in 1 VM) and staging (3 servers
> across NA) environments, I’ve run into some odd behaviour in production
> (originally 3 servers, now 4) that prevents it from working.
>
> I initially noticed that consistent requests were immediately failing as
> timeouts, and upon checking `riak-admin ensemble-status` saw that many
> ensembles were at 0 / 3, from the vantage point of the box I was SSH’d
> into. Interestingly, SSH-ing into different boxes showed different results.
> Here’s a brief snippet of what I see now, after adding a fourth server in a
> troubleshooting attempt:
>
> *Machine 1* (104.131.39.61)
>
> ============================== Consensus System ===============================
> Enabled:     true
> Active:      true
> Ring Ready:  true
> Validation:  strong (trusted majority required)
> Metadata:    best-effort replication (asynchronous)
>
> ================================== Ensembles ==================================
>  Ensemble     Quorum        Nodes      Leader
> -------------------------------------------------------------------------------
>    root       0 / 6         3 / 6      --
>     2         0 / 3         3 / 3      --
>     3         3 / 3         3 / 3      riak@104.131.130.237
>     4         3 / 3         3 / 3      riak@104.131.130.237
>     5         3 / 3         3 / 3      riak@104.131.130.237
>     6         0 / 3         3 / 3      --
>     7         0 / 3         3 / 3      --
>     8         0 / 3         3 / 3      --
>     9         3 / 3         3 / 3      riak@104.131.130.237
>     10        3 / 3         3 / 3      riak@104.131.130.237
>     11        0 / 3         3 / 3      --
>
> *Machine 2* (104.236.79.78)
>
> ============================== Consensus System ===============================
> Enabled:     true
> Active:      true
> Ring Ready:  true
> Validation:  strong (trusted majority required)
> Metadata:    best-effort replication (asynchronous)
>
> ================================== Ensembles ==================================
>  Ensemble     Quorum        Nodes      Leader
> -------------------------------------------------------------------------------
>    root       0 / 6         3 / 6      --
>     2         3 / 3         3 / 3      riak@104.236.79.78
>     3         3 / 3         3 / 3      riak@104.131.130.237
>     4         3 / 3         3 / 3      riak@104.131.130.237
>     5         3 / 3         3 / 3      riak@104.131.130.237
>     6         3 / 3         3 / 3      riak@104.236.79.78
>     7         0 / 3         3 / 3      --
>     8         0 / 3         3 / 3      --
>     9         3 / 3         3 / 3      riak@104.131.130.237
>     10        3 / 3         3 / 3      riak@104.131.130.237
>     11        3 / 3         3 / 3      riak@104.236.79.78
>
> *Machine 3* (104.131.130.237)
>
> ============================== Consensus System ===============================
> Enabled:     true
> Active:      true
> Ring Ready:  true
> Validation:  strong (trusted majority required)
> Metadata:    best-effort replication (asynchronous)
>
> ================================== Ensembles ==================================
>  Ensemble     Quorum        Nodes      Leader
> -------------------------------------------------------------------------------
>    root       0 / 6         3 / 6      --
>     2         0 / 3         3 / 3      --
>     3         3 / 3         3 / 3      riak@104.131.130.237
>     4         3 / 3         3 / 3      riak@104.131.130.237
>     5         3 / 3         3 / 3      riak@104.131.130.237
>     6         0 / 3         3 / 3      --
>     7         0 / 3         3 / 3      --
>     8         0 / 3         3 / 3      --
>     9         3 / 3         3 / 3      riak@104.131.130.237
>     10        3 / 3         3 / 3      riak@104.131.130.237
>     11        0 / 3         3 / 3      --
>
> *Machine 4* (162.243.5.87)
>
> ============================== Consensus System ===============================
> Enabled:     true
> Active:      true
> Ring Ready:  true
> Validation:  strong (trusted majority required)
> Metadata:    best-effort replication (asynchronous)
>
> ================================== Ensembles ==================================
>  Ensemble     Quorum        Nodes      Leader
> -------------------------------------------------------------------------------
>    root       0 / 6         3 / 6      --
>     2         3 / 3         3 / 3      riak@104.236.79.78
>     3         3 / 3         3 / 3      riak@104.131.130.237
>     4         3 / 3         3 / 3      riak@104.131.130.237
>     5         3 / 3         3 / 3      riak@104.131.130.237
>     6         3 / 3         3 / 3      riak@104.236.79.78
>     7         3 / 3         3 / 3      riak@162.243.5.87
>     8         3 / 3         3 / 3      riak@162.243.5.87
>     9         3 / 3         3 / 3      riak@104.131.130.237
>     10        3 / 3         3 / 3      riak@104.131.130.237
>     11        3 / 3         3 / 3      riak@104.236.79.78
>
>
> Interestingly, Machine 4 has full quora for all ensembles except for root,
> while Machine 3 only sees itself as a leader.
>
> Another interesting point is the output of `riak-admin ensemble-status
> root`:
>
> ================================= Ensemble #1 =================================
> Id:           root
> Leader:       --
> Leader ready: false
>
> ==================================== Peers ====================================
>  Peer  Status     Trusted          Epoch         Node
> -------------------------------------------------------------------------------
>   1    (offline)    --              --           riak@104.131.45.32
>   2      probe      no              8            riak@104.131.130.237
>   3    (offline)    --              --           riak@104.131.141.237
>   4    (offline)    --              --           riak@104.131.199.79
>   5      probe      no              8            riak@104.236.79.78
>   6      probe      no              8            riak@162.243.5.87
>
> This is consistent across all 4 machines, and seems to include some old
> IPs from machines that left the cluster quite a while back, almost
> definitely before I’d used Riak's Strong Consistency. Note that the reason
> I added the fourth machine (104.131.39.61) was to see if this output would
> change, perhaps resulting in a quorum for the root ensemble.
>
> For reference, here’s the status of a sample ensemble that isn’t “Leader
> ready”, from the perspective of Machine 2:
>
> ================================ Ensemble #62 =================================
> Id:           {kv,1370157784997721485815954530671515330927436759040,3}
> Leader:       --
> Leader ready: false
>
> ==================================== Peers ====================================
>  Peer  Status     Trusted          Epoch         Node
> -------------------------------------------------------------------------------
>   1    following    yes             43           riak@104.131.130.237
>   2    following    yes             43           riak@104.236.79.78
>   3     leading     yes             43           riak@162.243.5.87
>
>
> My config consists of riak.conf with:
>
> strong_consistency = on
>
> and advanced.config with:
>
> [
>   {riak_core,
>     [
>       {target_n_val, 5}
>     ]},
>   {riak_ensemble,
>     [
>       {ensemble_tick, 5000}
>     ]}
> ].
>
> though I’ve experimented with the latter in an attempt to get this
> resolved.
>
> I didn’t see any relevant-looking log output on any of the servers.
>
> Has anyone come across this before?
>
> Thanks!
>
> *Jonathan Koff* B.CS.
> co-founder of Projexity
> www.projexity.com
>
> follow us on facebook at: www.facebook.com/projexity
> follow us on twitter at: twitter.com/projexity
>
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>