Re: Siblings on first write to a key

Douglas Rohrer Tue, 18 Apr 2017 05:57:07 -0700

This sounds like an issue our Riak CS team ran into quite a while ago, which
involved “slow nodes” and coordination retry. Take a look at
https://github.com/basho/riak_kv/issues/1188 and see whether it matches what
you are seeing; it certainly sounds like what’s happening.


The issue arises when one node in the preflist is down (or slow) and you write
to a node _not in the preflist_. The following then happens (better formatted
in the issue above, btw):

client        node-A              node-R         node-S
   ---(Put)-->
             Compute PL
               = P, Q and R
             Redirect to R --->  [frozen]
             |
             | 3 sec timeout
             V
             Compute new PL excluding R
               = P, Q and S
             Redirect to S --------------------> Compute PL without
             |                                     any knowledge about R (at this point)
             |                                     = P, Q and R
             |                                   Redirect to R  ---+
             |                                   |                 |
             |                 [what happens?] <-|-----------------+
             |                                   | 3 sec timeout
             |                                   V
             |                                   Compute new PL excluding R
             |                                     = P, Q and S
             |                                   I'm coordinator this time
             |                                   Execute put
             V 3 sec timeout
             Compute new PL again
               [continues]

So, it’s possible for a slow/down node (node R in this case) to eventually
cause two _other nodes_ to each write a sibling, even on a new key. In fact,
depending on the number of nodes in the system and your luck, you could end up
writing more than one sibling on a fresh write in this case. Your comment about
a network issue potentially being a factor, and the 3-second timing you noted
(the default for the failure timeout), make it more likely that this was, in
fact, the issue.

A fix for this issue has been worked on and tested, but is not yet incorporated
into a released version of Riak. In the meantime, you can either disable the
coordinator retry logic as noted in the issue I referenced above or, if your
cluster is running slowly in general, increase the timeout by setting
`put_coordinator_failure_timeout` in the `riak_kv` section of your
`advanced.config` file (see
http://docs.basho.com/riak/kv/2.2.3/configuring/reference/#advanced-configuration
for the general format of `advanced.config` if you’re not familiar with it).
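
For illustration, a minimal `advanced.config` sketch that raises the timeout
might look like the following. The value 5000 is an arbitrary example rather
than a recommendation, and it assumes the setting is expressed in milliseconds
(which would match the 3-second default); please check the issue and the
configuration reference before applying it.

```erlang
%% advanced.config (sketch only; merge into any existing riak_kv section)
[
 {riak_kv, [
   %% Assumed to be in milliseconds; 5000 is an illustrative value,
   %% chosen only to be larger than the 3-second default.
   {put_coordinator_failure_timeout, 5000}
 ]}
].
```

The file is a single Erlang term, so the trailing period is required, and the
node has to be restarted for the change to take effect.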

Hope this helps.

Doug Rohrer


On 4/18/17, 8:28 AM, "riak-users on behalf of Daniel Abrahamsson" 
<[email protected] on behalf of [email protected]> wrote:

    Hi Magnus,
    
    This cluster has been running in production for a few months. Key
    generation is based on flake (https://github.com/boundary/flake); we
    have never experienced a collision in the 3+ years we have been using
    it heavily in production. However, I will look into that possibility
    as well.
    
    I just noticed that one of the Riak nodes logged this at the time:
    
    2017-04-13 17:42:40.567 [error]
    <0.3624.28>@riak_api_pb_server:handle_info:331 Unrecognized message
    
{30320806,{ok,{r_object,<<"session">>,<<".12011742tWzDvu8mk5WAdfYihfV_T3DcnJ5VDyXC0c">>,[{r_content,{dict,3,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[[<<"X-Riak-VTag">>,53,114,86,115,108,71,120,112,73,55,108,118,114,100,105,114,107,104,50,66,105,119]],[[<<"index">>]],[],[[<<"X-Riak-Last-Modified">>|{1492,105357,453143}]],[],[]}}},<<...
    (actual value removed).
    
    I also have another example (from the same cluster) where there is a
    *single* writer to a key, but after a few writes/updates, it also got
    a sibling error. Also at that time, the write+read took significantly
    longer than normal. I'll check if we had any "unrecognized messages"
    in the Riak logs at that time as well.
    
    To answer your second question, we are talking to the riak cluster
    over protocol buffers, using the official Erlang client.
    
    //Daniel
    
    On Tue, Apr 18, 2017 at 1:51 PM, Magnus Kessler <[email protected]> wrote:
    > On 18 April 2017 at 08:20, Daniel Abrahamsson <[email protected]> wrote:
    >>
    >> I've run into a case where I got a sibling error/response on the first
    >> ever write to a key. I would like to understand how this could happen.
    >> Normally when you get siblings, it is because you have written a value
    >> with an out-of-date vclock. But since this is the first write, there
    >> is no vclock. Could someone shed some light on this for me?
    >>
    >> It is worth mentioning that it took 3 seconds for Riak to deliver
    >> the response, so it is possible there was some kind of network issue
    >> at the time.
    >>
    >> Here are some details about my setup:
    >> Number of nodes: 8.
    >> n_val: 5
    >> write options: pw: 3 (quorum), return_body
    >>
    >> Regards,
    >> Daniel Abrahamsson
    >>
    >
    >
    > Hi Daniel,
    >
    > Please let me know if all nodes in this cluster were set up completely
    > fresh, with empty backend directories, or if any of them had been used
    > before for a Riak installation. If the latter is the case, it may be that
    > the key in question had already been used once before. Cluster nodes pick up
    > data from pre-existing backends.
    >
    > How do you access the key for read and write operations?
    >
    > Kind Regards,
    >
    > Magnus
    >
    >
    > Magnus Kessler
    > Client Services Engineer
    > Basho Technologies Limited
    >
    > Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
    
    _______________________________________________
    riak-users mailing list
    [email protected]
    http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
    



