This sounds like an issue our Riak CS team ran into quite a while ago, which
involved “slow nodes” and coordinator retry. Take a look at
https://github.com/basho/riak_kv/issues/1188 and see if it matches what you are
seeing; it certainly sounds like what’s happening.
The issue arises when one node in the preflist is down or slow, and you write
to a node _not in the preflist_, at which point the following happens (better
formatted in the issue above, btw):
client      node-A                      node-R          node-S
---(Put)-->
            Compute PL
            = P, Q and R
            Redirect to R ------------> [frozen]
              |
              | 3 sec timeout
              V
            Compute new PL excluding R
            = P, Q and S
            Redirect to S ----------------------------> Compute PL without
              |                                         any knowledge about R
              |                                         (at this point)
              |                                         = P, Q and R
              |                                         Redirect to R ---+
              |                                           |              |
              |                         [what happens?] <-|--------------+
              |                                           | 3 sec timeout
              |                                           V
              |                                         Compute new PL excluding R
              |                                         = P, Q and S
              |                                         I'm coordinator this time
              |                                         Execute put
              V 3 sec timeout
            Compute new PL again
            [continues]
So, it’s possible for a slow/down node (node R in this case) to eventually
cause two _other nodes_ to each write a sibling, even on a new key. In fact,
depending on the number of nodes in the system and your luck, you could end up
writing more than one sibling on a fresh write in this case. Your comment about
a network issue potentially being a factor, and the 3-second timing you noted
(the default for the failure timeout), both make it more likely that this was,
in fact, the issue.
A fix for this issue has been worked on and tested, but is not yet incorporated
into a released version of Riak. You can, however, disable the coordinator
retry logic as noted in the issue I referenced above, or, if your cluster is
running slowly in general, increase the timeout by setting
`put_coordinator_failure_timeout` under `riak_kv` in your `advanced.config`
file (see
http://docs.basho.com/riak/kv/2.2.3/configuring/reference/#advanced-configuration
for the general format of the advanced.config if you’re not familiar).
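For reference, here is a minimal sketch of what raising the timeout could look
like in `advanced.config`. The value 10000 is purely illustrative, and I am
assuming the setting is expressed in milliseconds (which would be consistent
with the 3-second default mentioned above); please verify both against your
Riak version before relying on this.

```erlang
%% advanced.config (sketch only; merge with any existing riak_kv section)
[
 {riak_kv, [
   %% Assumption: the value is in milliseconds, so the 3-second default
   %% would correspond to 3000. 10000 is an illustrative value, not a
   %% recommendation.
   {put_coordinator_failure_timeout, 10000}
 ]}
].
```

Riak reads `advanced.config` at startup, so a change like this would need a
node restart to take effect.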
Hope this helps.
Doug Rohrer
On 4/18/17, 8:28 AM, "riak-users on behalf of Daniel Abrahamsson"
<[email protected] on behalf of [email protected]> wrote:
Hi Magnus,
This cluster has been running in production for a few months. Key
generation is based on flake (https://github.com/boundary/flake); we
have never experienced a collision in the 3+ years we have been using
it heavily in production. However, I will look into that possibility
as well.
I just noticed that one of the Riak nodes logged this at the time:
2017-04-13 17:42:40.567 [error]
<0.3624.28>@riak_api_pb_server:handle_info:331 Unrecognized message
{30320806,{ok,{r_object,<<"session">>,<<".12011742tWzDvu8mk5WAdfYihfV_T3DcnJ5VDyXC0c">>,[{r_content,{dict,3,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[[<<"X-Riak-VTag">>,53,114,86,115,108,71,120,112,73,55,108,118,114,100,105,114,107,104,50,66,105,119]],[[<<"index">>]],[],[[<<"X-Riak-Last-Modified">>|{1492,105357,453143}]],[],[]}}},<<...
(actual value removed).
I also have another example (from the same cluster) where there is a
*single* writer to a key, but after a few writes/updates, it also got
a sibling error. Also at that time, the write+read took significantly
longer than normal. I'll check if we had any "unrecognized messages"
in the Riak logs at that time as well.
To answer your second question, we are talking to the riak cluster
over protocol buffers, using the official Erlang client.
//Daniel
On Tue, Apr 18, 2017 at 1:51 PM, Magnus Kessler <[email protected]> wrote:
> On 18 April 2017 at 08:20, Daniel Abrahamsson <[email protected]> wrote:
>>
>> I've run into a case where I got a sibling error/response on the first
>> ever write to a key. I would like to understand how this could happen.
>> Normally when you get siblings, it is because you have written a value
>> with an out-of-date vclock. But since this is the first write, there
>> is no vclock. Could someone shed some light on this for me?
>>
>> It is worth mentioning that it took 3 seconds for Riak to deliver
>> the response, so it is possible there was some kind of network issue
>> at the time.
>>
>> Here are some details about my setup:
>> Number of nodes: 8.
>> n_val: 5
>> write options: pw: 3 (quorum), return_body
>>
>> Regards,
>> Daniel Abrahamsson
>>
>
>
> Hi Daniel,
>
> Please let me know if all nodes in this cluster were set up completely
> fresh, with empty backend directories, or if any of them had been used
> before for a Riak installation. If the latter is the case, it may be that
> the key in question had already been used once before. Cluster nodes pick up
> data from pre-existing backends.
>
> How do you access the key for read and write operations?
>
> Kind Regards,
>
> Magnus
>
>
> Magnus Kessler
> Client Services Engineer
> Basho Technologies Limited
>
> Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com