Hi Dmitri, what would be the benefit of r=2, exactly? It isn't necessary to trigger read-repair, is it? If it's important, I'd rather try it sooner rather than later...
Regards,
Vanessa

On Wed, Oct 7, 2015 at 4:02 PM, Dmitri Zagidulin <dzagidu...@basho.com> wrote:
> Glad you sorted it out!
>
> (I do want to encourage you to bump your R setting to at least 2, though. Run some tests -- I think you'll find that the difference in speed will not be noticeable, but you do get a lot more data resilience with 2.)
>
> On Wed, Oct 7, 2015 at 6:24 PM, Vanessa Williams <vanessa.willi...@thoughtwire.ca> wrote:
>
>> Hi Dmitri, well... we solved our problem to our satisfaction, but it turned out to be something unexpected.
>>
>> The keys were two properties mentioned in a blog post on "configuring Riak’s oft-subtle behavioral characteristics":
>> http://basho.com/posts/technical/riaks-config-behaviors-part-4/
>>
>> notfound_ok=false
>> basic_quorum=true
>>
>> The 2nd one just makes things a little faster, but the first one is the one whose default value of true was killing us.
>>
>> With r=1 and notfound_ok=true (the default), the first node to respond was authoritative: if it didn't find the requested key, the answer was "this key is not found". Not what we were expecting at all.
>>
>> With the changed settings, it will wait for a quorum of responses, and only if *no one* finds the key will "not found" be returned. Perfect. (Without basic_quorum it would wait for all responses, which is not ideal.)
>>
>> Now there is only one snag: if the Riak node the client connects to goes down, there is no communication and we have a problem. This is easily solvable with a load-balancer, though for complicated reasons we don't actually need to do that right now. It's acceptable for us temporarily. Later, we'll get the load-balancer working and even that won't be a problem.
>>
>> I *think* we're ok now. Thanks for your help!
>>
>> Regards,
>> Vanessa
>>
>> On Wed, Oct 7, 2015 at 9:33 AM, Dmitri Zagidulin <dzagidu...@basho.com> wrote:
>>
>>> Yeah, definitely find out what the sysadmin's experience with the load balancer was. It could have just been a wrong configuration or something.
>>>
>>> And yes, that's the documentation page I recommend:
>>> http://docs.basho.com/riak/latest/ops/advanced/configs/load-balancing-proxy/
>>> Just set up HAProxy, and point your Java clients to its IP.
>>>
>>> The drawbacks to load-balancing on the Java client side (yes, the cluster object) instead of using a standalone load balancer like HAProxy are the following:
>>>
>>> 1) Adding a node means code changes (or at the very least, config file changes) rolled out to all your clients, which turns out to be a pretty serious hassle. HAProxy, by contrast, lets you add or remove nodes without changing any Java code or client config files.
>>>
>>> 2) Performance. We've run many tests to compare performance, and client-side load balancing results in significantly lower throughput than you'd get using HAProxy (or nginx). (Specifically, you want to use the 'leastconn' load balancing algorithm with HAProxy, instead of round robin.)
>>>
>>> 3) The health check on the client side (which lets the Java load balancer tell when a remote node is down) is much less intelligent than what a dedicated load balancer provides. With something like HAProxy, you should be able to take down nodes with no ill effects for the client code.
>>>
>>> Now, if you load balance on the client side and you take a node down, it's not supposed to stop working completely. (I'm not sure why it's failing for you; we can investigate, but it'll be easier to just use a load balancer.) It should throw an error or two, but then start working again (on the retry).
>>>
>>> Dmitri
>>>
>>> On Wed, Oct 7, 2015 at 2:45 PM, Vanessa Williams <vanessa.willi...@thoughtwire.ca> wrote:
>>>
>>>> Hi Dmitri, thanks for the quick reply.
>>>>
>>>> It was actually our sysadmin who tried the load balancer approach, late last evening, and had no success. However, I haven't discussed the gory details with him yet. The failure he saw was at the application level (i.e. failure to read a key), but I don't know a) how he set up the LB or b) what the Java exception was, if any. I'll find that out in an hour or two and report back.
>>>>
>>>> I did find this article just now:
>>>>
>>>> http://docs.basho.com/riak/latest/ops/advanced/configs/load-balancing-proxy/
>>>>
>>>> So I suppose we'll give those suggestions a try this morning.
>>>>
>>>> What is the drawback to having the client connect to all 4 nodes (the cluster client, I assume you mean)? My understanding from the articles I've found is that one of the nodes going away causes that client to fail as well. Is that what you mean, or are there other drawbacks?
>>>>
>>>> If there's anything else you can recommend, or links other than the one above you can point me to, it would be much appreciated. We expect both node failure and deliberate node removal for upgrade, repair, replacement, etc.
>>>>
>>>> Regards,
>>>> Vanessa
>>>>
>>>> On Wed, Oct 7, 2015 at 8:29 AM, Dmitri Zagidulin <dzagidu...@basho.com> wrote:
>>>>
>>>>> Hi Vanessa,
>>>>>
>>>>> Riak is definitely meant to run behind a load balancer. (Or, in the worst case, to be load-balanced on the client side; that is, all clients connect to all 4 nodes.)
>>>>>
>>>>> When you say "we did try putting all 4 Riak nodes behind a load-balancer and pointing the clients at it, but it didn't help" -- what exactly do you mean by "it didn't help"? What happened when you tried using the load balancer?
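Dmitri's remark that a client-side setup "should throw an error or two, but then start working again (on the retry)" can be illustrated with a small failover loop. This is a minimal sketch in plain Java under stated assumptions, not the Riak Java client's API: `NodeFailover`, `NodeCall`, `withFailover`, and the host names are all hypothetical, and the per-node request is passed in as a function so that any real client call could be plugged in. Each request tries the nodes in rotation and moves on when a node refuses the connection.

```java
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of the Riak Java client: runs one operation against
// a list of nodes and moves on to the next node when a connection is refused.
final class NodeFailover {

    /** One request against one node; "host" is whatever address the caller uses. */
    @FunctionalInterface
    interface NodeCall<T> {
        T call(String host) throws ConnectException;
    }

    private final List<String> hosts;
    private int next = 0; // node to try first on the next request (simple rotation)

    NodeFailover(List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("need at least one host");
        }
        this.hosts = new ArrayList<>(hosts);
    }

    /** Tries every node at most once; rethrows the last failure if all of them refuse. */
    synchronized <T> T withFailover(NodeCall<T> op) throws ConnectException {
        ConnectException last = null;
        for (int attempt = 0; attempt < hosts.size(); attempt++) {
            int index = (next + attempt) % hosts.size();
            try {
                T result = op.call(hosts.get(index));
                next = (index + 1) % hosts.size(); // rotate past the node that answered
                return result;
            } catch (ConnectException e) {
                last = e; // this node is down or unreachable: try the next one
            }
        }
        throw last;
    }

    public static void main(String[] args) throws ConnectException {
        NodeFailover nodes = new NodeFailover(List.of("riak1", "riak2", "riak3", "riak4"));
        // Simulated request in which riak2 refuses connections.
        NodeCall<String> read = host -> {
            if (host.equals("riak2")) {
                throw new ConnectException("Connection refused");
            }
            return "value from " + host;
        };
        for (int i = 0; i < 4; i++) {
            System.out.println(nodes.withFailover(read));
        }
    }
}
```

Running `main` prints answers from riak1, riak3, riak4 and riak1: the request that would have gone to riak2 silently lands on riak3. The rotation is plain round robin and there is no background health check, which are exactly the limitations Dmitri's points 2 and 3 describe; a dedicated balancer such as HAProxy with `leastconn` handles both outside the application.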
>>>>>
>>>>> On Wed, Oct 7, 2015 at 1:57 PM, Vanessa Williams <vanessa.willi...@thoughtwire.ca> wrote:
>>>>>
>>>>>> Hi all, we are still (for a while longer) using Riak 1.4 and the matching Java client. The client(s) connect to one node in the cluster (since that's all it can do in this client version). The cluster itself has 4 nodes (sorry, we can't use 5 in this scenario). There are 2 separate clients.
>>>>>>
>>>>>> We've tried both n_val=3 and n_val=4. We achieve consistency-by-writes by setting w=all. Therefore, we only require one successful read (r=1).
>>>>>>
>>>>>> When all nodes are up, everything is fine. If one node fails, the clients can no longer read any keys at all. There's an exception like this:
>>>>>>
>>>>>> com.basho.riak.client.RiakRetryFailedException: java.net.ConnectException: Connection refused
>>>>>>
>>>>>> Now, it isn't possible that Riak can't operate when one node fails, so we're clearly missing something here.
>>>>>>
>>>>>> Note: we did try putting all 4 Riak nodes behind a load-balancer and pointing the clients at it, but it didn't help.
>>>>>>
>>>>>> Riak is a high-availability key-value store, so... why are we failing to achieve high-availability? Any suggestions greatly appreciated, and if more info is required I'll do my best to provide it.
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Vanessa
>>>>>>
>>>>>> --
>>>>>> Vanessa Williams
>>>>>> ThoughtWire Corporation
>>>>>> http://www.thoughtwire.com
>>>>>>
>>>>>> _______________________________________________
>>>>>> riak-users mailing list
>>>>>> riak-users@lists.basho.com
>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
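The "consistency-by-writes" reasoning in the original question (w=all, so r=1 suffices) rests on quorum overlap: if R + W > N, every set of replicas a read waits for shares at least one replica with every set a write waited for. Below is a minimal sketch of that arithmetic in Java, assuming strict quorums; `QuorumMath` and its methods are hypothetical names. Riak itself uses sloppy quorums, in which fallback vnodes can stand in for unavailable primaries unless `pr`/`pw` are set, so this strict model only approximates what Riak does when a node is down.

```java
// Hypothetical helper illustrating STRICT quorum arithmetic only
// (no sloppy quorum, no fallback vnodes, no hinted handoff).
final class QuorumMath {

    /** True when every read quorum of size r must intersect every write quorum of size w. */
    static boolean readsSeeLatestWrite(int n, int r, int w) {
        check(n, r, w);
        return r + w > n;
    }

    /** How many of the n replicas can be unreachable while a quorum of size q is still met. */
    static int toleratedReplicaFailures(int n, int q) {
        if (q < 1 || q > n) {
            throw new IllegalArgumentException("quorum must be in 1..n");
        }
        return n - q;
    }

    private static void check(int n, int r, int w) {
        if (n < 1 || r < 1 || w < 1 || r > n || w > n) {
            throw new IllegalArgumentException("require 1 <= r, w <= n");
        }
    }

    public static void main(String[] args) {
        int n = 3;
        int[][] settings = { {1, 3}, {2, 3}, {2, 2} }; // {r, w} pairs to compare
        for (int[] s : settings) {
            int r = s[0];
            int w = s[1];
            System.out.printf("n=%d r=%d w=%d overlap=%b readsTolerate=%d writesTolerate=%d%n",
                    n, r, w, readsSeeLatestWrite(n, r, w),
                    toleratedReplicaFailures(n, r), toleratedReplicaFailures(n, w));
        }
    }
}
```

Under strict quorums, n=3 with w=3 and r=1 does overlap, but it leaves writes no tolerance for an unreachable replica; r=2 (Dmitri's suggestion) keeps the overlap while making each read wait for two replicas. None of this arithmetic explains the `ConnectException` in the original report, which comes from each client talking to a single node.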
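The effect of `notfound_ok` and `basic_quorum` that Vanessa describes can be made concrete with a small simulation. This is a hedged, simplified model written for illustration, not Riak's code: `ReadDecisionModel`, `decide`, and the enums are hypothetical, and the stopping rule is an assumed reading of Riak's get coordinator. The coordinator stops once R replies that count toward R have arrived, or once enough "not found" replies arrive that R is unreachable; with `basic_quorum`, that second threshold drops to a simple majority of the n replicas. PR, tombstones, errors, sibling merging and read repair are all ignored.

```java
import java.util.List;

// Simplified, hypothetical model of when a Riak get coordinator stops waiting for
// replica replies. It ignores PR, tombstones, errors, sibling merging and read repair.
final class ReadDecisionModel {

    enum Reply { FOUND, NOT_FOUND }                      // what one replica (vnode) answered
    enum Outcome { FOUND, NOT_FOUND, STILL_WAITING }     // what the client would be told

    static Outcome decide(List<Reply> repliesInArrivalOrder, int n, int r,
                          boolean notfoundOk, boolean basicQuorum) {
        if (r < 1 || r > n || repliesInArrivalOrder.size() > n) {
            throw new IllegalArgumentException("require 1 <= r <= n and at most n replies");
        }
        // Assumed threshold: enough "not found" replies that R can no longer be reached,
        // or, with basic_quorum, a simple majority of the n replicas.
        int failThreshold = basicQuorum ? Math.min(n / 2 + 1, n - r + 1) : n - r + 1;
        int ok = 0;        // replies that count toward R
        int found = 0;     // replies that actually carried the object
        int notFound = 0;  // "not found" replies that do NOT count toward R
        for (Reply reply : repliesInArrivalOrder) {
            if (reply == Reply.FOUND) {
                ok++;
                found++;
            } else if (notfoundOk) {
                ok++;      // notfound_ok=true: "not found" is an acceptable answer
            } else {
                notFound++;
            }
            if (ok >= r) {
                return found > 0 ? Outcome.FOUND : Outcome.NOT_FOUND;
            }
            if (notFound >= failThreshold) {
                return Outcome.NOT_FOUND;
            }
        }
        return Outcome.STILL_WAITING; // the coordinator would keep waiting for replies
    }

    public static void main(String[] args) {
        // n=3, r=1; the replica that answers first has no copy, the other two do.
        List<Reply> firstMissing = List.of(Reply.NOT_FOUND, Reply.FOUND, Reply.FOUND);
        System.out.println(decide(firstMissing, 3, 1, true, false));  // NOT_FOUND
        System.out.println(decide(firstMissing, 3, 1, false, true));  // FOUND

        // Key absent everywhere; only two of three replies have arrived so far.
        List<Reply> twoMisses = List.of(Reply.NOT_FOUND, Reply.NOT_FOUND);
        System.out.println(decide(twoMisses, 3, 1, false, false));   // STILL_WAITING
        System.out.println(decide(twoMisses, 3, 1, false, true));    // NOT_FOUND
    }
}
```

Under this model, `notfound_ok=true` with `r=1` lets the first "not found" settle the read, which matches the symptom described in the thread. With `notfound_ok=false`, `basic_quorum=true` lets a majority of "not found" replies end the wait instead of waiting for all n replicas.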