Hi Cian,

Thank you! I should've mentioned in my initial email that I thought we were experiencing the same bug you called out (in fact, the second comment on that GitHub issue is from me).
So what I'm really curious about is whether the original "overload" error is happening because we're hitting the limit on TS max concurrent queries, or because Riak is actually overloaded and we shouldn't increase the configuration value for max concurrent queries. I'd like to know whether I should expect a given value for max concurrent queries to be stable and performant on given hardware specs. We will probably run this experiment in house to determine a good value, but it would be great to know what range is expected to perform well.

Also, I have no idea whether the max-concurrent-queries setting counts subqueries over multiple quanta. For instance, if I have 4 TS queries hitting a Riak node configured for 12 max queries and each query spans 3-4 quanta, should I expect an "overload" error?

Thank you for the advice on implementing client backoff! Hopefully we can do that as well as increase the overall TS query capacity of our cluster with a simple configuration change. I suspect we have a very conservative value at the moment.

Chris

________________________________________
From: Cian Synnott <c...@emauton.org>
Sent: Wednesday, July 27, 2016 6:03 PM
To: Johnson Chris CJOH
Cc: riak-users@lists.basho.com
Subject: Re: riak TS max concurrent queries + overload error

Hi Chris,

This sounds like the issue described at https://github.com/basho/riak_kv/issues/1418

On Wed, Jul 27, 2016 at 11:19 PM, <chris.john...@vaisala.com> wrote:
> Also, does anyone have any recommendations on query pooling so we can
> guarantee that multiple clients will not generate more queries than the
> cluster can handle?

Probably the right thing to do (when the RPC server is fixed) is to have the clients independently check for backpressure from Riak (e.g. overload messages like this), retry with exponential backoff, and have each retry increment a counter somewhere in your monitoring system to make that problem visible.
This should allow you to handle overload (somewhat) gracefully, respond to critical events (e.g. with an alert), and see any overload trends over time.

Cian

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com