Hi all,

The point I'm getting at is that we should take advantage of this extra bit of information that we acquire out-of-band (e.g. we just decide as a project that all operations must complete in less than 5 seconds) and come up with smarter / cheaper / faster ways of doing load shedding based on that information.

For example, yes, it could be interesting to use is_process_alive/1 to see whether a client is still hanging around, and have the gen_server discard the work otherwise. It might also be too expensive to matter; I'm not sure anyone here has a good a priori sense of the cost of that call. But I'd certainly wager it's more expensive than calling timer:now_diff/2 in the server and discarding any request that was submitted more than 5 seconds ago.
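To make that concrete, here is a minimal sketch of the cheap, timestamp-based check. It assumes callers stamp each request with os:timestamp/0; the module and function names are made up for illustration and this is not existing CouchDB code:

%% Hypothetical sketch: a gen_server that refuses to do work for
%% requests that were submitted more than 5 seconds ago.
-module(drop_stale_server).
-behaviour(gen_server).

-export([start_link/0, call/1]).
-export([init/1, handle_call/3, handle_cast/2]).

-define(MAX_AGE_USEC, 5 * 1000 * 1000). % 5 seconds, in microseconds

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Client side: stamp the request at submission time.
call(Request) ->
    gen_server:call(?MODULE, {Request, os:timestamp()}, 5000).

init([]) ->
    {ok, nil}.

handle_call({Request, SubmittedAt}, _From, State) ->
    case timer:now_diff(os:timestamp(), SubmittedAt) > ?MAX_AGE_USEC of
        true ->
            %% The caller must have timed out already; skip the work.
            {reply, {error, request_too_old}, State};
        false ->
            {reply, do_work(Request), State}
    end.

handle_cast(_Msg, State) ->
    {noreply, State}.

do_work(_Request) ->
    ok.

The timestamp travels with the message, so the server can make the drop decision the moment it dequeues the request, without any extra round trips or monitoring.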
Most of our timeout / cleanup solutions to date have been focused top-down, without making any assumptions about the behavior of the workers or servers underneath. I think we should try to approach this problem bottom-up, forcing every call to complete within 5 seconds and handling timeouts correctly as they bubble up.

Adam

> On Apr 23, 2019, at 2:48 PM, Nick Vatamaniuc <vatam...@gmail.com> wrote:
>
> We don't spawn (/link) or monitor remote processes, just monitor the local coordinator process. That should be cheaper performance-wise. It's also for relatively long-running streaming fabric requests (changes, all_docs). But you're right, perhaps doing these for shorter requests (doc updates, doc GETs) might become noticeable. Perhaps a pool of reusable monitoring processes would work there...
>
> For couch_server timeouts, I wonder if we can do a simpler thing and inspect the `From` part of each call and, if the Pid is not alive, drop the request, or at least avoid doing any expensive processing. For casts it might involve sending a sender Pid in the message. That doesn't address timeouts, just the case where the coordinating process went away while the message was stuck in the long message queue.
>
> On Mon, Apr 22, 2019 at 4:32 PM Robert Newson <rnew...@apache.org> wrote:
>
>> My memory is fuzzy, but those items sound a lot like what happens with rex, which motivated us (i.e. Adam) to build rexi, which deliberately does less than the stock approach.
>>
>> --
>> Robert Samuel Newson
>> rnew...@apache.org
>>
>> On Mon, 22 Apr 2019, at 18:33, Nick Vatamaniuc wrote:
>>> Hi everyone,
>>>
>>> We partially implemented the first part (cleaning up rexi workers) for all the fabric streaming requests, which should be all_docs, changes, view map, and view reduce:
>>> https://github.com/apache/couchdb/commit/632f303a47bd89a97c831fd0532cb7541b80355d
>>>
>>> The pattern there is the following:
>>>
>>> - With every request, spawn a monitoring process that is in charge of keeping track of all the workers as they are spawned.
>>> - If regular cleanup takes place, then this monitoring process is killed, to avoid sending double the number of kill messages to workers.
>>> - If the coordinating process doesn't run cleanup and just dies, the monitoring process performs cleanup on its behalf.
>>>
>>> Cheers,
>>> -Nick
>>>
>>> On Thu, Apr 18, 2019 at 5:16 PM Robert Samuel Newson <rnew...@apache.org> wrote:
>>>
>>>> My view is a) the server was unavailable for this request due to all the other requests it's currently dealing with, b) the connection was not idle, the client is not at fault.
>>>>
>>>> B.
>>>>
>>>>> On 18 Apr 2019, at 22:03, Done Collectively <sans...@inator.biz> wrote:
>>>>>
>>>>> Any reason 408 would be undesirable?
>>>>>
>>>>> https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/408
>>>>>
>>>>> On Thu, Apr 18, 2019 at 10:37 AM Robert Newson <rnew...@apache.org> wrote:
>>>>>
>>>>>> 503 imo.
>>>>>>
>>>>>> --
>>>>>> Robert Samuel Newson
>>>>>> rnew...@apache.org
>>>>>>
>>>>>> On Thu, 18 Apr 2019, at 18:24, Adam Kocoloski wrote:
>>>>>>> Yes, we should. Currently it's a 500, maybe there's something more appropriate:
>>>>>>>
>>>>>>> https://github.com/apache/couchdb/blob/8ef42f7241f8788afc1b6e7255ce78ce5d5ea5c3/src/chttpd/src/chttpd.erl#L947-L949
>>>>>>>
>>>>>>> Adam
>>>>>>>
>>>>>>>> On Apr 18, 2019, at 12:50 PM, Joan Touzet <woh...@apache.org> wrote:
>>>>>>>>
>>>>>>>> What happens when it turns out the client *hasn't* timed out and we just...hang up on them? Should we consider at least trying to send back some sort of HTTP status code?
>>>>>>>>
>>>>>>>> -Joan
>>>>>>>>
>>>>>>>> On 2019-04-18 10:58, Garren Smith wrote:
>>>>>>>>> I'm +1 on this. With partition queries, we added a few more timeouts that can be enabled, which Cloudant enables. So having the ability to shed old requests when these timeouts get hit would be great.
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>> Garren
>>>>>>>>>
>>>>>>>>> On Tue, Apr 16, 2019 at 2:41 AM Adam Kocoloski <kocol...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> For once, I'm coming to you with a topic that is not strictly about FoundationDB :)
>>>>>>>>>>
>>>>>>>>>> CouchDB offers a few config settings (some of them undocumented) to put a limit on how long the server is allowed to take to generate a response. The trouble with many of these timeouts is that, when they fire, they do not actually clean up all of the work that they initiated. A couple of examples:
>>>>>>>>>>
>>>>>>>>>> - Each HTTP response coordinated by the "fabric" application spawns several ephemeral processes via "rexi" on different nodes in the cluster to retrieve data and send it back to the process coordinating the response. If the request timeout fires, the coordinating process will be killed off, but the ephemeral workers might not be. In a healthy cluster they'll exit on their own when they finish their jobs, but there are conditions under which they can sit around for extended periods of time waiting for an overloaded gen_server (e.g. couch_server) to respond.
>>>>>>>>>>
>>>>>>>>>> - Those named gen_servers (like couch_server) responsible for serializing access to important data structures will dutifully process messages received from old requests without any regard for (or even knowledge of) the fact that the client that sent the message timed out long ago. This can lead to a sort of death spiral in which the gen_server is ultimately spending ~all of its time serving dead clients and every client is timing out.
>>>>>>>>>>
>>>>>>>>>> I'd like to see us introduce a documented maximum request duration for all requests except the _changes feed, and then use that information to aid in load shedding throughout the stack.
>>>>>>>>>> We can audit the codebase for gen_server calls with long timeouts (I know of a few on the critical path that set their timeouts to `infinity`) and we can design servers that efficiently drop old requests, knowing that the client who made the request must have timed out. A couple of topics for discussion:
>>>>>>>>>>
>>>>>>>>>> - the "gen_server that sheds old requests" is a very generic pattern, one that seems like it could be well-suited to its own behaviour. A cursory search of the internet didn't turn up any prior art here, which surprises me a bit. I'm wondering if this is worth bringing up with the broader Erlang community.
>>>>>>>>>>
>>>>>>>>>> - setting and enforcing timeouts is a healthy pattern for read-only requests, as it gives a lot more feedback to clients about the health of the server. When it comes to updates, things are a little bit more muddy, just because there remains a chance that an update can be committed but the caller times out before learning of the successful commit. We should try to minimize the likelihood of that occurring.
>>>>>>>>>>
>>>>>>>>>> Cheers, Adam
>>>>>>>>>>
>>>>>>>>>> P.S. I did say that this wasn't _strictly_ about FoundationDB, but of course FDB has a hard 5 second limit on all transactions, so it is a bit of a forcing function :). Even putting FoundationDB aside, I would still argue to pursue this path based on our Ops experience with the current codebase.
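Coming back to Nick's suggestion above about inspecting the `From` of each call: a rough sketch of that check might look like the following. It is illustrative only, not the actual couch_server code, and note that erlang:is_process_alive/1 only accepts pids local to the node, hence the guard:

%% Illustrative sketch only: check whether the caller behind `From` is
%% still alive before doing any expensive work, and drop the request if not.
-module(liveness_check_server).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    {ok, nil}.

%% is_process_alive/1 only works for local pids, so guard on the node.
handle_call(Request, {FromPid, _Tag}, State) when node(FromPid) =:= node() ->
    case is_process_alive(FromPid) of
        false ->
            %% The caller is gone (presumably timed out); skip the work,
            %% since nobody is waiting for the reply.
            {noreply, State};
        true ->
            {reply, do_expensive_work(Request), State}
    end;
handle_call(Request, _From, State) ->
    %% Remote caller: no cheap liveness check available, just do the work.
    {reply, do_expensive_work(Request), State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

do_expensive_work(_Request) ->
    ok.

As discussed above, the open question is whether that per-message liveness check is cheap enough to be worth doing on every dequeued request, compared to a simple timestamp comparison.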