Re: [PATCH master] Inter-node RPC timeout design

Iustin Pop Tue, 01 Dec 2009 05:19:52 -0800

On Tue, Dec 01, 2009 at 01:09:50PM +0000, Michael Hanselmann wrote:
> 2009/12/1 Iustin Pop <[email protected]>:
> > On Mon, Nov 30, 2009 at 06:01:17PM +0100, Michael Hanselmann wrote:
> >> +There is one major problem with this design: Timeouts can not be used on
> >> +a per-request basis. Neither client or server know how long it will
> >> +take. Even if we might be able to group requests into different
> >> +categories (e.g. fast and slow), this is not reliable.
> >> +
> >> +If a node has an issue or the network connection fails while a request
> >> +is being handled, the master daemon can wait for a long time for the
> >> +connection to time out (due to the operating system's underlying TCP
> >> +keep-alive packets or timeouts). While the settings for keep-alive
> >> +packets can be changed using Linux-specific socket options, we don't
> >> +consider them reliable and responsive enough for our case.
> >
> > This is not really fair/correct, we use and rely on socket timeouts 
> > configured
> > system-wide for 'node down' case - and it works.
> 
> Yes, for the initial connect. However, the HTTP client disables read
> timeouts after connecting (see
> lib/http/client.py:HttpClientRequestExecutor.READ_TIMEOUT and
> HttpClientRequestExecutor._ReadResponse). Otherwise it would time out
> for long-running RPCs, depending on how the timeout is chosen. Hence
> the “while a request is being handled” above.


I think you're confusing protocol-level (L7) timeouts with TCP-level
(L4) timeouts; this is about timeouts at TCP stack level which the
application doesn't see (and doesn't care about).

> >> +This proposal can easily be implemented using HTTP, though it would
> >> +likely be more efficient and less complicated to use the LUXI protocol
> >> +already used to communicate between client tools and the Ganeti master
> >> +daemon.
> >
> > I'm not sure I understand here - what is the actual proposal, switch or
> > remain with HTTP?
> 
> Remain with HTTP for now:
> 
> @@ -73,7 +73,8 @@ libraries, which, unfortunately, turned out to miss
> important features
>  This proposal can easily be implemented using HTTP, though it would
>  likely be more efficient and less complicated to use the LUXI protocol
>  already used to communicate between client tools and the Ganeti master
> -daemon.
> +daemon. Switching to another protocol can occur at a later point. This
> +proposal should be implemented using HTTP as its underlying protocol.

ACK.

> >> +Function processes communicate with the parent process via stdio and
> >> +possibly their exit status. Every function process has a unique
> >> +identifier, though it shouldn't be the process ID (PIDs can be recycled
> >> +and are prone to race conditions for this use case).
> >
> > (I wonder if PIDs+other ID is not unique enough)
> 
> If the other ID is just a counter, there's no need to combine it with
> the PID. The node daemon will have to keep an internal list of its
> child processes anyway.

Exactly for the restart case I was saying we should still use PID.

> Actually, it probably should be something like "%s-%s-%s" %
> (time.time(), pid, unique_id). Otherwise, if the node daemon is
> restarted, function calls can collide again. A UUID would be even
> better, but probably be too expensive. The exact format or composition
> of the function call ID should not be part of this rather high-level
> proposal.

Well, I argue again that design docs should include low-level decisions
rather than leave them to be made arbitrarily at patch writing time ;-)

> >> +In the future, ``StartFunction`` could support an additional parameter
> >> +to specify after how long the function process should be aborted.
> >                                          ^ started, or process started?
> 
> I used the term “function process” to describe the child processes
> started by the node daemon to actually call a (backend) function.
> Should I add a small glossary?

No, I thought it's a typo, but now I get it.

> >> +Simplified timing diagram::
> >> +
> >> +  Master daemon        Node daemon                      Function process
> >> +   |
> >> +  Call function
> >> +  (timeout 10s) -----> Parse request and fork for ----> Start function
> >> +                       calling actual function, then     |
> >> +                       wait up to 10s for function to    |
> >> +                       finish                            |
> > […]
> >
> > Questions: the "wait up to 10s" there, is done in which process? parent
> > ganeti-noded - which would mean stalling all other requests?
> 
> It needs to be in the parent process (ganeti-noded). As Guido writes,
> we already have the library code to do this in an asynchronous fashion
> (which is, more or less, a necessity for this proposal).

With the async functionality, OK then. But this was not mentioned initially,
which is why I asked.

> > What happens if the noded process is restarted?
> 
> If we handle SIGINT/SIGTERM, it could wait for its child processes.
> Otherwise the function processes just run to the end. I don't think we
> should kill them, otherwise things get even more complicated with
> signal handling (assuming root won't send signals).

OK… I think we should at least try to handle them.

iustin

Re: [PATCH master] Inter-node RPC timeout design

Reply via email to