On Mon, Nov 30, 2009 at 06:01:17PM +0100, Michael Hanselmann wrote:
> Signed-off-by: Michael Hanselmann <[email protected]>
> ---
> doc/design-2.2.rst | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 123 insertions(+), 0 deletions(-)
>
> diff --git a/doc/design-2.2.rst b/doc/design-2.2.rst
> index cc78b51..f43146c 100644
> --- a/doc/design-2.2.rst
> +++ b/doc/design-2.2.rst
> @@ -33,6 +33,129 @@ As for 2.1 we divide the 2.2 design into three areas:
> Core changes
> ------------
>
> +Remote procedure call timeouts
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Current state and shortcomings
> +++++++++++++++++++++++++++++++
> +
> +The current RPC protocol used by Ganeti is based on HTTP. Every request
> +consists of an HTTP PUT request (e.g. ``PUT /hooks_runner HTTP/1.0``)
> +and doesn't return until the function called has returned. Parameters
> +and return values are encoded using JSON.
> +
> +On the server side, ``ganeti-noded`` handles every incoming connection
> +in a separate process by forking just after accepting the connection.
> +This process exits after sending the response.
> +
> +There is one major problem with this design: Timeouts cannot be used on
> +a per-request basis. Neither client nor server knows how long a request
> +will take. Even if we might be able to group requests into different
> +categories (e.g. fast and slow), this is not reliable.
> +
> +If a node has an issue or the network connection fails while a request
> +is being handled, the master daemon can wait for a long time for the
> +connection to time out (due to the operating system's underlying TCP
> +keep-alive packets or timeouts). While the settings for keep-alive
> +packets can be changed using Linux-specific socket options, we don't
> +consider them reliable and responsive enough for our case.
This is not really fair or correct: we already rely on system-wide socket
timeouts for the 'node down' case, and it works.
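
For reference, the per-socket variant the patch alludes to would look roughly
like the sketch below. The helper name ``enable_keepalive`` and the default
values are made up for illustration; only ``SO_KEEPALIVE`` is portable, the
three ``TCP_KEEP*`` options are the Linux-specific ones.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keep-alive for one socket (hypothetical helper)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific, so
    # they are only set when the platform's socket module defines them.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

With these example values a dead peer would be noticed after about
idle + interval * count seconds, which is still a transport-level signal and
says nothing about whether the called function is making progress.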
> +
> +Proposed changes
> +++++++++++++++++
> +
> +Protocol
> +^^^^^^^^
> +
> +Initially we chose HTTP as our RPC protocol because there were existing
> +libraries, which, unfortunately, turned out to miss important features
> +(such as SSL certificate authentication) and we had to write our own.
> +
> +This proposal can easily be implemented using HTTP, though it would
> +likely be more efficient and less complicated to use the LUXI protocol
> +already used to communicate between client tools and the Ganeti master
> +daemon.
I'm not sure I understand: what is the actual proposal, to switch to LUXI or
to remain with HTTP?
> +
> +The LUXI protocol currently contains two functions, ``WaitForJobChange``
> +and ``AutoArchiveJobs``, which can take a longer time. They both support
> +a parameter to specify the timeout. This timeout is usually chosen as
> +roughly half of the socket timeout, guaranteeing a response before the
> +socket times out. After the specified amount of time,
> +``AutoArchiveJobs`` returns and reports the number of archived jobs.
> +``WaitForJobChange`` returns and reports a timeout. In both cases, the
> +functions can be called again.
> +
> +A similar model can be used for the inter-node RPC protocol. In some
> +sense, the node daemon will implement a light variant of *"node daemon
> +jobs"*. When the function call is sent, it specifies an initial timeout.
> +If the function doesn't finish within this timeout, a response is sent
> +with a unique identifier. The client can then choose to wait again, with
> +a timeout, for the function to finish. Inter-node RPC calls would no
> +longer be blocking indefinitely and there would be an implicit
> +ping-mechanism.
ACK.
> +
> +Request handling
> +^^^^^^^^^^^^^^^^
> +
> +To support the protocol changes described above, the way the node daemon
> +handles requests will have to change. Instead of forking and handling
> +every connection in a separate process, there should be one fork per
> +RPC function call and the master process will handle the communication
> +with clients and the function handling processes.
OK.
> +
> +Function processes communicate with the parent process via stdio and
> +possibly their exit status. Every function process has a unique
> +identifier, though it shouldn't be the process ID (PIDs can be recycled
> +and are prone to race conditions for this use case).
(I wonder whether a PID combined with some other ID would not be unique enough.)
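
Something along these lines is what I have in mind; a rough sketch only, with
``CallIdGenerator`` and the ID format being my own invention. The counter makes
the ID unique within one run of the daemon, and the PID is kept mostly for
debugging.

```python
import itertools


class CallIdGenerator:
    """Builds function call IDs from a child PID plus a counter (hypothetical)."""

    def __init__(self):
        # Never reused while the daemon runs, unlike PIDs.
        self._counter = itertools.count(1)

    def next_id(self, child_pid):
        return "%d-%d" % (child_pid, next(self._counter))
```

Uniqueness across daemon restarts would need one more component, for example
the daemon's start time; that part is not shown here.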
> +
> +The following operations will be supported:
> +
> +``StartFunction(fn_name, fn_args, timeout)``
> + Starts a function specified by ``fn_name`` with arguments in
> + ``fn_args`` and waits up to ``timeout`` seconds for the function
> + to finish. Fire-and-forget calls can be made by specifying a timeout
> + of 0 seconds (e.g. for powercycling the node). Returns three values:
> + function call ID (if not finished), whether function finished (or
> + timeout) and the function's return value.
> +``WaitForFunction(fnc_id, timeout)``
> + Waits up to ``timeout`` seconds for function call to finish. Return
> + value same as ``StartFunction``.
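
To check my understanding of the two operations, this is how I picture the
calling side. ``FakeNodeDaemon`` is an in-memory stand-in written only for this
mail, and ``call_with_polling`` and ``max_waits`` are hypothetical names; only
the two operation names and the three-value return come from the patch.

```python
import itertools


class FakeNodeDaemon:
    """In-memory stand-in for the node daemon, purely illustrative."""

    def __init__(self, polls_needed):
        self._ids = itertools.count(1)
        self._polls_needed = polls_needed
        self._pending = {}  # call ID -> [remaining polls, function name]

    def StartFunction(self, fn_name, fn_args, timeout):
        fnc_id = "call-%d" % next(self._ids)
        self._pending[fnc_id] = [self._polls_needed, fn_name]
        # Pretend the function never finishes within the initial timeout.
        return (fnc_id, False, None)

    def WaitForFunction(self, fnc_id, timeout):
        state = self._pending[fnc_id]
        state[0] -= 1
        if state[0] > 0:
            return (fnc_id, False, None)
        del self._pending[fnc_id]
        return (None, True, "done:" + state[1])


def call_with_polling(node, fn_name, fn_args, timeout=10, max_waits=100):
    """Start a function, then keep waiting in timeout-sized steps."""
    fnc_id, finished, result = node.StartFunction(fn_name, fn_args, timeout)
    waits = 0
    while not finished:
        if waits >= max_waits:
            raise RuntimeError("gave up waiting for %s" % fnc_id)
        fnc_id, finished, result = node.WaitForFunction(fnc_id, timeout)
        waits += 1
    return result, waits
```

Every iteration of the loop doubles as the implicit ping: if the node stops
answering, the caller finds out within one timeout instead of blocking.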
> +
> +In the future, ``StartFunction`` could support an additional parameter
> +to specify after how long the function process should be aborted.
^ Counted from when: the function being started, or the process being started?
> +
> +Simplified timing diagram::
> +
> +  Master daemon        Node daemon                      Function process
> +   |
> +  Call function
> +  (timeout 10s) -----> Parse request and fork for ----> Start function
> +                       calling actual function, then     |
> +                       wait up to 10s for function to    |
> +                       finish                            |
> +                        |                                |
> +                       ...                              ...
> +                        |                                |
> +  Examine return <----  |                                |
> +  value and wait                                         |
> +  again -------------> Wait another 10s for function     |
> +                        |                                |
> +                       ...                              ...
> +                        |                                |
> +  Examine return <----  |                                |
> +  value and wait                                         |
> +  again -------------> Wait another 10s for function     |
> +                        |                                |
> +                       ...                              ...
> +                        |                                |
> +                        |                               Function ends,
> +                       Get return value and forward <-- process exits
> +  Process return <---- it to caller
> +  value and continue
> +   |
> +
> +.. TODO: Convert diagram above to graphviz/dot graphic
Questions: in which process is the "wait up to 10s" done? If it is the parent
ganeti-noded, would that not stall all other requests?
What happens if the noded process is restarted?
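
To make the first question concrete: I would expect the parent to treat each
wait as a deadline inside one select loop rather than as a blocking sleep.
Below is a rough sketch under that assumption. ``FunctionTable`` and its
methods are hypothetical, children are plain ``python -c`` subprocesses
reporting over stdout, and it relies on Unix pipes being selectable.

```python
import os
import selectors
import subprocess
import sys
import time


class FunctionTable:
    """Tracks function processes and multiplexes their output (hypothetical)."""

    def __init__(self):
        self._sel = selectors.DefaultSelector()
        self._procs = {}    # call ID -> Popen object
        self._output = {}   # call ID -> bytes read so far
        self.results = {}   # call ID -> complete output of finished calls

    def start(self, fnc_id, code):
        proc = subprocess.Popen([sys.executable, "-c", code],
                                stdout=subprocess.PIPE)
        self._procs[fnc_id] = proc
        self._output[fnc_id] = b""
        self._sel.register(proc.stdout, selectors.EVENT_READ, fnc_id)

    def wait(self, fnc_id, timeout):
        """Wait up to timeout seconds for fnc_id, servicing every child."""
        deadline = time.monotonic() + timeout
        while fnc_id not in self.results:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return (fnc_id, False, None)
            for key, _ in self._sel.select(remaining):
                chunk = os.read(key.fd, 4096)
                if chunk:
                    self._output[key.data] += chunk
                    continue
                # EOF: the child closed stdout, so reap it and store the result.
                self._sel.unregister(key.fileobj)
                proc = self._procs.pop(key.data)
                proc.stdout.close()
                proc.wait()
                self.results[key.data] = self._output.pop(key.data)
        return (None, True, self.results.pop(fnc_id))
```

In a real daemon the client sockets would have to be registered in the same
selector, otherwise new requests still stall while ``wait`` runs; the sketch
leaves that out. It also keeps all state in memory, so it does nothing to
answer the restart question.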
iustin