On Tue, 28 Apr 2015 at 15:25 'Klaus Aehlig' via ganeti-devel <
[email protected]> wrote:

> ...and also explain why the original approach was good in theory but
> not able to cope with out-of-memory killers killing daemons in just
> the wrong moment.
>
> Signed-off-by: Klaus Aehlig <[email protected]>
> ---
>  doc/design-configlock.rst | 58
> +++++++++++++++++++++++++++++++++++------------
>  1 file changed, 44 insertions(+), 14 deletions(-)
>
> diff --git a/doc/design-configlock.rst b/doc/design-configlock.rst
> index 1c37e25..3347ac4 100644
> --- a/doc/design-configlock.rst
> +++ b/doc/design-configlock.rst
> @@ -86,20 +86,10 @@ Set-and-release action
>  ----------------------
>
>  As a typical pattern is to change the configuration and afterwards release
> -the ``ConfigLock``. To avoid unncecessary delay in this operation (the
> next
> -modification of the configuration can already happen while the last change
> -is written out), WConfD will offer a combined command that will
> -
> -- set the configuration to the specified value,
> -
> -- release the config lock,
> -
> -- and only then wait for the configuration write to finish; it will not
> -  wait for confirmation of the lock-release write.
> -
> -If jobs use this combined command instead of the sequential set followed
> -by release, new configuration changes can come in during writeout of the
> -current change; in particular, a writeout can contain more than one
> change.
> +the ``ConfigLock``. To avoid unncecessary RPC call overhead, WConfD will
> offer
> +a combined call. To make that call retryable, it will do nothing if the
> the
> +``ConfigLock`` is not held by the caller; in the return value, it will
> indicate
> +if the config lock was held when the call was made.
>
>  Short-lived ``ConfigLock``
>  --------------------------
> @@ -124,3 +114,43 @@ status can still happen, triggered by other requests.
> Now, if
>  ``WConfD`` gets restarted after the lock acquisition, if that happend
>  in the name of the job, it would own a lock without knowing about it,
>  and hence that lock would never get released.
> +
> +
> +Approaches considered, but not working
> +======================================
> +
> +Set-and-release action with asynchronous writes
> +-----------------------------------------------
> +
> +Approach
> +~~~~~~~~
> +
> +As a typical pattern is to change the configuration and afterwards release
> +the ``ConfigLock``. To avoid unncecessary delay in this operation (the
> next
>

s/unncecessary/unnecessary


> +modification of the configuration can already happen while the last change
> +is written out), WConfD will offer a combined command that will
> +
> +- set the configuration to the specified value,
> +
> +- release the config lock,
> +
> +- and only then wait for the configuration write to finish; it will not
> +  wait for confirmation of the lock-release write.
> +
> +If jobs use this combined command instead of the sequential set followed
> +by release, new configuration changes can come in during writeout of the
> +current change; in particular, a writeout can contain more than one
> change.
> +
> +Problem
> +~~~~~~~
> +
> +This approach works fine, as long as always either ``WConfD`` can do an
> ordered
> +shutdown or the calling process dies as well. If however, we allow random
> kill
> +signals to be sent to individual daemons (e.g., by an out-of-memory
> killer),
> +the following race occours. A process can ask for a combined write-


s/occours/occurs


> and-unlock
> +operation; while the configuration is still written out, the write out of
> the
> +updated lock status already finishes. Now, if ``WConfD`` forefully gets


s/forefully/forcefully


> killed
> +in that very moment, a restarted ``WConfD`` will read the old
> configuration but
> +the new lock status. This will make the calling process believe that its
> call,
> +while it didn't get an answer, succeeded nevertheless, thus resulting in a
> +wrong configuration state.
> --
> 2.2.0.rc0.207.ga3a616c
>
>
Apart from the typos, LGTM. No need to resend.
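
For readers of this thread, the retryable semantics of the new combined call
(do nothing unless the caller holds the ``ConfigLock``, and report whether it
was held) can be sketched with a toy in-memory model. This is only an
illustration: the class, method and field names below are hypothetical and
are not WConfD's actual RPC interface, and the model ignores persistence to
disk entirely.

```python
import threading


class ToyConfigStore:
    """Toy in-memory stand-in for WConfD; every name here is hypothetical."""

    def __init__(self, config):
        self._mutex = threading.Lock()  # protects the two fields below
        self.config = config
        self.lock_owner = None  # identity of the current ConfigLock holder

    def acquire(self, caller):
        """Take the ConfigLock if it is free; report whether we got it."""
        with self._mutex:
            if self.lock_owner is None:
                self.lock_owner = caller
                return True
            return False

    def write_and_unlock(self, caller, new_config):
        """Set the config and release the lock, but only if caller holds it.

        Returns True if the lock was held by the caller when the call was
        made, and False otherwise (in which case nothing is changed).
        """
        with self._mutex:
            if self.lock_owner != caller:
                return False
            self.config = new_config
            self.lock_owner = None
            return True


store = ToyConfigStore({"serial": 1})
assert store.acquire("job-1")
first = store.write_and_unlock("job-1", {"serial": 2})
# A retry after a lost answer is harmless: the lock is no longer held,
# so the second call changes nothing and says so in its return value.
retry = store.write_and_unlock("job-1", {"serial": 3})
print(first, retry, store.config)
```

In this model a caller that never received an answer can simply repeat the
call; a False result tells it that the lock was not held at that point, for
instance because the first attempt had already gone through.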
