Re: restartability of daemons (was: Re: [PATCH master] Add design document for a predictive queue system)

'Brian Foley' via ganeti-devel Thu, 25 Aug 2016 06:38:38 -0700

On Thu, Aug 25, 2016 at 02:55:12PM +0200, Klaus Aehlig wrote:
> 
> Hi Brian,
> 
> > I'm not sure that's true. For example, if you have a running job that has
> > made a WaitForJobChange RPC call over the luxi UDS and is waiting for the
> > response, isn't that going to be interrupted if the luxi daemon is restarted?
> 
> it certainly was a design goal of the daemon refactoring; we might have missed
> some bugs, but I think here we did it right.
> 
> Besides the fact that I doubt any jobs do a WaitForJobChange RPC (in fact, as
> far as I remember, jobs get all their information from WConfD), the mechanism
> used by jobs for calls over UDS is aware that the daemon might be absent,
> restarted, etc., and does all the needed retries. If I remember correctly,
> quite a lot of the magic is in lib/rpc/transport.py.


*lightbulb* Aaaah, you're right! The WaitForJobChanges I was seeing in our test
setup in fact all came from a bunch of running gnt-job watch commands, not the
jobs themselves.
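
For anyone following along, the retry behaviour Klaus describes can be sketched
roughly as below. This is only an illustration under my own assumptions, not
the code in lib/rpc/transport.py: the function name connect_with_retry and its
parameters are made up for the example, and the real client may handle more
cases than a missing or refused socket.

```python
import errno
import socket
import time


def connect_with_retry(path, attempts=10, delay=0.1, max_delay=2.0):
    """Connect to a Unix domain socket, retrying while the daemon is away.

    Hypothetical helper for illustration only; not Ganeti's actual API.
    """
    last_error = None
    for _ in range(attempts):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock
        except OSError as err:
            sock.close()
            # ENOENT: the socket file does not exist (daemon absent).
            # ECONNREFUSED: the file exists but nobody is listening
            # (daemon stopped or in the middle of a restart).
            if err.errno not in (errno.ENOENT, errno.ECONNREFUSED):
                raise
            last_error = err
            time.sleep(delay)
            # Back off exponentially, capped at max_delay seconds.
            delay = min(delay * 2, max_delay)
    raise ConnectionError("daemon did not come back on %s" % path) from last_error
```

A caller would use the returned socket as usual and, if a request fails because
the connection was dropped mid-call, reconnect the same way and resend it.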

Cheers,
Brian.
> Thanks,
> Klaus
> 
> -- 
> Klaus Aehlig
> Google Germany GmbH, Erika-Mann-Str. 33, 80636 Muenchen
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
> Geschaeftsfuehrer: Matthew Scott Sucherman, Paul Terence Manicle
