On Wed, Apr 06, 2016 at 10:31:53AM +0200, Klaus Aehlig wrote:
> On Tue, Apr 05, 2016 at 02:09:05PM +0100, 'Brian Foley' via ganeti-devel wrote:
> > The upgrade process starts with a verify version, so there's no need
> > to do a second one after stopping the master daemons.
> 
> The main reason for this additional second verification step is that, in
> between, the queue is drained, which includes waiting for all jobs to terminate. On
> a busy cluster that can take hours, at least. The cluster can change during
> that time.
> 
> Note that Ganeti is supposed to do the upgrade on its own, without assuming
> that some external tool drains the queue beforehand and waits for it to empty.

Ah, OK, interesting point. However, the version check is a pretty trivial
'is a directory with the right name present' check. This is useful as
a basic initial "go/no-go" check, but arguably is less useful later on.

Immediately after that second version check we stop the daemons, archive
the configuration, update the symlinks and run the ensure-dirs script.
ensure-dirs checks much more thoroughly whether the right files and directories
are present with the right permissions, so it is a superset of the test -d
that _VerifyVersionInstalled() does.
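
To illustrate the difference between the two kinds of check, here is a rough
Python sketch. It is not the actual Ganeti code: the function names, the
version string used in the usage note, and the path-to-mode mapping are all
hypothetical, and the real ensure-dirs script checks more than this (ownership,
for example).

```python
import os
import stat


def version_dir_present(base_dir, version):
    """Cheap check: is there a directory named after the version?"""
    return os.path.isdir(os.path.join(base_dir, version))


def paths_ok(expected):
    """Thorough check: each path exists with the right type and mode.

    `expected` maps a path to (is_dir, mode). Returns a list of problems,
    which is empty when everything matches.
    """
    problems = []
    for path, (is_dir, mode) in sorted(expected.items()):
        try:
            st = os.stat(path)
        except OSError:
            problems.append("%s: missing" % path)
            continue
        actual = stat.S_IMODE(st.st_mode)
        if stat.S_ISDIR(st.st_mode) != is_dir:
            problems.append("%s: wrong type" % path)
        elif actual != mode:
            problems.append("%s: mode %o, expected %o" % (path, actual, mode))
    return problems
```

A directory that passes paths_ok() necessarily passes version_dir_present(),
but not the other way round, which is the sense in which the thorough check is
a superset of the trivial one.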

After all these steps we can roll back if necessary, and with this patchset
if the symlink and ensure-dirs step fails we roll back by symlinking the old
version and re-running ensure-dirs.
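
The rollback described here can be sketched roughly as follows. This is only an
illustration under assumptions, not the code in the patchset: the function
names are made up, `ensure_dirs` is a hypothetical callable standing in for
running the ensure-dirs script, and os.replace() needs Python 3.3 or later.

```python
import os


def switch_symlink(link, target):
    """Point `link` at `target` by creating a temporary link and renaming it."""
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    # rename() acts on the link itself, so the switch is atomic.
    os.replace(tmp, link)


def upgrade_with_rollback(link, old_target, new_target, ensure_dirs):
    """Switch to new_target; if that or ensure_dirs fails, restore old_target.

    `ensure_dirs` is a hypothetical callable returning True on success.
    Returns True if the new version is active, False if we rolled back.
    """
    try:
        switch_symlink(link, new_target)
        if ensure_dirs():
            return True
    except OSError:
        pass
    # Roll back: symlink the old version again and re-run ensure-dirs.
    switch_symlink(link, old_target)
    ensure_dirs()
    return False
```

The point of the sketch is that both failure modes (the symlink step and the
ensure-dirs step) lead to the same recovery path.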

In fact, thinking about it, maybe an additional improvement would be to
change the initial _VerifyVersionInstalled() to use ensure-dirs instead.
It's not very expensive to run and at the moment most of the time in each
step is spent in sequentially setting up each ssh session anyway.

Cheers,
Brian.
