I think it fundamentally comes down to whether the reason the upgrade
failed is transient or permanent. If we can try again later, do so;
otherwise log at Error level and keep on with your life, because that is
the only chance of recovery (from what you've said, at least).
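
As a rough sketch of what I mean: retry while the failure looks transient,
otherwise hand the error back so the caller can log it and carry on. All
the names here (transientError, newTransient, errPermanent, runUpgrade) are
made up for illustration and are not anything in the Juju tree, and it
assumes a Go version that has errors.As.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// transientError marks a failure that might succeed if retried later.
// Hypothetical type for illustration only.
type transientError struct{ err error }

func (e *transientError) Error() string { return e.err.Error() }
func (e *transientError) Unwrap() error { return e.err }

// newTransient wraps a message as a transient failure.
func newTransient(msg string) error {
	return &transientError{err: errors.New(msg)}
}

// errPermanent stands in for a failure that retrying cannot fix.
var errPermanent = errors.New("upgrade step failed permanently")

// isTransient reports whether err (or anything it wraps) is transient.
func isTransient(err error) bool {
	var t *transientError
	return errors.As(err, &t)
}

// runUpgrade calls perform until it succeeds, fails permanently, or
// maxAttempts is reached. It returns the number of attempts made and the
// last error. It never loops forever, so the caller can log a non-nil
// error at Error level and keep the agent running.
func runUpgrade(perform func() error, maxAttempts int, delay time.Duration) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = perform()
		if err == nil || !isTransient(err) {
			return attempt, err
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
		}
	}
	return maxAttempts, err
}

func main() {
	calls := 0
	attempts, err := runUpgrade(func() error {
		calls++
		if calls < 3 {
			return newTransient("API server not ready yet")
		}
		return nil
	}, 5, 10*time.Millisecond)
	fmt.Println("attempts:", attempts, "err:", err)

	attempts, err = runUpgrade(func() error { return errPermanent }, 5, 0)
	if err != nil {
		// Stand-in for logging at Error level and carrying on.
		fmt.Println("ERROR upgrade failed after", attempts, "attempt(s):", err)
	}
}
```

The cap on attempts is the important part: it keeps us out of the endless
restart loop you described, while still giving a transient failure a chance
to clear.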

John
=:->


On Thu, Jul 10, 2014 at 11:18 AM, Menno Smits <menno.sm...@canonical.com>
wrote:

> So I've noticed that the way we currently handle failed upgrades in the
> machine agent doesn't make a lot of sense.
>
> Looking at cmd/jujud/machine.go:821, an error is created if
> PerformUpgrade() fails but nothing is ever done with it. It's not returned
> and it's not logged. This means that if upgrade steps fail, the agent
> continues running with the new software version, probably with partially
> applied upgrade steps, and there is no way to know.
>
> I have a unit tested fix ready which causes the machine agent to exit (by
> returning the error as a fatalError) if PerformUpgrade fails but before
> proposing I realised that's not the right thing to do. The agent's upstart
> script will restart the agent and probably cause the upgrade to run and
> fail again so we end up with an endless restart loop.
>
> The error could also be returned as a "non-fatal" (to the runner) error
> but that will just cause the upgrade-steps worker to continuously restart,
> attempting the upgrade and failing.
>
> Another approach could be to set the global agent-version back to the
> previous software version before killing the machine agent but other agents
> may have already upgraded and we can't currently roll them back in any
> reliable way.
>
> Our upgrade story will be improving in the coming weeks (I'm working on
> that). In the meantime, what should we do?
>
> Perhaps the safest thing to do is just log the error and keep the agent
> running the new version and hope for the best? There is a significant
> chance of problems but this is basically what we're doing now (except
> without logging that there's a problem).
>
> Does anyone have a better idea?
>
> - Menno
>
>
>
>
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>
>