[Bug 1765699] Re: lxd fails to start main process, yet waitready doesn't bail out

2018-05-02 Thread Free Ekanayaka
Regarding systemd-specific logic, AFAICT we already have some
systemd-specific code for socket activation (see
lxd/util/http.go:GetListeners).

I think that conditionally leveraging what systemd offers (when LXD is
managed by systemd) wouldn't be a bad idea, and the strategy proposed by
Dimitri sounds robust.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1765699

Title:
  lxd fails to start main process, yet waitready doesn't bail out

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/lxd/+bug/1765699/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1765699] Re: lxd fails to start main process, yet waitready doesn't bail out

2018-05-02 Thread Free Ekanayaka
It seems that the "failed to open cluster database" error is related to:

https://github.com/lxc/lxd/issues/4485

which should be fixed by:

https://github.com/lxc/lxd/pull/4518

Issue #4485 also contains some info about how to get you unstuck right
now, but you'll want to upgrade to an LXD binary that includes the PR
above. It should be released as a snap in the next few days, and as a
deb for 18.04 probably a bit later.

** Bug watch added: LXD bug tracker #4485
   https://github.com/lxc/lxd/issues/4485

[Bug 1765699] Re: lxd fails to start main process, yet waitready doesn't bail out

2018-04-20 Thread Stéphane Graber
The DB upgrade issue is something we've fixed, but it needs manual
intervention if the user ran into it when installing one of our betas.

I've helped xnox get back online on IRC by restoring a DB backup and
then re-importing some missing containers. That got him back online
with all containers present and running (they didn't get stopped
during the failure).

** Changed in: lxd (Ubuntu)
   Status: New => Fix Released

[Bug 1765699] Re: lxd fails to start main process, yet waitready doesn't bail out

2018-04-20 Thread Dimitri John Ledkov
Start-Date: 2018-04-03  22:22:21
Commandline: apt full-upgrade
Requested-By: xnox (1000)
...
lxc:amd64 (3.0.0~beta3-0ubuntu1, 3.0.0-0ubuntu2), lxd:amd64 
(3.0.0~beta5-0ubuntu2, 3.0.0-0ubuntu1),
...
Meanwhile
> Apr 04 09:45:04 sochi lxd[27714]: Error: failed to open
> cluster database: failed to ensure schema: failed to begin
> transaction: gRPC BEGIN response error: rpc error: code =
> Unknown desc = failed to handle BEGIN request: FSM out of
> sync: timed out enqueuing operation
...
End-Date: 2018-04-04  09:50:09

Meaning it was kind of stuck, and I guess I had to kill things like
lxd, possibly at about this time.

And I guess I tried to upgrade again straight after, because I also
have:

Start-Date: 2018-04-04  09:52:08
Commandline: apt full-upgrade
Requested-By: xnox (1000)
...
lxd:amd64 (3.0.0-0ubuntu1, 3.0.0-0ubuntu2)
...
Apr 04 09:57:43 sochi lxd[15068]: Error: failed to open cluster database: 
failed to ensure schema: failed to begin transaction: gRPC BEGIN response 
error: rpc error: code = Unknown desc = failed to handle BEGIN request: FSM out 
of sync: timed out enqueuing operation
...
End-Date: 2018-04-04  10:07:02

And there have been many more "failed to open cluster database" errors
since, including a few on Apr 20.

[Bug 1765699] Re: lxd fails to start main process, yet waitready doesn't bail out

2018-04-20 Thread Stéphane Graber
We don't want systemd logic inside LXD itself. If there's a way to tell
systemd in its unit to kill any PostStart tasks when the main process
exits non-zero, then we'd definitely add that to the unit.

[Bug 1765699] Re: lxd fails to start main process, yet waitready doesn't bail out

2018-04-20 Thread Dimitri John Ledkov
INVOCATION_ID will be set in the environment when the process is
executed as part of a systemd unit invocation (this can also be
reliably verified from the process keyring). When it is set, waitready
can check whether that invocation ID is present on the current system's
systemd by talking to /run/systemd/private. It could then establish a
watch on the main process of the matching invocation ID, if found. If
all of the above checks out, waitready knows for a fact that this
invocation ID is what it is blocked on. It can then establish a job
status watch and, after establishing the watch, check whether the MAIN
process has already died. By definition, waitready is spawned after the
main process is spawned, so this should not be racy. waitready could
then receive the event that MAINPID died and bail out with an error
well before the 10-minute timeout.

If something like that would be welcomed, I could work on a patch that
does that inside the waitready function.

Let me dig through my logs w.r.t. the database stuff.

[Bug 1765699] Re: lxd fails to start main process, yet waitready doesn't bail out

2018-04-20 Thread Stéphane Graber
LXD itself doesn't interact with systemd, so I'm not sure how waitready
would know that the main systemd unit failed.

What waitready does is attempt to talk to the LXD API (which will be
blocking due to lxd's socket activation), calling an internal REST API
endpoint to wait for LXD's early initialization to complete.


As for your database being unhappy, when did that happen? Was it on the
initial upgrade from 2.21 to 3.0, or what exactly was the upgrade
sequence involved here? It should be reasonably easy to have LXD
re-convert the database, likely fixing the problem, but I'd like to
figure out when it broke to see if it's one of the issues that we have
since resolved.
